Intelligent Decision-making Academic Forum in the AI Era

Title: LLM Alignment Techniques: Stochastic Optimizations in LLM Post-training and Reasoning

Speaker: Xi Chen (New York University)

Time: October 19, 2025, 8:40-9:25

Abstract: This talk explores approaches to improving large language model (LLM) post-training and reasoning through stochastic optimization techniques. The first part introduces ComPO, a preference alignment method that uses comparison oracles in stochastic optimization and addresses the likelihood displacement issue in traditional direct preference optimization. The second part proposes spectral policy optimization, a framework that overcomes GRPO's limitations with all-negative-sample groups by introducing response diversity with AI feedback. Both approaches demonstrate significant improvements across various model sizes and benchmarks, representing important advances in LLM post-training via stochastic optimization.
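To illustrate the all-negative-sample limitation mentioned in the abstract, here is a minimal sketch. It assumes the commonly described GRPO advantage, where each response's reward is standardized within its sampled group. The function name, the `eps` constant, and the 0/1 rewards are illustrative assumptions, and this is not the speaker's method.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Hypothetical helper: assumes advantage = (r - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# A group with both correct (1.0) and incorrect (0.0) responses gives a signal.
mixed_adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every response is wrong: all rewards equal, so every
# advantage is zero and the group contributes no policy gradient.
all_neg_adv = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed_adv)
print(all_neg_adv)
```

Under this assumption, a group whose responses all receive the same reward yields zero advantages. This is why an approach that differentiates among failed responses can recover a learning signal from such groups.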

Bio: Xi Chen is a Professor and the Andre Meyer Faculty Fellow at New York University's Stern School of Business. He is also an affiliated faculty member at the Courant Institute of Mathematical Sciences and the Center for Data Science at NYU.

Professor Chen's work extensively explores machine learning and its applications in operations management and quantitative fields, including digital advertising, dynamic pricing, online recommendations, and quantitative finance. Additionally, his research extends into blockchain technology, examining mechanism design, tokenomics, and decentralized finance within this domain. He has authored nearly 100 papers featured in top journals and peer-reviewed conferences. Furthermore, Prof. Chen is a co-editor of “The Elements of Joint Learning and Optimization in Operations Management,” published by Springer. He is also preparing to release his book, “Web3: Blockchain, the New Economy, and the Self-Sovereign Internet,” under Cambridge University Press.






Title: Auto-Formulating Dynamic Programming Problems with Large Language Models

Speaker: Linwei Xin (Cornell University)

Time: October 19, 2025, 9:25-10:10

Abstract: Dynamic programming (DP) is a fundamental method in operations research, but formulating DP models has traditionally required expert knowledge of both the problem context and DP techniques. Large Language Models (LLMs) offer the potential to automate this process. However, DP problems pose unique challenges due to their inherently stochastic transitions and the limited availability of training data. These factors make it difficult to directly apply existing LLM-based models or frameworks developed for other optimization problems, such as linear or integer programming. We introduce DP-Bench, the first benchmark covering a wide range of textbook-level DP problems to enable systematic evaluation. We present the Dynamic Programming Language Model (DPLM), a 7B-parameter specialized model that achieves performance comparable to state-of-the-art LLMs like OpenAI's o1 and DeepSeek-R1, and surpasses them on hard problems. Central to DPLM's effectiveness is DualReflect, our novel synthetic data generation pipeline, designed to scale up training data from a limited set of initial examples. DualReflect combines forward generation for diversity and backward generation for reliability. Our results reveal a key insight: backward generation is favored in low-data regimes for its strong correctness guarantees, while forward generation, though lacking such guarantees, becomes increasingly valuable at scale for introducing diverse formulations. This trade-off highlights the complementary strengths of both approaches and the importance of combining them. The paper is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=535016

Bio: Linwei Xin is an Associate Professor in the School of Operations Research and Information Engineering (ORIE) at Cornell University. Prior to Cornell, he was an Associate Professor of Operations Management at the University of Chicago Booth School of Business. He specializes in inventory and supply chain management, where he designs models and algorithms that enable organizations to balance supply and demand effectively under uncertainty. Xin's research using asymptotic analysis to study stochastic inventory theory has been recognized with several INFORMS paper competition awards, including First Place in the George E. Nicholson Student Paper Competition in 2015 and the Applied Probability Society Best Publication Award in 2019. Xin's recent research focuses on AI for supply chains, driven by labor shortages, global supply chain disruptions, e-commerce growth, and environmental sustainability. He leverages tools such as deep learning, LLMs, optimization, and probability theory to address emerging challenges arising from AI-driven automation. His work targets problems in inventory management, robotics and automation in modern warehousing, dual-sourcing, real-time order fulfillment, omnichannel, transportation network design, and solar grazing.



Title: Adaptive Semi-Supervised Inference and Statistical Learning for Quantile Regression

Speaker: Yong Zhou 周勇 (East China Normal University)

Time: October 19, 2025, 10:25-11:10

Abstract: This paper presents a general framework for semi-supervised inference, emphasizing optimality, adaptivity, and debiased calibration to enhance robustness. The data are collected from multiple groups, with a small portion having fully labeled observations, while the rest have only unlabeled covariates, which may be either complete or subject to block-wise missingness. The empirical risk function is modeled semiparametrically using an index model with debiasing techniques to maintain model flexibility and dimension reduction. A family of two-step imputation-based semi-supervised estimators is proposed, demonstrating improved efficiency compared to their supervised counterparts, particularly under model misspecification (adaptive), leading to more powerful inference. Specifically, the proposed estimators are proven to achieve semiparametric variance lower bounds (optimal) when the index model is correctly specified. A perturbation resampling procedure is devised for variance estimation. The finite sample performance is evaluated through extensive simulation studies and applications to a financial credit dataset. Although quantile risk minimization is the primary focus of this paper, the proposed methods can be readily adapted for various empirical risk minimization problems involving data with semi-supervised block-wise missing structures.

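Bio: Professor Yong Zhou is a recipient of the National Science Fund for Distinguished Young Scholars, a Changjiang Distinguished Professor of the Ministry of Education, a member of the Chinese Academy of Sciences Hundred Talents Program, a recipient of the State Council Special Government Allowance, a national-level selectee of the "New Century Hundred, Thousand and Ten Thousand Talents Project," and a Fellow of the Institute of Mathematical Statistics (IMS). Zhou is a professor in the Faculty of Economics and Management at East China Normal University, Dean of the School of Statistics, and Director of the Academy of Statistics and Interdisciplinary Sciences. Previous roles include member of the Seventh Statistics Discipline Review Group of the Academic Degrees Committee of the State Council, member of the Ministry of Education's Teaching Steering Committee for the Professional Master's Degree in Applied Statistics, and Vice President of the Chinese Society of Optimization, Overall Planning and Economic Mathematics. Current roles include executive council member of the Society of Management Science of China and chief scientist of a Key R&D Program project of the Ministry of Science and Technology.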

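Professor Zhou's research covers big data analysis and modeling, financial econometrics, risk management, econometrics, and statistical theory and methods, and has produced many results of significant academic value and influence. Zhou has led and completed more than 10 research projects, including National Natural Science Foundation of China (NSFC) projects, a National Science Fund for Distinguished Young Scholars project, and NSFC Key Program projects, as well as one Key R&D Program project of the Ministry of Science and Technology (as chief scientist), and has received two provincial- or ministerial-level awards. Zhou has published about 200 papers in academic journals, including the top international journals The Annals of Statistics, Journal of the American Statistical Association, Biometrika, and JRSSB, the leading econometrics journals Journal of Econometrics and Journal of Business & Economic Statistics, and the Journal of Management Sciences in China (管理科学学报).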



Title: Predicting the Matching Probability and Ride/Shared Distance for Each Dynamic Ridepooling Order

Speaker: Xiaolei Wang 王晓蕾 (Tongji University)

Time: October 19, 2025, 11:10-11:55

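Abstract: Dynamic ridepooling is a service mode in which the platform responds to passengers' travel requests in real time and searches for pooling partners while serving them. Because pricing and dispatch decisions must be made before a pooling partner appears, accurately predicting each order's matching probability, expected detour, and expected shared distance is critical to the operational efficiency of a dynamic ridepooling platform. Under the assumptions that each passenger is pooled with at most one other passenger en route and that the ridepooling demand between each origin-destination (OD) pair follows a Poisson distribution with a given mean, we model the complex matching and competition relationships among the ridepooling demands of different OD pairs and propose a model that simultaneously predicts the matching probability, expected detour distance, and expected shared distance for all OD pairs in the network. Comparisons with extensive simulation experiments show that the method not only explains the mechanism behind the differing pooling potential of orders across OD pairs, but also achieves satisfactory prediction performance under different matching conditions and demand intensities.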

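Bio: Xiaolei Wang is a Professor in the School of Economics and Management at Tongji University. Wang received a bachelor's degree from the University of Science and Technology of China in 2008 (Guo Moruo Scholarship) and a PhD from the Hong Kong University of Science and Technology in 2012 (HKUST SENG PhD Research Excellence Award). Wang's research is devoted to the optimization of urban transportation systems, with main interests in the operational optimization of shared mobility services and urban traffic management under shared mobility. Wang has published more than 30 papers in major SCI/SSCI transportation journals, 16 of them in top operations research and transportation journals such as INFORMS Journal on Computing, Transportation Research Part B, and Transportation Science, averaging more than 80 citations per paper. Wang has led National Natural Science Foundation of China Key Program, Excellent Young Scientists Fund, General Program, and Young Scientists Fund projects, as well as a CCF-DiDi GAIA Young Scholars Fund project, and is a core member of the Innovative Research Group project "Operations Management of Integrated Transportation Systems." Wang chairs the Technical Committee on Shared and Reserved Mobility of the World Transport Convention, is a committee member of the Transportation Management Branch of the Society of Management Science and Engineering, and serves on the editorial board of Transportation Research Part E.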




Title: Learning Robust Decision Rules for Censored and Confounded Data

Speaker: Yifan Cui 崔逸凡 (Zhejiang University)

Time: October 19, 2025, 13:30-14:15

Abstract: In this talk, we propose two robust criteria for learning optimal treatment rules with censored survival outcomes. The first one aims to identify a treatment rule that maximizes the restricted mean survival time, where the restriction is specified by a given quantile such as the median; the second one focuses on maximizing buffered survival probabilities, with the threshold adaptively adjusted to account for the restricted mean survival time. Moreover, we develop robust treatment rules that enable reliable policy recommendations when unmeasured confounding is present, using the proximal causal inference framework. Simulation studies and real-world applications demonstrate the superior performance of the proposed methods.

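Bio: Yifan Cui is a tenured Associate Professor (Research Professor) and doctoral supervisor at Zhejiang University. Cui received a PhD in Statistics and Operations Research from the University of North Carolina at Chapel Hill, and was previously a postdoctoral researcher at the Wharton School of the University of Pennsylvania and an Assistant Professor in the Department of Statistics and Data Science at the National University of Singapore. Cui was selected for a national-level young talent program (2021).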




Title: A Cardinality-Constrained Approach to Combinatorial Bilevel Congestion Pricing

Speaker: Lei Guo 郭磊 (East China University of Science and Technology)

Time: October 19, 2025, 14:15-15:00

Abstract: Combinatorial bilevel congestion pricing (CBCP), a variant of the mixed (continuous/discrete) network design problem, seeks to minimize the total travel time experienced by all travelers in a road network by strategically selecting toll locations and determining toll charges. Conventional wisdom suggests that these problems are intractable since they have to be formulated and solved with a significant number of integer variables. Here, we devise a scalable local algorithm for the CBCP problem that guarantees convergence to an approximate Karush-Kuhn-Tucker point. Our approach is novel in that it eliminates the use of integer variables altogether, instead introducing a cardinality constraint that limits the number of toll locations to a user-specified upper bound. The resulting bilevel program with the cardinality constraint is then transformed into a block-separable, single-level optimization problem that can be solved efficiently after penalization and decomposition. We are able to apply the algorithm to solve, in about 20 minutes, a CBCP instance with up to 3,000 links. To the best of our knowledge, no existing algorithm can solve CBCP problems at such a scale while providing any assurance of convergence.

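Bio: Lei Guo is a member of the Communist Party of China, a Professor and doctoral supervisor in the School of Business at East China University of Science and Technology, and a selectee of a national-level young talent program. Guo received a PhD in Operations Research and Cybernetics from Dalian University of Technology and held postdoctoral positions at Shanghai Jiao Tong University and the University of Victoria, Canada. Honors include the Liaoning Province Outstanding Doctoral Dissertation award and the Shanghai Outstanding Achievement Award in Philosophy and Social Sciences. Guo's research areas include models and methods for large-scale bilevel programming and data-driven decision-making methods, with 11 papers published in top international operations research journals such as Mathematics of Operations Research, Mathematical Programming, and SIAM Journal on Optimization. Guo has led 3 National Natural Science Foundation of China (NSFC) General Program and Young Scientists Fund projects and 3 provincial- or ministerial-level projects, and has participated as a core member in 2 NSFC Key Program projects.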





Title: Dual-Sourcing Made Easy: Distributionally Robust Optimization of Inventory System under Independent Demand

Speaker: Sheng Bi 毕晟 (Shanghai University of Finance and Economics)

Time: October 19, 2025, 15:15-15:45

Abstract: We generalize Scarf’s classical min–max newsvendor model from a single-period setting to a multi-period inventory system with independent demand across periods. This extension leverages mean–variance analysis to capture the dynamic effects of lead times, yielding closed-form expressions for the optimal base stock level. As a concrete application, we study a single-product, dual-sourcing system with constant lead times and backlogging. We show that the optimal tailored base–surge (TBS) policy admits a tractable closed-form approximation, with the base stock explicitly calibrated to account for lead-time effects. This provides a simple, distribution-free rule for trading off inventory cost against service level in a dual sourcing system.

Empirical validation using data from a multinational food manufacturer demonstrates the model’s practical advantages. Applying our method to historical demand and sourcing data improves service levels and reduces stockouts compared to traditional approaches, while maintaining cost-effectiveness. The model’s capacity to adapt base stock levels to different lead times and demand conditions proved especially valuable in mitigating the impact of supply chain volatility. These findings confirm the theoretical performance of our approach and highlight its potential as a scalable, cost-effective tool for firms facing lead-time demand uncertainty.
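For readers unfamiliar with the single-period starting point of this abstract, the following is a minimal sketch of the classical distribution-free (Scarf-type) newsvendor rule in its underage/overage cost form. It is not the multi-period or dual-sourcing result of the talk. The function names and the numbers are illustrative assumptions, and the sketch ignores the edge case in which nonnegative demand with very high variability makes ordering nothing optimal.

```python
import math

def worst_case_cost(q, mu, sigma, h, b):
    """Worst-case expected cost over all demand distributions with mean mu, std sigma.

    Uses the tight bound E[(D - q)^+] <= (sqrt(sigma^2 + (q - mu)^2) - (q - mu)) / 2,
    with overage cost h per leftover unit and underage cost b per unit short.
    """
    x = q - mu
    shortfall = (math.sqrt(sigma ** 2 + x ** 2) - x) / 2
    return h * x + (h + b) * shortfall

def scarf_quantity(mu, sigma, h, b):
    """Closed-form minimizer of worst_case_cost in q (assumed interior solution)."""
    return mu + (sigma / 2) * (math.sqrt(b / h) - math.sqrt(h / b))

# Hypothetical inputs: mean demand 100, std 20, overage cost 1, underage cost 4.
mu, sigma, h, b = 100.0, 20.0, 1.0, 4.0
q_star = scarf_quantity(mu, sigma, h, b)
cost_star = worst_case_cost(q_star, mu, sigma, h, b)

# Sanity check: a coarse grid search should land next to the closed form.
grid = [i / 10 for i in range(0, 2001)]
q_grid = min(grid, key=lambda q: worst_case_cost(q, mu, sigma, h, b))

print(q_star, cost_star, q_grid)
```

The rule needs only the mean and standard deviation of demand, which is what makes it distribution-free. At the optimum the worst-case cost equals sigma * sqrt(h * b).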

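Bio: Sheng Bi is an Associate Professor in the School of Information Management and Engineering at Shanghai University of Finance and Economics. Bi received a PhD from the Department of Analytics and Operations at the National University of Singapore Business School in 2021 and a bachelor's degree in Industrial Engineering from Nanjing University in 2016. Bi's research focuses on data-driven optimization and its applications in areas such as supply chain management and revenue management.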







Title: Optimal Robust Pricing with Minimal Information

Speaker: Wang Zhen 王震 (Shanghai University of Finance and Economics)

Time: October 19, 2025, 15:45-16:15

Abstract: We study a pricing problem in which the seller observes only a limited number of noisy or censored purchase probabilities at posted prices, which are insufficient to recover the full demand curve. We introduce a robust, nonparametric model that leverages partial quantile information and mild structural assumptions (e.g., MHR or regularity). The resulting formulation is a nonconvex, infinite-dimensional bilevel optimization problem, for which we develop a unified reduction technique that yields a closed-form characterization of the worst-case demand distribution, and design a polynomial-time global algorithm by exploiting the unimodality of the outer objective. Our model delivers strong performance guarantees using only coarse quantile data, often based on as few as one to three price points. In addition, our theoretical analysis provides novel structural bounds on consumer surplus shifting, monopoly profit ratio, and worst-case profit, extending classical economics results such as the Condorelli inequality and deriving new tail bounds under minimal information. Numerical experiments demonstrate that our method consistently outperforms sample average approximation, parametric baselines, and moment-based distributionally robust models in both worst-case and Bayesian settings. Moreover, the framework enables efficient pricing experimentation: three to five trials often suffice to achieve near-optimal pricing. These features make the approach particularly well-suited for cold-start and early-stage pricing in e-commerce and service operations.

Bio: Wang Zhen is an Assistant Professor in the School of Information Management and Engineering, Shanghai University of Finance and Economics. His research interests include decision-making under uncertainty as well as applications in business, economics, finance, and operations. His research has been published in POM, NRL, SIMAX, and WINE. His work has been recognized with several honors, including the 2023 and 2025 POMS-China Best Student Paper Awards and an Honorable Mention in the 2024 ISCOM Best Student Paper Award.