Deep Policy Iteration with Integer Programming for Inventory Management

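Keywords: Computer science, Mathematical optimization, Heuristics, Inventory control, Benchmark (surveying), Reinforcement learning, Operations research, Artificial intelligence, Mathematics, Geodesy, Geography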

Authors

Pavithra Harsha, Ashish Jagmohan, Jayant Kalagnanam, Brian Quanz, Divya Singhvi

Source

Journal: Manufacturing & Service Operations Management [Institute for Operations Research and the Management Sciences]
Date: 2025-01-06
Citations: 1

Links

arxiv.org, doi.org

Identifiers

DOI: 10.1287/msom.2022.0617

Abstract

Problem definition: In this paper, we present a reinforcement learning (RL)-based framework for optimizing long-term discounted reward problems with large combinatorial action spaces and state-dependent constraints. These characteristics are common to many operations management problems, for example, network inventory replenishment, where managers have to deal with uncertain demand, lost sales, and capacity constraints that result in more complex feasible action spaces. Our proposed programmable actor RL (PARL) uses a deep policy iteration method that leverages neural networks to approximate the value function and combines it with mathematical programming and sample average approximation to solve the per-step action optimally while accounting for combinatorial action spaces and state-dependent constraint sets.

Methodology/results: We then show how the proposed methodology can be applied to complex inventory replenishment problems where analytical solutions are intractable. We also benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find that the proposed algorithm outperforms existing methods by as much as 14.7% on average in various complex supply chain settings.

Managerial implications: We find that the improvement in PARL's performance over benchmark algorithms can be directly attributed to better inventory cost management, especially in inventory-constrained settings. Furthermore, in the simpler setting where the optimal replenishment policy is tractable or known near-optimal heuristics exist, we find that the RL-based policies can learn near-optimal policies. Finally, to make RL algorithms more accessible for inventory management researchers, we also discuss the development of a modular Python library that can be used to test the performance of RL algorithms with various supply chain structures. This library can spur future research in developing practical and near-optimal algorithms for inventory management problems.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2022.0617.
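
The per-step action selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single-node, lost-sales inventory problem, and it replaces the mathematical program with plain enumeration over integer order quantities, which is only viable because this toy action space is tiny. The function names, the cost parameters, and the linear `placeholder_value` function (standing in for a trained value network) are all hypothetical.

```python
import random


def saa_greedy_order(inventory, capacity, demand_samples, value_fn,
                     price=5.0, order_cost=2.0, holding_cost=1.0, gamma=0.95):
    """Return (order quantity, score) maximizing the sample-average of
    one-step reward plus discounted approximate value.

    Assumes 0 <= inventory <= capacity. Enumeration stands in for the
    integer program mentioned in the abstract (an illustrative simplification).
    """
    best_q, best_score = 0, float("-inf")
    # State-dependent constraint: the order may not push stock above capacity.
    for q in range(0, capacity - inventory + 1):
        total = 0.0
        for d in demand_samples:
            on_hand = inventory + q
            sales = min(on_hand, d)      # unmet demand is lost, not backlogged
            next_inv = on_hand - sales
            reward = price * sales - order_cost * q - holding_cost * next_inv
            total += reward + gamma * value_fn(next_inv)
        score = total / len(demand_samples)  # sample average approximation
        if score > best_score:
            best_q, best_score = q, score
    return best_q, best_score


def placeholder_value(next_inv):
    # Hypothetical stand-in for a learned value-function approximation.
    return 1.5 * next_inv


rng = random.Random(0)
demand_samples = [rng.randint(0, 10) for _ in range(200)]
order, score = saa_greedy_order(inventory=3, capacity=20,
                                demand_samples=demand_samples,
                                value_fn=placeholder_value)
print(order, round(score, 2))
```

In a policy-iteration loop, a step like this would play the role of the actor: the value approximation is refit on simulated trajectories, and the action is re-optimized against the updated approximation. For network-scale problems, enumeration would be replaced by an integer programming solver, as the abstract indicates.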
