Multi-Agent Deep Reinforcement Learning for Multi-Echelon Inventory Management

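Reinforcement learning; Reinforcement; Inventory management; Business; Artificial intelligence; Computer science; Operations management; Engineering; Psychology; Social psychology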

Authors

Xiaotian Liu, Ming Hu, Chunyi Peng, Yaodong Yang

Source

Journal: Social Science Research Network [Social Science Electronic Publishing]
Date: 2022-01-01 Citations: 9

Abstract

We apply Multi-Agent Deep Reinforcement Learning (MADRL) to inventory management problems with multiple echelons and evaluate how well MADRL minimizes the overall costs of a supply chain. We also examine whether the upfront-only information-sharing mechanism used in MADRL helps alleviate the bullwhip effect in a supply chain. We apply Heterogeneous-Agent Proximal Policy Optimization (HAPPO) to multi-echelon inventory management problems in both a serial supply chain and a supply chain network. Our results show that policies constructed by HAPPO achieve lower overall costs than policies constructed by single-agent deep reinforcement learning and other heuristic policies. The application of HAPPO also results in a less pronounced bullwhip effect than policies constructed by single-agent deep reinforcement learning, where information is not shared among actors. Somewhat surprisingly, when applying HAPPO, the system achieves the lowest overall costs when the minimization target for each actor is a combination of its own costs and the overall costs of the system, and the fully self-interested reward target performs near-optimally, whereas one would expect that using the overall costs of the system as the reward target for each actor would be optimal in training the models. Our results provide a new perspective on the benefit of information sharing inside the supply chain, which helps alleviate the bullwhip effect and improve the overall performance of the system. Upfront information sharing and action coordination among actors during model training are both essential, the former more so, for improving a supply chain's overall performance when applying MADRL. Neither fully self-interested actors nor fully system-focused actors lead to the optimal performance of policies learned and constructed by MADRL. Our results also verify MADRL's potential for solving various multi-echelon inventory management problems with complex supply chain structures and in non-stationary market environments.
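
To make two quantities in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the "combination of its own costs and the overall costs of the system" is a linear blend controlled by a hypothetical weight `weight_on_own`, and it uses the common variance-ratio definition of the bullwhip effect (variance of orders divided by variance of demand). The abstract specifies neither the form of the blend nor the bullwhip measure, so the function names, the weight, and the sample numbers are all illustrative assumptions.

```python
from statistics import pvariance


def blended_costs(own_costs, weight_on_own):
    """Return each actor's cost target as a blend of own and system costs.

    weight_on_own = 1.0 is fully self-interested; 0.0 is fully system-focused.
    The linear blend and the weight are assumptions made for illustration.
    """
    if not 0.0 <= weight_on_own <= 1.0:
        raise ValueError("weight_on_own must be in [0, 1]")
    system_cost = sum(own_costs)
    return [
        weight_on_own * cost + (1.0 - weight_on_own) * system_cost
        for cost in own_costs
    ]


def bullwhip_ratio(orders, demand):
    """Variance of orders placed divided by variance of demand observed.

    A ratio above 1 means order variability is amplified upstream.
    """
    demand_var = pvariance(demand)
    if demand_var == 0:
        raise ValueError("demand has zero variance; ratio is undefined")
    return pvariance(orders) / demand_var


# Hypothetical per-echelon costs for one period of a three-stage serial chain.
costs = [4.0, 6.0, 10.0]
print(blended_costs(costs, 1.0))  # each actor sees only its own cost
print(blended_costs(costs, 0.0))  # each actor sees the system cost
print(blended_costs(costs, 0.5))  # an even mix of the two

# Hypothetical demand seen by a retailer and the orders it places upstream.
demand = [10.0, 12.0, 8.0, 10.0]
orders = [10.0, 14.0, 6.0, 10.0]
print(bullwhip_ratio(orders, demand))
```

In a training loop, the negative of each blended cost would serve as that actor's per-step reward; the abstract's finding is that an intermediate blend outperformed both extremes.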
