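Computer science
Reinforcement learning
Robot
Scheduling (production processes)
Task (project management)
Entropy (arrow of time)
Temporal difference learning
Artificial intelligence
Algorithm
Mathematical optimization
Mathematics
Quantum mechanics
Physics
Economics
Management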
Authors
Hengliang Tang, Anqi Wang, Fei Xue, Jiaxin Yang, Yang Cao
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2021-01-01
Volume: 9, pages 42568-42582
Cited by: 30
Identifier
DOI:10.1109/access.2021.3062457
Abstract
In goods-to-man systems for intelligent unmanned warehouses, task allocation strongly influences efficiency because both the AGV robots and the orders change dynamically. The paper presents a hierarchical Soft Actor-Critic algorithm to solve the dynamic scheduling problem of order picking. The proposed method builds on the classic Soft Actor-Critic (SAC) algorithm and hierarchical reinforcement learning. The model is trained at different time scales by introducing sub-goals: the top level learns a policy, and the bottom level learns a policy to achieve the sub-goals. The actor of the controller aims to maximize expected intrinsic reward while also maximizing entropy, that is, to succeed at the sub-goals while acting as randomly as possible. Finally, simulation experiments in different scenes show that the method enables multiple logistics AGV robots to work together and improves the reward in sparse environments by about 2.61 times compared with the SAC algorithm.
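The trade-off the abstract describes, maximizing expected intrinsic reward while also maximizing entropy, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a single state with a discrete action set and a one-step objective, whereas SAC optimizes a discounted sum over trajectories. The names `entropy`, `soft_objective`, and `alpha` (the entropy weight), along with the reward values, are hypothetical choices made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def soft_objective(probs, intrinsic_rewards, alpha):
    """One-step illustration: expected intrinsic reward plus alpha-weighted entropy.

    Assumption: a single state and a discrete action set; the full SAC
    objective sums this kind of term over time steps of a trajectory.
    """
    expected_reward = sum(p * r for p, r in zip(probs, intrinsic_rewards))
    return expected_reward + alpha * entropy(probs)

# Hypothetical setup: three actions, only the first one reaches the sub-goal.
rewards = [1.0, 0.0, 0.0]
greedy = [1.0, 0.0, 0.0]   # deterministic policy, zero entropy
mixed = [0.8, 0.1, 0.1]    # mostly succeeds, but keeps some randomness

greedy_score = soft_objective(greedy, rewards, alpha=0.5)
mixed_score = soft_objective(mixed, rewards, alpha=0.5)
print(greedy_score, mixed_score)
```

With this particular entropy weight, the slightly random policy scores higher than the deterministic one, which is the sense in which the actor is rewarded for succeeding at the sub-goal while staying as random as possible. A smaller `alpha` would shift the balance back toward the greedy policy.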