Reinforcement learning
Adaptation
Benchmark
Weighting
Computer science
Artificial intelligence
Medicine
Physics
Geodesy
Optics
Radiology
Geography
Authors
Zhitong Zhao,Ya Zhang,Siying Wang,Fan Zhang,Malu Zhang,Wenyu Chen
Identifier
DOI:10.1016/j.knosys.2024.111719
Abstract
Existing multi-agent reinforcement learning methods employ the paradigm of centralized training with decentralized execution (CTDE) to learn cooperative policies among agents through coordination. However, when agents are continually destroyed, the inclusion of information from dead agents significantly undermines the ability to learn effective cooperative policies in multi-agent systems. In this paper, we first analyze the bias introduced by dead agents under the CTDE paradigm and how it affects cooperation among agents. We then propose the Q-learning-based downsizing adaptive policy (QDAP) framework for cooperative multi-agent reinforcement learning. QDAP actively discerns relevant values from dead agents and converts historical trajectories into weighting factors, thereby helping the remaining active agents learn more appropriate cooperative policies. Moreover, we integrate the proposed framework into the CTDE paradigm, allowing seamless adaptation to value-decomposition methods. Experimental results demonstrate that QDAP significantly improves learning speed and achieves superior cooperation performance on challenging StarCraft II micromanagement benchmark tasks.
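To make the idea of trajectory-derived weighting factors concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes a VDN-style additive mixer as the simplest value-decomposition stand-in, and the module and function names (TrajectoryWeighting, weighted_total_q, the GRU encoder, the alive mask) are hypothetical choices for demonstration only.

```python
# Illustrative sketch (PyTorch): per-agent weighting factors are derived from
# historical trajectories and used to modulate each agent's contribution to the
# joint Q-value, while dead agents are masked out of the additive mixing.
import torch
import torch.nn as nn


class TrajectoryWeighting(nn.Module):
    """Maps each agent's historical trajectory to a scalar weighting factor in (0, 1)."""

    def __init__(self, traj_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.GRU(traj_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, trajectories: torch.Tensor) -> torch.Tensor:
        # trajectories: (batch, n_agents, T, traj_dim)
        b, n, t, d = trajectories.shape
        _, h = self.encoder(trajectories.reshape(b * n, t, d))
        return torch.sigmoid(self.head(h[-1])).reshape(b, n)


def weighted_total_q(agent_qs: torch.Tensor,
                     alive_mask: torch.Tensor,
                     weights: torch.Tensor) -> torch.Tensor:
    """VDN-style additive mixing with dead-agent masking.

    agent_qs, alive_mask, weights: (batch, n_agents).
    Dead agents (alive_mask == 0) are excluded so their stale values do not
    bias the joint target; trajectory-derived weights rescale the rest.
    """
    effective = weights * alive_mask
    return (agent_qs * effective).sum(dim=-1, keepdim=True)


# Usage example with hypothetical shapes: 8 agents, trajectories of length 20.
if __name__ == "__main__":
    batch, n_agents, horizon, feat = 32, 8, 20, 16
    weighter = TrajectoryWeighting(traj_dim=feat)
    trajs = torch.randn(batch, n_agents, horizon, feat)
    agent_qs = torch.randn(batch, n_agents)
    alive = (torch.rand(batch, n_agents) > 0.3).float()
    q_tot = weighted_total_q(agent_qs, alive, weighter(trajs))
    print(q_tot.shape)  # torch.Size([32, 1])
```

The additive mixer is chosen here only for brevity; the same masking and weighting could in principle be applied before any value-decomposition mixer (e.g. a QMIX-style monotonic network).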