Keywords: Reinforcement learning; Computer science; Artificial intelligence; Machine learning; Value decomposition; State (computer science); Predictive value; Value (mathematics); Algorithm
Identifier
DOI:10.1016/j.neunet.2023.05.021
Abstract
Credit assignment is a crucial issue in multi-agent tasks employing a centralized training and decentralized execution paradigm. While value decomposition has demonstrated strong performance in Q-learning-based approaches and certain Actor–Critic variants, it remains challenging to achieve efficient credit assignment in multi-agent tasks using policy gradient methods due to the limitations of decomposable value functions. This paper introduces Predictive Contribution Measurement, an explicit credit assignment method that compares prediction errors among agents and allocates surrogate rewards based on their relevance to global state transitions, with a theoretical guarantee. With multi-agent proximal policy optimization (MAPPO) as a training backend, we propose Predictive Contribution MAPPO (PC-MAPPO). Our experiments demonstrate that PC-MAPPO, with a 10% warm-up phase, outperforms MAPPO, QMIX, and Weighted QMIX on StarCraft multi-agent challenge tasks, particularly on maps requiring heightened cooperation to defeat enemies, such as the map corridor. Employing a pre-trained predictor, PC-MAPPO achieves significantly improved performance on all tested super-hard maps. In parallel training scenarios, PC-MAPPO exhibits superior data efficiency and achieves state-of-the-art performance compared to other methods.
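The core idea — scoring each agent by how much its action matters to predicting the global state transition, then splitting the team reward accordingly — can be sketched as follows. This is a minimal illustration under assumed details, not the paper's implementation: it supposes a counterfactual masking scheme in which a predictor's error is recomputed with each agent's action masked out, and the allocation rule and function names (`contribution_rewards`) are hypothetical.

```python
import numpy as np

def contribution_rewards(pred_error_full, pred_errors_masked, team_reward):
    """Allocate surrogate per-agent rewards from prediction errors (sketch).

    pred_error_full: scalar error of a predictor forecasting the next global
        state from all agents' actions.
    pred_errors_masked: array, entry i is the error when agent i's action is
        masked out. A larger increase over the full-information error means
        agent i's action was more relevant to the global state transition.
    team_reward: scalar global reward to redistribute among agents.
    """
    # Contribution of agent i: how much masking its action hurts prediction.
    contributions = np.maximum(pred_errors_masked - pred_error_full, 0.0)
    total = contributions.sum()
    if total <= 0.0:
        # No agent measurably influenced the transition: split evenly.
        n = len(pred_errors_masked)
        return np.full_like(pred_errors_masked, team_reward / n, dtype=float)
    # Surrogate reward proportional to normalized contribution.
    return team_reward * contributions / total
```

For example, with a full-information error of 0.1 and masked errors of [0.5, 0.1, 0.3], agent 0 receives the largest share and agent 1, whose removal changes nothing, receives none. In PC-MAPPO these surrogate rewards would replace the shared team reward in the policy-gradient update after the warm-up phase.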