Reinforcement learning
Computer science
Temporal-difference learning
Suite
Artificial intelligence
Control (management)
Machine learning
History
Archaeology
Authors
Jayesh K. Gupta, Maxim Egorov, Mykel J. Kochenderfer
Identifier
DOI:10.1007/978-3-319-71682-4_5
Abstract
This work considers the problem of learning cooperative policies in complex, partially observable domains without explicit communication. We extend three classes of single-agent deep reinforcement learning algorithms based on policy gradient, temporal-difference error, and actor-critic methods to cooperative multi-agent systems. To effectively scale these algorithms beyond a trivial number of agents, we combine them with a multi-agent variant of curriculum learning. The algorithms are benchmarked on a suite of cooperative control tasks, including tasks with discrete and continuous actions, as well as tasks with dozens of cooperating agents. We report the performance of the algorithms using different neural architectures, training procedures, and reward structures. We show that policy gradient methods tend to outperform both temporal-difference and actor-critic methods and that curriculum learning is vital to scaling reinforcement learning algorithms in complex multi-agent domains.
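The abstract describes extending single-agent deep RL methods to cooperative multi-agent control. As a rough illustration only (not the paper's implementation), the sketch below shows one common way such an extension can look: a single policy network shared by all cooperating agents, trained with a REINFORCE-style policy gradient on a shared team reward. The toy environment, network sizes, and reward are hypothetical stand-ins, not the benchmark tasks from the paper.

```python
# Minimal sketch (illustrative only): one policy shared across cooperating agents,
# updated with a Monte-Carlo policy gradient on a team reward.
import torch
import torch.nn as nn
from torch.distributions import Categorical

N_AGENTS, OBS_DIM, N_ACTIONS = 3, 8, 4  # assumed sizes for illustration

policy = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def toy_env_step(obs, actions):
    # Hypothetical dynamics: the team is rewarded when all agents pick the same action.
    reward = float((actions == actions[0]).all())
    next_obs = torch.randn(N_AGENTS, OBS_DIM)
    return next_obs, reward

for episode in range(200):
    obs = torch.randn(N_AGENTS, OBS_DIM)
    log_probs, rewards = [], []
    for t in range(20):
        dist = Categorical(logits=policy(obs))          # shared policy, per-agent observations
        actions = dist.sample()
        log_probs.append(dist.log_prob(actions).sum())  # sum log-probs over agents
        obs, r = toy_env_step(obs, actions)
        rewards.append(r)
    # Returns-to-go (undiscounted here for brevity), normalized, then the policy-gradient loss.
    returns = torch.tensor(rewards).flip(0).cumsum(0).flip(0)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A curriculum, as mentioned in the abstract, would typically be layered on top of such a loop by starting training with few agents and gradually increasing the agent count; that scheduling is omitted here.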