Keywords: Reinforcement learning, Benchmark (surveying), Computer science, Convergence (economics), Set (abstract data type), Homogeneous, Artificial intelligence, Reinforcement, Marl, Information sharing, Agent-based model, Machine learning, Mathematics, Programming language, Geography, Economics, Combinatorics, Paleontology, World Wide Web, Geodesy, Basin (structural), Biology, Social psychology, Economic growth, Psychology
Authors
Justin K. Terry,Nathaniel Grammel,Ananth Hari,Luís Santos,Benjamin Black,Dinesh Manocha
Source
Journal: Cornell University - arXiv
Date: 2020-05-27
Citations: 29
Abstract
Nonstationarity is a fundamental problem in cooperative multi-agent reinforcement learning (MARL): each agent must relearn information about the other agents' policies as those agents learn, causing information to "ring" between agents and convergence to be slow. The MAILP model, introduced by Terry and Grammel (2020), is a novel model of information transfer during multi-agent learning. We use the MAILP model to show that increasing training centralization arbitrarily mitigates the slowing of convergence due to nonstationarity. The most centralized case of learning is parameter sharing, an uncommonly used MARL method, specific to environments with homogeneous agents, that bootstraps a single-agent reinforcement learning (RL) method and learns an identical policy for each agent. We experimentally replicate the result that increased learning centralization leads to better performance on the MARL benchmark set from Gupta et al. (2017). We further apply parameter sharing to 8 modern single-agent deep RL (DRL) methods for the first time in the literature. With this, we achieve the best documented performance on a set of MARL benchmarks, attaining up to 44 times more average reward in as little as 16% as many episodes compared to the previously documented parameter sharing arrangement. Finally, we offer a formal proof for a set of methods that allow parameter sharing to be applied in environments with heterogeneous agents.
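To make the parameter sharing idea described in the abstract concrete, the following is a minimal sketch, not the paper's implementation: all homogeneous agents act with, and write updates into, one shared set of parameters (here a single Q-table trained with tabular Q-learning), so a single-agent learner is reused across the whole multi-agent system. The toy environment, agent count, and hyperparameters are illustrative assumptions.

```python
# Minimal parameter-sharing sketch (illustrative, not from the paper):
# N_AGENTS homogeneous agents all use the SAME learner/parameters.
import random
from collections import defaultdict

N_AGENTS = 3
N_STATES = 5           # toy cyclic state space shared by all agents (assumption)
N_ACTIONS = 2          # 0 = stay, 1 = move forward
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# One shared set of parameters for every agent: the "parameter sharing" part.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

def step(state, action):
    """Hypothetical per-agent dynamics: reward 1 for completing a loop."""
    if action == 1:
        next_state = (state + 1) % N_STATES
        reward = 1.0 if state == N_STATES - 1 else 0.0
    else:
        next_state, reward = state, 0.0
    return next_state, reward

def act(state):
    """Epsilon-greedy action selection from the shared Q-table."""
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    values = q_table[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])

states = [random.randrange(N_STATES) for _ in range(N_AGENTS)]
for episode in range(200):
    for i in range(N_AGENTS):
        s = states[i]
        a = act(s)
        s2, r = step(s, a)
        # Every agent's transition updates the single shared learner,
        # so it sees N_AGENTS times as much experience per environment step.
        q_table[s][a] += ALPHA * (r + GAMMA * max(q_table[s2]) - q_table[s][a])
        states[i] = s2

print({s: [round(v, 2) for v in q_table[s]] for s in sorted(q_table)})
```

In a deep RL setting the shared Q-table would be replaced by a single shared policy/value network that receives observations from every agent, which is the arrangement the paper applies to modern single-agent DRL methods.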