Computer science
Consistency (knowledge bases)
Monotonic function
Surge
Symbol
Matching (statistics)
Artificial intelligence
Mathematics
Mathematical analysis
Statistics
Arithmetic
Authors
Shanqi Liu, Weiwei Liu, Wenzhou Chen, Guanzhong Tian, Jun Chen, Yao Tong, Junjie Cao, Yong Liu
Identifier
DOI: 10.1109/tnnls.2023.3262921
Abstract
Recently, value-based centralized training with decentralized execution (CTDE) multi-agent reinforcement learning (MARL) methods have achieved excellent performance in cooperative tasks. However, the most representative of these methods, Q-network MIXing (QMIX), restricts the joint-action $Q$ value to be a monotonic mixing of each agent's utility. Furthermore, current methods cannot generalize to unseen environments or different agent configurations, a setting known as the ad hoc team play situation. In this work, we propose a novel $Q$-value decomposition that considers both the return of an agent acting on its own and the return of cooperating with other observable agents, in order to address the nonmonotonic problem. Based on this decomposition, we propose a greedy action searching method that improves exploration and is unaffected by changes in the set of observable agents or in the order of agents' actions. In this way, our method can adapt to the ad hoc team play situation. Furthermore, we utilize an auxiliary loss related to environmental cognition consistency and a modified prioritized experience replay (PER) buffer to assist training. Our extensive experimental results show that our method achieves significant performance improvements in both challenging monotonic and nonmonotonic domains, and can handle the ad hoc team play situation perfectly.
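For context on the monotonicity restriction mentioned above: in the standard QMIX formulation (this notation follows the original QMIX paper by Rashid et al., not this abstract), the joint-action value $Q_{tot}$ is constrained to satisfy

$$\frac{\partial Q_{tot}(\boldsymbol{\tau}, \mathbf{u})}{\partial Q_a(\tau^a, u^a)} \ge 0, \qquad \forall a \in \{1, \dots, n\},$$

where $\tau^a$ and $u^a$ denote agent $a$'s action-observation history and action. Joint-action values that cannot be represented under this constraint are the nonmonotonic cases the proposed decomposition is intended to address.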