Reinforcement learning
Computer science
Action selection
Action (physics)
Artificial intelligence
Function (biology)
Selection (genetic algorithm)
Entropy (arrow of time)
Psychology
Perception
Quantum mechanics
Evolutionary biology
Biology
Physics
Neuroscience
Authors
Feiye Zhang, Qingyu Yang, Dou An
Identifier
DOI:10.1016/j.neunet.2022.09.012
Abstract
Multi-agent deep reinforcement learning algorithms following the centralized training with decentralized execution (CTDE) paradigm have attracted growing attention in both industry and the research community. However, existing CTDE methods follow an action selection paradigm in which all agents choose actions simultaneously, ignoring the heterogeneous roles of different agents. Motivated by human wisdom in cooperative behaviors, we present a novel leader-following-paradigm-based deep multi-agent cooperation method (LFMCO) for multi-agent cooperative games. Specifically, we define a leader as an agent that broadcasts a message representing its selected action to all subordinates. The followers then choose their individual actions based on the message received from the leader. To measure the influence of the leader's action on the followers, we introduce a notion of information gain, i.e., the change in the entropy of the followers' value functions, which is positively correlated with the influence of the leader's action. We evaluate LFMCO on several cooperation scenarios of StarCraft2. Simulation results confirm the significant performance improvements of LFMCO compared with four state-of-the-art benchmarks in these challenging cooperative environments.
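The information-gain idea in the abstract can be illustrated with a small sketch: treat a follower's Q-values as inducing a softmax policy, and measure how much the entropy of that policy drops once the leader's message is incorporated. This is not the authors' implementation; the function names, the softmax temperature, and the example Q-values below are all illustrative assumptions.

```python
import numpy as np

def action_entropy(q_values, temperature=1.0):
    # Shannon entropy of the softmax policy induced by a vector of Q-values.
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return float(-np.sum(probs * np.log(probs + 1e-12)))  # epsilon avoids log(0)

def information_gain(q_without_msg, q_with_msg):
    # Entropy reduction in the follower's policy after conditioning on the
    # leader's message; a larger value means a more influential leader action.
    return action_entropy(q_without_msg) - action_entropy(q_with_msg)

# Example (hypothetical numbers): a leader message that sharpens the
# follower's preferences toward one action yields positive information gain.
q_before = [1.0, 1.1, 0.9, 1.05]   # near-uniform preferences, high entropy
q_after  = [3.0, 0.2, 0.1, 0.2]    # one action now clearly preferred
gain = information_gain(q_before, q_after)
```

In this sketch the gain is positive exactly when the leader's message makes the follower's action distribution more peaked, matching the abstract's claim that the entropy change correlates with the leader's influence.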