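Computer science
Reinforcement learning
Curse of dimensionality
Artificial neural network
Mathematical optimization
Fictitious play
Game theory
Artificial intelligence
Optimal control
Mathematics
Nash equilibrium
Mathematical economics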
Source
Journal: Neurocomputing
[Elsevier BV]
Date: 2021-10-28
Volume: 484, pp. 46-58
Citations: 21
Identifiers
DOI:10.1016/j.neucom.2021.01.141
Abstract
This paper investigates intelligent design for pursuit-evasion games with large-scale populations of pursuers and evaders. Because of the vast number of agents, the notorious "curse of dimensionality" seriously challenges traditional designs for multi-player pursuit-evasion games, especially in harsh environments where limited communication resources constrain information exchange among players. To address this challenge, the emerging Mean Field Games (MFG) theory is used to solve for the optimal pursuit-evasion strategies based on a new form of probability density function (PDF) rather than detailed information from all other players/agents. As a result, both the information exchange and the computational dimension of the optimal strategy derivation are reduced. Specifically, MFG is integrated into the pursuit-evasion game to generate a hierarchical structure in which the pursuers and the evaders form two separate mean field groups. To solve the mean field equations online, i.e., two coupled partial differential equations, the actor-critic reinforcement learning mechanism is adopted and extended to a novel actor-critic-mass-opponent (ACMO) approach. In ACMO, the actor neural network estimates the optimal control, the critic neural network approximates the optimal cost function, the mass neural network learns the PDF of the agent's own group, and the opponent neural network predicts the opponents' average states in the form of a PDF that causes the maximum cost for the agent's group. Lyapunov theory is used to provide the convergence analysis for all neural networks and the stability analysis for the closed-loop system. Finally, a series of numerical simulations demonstrates the effectiveness of the developed scheme.
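The dimension reduction described in the abstract can be illustrated with a small sketch. It is not the paper's ACMO method: it only shows, under simplifying assumptions, why a density summary of a population has a fixed size regardless of how many agents there are. The 1-D states on [0, 1], the histogram estimator, and the name `empirical_pdf` are all hypothetical choices made for this illustration.

```python
import random

def empirical_pdf(states, bins=10, lo=0.0, hi=1.0):
    """Histogram estimate of a 1-D population density.

    Assumes every state lies in [lo, hi] (an assumption of this sketch).
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in states:
        # Clamp so that s == hi (or rounding at the edge) falls in the last bin.
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(states)
    # Normalize so the histogram integrates to 1 over [lo, hi].
    return [c / (total * width) for c in counts], width

random.seed(0)
small_group = [random.random() for _ in range(100)]
large_group = [random.random() for _ in range(100_000)]

pdf_small, width = empirical_pdf(small_group)
pdf_large, _ = empirical_pdf(large_group)

# The summary each agent would consume has the same length for both groups.
summary_sizes = (len(pdf_small), len(pdf_large))
# Each estimate integrates to (approximately) 1.
mass_small = sum(pdf_small) * width
mass_large = sum(pdf_large) * width
print(summary_sizes, round(mass_small, 6), round(mass_large, 6))
```

An agent that conditions on the 10-bin density sees the same input size for 100 or 100,000 opponents, whereas conditioning on every individual state would grow with the population.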