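Dynamic programming
Convergence (economics)
Computer science
Mathematical optimization
Differential game
Control (management)
Differential (mechanical device)
Process (computing)
Value (mathematics)
Bellman equation
Differential dynamic programming
Reinforcement learning
Optimal control
Sequential game
Game theory
Control theory (sociology)
Mathematics
Artificial intelligence
Mathematical economics
Machine learning
Engineering
Aerospace engineering
Economics
Operating system
Economic growth
Authors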
Yun Zhang,Lulu Zhang,Yunze Cai
Identifier
DOI:10.1109/jas.2023.124125
Abstract
This paper presents a novel cooperative value iteration (VI)-based adaptive dynamic programming method for multi-player differential game models, together with a convergence proof. The players are divided into two groups in the learning process and adapt their policies sequentially. Our method removes the dependence on admissible initial policies, which is one of the main drawbacks of policy iteration (PI)-based frameworks. Furthermore, the algorithm enables the players to adapt their control policies without full knowledge of the other players' system parameters or control laws. The efficacy of our method is illustrated by three examples.
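
The point about admissible initial policies can be illustrated with a much simpler setting than the one the paper treats. The sketch below is not the authors' algorithm: it assumes a single-player, discrete-time, scalar linear-quadratic problem, and the function name `value_iteration_scalar_lqr` and all numerical values are hypothetical choices made for illustration. It shows value iteration started from a zero value function on an open-loop unstable system, where the zero policy is not stabilizing, still converging to a stabilizing gain.

```python
def value_iteration_scalar_lqr(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Value iteration for x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2.

    The value function is V(x) = p * x^2. The iteration starts from p = 0,
    so no stabilizing (admissible) initial policy is required.
    """
    p = 0.0
    for _ in range(max_iter):
        # Bellman backup for the quadratic value function (Riccati recursion).
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    # Greedy feedback gain for the converged value function: u = -k * x.
    k = a * b * p / (r + b * b * p)
    return p, k


# Illustrative, assumed parameters: a > 1 makes the open-loop system unstable.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
P, K = value_iteration_scalar_lqr(a, b, q, r)
print("value coefficient:", P)
print("feedback gain:", K)
print("closed-loop pole:", a - b * K)
```

A policy iteration scheme on the same problem would need an initial gain that already stabilizes the system, which is the requirement the abstract says the VI-based method avoids.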