Authors
Kyriakos G. Vamvoudakis, Frank L. Lewis
Identifier
DOI: 10.1109/cdc.2010.5717607
Abstract
In this paper we present an online gaming algorithm based on policy iteration to solve the continuous-time (CT) two-player zero-sum game with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real time the solution to the game's Hamilton–Jacobi–Isaacs (HJI) design equation. This method finds in real time suitable approximations of the optimal value, the saddle-point control policy, and the disturbance policy, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic structure that involves simultaneous continuous-time adaptation of critic, control actor, and disturbance neural networks. We call this online gaming algorithm "synchronous" zero-sum game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, actor, and disturbance networks. Convergence to the optimal saddle-point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
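The offline counterpart of the zero-sum game policy iteration described in the abstract can be sketched on a toy problem. The example below is a hypothetical scalar linear-quadratic game (not taken from the paper): dynamics x' = a*x + b*u + k*w with cost ∫(q*x² + r*u² − γ²*w²) dt, where u is the minimizing control and w the maximizing disturbance. Here the HJI equation reduces to a scalar game algebraic Riccati equation, so the policy-evaluation step is an exact Lyapunov solve rather than a neural-network critic; the paper's contribution is the online, synchronous, neural-network version of this loop.

```python
import math

# Hypothetical scalar game parameters (illustrative only).
a, b, k = -1.0, 1.0, 1.0        # dynamics: x' = a*x + b*u + k*w
q, r, gamma = 1.0, 1.0, 2.0     # cost weights and attenuation level

def zero_sum_policy_iteration(n_iter=50):
    """Alternate policy evaluation and saddle-point policy improvement."""
    Ku, Kw = 0.0, 0.0           # initial policies u = Ku*x, w = Kw*x (a < 0 is stabilizing)
    p = 0.0                     # value function V(x) = p*x^2 / ... (scalar kernel)
    for _ in range(n_iter):
        a_cl = a + b * Ku + k * Kw
        # Policy evaluation: scalar Lyapunov equation
        #   2*a_cl*p + q + r*Ku^2 - gamma^2*Kw^2 = 0
        p = -(q + r * Ku**2 - gamma**2 * Kw**2) / (2.0 * a_cl)
        # Policy improvement from the saddle-point stationarity conditions:
        Ku = -(b / r) * p             # minimizing controller
        Kw = (k / gamma**2) * p       # maximizing disturbance
    return p, Ku, Kw

p, Ku, Kw = zero_sum_policy_iteration()

# Closed-form check: game algebraic Riccati equation
#   2*a*p + q - (b^2/r - k^2/gamma^2)*p^2 = 0
c = b**2 / r - k**2 / gamma**2
p_exact = (2 * a + math.sqrt(4 * a**2 + 4 * c * q)) / (2 * c)
print(p, p_exact)
```

In this scalar setting the simultaneous update of both players' gains converges to the GARE root; in general zero-sum policy iteration is organized with inner/outer loops over the two players, and the paper replaces the iteration entirely with continuous-time tuning of three networks at once.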