Keywords
Reinforcement learning, Zero-sum game, Convergence (economics), Zero (linguistics), Nash equilibrium, Computer science, Markov decision process, Mathematical optimization, Discrete time and continuous time, Markov chain, Algebraic Riccati equation, Set (abstract data type), Jump, Algebraic number, Limit (mathematics), Mathematics, Markov process, Riccati equation, Artificial intelligence, Differential equation, Mathematical analysis, Linguistics, Philosophy, Statistics, Physics, Quantum mechanics, Machine learning, Programming language, Economics, Economic growth
Authors
Xuewen Zhang, Hao Shen, Feng Li, Jing Wang
Abstract
This article concentrates on the non-zero-sum game problem for discrete-time Markov jump systems without requiring knowledge of the system dynamics. First, the multiplayer non-zero-sum game problem is converted into solving a set of coupled game algebraic Riccati equations, which are difficult to solve directly. Then, to obtain the optimal control policies, a model-based algorithm that adapts the policy iteration approach is proposed. However, the model-based algorithm relies on knowledge of the system dynamics, which limits its use in practice. Subsequently, an off-policy reinforcement learning algorithm is developed to remove this dependence; it uses only the information of system states and inputs. Moreover, proofs of convergence and of the Nash equilibrium are also given. Finally, a numerical example is given to demonstrate the effectiveness of the proposed algorithms.
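As a rough illustration of the policy iteration idea mentioned in the abstract, the sketch below implements model-based policy iteration for a simplified single-controller discrete-time Markov jump linear system: it alternates between evaluating the current mode-dependent gains through coupled Lyapunov equations and improving them greedily. The paper's multiplayer non-zero-sum setting couples these equations further across players, so this is only a minimal sketch under standard linear-quadratic assumptions; all names, matrices, and structure here are hypothetical and not the authors' algorithm.

```python
# Minimal sketch (assumptions): model-based policy iteration for a
# single-controller discrete-time Markov jump linear system
#   x_{k+1} = A[i] x_k + B[i] u_k,  i = active mode,
# with stage cost x'Q[i]x + u'R[i]u and mode transition matrix Pr.
# Gains define the feedback law u_k = -K[i] x_k.
import numpy as np

def coupled_policy_iteration(A, B, Q, R, Pr, K0, tol=1e-9, max_iter=200):
    """A, B, Q, R: lists of per-mode matrices; Pr: N x N transition matrix;
    K0: initial mode-dependent gains (assumed mean-square stabilizing)."""
    N, n = len(A), A[0].shape[0]
    K = [k.copy() for k in K0]
    for _ in range(max_iter):
        Acl = [A[i] - B[i] @ K[i] for i in range(N)]
        W = [Q[i] + K[i].T @ R[i] @ K[i] for i in range(N)]
        # Policy evaluation: solve the coupled Lyapunov equations
        #   P_i = W_i + Acl_i' (sum_j Pr[i, j] P_j) Acl_i
        # by stacking them into one linear system via Kronecker products.
        M = np.eye(N * n * n)
        rhs = np.concatenate([W[i].reshape(-1) for i in range(N)])
        for i in range(N):
            for j in range(N):
                M[i*n*n:(i+1)*n*n, j*n*n:(j+1)*n*n] -= \
                    Pr[i, j] * np.kron(Acl[i].T, Acl[i].T)
        vecP = np.linalg.solve(M, rhs)
        P = [vecP[i*n*n:(i+1)*n*n].reshape(n, n) for i in range(N)]
        # Policy improvement: greedy mode-dependent gains under the evaluated cost.
        K_new = []
        for i in range(N):
            Ei = sum(Pr[i, j] * P[j] for j in range(N))  # mode-coupling term E_i(P)
            K_new.append(np.linalg.solve(R[i] + B[i].T @ Ei @ B[i],
                                         B[i].T @ Ei @ A[i]))
        if max(np.linalg.norm(K_new[i] - K[i]) for i in range(N)) < tol:
            break
        K = K_new
    return K_new, P
```

As in most policy iteration schemes of this kind, the sketch assumes an initial set of stabilizing gains; how the paper handles initialization, and how its off-policy reinforcement learning variant avoids using the model matrices, is not detailed in the abstract.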