Reinforcement learning
Zero-sum game
Tracking
Duality
Nash equilibrium
Mathematical optimization
Control
Computer science
Mathematics
Artificial intelligence
Authors
Xuejie Que, Zhenlei Wang
Source
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-25
Volume/Issue: 71 (6): 3146-3150
Citations: 1
Identifier
DOI: 10.1109/tcsii.2024.3358676
Abstract
The two-player zero-sum game formulation of optimal tracking problems with external disturbance has been extensively explored. However, challenges such as the selection of initial admissible policies and learning errors diminish the accuracy of the computed Nash equilibrium and limit the method's applicability. The proposed model-free primal-dual reinforcement learning algorithm uses state-input trajectories generated from a set of linearly independent initial vectors to obtain the Nash equilibrium without probing noise. The admissibility of both players' policies is treated as a non-convex constraint and handled from a primal-dual perspective. Simulation results for an inverter confirm that the proposed unbiased learning method not only achieves superior tracking performance but also converges faster.
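For context, the model-based baseline behind such methods is the game algebraic Riccati equation of a two-player zero-sum linear-quadratic game, whose fixed point yields the Nash-equilibrium feedback gains for the controller (minimizer) and the worst-case disturbance (maximizer). The sketch below is an illustrative value iteration on made-up system matrices and attenuation level `gamma`; it is not the paper's model-free primal-dual algorithm, which avoids exactly this reliance on known dynamics.

```python
import numpy as np

# Hypothetical system: x_{t+1} = A x + B u + D w, with quadratic cost
# sum_t (x'Qx + u'Ru - gamma^2 w'w). All numbers are illustrative.
A = np.array([[0.9, 0.2], [0.0, 0.8]])   # state dynamics
B = np.array([[0.0], [1.0]])             # control input (minimizing player)
D = np.array([[0.1], [0.0]])             # disturbance input (maximizing player)
Q = np.eye(2)                            # state weighting
R = np.array([[1.0]])                    # control weighting
gamma = 2.0                              # disturbance attenuation level

G = np.hstack([B, D])
P = np.zeros((2, 2))
for _ in range(500):
    # Block matrix coupling the two players' quadratic terms; its lower-right
    # block is negative definite when gamma is large enough.
    M = np.block([[R + B.T @ P @ B, B.T @ P @ D],
                  [D.T @ P @ B, D.T @ P @ D - gamma**2 * np.eye(1)]])
    P_next = Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(M, G.T @ P @ A)
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

# Recompute M at the converged P, then read off the equilibrium gains:
# u* = -K x (controller) and w* = -L x (worst-case disturbance).
M = np.block([[R + B.T @ P @ B, B.T @ P @ D],
              [D.T @ P @ B, D.T @ P @ D - gamma**2 * np.eye(1)]])
KL = np.linalg.solve(M, G.T @ P @ A)
K, L = KL[:1], KL[1:]
```

In the data-driven setting the paper targets, `A`, `B`, and `D` are unknown, and the equivalent fixed point is learned from measured state-input trajectories instead.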