Topics: Reinforcement learning, Nash equilibrium, Perfect information, Regret, Fictitious play, Markov decision process, ε-equilibrium, Mathematical optimization, Correlated equilibrium, Computer science, Best response, Counterfactual thinking, Benchmark (surveying), Markov process, Mathematical economics, Mathematics, Repeated game, Game theory, Artificial intelligence, Equilibrium selection, Machine learning, Epistemology, Philosophy, Statistics, Geodesy, Geography
Authors
Kangxin He, Haolin Wu, Zhuang Wang, Hui Li
Abstract
Finding Nash equilibria in imperfect-information games is a challenging problem that has received much attention. Neural Fictitious Self-Play (NFSP) is a popular model-free machine learning algorithm that has been used to compute approximate Nash equilibria in such games. However, the deep reinforcement learning method used to approximate the best response in NFSP assumes fully observable Markov states, whereas the states in imperfect-information games are partially observable and non-Markovian; this leads to a poor approximation of the best response, so NFSP needs more iterations to converge. In this study, we present a new reinforcement learning method, inspired by counterfactual regret minimization, that relaxes the Markov requirement by iteratively updating the policy according to a regret-matching process. Combining this new reinforcement learning method with fictitious play, we further present a novel algorithm for finding approximate Nash equilibria in zero-sum imperfect-information games. Experimental results on three benchmark games show that the new algorithm finds approximate Nash equilibria effectively and converges much faster than the baseline.
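Since the abstract's central idea is iteratively updating a policy via regret matching (the update rule underlying counterfactual regret minimization), the following minimal Python sketch illustrates that rule in isolation. It is not the paper's algorithm; all names (`regret_matching_policy`, `update_regret`, the toy utility feedback) are illustrative assumptions.

```python
import numpy as np

def regret_matching_policy(cumulative_regret):
    """Derive a policy proportional to cumulative positive regrets."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # If no action has positive regret, fall back to a uniform policy.
    return np.ones_like(positive) / len(positive)

def update_regret(cumulative_regret, action_utilities, policy):
    """Accumulate regret for not having played each action."""
    expected = np.dot(policy, action_utilities)  # value of current policy
    return cumulative_regret + (action_utilities - expected)

# Toy usage: observe per-action utilities each round and adapt the policy.
# The random utilities here are a stand-in for game feedback.
rng = np.random.default_rng(0)
regret = np.zeros(3)
for _ in range(1000):
    policy = regret_matching_policy(regret)
    utilities = rng.normal(size=3)
    regret = update_regret(regret, utilities, policy)
```

In self-play settings, the average of the policies produced this way is what converges toward equilibrium, which is why such an update can serve as a best-response learner without requiring a fully observable Markov state.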