Stochastic game
Markov chain
Markov decision process
Convergence (economics)
Computer science
Mathematical optimization
Markov process
Examples of Markov chains
Ergodic theory
Reinforcement learning
Mathematical economics
Markov kernel
Variable-order Markov model
Markov model
Mathematics
Artificial intelligence
Machine learning
Economics
Mathematical analysis
Statistics
Economic growth
Authors
Richard M. Wheeler,Kumpati S. Narendra
Source
Journal: IEEE Transactions on Automatic Control
[Institute of Electrical and Electronics Engineers]
Date: 1986-06-01
Volume/Issue: 31 (6): 519-526
Citations: 99
Identifiers
DOI: 10.1109/tac.1986.1104342
Abstract
The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralized decision maker is associated with each state in which two or more actions (decisions) are available. Each decision maker uses a simple learning scheme, requiring minimal information, to update its action choice. It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step. The analysis is based on learning in sequential stochastic games and on certain properties, derived in this paper, of ergodic Markov chains. A new result on convergence in identical payoff games with a unique equilibrium point is also presented.
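The abstract describes a scheme in which each multi-action state runs its own decision maker that nudges its action probabilities by a small step whenever the state is revisited, using only the reward observed since its previous visit. The sketch below illustrates that idea; it is not a reimplementation of the paper's algorithm. The three-state chain P, the rewards R, the step size, and the reward-inaction style update are illustrative assumptions, and the learners never read P or R directly; those tables only drive the simulated environment.

```python
import numpy as np

# Hypothetical 3-state chain: P[s][a] is the transition distribution and
# R[s][a] the one-step reward for action a in state s. Illustrative numbers.
P = {
    0: [np.array([0.2, 0.8, 0.0]), np.array([0.7, 0.1, 0.2])],
    1: [np.array([0.1, 0.2, 0.7]), np.array([0.5, 0.4, 0.1])],
    2: [np.array([0.6, 0.2, 0.2])],            # single action: no learner needed
}
R = {0: [0.0, 0.5], 1: [1.0, 0.2], 2: [0.3]}   # rewards kept in [0, 1]

rng = np.random.default_rng(0)
step_size = 0.01                                # "sufficiently small" update step
probs = {s: np.ones(len(P[s])) / len(P[s]) for s in P if len(P[s]) > 1}
last_visit = {s: 0 for s in probs}              # time of the previous visit to s
reward_since = {s: 0.0 for s in probs}          # reward accrued since that visit
chosen = {s: 0 for s in probs}                  # action taken at that visit

state, total_reward = 0, 0.0
for t in range(1, 200_001):
    if state in probs:
        # On revisiting state s, feed its learner the average reward per step
        # accumulated since the last visit, then reinforce the action chosen
        # then (linear reward-inaction style probability update).
        if last_visit[state] > 0:
            beta = reward_since[state] / (t - last_visit[state])
            a, p = chosen[state], probs[state]
            p += step_size * beta * (np.eye(len(p))[a] - p)
        last_visit[state], reward_since[state] = t, 0.0
        chosen[state] = rng.choice(len(P[state]), p=probs[state])
        action = chosen[state]
    else:
        action = 0
    r = R[state][action]
    for s in reward_since:                      # every learner accrues the reward
        reward_since[s] += r
    total_reward += r
    state = rng.choice(len(P[state]), p=P[state][action])

print("average reward per step:", total_reward / t)
print("learned action probabilities:", {s: np.round(p, 3) for s, p in probs.items()})
```

Each update moves the probability vector a small convex step toward the action just reinforced, so it always remains a valid distribution; with a small enough step size this is the regime in which the paper's convergence result applies.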