蒙特卡罗树搜索
对数
数学优化
覆盖树
树(集合论)
马尔可夫决策过程
蒙特卡罗方法
趋同(经济学)
背景(考古学)
计算机科学
后悔
数学
应用数学
马尔可夫过程
人工智能
组合数学
机器学习
统计
树冠聚类算法
相关聚类
生物
数学分析
古生物学
经济
聚类分析
经济增长
作者
Devavrat Shah,Qiaomin Xie,Zhi Xu
出处
期刊:Operations Research
[Institute for Operations Research and the Management Sciences]
日期:2022-03-01
卷期号:70 (6): 3234-3260
被引量:6
标识
DOI:10.1287/opre.2021.2239
摘要
In “Nonasymptotic Analysis of Monte Carlo Tree Search,” D. Shah, Q. Xie, and Z. Xu consider the popular tree-based search strategy, the Monte Carlo Tree Search (MCTS), in the context of the infinite-horizon discounted Markov decision process. They show that MCTS with an appropriate polynomial rather than logarithmic bonus term indeed leads to the desired convergence property. The authors derive the results by establishing a polynomial concentration property of regret for a class of nonstationary multiarm bandits. Furthermore, using this as a building block, they demonstrate that MCTS, combined with nearest neighbor supervised learning, acts as a “policy improvement” operator that can iteratively improve value function approximation.
科研通智能强力驱动
Strongly Powered by AbleSci AI