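Markov decision process
Computer science
Estimation
State (computer science)
Mathematical optimization
Markov chain
State space
Markov process
Space (punctuation)
Algorithm
Applied mathematics
Mathematics
Machine learning
Statistics
Economics
Management
Operating system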
Authors
Siliang Zeng,Mingyi Hong,Alfredo García
Source
Journal: Operations Research
[Institute for Operations Research and the Management Sciences]
Date: 2024-09-19
Citations: 1
Identifier
DOI: 10.1287/opre.2022.0511
Abstract
Researchers have introduced a new algorithm for estimating structural models of dynamic decisions made by human agents, a task whose main obstacle is high computational complexity. The estimation problem is traditionally nested: an inner problem identifies an optimal policy for the current reward estimate, and an outer problem maximizes a measure of fit. Previous methods either struggle with large discrete state spaces or high-dimensional continuous state spaces, or cope with them at the cost of reward estimation accuracy. The new approach combines a policy improvement step with a stochastic gradient step for likelihood maximization, so that reward estimation accuracy is not traded away for computational efficiency. The resulting single-loop algorithm is designed to handle high-dimensional state spaces and converges to a stationary solution with a finite-time guarantee. When the reward is linearly parameterized, the algorithm approximates the maximum likelihood estimator at a sublinear rate.
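The single-loop idea in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation. It assumes a known transition tensor `P`, a linear reward `phi @ theta`, an entropy-regularized (softmax) policy, and a gradient estimate formed as the difference between discounted feature sums of an expert trajectory and a trajectory sampled from the current policy. All function names, sizes, and step sizes below are hypothetical choices made for illustration.

```python
import numpy as np

def logsumexp(x, axis):
    # Numerically stable log-sum-exp along one axis.
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def soft_policy(Q):
    # Softmax policy induced by soft Q-values; each row sums to 1.
    return np.exp(Q - logsumexp(Q, axis=1)[:, None])

def rollout(policy, P, s0, horizon, rng):
    # Sample a list of (state, action) pairs by following `policy` in the MDP.
    s, traj = s0, []
    for _ in range(horizon):
        a = rng.choice(policy.shape[1], p=policy[s])
        traj.append((s, a))
        s = rng.choice(P.shape[2], p=P[s, a])
    return traj

def discounted_features(traj, phi, gamma):
    # Discounted sum of feature vectors along one trajectory.
    return sum(gamma ** t * phi[s, a] for t, (s, a) in enumerate(traj))

def single_loop_estimate(expert_trajs, P, phi, gamma, steps, lr, rng):
    S, A, d = phi.shape
    theta, Q = np.zeros(d), np.zeros((S, A))
    horizon = len(expert_trajs[0])
    for _ in range(steps):
        # Policy improvement: ONE soft Bellman backup under the current reward,
        # instead of solving the inner problem to convergence.
        Q = phi @ theta + gamma * P @ logsumexp(Q, axis=1)
        pi = soft_policy(Q)
        # Stochastic gradient step on the likelihood (assumed estimator:
        # expert features minus features of a rollout from the current policy).
        expert = expert_trajs[rng.integers(len(expert_trajs))]
        agent = rollout(pi, P, expert[0][0], horizon, rng)
        grad = (discounted_features(expert, phi, gamma)
                - discounted_features(agent, phi, gamma))
        theta = theta + lr * grad
    return theta, soft_policy(Q)

# Synthetic example: all sizes and parameters are illustrative.
rng = np.random.default_rng(0)
S, A, d, gamma = 5, 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
phi = rng.normal(size=(S, A, d))             # features, shape (S, A, d)
theta_true = np.array([1.0, -0.5])

# Expert behaves soft-optimally under theta_true (soft value iteration).
Q_exp = np.zeros((S, A))
for _ in range(500):
    Q_exp = phi @ theta_true + gamma * P @ logsumexp(Q_exp, axis=1)
expert_pi = soft_policy(Q_exp)
expert_trajs = [rollout(expert_pi, P, rng.integers(S), 20, rng) for _ in range(100)]

theta_hat, pi_hat = single_loop_estimate(
    expert_trajs, P, phi, gamma, steps=500, lr=0.02, rng=rng)
```

The point of the sketch is structural: each iteration performs one cheap policy update and one reward update, rather than nesting a full policy solve inside every likelihood step. In a high-dimensional setting the table `Q` would presumably be replaced by a function approximator, which this toy version does not attempt.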