Reinforcement learning
Computer science
Reinforcement
Artificial intelligence
Psychology
Social psychology
Authors
Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker
Source
Journal: Foundations and Trends in Machine Learning
[Now Publishers]
Date: 2023-01-01
Volume (issue), pages: 16 (1), 1-118
Citations: 243
Abstract
Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This survey covers the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects such as where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey also draws connections to several related RL fields, like hierarchical RL and transfer