Topics
Reinforcement learning, Computer science, Diversity (cybernetics), Simplicity (philosophy), Online and offline, Artificial intelligence, Machine learning, Offline learning, Reinforcement, Online learning, Psychology, Social psychology, Epistemology, Operating system, World Wide Web, Philosophy
Authors
Yifan Wu, George Tucker, Ofir Nachum
Source
Journal: Cornell University - arXiv
Date: 2019-09-25
Citations: 111
Abstract
In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment. However, in many real-world applications, access to the environment is limited to a fixed offline dataset of logged experience. In such settings, standard RL algorithms have been shown to diverge or otherwise yield poor performance. Accordingly, recent work has suggested a number of remedies to these issues. In this work, we introduce a general framework, behavior regularized actor critic (BRAC), to empirically evaluate recently proposed methods as well as a number of simple baselines across a variety of offline continuous control tasks. Surprisingly, we find that many of the technical complexities introduced in recent methods are unnecessary to achieve strong performance. Additional ablations provide insights into which design choices matter most in the offline RL setting.
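The core idea behind behavior regularization, as the abstract describes it, is to penalize the learned policy for drifting away from the behavior policy that generated the offline dataset. A minimal sketch of this kind of regularized actor objective is below; it is not the authors' implementation, and the function names, the choice of a KL penalty between diagonal Gaussians, and the weight `alpha` are illustrative assumptions.

```python
import numpy as np

def kl_diag_gaussian(mu_p, sigma_p, mu_b, sigma_b):
    """KL(pi || beta) between two diagonal Gaussian policies.

    Hypothetical helper for illustration; BRAC-style methods can use
    various divergences (KL, MMD, Wasserstein) here.
    """
    return float(np.sum(
        np.log(sigma_b / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_b) ** 2) / (2.0 * sigma_b ** 2)
        - 0.5
    ))

def regularized_actor_objective(q_value, mu_pi, sigma_pi,
                                mu_beta, sigma_beta, alpha=1.0):
    """Behavior-regularized objective to maximize:

        Q(s, a ~ pi)  -  alpha * D(pi(.|s), beta(.|s))

    where beta is (an estimate of) the behavior policy that logged the
    offline data, and alpha trades off return against staying close to it.
    """
    penalty = kl_diag_gaussian(mu_pi, sigma_pi, mu_beta, sigma_beta)
    return q_value - alpha * penalty
```

When the learned policy matches the behavior policy the penalty vanishes and the objective reduces to the Q-value alone; as the policy drifts from the data distribution, the objective is discounted in proportion to the divergence.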