Keywords
Reinforcement learning
Computer science
Dimension (graph theory)
Artificial intelligence
Temporal difference learning
Dimensionality reduction
Entropy (arrow of time)
Machine learning
Imitation
Reduction (mathematics)
Mathematics
Pure mathematics
Geometry
Physics
Social psychology
Quantum mechanics
Psychology
Authors
Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama
Identifier
DOI: 10.1016/j.neunet.2016.08.005
Abstract
The goal of reinforcement learning is to learn an optimal policy that controls an agent to acquire the maximum cumulative reward. The model-based reinforcement learning approach learns a transition model of the environment from data and then derives the optimal policy using that model. However, learning an accurate transition model in high-dimensional environments requires a large amount of data, which is difficult to obtain. To overcome this difficulty, in this paper we propose to combine model-based reinforcement learning with the recently developed least-squares conditional entropy (LSCE) method, which simultaneously performs transition model estimation and dimension reduction. We further extend the proposed method to imitation learning scenarios. The experimental results show that policy search combined with LSCE performs well on high-dimensional control tasks, including real humanoid robot control.
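To make the pipeline the abstract outlines concrete, below is a minimal, hypothetical Python sketch of the four stages: collect transitions, reduce the state dimension, fit a transition model in the reduced space, and derive a policy from the learned model. It does not reproduce the paper's method: PCA stands in for the LSCE estimator, a random-shooting planner stands in for policy search, and the toy linear environment and all dimensions (`D`, `d`, `A`) are assumptions chosen for brevity.

```python
# Illustrative sketch only: PCA replaces LSCE, random shooting replaces
# the paper's policy search, and the environment is a made-up linear system.
import numpy as np

rng = np.random.default_rng(0)
D, d, A = 20, 2, 1           # observed state dim, latent dim, action dim
W = rng.normal(size=(D, d))  # lifts the 2-D latent state to 20-D observations

def step(z, a):
    """True latent dynamics (hidden from the learner)."""
    return 0.9 * z + np.array([a[0], 0.5 * a[0]])

def observe(z):
    return W @ z

# 1) Collect random-exploration transitions (s, a, s').
S, Acts, S_next = [], [], []
z = rng.normal(size=d)
for _ in range(500):
    a = rng.uniform(-1, 1, size=A)
    z2 = step(z, a)
    S.append(observe(z))
    Acts.append(a)
    S_next.append(observe(z2))
    z = z2
S, Acts, S_next = map(np.asarray, (S, Acts, S_next))

# 2) Dimension reduction: project states onto their top-d principal
#    directions (a crude stand-in for LSCE's dimension reduction).
_, _, Vt = np.linalg.svd(S - S.mean(axis=0), full_matrices=False)
P = Vt[:d].T                 # D x d projection matrix

# 3) Transition-model estimation in the reduced space:
#    fit z' = [z, a] @ Theta by least squares.
Z, Z_next = S @ P, S_next @ P
X = np.hstack([Z, Acts])
Theta, *_ = np.linalg.lstsq(X, Z_next, rcond=None)

def model(z, a):
    return np.concatenate([z, a]) @ Theta

# 4) Derive a policy from the learned model: a random-shooting planner
#    that steers the reduced state toward the origin.
def plan(z, horizon=5, n_candidates=256):
    best_a, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1, 1, size=(horizon, A))
        zt = z
        for a in seq:
            zt = model(zt, a)
        cost = float(np.sum(zt ** 2))
        if cost < best_cost:
            best_cost, best_a = cost, seq[0]
    return best_a

z_true = rng.normal(size=d)
for _ in range(20):
    a = plan(observe(z_true) @ P)  # plan from the reduced observation
    z_true = step(z_true, a)
print("distance of true latent state to the goal (origin):",
      np.linalg.norm(z_true))
```

The data-efficiency argument of the abstract is visible in stage 3: the least-squares model is fit on d + A = 3 regressors instead of D + A = 21, so far fewer transitions suffice to estimate it accurately.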