Keywords
Reinforcement learning
Computer science
Trajectory
Machine learning
Artificial intelligence
Maximization
Offline learning
Latent variable
Mixture model
Function (biology)
Expectation-maximization algorithm
Mathematical optimization
Online learning
Maximum likelihood
Mathematics
Astronomy
Statistics
Biology
Physics
World Wide Web
Evolutionary biology
Authors
Xiaoguang Li, Xin Zhang, Lixin Wang, Ge Yu
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2021-01-01
Volume 9, pp. 801-812
Identifier
DOI:10.1109/access.2020.3045300
Abstract
Reinforcement learning has been widely applied to sequential decision-making problems in various real-world fields, including recommendation, e-learning, etc. The features of multiple policies, latent mixture environments, and offline learning implied by many real applications pose a new challenge for reinforcement learning. To address this challenge, the paper proposes a reinforcement learning approach called offline multi-policy gradient for latent mixture environments. The proposed method maximizes the expected return of trajectories with respect to the joint distribution of trajectory and model, and adopts a multi-policy search algorithm based on expectation maximization to find the optimal policies. We also prove that the off-policy techniques of importance sampling and the advantage function can be used by offline multi-policy learning with fixed historical trajectories. The effectiveness of our approach is demonstrated by experiments on both synthetic and real datasets.
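The idea described in the abstract can be illustrated with a minimal sketch, not the paper's actual implementation: given a fixed log of trajectories, an E-step assigns each trajectory a responsibility under each latent mixture component, and an M-step updates each component's softmax policy with an importance-sampling-weighted, advantage-scaled gradient. The single-step trajectories, initialization, and learning rate below are all illustrative assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical logged data: single-step trajectories recorded by a
# behavior policy, as (state, action, return, behavior_prob) tuples.
trajs = [(0, 0, 1.0, 0.5)] * 3 + [(0, 1, 0.2, 0.5)] * 3

K, N_STATES, N_ACTIONS = 2, 1, 2
# Slightly asymmetric initial logits so the two mixture components separate.
logits = [[[0.2, 0.0]], [[0.0, 0.2]]]
weights = [0.5, 0.5]  # mixture weights over latent models

def pi(k, s, a):
    """Probability of action a in state s under component k's softmax policy."""
    return softmax(logits[k][s])[a]

def likelihood(k, traj):
    s, a, _, _ = traj
    return pi(k, s, a)

def em_step(lr=0.1):
    # E-step: responsibility of each latent model for each trajectory.
    resp = []
    for t in trajs:
        scores = [weights[k] * likelihood(k, t) for k in range(K)]
        z = sum(scores)
        resp.append([sc / z for sc in scores])
    # M-step: importance-weighted policy gradient per component, with a
    # responsibility-weighted mean return as a crude baseline/advantage.
    for k in range(K):
        tot = sum(r[k] for r in resp)
        baseline = sum(r[k] * t[2] for r, t in zip(resp, trajs)) / tot
        grad = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        for r, (s, a, ret, b) in zip(resp, trajs):
            w_is = pi(k, s, a) / b   # importance-sampling ratio
            adv = ret - baseline     # advantage estimate
            probs = softmax(logits[k][s])
            for a2 in range(N_ACTIONS):
                ind = 1.0 if a2 == a else 0.0
                # grad of log softmax: indicator minus probability
                grad[s][a2] += r[k] * w_is * adv * (ind - probs[a2])
        for s in range(N_STATES):
            for a2 in range(N_ACTIONS):
                logits[k][s][a2] += lr * grad[s][a2]
        weights[k] = tot / len(trajs)  # update mixture weight
    return resp
```

Because the data are fixed in advance, each `em_step` call touches only the logged trajectories, matching the offline setting; the importance ratio corrects for the mismatch between each component's policy and the behavior policy that produced the log.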