A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning

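Keywords: Computer science, Reinforcement learning, Artificial intelligence, Machine learning
Authors
Yi Yuan, Zhu Liang Yu, Zhenghui Gu, Yao Yeboah, Wei Wu, Xinyang Deng, Yuanqing Li
Source
Journal: Knowledge-Based Systems [Elsevier BV]
Volume: 175, pages 107-117. Citations: 33
Identifier
DOI: 10.1016/j.knosys.2019.03.018
Abstract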

Deep reinforcement learning (DRL) algorithms with experience replay have been used to solve many sequential learning problems. In practice, however, DRL algorithms still suffer from data inefficiency, which limits their applicability in many scenarios and makes them inefficient at solving real-world problems. To improve the data efficiency of DRL, this paper proposes a new multi-step method. Unlike traditional algorithms, the proposed method uses a new return function, which alters the discount of future rewards while decreasing the impact of the immediate reward when selecting the current state action. This approach has the potential to improve the efficiency of reward data. Combining the proposed method with two classic DRL algorithms, deep Q-networks (DQN) and double deep Q-networks (DDQN), yields two novel algorithms for improving the efficiency of learning from experience replay: expected n-step DQN (EnDQN) and expected n-step DDQN (EnDDQN). Their performance is validated in two simulation environments, CartPole and DeepTraffic. The experimental results demonstrate that the proposed multi-step methods greatly improve the data efficiency of DRL agents and further improve the performance of the existing classic DRL algorithms when incorporated into their training.
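
For orientation, the conventional n-step Q-learning target that multi-step methods build on can be sketched as below. This is the textbook n-step return, not the paper's new return function, whose exact form the abstract does not give. The function name `n_step_target` and its parameters are illustrative assumptions rather than anything taken from the paper.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Textbook n-step return (an assumption for illustration, not EnDQN itself).

    rewards         : the n rewards r_t, ..., r_{t+n-1} observed along the trajectory
    bootstrap_value : an estimate of max_a Q(s_{t+n}, a), e.g. from a target network
    gamma           : discount factor

    Returns r_t + gamma * r_{t+1} + ... + gamma^(n-1) * r_{t+n-1}
            + gamma^n * bootstrap_value.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# With a single reward this reduces to the usual 1-step Q-learning target.
one_step = n_step_target([2.0], bootstrap_value=4.0, gamma=0.5)
three_step = n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

In the standard form above, every reward is weighted by a fixed power of the discount factor. According to the abstract, the proposed return function changes this weighting by altering the discount of future rewards and reducing the influence of the immediate reward.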
