Reinforcement learning
Asynchronous communication
Computer science
Inefficiency
Task (project management)
Theory (learning stability)
Artificial intelligence
Machine learning
Engineering
Computer network
Economics
Microeconomics
Systems engineering
Authors
Zhizheng Zhang, Jiale Chen, Zhibo Chen, Weiping Li
Source
Journal: IEEE Transactions on Cybernetics
[Institute of Electrical and Electronics Engineers]
Date: 2019-12-31
Volume/Issue: 51 (2): 604-613
Citations: 53
Identifiers
DOI: 10.1109/tcyb.2019.2939174
Abstract
Deep deterministic policy gradient (DDPG) has proved to be a successful reinforcement learning (RL) algorithm for continuous control tasks. However, DDPG still suffers from data insufficiency and training inefficiency, especially in computationally complex environments. In this article, we propose asynchronous episodic DDPG (AE-DDPG), an extension of DDPG that achieves more effective learning with less training time. First, we design a modified scheme for collecting data in an asynchronous fashion. Generally, for asynchronous RL algorithms, sample efficiency and/or training stability diminish as the degree of parallelism increases. We address this problem from the perspectives of both data generation and data utilization. Specifically, we redesign experience replay by introducing the idea of episodic control so that the agent can latch onto good trajectories rapidly. In addition, we inject a new type of noise in action space to enrich the exploration behaviors. Experiments demonstrate that AE-DDPG achieves higher rewards and requires less training time than most popular RL algorithms on the Learning to Run task, which has a computationally complex environment. Beyond control tasks in computationally complex environments, AE-DDPG also achieves higher rewards and a two- to four-fold improvement in sample efficiency on average compared with other DDPG variants in MuJoCo environments. Furthermore, we verify the effectiveness of each proposed technique through extensive ablation studies.
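The abstract describes the episodic-control-inspired replay only at a high level. The snippet below is a minimal, hypothetical Python sketch of what such a replay buffer could look like: transitions from episodes whose return matches or beats the best return seen so far are additionally kept in a small "elite" memory, and each sampled batch mixes elite and ordinary transitions. The class name, buffer sizes, mixing ratio, and the best-return criterion are illustrative assumptions, not the exact mechanism used in AE-DDPG.

```python
import random
from collections import deque


class EpisodicReplayBuffer:
    """Hypothetical sketch of an episodic-control-style replay buffer."""

    def __init__(self, capacity=1_000_000, elite_capacity=100_000, elite_fraction=0.3):
        self.regular = deque(maxlen=capacity)      # ordinary FIFO replay memory
        self.elite = deque(maxlen=elite_capacity)  # memory for high-return episodes
        self.elite_fraction = elite_fraction       # share of each batch drawn from elite memory
        self.episode = []                          # transitions of the episode in progress
        self.best_return = float("-inf")

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.regular.append(transition)
        self.episode.append(transition)
        if done:
            episode_return = sum(t[2] for t in self.episode)
            # Latch onto good trajectories: keep the whole episode in the
            # elite memory if its return beats the best return seen so far.
            if episode_return >= self.best_return:
                self.best_return = episode_return
                self.elite.extend(self.episode)
            self.episode = []

    def sample(self, batch_size):
        n_elite = min(int(batch_size * self.elite_fraction), len(self.elite))
        batch = random.sample(self.elite, n_elite) if n_elite else []
        n_regular = min(batch_size - n_elite, len(self.regular))
        batch += random.sample(self.regular, n_regular)
        return batch
```

Drawing only a fraction of each batch from the elite memory is a deliberate choice in this sketch: it biases learning toward good trajectories without collapsing the gradient estimates onto a handful of episodes.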