Authors
Quan-Yong Fan, Meiying Cai, Bin Xu
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems [Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 1-10
Citations: 1
Identifier
DOI: 10.1109/tnnls.2024.3395508
Abstract
Although the deep deterministic policy gradient (DDPG) algorithm has attracted widespread attention for its powerful functionality and applicability to large-scale continuous control, it suffers from problems such as low sample-utilization efficiency and insufficient exploration. Therefore, an improved DDPG is presented in this article to overcome these challenges. First, an optimizer based on fractional gradients is introduced into the algorithm's networks, which is conducive to increasing the speed and accuracy of training convergence. On this basis, high-value experience replay based on weight-changed priority is proposed to improve sample-utilization efficiency, and an optimized exploration strategy for the boundary action space is adopted to explore the environment more thoroughly. Finally, the proposed method is tested in experiments on the Gym and PyBullet platforms. According to the results, the method speeds up the learning process and obtains higher average rewards than other algorithms.
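The abstract does not specify how the "weight-changed priority" replay is computed, so the following is only a minimal sketch of standard proportional prioritized experience replay (the baseline the paper builds on), with priorities derived from TD errors and importance-sampling weights correcting the sampling bias; the class name, parameters (`alpha`, `beta`, `eps`), and list-based storage are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay sketch (not the paper's
    weight-changed variant, which the abstract does not detail)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data, self.prios = [], []
        self.pos = 0                # ring-buffer write index

    def add(self, transition, td_error):
        p = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.prios.append(p)
        else:
            self.data[self.pos] = transition
            self.prios[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.prios, dtype=np.float64)
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias,
        # normalized so the largest weight in the batch is 1.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Called after a learning step with the fresh TD errors.
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(e) + self.eps) ** self.alpha
```

In DDPG training, transitions with large TD errors are replayed more often, which is one common route to the higher sample-utilization efficiency the abstract targets.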