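Reinforcement learning
Computer science
Artificial neural network
Convergence (economics)
Mathematical optimization
Envelope (radar)
Stability (learning theory)
Bellman equation
Algorithm
Artificial intelligence
Function (biology)
Machine learning
Mathematics
Biology
Evolutionary biology
Telecommunications
Economic growth
Economics
Radar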
Authors
Can Hu, Zhengwei Zhu, Lijia Wang, Chenyang Zhu, Yanfei Yang
Source
Journal: Electronics
[MDPI AG]
Date: 2022-08-09
Volume (issue): 11 (16): 2479
Identifier
DOI: 10.3390/electronics11162479
Abstract
Multi-objective reinforcement learning (MORL) aims to uniformly approximate the Pareto frontier in multi-objective decision-making problems, but it suffers from insufficient exploration and unstable convergence. We propose a multi-objective deep reinforcement learning algorithm, envelope with dueling structure, NoisyNet, and soft update (EDNs), to improve the agent's ability to learn optimal multi-objective policies. First, EDNs uses neural networks to approximate the value function and updates their parameters based on the convex envelope of the solution boundary. Second, the DQN structure is replaced with a dueling structure, which splits the action-value function into a state-value function and an advantage function so that training converges faster. Third, the NoisyNet method adds exploration noise to the neural network parameters, giving the agent more efficient exploration. Finally, a soft update of the target network parameters stabilizes training. Using the Deep Sea Treasure (DST) environment as a case study, the experimental results show that EDNs has better stability and exploration capability than the EMODRL algorithm. Over 1000 episodes, EDNs improved the coverage by 5.39% and reduced the adaptation error by 36.87%.