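Reinforcement learning
Motion planning
Computer science
Path (computing)
Artificial intelligence
Aeronautics
Aerospace engineering
Engineering
Robot
Computer network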
Authors
Zhengjun Wang, Weifeng Gao, Genghui Li, Zhenkun Wang, Maoguo Gong
Source
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence
Publisher: Institute of Electrical and Electronics Engineers
Date: 2024-06-01
Volume (issue): 8 (3), pp. 2625-2639
Times cited: 5
Identifiers
DOI: 10.1109/tetci.2024.3369485
Abstract
Unmanned aerial vehicles (UAVs) are widely used in urban search and rescue, where path planning plays a critical role. This paper proposes an approach that uses off-policy reinforcement learning (RL) with an improved exploration mechanism (IEM), based on prioritized experience replay (PER) and curiosity-driven exploration, to solve the time-constrained path planning problem for UAVs operating in complex, unknown environments. First, to meet the task's time constraints, we design a PER-based rollout algorithm that optimizes the behavior policy and improves sampling efficiency. Second, because some off-policy RL algorithms tend to get trapped in local optima in environments with sparse rewards, we measure curiosity by how long each state has gone unvisited and convert it into an intrinsic reward that encourages exploration. Finally, we integrate IEM into the sampling stage of several off-policy RL algorithms. Simulation experiments show that, compared with the original off-policy RL algorithms, the versions incorporating IEM reduce the planning time required to find rescue paths and succeed in rescuing all trapped individuals.
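The abstract combines two standard building blocks: prioritized experience replay and a curiosity bonus driven by how long a state has gone unvisited. The sketch below shows generic versions of both, under stated assumptions; it is not the authors' IEM or their PER-based rollout algorithm. `PrioritizedReplay` is a simplified, list-based form of proportional PER (Schaul et al.) that omits the sum-tree and importance-sampling weights a practical implementation would use. `UnvisitedTimeBonus`, its `scale` and `cap` parameters, and the capped linear bonus are hypothetical choices made for illustration, since the abstract does not give the paper's formula.

```python
import random

# Illustrative sketch only: generic PER plus a hypothetical "unvisited time"
# curiosity bonus. Neither class reproduces the paper's IEM.


class PrioritizedReplay:
    """Proportional prioritized experience replay, list-based for clarity.

    Simplified: no sum-tree (sampling is O(N)) and no importance-sampling
    weights, both of which a practical PER implementation would include.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=0):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0                # next slot to overwrite once full
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current max priority so they are replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, k):
        # P(i) = p_i^alpha / sum_j p_j^alpha, sampled with replacement.
        weights = [p ** self.alpha for p in self.priorities]
        idx = self.rng.choices(range(len(self.data)), weights=weights, k=k)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        # Priority = |TD error| + eps for each replayed transition.
        for i, delta in zip(idx, td_errors):
            self.priorities[i] = abs(delta) + self.eps


class UnvisitedTimeBonus:
    """Hypothetical intrinsic reward that grows with a state's unvisited time.

    Assumed form: bonus = scale * min(steps since last visit, cap) / cap,
    with never-visited states receiving the full bonus `scale`.
    """

    def __init__(self, scale=0.1, cap=100):
        self.scale = scale
        self.cap = cap
        self.t = 0                  # global step counter
        self.last_visit = {}        # state -> step of most recent visit

    def __call__(self, state):
        self.t += 1
        last = self.last_visit.get(state)
        gap = self.cap if last is None else min(self.t - last, self.cap)
        self.last_visit[state] = self.t
        return self.scale * gap / self.cap


if __name__ == "__main__":
    bonus = UnvisitedTimeBonus(scale=1.0, cap=10)
    print(bonus((0, 0)))  # first visit: full bonus 1.0
    print(bonus((0, 0)))  # revisited one step later: 0.1
    buffer = PrioritizedReplay(capacity=3)
    for step in range(5):
        buffer.add(("state", step))
    print([t[1] for t in buffer.data])  # oldest entries overwritten: [3, 4, 2]
```

In an off-policy training loop, the intrinsic bonus would be added to the environment reward before a transition is stored, and the TD errors computed on each sampled batch would be passed back to `update` so that surprising transitions are replayed more often. The dictionary keyed by state keeps the bonus cheap for discrete states such as grid cells; continuous states would first need to be discretized.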