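Reinforcement learning
Curiosity
Motion planning
Task (project management)
Sampling (signal processing)
Computer science
Path (computing)
Thompson sampling
Artificial intelligence
Mathematical optimization
Machine learning
Engineering
Robot
Mathematics
Systems engineering
Computer network
Computer vision
Psychology
Social psychology
Bayesian probability
Filter (signal processing)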
Authors
Zhengjun Wang,Weifeng Gao,Genghui Li,Zhenkun Wang,Maoguo Gong
Identifier
DOI:10.1109/tetci.2024.3369485
Abstract
Unmanned aerial vehicles (UAVs) are widely used in urban search and rescue, where path planning plays a critical role. This paper proposes an approach that uses off-policy reinforcement learning (RL) with an improved exploration mechanism (IEM), based on prioritized experience replay (PER) and curiosity-driven exploration, to address the time-constrained path planning problem for UAVs operating in complex unknown environments. First, to meet the task's time constraints, we design a rollout algorithm based on PER to optimize the behavior policy and enhance sampling efficiency. Second, certain off-policy RL algorithms often get trapped in local optima in environments with sparse rewards; we address this by measuring curiosity through each state's unvisited time and generating intrinsic rewards that encourage exploration. Finally, we introduce IEM into the sampling stage of various off-policy RL algorithms. Simulation experiments demonstrate that, compared to the original off-policy RL algorithms, the algorithms incorporating IEM reduce the planning time required for rescue paths and achieve the goal of rescuing all trapped individuals.
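
The abstract does not give the formulas behind IEM, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: an intrinsic reward that grows with the number of steps since a state was last visited, and the standard proportional form of PER sampling probabilities. The class name `UnvisitedTimeCuriosity`, the function `per_probabilities`, and the parameters `scale`, `cap`, `alpha`, and `eps` are hypothetical and are not taken from the paper.

```python
class UnvisitedTimeCuriosity:
    """Illustrative intrinsic reward based on how long a state has gone unvisited.

    Assumption: the reward is linear in elapsed steps and capped at `cap`;
    the paper's actual curiosity measure may differ.
    """

    def __init__(self, scale=0.1, cap=100):
        self.scale = scale        # maximum intrinsic reward
        self.cap = cap            # elapsed steps at which the reward saturates
        self.last_visit = {}      # state -> global step of the most recent visit
        self.step = 0             # global step counter

    def intrinsic_reward(self, state):
        """Record a visit to `state` and return its curiosity bonus."""
        self.step += 1
        if state in self.last_visit:
            elapsed = self.step - self.last_visit[state]
        else:
            elapsed = self.cap    # never-seen states get the maximum bonus
        self.last_visit[state] = self.step
        return self.scale * min(elapsed, self.cap) / self.cap


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha, with p_i = |td_i| + eps."""
    priorities = [(abs(td) + eps) ** alpha for td in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


if __name__ == "__main__":
    curiosity = UnvisitedTimeCuriosity(scale=1.0, cap=10)
    for cell in [(0, 0), (0, 0), (1, 1), (0, 0)]:
        print(cell, curiosity.intrinsic_reward(cell))
    print(per_probabilities([1.0, 3.0], alpha=1.0, eps=0.0))
```

In an off-policy training loop, the bonus would typically be added to the environment reward before the transition is stored in the replay buffer, and the probabilities would drive which stored transitions are replayed. How the paper combines the two inside its rollout algorithm is not stated in the abstract.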