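Reinforcement learning
Computer science
Motion planning
Path (computing)
Function (biology)
Mathematical optimization
Simulation
Real-time computing
Artificial intelligence
Mathematics
Computer network
Evolutionary biology
Biology
Robot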
Authors
Yan Li, Xuejun Zhang, Yuanjun Zhu, Ziang Gao
Identifier
DOI:10.1109/dasc58513.2023.10311219
Abstract
Urban "Last Mile Delivery" scenarios demand a safe and efficient UAV path planning method, which makes this a crucial issue in current research. Reinforcement learning (RL) is widely used in UAV path planning, but because it lacks hard constraints, safety is difficult to ensure during either the learning phase or the execution phase. To address this limitation, this paper studies how to combine safety properties with an RL algorithm to find a safe path, and proposes a safe reinforcement learning method for UAV path planning called Shield-DDPG. The method introduces a protection mechanism, Shield, that prevents the algorithm from outputting unsafe actions. In addition, the state space, action space, and reward function are each tailored for efficiency and safety. We compare Shield-DDPG with the DDPG and RRT algorithms in several different scenarios, and the results show that the proposed algorithm performs better. With the proposed path planning method, a UAV can learn to reach its destination efficiently and safely by calling the trained policy. This research is important for UAV operations and practical applications in complex urban airspace.
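The abstract does not spell out how Shield decides that an action is unsafe or what it substitutes, so the following is only a minimal sketch of the general idea of filtering a policy's output before execution. Everything in it is an assumption made for illustration: the 2D setting, circular obstacles given as `(x, y, radius)`, the action being a single heading angle, and the hypothetical helpers `is_safe` and `shield`. It is not the paper's implementation.

```python
import math

def is_safe(position, action, obstacles, step=1.0, margin=0.5):
    """Check one step along heading `action` (radians) from `position`.

    Assumption: only the end point of the step is tested against each
    circular obstacle (x, y, radius), inflated by `margin`.
    """
    x = position[0] + step * math.cos(action)
    y = position[1] + step * math.sin(action)
    return all(math.hypot(x - ox, y - oy) > r + margin for ox, oy, r in obstacles)

def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def shield(position, action, obstacles, n_candidates=36):
    """Pass a safe action through; otherwise return the nearest safe heading.

    Returns None when no candidate heading is safe, so the caller has to
    choose a fallback (for example, hovering in place).
    """
    if is_safe(position, action, obstacles):
        return action
    candidates = [2 * math.pi * k / n_candidates for k in range(n_candidates)]
    safe = [a for a in candidates if is_safe(position, a, obstacles)]
    if not safe:
        return None
    return min(safe, key=lambda a: angular_distance(a, action))

# Hypothetical usage: the policy proposes flying straight at an obstacle.
obstacles = [(1.0, 0.0, 0.3)]
proposed = 0.0  # heading in radians, e.g. the actor network's output
executed = shield((0.0, 0.0), proposed, obstacles)
```

In a DDPG-style training loop, a filter of this kind would sit between the actor's output and the environment step, so that the action actually executed is the filtered one. Whether the paper also penalizes the overridden action in the reward is not stated in the abstract.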