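Reinforcement learning
Collision avoidance
Computer science
Trajectory
Collision
Noise (video)
Artificial intelligence
Task (project management)
Set (abstract data type)
Ground truth
Imperfect
Engineering
Computer security
Linguistics
Image (mathematics)
Physics
Philosophy
Programming language
Systems engineering
Astronomy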
Authors
Dawei Wang,Tingxiang Fan,Tao Han,Jia Pan
Source
Journal: IEEE Robotics and Automation Letters
Date: 2020-02-18
Volume (issue), pages: 5 (2): 3098-3105
Citations: 103
Identifier
DOI:10.1109/lra.2020.2974648
Abstract
Unlike autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs) have a higher-dimensional configuration space, which makes motion planning for multiple UAVs a challenging task. In addition, uncertainty and noise are more significant in UAV scenarios, which increases the difficulty of autonomous multi-UAV navigation. In this letter, we propose a two-stage reinforcement learning (RL) based multi-UAV collision avoidance approach that does not explicitly model the uncertainty and noise in the environment. Our goal is to train a policy that plans a collision-free trajectory from local noisy observations. However, collision avoidance policies learned with RL usually suffer from high variance and low reproducibility because, unlike supervised learning, RL does not have a fixed training set with ground-truth labels. To address these issues, we introduce a two-stage training method for RL-based collision avoidance. In the first stage, we optimize the policy using a supervised training method with a loss function that encourages the agent to follow the well-known reciprocal collision avoidance strategy. In the second stage, we use policy gradient to refine the policy. We validate our policy in a variety of simulated scenarios, and extensive numerical simulations demonstrate that it can generate time-efficient, collision-free paths under imperfect sensing and can handle noisy local observations with unknown noise levels.
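The two-stage idea in the abstract (supervised pre-training toward a collision avoidance rule, then policy-gradient refinement) can be illustrated with a toy sketch. This is not the authors' implementation: the linear policy, the 4-dimensional observation layout, the `teacher_action` rule (a hand-written stand-in, not the actual reciprocal collision avoidance algorithm), the `reward` function, and every hyperparameter below are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_obs(n, noise_std=0.05):
    """Noisy local observations: [goal direction (2), neighbor offset (2)]."""
    angles = rng.uniform(0.0, 2.0 * np.pi, n)
    goal = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    rel = rng.uniform(-2.0, 2.0, (n, 2))
    obs = np.hstack([goal, rel])
    return obs + rng.normal(0.0, noise_std, obs.shape)

def teacher_action(obs):
    """Hypothetical teacher rule (assumption, not real ORCA):
    head to the goal and push away from a neighbor closer than 1 unit."""
    goal_dir, rel_pos = obs[:, :2], obs[:, 2:]
    dist = np.linalg.norm(rel_pos, axis=1, keepdims=True) + 1e-8
    repulse = -rel_pos / dist * np.maximum(0.0, 1.0 - dist)
    return goal_dir + repulse

def reward(obs, action, dt=0.5):
    """Toy reward (assumption): goal progress, minus effort, minus proximity."""
    goal_dir, rel_pos = obs[:, :2], obs[:, 2:]
    progress = np.sum(action * goal_dir, axis=1)
    effort = 0.5 * np.sum(action ** 2, axis=1)
    next_dist = np.linalg.norm(rel_pos - dt * action, axis=1)
    return progress - effort - np.maximum(0.0, 1.0 - next_dist)

def supervised_stage(W, steps=200, lr=0.05, batch=64):
    """Stage 1: regress the policy output onto the teacher's action (MSE)."""
    losses = []
    for _ in range(steps):
        obs = sample_obs(batch)
        err = obs @ W - teacher_action(obs)
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        W = W - lr * 2.0 * obs.T @ err / batch  # gradient of the mean loss
    return W, losses

def policy_gradient_stage(W, steps=200, lr=0.01, sigma=0.2, batch=256):
    """Stage 2: REINFORCE with a Gaussian policy and a mean-reward baseline."""
    for _ in range(steps):
        obs = sample_obs(batch)
        mean = obs @ W
        action = mean + sigma * rng.normal(size=mean.shape)
        adv = reward(obs, action) - reward(obs, action).mean()
        score = (action - mean) / sigma ** 2  # d log pi / d mean
        W = W + lr * obs.T @ (adv[:, None] * score) / batch
    return W

W0 = np.zeros((4, 2))                # linear policy: action = obs @ W
W1, losses = supervised_stage(W0)    # stage 1: imitate the teacher
W2 = policy_gradient_stage(W1)       # stage 2: refine with policy gradient
print("supervised loss, first vs. last step:", losses[0], losses[-1])
```

Starting stage 2 from the supervised weights `W1` rather than from scratch mirrors the motivation stated in the abstract: the supervised stage supplies a fixed target that pure RL lacks, and the policy-gradient stage then adjusts the policy against the reward.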