Partially observable Markov decision process
Reinforcement learning
Markov decision process
Computer science
Scale (ratio)
Process (computing)
Artificial intelligence
State (computer science)
Remotely operated underwater vehicle
Markov process
Real-time computing
Mobile robot
Machine learning
Markov chain
Robot
Markov model
Algorithm
Physics
Mathematics
Quantum mechanics
Statistics
Operating system
Authors
Chao Wang, Jian Wang, Yuan Shen, Xudong Zhang
Source
Journal: IEEE Transactions on Vehicular Technology
[Institute of Electrical and Electronics Engineers]
Date: 2019-01-03
Volume/Issue: 68 (3): 2124-2136
Citations: 293
Identifier
DOI: 10.1109/tvt.2018.2890773
Abstract
In this paper, we propose a deep reinforcement learning (DRL)-based method that allows unmanned aerial vehicles (UAVs) to execute navigation tasks in large-scale complex environments. This technique is important for many applications such as goods delivery and remote surveillance. The problem is formulated as a partially observable Markov decision process (POMDP) and solved by a novel online DRL algorithm designed based on two strictly proved policy gradient theorems within the actor-critic framework. In contrast to conventional simultaneous localization and mapping-based or sensing and avoidance-based approaches, our method directly maps UAVs' raw sensory measurements into control signals for navigation. Experimental results demonstrate that our method can enable UAVs to autonomously perform navigation in a virtual large-scale complex environment and can be generalized to more complex, larger-scale, and three-dimensional environments. Moreover, the proposed online DRL algorithm addressing POMDPs outperforms the state-of-the-art.
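The abstract describes a POMDP formulation solved by an actor-critic policy gradient method that maps raw sensor measurements directly to control signals. The paper itself does not provide code; the following is a minimal, hedged sketch in Python/PyTorch of that general pattern, not the authors' algorithm: the recurrent memory (a GRU cell to cope with partial observability), the Gaussian policy head, the network sizes, and the simple one-step advantage actor-critic loss (RecurrentActorCritic, a2c_loss, obs_dim=24, act_dim=2) are all illustrative assumptions.

# Sketch only: a recurrent actor-critic mapping raw observations to continuous
# control under partial observability. Architecture and loss are assumptions,
# not the method proposed in the paper.
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.gru = nn.GRUCell(hidden, hidden)      # memory over past observations (POMDP)
        self.mu = nn.Linear(hidden, act_dim)       # actor: mean of a Gaussian policy
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.value = nn.Linear(hidden, 1)          # critic: value estimate as baseline

    def forward(self, obs, h):
        x = self.encoder(obs)
        h = self.gru(x, h)
        return self.mu(h), self.log_std.exp(), self.value(h), h

def a2c_loss(model, obs_seq, act_seq, ret_seq):
    # One-trajectory advantage actor-critic loss.
    # obs_seq: (T, obs_dim), act_seq: (T, act_dim), ret_seq: (T,) discounted returns.
    h = torch.zeros(1, model.gru.hidden_size)
    policy_loss, value_loss = 0.0, 0.0
    for t in range(obs_seq.shape[0]):
        mu, std, v, h = model(obs_seq[t:t + 1], h)
        dist = torch.distributions.Normal(mu, std)
        logp = dist.log_prob(act_seq[t:t + 1]).sum()
        advantage = ret_seq[t] - v.squeeze()
        policy_loss += -logp * advantage.detach()  # policy gradient with a learned baseline
        value_loss += advantage.pow(2)             # critic regression toward the return
    return policy_loss + 0.5 * value_loss

if __name__ == "__main__":
    # Dummy usage: 24 hypothetical range readings mapped to 2 control outputs.
    model = RecurrentActorCritic(obs_dim=24, act_dim=2)
    obs, act, ret = torch.randn(10, 24), torch.randn(10, 2), torch.randn(10)
    loss = a2c_loss(model, obs, act, ret)
    loss.backward()

The recurrent hidden state stands in for the belief state of the POMDP, which is why a memoryless feed-forward policy would be a poor fit for the setting the abstract describes.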