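Reinforcement learning
Computer science
Markov decision process
Artificial intelligence
Construct (Python library)
Scale (ratio)
Process (computing)
Machine learning
Domain (mathematical analysis)
State space
Autonomous agent
State (computer science)
Scheme (mathematics)
Markov process
Algorithm
Statistics
Operating system
Quantum mechanics
Physics
Mathematical analysis
Programming language
Mathematics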
Authors
Chao Wang, Jian Wang, Jingjing Wang, Xudong Zhang
Source
Journal: IEEE Internet of Things Journal
[Institute of Electrical and Electronics Engineers]
Date: 2020-02-11
Volume (issue): 7 (7): 6180-6190
Citations: 126
Identifier
DOI:10.1109/jiot.2020.2973193
Abstract
Unmanned aerial vehicles (UAVs) have the potential to deliver Internet-of-Things (IoT) services from a great height, creating an airborne domain of the IoT. In this article, we address the problem of autonomous UAV navigation in large-scale complex environments by formulating it as a Markov decision process with sparse rewards, and we propose an algorithm named deep reinforcement learning (RL) with nonexpert helpers (LwH). In contrast to prior RL-based methods that put considerable effort into reward shaping, we adopt the sparse reward scheme, i.e., a UAV is rewarded if and only if it completes its navigation task. Using the sparse reward scheme ensures that the solution is not biased toward potentially suboptimal directions. However, the absence of intermediate rewards hinders efficient learning, since informative states are rarely encountered. To handle this challenge, we assume that a prior policy (nonexpert helper), possibly of poor performance, is available to the learning agent. The prior policy guides the agent in exploring the state space by reshaping the behavior policy used for environmental interaction. It also assists the agent in achieving goals by setting dynamic learning objectives of increasing difficulty. To evaluate our proposed method, we construct a simulator for UAV navigation in large-scale complex environments and compare our algorithm with several baselines. Experimental results demonstrate that LwH significantly outperforms state-of-the-art algorithms for handling sparse rewards and yields navigation policies comparable to those learned in an environment with dense rewards.
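The two ideas named in the abstract, a sparse reward and a behavior policy reshaped by a nonexpert helper, can be illustrated with a minimal Python sketch. The abstract does not say how LwH reshapes the behavior policy, so the probabilistic mixing below is an assumption made only for illustration, not the authors' method. The function names, the `tolerance` radius, and the `helper_prob` parameter are likewise hypothetical.

```python
import math
import random


def sparse_reward(position, goal, tolerance=1.0):
    """Return 1.0 only when the UAV is within `tolerance` of the goal, else 0.0.

    No intermediate (shaped) reward is given, matching the "if and only if
    it completes navigation tasks" scheme. `tolerance` is an assumed radius.
    """
    return 1.0 if math.dist(position, goal) <= tolerance else 0.0


def behavior_action(state, learned_policy, helper_policy, helper_prob, rng=random):
    """Choose the action used for environment interaction.

    Assumption: with probability `helper_prob` the nonexpert helper acts,
    otherwise the learned policy acts. The abstract does not specify the
    actual mixing rule used by LwH.
    """
    if rng.random() < helper_prob:
        return helper_policy(state)
    return learned_policy(state)


# Hypothetical toy policies: each maps a state to a discrete action label.
def toy_helper(state):
    return "toward_goal"


def toy_learner(state):
    return "explore"
```

Under this sketch, `behavior_action(s, toy_learner, toy_helper, helper_prob=1.0)` always defers to the helper and `helper_prob=0.0` always uses the learner; decaying `helper_prob` over training would be one plausible, but unconfirmed, way to hand control to the learned policy.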