Reinforcement learning
Computer science
Convergence (economics)
Theory (learning stability)
Energy (signal processing)
State (computer science)
State space
Control (management)
Artificial intelligence
Simulation
Mathematics
Machine learning
Algorithm
Economic growth
Statistics
Economics
Authors
Ran Zhang, Miao Wang, Lin X. Cai
Identifier
DOI:10.1109/globecom42002.2020.9348219
Abstract
Energy-aware control for multiple unmanned aerial vehicles (UAVs) is one of the major research interests in UAV-based networking. Yet few existing works have focused on how the network should react around the time when the UAV lineup changes. In this work, we study proactive self-remedy of energy-constrained UAV networks when one or more UAVs are short of energy and about to quit for charging. We target an energy-aware optimal UAV control policy that proactively relocates the UAVs when any UAV is about to quit the network, rather than passively dispatching the remaining UAVs after the departure. Specifically, a deep reinforcement learning (DRL)-based self-remedy approach, named SREC-DRL, is proposed to maximize the accumulated user satisfaction scores over a period within which at least one UAV will quit the network. To handle the continuous state and action spaces of the problem, a state-of-the-art actor-critic DRL algorithm, deep deterministic policy gradient (DDPG), is applied for better convergence stability. Numerical results demonstrate that, compared with the passive reaction method, the proposed SREC-DRL approach achieves a 12.12% gain in accumulated user satisfaction score during the remedy period.
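The abstract credits DDPG's handling of continuous state and action spaces for the approach's convergence stability. The two mechanisms behind that claim are a deterministic actor that maps a continuous state directly to a bounded continuous action, and Polyak (soft) updates of target parameters. The sketch below illustrates both with NumPy; the dimensions, the linear actor, and the UAV interpretation of state/action are illustrative assumptions, not the paper's actual network.

```python
import numpy as np

# Hypothetical dimensions: 2-D positions of two UAVs as the state,
# 2-D relocation velocity commands as the action.
STATE_DIM = 4
ACTION_DIM = 4
TAU = 0.01  # soft-update rate for the target network

rng = np.random.default_rng(0)

# A linear map stands in for the actor network in this sketch.
W_actor = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
W_target = W_actor.copy()

def act(state, W):
    """Deterministic policy: continuous action = tanh(W @ state), bounded in [-1, 1]."""
    return np.tanh(W @ state)

def soft_update(W_target, W_online, tau=TAU):
    """DDPG-style Polyak averaging: slowly track the online parameters."""
    return (1.0 - tau) * W_target + tau * W_online

state = rng.normal(size=STATE_DIM)
action = act(state, W_actor)
assert action.shape == (ACTION_DIM,)
assert np.all(np.abs(action) <= 1.0)  # bounded continuous action

# After a (hypothetical) policy-gradient step on the online actor...
W_actor = W_actor + 0.05 * rng.normal(size=W_actor.shape)
# ...the target network moves only a fraction tau toward it,
# which is the stabilizing mechanism the abstract alludes to.
W_target = soft_update(W_target, W_actor)
```

In full DDPG a critic network estimates Q(s, a) and the actor is updated along the critic's action gradient; the soft target update shown here is applied to both networks.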