Reinforcement learning
Computer science
Asynchronous communication
Convergence (economics)
State space
Stability (learning theory)
Controller (irrigation)
Dimension (graph theory)
Trajectory
State (computer science)
Artificial intelligence
Action (physics)
Distributed computing
Machine learning
Computer network
Algorithm
Mathematics
Biology
Quantum mechanics
Statistics
Economic growth
Physics
Economics
Pure mathematics
Agronomy
Astronomy
Authors
Ran Zhang, Miao Wang, Lin X. Cai, Xuemin Shen
Identifier
DOI: 10.1109/twc.2021.3058533
Abstract
Multi-Unmanned Aerial Vehicle (UAV) control is one of the major research interests in UAV-based networks. Yet few existing works focus on how the network should optimally react when the UAV lineup and user distribution change. In this work, proactive self-regulation (PSR) of UAV-based networks is investigated for the case where one or more UAVs are about to quit or join the network, while considering dynamic user distribution. We target an optimal UAV trajectory control policy that proactively relocates the UAVs whenever the UAV lineup is about to change, rather than passively dispatching the UAVs after the change. Specifically, a deep reinforcement learning (DRL)-based self-regulation approach is developed to maximize the accumulated user satisfaction (US) score over a period within which at least one UAV will quit or join the network. To handle the changed dimension of the state-action space before and after the lineup change, the state transition is deliberately designed. To accommodate continuous state and action spaces, an actor-critic DRL algorithm, deep deterministic policy gradient (DDPG), is applied for better convergence stability. To effectively promote learning exploration around the time of the lineup change, an asynchronous parallel computing (APC) learning structure is proposed. Referred to as PSR-APC, the developed approach is then extended to the case of dynamic user distribution by incorporating time as one of the agent states. Finally, numerical results are presented to demonstrate the convergence of PSR-APC, its superiority over a passive reaction method, and its capability to jointly handle the dynamics of both the UAV lineup and the user distribution.
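For reference, the sketch below illustrates the kind of DDPG update the abstract invokes: a deterministic actor for continuous actions, a critic regressed toward a bootstrapped target, and Polyak-averaged target networks for convergence stability. This is a minimal generic sketch, not the paper's implementation; PyTorch, the network sizes, and the `state_dim`/`action_dim` placeholders are assumptions, and the paper's actual state (UAV positions, user distribution, time), the US-score reward, and the APC learning structure are not modeled here.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a continuous state to a continuous action."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q(s, a) estimator over the joint state-action input."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

def ddpg_update(actor, actor_t, critic, critic_t, opt_a, opt_c, batch,
                gamma=0.99, tau=0.005):
    # batch tensors are assumed shaped (B, state_dim), (B, action_dim),
    # (B, 1), (B, state_dim), (B, 1).
    state, action, reward, next_state, done = batch

    # Critic step: regress Q(s, a) toward the bootstrapped target
    # computed with the (frozen) target actor and target critic.
    with torch.no_grad():
        target_q = reward + gamma * (1 - done) * critic_t(next_state,
                                                          actor_t(next_state))
    critic_loss = nn.functional.mse_loss(critic(state, action), target_q)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor step: ascend the critic's value of the actor's own actions.
    actor_loss = -critic(state, actor(state)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Polyak-average the target networks (the "soft update" that gives
    # DDPG its convergence stability).
    for t, s in zip(list(actor_t.parameters()) + list(critic_t.parameters()),
                    list(actor.parameters()) + list(critic.parameters())):
        t.data.mul_(1 - tau).add_(tau * s.data)

if __name__ == "__main__":
    S, A = 12, 4  # hypothetical state/action dimensions
    actor, critic = Actor(S, A), Critic(S, A)
    actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
    opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
    opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
    batch = (torch.randn(32, S), torch.randn(32, A), torch.randn(32, 1),
             torch.randn(32, S), torch.zeros(32, 1))
    ddpg_update(actor, actor_t, critic, critic_t, opt_a, opt_c, batch)
```

One detail worth noting against the abstract: because DDPG assumes a fixed-dimension state-action space, handling a changing UAV lineup requires the kind of deliberate state-transition design the authors describe; the fixed `S` and `A` above sidestep that problem entirely.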