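Reinforcement learning
Computer science
Dissemination
Markov decision process
Covert
Schedule
Benchmark (surveying)
Real-time computing
Context (archaeology)
Partially observable Markov decision process
Markov process
Trajectory
Markov chain
Markov model
Artificial intelligence
Machine learning
Telecommunications
Geography
Statistics
Paleontology
Philosophy
Physics
Operating system
Astronomy
Biology
Linguistics
Mathematics
Geodesy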
Authors
Jinsong Hu,Mingqian Guo,Riqing Chen,Youjia Chen,Feng Shu,Zhizhang Chen
Identifier
DOI:10.1109/iccc55456.2022.9880634
Abstract
This paper considers covert communications in unmanned aerial vehicle (UAV) networks, where the UAV serves as a transmitter that covertly disseminates data to a group of legitimate receivers on the ground while ensuring that the dissemination is not detected by the wardens. Given the UAV's limited endurance time, our goal is to minimize the UAV's mission completion time by jointly optimizing the UAV's trajectory and the ground receivers' schedule. Because the considered environment is dynamic, the optimization problem is first modeled as a Markov decision process. Taking advantage of the ability of deep reinforcement learning (DRL) to learn dynamically from the environment, we propose a twin-delayed deep deterministic policy gradient (TD3) aided covert data dissemination (TD3-CDD) algorithm. In particular, we develop an advanced reward design mechanism to ensure that the constraints on the UAV are effectively enforced. Our evaluation shows that the TD3-CDD algorithm enables the UAV to complete covert data dissemination in a shorter time than a benchmark scheme.
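For readers unfamiliar with TD3, the sketch below illustrates the two generic ingredients that distinguish it from plain DDPG: target policy smoothing and the clipped double-Q target. It is a minimal NumPy illustration of the standard TD3 update target only, not the authors' TD3-CDD implementation. The function names, the noise parameters, and the action bounds of [-1, 1] are assumptions made for illustration; the abstract does not specify the paper's state, action, or reward definitions.

```python
import numpy as np

def smoothed_target_action(mu_next, rng, sigma=0.2, noise_clip=0.5,
                           a_low=-1.0, a_high=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip to the (assumed) action bounds."""
    noise = rng.normal(0.0, sigma, size=mu_next.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(mu_next + noise, a_low, a_high)

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2').
    Taking the minimum of the two target critics curbs overestimation."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

# Hypothetical batch of two transitions (values chosen only for illustration).
rng = np.random.default_rng(0)
mu_next = np.array([[0.9, -0.3], [0.0, 0.5]])   # target actor outputs
a_next = smoothed_target_action(mu_next, rng)    # would be fed to both target critics
y = td3_target(reward=np.array([1.0, 1.0]),
               done=np.array([0.0, 1.0]),
               q1_next=np.array([2.0, 5.0]),
               q2_next=np.array([3.0, 4.0]),
               gamma=0.5)
```

In a full TD3 agent, both critics would regress toward `y`, and the actor would be updated less frequently than the critics (the "delayed" part of the name); those steps are omitted here.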