Reinforcement learning
Computer science
Trajectory
Wireless
Trajectory optimization
Energy (signal processing)
Real-time computing
Artificial intelligence
Telecommunications
Statistics
Physics
Mathematics
Astronomy
Authors
Fuhong Song,Mingsen Deng,Huanlai Xing,Yanping Liu,Fei Ye,Zhiwen Xiao
Identifier
DOI:10.1109/tmc.2024.3384405
Abstract
This paper investigates the problem of energy-efficient trajectory optimization with wireless charging (ETWC) in an unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system. A UAV is dispatched to collect computation tasks from specific ground smart devices (GSDs) within its coverage while transmitting energy to the other GSDs. In addition, a high-altitude platform with a laser beam is deployed in the stratosphere to charge the UAV, so as to sustain its flight mission. The ETWC problem is formulated as a multi-objective optimization that aims to maximize both the energy efficiency of the UAV and the number of tasks collected by optimizing the UAV's flight trajectories. The conflict between the two objectives makes the problem quite challenging. Recently, some single-objective reinforcement learning (SORL) algorithms have been introduced to address this problem. Nevertheless, these SORLs adopt linear scalarization to define the user utility, thereby ignoring the conflict between objectives. Furthermore, in dynamic MEC scenarios, the relative importance assigned to each objective may vary over time, posing significant challenges for conventional SORLs. To address this challenge, we first build a multi-objective Markov decision process with a vectorial reward mechanism, where each component of the reward corresponds to one of the two objectives. Then, we propose a new trace-based experience replay scheme to improve sample efficiency and reduce replay-buffer bias, resulting in a modified multi-objective reinforcement learning algorithm. The experimental results validate that the proposed algorithm achieves better adaptability to dynamic preferences and a more favorable balance between objectives compared with several baseline algorithms.
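The distinction the abstract draws between a vectorial reward and linear scalarization can be sketched as follows. This is a minimal illustration, not the paper's implementation: the reward components (energy efficiency, tasks collected), their values, and the preference weights are all hypothetical, and it only shows how a vector-valued reward keeps the two objectives separate while a (possibly time-varying) preference weight collapses them into a single scalar, as conventional SORLs do.

```python
import numpy as np

def vector_reward(energy_efficiency: float, tasks_collected: float) -> np.ndarray:
    """Vectorial reward: one component per objective, kept separate
    so a multi-objective RL agent can reason about their conflict."""
    return np.array([energy_efficiency, tasks_collected], dtype=float)

def scalarize(reward_vec: np.ndarray, preference) -> float:
    """Linear scalarization: a single scalar utility under a preference
    weight vector. In dynamic scenarios this preference may change over
    time, which is what makes fixed-scalarization SORLs brittle."""
    w = np.asarray(preference, dtype=float)
    return float(w @ reward_vec)

# Illustrative values only.
r = vector_reward(energy_efficiency=0.8, tasks_collected=3.0)
print(scalarize(r, [0.5, 0.5]))  # equal preference between objectives
print(scalarize(r, [0.9, 0.1]))  # preference shifted toward energy efficiency
```

Note how the same reward vector yields different utilities under different preferences; a multi-objective formulation defers this weighting instead of baking it into a single scalar reward.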