Deep Deterministic Policy Gradient to Minimize the Age of Information in Cellular V2X Communications

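Mathematical optimization, Reinforcement learning, Markov decision process, Computer science, Curse of dimensionality, Scheduling (production processes), Optimization problem, Heuristic, Lagrangian relaxation, Markov process, Mathematics, Artificial intelligence, Statistics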

Authors

Zoubeir Mlika, Soumaya Cherkaoui

Source

Journal: IEEE Transactions on Intelligent Transportation Systems [Institute of Electrical and Electronics Engineers]
Date: 2022-07-25 Volume (Issue): 23 (12): 23597-23612 Citations: 13

Identifier

DOI: 10.1109/tits.2022.3190799

Abstract

This paper studies the problem of minimizing the age of information (AoI) in cellular vehicle-to-everything communications. To provide minimal AoI and high reliability for vehicles' safety information, non-orthogonal multiple access (NOMA) is exploited. We reformulate a resource allocation problem that involves half-duplex transceiver selection, broadcast coverage optimization, power allocation, and resource block scheduling. First, to obtain the optimal solution, we formulate the problem as a mixed-integer nonlinear programming problem and then study its NP-hardness. The NP-hardness result motivates us to design simple solutions. Consequently, we model the problem as a single-agent Markov decision process so that it can be solved efficiently using fingerprint deep reinforcement learning (DRL) techniques such as deep Q-network (DQN) methods. Nevertheless, applying DQN is not straightforward because of the curse of dimensionality implied by the large, mixed action space, which contains both discrete and continuous optimization decisions. Therefore, to solve this mixed discrete/continuous problem efficiently, simply, and elegantly, we propose a decomposition technique that first solves the discrete subproblem using a matching algorithm based on state-of-the-art stable roommate matching and then solves the continuous subproblem using a DRL algorithm based on deep deterministic policy gradient (DDPG). We validate the proposed method through Monte Carlo simulations, which show that the decomposed matching and DRL algorithm successfully minimizes the AoI and achieves almost 66% performance gain compared to the best benchmarks for various vehicle speeds, transmission powers, and packet sizes. Further, we prove the existence of an optimal value of broadcast coverage at which the learning algorithm provides the optimal AoI.
