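Mathematical optimization
Reinforcement learning
Markov decision process
Computer science
Curse of dimensionality
Scheduling (production processes)
Optimization problem
Heuristic
Lagrangian relaxation
Markov process
Mathematics
Artificial intelligence
Statistics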
Authors
Zoubeir Mlika, Soumaya Cherkaoui
Source
Journal: IEEE Transactions on Intelligent Transportation Systems
[Institute of Electrical and Electronics Engineers]
Date: 2022-07-25
Volume (issue): 23 (12): 23597-23612
Citations: 13
Identifier
DOI: 10.1109/tits.2022.3190799
Abstract
This paper studies the problem of minimizing the age of information (AoI) in cellular vehicle-to-everything communications. To provide minimal AoI and high reliability for vehicles' safety information, non-orthogonal multiple access (NOMA) is exploited. We reformulate a resource allocation problem that involves half-duplex transceiver selection, broadcast coverage optimization, power allocation, and resource block scheduling. First, to obtain the optimal solution, we formulate the problem as a mixed-integer nonlinear programming problem and then study its NP-hardness. The NP-hardness result motivates us to design simple solutions. Consequently, we model the problem as a single-agent Markov decision process so that it can be solved efficiently using fingerprint deep reinforcement learning (DRL) techniques such as deep Q-network (DQN) methods. Nevertheless, applying DQN is not straightforward due to the curse of dimensionality implied by the large, mixed action space, which contains both discrete and continuous optimization decisions. Therefore, to solve this mixed discrete/continuous problem efficiently, simply, and elegantly, we propose a decomposition technique that first solves the discrete subproblem using a matching algorithm based on state-of-the-art stable roommate matching and then solves the continuous subproblem using a DRL algorithm based on deep deterministic policy gradient (DDPG). We validate our proposed method through Monte Carlo simulations, in which we show that the decomposed matching and DRL algorithm successfully minimizes the AoI and achieves almost 66% performance gain compared to the best benchmarks for various vehicle speeds, transmission powers, and packet sizes. Further, we prove the existence of an optimal value of broadcast coverage at which the learning algorithm provides the optimal AoI.
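As a rough illustration of the quantity being minimized, the sketch below tracks AoI over discrete time slots. It assumes a common textbook model that the abstract does not specify: the age grows by one slot whenever no fresh update is delivered and resets to one slot on a successful delivery. The function names `aoi_trajectory` and `average_aoi` are hypothetical and are not taken from the paper.

```python
def aoi_trajectory(deliveries, initial_age=1):
    """Return the AoI after each slot.

    deliveries: iterable of booleans, True if a fresh update was
    successfully received in that slot (assumed model, not the paper's).
    """
    age = initial_age
    ages = []
    for delivered in deliveries:
        # Assumption: a delivery resets the age to 1 slot,
        # otherwise the information gets one slot older.
        age = 1 if delivered else age + 1
        ages.append(age)
    return ages


def average_aoi(deliveries, initial_age=1):
    """Time-averaged AoI over the observed slots."""
    ages = aoi_trajectory(deliveries, initial_age)
    return sum(ages) / len(ages)


if __name__ == "__main__":
    pattern = [False, False, True, False]
    print(aoi_trajectory(pattern))
    print(average_aoi(pattern))
```

Under this assumed model, more frequent successful deliveries lower the time-averaged AoI, which is the intuition behind jointly choosing coverage, power, and resource blocks in the abstract above.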