Reinforcement learning
Computer science
Quality of service
Latency (audio)
Quality of experience
Resource allocation
Radio access network
Cellular network
Resource management (computing)
Convergence (economics)
Computer network
Distributed computing
Artificial intelligence
Base station
Telecommunications
Economics
Mobile station
Economic growth
Authors
Nessrine Hammami,Kim Khoa Nguyen
Identifier
DOI:10.1109/wcnc51071.2022.9771605
Abstract
Recently, Deep Reinforcement Learning (DRL) has increasingly been used to solve complex problems in mobile networks. There are two main types of DRL models: off-policy and on-policy. Both have been shown to have advantages. While off-policy models can improve sample efficiency, on-policy models are generally easier to implement and have more stable performance. It can therefore be hard to choose the appropriate model for a given scenario. In this paper, we compare an on-policy model, Proximal Policy Optimization (PPO), with an off-policy model, Sample Efficient Actor-Critic with Experience Replay (ACER), in solving a resource allocation problem for an application with stringent Quality of Service (QoS) requirements. Results show that for an Open Radio Access Network (O-RAN) with latency-sensitive and latency-tolerant users, both DRL models outperform a greedy algorithm. We also point out that the on-policy model can guarantee a good trade-off between energy consumption and users' latency, while the off-policy model provides faster convergence.
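The sketch below illustrates the kind of comparison described in the abstract: an on-policy DRL agent (PPO) versus a greedy heuristic on a toy resource-allocation task with latency-sensitive and latency-tolerant users. The environment (ToyRANAllocEnv), the reward shape, and all parameters are illustrative assumptions only; they are not the paper's O-RAN formulation, and ACER is omitted because it is not available in the assumed library (stable-baselines3).

```python
# Minimal sketch, assuming gymnasium and (optionally) stable-baselines3.
# Everything here is a toy stand-in for the paper's resource-allocation problem.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyRANAllocEnv(gym.Env):
    """Toy environment: each step, split a fixed power budget between two
    user classes. The reward trades off energy use against queue backlogs,
    penalizing the latency-sensitive class more heavily."""

    def __init__(self, budget=10.0, horizon=50):
        super().__init__()
        self.budget = budget
        self.horizon = horizon
        # Observation: backlogs of [latency-sensitive, latency-tolerant] traffic.
        self.observation_space = spaces.Box(0.0, np.inf, shape=(2,), dtype=np.float32)
        # Action: fraction of the power budget given to latency-sensitive users.
        self.action_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.queues = np.array([1.0, 1.0], dtype=np.float32)
        return self.queues.copy(), {}

    def step(self, action):
        frac = float(np.clip(action[0], 0.0, 1.0))
        power = np.array([frac, 1.0 - frac]) * self.budget
        served = np.sqrt(power)                      # diminishing returns on power
        arrivals = self.np_random.uniform(0.5, 1.5, size=2)
        self.queues = np.maximum(self.queues + arrivals - served, 0.0).astype(np.float32)
        # Energy penalty plus weighted backlog penalties (assumed weights).
        reward = -0.05 * power.sum() - 2.0 * self.queues[0] - 0.5 * self.queues[1]
        self.t += 1
        return self.queues.copy(), float(reward), self.t >= self.horizon, False, {}


def greedy_episode(env, seed=0):
    """Greedy baseline: give most of the budget to whichever queue is larger."""
    obs, _ = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        frac = 0.9 if obs[0] >= obs[1] else 0.1
        obs, r, done, _, _ = env.step(np.array([frac], dtype=np.float32))
        total += r
    return total


if __name__ == "__main__":
    env = ToyRANAllocEnv()
    print("greedy return:", greedy_episode(env))

    try:
        from stable_baselines3 import PPO
        model = PPO("MlpPolicy", ToyRANAllocEnv(), verbose=0)
        model.learn(total_timesteps=20_000)
        obs, _ = env.reset(seed=0)
        total, done = 0.0, False
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, r, done, _, _ = env.step(action)
            total += r
        print("PPO return:", total)
    except ImportError:
        print("stable-baselines3 not installed; skipping PPO comparison")
```

Running the script prints the episode return of the greedy heuristic and, if stable-baselines3 is installed, of a briefly trained PPO agent; any performance gap in this toy setting says nothing about the paper's reported results.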