Reinforcement learning
Computer science
Scalability
Network topology
Provisioning
Greedy algorithm
Convolutional neural network
Artificial intelligence
Computer network
Algorithm
Database
Authors
Junior Momo Ziazet,Brigitte Jaumard
Identifier
DOI:10.1109/icc45855.2022.9839228
Abstract
We design an effective and scalable Deep Reinforcement Learning (DRL) approach for the Routing, Modulation and Spectrum Assignment (RMSA) problem in elastic optical networks. We use Convolutional Neural Networks (CNN) to embed the state and Deep Neural Networks (DNN) to learn the policy. We propose a novel state representation and reward function that guide the agent toward appropriate routes and spectrum assignments by incorporating information on spectrum utilisation and spectrum fragmentation. This gives the agent information about the consequence, or cost, of each action across the network, reducing the level of knowledge abstraction required of the agent. To show the effectiveness of the reward function and the importance of well-designed state representations, we design two state representations: the first aggregates spectrum occupancy information and the second does not. We investigate the Proximal Policy Optimization (PPO) algorithm with an actor-critic model in which an entropy bonus is added to the loss function to ensure sufficient exploration. The proposed solution is compared with a greedy heuristic and with a PPO agent that uses a standard reward and state representation. Numerical results show that the proposed model provides very good solutions and works well on dataset instances with large topologies (up to 75 nodes). The proposed PPO outperforms the baseline algorithms, obtaining the largest throughput on all test instances. In addition, its spectrum usage has the lowest fragmentation.
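The abstract mentions a PPO loss with an added entropy bonus but does not give its form. The sketch below shows the textbook clipped PPO objective with a value term and an entropy bonus, written in plain Python for readability. It is only an illustration of the general technique: the function names `entropy` and `ppo_loss`, the coefficient values (`clip_eps`, `value_coef`, `entropy_coef`), and the sample inputs are assumptions, not details taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_loss(new_probs, old_probs, actions, advantages, values, returns,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Mean PPO loss over a batch: clipped surrogate + value loss - entropy bonus.

    new_probs / old_probs: per-sample action distributions under the current
    and the data-collecting policy. All coefficients are hypothetical defaults.
    """
    n = len(actions)
    policy_term = 0.0
    value_term = 0.0
    entropy_term = 0.0
    for i in range(n):
        a = actions[i]
        # Probability ratio between the current and the old policy.
        ratio = new_probs[i][a] / old_probs[i][a]
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (minimum) of the unclipped and clipped surrogate.
        policy_term += min(ratio * advantages[i], clipped * advantages[i])
        # Squared error of the critic's value estimate.
        value_term += (values[i] - returns[i]) ** 2
        # Higher entropy means a more exploratory policy.
        entropy_term += entropy(new_probs[i])
    # The entropy bonus is subtracted, so minimising the loss rewards exploration.
    return (-policy_term + value_coef * value_term
            - entropy_coef * entropy_term) / n


if __name__ == "__main__":
    uniform = [[0.5, 0.5]]
    print(ppo_loss(uniform, uniform, [0], [1.0], [0.0], [0.0]))
```

Because the entropy term enters with a negative sign, a larger `entropy_coef` lowers the loss of high-entropy policies, which is how the bonus discourages premature convergence to a deterministic policy.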