Pantograph
Reinforcement learning
Catenary
Computer science
Maximization
Train
Power (physics)
Simulation
Engineering
Artificial intelligence
Mathematical optimization
Cartography
Quantum mechanics
Structural engineering
Mechanical engineering
Mathematics
Physics
Geography
Authors
Hui Wang, Zhiwei Han, Wenqiang Liu, Yan-Bo Wu
Identifier
DOI: 10.1109/tnnls.2022.3219814
Abstract
In high-speed railways, the pantograph-catenary system (PCS) is a critical subsystem of the train power supply system. In particular, when the double-PCS (DPCS) is in operation, the passing of the leading pantograph (LP) causes the contact force of the trailing pantograph (TP) to fluctuate violently, degrading the power collection quality of the electric multiple units (EMUs). The actively controlled pantograph is the most promising technique for reducing the pantograph-catenary contact force (PCCF) fluctuation and improving the current collection quality. Based on the Nash equilibrium framework, this study proposes a multiagent reinforcement learning (MARL) algorithm for active pantograph control called cooperative proximal policy optimization (Coo-PPO). In the algorithm implementation, the heterogeneous agents play unique roles in a cooperative environment guided by a global value function. Then, a novel reward propagation channel is proposed to reveal implicit associations between agents. Furthermore, a curriculum learning approach is adopted to strike a balance between reward maximization and rational movement patterns. An existing MARL algorithm and a traditional control strategy are compared in the same scenario to validate the performance of the proposed control strategy. The experimental results show that the Coo-PPO algorithm obtains more rewards, significantly suppresses the fluctuation in PCCF (by up to 41.55%), and dramatically decreases the TP's offline rate (by up to 10.77%). This is the first study to adopt MARL technology to address the coordinated control of double pantographs in a DPCS.
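As background for the abstract above: Coo-PPO builds on the PPO family, whose core is a clipped surrogate objective, with both agents' updates guided by a shared global value function. The sketch below is a minimal, self-contained illustration of that clipped objective and of averaging it over two agents' samples; the function names and the two-agent averaging are illustrative assumptions, not the paper's exact formulation.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective for one (state, action) sample.

    ratio: pi_new(a|s) / pi_old(a|s) under the current vs. old policy
    advantage: advantage estimate (in a cooperative setting, this would
               come from a shared global value function)
    """
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Pessimistic bound: take the smaller of the two terms.
    return min(unclipped, clipped)


def cooperative_loss(samples):
    """Average the clipped objective over samples from both pantograph
    agents, so each agent's update follows the same global signal.
    Negated because optimizers minimize."""
    return -sum(clipped_surrogate(r, a) for r, a in samples) / len(samples)


# Hypothetical samples: (probability ratio, global advantage) for the
# leading- and trailing-pantograph agents.
samples = [(1.1, 2.0), (0.7, -1.0)]
loss = cooperative_loss(samples)  # -> -0.7
```

The clipping keeps each policy update close to the previous policy, which matters here because an overly aggressive pantograph actuation step could itself worsen contact-force fluctuation during training.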