Keywords
Reinforcement learning
Computer science
Meta-learning
Task
Context
Artificial intelligence
Adaptation
Offline learning
Encoder
Machine learning
Online learning
Authors
Hui Wang, Zhigang Liu, Guiyang Hu, Xufan Wang, Zhiwei Han
Source
Journal: IEEE Transactions on Industrial Informatics [Institute of Electrical and Electronics Engineers]
Date: 2024-05-15
Volume/Issue: 20 (8): 10669-10679
Citations: 6
Identifier
DOI: 10.1109/tii.2024.3394554
Abstract
Previous reinforcement learning (RL) methods suffer significant performance degradation, or collapse outright, when deployed to the real world because of the large sim-to-real gap. This article proposes a hybrid offline-and-online meta-RL (HOMRL) algorithm that leverages prior task experience to learn and adapt to new pantograph active control tasks in real-world applications. The policy learning process consists of three phases: offline meta-policy pretraining, online adaptation, and fine-tuning. First, we construct an offline meta-RL approach that learns from massive, heterogeneous static training datasets, eliminating the high cost and hazards of online interaction. Second, we combine context-based meta-RL with online fine-tuning to generalize to challenging tasks where high safety and success rates are critical, as in railway applications. Finally, the proposed environment-sensitive task encoder (TE) and the well-trained agent can adapt to new tasks quickly and efficiently, even on unseen tasks and in nonstationary environments. If a new task is similar to the prior data, the contextual meta-learner adapts immediately; if it differs too much, the agent adapts gradually through fine-tuning.
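The abstract does not specify the TE architecture or the criterion used to decide between immediate contextual adaptation and fine-tuning, so the following is a minimal PyTorch sketch of the general idea behind a context-based task encoder with a similar-vs-different adaptation decision. The class names, the mean-pooled MLP encoder, and the cosine-similarity threshold are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskEncoder(nn.Module):
    """Context encoder: maps a set of observed transitions
    (state, action, reward, next_state) to a single task latent z."""
    def __init__(self, transition_dim: int, latent_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (N, transition_dim); mean-pool over the N transitions
        # so the latent is permutation-invariant in the context set.
        return self.net(context).mean(dim=0)


def adapt_or_finetune(encoder: TaskEncoder,
                      context: torch.Tensor,
                      prior_latents: torch.Tensor,
                      sim_threshold: float = 0.8):
    """Hypothetical adaptation rule: if the inferred latent is close to a
    latent seen during offline pretraining, contextual adaptation suffices;
    otherwise the agent falls back to online fine-tuning."""
    z = encoder(context)
    sims = F.cosine_similarity(z.unsqueeze(0), prior_latents)  # shape (K,)
    needs_finetuning = bool(sims.max() < sim_threshold)
    return z, needs_finetuning


if __name__ == "__main__":
    enc = TaskEncoder(transition_dim=10)
    context = torch.randn(32, 10)   # 32 transitions collected on the new task
    prior = torch.randn(5, 8)       # latents of 5 tasks from offline pretraining
    z, finetune = adapt_or_finetune(enc, context, prior)
    print("task latent:", z.shape, "fine-tune:", finetune)
```

Under these assumptions, the threshold acts as the "similar to the prior data" test described in the abstract: an in-distribution task is handled by conditioning the policy on z alone, while an out-of-distribution task triggers the gradual fine-tuning phase.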