Reinforcement learning
Computer science
Artificial intelligence
Leverage (statistics)
Motion (physics)
Consistency (knowledge base)
Computer vision
Human-computer interaction
Authors
Yangru Huang, Peixi Peng, Yifan Zhao, Yunpeng Zhai, Haoran Xu, Yonghong Tian
Identifier
DOI:10.1109/iccv51070.2023.00023
Abstract
Efficient motion and appearance modeling are critical for vision-based Reinforcement Learning (RL). However, existing methods struggle to reconcile motion and appearance information within the state representations learned from a single observation encoder. To address this problem, we present Synergizing Interactive Motion-appearance Understanding (Simoun), a unified framework for vision-based RL. Given consecutive observation frames, Simoun deliberately and interactively learns both motion and appearance features through a dual-path network architecture. The learning process collaborates with a structural interactive module, which explores the latent motion-appearance structures from the two network paths to leverage their complementarity. To promote sample efficiency, we further design a consistency-guided curiosity module to encourage the exploration of under-learned observations. During training, the curiosity module provides intrinsic rewards according to the consistency of environmental temporal dynamics, which are deduced from both the motion and appearance network paths. Experiments conducted on the DeepMind Control Suite and CARLA autonomous driving benchmarks demonstrate the effectiveness of Simoun, where it performs favorably against state-of-the-art methods.
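The abstract describes two ideas: a dual-path encoder that separates motion from appearance, and a curiosity bonus derived from the consistency of dynamics predicted by the two paths. The sketch below is a minimal, hypothetical illustration of that structure, not the authors' implementation; all class names (DualPathEncoder, ConsistencyCuriosity), the frame-difference motion cue, the simple linear fusion standing in for the structural interactive module, and all dimensions are assumptions made for the example.

```python
# Hypothetical sketch of a dual-path (motion/appearance) encoder with a
# consistency-based intrinsic reward; not the Simoun reference code.
import torch
import torch.nn as nn


class DualPathEncoder(nn.Module):
    """Appearance path encodes the current frame; motion path encodes a frame difference."""

    def __init__(self, in_channels: int = 3, feat_dim: int = 128):
        super().__init__()

        def conv_stack(c_in: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(c_in, 32, 3, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )

        self.appearance = conv_stack(in_channels)      # appearance path
        self.motion = conv_stack(in_channels)          # motion path
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)  # stand-in for the interactive module

    def forward(self, obs_t: torch.Tensor, obs_tp1: torch.Tensor):
        app = self.appearance(obs_tp1)
        mot = self.motion(obs_tp1 - obs_t)  # crude motion cue: temporal difference of frames
        state = self.fuse(torch.cat([app, mot], dim=-1))
        return state, app, mot


class ConsistencyCuriosity(nn.Module):
    """Intrinsic reward from disagreement between motion- and appearance-based
    predictions of the next latent state (illustrative stand-in only)."""

    def __init__(self, feat_dim: int = 128, act_dim: int = 4):
        super().__init__()
        self.pred_app = nn.Linear(feat_dim + act_dim, feat_dim)
        self.pred_mot = nn.Linear(feat_dim + act_dim, feat_dim)

    def forward(self, app: torch.Tensor, mot: torch.Tensor, action: torch.Tensor):
        z_app = self.pred_app(torch.cat([app, action], dim=-1))
        z_mot = self.pred_mot(torch.cat([mot, action], dim=-1))
        # Larger disagreement => less consistent dynamics => higher exploration bonus.
        return (z_app - z_mot).pow(2).mean(dim=-1)


if __name__ == "__main__":
    enc, cur = DualPathEncoder(), ConsistencyCuriosity()
    obs_t, obs_tp1 = torch.randn(2, 3, 84, 84), torch.randn(2, 3, 84, 84)
    action = torch.randn(2, 4)
    state, app, mot = enc(obs_t, obs_tp1)
    r_int = cur(app, mot, action)
    print(state.shape, r_int.shape)  # torch.Size([2, 128]) torch.Size([2])
```

In practice the intrinsic reward would be added, with a small weight, to the environment reward used by the RL objective, so that under-learned observations (where the two paths disagree) are visited more often.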