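Computer science
Reinforcement learning
Overfitting
Generalization
Artificial intelligence
Discriminator
Human-computer interaction
Semantics (computer science)
Trajectory
Benchmark (surveying)
Language model
Machine learning
Artificial neural network
Mathematical analysis
Telecommunications
Physics
Mathematics
Geodesy
Astronomy
Detector
Programming language
Geography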
Authors
Jiawei Wang, Teng Wang, Lele Xu, Zichen He, Changyin Sun
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue/Pages: 1-13
Identifier
DOI: 10.1109/tnnls.2024.3398300
Abstract
Vision-and-language navigation requires an agent to navigate a photo-realistic environment by following natural language instructions. Mainstream methods employ imitation learning (IL) so that the agent imitates the behavior of a teacher. The trained model then overfits the teacher's biased behavior, which results in poor generalization. Recently, researchers have sought to combine IL and reinforcement learning (RL) to overcome overfitting and improve generalization. However, these methods still require expensive trajectory annotation. We propose a hierarchical RL-based method, discovering intrinsic subgoals via hierarchical (DISH) RL, which overcomes the generalization limitations of current methods and removes the need for expensive label annotations. First, the high-level agent (manager) decomposes the complex navigation problem into simple intrinsic subgoals. Then, the low-level agent (worker) uses an intrinsic subgoal-driven attention mechanism to predict actions in a smaller state space. We place no constraints on the semantics that subgoals may convey, allowing the agent to autonomously learn intrinsic, more generalizable subgoals from navigation tasks. Furthermore, we design a novel history-aware discriminator (HAD) for the worker. The discriminator incorporates historical information into subgoal discrimination and provides the worker with additional intrinsic rewards to alleviate reward sparsity. Without labeled actions, our method supervises the worker through self-supervision, using the subgoals generated by the manager. Comparison experiments on the Room-to-Room (R2R) dataset show that DISH can significantly outperform the baseline in accuracy and efficiency.
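The data flow described in the abstract (manager emits a subgoal, worker attends over candidate actions using that subgoal, a discriminator turns the history and subgoal into an intrinsic reward) can be illustrated with a toy sketch. This is not the authors' implementation: every function name, the vector dimensions, the dot-product attention, and the sigmoid score standing in for the history-aware discriminator are assumptions made for illustration, and nothing here is trained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def manager_subgoal(instruction_vec, rng):
    """Hypothetical manager: emits a latent subgoal vector.

    The subgoal is just a perturbed copy of the instruction embedding;
    it carries no predefined semantics.
    """
    return [v + rng.uniform(-0.1, 0.1) for v in instruction_vec]


def worker_action(subgoal, candidate_feats):
    """Hypothetical worker: subgoal-driven attention over candidate actions.

    Returns the index of the highest-weighted candidate and the weights.
    """
    weights = softmax([dot(subgoal, f) for f in candidate_feats])
    action = max(range(len(weights)), key=weights.__getitem__)
    return action, weights


def intrinsic_reward(history, subgoal):
    """Stand-in for a history-aware discriminator (assumed form).

    Scores how well the mean of the visited features matches the subgoal,
    squashed to (0, 1) so the worker gets a dense per-step signal.
    """
    if not history:
        return 0.5
    dim = len(subgoal)
    mean = [sum(h[i] for h in history) / len(history) for i in range(dim)]
    return 1.0 / (1.0 + math.exp(-dot(mean, subgoal)))


def run_episode(seed=0, steps=3):
    """Roll out a few manager/worker steps and collect intrinsic rewards."""
    rng = random.Random(seed)
    instruction_vec = [1.0, 0.0, -1.0, 0.5]  # made-up instruction embedding
    history, rewards = [], []
    for _ in range(steps):
        subgoal = manager_subgoal(instruction_vec, rng)
        candidates = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
        action, _ = worker_action(subgoal, candidates)
        history.append(candidates[action])
        rewards.append(intrinsic_reward(history, subgoal))
    return rewards
```

Calling `run_episode()` returns one intrinsic reward per step, each strictly between 0 and 1. In a real system the manager, worker, and discriminator would be learned networks; the sketch only shows how the three pieces pass information to one another.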