Discovering Intrinsic Subgoals for Vision-and-Language Navigation via Hierarchical Reinforcement Learning

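Keywords: Computer science; Reinforcement learning; Overfitting; Generalization; Artificial intelligence; Discriminator; Human-computer interaction; Semantics (computer science); Trajectory; Benchmark (surveying); Language model; Machine learning; Artificial neural network; Mathematical analysis; Telecommunications; Physics; Mathematics; Geodesy; Astronomy; Detector; Programming language; Geography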

Authors

Jiawei Wang, Teng Wang, Lele Xu, Zichen He, Changyin Sun

Source

Journal: IEEE Transactions on Neural Networks and Learning Systems [Institute of Electrical and Electronics Engineers]
Date: 2024-01-01; Volume/issue: not given; Pages: 1-13

Links

nih.gov, doi.org

Identifier

DOI: 10.1109/tnnls.2024.3398300

Abstract

Vision-and-language navigation requires an agent to navigate a photo-realistic environment by following natural language instructions. Mainstream methods use imitation learning (IL) so that the agent imitates a teacher's behavior, but the trained model overfits the teacher's biased behavior and generalizes poorly. Recently, researchers have combined IL with reinforcement learning (RL) to reduce overfitting and improve generalization; however, these methods still depend on expensive trajectory annotation. We propose a hierarchical RL-based method, discovering intrinsic subgoals via hierarchical RL (DISH), which overcomes the generalization limitations of current methods and removes the need for expensive label annotation. First, the high-level agent (manager) decomposes the complex navigation problem into simple intrinsic subgoals. Then, the low-level agent (worker) uses an intrinsic subgoal-driven attention mechanism to predict actions in a smaller state space. We place no constraints on the semantics that subgoals may convey, allowing the agent to autonomously learn intrinsic, more generalizable subgoals from navigation tasks. Furthermore, we design a novel history-aware discriminator (HAD) for the worker. The discriminator incorporates historical information into subgoal discrimination and provides the worker with additional intrinsic rewards to alleviate reward sparsity. Without labeled actions, the subgoals generated by the manager supervise the worker in a self-supervised manner. Comparison experiments on the Room-to-Room (R2R) dataset show that DISH significantly outperforms the baseline in accuracy and efficiency.
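
The worker's two ingredients named in the abstract, subgoal-driven attention and an intrinsic reward added to a sparse task reward, can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: dot-product attention, greedy action selection, the additive weighting `beta`, and all function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def subgoal_attention(subgoal: Sequence[float],
                      candidates: Sequence[Sequence[float]]) -> List[float]:
    """Weight each candidate view feature by its similarity to the subgoal.

    Assumption: plain dot-product attention; the paper's mechanism may differ.
    """
    return softmax([dot(subgoal, c) for c in candidates])


def worker_act(subgoal: Sequence[float],
               candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate with the largest attention weight."""
    weights = subgoal_attention(subgoal, candidates)
    return max(range(len(weights)), key=weights.__getitem__)


def worker_reward(extrinsic: float, discriminator_score: float,
                  beta: float = 0.5) -> float:
    """Add a dense intrinsic bonus to the sparse task reward.

    Assumption: `discriminator_score` stands in for a HAD-style output in
    [0, 1], and the additive form with weight `beta` is hypothetical.
    """
    return extrinsic + beta * discriminator_score


# Toy usage: a subgoal vector (assumed to come from the manager) and
# three candidate view features.
subgoal = [1.0, 0.0]
candidates = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.3]]
action = worker_act(subgoal, candidates)
reward = worker_reward(extrinsic=0.0, discriminator_score=0.8)
```

In this toy setup the worker still receives a nonzero learning signal on steps where the task reward is zero, which is the role the abstract assigns to the intrinsic reward.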
