Computer science
Human visual system model
Artificial intelligence
Computer vision
Image (mathematics)
Authors
Mengtang Li,Jie Zhu,Guoheng Huang,Chao Gou
Identifier
DOI:10.1109/icassp48485.2024.10447354
Abstract
A scanpath is the trajectory of eye fixations produced when humans perform visual reasoning. Most existing methods focus on predicting static attention maps, which represent the probability that a human attends to each pixel in the image. However, human gaze behavior is purposeful and dynamic, especially when searching for specific objects. Inspired by the eye-movement mechanism of the human visual system, this paper introduces a reinforcement learning method that imitates the human visual system to predict scanpaths in target search. The paper also considers periphery-fovea vision and incorporates eye-movement behavior to improve the accuracy of scanpath prediction. In addition, the Contrastive Language-Image Pretraining (CLIP) text encoder is employed as the task embedding to convert target objects into vectors. Compared with state-of-the-art (SOTA) models on the COCO-Search18 dataset, the proposed method achieves comprehensively superior performance on both fixation location and fixation duration prediction.