Closed captioning
Computer science
Word (group theory)
Artificial intelligence
Process (computing)
Weighting
Natural language processing
Authors
Wenhui Jiang, Qin Li, Kun Zhan, Yuming Fang, Fei Shen
Source
Journal: Displays
[Elsevier]
Date: 2022-05-01
Volume/Issue: : 102238-102238
Identifier
DOI:10.1016/j.displa.2022.102238
Abstract
Machine attention mechanisms are widely used in image captioning. Such mechanisms dynamically focus on different image regions to guide the word generation process. However, without explicit supervision, existing attention models may fail to concentrate on the correct regions and thereby mislead word prediction. In this study, we exploit human captioning attention, which encodes rich information that humans perceive during captioning, and propose a novel Hybrid Attention Network (HAN) that combines prevailing machine attention mechanisms with human captioning attention. The proposed HAN addresses the problem of “object hallucination” by re-weighting bottom-up attention, and improves the diversity of the generated captions by complementing top-down attention with human captioning attention. Extensive experiments on the Flickr30K and MS COCO datasets demonstrate that the proposed method effectively improves the performance of current image captioning methods.
• We explore how human captioning attention can strengthen image captioning.
• We improve bottom-up attention with human attention through proposal re-weighting.
• We complement top-down attention with human attention through adaptive fusion.
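The abstract names two operations, proposal re-weighting and adaptive fusion, without giving formulas. The following is only a minimal sketch of what such operations could look like, not the authors' HAN implementation. It assumes that each region proposal has a non-negative human-attention score (for example, a human attention map pooled inside the proposal box), that re-weighting means scaling each region feature by its normalized score, and that fusion is a convex combination controlled by a scalar `gate`. The function names, the `gate` parameter, and the convex-combination form are all hypothetical; in a trained model the gate would presumably be predicted rather than fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(weights):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def reweight_proposals(features, human_scores):
    """Hypothetical proposal re-weighting: scale each region feature
    vector by its normalized human-attention score."""
    weights = normalize(human_scores)
    return [[w * x for x in feat] for w, feat in zip(weights, features)]


def fuse_attention(machine_scores, human_scores, gate):
    """Hypothetical adaptive fusion: convex combination of machine
    (top-down) attention and human attention over the same regions.
    `gate` in [0, 1] is an assumed scalar, fixed here for illustration."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    machine = softmax(machine_scores)
    human = normalize(human_scores)
    return [gate * m + (1.0 - gate) * h for m, h in zip(machine, human)]


# Toy example: two region proposals with 2-dimensional features.
features = [[1.0, 2.0], [3.0, 4.0]]
human_scores = [1.0, 3.0]       # assumed pooled human-attention values
machine_scores = [0.5, -0.5]    # assumed raw top-down attention logits

reweighted = reweight_proposals(features, human_scores)
fused = fuse_attention(machine_scores, human_scores, gate=0.5)
```

Because both inputs to the fusion are normalized distributions, the fused weights also sum to 1 for any gate in [0, 1], so they can be used directly to pool region features.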