Captioning
Computer science
Transformer
Recurrent neural network
Attention network
Artificial intelligence
Authors
Yiwei Wei, Chunlei Wu, Guohe Li, Haitao Shi
Identifier
DOI:10.1016/j.engappai.2021.104574
Abstract
Attention-based approaches have firmly established the state of the art in image captioning. However, both the recurrent attention in recurrent neural networks (RNNs) and the self-attention in the Transformer have limitations. Recurrent attention relies only on the external state to decide where to look, ignoring the internal relationships between image regions; self-attention has the opposite limitation. To fill this gap, we first introduce an Outside-in Attention that lets the external state participate in the interaction of the image regions, prompting the model to learn the dependencies inside the image regions as well as those between the image regions and the external state. We then investigate a Sequential Transformer framework (S-Transformer) based on the original Transformer structure, whose decoder incorporates the Outside-in Attention and an RNN. This framework helps the model inherit the advantages of both the Transformer and the recurrent network in sequence modeling. When tested on the COCO dataset, the proposed approaches achieve competitive results in single-model and ensemble configurations on both the MSCOCO Karpathy test split and the online test server.
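The abstract does not give the exact formulation, but the core idea of Outside-in Attention, an external state (e.g. an RNN hidden state) joining the region-region interaction, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function name `outside_in_attention`, the choice to prepend the external state as an extra attention token, and the use of plain scaled dot-product attention are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def outside_in_attention(regions, external_state):
    """Illustrative sketch (not the paper's exact formulation).

    regions:        (n, d) image-region features
    external_state: (d,)   external state, e.g. an RNN hidden state

    The external state is prepended as an extra token, so each region's
    attention scores cover both region-region dependencies and the
    region-state dependency in a single softmax.
    Returns attended region features of shape (n, d).
    """
    d = regions.shape[1]
    tokens = np.vstack([external_state[None, :], regions])  # (n+1, d)
    scores = regions @ tokens.T / np.sqrt(d)                # (n, n+1)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ tokens                                 # (n, d)

rng = np.random.default_rng(0)
out = outside_in_attention(rng.normal(size=(5, 8)), rng.normal(size=8))
```

By contrast, pure self-attention would drop the extra token (regions attend only to regions), and pure recurrent attention would score regions only against `external_state`; the sketch shows how a single attention step can cover both.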