Keywords
Embedding, Position (finance), Computer science, Artificial intelligence, Convolutional neural network, Encoding (memory), Transformer, Pattern recognition (psychology), Artificial neural network, Computer vision, Engineering, Finance, Electrical engineering, Economics, Voltage
Authors
Kai Jiang, Peng Peng, Youzao Lian, Weisheng Xu
Identifier
DOI: 10.1016/j.jvcir.2022.103664
Abstract
In contrast to Convolutional Neural Networks (CNNs), Vision Transformers (ViTs) cannot capture the sequence ordering of input tokens and therefore require position embeddings. As a learnable fixed-dimension vector, the position embedding improves accuracy but limits the transfer of a pre-trained model across different input sizes. Hence, this paper conducts an empirical study of the position embeddings of pre-trained models, focusing on two questions: (1) What do the position embeddings learn during training? (2) How do the position embeddings affect the self-attention modules? The paper analyzes the patterns of the position embeddings in pre-trained models and finds that a linear combination of Gabor filters and edge markers fits the learned position embeddings well. The Gabor filters and edge markers occupy some channels to carry the position information, and the edge markers flow into the values of the self-attention modules. These experimental results can guide future work in choosing suitable position embeddings.
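To make the input-size migration point concrete, here is a minimal PyTorch sketch of a ViT-style patch embedding with a learnable fixed-dimension position table, plus the bicubic interpolation commonly used to transfer such a table to a new resolution. The `PatchEmbed` class, its default hyperparameters, and the 288x288 usage line are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbed(nn.Module):
    """Minimal ViT-style patch embedding with a learnable position table.

    The table stores one fixed-dimension vector per patch position, so a
    model trained at one resolution must resize it to run at another; this
    is the input-size migration issue the abstract refers to.
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=768):
        super().__init__()
        self.patch_size = patch_size
        self.grid = img_size // patch_size          # patches per side at train time
        self.proj = nn.Conv2d(in_chans, dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable position embedding: one vector per patch.
        self.pos_embed = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))

    def forward(self, x):
        _, _, h, w = x.shape
        tokens = self.proj(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        gh, gw = h // self.patch_size, w // self.patch_size
        pos = self.pos_embed
        if gh * gw != pos.shape[1]:
            # Bicubically resize the table so the pre-trained model can
            # accept a new input size (a common workaround, not a fix).
            d = pos.shape[-1]
            pos = pos.reshape(1, self.grid, self.grid, d).permute(0, 3, 1, 2)
            pos = F.interpolate(pos, size=(gh, gw),
                                mode="bicubic", align_corners=False)
            pos = pos.permute(0, 2, 3, 1).reshape(1, gh * gw, d)
        return tokens + pos                                # inject position info

# Usage: a table trained at 224x224 reused on a 288x288 input.
embed = PatchEmbed()
out = embed(torch.randn(1, 3, 288, 288))
print(out.shape)   # torch.Size([1, 324, 768])
```

The interpolation keeps the pre-trained weights usable at the new size, but the fixed-size table is what forces this extra step in the first place.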
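The fitting result can likewise be sketched: regress each channel of a position table onto a linear combination of Gabor filters and edge markers by least squares. Everything below is an assumption for illustration; the `gabor` parameterization, the 14x14 grid, and the random stand-in table are mine, and a real pre-trained table would be loaded in place of the stand-in (the fit quality on such a table is the paper's finding, not this script's output).

```python
import numpy as np

def gabor(g, fx, fy, phase, sigma=0.5):
    """2D Gabor filter: a plane-wave carrier under a Gaussian envelope,
    sampled on a (g, g) patch grid with coordinates scaled to [0, 1]."""
    y, x = np.mgrid[0:g, 0:g] / (g - 1)
    carrier = np.cos(2 * np.pi * (fx * x + fy * y) + phase)
    envelope = np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / (2 * sigma ** 2))
    return carrier * envelope

g, d = 14, 64                                     # 14x14 patch grid, 64 channels
rng = np.random.default_rng(0)
pos_embed = rng.normal(size=(g * g, d))           # random stand-in for a learned table

# Basis: a constant term, low-frequency Gabor filters, and edge markers
# (indicator maps for the border rows/columns of the patch grid).
basis = [np.ones(g * g)]
for fx in range(4):
    for fy in range(4):
        if fx == 0 and fy == 0:
            continue                              # DC term already included
        for phase in (0.0, np.pi / 2):
            basis.append(gabor(g, fx, fy, phase).ravel())
y_idx, x_idx = np.mgrid[0:g, 0:g]
for edge in (y_idx == 0, y_idx == g - 1, x_idx == 0, x_idx == g - 1):
    basis.append(edge.astype(float).ravel())
B = np.stack(basis, axis=1)                       # (N, K) design matrix

# Least-squares fit of every embedding channel by the basis, then R^2.
coef, *_ = np.linalg.lstsq(B, pos_embed, rcond=None)
recon = B @ coef
ss_res = ((pos_embed - recon) ** 2).sum()
ss_tot = ((pos_embed - pos_embed.mean(axis=0)) ** 2).sum()
print(f"R^2 of the Gabor + edge-marker fit: {1 - ss_res / ss_tot:.3f}")
```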