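Transformer
Computer science
Segmentation
Encoder
Artificial intelligence
Convolutional neural network
Engineering
Electrical engineering
Voltage
Operating system
Authors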
Yang Liu,Yao Zhang,Yixin Wang,Feng Hou,Jin Yuan,Jiang Tian,Y.S. Zhang,Zhongchao Shi,Jianping Fan,Zhiqiang He
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/issue:: 1-21
Citations: 110
Identifier
DOI: 10.1109/tnnls.2022.3227717
Abstract
Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, some pioneering works have recently employed Transformer-like architectures in the computer vision (CV) field and demonstrated their effectiveness on three fundamental CV tasks (classification, detection, and segmentation) as well as on multiple sensory data streams (images, point clouds, and vision-language data). Because of their competitive modeling capabilities, visual Transformers have achieved impressive performance improvements on multiple benchmarks compared with modern convolutional neural networks (CNNs). In this survey, we comprehensively review over 100 different visual Transformers according to three fundamental CV tasks and different data stream types, and we propose a taxonomy to organize the representative methods according to their motivations, structures, and application scenarios. Because of their differences in training settings and dedicated vision tasks, we have also evaluated and compared all these existing visual Transformers under different configurations. Furthermore, we have revealed a series of essential but unexploited aspects that may empower such visual Transformers to stand out from numerous architectures, e.g., slack high-level semantic embeddings to bridge the gap between the visual Transformers and the sequential ones. Finally, two promising research directions are suggested for future investment. We will continue to update the latest articles and their released source code at https://github.com/liuyang-ict/awesome-visual-transformers.