Artificial intelligence
Computer vision
Computer science
Transformer
Engineering
Electrical engineering
Voltage
Identifier
DOI:10.1016/j.patcog.2024.110357
Abstract
This paper introduces TrpViT, a novel triple attention vision transformer that efficiently captures both local and global features. The proposed architecture tackles global information acquisition by employing three complementary attention mechanisms in a unique attention block: Window, Dilated, and Channel attention. This attention block extracts spatially local features while expanding the receptive field to capture richer global context. By integrating this attention block with convolution, a new C-C-T-T architecture is formed. We rigorously evaluate TrpViT, demonstrating state-of-the-art performance on various computer vision tasks, including image classification, 2D and 3D object detection, instance segmentation, and low-level image colorization. Notably, TrpViT achieves strong accuracy across all parameter scales without additional data augmentation, highlighting its computational efficiency and effectiveness.
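The contrast between window and dilated attention can be illustrated with a small sketch of how each one might group tokens before attention is applied. This is not the TrpViT implementation: the abstract does not specify the grouping scheme, so the code assumes that window attention uses contiguous blocks and that dilated attention samples tokens at a fixed stride across the feature map. The function names `window_groups`, `dilated_groups`, and `span` are hypothetical. Channel attention, which would attend across feature channels rather than spatial positions, is omitted.

```python
def window_groups(height, width, size):
    """Group token coordinates into contiguous size x size windows."""
    groups = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            groups.append([(top + dy, left + dx)
                           for dy in range(size) for dx in range(size)])
    return groups


def dilated_groups(height, width, size):
    """Group size x size tokens sampled at a fixed stride over the whole map.

    Assumption: the stride is chosen so that each group spreads across the
    full feature map (height and width must be divisible by size).
    """
    stride_y, stride_x = height // size, width // size
    groups = []
    for off_y in range(stride_y):
        for off_x in range(stride_x):
            groups.append([(off_y + i * stride_y, off_x + j * stride_x)
                           for i in range(size) for j in range(size)])
    return groups


def span(group):
    """Return the (rows, cols) extent covered by a group of coordinates."""
    ys = [y for y, _ in group]
    xs = [x for _, x in group]
    return (max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)


if __name__ == "__main__":
    windows = window_groups(8, 8, 2)
    dilated = dilated_groups(8, 8, 2)
    print("window group 0:", windows[0], "span", span(windows[0]))
    print("dilated group 0:", dilated[0], "span", span(dilated[0]))
```

Under these assumptions both schemes attend over the same number of tokens per group, so the attention cost per group is the same, but a dilated group covers a much larger region of the feature map than a window group. That is one way to read the abstract's claim that the block extracts local features while expanding the receptive field.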