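Conformational isomerism
Artificial intelligence
Computer science
Feature (linguistics)
Convolutional neural network
Pattern recognition (psychology)
Feature extraction
Object detection
Feature learning
Representation (politics)
Convolution (computer science)
Artificial neural network
Physics
Linguistics
Quantum mechanics
Politics
Philosophy
Molecule
Law
Political science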
Authors
Zhiliang Peng, Zonghao Guo, Wei Huang, Yaowei Wang, Lingxi Xie, Jianbin Jiao, Qi Tian, Qixiang Ye
Identifier
DOI:10.1109/tpami.2023.3243048
Abstract
With convolution operations, Convolutional Neural Networks (CNNs) are good at extracting local features but have difficulty capturing global representations. With cascaded self-attention modules, vision transformers can capture long-distance feature dependencies but do so at the cost of degrading local feature details. In this paper, we propose a hybrid network structure, termed Conformer, that takes advantage of both convolution operations and self-attention mechanisms for enhanced representation learning. Conformer is built on the interactive coupling of CNN local features and transformer global representations at different resolutions. Conformer adopts a dual structure so that local details and global dependencies are retained to the maximum extent. We also propose a Conformer-based detector (ConformerDet), which learns to predict and refine object proposals by performing region-level feature coupling in an augmented cross-attention fashion. Experiments on the ImageNet and MS COCO datasets validate Conformer's superiority for visual recognition and object detection, demonstrating its potential as a general backbone network.
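To make the idea of coupling CNN local features with transformer global representations at different resolutions more concrete, here is a minimal NumPy sketch of one bidirectional coupling step. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cnn_to_tokens`, `tokens_to_cnn`, `couple`), the tensor sizes, the random weights, and the choice of 1x1 convolutions, average pooling, ReLU and nearest-neighbour upsampling are all hypothetical. The convolution and transformer blocks that would normally sit between coupling steps, any class token, and batch normalization are omitted.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each token (row) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def cnn_to_tokens(fmap, w_down, stride):
    """Local -> global (hypothetical helper): project channels, pool, flatten.

    fmap:   (C_cnn, H, W) CNN feature map, H and W divisible by `stride`
    w_down: (D, C_cnn) weights of a 1x1 convolution (a per-pixel linear map)
    returns (N, D) tokens with N = (H // stride) * (W // stride)
    """
    _, h, w = fmap.shape
    proj = np.einsum("dc,chw->dhw", w_down, fmap)  # 1x1 convolution
    d = proj.shape[0]
    # Average pooling with kernel size = stride = `stride`
    pooled = proj.reshape(d, h // stride, stride, w // stride, stride).mean(axis=(2, 4))
    tokens = pooled.reshape(d, -1).T  # (N, D), row-major over the patch grid
    return layer_norm(tokens)


def tokens_to_cnn(tokens, w_up, grid_hw, stride):
    """Global -> local (hypothetical helper): unflatten, project, upsample.

    tokens: (N, D) transformer tokens with N = gh * gw
    w_up:   (C_cnn, D) weights of a 1x1 convolution
    returns (C_cnn, gh * stride, gw * stride)
    """
    gh, gw = grid_hw
    grid = tokens.T.reshape(tokens.shape[1], gh, gw)  # inverse of the flattening above
    proj = np.einsum("cd,dhw->chw", w_up, grid)  # 1x1 convolution
    proj = np.maximum(proj, 0.0)  # ReLU
    # Nearest-neighbour upsampling back to the CNN branch's resolution
    return proj.repeat(stride, axis=1).repeat(stride, axis=2)


def couple(fmap, tokens, w_down, w_up, stride):
    """One bidirectional coupling step; each branch keeps its own shape."""
    _, h, w = fmap.shape
    tokens = tokens + cnn_to_tokens(fmap, w_down, stride)
    fmap = fmap + tokens_to_cnn(tokens, w_up, (h // stride, w // stride), stride)
    return fmap, tokens


# Usage with hypothetical sizes: 64 CNN channels, 32-dim tokens, 16x16 map, stride 4
rng = np.random.default_rng(0)
C_CNN, D, H, W, S = 64, 32, 16, 16, 4
fmap = rng.standard_normal((C_CNN, H, W))
tokens = rng.standard_normal(((H // S) * (W // S), D))
w_down = rng.standard_normal((D, C_CNN)) / np.sqrt(C_CNN)
w_up = rng.standard_normal((C_CNN, D)) / np.sqrt(D)
fmap2, tokens2 = couple(fmap, tokens, w_down, w_up, S)
print(fmap2.shape, tokens2.shape)  # (64, 16, 16) (16, 32)
```

The point of the sketch is the dual structure: neither branch is replaced, each one only receives a residual update from the other, so the CNN branch keeps its full-resolution local detail while the token branch keeps its global, patch-level view.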