Computer science
Channel (broadcasting)
Overhead (engineering)
Artificial intelligence
Feature (linguistics)
Pairwise comparison
Encoding (memory)
Pattern recognition (psychology)
Feature extraction
Feature learning
Curse of dimensionality
Dimension (graph theory)
Scale (ratio)
Dimensionality reduction
Representation (politics)
Mathematics
Physics
Operating system
Philosophy
Quantum mechanics
Linguistics
Pure mathematics
Computer network
Politics
Political science
Law
Authors
Daliang Ouyang,Su He,Guozhong Zhang,Mingzhu Luo,Huaiyong Guo,Jianming Zhan,Zhijie Huang
Identifier
DOI:10.1109/icassp49357.2023.10096516
Abstract
The remarkable effectiveness of channel or spatial attention mechanisms for producing more discernible feature representations has been illustrated in various computer vision tasks. However, modeling cross-channel relationships with channel dimensionality reduction may bring side effects when extracting deep visual representations. In this paper, a novel efficient multi-scale attention (EMA) module is proposed. To retain the information of each channel while decreasing the computational overhead, we reshape part of the channels into the batch dimension and group the channel dimensions into multiple sub-features, which keeps the spatial semantic features well distributed inside each feature group. Specifically, apart from encoding the global information to re-calibrate the channel-wise weights in each parallel branch, the output features of the two parallel branches are further aggregated by a cross-dimension interaction to capture pixel-level pairwise relationships. We conduct extensive ablation studies and experiments on image classification and object detection tasks with popular benchmarks (e.g., CIFAR-100, ImageNet-1k, MS COCO, and VisDrone2019) to evaluate its performance.
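To make the abstract's description concrete, below is a minimal PyTorch sketch of an EMA-style attention block built only from what the abstract states: channels are grouped and folded into the batch dimension, a 1x1 global-context branch and a 3x3 local branch recalibrate each group in parallel, and the two branch outputs are fused through a cross-dimension interaction that captures pixel-level pairwise relationships. The class name (EMASketch), the default group count, and the exact layer choices are illustrative assumptions, not the authors' reference implementation.

```python
# Hedged sketch of an EMA-style multi-scale attention block (not the official code).
import torch
import torch.nn as nn


class EMASketch(nn.Module):
    """Group channels into sub-features, recalibrate them with two parallel
    branches, and fuse the branches via a cross-dimension interaction."""

    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        self.groups = groups
        c = channels // groups
        # 1x1 branch: encodes global context along H and W separately.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        # 3x3 branch: captures local multi-scale spatial context.
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.gn = nn.GroupNorm(c, c)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, ch, h, w = x.shape
        g = self.groups
        c = ch // g
        # Fold the groups into the batch dimension: (b*g, c, h, w).
        xg = x.reshape(b * g, c, h, w)

        # 1x1 branch: directional global pooling + channel-wise recalibration.
        x_h = self.pool_h(xg)                          # (b*g, c, h, 1)
        x_w = self.pool_w(xg).transpose(2, 3)          # (b*g, c, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(xg * x_h.sigmoid() * x_w.transpose(2, 3).sigmoid())

        # 3x3 branch: local spatial context.
        x2 = self.conv3x3(xg)

        # Cross-dimension interaction: each branch's pooled channel descriptor
        # attends over the other branch's pixels (pixel-level pairwise relation).
        a1 = self.softmax(self.pool(x1).reshape(b * g, 1, c))   # (b*g, 1, c)
        a2 = self.softmax(self.pool(x2).reshape(b * g, 1, c))   # (b*g, 1, c)
        y1 = torch.matmul(a1, x2.reshape(b * g, c, h * w))      # (b*g, 1, h*w)
        y2 = torch.matmul(a2, x1.reshape(b * g, c, h * w))      # (b*g, 1, h*w)
        weights = (y1 + y2).reshape(b * g, 1, h, w).sigmoid()

        # Re-weight the grouped features and restore the original layout.
        return (xg * weights).reshape(b, ch, h, w)


if __name__ == "__main__":
    m = EMASketch(channels=64, groups=8)
    out = m(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Since the groups live in the batch dimension, the attention weights are computed per sub-feature without channel dimensionality reduction, which is the property the abstract emphasizes.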