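Computer science
Feature (linguistics)
Concatenation (mathematics)
Convolution (computer science)
Benchmark (surveying)
Aggregate (composite)
Interpolation (computer graphics)
Channel (broadcasting)
Artificial intelligence
Pattern recognition (psychology)
Object (grammar)
Layer (electronics)
Data mining
Feature learning
Image (mathematics)
Artificial neural network
Mathematics
Computer network
Philosophy
Linguistics
Materials science
Chemistry
Geodesy
Organic chemistry
Combinatorics
Composite material
Geography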
Authors
Hao Li, Changming Song, Dongxu Cheng, Zhenghui Li, Caihong Wu, Kang Chen
Identifier
DOI:10.1016/j.eswa.2024.123218
Abstract
Aggregating features across levels or scales has been empirically shown to improve feature representations in object detection. However, existing approaches tend to aggregate features or embed contextual information indiscriminately through simple concatenation or addition, ignoring the misalignment caused by repeated sampling operations. To address this misalignment, this paper proposes AlignYOLO, a feature-aligned network based on YOLOv5. The network consists of three main modules: the self-attention convolution (SAC) module, the feature aggregation and alignment (FAA) module, and the multiscale aligned channel attention (MSACA) module. First, the SAC module extracts information comprehensively by employing convolution and self-attention simultaneously. Second, the FAA module aggregates features across layers and aligns them using a learnable interpolation strategy. Third, the MSACA module uses multiscale convolution to capture contextual information, aligns the resulting in-layer features with the same learnable interpolation strategy, and applies channel attention to enhance feature representations. Extensive experiments on benchmark datasets evaluate the effectiveness of the proposed method and show that AlignYOLO outperforms state-of-the-art detectors.
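The abstract does not give formulas for the FAA or MSACA modules. As a rough illustration of the general idea only, the NumPy sketch below shows one common way to make interpolation "learnable": upsample a coarse feature map, shift each output pixel's sampling point by a per-pixel offset field, resample bilinearly, add the result to the finer map, and gate the sum with a squeeze-and-excitation style channel attention. This offset-guided resampling is a swapped-in technique borrowed from flow-based feature alignment, not the authors' method. All function names are hypothetical, and in a trained network the offsets and attention weights would be predicted or learned (for example by a small convolution over both maps); here they are simply passed in.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (C, H, W) at fractional coordinates ys, xs (each H', W').

    Coordinates outside the map are clamped to the border (hypothetical choice).
    Returns an array of shape (C, H', W').
    """
    _, H, W = feat.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy, wx = ys - y0, xs - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def upsample_aligned(coarse, offsets, scale=2):
    """Upsample coarse (C, h, w) by `scale`, shifting each sampling point by offsets.

    offsets has shape (2, h*scale, w*scale) holding (dy, dx) in coarse-pixel units.
    With all-zero offsets this reduces to plain half-pixel bilinear upsampling.
    """
    _, h, w = coarse.shape
    H, W = h * scale, w * scale
    gy, gx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Map output pixel centres to input coordinates, then add the (learned) shift.
    ys = (gy + 0.5) / scale - 0.5 + offsets[0]
    xs = (gx + 0.5) / scale - 0.5 + offsets[1]
    return bilinear_sample(coarse, ys, xs)


def channel_attention(feat, w1, w2):
    """SE-style gate: global average pool -> FC -> ReLU -> FC -> sigmoid -> rescale.

    w1 has shape (C // r, C) and w2 has shape (C, C // r) for a reduction ratio r.
    """
    s = feat.mean(axis=(1, 2))             # squeeze: one value per channel, (C,)
    z = np.maximum(w1 @ s, 0.0)            # bottleneck with ReLU, (C // r,)
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # per-channel gate in (0, 1), (C,)
    return feat * g[:, None, None]


def fuse(fine, coarse, offsets, w1, w2):
    """Align the coarse map to the fine map, add them, then apply channel attention."""
    scale = fine.shape[1] // coarse.shape[1]
    aligned = upsample_aligned(coarse, offsets, scale=scale)
    return channel_attention(fine + aligned, w1, w2)
```

For example, `upsample_aligned(coarse, np.zeros((2, 2 * h, 2 * w)))` behaves like ordinary bilinear upsampling, while a nonzero offset field lets each output pixel draw from a slightly different source location, which is what allows a learned predictor to compensate for spatial shifts introduced by repeated down- and upsampling. Element-wise addition is used for fusion here only for simplicity.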