Authors
Wei Zhou, Zhijie Zheng, Tao Su, Haifeng Hu
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2023-06-09
Volume/Issue: 34 (1): 342-356
Citations: 1
Identifier
DOI: 10.1109/tcsvt.2023.3284812
Abstract
Multi-label image classification is a fundamental yet challenging task that aims to predict the set of labels associated with a given image. Most previous methods directly exploit the high-level features from the last layer of a convolutional neural network for classification. However, these methods cannot capture global context due to the limited receptive field of convolutional kernels, and they fail to extract the multi-scale features needed to recognize small-scale objects in images. Recent studies exploit graph convolutional networks to model label correlations and boost classification performance. Despite substantial progress, these methods rely on manually pre-defined graph structures. Moreover, they ignore the associations between semantic labels and image regions and do not fully explore the spatial context of images. To address these issues, we propose a novel Dual Attention Transformer (DATran) model, which adopts a dual-stream architecture that simultaneously learns spatial and channel correlations from multi-label images. First, to address the difficulty current methods have in recognizing small-scale objects, we develop a new multi-scale feature fusion (MSFF) module that generates multi-scale feature representations by jointly integrating high-level semantics and low-level details. Second, we design a prior-enhanced spatial attention (PSA) module that learns long-range correlations between objects at different spatial positions in an image, enhancing model performance. Third, we devise a prior-enhanced channel attention (PCA) module that captures the inter-dependencies between channel maps, effectively strengthening the correlations between semantic categories. Notably, the PSA and PCA modules complement and reinforce each other, further augmenting the feature representations. Finally, the outputs of the two attention modules are fused to obtain the final features for classification.
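The dual-stream design above computes attention along two different axes of the same feature map: the spatial stream builds a position-to-position affinity, while the channel stream builds a channel-to-channel affinity, and the two outputs are fused. The following is a minimal NumPy sketch of that generic idea only, not the paper's DATran implementation: it omits the learned prior enhancement, the MSFF module, and the query/key/value projections (all inputs share one identity projection here for brevity), and the function names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    # feat: (C, HW). Affinity between spatial positions: (HW, HW).
    scale = np.sqrt(feat.shape[0])
    attn = softmax(feat.T @ feat / scale, axis=-1)   # (HW, HW)
    return feat @ attn.T                             # re-weight positions -> (C, HW)

def channel_attention(feat):
    # feat: (C, HW). Affinity between channel maps: (C, C).
    scale = np.sqrt(feat.shape[1])
    attn = softmax(feat @ feat.T / scale, axis=-1)   # (C, C)
    return attn @ feat                               # re-weight channels -> (C, HW)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))                     # toy map: 8 channels, 4x4 grid flattened
fused = spatial_attention(x) + channel_attention(x)  # element-wise fusion of the two streams
print(fused.shape)                                   # (8, 16)
```

Element-wise addition is one simple fusion choice; the paper's actual fusion of the PSA and PCA outputs may differ.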
Performance evaluation experiments on the MS-COCO 2014, PASCAL VOC 2007, and VG-500 datasets demonstrate that the DATran model outperforms current state-of-the-art models.