Authors
Wei Zhou, Zhijie Zheng, Tao Su, Haifeng Hu
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2023-06-09
Volume/Issue: 34 (1): 342-356
Citations: 1
Identifier
DOI: 10.1109/tcsvt.2023.3284812
Abstract
Multi-label image classification is a fundamental yet challenging task that aims to predict the set of labels associated with a given image. Most previous methods directly exploit the high-level features from the last layer of a convolutional neural network for classification. However, these methods cannot capture global context due to the limited receptive field of convolutional kernels, and they fail to extract the multi-scale features needed to recognize small-scale objects in images. Recent studies exploit graph convolutional networks to model label correlations and boost classification performance. Despite substantial progress, these methods rely on manually pre-defined graph structures. Moreover, they ignore the associations between semantic labels and image regions and do not fully explore the spatial context of images. To address these issues, we propose a novel Dual Attention Transformer (DATran) model, which adopts a dual-stream architecture that simultaneously learns spatial and channel correlations from multi-label images. First, to address the difficulty current methods have in recognizing small-scale objects, we develop a new multi-scale feature fusion (MSFF) module that generates multi-scale feature representations by jointly integrating high-level semantics and low-level details. Second, we design a prior-enhanced spatial attention (PSA) module that learns long-range correlations between objects at different spatial positions in an image, enhancing model performance. Third, we devise a prior-enhanced channel attention (PCA) module that captures the inter-dependencies between channel maps, effectively strengthening the correlations between semantic categories. Notably, the PSA and PCA modules complement and reinforce each other, further augmenting the feature representations. Finally, the outputs of the two attention modules are fused to obtain the final features for classification.
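The dual-stream design above computes attention along two different axes of the same feature map: the spatial stream builds a position-to-position affinity, while the channel stream builds a channel-to-channel affinity, and the two outputs are fused. The following is a minimal NumPy sketch of that generic idea only, not the paper's DATran implementation: it omits the learned prior enhancement, the MSFF module, and the query/key/value projections (all inputs share one identity projection here for brevity), and the function names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    # feat: (C, HW). Affinity between spatial positions: (HW, HW).
    scale = np.sqrt(feat.shape[0])
    attn = softmax(feat.T @ feat / scale, axis=-1)   # (HW, HW)
    return feat @ attn.T                             # re-weight positions -> (C, HW)

def channel_attention(feat):
    # feat: (C, HW). Affinity between channel maps: (C, C).
    scale = np.sqrt(feat.shape[1])
    attn = softmax(feat @ feat.T / scale, axis=-1)   # (C, C)
    return attn @ feat                               # re-weight channels -> (C, HW)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))                     # toy map: 8 channels, 4x4 grid flattened
fused = spatial_attention(x) + channel_attention(x)  # element-wise fusion of the two streams
print(fused.shape)                                   # (8, 16)
```

Element-wise addition is one simple fusion choice; the paper's actual fusion of the PSA and PCA outputs may differ.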
Performance evaluation experiments on the MS-COCO 2014, PASCAL VOC 2007, and VG-500 datasets demonstrate that the DATran model outperforms current state-of-the-art models.