Keywords: Computer science, Pattern recognition, Artificial intelligence, Image, Fusion, Embedding, Feature, Multi-label classification, Graph, Theoretical computer science, Electrical engineering
Authors
Yangtao Wang, Yanzhao Xie, Jiangfeng Zeng, Hanpin Wang, Lisheng Fan, Yufan Song
Identifier
DOI: 10.1016/j.compeleceng.2022.108002
Abstract
For multi-label image classification, existing studies either rely on a cumbersome multi-step training workflow that uses an attention mechanism to explore the (local) relationships between image target regions and their corresponding labels, or model the (global) label dependencies with a graph convolutional network (GCN), but they fail to fuse the image features and label word vectors efficiently. To address these problems, we develop Cross-modal Fusion for Multi-label Image Classification with attention mechanism (termed CFMIC), which combines an attention mechanism and a GCN to capture the local and global label dependencies simultaneously in an end-to-end manner. CFMIC contains three key modules: (1) a feature extraction module with an attention mechanism, which generates an accurate feature for each input image by focusing on the relationships between image labels and image target regions; (2) a label co-occurrence embedding learning module, which uses a GCN to learn the relationships between different objects and generate label co-occurrence embeddings; and (3) a cross-modal fusion module with Multi-modal Factorized Bilinear pooling (MFB), which efficiently fuses the image features and label co-occurrence embeddings. Extensive experiments on MS-COCO and VOC2007 verify that CFMIC greatly improves convergence efficiency and produces better classification results than state-of-the-art approaches.
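The following is a minimal PyTorch-style sketch of the kind of pipeline the abstract describes, not the authors' implementation: a GCN over label word vectors to produce co-occurrence embeddings, a CNN image feature (the attention-based extraction module is simplified to a plain global backbone feature here), and an MFB-style cross-modal fusion producing per-label scores. All module names (LabelGCN, MFBFusion, CFMICSketch), dimensions, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class LabelGCN(nn.Module):
    """Two-layer GCN mapping label word vectors to co-occurrence embeddings."""

    def __init__(self, word_dim, hidden_dim, out_dim, adj):
        super().__init__()
        # adj: normalized label co-occurrence matrix of shape (C, C), an assumed input
        self.register_buffer("adj", adj)
        self.fc1 = nn.Linear(word_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, word_vectors):                 # word_vectors: (C, word_dim)
        h = torch.relu(self.adj @ self.fc1(word_vectors))
        return self.adj @ self.fc2(h)                # (C, out_dim)


class MFBFusion(nn.Module):
    """Multi-modal Factorized Bilinear pooling of image and label embeddings."""

    def __init__(self, img_dim, lbl_dim, factor_dim=1000, k=5):
        super().__init__()
        self.k = k
        self.proj_img = nn.Linear(img_dim, factor_dim * k)
        self.proj_lbl = nn.Linear(lbl_dim, factor_dim * k)

    def forward(self, img_feat, lbl_emb):
        # img_feat: (B, img_dim); lbl_emb: (C, lbl_dim)
        x = self.proj_img(img_feat).unsqueeze(1)     # (B, 1, factor_dim*k)
        y = self.proj_lbl(lbl_emb).unsqueeze(0)      # (1, C, factor_dim*k)
        joint = x * y                                # element-wise bilinear interaction
        B, C, _ = joint.shape
        joint = joint.view(B, C, -1, self.k).sum(-1)           # sum-pool over k factors
        joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-12)  # power normalization
        return F.normalize(joint, dim=-1)            # (B, C, factor_dim)


class CFMICSketch(nn.Module):
    """Illustrative end-to-end assembly of the three modules described above."""

    def __init__(self, word_dim=300, adj=None):
        super().__init__()
        backbone = models.resnet101(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # global image feature
        self.gcn = LabelGCN(word_dim, 1024, 2048, adj)
        self.fusion = MFBFusion(img_dim=2048, lbl_dim=2048)
        self.classifier = nn.Linear(1000, 1)         # per-label score from fused vector

    def forward(self, images, word_vectors):
        img_feat = self.cnn(images).flatten(1)       # (B, 2048)
        lbl_emb = self.gcn(word_vectors)             # (C, 2048)
        fused = self.fusion(img_feat, lbl_emb)       # (B, C, 1000)
        return self.classifier(fused).squeeze(-1)    # (B, C) multi-label logits
```

In this sketch the logits would be trained with a multi-label binary cross-entropy loss (e.g. BCEWithLogitsLoss), with the label word vectors and co-occurrence matrix precomputed from the training set, which is the usual setup for GCN-based multi-label classifiers.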