Keywords
Concatenation
Convolutional neural network
Pattern recognition
Artificial intelligence
Computer science
Deep learning
Modality
Multimodal fusion
Feature extraction
Image fusion
Authors
Xiaoyu He, Yong Wang, Shuang Zhao, Xiang Chen
Identifier
DOI: 10.1016/j.patcog.2022.108990
Abstract
Recently, multimodal image-based methods have shown strong performance in skin cancer diagnosis. These methods usually use convolutional neural networks (CNNs) to extract the features of two modalities (i.e., dermoscopy and clinical images) and fuse these features for classification. However, they commonly have two shortcomings: 1) the feature extraction processes of the two modalities are independent and lack cooperation, which may limit the representation ability of the extracted features, and 2) the multimodal fusion operation is a simple concatenation followed by convolutions, which yields coarse fusion features. To address these two issues, we propose a co-attention fusion network (CAFNet), which uses two branches to extract the features of dermoscopy and clinical images and a hyper-branch to refine and fuse these features at all stages of the network. Specifically, the hyper-branch is composed of multiple co-attention fusion (CAF) modules. In each CAF module, we first design a co-attention (CA) block with a cross-modal attention mechanism to enable cooperation between the two modalities, which enhances the representation ability of the extracted features through mutual guidance. Following the CA block, we further propose an attention fusion (AF) block that dynamically selects appropriate fusion ratios to conduct pixel-wise multimodal fusion, generating fine-grained fusion features. In addition, we propose a deep-supervised loss and a combined prediction method to obtain more robust prediction results. Experimental results show that CAFNet achieves an average accuracy of 76.8% on the seven-point checklist dataset and outperforms state-of-the-art methods.
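The abstract does not give implementation details, but the described CAF module (a co-attention block for cross-modal guidance followed by an attention fusion block that gates a pixel-wise fusion ratio) can be illustrated with a minimal PyTorch sketch. All layer shapes, the 1x1-convolution attention form, and the sigmoid gate are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """CA block sketch: each modality's features are re-weighted by a
    spatial attention map computed from the other modality."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convs producing per-pixel attention maps (assumed form)
        self.attn_d = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.attn_c = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, f_derm, f_clin):
        # dermoscopy features guided by clinical features, and vice versa
        f_derm_ref = f_derm * self.attn_c(f_clin)
        f_clin_ref = f_clin * self.attn_d(f_derm)
        return f_derm_ref, f_clin_ref

class AttentionFusion(AFBase := nn.Module):
    """AF block sketch: a learned gate picks a per-pixel, per-channel
    fusion ratio between the two refined modalities."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, f_derm, f_clin):
        ratio = self.gate(torch.cat([f_derm, f_clin], dim=1))  # values in [0, 1]
        return ratio * f_derm + (1 - ratio) * f_clin

class CAFModule(nn.Module):
    """One hyper-branch stage: co-attention followed by attention fusion."""
    def __init__(self, channels):
        super().__init__()
        self.ca = CoAttention(channels)
        self.af = AttentionFusion(channels)

    def forward(self, f_derm, f_clin):
        f_derm, f_clin = self.ca(f_derm, f_clin)
        return self.af(f_derm, f_clin)

# Toy usage: fuse 64-channel feature maps from the two CNN branches.
caf = CAFModule(64)
fused = caf(torch.randn(2, 64, 56, 56), torch.randn(2, 64, 56, 56))
print(fused.shape)  # torch.Size([2, 64, 56, 56])
```

In this reading, the gate's output degenerates to plain averaging when it saturates at 0.5 everywhere, so the learned ratio is what distinguishes the AF block from the simple concatenation-plus-convolution fusion the paper criticizes.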