Keywords
Concatenation
Convolutional neural network
Pattern recognition
Artificial intelligence
Computer science
Deep learning
Modality
Multimodal fusion
Feature extraction
Image fusion
Authors
Xiaoyu He, Yong Wang, Shuang Zhao, Xiang Chen
Identifier
DOI: 10.1016/j.patcog.2022.108990
Abstract
Recently, multimodal image-based methods have shown strong performance in skin cancer diagnosis. These methods usually use convolutional neural networks (CNNs) to extract the features of two modalities (i.e., dermoscopy and clinical images) and fuse these features for classification. However, they commonly have two shortcomings: 1) the feature extraction processes of the two modalities are independent and lack cooperation, which may limit the representation ability of the extracted features, and 2) the multimodal fusion operation is a simple concatenation followed by convolutions, which yields coarse fusion features. To address these two issues, we propose a co-attention fusion network (CAFNet), which uses two branches to extract the features of dermoscopy and clinical images and a hyper-branch to refine and fuse these features at all stages of the network. Specifically, the hyper-branch is composed of multiple co-attention fusion (CAF) modules. In each CAF module, we first design a co-attention (CA) block with a cross-modal attention mechanism to enable cooperation between the two modalities, which enhances the representation ability of the extracted features through mutual guidance. Following the CA block, we further propose an attention fusion (AF) block that dynamically selects appropriate fusion ratios to conduct pixel-wise multimodal fusion, generating fine-grained fusion features. In addition, we propose a deep-supervised loss and a combined prediction method to obtain more robust prediction results. Experimental results show that CAFNet achieves an average accuracy of 76.8% on the seven-point checklist dataset and outperforms state-of-the-art methods.
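The abstract does not give implementation details, but the described CAF module (a co-attention block for cross-modal guidance followed by an attention fusion block that gates a pixel-wise fusion ratio) can be illustrated with a minimal PyTorch sketch. All layer shapes, the 1x1-convolution attention form, and the sigmoid gate are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """CA block sketch: each modality's features are re-weighted by a
    spatial attention map computed from the other modality."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convs producing per-pixel attention maps (assumed form)
        self.attn_d = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.attn_c = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, f_derm, f_clin):
        # dermoscopy features guided by clinical features, and vice versa
        f_derm_ref = f_derm * self.attn_c(f_clin)
        f_clin_ref = f_clin * self.attn_d(f_derm)
        return f_derm_ref, f_clin_ref

class AttentionFusion(AFBase := nn.Module):
    """AF block sketch: a learned gate picks a per-pixel, per-channel
    fusion ratio between the two refined modalities."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, f_derm, f_clin):
        ratio = self.gate(torch.cat([f_derm, f_clin], dim=1))  # values in [0, 1]
        return ratio * f_derm + (1 - ratio) * f_clin

class CAFModule(nn.Module):
    """One hyper-branch stage: co-attention followed by attention fusion."""
    def __init__(self, channels):
        super().__init__()
        self.ca = CoAttention(channels)
        self.af = AttentionFusion(channels)

    def forward(self, f_derm, f_clin):
        f_derm, f_clin = self.ca(f_derm, f_clin)
        return self.af(f_derm, f_clin)

# Toy usage: fuse 64-channel feature maps from the two CNN branches.
caf = CAFModule(64)
fused = caf(torch.randn(2, 64, 56, 56), torch.randn(2, 64, 56, 56))
print(fused.shape)  # torch.Size([2, 64, 56, 56])
```

In this reading, the gate's output degenerates to plain averaging when it saturates at 0.5 everywhere, so the learned ratio is what distinguishes the AF block from the simple concatenation-plus-convolution fusion the paper criticizes.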