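Convolutional neural network
Transformer
Computer science
Hyperparameter
Artificial intelligence
De facto
Medical imaging
Machine learning
Pattern recognition (psychology)
Computer vision
Engineering
Political science
Electrical engineering
Voltage
Law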
Authors
Christos Matsoukas,Johan Fredin Haslum,Magnus Söderberg,Kevin Smith
Source
Journal: Cornell University - arXiv
Date: 2021-01-01
Citations: 72
Identifier
DOI:10.48550/arxiv.2108.09038
Abstract
Convolutional Neural Networks (CNNs) have reigned for a decade as the de facto approach to automated medical image diagnosis. Recently, vision transformers (ViTs) have appeared as a competitive alternative to CNNs, yielding similar levels of performance while possessing several interesting properties that could prove beneficial for medical imaging tasks. In this work, we explore whether it is time to move to transformer-based models or if we should keep working with CNNs: can we trivially switch to transformers? If so, what are the advantages and drawbacks of switching to ViTs for medical image diagnosis? We consider these questions in a series of experiments on three mainstream medical image datasets. Our findings show that, while CNNs perform better when trained from scratch, off-the-shelf vision transformers using default hyperparameters are on par with CNNs when pretrained on ImageNet, and outperform their CNN counterparts when pretrained using self-supervision.