Computer science
Artificial intelligence
Machine learning
Task (project management)
Boosting (machine learning)
Supervised learning
Pattern recognition (psychology)
Deep learning
Artificial neural network
Economics
Management
Authors
Ozan Ciga, Tengteng Xu, Anne L. Martel
Identifiers
DOI: 10.1016/j.mlwa.2021.100198
Abstract
Unsupervised learning has been a long-standing goal of machine learning and is especially important for medical image analysis, where the learning can compensate for the scarcity of labeled datasets. A promising subclass of unsupervised learning is self-supervised learning, which aims to learn salient features using the raw input as the learning signal. In this work, we tackle the issue of learning domain-specific features without any supervision to improve performance on multiple tasks of interest to the digital histopathology community. We apply a contrastive self-supervised learning method to digital histopathology by collecting and pretraining on 57 histopathology datasets without any labels. We find that combining multiple multi-organ datasets with different types of staining and resolution properties improves the quality of the learned features. Furthermore, we find that using more images for pretraining leads to better performance in multiple downstream tasks, although there are diminishing returns as more unlabeled images are incorporated into the pretraining. Linear classifiers trained on top of the learned features show that networks pretrained on digital histopathology datasets perform better than ImageNet-pretrained networks, improving F1 scores by more than 28% on average across tasks. Interestingly, we did not observe a consistent correlation between the pretraining dataset's site or organ and downstream task performance (e.g., pretraining with only breast images does not necessarily lead to superior downstream performance on breast-related tasks). These findings may also be useful when applying newer contrastive techniques to histopathology data. Pretrained PyTorch models are made publicly available at https://github.com/ozanciga/self-supervised-histopathology.
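The linear-evaluation protocol described in the abstract (a linear classifier trained on top of frozen, self-supervised features) is straightforward to reproduce in PyTorch. The sketch below is illustrative only: the checkpoint filename `tenpercent_resnet18.ckpt`, the state-dict key prefixes, and the ResNet18 backbone are assumptions not confirmed by the abstract itself; consult the linked repository's README for the actual release format.

```python
import torch
import torch.nn as nn
import torchvision

# Load a pretrained backbone. Filename and state-dict layout below are
# hypothetical; check the repository for the actual checkpoint format.
backbone = torchvision.models.resnet18(weights=None)
state = torch.load("tenpercent_resnet18.ckpt", map_location="cpu")
# Checkpoints saved by training frameworks often nest weights under a
# "state_dict" key and prefix parameter names; strip these if present.
state = state.get("state_dict", state)
state = {k.replace("model.", "").replace("resnet.", ""): v for k, v in state.items()}
backbone.load_state_dict(state, strict=False)

# Freeze the backbone so only the linear probe is trained, mirroring the
# linear-evaluation setup used to compare against ImageNet pretraining.
for p in backbone.parameters():
    p.requires_grad = False
num_classes = 4  # example downstream task
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # trainable head
backbone.eval()  # keep BatchNorm statistics fixed during probing

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch standing in for 224x224 tissue tiles.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_classes, (8,))
loss = criterion(backbone(x), y)
loss.backward()
optimizer.step()
```

Because only `backbone.fc` receives gradients, this setup measures the quality of the pretrained representation itself rather than fine-tuning capacity, which is the comparison behind the reported 28% average F1 improvement over ImageNet initialization.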