Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction

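Generalization, Computer science, Pharmacogenomics, Consistency (knowledge bases), Transfer of learning, Labelled data, Drug response, Distribution (mathematics), Domain (mathematical analysis), Artificial intelligence, Machine learning, Data mining, Drug, Mathematics, Bioinformatics, Medicine, Biology, Mathematical analysis, Psychiatry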

Authors

Hossein Sharifi-Noghabi,Parsa Alamzadeh Harjandi,Olga Zolotareva,Colin C. Collins,Martin Ester

Source

Journal: Nature Machine Intelligence [Springer Nature]
Date: 2021-11-11; Volume (issue): 3 (11): 962-972; Cited by: 21


Abstract

Data discrepancy between preclinical and clinical datasets poses a major challenge for accurate drug response prediction based on gene expression data. Various transfer learning methods have been proposed to address such data discrepancy in drug response prediction for different cancers. These methods generally use cell lines as source domains, and patients, patient-derived xenografts or other cell lines as target domains; however, they assume access to the target domain during training or fine-tuning, and they can only take labelled source domains as input. The former is a strong assumption that is not satisfied when these models are deployed in the clinic, whereas the latter means these methods rely on labelled source domains that are of limited size. To avoid these assumptions, we formulate drug response prediction in cancer as an out-of-distribution generalization problem, which does not assume that the target domain is accessible during training. Moreover, to exploit unlabelled source domain data, which tends to be much more plentiful than labelled data, we adopt a semi-supervised approach. We propose Velodrome, a semi-supervised method of out-of-distribution generalization that takes labelled and unlabelled data from different resources as input and makes generalizable predictions. Velodrome achieves this goal by introducing an objective function that combines a supervised loss for accurate prediction, an alignment loss for generalization and a consistency loss to incorporate unlabelled samples. Our experimental results demonstrate that Velodrome outperforms state-of-the-art pharmacogenomics and transfer learning baselines on cell lines, patient-derived xenografts and patients. Finally, we show that Velodrome models generalize to different tissue types that were well-represented, under-represented or completely absent in the training data. Overall, our results suggest that Velodrome may guide precision oncology more accurately.
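The abstract names the three loss terms but not their exact form. The following is a minimal NumPy sketch of how such a composite objective could be evaluated. It assumes, hypothetically, a shared one-layer encoder, one linear prediction head per labelled source domain, mean squared error as the supervised loss, a CORAL-style covariance-matching term as the alignment loss, and agreement between the two heads on unlabelled samples as the consistency loss. The function names, the weights `lambda_align` and `lambda_cons`, and every architectural choice here are illustrative assumptions, not Velodrome's published formulation.

```python
import numpy as np

# NOTE: every component below is a hypothetical stand-in chosen for illustration;
# it is not the published Velodrome architecture or loss definition.


def encode(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Shared encoder (assumed): one tanh layer mapping gene expression to features."""
    return np.tanh(x @ w)


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared difference between two arrays."""
    return float(np.mean((a - b) ** 2))


def coral_alignment(z_a: np.ndarray, z_b: np.ndarray) -> float:
    """CORAL-style alignment (assumed): squared Frobenius distance between the
    feature covariance matrices of two domains, scaled by 1 / (4 d^2)."""
    d = z_a.shape[1]
    cov_a = np.cov(z_a, rowvar=False)  # (d, d) covariance of domain A features
    cov_b = np.cov(z_b, rowvar=False)  # (d, d) covariance of domain B features
    return float(np.sum((cov_a - cov_b) ** 2) / (4 * d * d))


def combined_objective(params, x1, y1, x2, y2, x_unlab,
                       lambda_align=1.0, lambda_cons=1.0):
    """Evaluate supervised + weighted alignment + weighted consistency losses."""
    w, head1, head2 = params["w"], params["head1"], params["head2"]
    z1, z2, zu = encode(x1, w), encode(x2, w), encode(x_unlab, w)

    # Supervised loss: each head is fit on the labelled samples of its own source domain.
    sup = mse(z1 @ head1, y1) + mse(z2 @ head2, y2)
    # Alignment loss: pull the feature distributions of the two sources together.
    align = coral_alignment(z1, z2)
    # Consistency loss: both heads should agree on unlabelled samples (no labels used).
    cons = mse(zu @ head1, zu @ head2)

    total = sup + lambda_align * align + lambda_cons * cons
    return {"supervised": sup, "alignment": align, "consistency": cons, "total": total}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_genes, n_hidden = 50, 8
    params = {
        "w": rng.normal(scale=0.1, size=(n_genes, n_hidden)),
        "head1": rng.normal(size=n_hidden),
        "head2": rng.normal(size=n_hidden),
    }
    x1, y1 = rng.normal(size=(40, n_genes)), rng.normal(size=40)  # labelled source 1
    x2, y2 = rng.normal(size=(30, n_genes)), rng.normal(size=30)  # labelled source 2
    x_unlab = rng.normal(size=(200, n_genes))                     # unlabelled pool
    print(combined_objective(params, x1, y1, x2, y2, x_unlab,
                             lambda_align=0.5, lambda_cons=0.1))
```

This only evaluates the objective; training would minimize `total` with an automatic-differentiation framework. Setting `lambda_cons` to zero discards the unlabelled pool, reducing the sketch to a setting that uses only labelled source data.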
