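Computer science
Categorical variable
Population
Genomics
Preprocessor
Artificial intelligence
Machine learning
Data mining
Biology
Genetics
Genome
Gene
Demography
Sociology
Authors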
Honggang Zhao,Wenlu Wang
Identifier
DOI:10.1109/icdmw58026.2022.00052
Abstract
Many factors affect trait prediction from genotype data. A major confounding factor is population structure among the sampled individuals, known as population stratification. When present, it biases quantitative phenotype prediction, which prevents unambiguous conclusions about the predictions and limits downstream uses such as disease evaluation or epidemiological surveys. Population stratification is an implicit bias that cannot be easily removed by data preprocessing. To train a phenotype prediction model, we propose an adversarial training framework that ensures the genomics encoder is agnostic to sample populations. For better generalization, our adversarial training framework is orthogonal to the genomics encoder and the phenotype prediction model. We experimentally validate our debiasing framework on a real-world yield (phenotype) prediction dataset of soybean genomics. The framework is designed for general genomic data (e.g., human, livestock, and crops), and the phenotype can be either a continuous or a categorical variable.
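The abstract does not say how the adversarial objective is implemented, so the following is only a minimal sketch of one common realization, gradient reversal, and should not be read as the authors' method. A linear encoder feeds both a phenotype predictor and a population discriminator; the discriminator learns to recover the population label, while the encoder receives the negated discriminator gradient so that its representation carries less population signal. Everything here is an illustrative assumption: the simulated two-population genotypes, the function names `make_toy_data` and `train`, the linear layers, and the hyperparameters `k`, `lam`, and `lr`.

```python
import numpy as np


def make_toy_data(n=200, n_snps=20, seed=0):
    """Simulate 0/1/2 allele counts for two populations (toy data, assumed)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=n)  # population label per individual
    # Allele frequencies differ by population: this is the stratification.
    freqs = np.where(pop[:, None] == 1, 0.7, 0.3) * np.ones((n, n_snps))
    X = rng.binomial(2, freqs).astype(float)
    effects = rng.normal(size=n_snps)
    # The phenotype depends on genotype plus a population-level offset.
    y = X @ effects + 2.0 * pop + rng.normal(scale=0.1, size=n)
    X = X - X.mean(axis=0)          # center genotypes
    y = (y - y.mean()) / y.std()    # standardize phenotype
    return X, y, pop.astype(float)


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def train(X, y, pop, k=4, lam=0.5, lr=0.01, steps=500, seed=1):
    """Jointly train encoder, predictor, and discriminator with gradient reversal."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, k))   # genomics encoder
    w_pred = rng.normal(scale=0.1, size=k)       # phenotype predictor
    w_disc = rng.normal(scale=0.1, size=k)       # population discriminator
    pred_losses = []
    for _ in range(steps):
        Z = X @ W_enc                 # encoded representation
        y_hat = Z @ w_pred            # phenotype prediction
        p = sigmoid(Z @ w_disc)       # predicted P(population == 1)
        pred_losses.append(float(np.mean((y_hat - y) ** 2)))

        g_y = 2.0 * (y_hat - y) / n   # d(MSE) / d(y_hat)
        g_l = (p - pop) / n           # d(cross-entropy) / d(logit)

        grad_w_pred = Z.T @ g_y
        grad_w_disc = Z.T @ g_l
        # Gradient reversal: the encoder descends the prediction loss but
        # ascends the discriminator loss, scaled by lam.
        grad_Z = np.outer(g_y, w_pred) - lam * np.outer(g_l, w_disc)
        grad_W_enc = X.T @ grad_Z

        w_pred -= lr * grad_w_pred
        w_disc -= lr * grad_w_disc
        W_enc -= lr * grad_W_enc
    return W_enc, w_pred, w_disc, pred_losses


if __name__ == "__main__":
    X, y, pop = make_toy_data()
    W_enc, w_pred, w_disc, losses = train(X, y, pop)
    print("prediction MSE: first step", losses[0], "last step", losses[-1])
```

Because the adversarial term only touches the encoder's gradient, the encoder and predictor could in principle be swapped for other architectures without changing the debiasing logic, which is one way to read the abstract's claim that the framework is orthogonal to those components. Setting `lam=0` in this sketch recovers ordinary training with no debiasing.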