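Principal component analysis
Pruning
Dimensionality reduction
Linkage disequilibrium
Computer science
Expression (computer science)
Selection (genetic algorithm)
Feature selection
Gene
Computational biology
Artificial intelligence
Biology
Genetics
Genotype
Haplotype
Agronomy
Programming language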
Authors
Jeremy Watts, Elexis Allen, Ahmad Mitoubsi, Anahita Khojandi, James Eales, Theodore Papamarkou
Identifier
DOI:10.1109/embc40787.2023.10340962
Abstract
The majority of genes have a genetic component to their expression. Elastic nets have been shown to be effective at predicting tissue-specific, individual-level gene expression from genotype data. We apply principal component analysis (PCA), linkage disequilibrium pruning, or a combination of the two to reduce the genetic variants used as inputs to the elastic net models, or to generate a lower-dimensional representation of them, for the prediction of gene expression. Our results show that, in general, elastic nets attain their best performance when all genetic variants are included as inputs; however, a relatively low number of principal components can effectively summarize the majority of genetic variation while reducing the overall computation time. Specifically, 100 principal components reduce the computation time of the models by over 80% with only an 8% loss in R². Finally, linkage disequilibrium pruning does not effectively reduce the genetic variants for predicting gene expression. As predictive models are commonly built for over 27,000 genes in more than 50 tissues, PCA may provide an effective method for reducing the computational burden of gene expression analysis.
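To make the linkage disequilibrium pruning step mentioned in the abstract more concrete, the following is a minimal sketch of greedy pairwise pruning on a toy genotype matrix. It is not the authors' pipeline: the function names `r_squared` and `ld_prune`, the 0.8 threshold, and the toy data are all assumptions for illustration, and production tools typically restrict comparisons to a sliding genomic window rather than comparing every pair.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic variant: correlation is undefined, treat as uncorrelated.
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(variants, threshold=0.8):
    """Greedy pruning: keep a variant only if its r^2 with every
    previously kept variant is below the threshold (assumed value)."""
    kept = []
    for idx, genotype in enumerate(variants):
        if all(r_squared(genotype, variants[k]) < threshold for k in kept):
            kept.append(idx)
    return kept


# Hypothetical toy data: rows are variants, columns are individuals.
genotypes = [
    [0, 1, 2, 0, 1, 2],  # variant 0
    [0, 1, 2, 0, 1, 2],  # variant 1: identical to variant 0 (r^2 = 1)
    [2, 0, 1, 1, 2, 0],  # variant 2: weakly correlated with variant 0
    [0, 0, 1, 2, 2, 1],  # variant 3: uncorrelated with variant 0
]

kept = ld_prune(genotypes, threshold=0.8)
print(kept)  # [0, 2, 3]
```

In this toy case only the duplicated variant is removed. Pruning discards redundant variants outright, whereas PCA combines all variants into a smaller set of components.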