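Principal component analysis
Expression (computer science)
Data mining
Computer science
Statistical analysis
Field (mathematics)
Principal (computer security)
Data analysis
Multivariate analysis
Computational biology
Data science
Machine learning
Artificial intelligence
Biology
Statistics
Mathematics
Pure mathematics
Programming language
Operating system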
Authors
Hristo Todorov,David Fournier,Susanne Gerber
Source
Journal: Genomics and Computational Biology
[Kernel Press UG]
Date: 2018-01-30
Volume (issue): 4 (2): 100041-100041
Citations: 41
Identifiers
DOI:10.18547/gcb.2018.vol4.iss2.e100041
Abstract
Advances in computational power have enabled research to generate significant amounts of data related to complex biological problems. Consequently, applying appropriate data analysis techniques has become paramount to tackle this complexity. However, theoretical understanding of statistical methods is necessary to ensure that the correct method is used and that sound inferences are made based on the analysis. In this article, we elaborate on the theory behind principal components analysis (PCA), which has become a favoured multivariate statistical tool in the field of omics-data analysis. We discuss the necessary prerequisites and steps to produce statistically valid results and provide guidelines for interpreting the output. Using PCA on gene expression data from a mouse experiment, we demonstrate that the main distinctive pattern in the data is associated with the transgenic mouse line and is not related to the mouse gender. A weaker association of the pattern with the genotype was also identified.
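The abstract refers to the prerequisites and steps of PCA without listing them. As a rough illustration only, the sketch below shows one common way to carry out those steps: standardize each variable, form the correlation matrix, and extract the first principal component by power iteration. It is not the authors' code or data. The toy matrix (six hypothetical samples, three hypothetical expression variables) and all function names are assumptions made for this example, and the sketch assumes no variable has zero variance.

```python
import math
import random

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(z):
    """Sample covariance of already centered data (columns are variables)."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_component(cov, iters=500, seed=0):
    """Power iteration: leading eigenvalue and unit-length eigenvector."""
    rng = random.Random(seed)
    p = len(cov)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v

# Hypothetical toy data: rows are samples, columns are variables.
# Samples 0-2 and 3-5 stand in for two groups (for example, two mouse lines).
data = [
    [2.0, 1.9, 0.5],
    [2.2, 2.1, 0.3],
    [1.9, 2.0, 0.6],
    [5.0, 5.1, 0.4],
    [5.2, 4.9, 0.5],
    [4.9, 5.0, 0.3],
]

z = standardize(data)
cov = covariance(z)                      # equals the correlation matrix here
eigenvalue, loadings = first_component(cov)

# Share of total variance captured by PC1 (total = trace of the matrix).
explained = eigenvalue / sum(cov[j][j] for j in range(len(cov)))

# PC1 score of each sample: projection of its standardized row on the loadings.
scores = [sum(r[j] * loadings[j] for j in range(len(loadings))) for r in z]

print("PC1 loadings:", [round(x, 3) for x in loadings])
print("PC1 explained variance ratio:", round(explained, 3))
print("PC1 scores:", [round(s, 3) for s in scores])
```

In this toy setup the two groups fall on opposite sides of zero along PC1, which is the kind of pattern the abstract describes when it links the main component to the mouse line. The sign of a principal component is arbitrary, so only the separation matters, not which group is positive. For real omics data a library routine based on singular value decomposition would normally replace the power iteration.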