Trials and tribulations of ‘omics data analysis: assessing quality of SIMCA-based multivariate models using examples from pulmonary medicine

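Keywords: Univariate, False positive paradox, Omics, False discovery rate, Multivariate statistics, Bonferroni correction, Computer science, Preprocessor, Normalization (sociology), Data mining, Bayesian probability, Multiple comparisons problem, Count data, Statistics, Bioinformatics, Machine learning, Artificial intelligence, Mathematics, Biology, Poisson distribution, Biochemistry, Sociology, Gene, Anthropology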
Authors
Åsa M. Wheelock, Craig E. Wheelock
Source
Journal: Molecular BioSystems [Royal Society of Chemistry]
Volume (issue): 9 (11): 2589-2589  Citations: 288
Identifier
DOI:10.1039/c3mb70194h
Abstract

Respiratory diseases are multifactorial, heterogeneous diseases that have proved recalcitrant to understanding with focused molecular techniques. This has led to the rise of 'omics approaches (e.g., transcriptomics, proteomics) and the subsequent acquisition of large-scale datasets consisting of many variables. In 'omics-based investigations, the discrepancy between the number of variables analyzed (e.g., mRNAs, proteins, metabolites) and the number of study subjects constitutes a major statistical challenge. Applying traditional univariate statistical methods (e.g., the t-test) to these "short-and-wide" datasets may produce high numbers of false positives, while the predominant remedy, p-value correction (e.g., FDR, Bonferroni), is associated with a substantial loss of statistical power. In other words, the benefit of fewer false positives must be weighed against a concomitant loss of true positives. As an alternative, multivariate statistical analysis (MVA) is increasingly employed to cope with 'omics-based data structures. When properly applied, MVA approaches can be powerful tools for integrating and interpreting complex 'omics datasets with the goal of identifying biomarkers and/or subphenotypes. However, MVA methods are also prone to over-interpretation and misuse. A software package commonly used in biomedical research for MVA is SIMCA, which includes multiple MVA methods. In this opinion piece, we propose minimum reporting standards for a SIMCA-based workflow, covering data preprocessing (e.g., normalization, scaling) and model statistics (number of components, R2, Q2, and CV-ANOVA p-value). Examples of these applications from recent COPD and asthma studies are provided. Readers should gain an increased understanding of the power and utility of MVA methods for applications in biomedical research.
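The abstract contrasts unadjusted univariate testing with Bonferroni and FDR correction, and lists R2 and Q2 among the model statistics to report. The following is a minimal Python sketch of the textbook definitions only, not code from the paper or from SIMCA: Bonferroni (reject where p ≤ α/m), the Benjamini-Hochberg step-up FDR procedure, and the generic single-response R2 = 1 − RSS/TSS and Q2 = 1 − PRESS/TSS. The function names `bonferroni`, `benjamini_hochberg` and `r2_q2`, and all example numbers, are hypothetical. SIMCA computes Q2 per component and reports a cumulative value, so figures from this sketch need not match SIMCA output.

```python
"""Sketch: multiple-testing corrections and generic R2/Q2 (hypothetical helpers)."""
from typing import List, Sequence, Tuple


def bonferroni(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Bonferroni: reject H0 where p <= alpha / m (controls family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Find the largest rank k such that p_(k) <= (k / m) * alpha, then reject
    the k hypotheses with the smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank  # keep the largest rank that meets its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def r2_q2(y: Sequence[float], y_fit: Sequence[float],
          y_cv: Sequence[float]) -> Tuple[float, float]:
    """Generic single-response R2 and Q2.

    y_fit: predictions from the model fitted on all observations.
    y_cv:  cross-validated predictions (each observation predicted by a
           model fitted without it), used to compute PRESS.
    """
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)             # total sum of squares
    rss = sum((v - f) ** 2 for v, f in zip(y, y_fit))   # residual SS of the fit
    press = sum((v - c) ** 2 for v, c in zip(y, y_cv))  # predictive residual SS
    return 1 - rss / tss, 1 - press / tss


if __name__ == "__main__":
    # Hypothetical p-values for ten variables (not data from the paper).
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print("unadjusted:", sum(p <= 0.05 for p in pvals))        # 5
    print("Benjamini-Hochberg:", sum(benjamini_hochberg(pvals)))  # 2
    print("Bonferroni:", sum(bonferroni(pvals)))                # 1

    # Hypothetical observed, fitted, and cross-validated values.
    r2, q2 = r2_q2([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], [1.3, 1.7, 3.4, 3.5])
    print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")                      # R2 = 0.980, Q2 = 0.882
```

With these ten hypothetical p-values, five pass an unadjusted 0.05 cut-off, Benjamini-Hochberg keeps two and Bonferroni keeps one, which illustrates the trade-off the abstract describes; without ground truth, the sketch cannot say which of the discarded hits were true positives. Likewise, Q2 falling below R2 reflects the gap between fit and cross-validated prediction that the proposed reporting standards are meant to expose.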