False Discovery Rate Control via Data Splitting

Keywords: False discovery rate; Computer science; Mathematics; Data mining; Statistics; Biology; Biochemistry; Gene
Authors
Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu
Identifier
DOI:10.1080/01621459.2022.2060113
Abstract

Selecting relevant features associated with a given response variable is an important problem in many scientific fields. Quantifying the quality and uncertainty of a selection result via false discovery rate (FDR) control has been of recent interest. This article introduces a data-splitting method (referred to as "DS") to asymptotically control the FDR while maintaining high power. For each feature, DS constructs a test statistic by estimating two independent regression coefficients via data splitting. FDR control is achieved by exploiting a property of this statistic: for any null feature, its sampling distribution is symmetric about zero, whereas for a relevant feature, its sampling distribution has a positive mean. Furthermore, a Multiple Data Splitting (MDS) method is proposed to stabilize the selection result and boost the power. Surprisingly, with the FDR under control, MDS not only helps overcome the power loss caused by data splitting, but also results in a lower variance of the false discovery proportion (FDP) than all other methods under consideration. Extensive simulation studies and a real-data application show that the proposed methods are robust to the unknown distribution of features, easy to implement, and computationally efficient, and that they are often the most powerful among competitors, especially when the signals are weak and the correlations or partial correlations among features are high. Supplementary materials for this article are available online.
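
The single-split idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a low-dimensional linear model fitted by ordinary least squares on each half of the data. The abstract does not state the form of the test statistic or the cutoff rule, so the mirror-type statistic `sign(b1 * b2) * (|b1| + |b2|)` and the symmetry-based cutoff used here are assumptions, and the function names `mirror_statistics` and `select_features` are hypothetical.

```python
import numpy as np

def mirror_statistics(X, y, rng):
    """Split the rows into two halves, fit OLS on each half, and combine the
    two independent coefficient estimates into one statistic per feature.
    (Assumed form: sign(b1 * b2) * (|b1| + |b2|).)"""
    n = X.shape[0]
    perm = rng.permutation(n)
    half1, half2 = perm[: n // 2], perm[n // 2:]
    b1 = np.linalg.lstsq(X[half1], y[half1], rcond=None)[0]
    b2 = np.linalg.lstsq(X[half2], y[half2], rcond=None)[0]
    return np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

def select_features(m, q):
    """Return indices with m >= t, where t is the smallest candidate cutoff
    whose estimated FDP, #{m <= -t} / max(#{m >= t}, 1), is at most q.
    (Assumed cutoff rule: by symmetry of null statistics about zero, the
    left-tail count estimates the number of false positives.)"""
    for t in np.sort(np.abs(m[m != 0])):
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(m >= t)
    return np.array([], dtype=int)

# Synthetic example: the first k of p features are relevant.
rng = np.random.default_rng(0)
n, p, k = 400, 20, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 1.0
y = X @ beta + rng.standard_normal(n)

m = mirror_statistics(X, y, rng)
selected = select_features(m, q=0.1)
print("selected features:", selected)
```

For a null feature the two estimates agree or disagree in sign with equal probability, so its statistic is symmetric about zero; for a relevant feature both estimates tend to share the sign of the true coefficient, which makes the statistic large and positive. The cutoff rule relies only on that symmetry, which is why the count of large negative statistics can stand in for the unknown number of false positives.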