Computer science
Gaussian distribution
Artificial intelligence
Algorithm
Quantum mechanics
Physics
Authors
Asher Spector, Lucas Janson
Abstract
Model-X knockoffs (J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 (2018) 551–577) allows analysts to perform feature selection using almost any machine learning algorithm while provably controlling the expected proportion of false discoveries. This procedure involves constructing synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs, but, surprisingly, we prove this procedure can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates joint dependencies between the features and knockoffs, which allow machine learning algorithms to reconstruct the effect of the features on the response using the knockoffs. To improve power, we propose generating knockoffs which minimize the reconstructability (MRC) of the features, and we demonstrate our proposal for Gaussian features by showing it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a notion of estimation error in Gaussian linear models. Through extensive simulations, we show MRC knockoffs often dramatically outperform MAC-minimizing knockoffs, and we find no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a slight margin. We implement our methods and many others from the knockoffs literature in a new Python package, knockpy.
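The joint-dependency problem described in the abstract can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' code and not the knockpy API. It assumes the standard Gaussian knockoff construction, in which features and knockoffs have joint covariance G = [[Sigma, Sigma - S], [Sigma - S, Sigma]] with S = s * I, and it assumes that for exchangeable features with correlation rho the MAC-minimizing choice is s = min(1, 2 * (1 - rho)). The helper names, the values p = 5 and rho = 0.6, and the smaller comparison value s = 0.4 are all hypothetical; in particular, s = 0.4 is an arbitrary illustration and not the MRC solution.

```python
import numpy as np


def equicorrelated_cov(p, rho):
    """Covariance of p exchangeable features with pairwise correlation rho."""
    return (1 - rho) * np.eye(p) + rho * np.ones((p, p))


def joint_cov(Sigma, s):
    """Joint covariance of (features, knockoffs) for S = s * I (assumed form)."""
    S = s * np.eye(Sigma.shape[0])
    return np.block([[Sigma, Sigma - S], [Sigma - S, Sigma]])


def pair_sum_difference_variance(G, p, i, j):
    """Variance of (X_i + Xk_i) - (X_j + Xk_j) under joint covariance G."""
    a = np.zeros(2 * p)
    a[i] = a[p + i] = 1.0    # X_i + knockoff_i
    a[j] = a[p + j] = -1.0   # minus (X_j + knockoff_j)
    return float(a @ G @ a)


p, rho = 5, 0.6  # illustrative values only
Sigma = equicorrelated_cov(p, rho)

# Assumed MAC-minimizing choice for exchangeable features.
s_mac = min(1.0, 2 * (1 - rho))
G_mac = joint_cov(Sigma, s_mac)
eig_mac = np.linalg.eigvalsh(G_mac)
n_zero = int(np.sum(np.abs(eig_mac) < 1e-8))  # count of (numerically) zero eigenvalues
var_mac = pair_sum_difference_variance(G_mac, p, 0, 1)

# Arbitrary smaller s for comparison (NOT the MRC solution).
s_small = 0.4
G_small = joint_cov(Sigma, s_small)
min_eig_small = float(np.linalg.eigvalsh(G_small).min())
var_small = pair_sum_difference_variance(G_small, p, 0, 1)

print("zero eigenvalues of G under the MAC-minimizing s:", n_zero)
print("Var[(X_1 + Xk_1) - (X_2 + Xk_2)], MAC-minimizing s:", var_mac)
print("Var[(X_1 + Xk_1) - (X_2 + Xk_2)], smaller s:", var_small)
print("smallest eigenvalue of G under the smaller s:", min_eig_small)
```

Under these assumptions the joint covariance is singular for the MAC-minimizing s whenever rho >= 0.5: the sums X_j + Xk_j differ across j by a zero-variance quantity, which is one concrete form of the dependency that lets a model reconstruct feature effects from the knockoffs. A smaller s keeps the joint covariance well conditioned at the cost of higher feature-knockoff correlation.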