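Confounding
Observational study
Epidemiology
Selection bias
Statistics
Selection (genetic algorithm)
Sample size determination
Instrumental variable
Variable (mathematics)
Regression analysis
Econometrics
Contrast (vision)
Clinical study design
Computer science
Medicine
Mathematics
Clinical trial
Machine learning
Artificial intelligence
Mathematical analysis
Pathology
Internal medicine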
Authors
Christian Staerk,Alliyah U Byrd,Andreas Mayr
Abstract
Variable selection in regression models is a particularly important issue in epidemiology, where one usually encounters observational studies. In contrast to randomized trials or experiments, confounding is often not controlled by the study design, but has to be accounted for by suitable statistical methods. For instance, when the aim is to identify risk factors with unconfounded effect estimates, multivariable regression techniques can help to adjust for confounders. We investigated the current practice of variable selection in four major epidemiological journals in 2019 and found that the majority of articles used subject-matter knowledge to determine a priori the set of included variables. In comparison with previous reviews from 2008 and 2015, fewer articles applied data-driven variable selection. Furthermore, for most articles the main aim of analysis was hypothesis-driven effect estimation in rather low-dimensional data situations (i.e., large sample size compared to the number of variables). Based on our results, we discuss the role of data-driven variable selection in epidemiology.
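
The point about adjusting for confounders with multivariable regression can be illustrated with a small simulation. This is a minimal sketch and is not taken from the article: the data-generating model, the coefficient values, and the function name `simulate_and_fit` are all assumptions chosen for illustration. A confounder `z` influences both the exposure `x` and the outcome `y`; the crude regression of `y` on `x` alone is biased, while including `z` as a covariate recovers an estimate close to the assumed true effect.

```python
import numpy as np


def simulate_and_fit(n=5000, seed=0, true_effect=0.5):
    """Compare crude and confounder-adjusted OLS estimates on simulated data.

    All quantities are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)

    z = rng.normal(size=n)                            # confounder
    x = 0.8 * z + rng.normal(size=n)                  # exposure depends on z
    y = true_effect * x + z + rng.normal(size=n)      # outcome depends on x and z

    intercept = np.ones(n)

    # Crude model: y ~ x (confounder omitted)
    X_crude = np.column_stack([intercept, x])
    b_crude = np.linalg.lstsq(X_crude, y, rcond=None)[0][1]

    # Adjusted model: y ~ x + z (confounder included a priori)
    X_adj = np.column_stack([intercept, x, z])
    b_adj = np.linalg.lstsq(X_adj, y, rcond=None)[0][1]

    return b_crude, b_adj


if __name__ == "__main__":
    b_crude, b_adj = simulate_and_fit()
    print(f"crude estimate:    {b_crude:.3f}")
    print(f"adjusted estimate: {b_adj:.3f}")
```

In this sketch the adjustment set is fixed in advance, which mirrors the subject-matter-driven choice of covariates that the abstract reports as the majority practice; a data-driven selection procedure would instead choose the covariates from the data.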