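Overfitting
Regression
Regression analysis
Logistic regression
Predictive modeling
Sample size determination
Statistics
Machine learning
Medicine
Computer science
Econometrics
Data mining
Artificial intelligence
Mathematics
Artificial neural network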
Authors
Harrell FE, Lee KL, David B. Matchar, Reichert TA
Source
Journal: PubMed
Date: 1985-10-01
Volume (issue): 69 (10): 1071-77
Citations: 556
Abstract
Multiple regression models have wide applicability in predicting the outcome of patients with a variety of diseases. However, many researchers are using such models without validating the necessary assumptions. All too frequently, researchers also "overfit" the data by developing models using too many predictor variables and insufficient sample sizes. Models developed in this way are unlikely to stand the test of validation on a separate patient sample. Without attempting such a validation, the researcher remains unaware that overfitting has occurred. When the ratio of the number of patients suffering endpoints to the number of potential predictors is small (say less than 10), data reduction methods are available that can greatly improve the performance of regression models. Regression models can make more accurate predictions than other methods such as stratification and recursive partitioning, when model assumptions are thoroughly examined; steps are taken (i.e., choosing another model or transforming the data) when assumptions are violated; and the method of model formulation does not result in overfitting the data.
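The rule of thumb in the abstract (the ratio of patients suffering endpoints to potential predictors, with "say less than 10" as the warning level) can be illustrated with a minimal Python sketch. The function names `events_per_variable` and `needs_data_reduction` and the example counts are hypothetical and chosen only for illustration; the default threshold of 10 is taken from the abstract's wording and is a heuristic, not a guarantee.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Ratio of patients suffering endpoints to candidate predictors.

    Both arguments are counts supplied by the analyst; the names are
    illustrative and do not come from the paper.
    """
    if n_candidate_predictors <= 0:
        raise ValueError("n_candidate_predictors must be positive")
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    return n_events / n_candidate_predictors


def needs_data_reduction(n_events: int,
                         n_candidate_predictors: int,
                         threshold: float = 10.0) -> bool:
    """Flag a small ratio, for which the abstract suggests data reduction.

    The default threshold of 10 follows the abstract's "say less than 10".
    """
    return events_per_variable(n_events, n_candidate_predictors) < threshold


if __name__ == "__main__":
    # Hypothetical study: 60 patients with the endpoint, 12 candidate predictors.
    ratio = events_per_variable(60, 12)
    print(f"events per candidate predictor: {ratio:.1f}")
    print("consider data reduction:", needs_data_reduction(60, 12))
```

Note that the count in the numerator is the number of patients with the endpoint, not the total sample size, and the denominator counts all predictors considered, not only those retained in the final model.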