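Data set
Regression analysis
Computer science
Data mining
Set (abstract data type)
Measure (data warehouse)
Statistics
Cross-validation
Regression
Mathematics
Remainder
Algorithm
Arithmetic
Programming language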
Source
Journal: Technometrics
[Informa]
Date: 1977-11-01
Volume (issue): 19 (4): 415-428
Citations: 1174
Identifier
DOI: 10.1080/00401706.1977.10489581
Abstract
Methods to determine the validity of regression models include comparison of model predictions and coefficients with theory, collection of new data to check model predictions, comparison of results with theoretical model calculations, and data splitting or cross-validation, in which a portion of the data is used to estimate the model coefficients and the remainder of the data is used to measure the prediction accuracy of the model. An expository review of these methods is presented. It is concluded that data splitting is an effective method of model validation when it is not practical to collect new data to test the model. The DUPLEX algorithm, developed by R. W. Kennard, is recommended for dividing the data into the estimation set and prediction set when there is no obvious variable such as time to use as a basis to split the data. Several examples are included to illustrate the various methods of model validation.
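
The data-splitting idea in the abstract can be illustrated with a minimal Python sketch: fit the coefficients on an estimation set, then measure prediction accuracy on the held-out prediction set. This is only an illustration under stated assumptions, not the paper's procedure. It assumes a single predictor, uses a simple random split rather than the DUPLEX algorithm the paper recommends, and the function names (`fit_line`, `rmse`, `split_and_validate`) and the data are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def rmse(xs, ys, b0, b1):
    """Root mean squared prediction error of the fitted line on (xs, ys)."""
    squared_errors = [(y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def split_and_validate(xs, ys, frac=0.5, seed=0):
    """Randomly split the data, fit on one part, score on the other.

    ASSUMPTION: a random split stands in for DUPLEX, which is not
    implemented here.
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    cut = int(len(indices) * frac)
    est_idx, pred_idx = indices[:cut], indices[cut:]

    est_x = [xs[i] for i in est_idx]
    est_y = [ys[i] for i in est_idx]
    pred_x = [xs[i] for i in pred_idx]
    pred_y = [ys[i] for i in pred_idx]

    b0, b1 = fit_line(est_x, est_y)
    return {
        "b0": b0,
        "b1": b1,
        "estimation_rmse": rmse(est_x, est_y, b0, b1),
        "prediction_rmse": rmse(pred_x, pred_y, b0, b1),
    }


if __name__ == "__main__":
    # Made-up data for illustration only: a linear trend plus small noise.
    noise = random.Random(1)
    xs = [float(i) for i in range(20)]
    ys = [2.0 + 3.0 * x + noise.gauss(0.0, 0.5) for x in xs]
    print(split_and_validate(xs, ys))
```

A prediction-set error much larger than the estimation-set error would suggest that the fitted model does not generalize well to data it was not estimated from.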