Abstract
Metamodels, or surrogate models, have been proposed in the literature to reduce the resources (time/cost) invested in the design and optimization of engineering systems whose behavior is modeled using complex computer codes, in an area commonly known as simulation-based design optimization. Following the seminal paper of Sacks et al. (1989, “Design and Analysis of Computer Experiments,” Stat. Sci., 4(4), pp. 409–435), researchers have developed the field of design and analysis of computer experiments (DACE), focusing on different aspects of the problem such as experimental design, approximation methods, model fitting, model validation, and metamodeling-based optimization methods. Among these, model validation remains a key issue, as the reliability and trustworthiness of the results depend greatly on the quality of approximation of the metamodel. Typically, model validation involves calculating prediction errors of the metamodel using a data set different from the one used to build the model. Due to the high cost associated with computer experiments with simulation codes, validation approaches that do not require additional data points (samples) are preferable. However, it is documented that methods based on resampling, e.g., cross validation (CV), can exhibit oscillatory behavior during sequential/adaptive sampling and model refinement, thus making it difficult to quantify the approximation capabilities of the metamodels and/or to define rational stopping criteria for the metamodel refinement process. In this work, we present the results of a simulation experiment conducted to study the evolution of several error metrics during sequential model refinement, to estimate prediction errors, and to define proper stopping criteria without requiring additional samples beyond those used to build the metamodels. 
Our results show that it is possible to accurately estimate the predictive performance of Kriging metamodels without additional samples, and that leave-one-out CV errors perform poorly in this context. Based on our findings, we propose guidelines for choosing the sample size of computer experiments that use a sequential/adaptive model refinement paradigm. We also propose a stopping criterion for sequential model refinement that does not require additional samples.