Mahalanobis distance
Covariance
Partial least squares regression
Mean squared error
Estimator
Mathematics
Quantile
Statistics
Data set
Computer science
Set (abstract data type)
Quantile regression
Artificial intelligence
Pattern recognition (psychology)
Programming language
Authors
Xudong Huang,Guangzao Huang,Xiaojing Chen,Zhonghao Xie,Shujat Ali,Xi Chen,Lei‐ming Yuan,Wen Shi
Identifier
DOI:10.1016/j.chemolab.2024.105120
Abstract
Partial least squares (PLS) regression is a linear regression technique that performs well with high-dimensional regressors. Like many other supervised learning techniques, PLS is susceptible to the problem that the prediction and training data are drawn from different distributions, which degrades its performance. To address this problem, an adaptive strategy based on the minimum covariance determinant (MCD) estimator is proposed to improve the PLS model; it aims to find an appropriate training set for the adaptive construction of an accurate PLS model that fits the prediction data. In this study, the MCD estimator finds an h-subset of the merged prediction and training data with the smallest covariance determinant. Prediction and training samples whose Mahalanobis distances to the h-subset are less than or equal to a cutoff, taken as the square root of a quantile of the chi-squared distribution, are assumed to share the same distribution, and a PLS model is then built on those training samples. The proposed method is applied to three real-world datasets and compared with classic PLS. The most significant improvement is obtained for the m5 prediction data in the corn dataset, where the root mean square error of prediction (RMSEP) is reduced from 0.149 to 0.023; on the other datasets the method also outperforms PLS. The experimental results demonstrate the effectiveness of the method.