Optimizing machine learning models for predicting soil pH and total P in intact soil profiles with visible and near-infrared reflectance (VNIR) spectroscopy
Machine learning (ML) models have recently been used in visible and near-infrared reflectance (VNIR) spectroscopy applications. However, their predictive performance is data-specific and depends strongly on the selected hyperparameters. This study evaluated hyperparameter optimization methods for three ML models (cubist regression tree, Cubist; support vector machine regression, SVMR; and extreme gradient boosting, XGBoost) used to predict soil pH and total phosphorus (TP) in intact soil profiles to a depth of 100 ± 5 cm. VNIR spectra were recorded from nineteen intact soil profiles representing several typical soil types in China. To determine the optimal hyperparameters of these ML models, a new Bayesian optimization (BO) strategy was introduced and compared with the standard grid search (GS) approach. Model accuracy was compared with that of the partial least squares regression (PLSR) model in terms of the root mean square error (RMSE), the coefficient of determination (R²), and Lin's concordance correlation coefficient (LCC). Overall, the BO-based models performed similarly to the GS-based models for soil pH and TP predictions. However, the BO method tuned the hyperparameter values more efficiently and had a considerably lower computational cost than the GS method. The tested ML models performed better than the PLSR models in all cases. Among the three ML techniques, the SVMR model achieved the best performance in predicting soil pH and TP. When the SVMR model was applied to the testing set, the RMSE and R² for soil pH were 0.26–0.27 and 0.97, respectively, while those for TP were 0.06 g kg⁻¹ and 0.85–0.87, respectively. Both soil properties were predicted with excellent agreement (LCC ≥ 0.92). It can be concluded that the SVMR model coupled with the BO method is suitable for accurately predicting soil pH and TP in intact soil profiles with VNIR spectroscopy.
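The three accuracy metrics named above (RMSE, the coefficient of determination, and LCC) can be illustrated with a minimal Python sketch. It rests on several assumptions: the coefficient of determination is taken as 1 − SS_res/SS_tot (the study may instead use the squared correlation), LCC follows Lin's standard definition with population (1/n) variances, and the function names and the pH values are hypothetical illustrations, not data or code from the study.

```python
from math import sqrt
from statistics import fmean


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return sqrt(fmean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def lin_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population variances)."""
    mean_obs, mean_pred = fmean(observed), fmean(predicted)
    var_obs = fmean([(o - mean_obs) ** 2 for o in observed])
    var_pred = fmean([(p - mean_pred) ** 2 for p in predicted])
    covariance = fmean(
        [(o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)]
    )
    # The squared mean difference penalizes systematic bias, so a model can be
    # perfectly correlated with the observations and still score below 1.
    return 2.0 * covariance / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)


# Hypothetical soil pH values, for illustration only.
observed_ph = [5.0, 6.0, 7.0, 8.0]
biased_ph = [6.0, 7.0, 8.0, 9.0]  # every prediction is one pH unit too high

print(rmse(observed_ph, biased_ph))
print(r_squared(observed_ph, biased_ph))
print(lin_ccc(observed_ph, biased_ph))
```

In the biased example the predictions are perfectly correlated with the observations, yet LCC falls below 1 because it measures agreement with the 1:1 line rather than correlation alone. This is why LCC complements RMSE and the coefficient of determination when judging prediction quality.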