Zhihong Wang, Huanchen Wang, Tingxi Yu, Wuping Zhang, Jiwan Han, Fuzhong Li
Identifier
DOI:10.1117/12.2671691
Abstract
Genomic selection (GS), which estimates the genomic estimated breeding values (GEBVs) of individuals from high-density, genome-wide molecular markers combined with phenotypic records or pedigree information, has revolutionized animal and plant breeding. Support vector machines (SVMs) have proven to be an important method for implementing GS and show excellent prediction performance on a variety of traits, but the choice of hyperparameters and kernel function has a strong impact on that performance. In this study, we integrated four SVM kernel functions into a multiple kernel ensemble (MKE) learning framework and applied it alongside gradient boosting decision tree (GBDT), genomic best linear unbiased prediction (GBLUP), and random forest (RF) to predict GEBVs for three economic traits in German Holstein dairy cattle: milk fat percentage (MFP), milk yield (MY), and somatic cell score (SCS). We also constructed an Optuna hyperparameter optimization (HO) framework and compared its prediction performance and the time it needed to find the optimal parameters with those of two commonly used methods, grid search and random search. The results show that the MKE framework outperforms the single-kernel SVM as well as several other machine learning (ML) algorithms, with an average improvement of 10% in prediction accuracy across the three traits. In addition, the MKE framework with Optuna optimization has the best predictive performance on each trait. We therefore believe that MKE is an efficient and stable GS method for phenotype prediction.
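The multiple kernel ensemble idea can be illustrated with a minimal sketch. The abstract does not state which four kernels were used or how their outputs were integrated, so everything here is an assumption: the four kernels are scikit-learn's built-in `linear`, `poly`, `rbf`, and `sigmoid`, the ensemble is an unweighted mean of the four predictions, the genotype matrix is synthetic, accuracy is measured as the Pearson correlation between predicted and observed values, and the helper names `fit_kernel_models` and `ensemble_predict` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for genotype data (assumption, not the paper's data):
# 300 individuals, 200 SNP markers coded 0/1/2, additive effects plus noise.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 200)).astype(float)
true_effects = rng.normal(0.0, 0.1, size=200)
y = X @ true_effects + rng.normal(0.0, 0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumed kernel set: the four kernels built into scikit-learn's SVR.
KERNELS = ("linear", "poly", "rbf", "sigmoid")


def fit_kernel_models(X_train, y_train, kernels=KERNELS):
    """Fit one standardized SVR per kernel (hypothetical helper)."""
    models = {}
    for kernel in kernels:
        model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=1.0))
        model.fit(X_train, y_train)
        models[kernel] = model
    return models


def ensemble_predict(models, X):
    """Average the per-kernel predictions (assumed combination rule)."""
    preds = np.column_stack([m.predict(X) for m in models.values()])
    return preds.mean(axis=1)


models = fit_kernel_models(X_train, y_train)
y_hat = ensemble_predict(models, X_test)

# Pearson correlation between predicted and observed values.
accuracy = np.corrcoef(y_hat, y_test)[0, 1]
print(f"ensemble Pearson r = {accuracy:.3f}")
```

The hyperparameters are fixed at scikit-learn defaults here for brevity. In the setting the abstract describes, values such as `C` would instead be tuned per kernel by a hyperparameter optimizer such as Optuna, grid search, or random search.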