Authors
Shaopan Ye, Xiyi Zhou, Zhuojian Lai, Mhd Ikhwanuddin, Hongyu Ma
Abstract
Genotype imputation is an attractive approach for obtaining whole genome sequencing (WGS) data at low cost. However, the usability of imputed WGS data depends mainly on imputation accuracy. Balancing the factors that influence imputation accuracy is therefore crucial, particularly in aquaculture. In the present study, we downloaded 361 whole-genome resequencing datasets of Nile tilapia and constructed different reference panels to systematically determine the influence of several key factors on imputation accuracy: the reference panel type, the haplotype phasing and imputation software, the reference panel size, the strategy for selecting key individuals, and the composition of the combined reference panel. Imputation accuracy did not differ significantly (P = 0.3) among pre-phased data obtained from Beagle5, Eagle2, and Shapeit4, but Beagle5 had the highest computational efficiency. Among imputation software, Beagle5 and Impute5 were both more suitable for combined and external reference panels with a large reference size, whereas Minimac4 was suitable for internal reference panels, especially those with a small reference size. Furthermore, increasing the size of the reference panel always improved imputation accuracy; nevertheless, a larger combined reference panel did not necessarily result in higher imputation accuracy. When the number of external individuals increased from 5 to 250, the average imputation accuracy of the combined reference panel with Minimac4 decreased from 0.942 to 0.899, although it always remained higher than that of the internal reference panel (0.866). For constructing an internal reference panel, selecting key individuals by maximizing the expected genetic relationship (REL) always gave slightly higher accuracy than minimizing the average distance to the closest leaf (ADCL) or selecting individuals at random (RAN).
However, applying these selection strategies to choose internal or external individuals for a combined reference panel produced no gain, or even a loss, in imputation accuracy. In conclusion, a combined reference panel provided greater imputation accuracy, but the optimal genotype imputation strategy must be chosen by weighing the actual situation carefully and comprehensively. Our work sheds light on how to design and execute genotype imputation in aquaculture.