Abstract
Ensuring food security for an ever-growing population is a common mission and a great challenge for agricultural scientists worldwide. Historically, advances in crop breeding and management practices have contributed substantially to crop productivity. Indeed, the substantial increase in global grain yields over the last eight decades is largely due to the adoption of hybrids. However, the rate of increase in hybrid yields began to slow in the early 2000s and has since reached a plateau for many crops and regions (https://faostat.fao.org). We must therefore find ways to accelerate genetic gain and boost hybrid development, and new breeding technologies offer opportunities to do so.
Traditionally, parental lines of hybrids were obtained by continuous selfing with parallel selection for six to eight generations. Alternatively, in many crops, doubled-haploid (DH) technology can produce completely homozygous lines in two steps, haploid induction and chromosome doubling (Jacquier et al., 2021). Subsequent phenotypic selection of DH lines in hybrid breeding normally comprises DH seed increase, line per se observation, and one or two stages of testcross evaluation with one or more testers in a multi-environment trial (MET). However, as the number of DH lines increases, the MET may become excessively large and thus unaffordable. A key question in modern hybrid crop breeding is therefore how to evaluate DH lines effectively and efficiently.
Genomic selection (GS) evaluates individuals in a testing set based on genomic estimated breeding values obtained from a prediction model built on a training set (Crossa et al., 2017). With high-coverage molecular markers, this technology is suitable for quantitative trait prediction and can partially replace field testing with genotyping alone. Several biometric experimental designs (e.g., augmented design, α-lattice design) and statistical models (e.g., GBLUP, Bayesian models) have been implemented, accounting for different facets of genetic architecture and its interaction with environments (Crossa et al., 2017). Although the integration of DH and GS has been proposed, it has not been fully developed for the different breeding designs in the current hybrid breeding pipeline. Here, we propose the concept of GS evolution schemes (from GS 1.0 to GS 4.0) and briefly describe how GS can be integrated into the DH breeding pipeline, using maize as an example (Figure 1).
In GS 1.0, the training and testing sets come from the same full-sib DH population. Because the two sets are closely related, population structure is absent and linkage disequilibrium is high, which contributes to a satisfactory level of prediction accuracy (Krchov and Bernardo, 2015). This GS scheme is the first choice for many breeding programs without historical data, even though its application in breeding is largely restricted because: (1) breeders have to build a prediction model for each breeding population; (2) in many cases, the DH population is not large enough to establish a reasonably good prediction model; and (3) genotype × year interaction effects cannot be considered properly. Nevertheless, compared with phenotypic selection alone, this scheme can speed up selection in the following situations: (1) when populations have a large number of DH lines that cannot all be tested in the field; (2) when breeders want to repeat crosses with high expected breeding value, in which case the GS model can be established during the first run and used to predict further DH lines produced later; and (3) when seed availability varies significantly among DH lines, such that lines with sufficient seed can be testcrossed to train the GS model, which then predicts the lines without enough seed, thus saving one generation of seed increase.
In GS 2.0, a set of half-sib DH populations is pooled as a training set. For example, DH populations that have either parental line P1 or P2 as a parent can be pooled as a training population to predict the cross P1 × P2 (Brandariz and Bernardo, 2019). The composition of the training population is one of the key factors affecting prediction accuracy: accuracy has been shown to decline when full-sib DH lines were replaced by half-sib DH lines, whereas significantly better results were achieved when both parents of the test population were included in the training population (Riedelsheimer et al., 2013). There are two possible situations in GS 2.0 (Figure 1). In one, the DH population in the testing set differs from those in the training set. In the other, one DH population appears in both sets, which has the practical advantage that DH lines with enough seed go into the training set and the remaining lines into the testing set. GS 2.0 can overcome the small training population size of GS 1.0 by pooling several populations. However, as in GS 1.0, the main drawback of GS 2.0 is reduced heritability, and thus decreased prediction ability, because of limited testing environments.
In GS 3.0, the training and testing sets usually contain different breeding populations produced across years in a breeding program. Compared with GS 2.0, the training and testing sets in GS 3.0 normally share fewer common parents. However, in a maize breeding program, inbred lines can be traced back to a few key founder lines, and DH lines from different breeding populations across years are usually closely related. They can therefore be treated as partial full-sibs or half-sibs and used to train the GS model in GS 3.0 (Wang et al., 2020). As historical data accumulate, a large training population can be formed that covers much wider genetic and environmental diversity than in GS 1.0 and GS 2.0; thus, the prediction accuracy can be higher (Zhao et al., 2021). In addition, in GS 3.0 the genetic value of all newly produced DH lines can be estimated from historical data without seed increase and MET, significantly shortening the breeding cycle. GS 3.0 therefore has a strong impact not only on shortening the breeding cycle but also on saving costs, which will strongly encourage breeders to allocate more resources to genotyping while maintaining a reasonable size for phenotyping (Figure 1).
In GS 1.0, 2.0, and 3.0, DH line evaluation is usually based on the prediction of general combining ability, obtained by crossing the DH lines with a few testers. All selected DH lines are validated further in the field before being coded as inbred lines for hybrid development. There is a clear border between the line-development and hybrid-development phases. Most resources are usually invested in the first phase to select only a few to dozens of lines from thousands of DH lines. This strategy has two potential shortcomings: (1) outstanding combinations may be eliminated or never tested in the traditional procedure, and (2) single crosses between two good DH lines can be evaluated only after the testcross evaluation, which delays variety release. In GS 4.0, it is proposed that line evaluation and hybrid development be combined into one step through genomic prediction of early-stage single crosses, in which DH lines from one heterotic group are randomly crossed with those from the complementary heterotic group (Kadam et al., 2016).
Not only additive effects but also intralocus (dominance) and interlocus (epistasis) interaction effects, as well as genotype × environment interaction effects, can be considered in hybrid prediction (de Los Campos et al., 2020). Hybrid development can be greatly accelerated, as all possible combinations can be predicted from the beginning.
The evolution from GS 1.0 to GS 4.0 represents the path of a paradigm shift in hybrid breeding, driven by three major changes. First, in contrast to multi-step phenotypic selection, the GS strategy relies on a trained model to predict the performance of all DH lines first, followed by their validation in the field; hence, the concept changes from "selective breeding" to "predictive breeding" (Cooper et al., 2014). Second, the GS strategy can make use of historical breeding data, and the value of these data is well exemplified in the evolution from GS 1.0 through GS 4.0 (Figure 1). With the continued accumulation of data and breakthroughs in statistical methods, prediction accuracy should improve greatly through model optimization. Furthermore, this creates the opportunity to share accumulated data across breeding programs for open-source breeding (Xu et al., 2020).
Last, the GS strategy has a strong capacity to integrate line evaluation with hybrid development, which will completely revolutionize the breeding pipeline and shorten the breeding cycle.
Currently, the reported prediction accuracy for complex traits such as grain yield remains intermediate to low. Nevertheless, the integration of information technologies and the development of modern breeding techniques will bring new opportunities for GS. High-throughput, low-cost genotyping is the main driver of wide application of GS technology (Xu et al., 2020). In the future, genotyping technology, especially multiplexing technology, might accommodate as many samples and markers as possible in each genotyping run. Furthermore, advanced experimental designs such as sparse testing, together with corresponding models, have been developed to increase both testing capacity and prediction accuracy (Jarquin et al., 2020). With modern equipment for high-throughput phenotyping, such as that based on photographic and spectroscopic technologies, increasingly precise phenotypic information can be obtained. Selection will benefit from integrating information on secondary traits related to agronomic traits, such as disease resistance (Araus et al., 2018).
Genome editing is a powerful tool for engineering phenotypic variation by creating allelic variation rapidly and efficiently, especially when the targeted genetic variation has been completely depleted. Combining genome editing with GS to create and select new variants is very promising (Hickey et al., 2019). Finally, machine learning, one of the most powerful technologies in the era of artificial intelligence, can integrate multiple data sources and could greatly improve prediction accuracy (Yan et al., 2021).
In summary, DH technology has become an efficient method for line production in maize and will soon be widely available for other major crops. Nonetheless, high-throughput evaluation of DH lines remains a bottleneck, which can be largely resolved by integration with GS strategies. In addition, with the development of speed breeding, thousands of recombinant inbred lines (RILs) can also be produced in major crops, and the proposed GS schemes can thus be extended to RIL breeding to select superior RILs for their hybrids. Full adoption of GS in different breeding programs (GS 1.0 to GS 4.0) will bring a paradigm shift in hybrid crop breeding, fast-forwarding genetic gain to meet the challenge of attaining food security for the ever-increasing human population in the uncertain context of global climate change.
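To make the GS 1.0 workflow concrete (train a model on phenotyped DH lines, then predict genomic estimated breeding values for unphenotyped lines of the same population), a minimal sketch follows. It is an illustration under stated assumptions, not the method used in any of the cited studies: the data are simulated, markers are treated as independent and coded -1/+1 for the two homozygous DH genotypes, heritability is assumed to be 0.5, and ridge regression on marker effects stands in for the GBLUP-type models mentioned above. All function names and parameter values are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_marker_effects(geno, pheno, lam):
    """Estimate marker effects by solving (Z'Z + lam * I) b = Z'(y - mean(y))."""
    n, p = len(geno), len(geno[0])
    mu = sum(pheno) / n
    ztz = [[sum(geno[i][j] * geno[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    zty = [sum(geno[i][j] * (pheno[i] - mu) for i in range(n)) for j in range(p)]
    return mu, solve(ztz, zty)


def predict(geno, mu, effects):
    """Genomic estimated breeding value of each line: mean + sum of marker effects."""
    return [mu + sum(g * b for g, b in zip(line, effects)) for line in geno]


def pearson(x, y):
    """Pearson correlation, used here as the measure of prediction accuracy."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_gs1(seed=1, n_train=80, n_test=40, n_markers=30, h2=0.5):
    """Simulate one DH population, train on part of it, and predict the rest."""
    rng = random.Random(seed)
    n = n_train + n_test
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]
    # Assumption: independent markers, homozygous genotypes coded -1 / +1.
    geno = [[rng.choice((-1, 1)) for _ in range(n_markers)] for _ in range(n)]
    gv = [sum(g * b for g, b in zip(line, true_effects)) for line in geno]
    mean_gv = sum(gv) / n
    var_g = sum((v - mean_gv) ** 2 for v in gv) / n
    noise_sd = (var_g * (1.0 - h2) / h2) ** 0.5  # residual SD implied by assumed h2
    pheno = [v + rng.gauss(0.0, noise_sd) for v in gv]
    # Ridge penalty = residual variance / marker-effect variance under this coding.
    lam = n_markers * (1.0 - h2) / h2
    mu, effects = ridge_marker_effects(geno[:n_train], pheno[:n_train], lam)
    gebv = predict(geno[n_train:], mu, effects)
    # Accuracy: correlation of predictions with the simulated true genetic values.
    return pearson(gebv, gv[n_train:])


if __name__ == "__main__":
    print("prediction accuracy in the testing set:", round(simulate_gs1(), 3))
```

In practice, the training phenotypes would be testcross means from the MET and the genotypes would come from a marker array or sequencing. Pooling several half-sib populations into the training rows corresponds to GS 2.0, and using accumulated multi-year data corresponds to GS 3.0.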
This work was supported by the National Key Research and Development Program of China (2020YFE0202300), the Agricultural Science and Technology Innovation Program of CAAS (ZDRW202004), and the Project of Hainan Yazhou Bay Seed Lab (B21HJ0223).