Chapter 8
The Bayesian Paradigm in Molecular Phylogeny

Nicolas RODRIGUE, Carleton University, Ottawa, Canada

Book Author(s): Gilles Didier, Stéphane Guindon

First published: 12 April 2024
https://doi.org/10.1002/9781394284252.ch8

Summary

Probabilistic methods in molecular phylogeny were initially developed within a maximum likelihood (ML) framework. Accounting for multiple substitutions along a branch of a phylogenetic tree is a major advantage of these methods. This chapter discusses the technical limitations of the ML framework in building rich molecular evolutionary models, and how the computational development environment of Bayesian models overcomes them. It introduces the basic principles of Bayesian phylogenetic inference, the Monte Carlo sampling methods commonly used to approximate the probabilities involved, and possible ways to summarize the posterior distribution of model parameters. Using two examples, the chapter explains the principle of demarginalization, which often results in faster Monte Carlo sampling, as well as the implementation of substitution models with a non-analytic likelihood function. It also discusses possible areas for future research in Bayesian molecular phylogeny and the work needed to realize its full potential at the genomic scale.
Book: Models and Methods for Biological Evolution: Mathematical Models and Algorithms to Study Evolution