Authors
Yafei Mao, William T. Harvey, David Porubský, Katherine M. Munson, Kendra Hoekzema, Alexandra P. Lewis, Peter A. Audano, Allison N. Rozanski, Xiangyu Yang, Shilong Zhang, David Gordon, Xiaoxi Wei, Glennis A. Logsdon, Marina Haukness, Philip C. Dishuck, Hyeonsoo Jeong, Ricardo del Rosario, Vanessa L. Bauer, Will T. Fattor, Gregory K. Wilkerson, Qing Lü, Benedict Paten, Guoping Feng, Sara L. Sawyer, Wesley C. Warren, Lucia Carbone, Evan E. Eichler
Abstract
To better understand the pattern of primate genome structural variation, we used multiple long-read sequencing technologies to sequence and assemble the genomes of eight nonhuman primate species: New World monkeys (owl monkey and marmoset), an Old World monkey (macaque), Asian apes (orangutan and gibbon), and African apes (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Based on analysis of these primate lineages, we estimate that 819.47 Mbp, or ~27% of the genome, has been affected by SVs across 50 million years of primate evolution. We identify 1,607 structurally divergent regions (SDRs) in which recurrent structural variation contributes to SV hotspots where genes are recurrently lost (e.g., CARDs, ABCD7, OLAH) and where new lineage-specific genes are generated (e.g., CKAP2, NEK5) and have become targets of rapid chromosomal diversification and positive selection (e.g., RGPDs). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible to sequence-level analysis within and between primate species for the first time.