Authors
Zilin Li, Xihao Li, Hufeng Zhou, Sheila M. Gaynor, Margaret Sunitha Selvaraj, Theodore Arapoglou, Corbin Quick, Yaowu Liu, Han Chen, Ryan Sun, Rounak Dey, Donna K. Arnett, Lawrence F. Bielak, Joshua C. Bis, Thomas Blackwell, John Blangero, Eric Boerwinkle, Donald W. Bowden, Jennifer A. Brody, Brian E. Cade, Matthew P. Conomos, Adolfo Correa, L. Adrienne Cupples, Joanne E. Curran, Paul S. de Vries, Ravindranath Duggirala, Barry I. Freedman, Harald H.H. Göring, Xiuqing Guo, Rita R. Kalyani, Charles Kooperberg, B Král, Leslie A. Lange, Ani Manichaikul, Lisa W. Martin, Braxton D. Mitchell, May E. Montasser, Alanna C. Morrison, Take Naseri, Jeffrey R. O’Connell, Nicholette D. Palmer, Patricia A. Peyser, Bruce M. Psaty, Laura M. Raffield, Susan Redline, Alexander P. Reiner, Muagututi’a Sefuiva Reupena, Kenneth Rice, Stephen S. Rich, Jennifer A. Smith, Kent D. Taylor, Ramachandran S. Vasan, Daniel E. Weeks, James G. Wilson, Lisa R. Yanek, Wei Zhao, Jerome I. Rotter, Cristen J. Willer, Pradeep Natarajan, Gina M. Peloso, Xihong Lin
Abstract
Large-scale whole-genome sequencing studies have enabled the analysis of associations between noncoding rare variants (RVs) and complex human traits. Variant-set analysis is a powerful approach to studying RV associations, and a key component of it is the construction of RV sets for analysis. However, existing methods have limited ability to define analysis units in the noncoding genome. Furthermore, robust pipelines for comprehensive and scalable noncoding RV association analysis are lacking. Here we propose a computationally efficient noncoding RV association-detection framework that uses STAAR (variant-set test for association using annotation information) to group noncoding variants by functional category in gene-centric analysis. We also propose SCANG (scan the genome)-STAAR, which uses dynamic window sizes and incorporates multiple functional annotations in non-gene-centric analysis. We further develop STAARpipeline to perform flexible noncoding RV association analysis, including gene-centric analysis as well as fixed-window-based and dynamic-window-based non-gene-centric analysis. We apply STAARpipeline to identify noncoding RV sets associated with four quantitative lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several noncoding RV associations in an additional 9,123 TOPMed samples.