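Bottleneck
Computer science
Set (abstract data type)
Population
Data science
Biology
Domain (mathematical analysis)
Raw data
Throughput
Big data
Data mining
Wireless
Mathematical analysis
Sociology
Embedded system
Demography
Programming language
Telecommunications
Mathematics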
Authors
Bonnie Berger, Yun William Yu
Identifier
DOI:10.1038/s41576-022-00551-z
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell-to-cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that the analytical pipelines often struggle to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever-increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.

In this Review, Berger and Yu discuss how the sheer amounts of sequence data create bottlenecks in downstream analytical pipelines that must be overcome by new analysis strategies, each with its own trade-offs for properties such as speed, accuracy and applicability.