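Genome
Biology
Computational biology
Sequence assembly
Deep learning
Computer science
Identification (biology)
Theoretical computer science
Artificial intelligence
Genetics
Gene
Gene expression
Transcriptome
Botany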
Authors
Lovro Vrček, Xavier Bresson, Laurent Thomas, Martin Schmitz, Kenji Kawaguchi, Mile Šikić
Source
Journal: Genome Research
Publisher: Cold Spring Harbor Laboratory
Date: 2024-10-29
Volume/issue: gr.279307.124
Identifiers
DOI:10.1101/gr.279307.124
Abstract
The critical stage of every de novo genome assembler is identifying paths in assembly graphs that correspond to the reconstructed genomic sequences. The existing algorithmic methods struggle with this, primarily due to repetitive regions causing complex graph tangles, leading to fragmented assemblies. Here, we introduce GNNome, a framework for path identification based on geometric deep learning that enables training models on assembly graphs without relying on existing assembly strategies. By leveraging only the symmetries inherent to the problem, GNNome reconstructs assemblies from PacBio HiFi reads with contiguity and quality comparable to those of the state-of-the-art tools across several species. With every new genome assembled telomere-to-telomere, the amount of reliable training data at our disposal increases. Combining the straightforward generation of abundant simulated data for diverse genomic structures with the AI approach makes the proposed framework a plausible cornerstone for future work on reconstructing complex genomes with different ploidy and aneuploidy degrees. To facilitate such developments, we make the framework and the best-performing model publicly available, provided as a tool that can directly be used to assemble new haploid genomes.
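The abstract frames assembly as choosing paths through an assembly graph. The sketch below illustrates only that framing. It is not GNNome's actual decoding procedure. It assumes each overlap edge already carries a score (for example, an edge probability that a trained model might output), walks greedily along the best-scoring edges, and spells the resulting contig by merging overlapping reads. The reads, overlap lengths, and scores are made-up toy data. The names `greedy_walk` and `spell` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

# Toy overlap graph (hypothetical data). Nodes are reads; an edge u -> v means
# a suffix of read u overlaps a prefix of read v by OVERLAPS[(u, v)] bases.
READS: Dict[int, str] = {
    0: "ACGTAC",
    1: "TACGGA",
    2: "ACTTTG",  # spurious read reachable from 0, e.g. via a repeat
    3: "GGATTC",
    4: "TTCAAG",
}
OVERLAPS: Dict[Edge, int] = {(0, 1): 3, (0, 2): 2, (1, 3): 3, (2, 3): 1, (3, 4): 3}

# Hypothetical per-edge scores: estimated probability that an edge lies on the
# true genomic path. A trained model would produce these; here they are fixed.
SCORES: Dict[Edge, float] = {(0, 1): 0.9, (0, 2): 0.2, (1, 3): 0.8, (2, 3): 0.7, (3, 4): 0.95}


def greedy_walk(start: int, scores: Dict[Edge, float]) -> List[int]:
    """Repeatedly follow the highest-scoring edge to a not-yet-visited read."""
    successors: Dict[int, List[Tuple[float, int]]] = {}
    for (u, v), p in scores.items():
        successors.setdefault(u, []).append((p, v))

    path, visited, node = [start], {start}, start
    while True:
        candidates = [(p, v) for p, v in successors.get(node, []) if v not in visited]
        if not candidates:
            break  # dead end: no unvisited successor
        _, node = max(candidates)  # most probable next read
        path.append(node)
        visited.add(node)
    return path


def spell(path: List[int], reads: Dict[int, str], overlaps: Dict[Edge, int]) -> str:
    """Concatenate reads along the path, dropping each overlapping prefix."""
    seq = reads[path[0]]
    for u, v in zip(path, path[1:]):
        seq += reads[v][overlaps[(u, v)]:]
    return seq


if __name__ == "__main__":
    path = greedy_walk(0, SCORES)
    print(path)                          # [0, 1, 3, 4]
    print(spell(path, READS, OVERLAPS))  # ACGTACGGATTCAAG
```

A purely greedy walk commits to one edge at each branch and never backtracks. Its output is therefore only as good as the edge scores at repeat-induced branches such as the 0 -> 2 edge here.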