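Genome
Computer science
Sequence assembly
Theoretical computer science
Computational biology
Deep learning
Nanopore sequencing
Contiguity
Graph
Biology
Artificial intelligence
Genetics
Gene
Gene expression
Transcriptome
Operating system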
Authors
Lovro Vrček,Xavier Bresson,Laurent Thomas,Martin Schmitz,Kenji Kawaguchi,Mile Šikić
Identifier
DOI:10.1101/2024.03.11.584353
Abstract
The critical stage of every de novo genome assembler is identifying paths in assembly graphs that correspond to the reconstructed genomic sequences. The existing algorithmic methods struggle with this, primarily due to repetitive regions causing complex graph tangles, leading to fragmented assemblies. Here, we introduce GNNome, a framework for path identification based on geometric deep learning that enables training models on assembly graphs without relying on existing assembly strategies. By leveraging symmetries inherent to the problem, GNNome reconstructs assemblies with similar or superior contiguity compared to the state-of-the-art tools across several species, sequenced with PacBio HiFi or Oxford Nanopore. With every new genome assembled telomere-to-telomere, the amount of reliable training data at our disposal increases. Combining the straightforward generation of abundant simulated data for diverse genomic structures with the AI approach makes the proposed framework a plausible cornerstone for future work on reconstructing complex genomes with different ploidy and aneuploidy degrees. To facilitate such developments, we make the framework and the best-performing model publicly available, provided as a tool that can directly be used to assemble new haploid genomes.