Bacteriophage
Background (archaeology)
Genome
Computational biology
Biology
Computer science
Genetics
Gene
Escherichia coli
Paleontology
Identifier
DOI:10.1038/s41467-024-53759-4
Abstract
Inspired by the success of large language models (LLMs), we develop a long-context generative model for genomes. Our multiscale transformer model, megaDNA, is pre-trained on unannotated bacteriophage genomes with nucleotide-level tokenization. We demonstrate the foundational capabilities of our model including the prediction of essential genes, genetic variant effects, regulatory element activity and taxonomy of unannotated sequences. Furthermore, it generates de novo sequences up to 96 K base pairs, which contain potential regulatory elements and annotated proteins with phage-related functions.
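To make the idea of nucleotide-level tokenization concrete, the sketch below shows one plausible way to map a DNA string to integer tokens, one per base, before feeding it to a transformer. This is a minimal illustration only; the vocabulary, token IDs, and function names are assumptions for this example and are not taken from the megaDNA codebase.

# Minimal sketch of nucleotide-level tokenization for a genome language model.
# Vocabulary and token IDs are illustrative assumptions, not the megaDNA implementation.

from typing import List

# One token per nucleotide, plus special tokens for padding, start, end, and unknown bases.
VOCAB = {"<pad>": 0, "<bos>": 1, "<eos>": 2, "<unk>": 3,
         "A": 4, "C": 5, "G": 6, "T": 7}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

def encode(sequence: str) -> List[int]:
    """Map a DNA string to integer token IDs, one token per base."""
    ids = [VOCAB["<bos>"]]
    for base in sequence.upper():
        ids.append(VOCAB.get(base, VOCAB["<unk>"]))  # ambiguous bases (e.g. N) map to <unk>
    ids.append(VOCAB["<eos>"])
    return ids

def decode(ids: List[int]) -> str:
    """Map token IDs back to a DNA string, dropping special tokens."""
    return "".join(ID_TO_TOKEN[i] for i in ids if ID_TO_TOKEN[i] in "ACGT")

if __name__ == "__main__":
    tokens = encode("ATGCGTN")
    print(tokens)          # e.g. [1, 4, 7, 6, 5, 6, 7, 3, 2]
    print(decode(tokens))  # "ATGCGT"

Because each base becomes its own token, a 96K base-pair sequence corresponds to roughly 96,000 tokens, which is why the paper pairs this tokenization with a long-context, multiscale transformer rather than a standard fixed-window model.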