Bacteriophage
Context (archaeology)
Genome
Computer science
Computational biology
Biology
Genetics
Gene
Paleontology
Escherichia coli
Identifier
DOI:10.1101/2023.12.18.572218
Abstract
Inspired by the success of large language models, we develop a long-context generative model for genomes. Our multiscale transformer model was pre-trained on unannotated bacteriophage genomes with byte-level tokenization. We demonstrate the foundational capabilities of our model, including the prediction of essential genes, genetic variant effects, regulatory element activity, and the taxonomy of unannotated sequences. Furthermore, it generates de novo sequences of up to 96K base pairs, which contain functional regulatory elements and novel proteins with phage-related functions.