Tags
Foundation (evidence), Nucleic acid, Computer science, Computational biology, Chemistry, Biology, Biochemistry, Geography, Archaeology
Authors
Yong He, Pan Fang, Yongtao Shan, Yuanfei Pan, Yanhong Wei, Yichang Chen, Yi-Hao Chen, Yi Liu, Zhenyu Zeng, Zhan Zhou, Feng Zhu, Edward C. Holmes, Jieping Ye, Jun Li, Yuelong Shu, Mang Shi, Zhaorong Li
Identifier
DOI:10.1101/2024.05.10.592927
Abstract
In recent years, significant advancements have been observed in the domain of Natural Language Processing (NLP) with the introduction of pre-trained foundational models, paving the way for utilizing similar AI technologies to interpret the language of biology. In this research, we introduce "LucaOne", a novel pre-trained foundational model designed to integratively learn from the genetic and proteomic languages, encapsulating data from 169,861 species encompassing DNA, RNA, and proteins. This work illuminates the potential for creating a biological language model aimed at universal bioinformatics application. Remarkably, through few-shot learning, this model efficiently learns the central dogma of molecular biology and demonstrably outperforms competing models. Furthermore, in tasks requiring inputs of DNA, RNA, proteins, or a combination thereof, LucaOne exceeds the state-of-the-art performance using a streamlined downstream architecture, thereby providing empirical evidence and innovative perspectives on the potential of foundational models to comprehend complex biological systems.
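As an illustration of the "streamlined downstream architecture" idea described in the abstract, the sketch below shows one common pattern: a frozen pre-trained sequence encoder supplies token embeddings for a DNA/RNA/protein input, and only a small pooling-plus-linear head is trained for the downstream task. This is a minimal, hypothetical example and does not use LucaOne's actual API; the `FrozenEmbedder` class here is a randomly initialized stand-in for the real pre-trained model.

```python
# Hypothetical sketch: frozen foundation-model encoder + streamlined task head.
# Assumes PyTorch; FrozenEmbedder is a stand-in, not the LucaOne model itself.
import torch
import torch.nn as nn


class FrozenEmbedder(nn.Module):
    """Stand-in for a pre-trained bio-sequence foundation model (kept frozen)."""

    def __init__(self, vocab_size: int = 16, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        for p in self.parameters():
            p.requires_grad = False  # foundation-model weights are not updated

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, length) integer IDs -> (batch, length, dim) embeddings
        return self.encoder(self.embed(tokens))


class DownstreamHead(nn.Module):
    """Streamlined task head: mean-pool token embeddings, then classify."""

    def __init__(self, dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        pooled = token_embeddings.mean(dim=1)  # (batch, dim)
        return self.classifier(pooled)        # (batch, num_classes)


if __name__ == "__main__":
    embedder = FrozenEmbedder()
    head = DownstreamHead()
    # Toy batch: 4 sequences of length 32 drawn from a small token vocabulary.
    tokens = torch.randint(0, 16, (4, 32))
    logits = head(embedder(tokens))
    print(logits.shape)  # torch.Size([4, 2])
```

Only the head's parameters would be optimized during downstream training; tasks combining DNA, RNA, and protein inputs could concatenate or jointly pool the embeddings of each sequence before the linear layer.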