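Generative model
Generative grammar
Evolutionary biology
Modalities
Frontier
Function (biology)
Fluorescence
Computer science
Computational biology
Sequence (biology)
Molecular evolution
Fluorescent protein
Artificial intelligence
Biology
Green fluorescent protein
Geography
Genetics
Gene
Phylogenetics
Physics
Anthropology
Sociology
Archaeology
Quantum mechanics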
Authors
Thomas Hayes, Roshan Rao, Halil Akin, Nicholas J. Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, Vincent Q. Tran, Jonathan Deaton, Marius Wiggert, Rohil Badkundri, Irhum Shafkat, Jun Gong, Alexander Derry, R. Molina, Neil Thomas, Yousuf A. Khan, Chetan Mishra, Carolyn Kim, Liam J. Bartie
Identifier
DOI:10.1101/2024.07.01.600583
Abstract
More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to biological alignment. We have prompted ESM3 to generate fluorescent proteins with a chain of thought. Among the generations that we synthesized, we found a bright fluorescent protein at far distance (58% identity) from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.