Artificial intelligence
Unsupervised learning
Machine learning
Computer science
Protein structure prediction
Generative model
Protein tertiary structure
Biology
Protein structure
Biochemistry
Authors
Alexander Rives,Joshua Meier,Tom Sercu,Siddharth Goyal,Zeming Lin,Jason Liu,Demi Guo,Myle Ott,C. Lawrence Zitnick,Jerry Ma,Rob Fergus
Identifier
DOI:10.1073/pnas.2016239118
Abstract
Significance: Learning biological properties from sequence data is a logical step toward generative and predictive artificial intelligence for biology. Here, we propose scaling a deep contextual language model with unsupervised learning to sequences spanning evolutionary diversity. We find that without prior knowledge, information emerges in the learned representations on fundamental properties of proteins such as secondary structure, contacts, and biological activity. We show the learned representations are useful across benchmarks for remote homology detection, prediction of secondary structure, long-range residue–residue contacts, and mutational effect. Unsupervised representation learning enables state-of-the-art supervised prediction of mutational effect and secondary structure and improves state-of-the-art features for long-range contact prediction.
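The unsupervised objective described in the abstract is masked-language-model pretraining on raw amino-acid sequences: positions are hidden and the model learns to recover them from context. A minimal sketch of the masking step is below; the function name, 15% rate, and `<mask>` token follow common practice for such models and are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch (not the paper's code): BERT-style masking of a protein sequence,
# the training signal for a contextual protein language model.
import random

MASK = "<mask>"  # placeholder token; the actual vocabulary is an assumption

def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Hide a fraction of residues.

    Returns (masked_tokens, targets): targets[i] holds the original
    residue at masked positions and None elsewhere. The model is then
    trained to predict the original residue at each masked position.
    """
    rng = rng or random.Random(0)
    masked, targets = [], []
    for aa in seq:
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets.append(aa)
        else:
            masked.append(aa)
            targets.append(None)
    return masked, targets
```

After pretraining at scale, the hidden states of such a model are the "learned representations" the abstract reports as encoding secondary structure, contacts, and biological activity.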