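Computer science
Representation (politics)
Protein design
Construct (Python library)
Natural language processing
Artificial intelligence
Information retrieval
Protein structure
Programming language
Political science
Nuclear magnetic resonance
Politics
Physics
Law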
Authors
Shengchao Liu, Yutao Zhu, Jiarui Lu, Xu Zhao, Weili Nie, Anthony Gitter, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Citations: 17
Identifiers
DOI: 10.48550/arxiv.2302.04611
Abstract
Current AI-assisted protein design mainly utilizes protein sequence and structure information. Meanwhile, a tremendous amount of human-curated knowledge exists in text form, describing proteins' high-level functionalities. Yet whether incorporating such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three sequential steps: ProteinCLAP, which aligns the representations of the two modalities; a facilitator, which generates the protein representation from the text modality; and a decoder, which creates protein sequences from that representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441K text-protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) the best hit ratio on 10 zero-shot text-guided protein editing tasks; and (3) superior performance on four of six protein property prediction benchmarks.
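The alignment step (ProteinCLAP) can be illustrated with a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired text and protein embeddings: each text should score highest against its own protein and vice versa. This is a minimal sketch under stated assumptions, not ProteinDT's released code: the cosine-similarity score, the temperature `tau`, the function name `clap_loss`, and the toy two-dimensional embeddings are all hypothetical, and the paper's exact objective may differ.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diagonal(logits: List[List[float]]) -> float:
    # Mean cross-entropy where the correct "class" for row i is column i,
    # i.e. the i-th text is paired with the i-th protein.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clap_loss(text_emb: List[Vector], protein_emb: List[Vector], tau: float = 0.1) -> float:
    """Hypothetical symmetric InfoNCE loss aligning text and protein embeddings.

    Assumption: a CLIP-style objective; ProteinCLAP's actual loss may differ.
    """
    if len(text_emb) != len(protein_emb):
        raise ValueError("text and protein batches must be paired one-to-one")
    t = [_normalize(v) for v in text_emb]
    p = [_normalize(v) for v in protein_emb]
    # logits[i][j]: temperature-scaled similarity between text i and protein j.
    logits = [[sum(a * b for a, b in zip(ti, pj)) / tau for pj in p] for ti in t]
    # Transpose to score each protein against every text (the other direction).
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(logits_t))


# Toy usage: correctly paired embeddings give a lower loss than mismatched ones.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [[0.1, 0.9], [0.9, 0.1]]
print(clap_loss(texts, aligned) < clap_loss(texts, shuffled))  # True
```

Averaging the text-to-protein and protein-to-text terms keeps the objective symmetric, so neither modality's encoder is favored; in a full pipeline the aligned protein representation would then feed the facilitator and decoder described above.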