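Sequence space
Computer science
Protein sequencing
Function (biology)
Protein engineering
Generative grammar
Protein family
Artificial intelligence
Peptide sequence
Sequence (biology)
Biology
Computational biology
Space (punctuation)
Genetics
Enzyme
Biochemistry
Gene
Mathematics
Banach space
Operating system
Pure mathematics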
Authors
Donatas Repecka, Vykintas Jauniškis, Laurynas Karpus, Elzbieta Rembeza, Jan Zrimec, Simona Povilonienė, Irmantas Rokaitis, Audrius Laurynėnas, Wissam Abuajwa, Otto Savolainen, Rolandas Meškys, Martin K. M. Engqvist, Aleksej Zelezniak
Abstract
De novo protein design for catalysis of any desired chemical reaction is a long-standing goal in protein engineering, due to the broad spectrum of technological, scientific and medical applications. Currently, however, mapping protein sequence to protein function is neither computationally nor experimentally tangible [1,2]. Here we developed ProteinGAN, a specialised variant of the generative adversarial network [3] that is able to 'learn' natural protein sequence diversity and enables the generation of functional protein sequences. ProteinGAN learns the evolutionary relationships of protein sequences directly from the complex multidimensional amino acid sequence space and creates new, highly diverse sequence variants with natural-like physical properties. Using malate dehydrogenase as a template enzyme, we show that 24% of the ProteinGAN-generated and experimentally tested sequences are soluble and display wild-type level catalytic activity in the tested conditions in vitro, even in highly mutated (>100 mutations) sequences. ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse novel functional proteins within the allowed biological constraints of the sequence space.