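Protein design
Sequence space
Computer science
Protein engineering
Protein structure prediction
Topology (electrical circuits)
Protein sequencing
Persistent homology
Invariant (physics)
Fitness landscape
Artificial intelligence
Protein structure
Mathematics
Algorithm
Biology
Peptide sequence
Genetics
Discrete mathematics
Combinatorics
Enzyme
Population
Demography
Sociology
Banach space
Gene
Biochemistry
Mathematical physics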
Identifier
DOI:10.1038/s43588-022-00394-y
Abstract
Protein engineering, which iteratively optimizes protein fitness by screening a gigantic mutational space, is constrained by experimental capacity, although various machine learning models have substantially expedited it. Three-dimensional protein structures promise further advantages, but their intricate geometric complexity hinders their application in deep mutational screening. Persistent homology, an established algebraic topology tool for protein structural complexity reduction, fails to capture the homotopic shape evolution during the filtration of a given dataset. This work introduces a Topology-offered protein Fitness (TopFit) framework to complement protein sequence and structure embeddings. Equipped with an ensemble regression strategy, TopFit integrates the persistent spectral theory, a new topological Laplacian, and two auxiliary sequence embeddings to capture mutation-induced topological invariants, shape evolution, and sequence disparity in the protein fitness landscape. The performance of TopFit is assessed on 34 benchmark datasets with 128,634 variants, involving a wide variety of protein structure acquisition modalities and training set sizes.
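The contrast the abstract draws between topological invariants and shape evolution can be illustrated on a toy point cloud. The sketch below is not the TopFit implementation: it uses only the graph Laplacian (the 0-th combinatorial Laplacian) of a distance-thresholded graph rather than a full persistent Laplacian, and the function names, the sample points, and the radii are all assumptions chosen for illustration. At each filtration radius, the number of zero eigenvalues gives the number of connected components (Betti-0), while the smallest nonzero eigenvalue reacts to changes in connectivity that leave the Betti number unchanged.

```python
import numpy as np

def graph_laplacian(points, radius):
    """Graph Laplacian of the graph joining points at distance <= radius.

    Hypothetical helper for illustration; a toy stand-in for the
    topological Laplacians mentioned in the abstract.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Connect distinct points whose distance is within the radius.
    adj = ((dist <= radius) & ~np.eye(len(points), dtype=bool)).astype(float)
    return np.diag(adj.sum(axis=1)) - adj

def spectral_features(points, radii, tol=1e-8):
    """For each radius, return (radius, Betti-0, smallest nonzero eigenvalue)."""
    feats = []
    for r in radii:
        eig = np.linalg.eigvalsh(graph_laplacian(points, r))
        betti0 = int(np.sum(eig < tol))       # multiplicity of the zero eigenvalue
        nonzero = eig[eig >= tol]             # non-harmonic part of the spectrum
        lam_min = float(nonzero.min()) if nonzero.size else 0.0
        feats.append((r, betti0, lam_min))
    return feats

# Toy "structure": three nearby points and one distant point (assumed data).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
features = spectral_features(points, radii=[0.5, 1.2, 1.5, 10.0])

for r, betti0, lam_min in features:
    print(f"radius={r:5.1f}  betti0={betti0}  smallest nonzero eigenvalue={lam_min:.3f}")
```

Between radii 1.2 and 1.5 the three nearby points go from a path to a triangle: Betti-0 stays at 2, but the smallest nonzero eigenvalue moves from 1 to 3. That is the kind of change a homology-only descriptor would not register.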