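Feature selection
Computer science
Feature (linguistics)
Generalization
Artificial intelligence
Feature engineering
Protein engineering
Fitness function
Process (computing)
Machine learning
Sequence (biology)
Pattern recognition (psychology)
Deep learning
Biology
Mathematics
Philosophy
Linguistics
Genetic algorithm
Mathematical analysis
Biochemistry
Genetics
Enzyme
Operating system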
Authors
Shouxin Zhang, Zhixuan Li, Qianyue Wang, Hanlin Wu, Manli Yang, Fengguang Zhao, Mingkui Tan, Shuangyan Han
Abstract
Machine learning (ML) builds predictive models by learning the relationship between protein sequences and their functions. This lets it efficiently identify high-fitness protein sequences without becoming trapped in local optima, as directed evolution can. However, extracting the most pertinent functional features from a limited number of protein sequences is vital for optimizing the performance of ML models. Here, we propose scut_ProFP (Protein Fitness Predictor), a predictive framework that integrates feature combination and feature selection. Feature combination provides comprehensive sequence information, while feature selection searches for the most beneficial features to enhance model performance, enabling accurate sequence-to-function mapping. scut_ProFP outperforms similar frameworks and is also competitive with more complex deep learning models (ECNet, EVmutation, and UniRep). In addition, scut_ProFP generalizes from low-order mutants to high-order mutants. Finally, we used scut_ProFP to simulate the engineering of the fluorescent protein CreiLOV. Starting from only a small number of low-fluorescence mutants, it highly enriched mutants with high fluorescence. Overall, the method provides an effective approach to ML-guided, data-driven protein engineering. The code and datasets for scut_ProFP are available at https://github.com/Zhang66-star/scut_ProFP.
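The abstract describes feature combination and feature selection only at a high level. Below is a minimal sketch of the general idea. It assumes two simple encodings (position-wise one-hot and Kyte-Doolittle hydropathy), concatenates them, and then runs greedy forward selection scored by the leave-one-out error of a k-nearest-neighbour regressor. The encodings, the k-NN model, the greedy search, the function names and the toy data are all hypothetical stand-ins. They are not the components used in scut_ProFP; see the linked repository for those.

```python
# Illustrative sketch only: the encodings, k-NN model, greedy search and toy
# data are assumptions, not scut_ProFP's actual implementation.
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (one scalar per amino acid).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot(seq: str) -> List[float]:
    """Position-wise one-hot encoding: 20 features per residue."""
    return [1.0 if res == aa else 0.0 for res in seq for aa in AMINO_ACIDS]


def hydropathy(seq: str) -> List[float]:
    """Position-wise physicochemical encoding: 1 feature per residue."""
    return [HYDROPATHY[res] for res in seq]


def combine(seq: str) -> List[float]:
    """Feature combination: concatenate the individual encodings."""
    return one_hot(seq) + hydropathy(seq)


def loo_error(X: Sequence[Sequence[float]], y: Sequence[float],
              cols: Sequence[int], k: int = 2) -> float:
    """Leave-one-out mean squared error of k-NN regression on columns `cols`."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Squared Euclidean distance to every other sample (ties broken by index).
        dists = sorted(
            (sum((X[i][c] - X[j][c]) ** 2 for c in cols), j)
            for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        pred = sum(y[j] for j in neighbours) / len(neighbours)
        total += (pred - y[i]) ** 2
    return total / n


def forward_select(X: Sequence[Sequence[float]], y: Sequence[float],
                   k: int = 2) -> Tuple[List[int], List[float]]:
    """Greedy forward selection: add the feature that most lowers the LOO
    error; stop as soon as no remaining feature lowers it further."""
    selected: List[int] = []
    path: List[float] = []
    best = float("inf")
    remaining = set(range(len(X[0])))
    while remaining:
        err, col = min((loo_error(X, y, selected + [c], k), c) for c in remaining)
        if err >= best:
            break
        best = err
        selected.append(col)
        path.append(err)
        remaining.remove(col)
    return selected, path


# Toy data (hypothetical): fitness is just the hydropathy of residue 0.
SEQS = ["AKG", "IKG", "LKG", "DKG", "VKS", "RKS", "FKS", "EKS"]
FITNESS = [HYDROPATHY[s[0]] for s in SEQS]
X = [combine(s) for s in SEQS]

if __name__ == "__main__":
    cols, path = forward_select(X, FITNESS)
    print("selected columns:", cols)
    print("LOO MSE path:", path)
```

On this toy data, the first feature selected is the position-0 hydropathy column (index 60 in the combined 63-dimensional vector), because it directly encodes the toy fitness. Greedy forward selection is cheap but can miss features that only help in combination. A wrapper search such as a genetic algorithm explores the space of feature subsets more broadly, at a higher computational cost.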