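Feature selection
Stacking
Selection (genetic algorithm)
Computational biology
Computer science
Feature (linguistics)
Artificial intelligence
Biology
Chemistry
Linguistics
Philosophy
Organic chemistry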
Authors
Qian Cao,Xufeng Xiao,Yannan Bin,Jianping Zhao,Chun-Hou Zheng
Source
Journal: Current Bioinformatics
[Bentham Science]
Date: 2024-10-28
Volume/Issue: 20
Identifier
DOI:10.2174/0115748936330198240924110742
Abstract
Background: Phage therapy has broad application prospects as a novel therapeutic method. Phage Virion Proteins (PVPs) can recognize the host and bind to surface receptors, which is of great significance for developing antimicrobial drugs to treat bacterial infectious diseases. In recent years, several machine learning-based PVP predictors have been developed, but they usually train the learner on a single feature, whereas higher-dimensional feature representations tend to contain more potential sequence information. Methods: In this work, we construct a stacking model, PredPVP, for PVP prediction by combining multiple features and applying feature selection methods. Specifically, each sequence is first encoded using seven features. For this high-dimensional feature representation, three feature selection methods are used to remove redundant features, and the selected features are then combined with eight machine learning algorithms. Finally, the probability features and class features (PCFs) generated by the resulting 24 base models are fed into logistic regression (LR) to train the final model. Results: Results on the independent test set indicate that PredPVP outperforms existing predictors, with an AUC of 93.4%. Conclusion: We expect PredPVP to be used as a tool for large-scale PVP recognition, providing a new route for the development of novel antimicrobials and accelerating their application in actual treatment. The datasets and source code used in this study are available at https://github.com/caoqian23/PredPVP.
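The stacking step described in the abstract can be illustrated with a small sketch. It is not the authors' code: the selector and learner names are hypothetical placeholders (the abstract does not name them), the 0.5 decision threshold and the "probabilities first, then classes" layout are assumptions, and the example probabilities are made up. It only shows how three feature selection methods paired with eight algorithms give 24 base models, and how their outputs could be assembled into a PCF vector for the logistic regression meta-learner.

```python
from itertools import product

# Hypothetical placeholder names; the abstract does not name the methods.
SELECTORS = [f"selector_{i}" for i in range(1, 4)]  # three feature selection methods
LEARNERS = [f"learner_{j}" for j in range(1, 9)]    # eight machine learning algorithms


def base_model_grid(selectors, learners):
    """Pair every selector with every learner (3 x 8 = 24 base models)."""
    return list(product(selectors, learners))


def build_pcfs(probabilities, threshold=0.5):
    """Assemble PCFs for one sequence from per-base-model probabilities.

    Returns the probability features followed by the class features,
    where a class feature is 1 if the probability is >= threshold, else 0.
    The threshold and the concatenation order are assumptions.
    """
    class_features = [1 if p >= threshold else 0 for p in probabilities]
    return list(probabilities) + class_features


grid = base_model_grid(SELECTORS, LEARNERS)
example_probs = [0.9] * 12 + [0.2] * 12  # made-up outputs of the 24 base models
pcfs = build_pcfs(example_probs)
print(len(grid), len(pcfs))
```

Under these assumptions each sequence is represented by 48 values (24 probabilities and 24 class labels), which would form one input row for the LR meta-learner; the actual PCF layout used by PredPVP should be checked against the repository linked above.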