作者
Changli Feng,Haiyan Wei,Chugui Xu,Bin Feng,Xiaorong Zhu,Jing Liu,Quan Zou
摘要
Protein classification is a crucial field in bioinformatics. The development of a comprehensive tool that can perform feature evaluation, visualization, automated machine learning, and model interpretation would significantly advance research in protein classification. However, there is a significant gap in the literature regarding tools that integrate all these essential functionalities. This paper presents iProps, a novel Python-based software package, meticulously crafted to fulfill these multifaceted requirements. iProps is distinguished by its proficiency in feature extraction, evaluation, automated machine learning, and interpretation of classification models. Firstly, iProps fully leverages evolutionary information and amino acid reduction information to propose or extend several numerical protein features that are independent of sequence length, including SC-PSSM, ORDip, TRC, CTDC-E, CKSAAGP-E, and so forth; at the same time, it also implements the calculation of 17 other numerical features within the software. iProps also provides feature combination operations for the aforementioned features to generate more hybrid features, and has added data balancing sampling processing as well as built-in classifier settings, among other functionalities. Thus, It can discern the most effective protein class recognition feature from a multitude of candidates, utilizing three automated machine learning algorithms to identify the most optimal classifiers and parameter settings. Furthermore, iProps generates a detailed explanatory report that includes 23 informative graphs derived from three interpretable models. To assess the performance of iProps, a series of numerical experiments were conducted using two well-established datasets. The results demonstrated that our software achieved superior recognition performance in every case. Beyond its contributions to bioinformatics, iProps broadens its applicability by offering robust data analysis tools that are beneficial across various disciplines, capitalizing on its automated machine learning and model interpretation capabilities. As an open-source platform, iProps is readily accessible and features an intuitive user interface, ensuring ease of use for individuals, even those without a background in programming. The source code of the software is available for download at the following website: https://github.com/LigosQ/iProps and https://gitee.com/LigosQ/iProps.