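Feature selection
Feature (linguistics)
Computer science
Cancer
Machine learning
Ensemble learning
Pattern recognition (psychology)
Artificial intelligence
Medicine
Philosophy
Linguistics
Internal medicine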
Authors
Jilong Bian,Xuan Liu,Guanghui Dong,Hou Chang,Shan Huang,Dandan Zhang
Identifier
DOI:10.1016/j.compbiomed.2024.108063
Abstract
Cancer is a serious malignant disease that is difficult to cure. Chemotherapy, a primary treatment for cancer, causes significant harm to normal cells in the body and is often accompanied by serious side effects. Recently, anti-cancer peptides (ACPs), a class of proteins used to treat cancer, have dominated research into new anti-tumor drugs because they can specifically target and destroy cancer cells. Screening proteins with cancer-inhibiting properties from a large pool of candidates is key to the development of anti-tumor drugs. However, because of the complex structure of proteins, identifying their functions accurately through biological experiments alone is expensive and inefficient. We therefore propose a new prediction model, ACP-ML, to predict ACPs effectively. For feature extraction, DPC, PseAAC, CTDC, CTDT and CS-Pse-PSSM features were used, and the optimal feature set was selected by comparing combinations of these features. A two-step feature selection process using the MRMD and RFE algorithms was then performed to determine, from the optimal feature set, the features most crucial for identifying ACPs. Furthermore, we assessed the classification accuracy of single learning models and of ensemble models based on different strategies through ten-fold cross-validation. Ultimately, a voting-based ensemble learning method was developed to predict ACPs. To validate its effectiveness, the model was tested on two independent test sets, achieving accuracies of 90.891% and 92.578%, respectively. Compared with existing anticancer peptide prediction algorithms, the proposed feature processing method is more effective, and the proposed ensemble model ACP-ML exhibits stronger generalization capability and higher accuracy.
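To make two of the steps named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes DPC means the standard dipeptide composition (a 400-dimensional vector of adjacent residue-pair frequencies) and that the voting step is plain hard majority voting; the abstract does not say whether ACP-ML uses hard or soft voting. The function names `dipeptide_composition` and `majority_vote` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
VALID_PAIRS = set(DIPEPTIDES)


def dipeptide_composition(sequence):
    """Return the 400-dimensional DPC vector of a peptide sequence.

    Each entry is the count of one adjacent residue pair divided by the
    number of valid pairs in the sequence. Pairs containing a non-standard
    residue are skipped (an assumption of this sketch).
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(p for p in pairs if p in VALID_PAIRS)
    total = sum(counts.values())
    if total == 0:
        # Sequence too short or without any valid pair: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def majority_vote(predictions):
    """Hard voting: return the most common label among base-model outputs.

    On a tie, the label that appears first in `predictions` wins.
    """
    return Counter(predictions).most_common(1)[0][0]
```

For the toy sequence `"ACAC"`, the adjacent pairs are AC, CA and AC, so the entry for `AC` is 2/3, the entry for `CA` is 1/3, and all other entries are zero. In a full pipeline, vectors like this would be concatenated with the other descriptors, filtered by feature selection, and passed to several base classifiers whose labels are then combined by the voting step.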