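Artificial intelligence
Random forest
Machine learning
Support vector machine
Computer science
Pseudo amino acid composition
Classifier (UML)
Computational biology
Pattern recognition (psychology)
Amino acid
Dipeptide
Biology
Biochemistry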
Authors
Nadia Nadia, Ekta Gandotra, Narendra Kumar
Identifier
DOI:10.4015/s1016237222500508
Abstract
The nucleotide-binding domain leucine-rich repeat-containing (NLR) proteins play a significant role in intestinal tissue repair and innate immunity. They were recently added to the innate immunity effector molecules. They also play an essential role in the intestinal microbiota and have recently emerged as crucial factors in the development of ulcerative colitis (UC) and colitis-associated cancer (CAC). A machine learning-based approach for predicting NLR proteins has been developed. In this study, we present a comparison of three supervised machine learning algorithms. The features for the dataset used in this work are extracted with the ProtR and POSSUM packages. The models are trained on compositional input features such as amino acid composition and dipeptide composition, as well as Position Specific Scoring Matrix (PSSM)-based compositions. The dataset consists of 390 proteins for the negative and positive datasets. Five-fold cross-validation (CV) is used to optimize the parameters of Sequential Minimal Optimization (SMO), the Library for Support Vector Machines (LIBSVM), and Random Forest (RF), and the best model is selected. The proposed approach performs reasonably well: RF is the best classifier, with accuracies of 90.91% and 93.94% for the Amino Acid Composition (AAC)-based and PSE_PSSM-based models, respectively. We believe that this method is a reliable, rapid, and useful prediction method for NLR proteins.
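To make the compositional features named in the abstract concrete, the following is a minimal Python sketch of amino acid composition (AAC) and dipeptide composition. It is not the authors' code: the study extracted its features with the ProtR and POSSUM packages, and the function names, the handling of non-standard residues, and the example sequence here are illustrative assumptions.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return a 20-element list: the fraction of each standard residue.

    Assumption: non-standard characters (e.g. X, B) are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    n = len(residues)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element list: the fraction of each adjacent residue pair.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    example = "ACAC"  # hypothetical toy sequence, not from the paper's dataset
    aac = amino_acid_composition(example)
    dpc = dipeptide_composition(example)
    print(len(aac), len(dpc))
```

Each protein then becomes a fixed-length numeric vector (20 values for AAC, 400 for dipeptide composition) regardless of sequence length, which is what allows classifiers such as RF or an SVM to be trained on sequences of varying size.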