计算机科学
计算生物学
支持向量机
伪氨基酸组成
水准点(测量)
人工智能
组蛋白
赖氨酸
生物
模式识别(心理学)
遗传学
氨基酸
基因
亚细胞定位
大地测量学
地理
作者
Wang‐Ren Qiu,Bi‐Qian Sun,Hua Tang,Jian Huang,Hao Lin
标识
DOI:10.1016/j.artmed.2017.02.007
摘要
Lysine crotonylation (Kcr) is a newly discovered histone posttranslational modification, which is specifically enriched at active gene promoters and potential enhancers in mammalian cell genomes. Although lysine crotonylation sites can be correctly identified with high-resolution mass spectrometry, the experimental methods are time-consuming and expensive. Therefore, it is necessary to develop computational methods to deal with this problem. We proposed a new encoding scheme named position weight amino acid composition to extract sequence information of histone around crotonylation sites. We chose protein data from Uniprot database. A series of steps were used to construct a strict and objective benchmark dataset for training and testing the proposed method. All samples were characterized by a significant number of features derived from position weight amino acid composition. The support vector machine was used to perform classification. Based on a series of experiments, we found that the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthew’s correlation coefficient (MCC) were respectively 71.69%, 98.7%, 94.43%, and 0.778 in jackknife cross-validation. Comparison results demonstrated that our proposed model was better than random forest algorithm. We also performed the feature analysis on samples. Identification of the Kcr sites in histone is an indispensable step for decoding protein function. Therefore, the method can promote the deep understanding of the physiological roles of crotonylation and provide useful information for developing drugs to treat various diseases associated with crotonylation.
科研通智能强力驱动
Strongly Powered by AbleSci AI