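Protein-protein interaction
Computer science
Identification (biology)
Machine learning
Data mining
Artificial intelligence
Chemistry
Biology
Biochemistry
Botany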
Authors
Bing Wang,Changqing Mei,Yuanyuan Wang,Yuming Zhou,Mu-Tian Cheng,Chun-Hou Zheng,Lei Wang,Jun Zhang,Peng Chen,Yan Xiong
Identifier
DOI:10.1109/tcbb.2019.2953908
Abstract
Protein-protein interactions play essential roles in various biological processes. Identifying protein interaction sites helps researchers understand life activities and is therefore useful for drug design. However, the number of experimentally determined protein interaction sites is far smaller than the number of protein sites in protein-protein interactions or protein complexes. As a result, the negative and positive samples are usually imbalanced, which is common but biases the results when protein interaction sites are predicted by computational approaches. In this work, we present three imbalanced-data processing strategies to reconstruct the original dataset, and then extract protein features from the evolutionary conservation of amino acids to build a predictor for identifying protein interaction sites. On a dataset with 10,430 surface residues but only 2,299 interface residues, the imbalanced-data processing strategies clearly reduce the prediction bias and therefore improve the prediction performance for protein interaction sites. The experimental results show that our prediction models achieve better prediction performance, with a prediction accuracy of 0.758 and an F-measure of 0.737, which demonstrates the effectiveness of our method.
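
The abstract does not name its three imbalance-processing strategies, so the following is only a minimal sketch of one common option, random undersampling of the majority class, together with the F-measure used as an evaluation metric. It is not the authors' method. The function names (`undersample`, `accuracy`, `f_measure`) are hypothetical, and the label vector assumes that the 2,299 interface residues are counted among the 10,430 surface residues, which the abstract does not state explicitly.

```python
import random


def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset (random undersampling)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Keep every minority sample and an equally sized random draw of the majority.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed label layout: 1 = interface residue, 0 = other surface residue.
labels = [1] * 2299 + [0] * (10430 - 2299)

# A trivial "always negative" predictor illustrates the bias: its accuracy is
# high on the imbalanced labels, yet it finds no interface residue at all.
all_negative = [0] * len(labels)
biased_accuracy = accuracy(labels, all_negative)
biased_f = f_measure(labels, all_negative)

# Undersampling yields a training subset with equal class counts.
balanced = undersample(labels)
balanced_labels = [labels[i] for i in balanced]
```

On the balanced subset a predictor can no longer score well by ignoring the positive class, which is why the F-measure is a more informative metric than accuracy alone in this setting.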