作者
Mingyou Liu,Hongmei Liu,Tao Wu,Yingxue Zhu,Yuwei Zhou,Ziru Huang,Changcheng Xiang,Jian Huang
摘要
The ongoing COVID-19 pandemic has caused dramatic loss of human life. There is an urgent need for safe and efficient anti-coronavirus infection drugs. Anti-coronavirus peptides (ACovPs) can inhibit coronavirus infection. With high-efficiency, low-toxicity, and broad-spectrum inhibitory effects on coronaviruses, they are promising candidates to be developed into a new type of anti-coronavirus drug. Experiment is the traditional way of ACovPs' identification, which is less efficient and more expensive. With the accumulation of experimental data on ACovPs, computational prediction provides a cheaper and faster way to find anti-coronavirus peptides' candidates. In this study, we ensemble several state-of-the-art machine learning methodologies to build nine classification models for the prediction of ACovPs. These models were pre-trained using deep neural networks, and the performance of our ensemble model, ACP-Dnnel, was evaluated across three datasets and independent dataset. We followed Chou's 5-step rules. (1) we constructed the benchmark datasets data1, data2, and data3 for training and testing, and introduced the independent validation dataset ACVP-M; (2) we analyzed the peptides sequence composition feature of the benchmark dataset; (3) we constructed the ACP-Dnnel model with deep convolutional neural network (DCNN) merged the bi-directional long short-term memory (BiLSTM) as the base model for pre-training to extract the features embedded in the benchmark dataset, and then, nine classification algorithms were introduced to ensemble together for classification prediction and voting together; (4) tenfold cross-validation was introduced during the training process, and the final model performance was evaluated; (5) finally, we constructed a user-friendly web server accessible to the public at http://150.158.148.228:5000/ . The highest accuracy (ACC) of ACP-Dnnel reaches 97%, and the Matthew's correlation coefficient (MCC) value exceeds 0.9. On three different datasets, its average accuracy is 96.0%. After the latest independent dataset validation, ACP-Dnnel improved at MCC, SP, and ACC values 6.2%, 7.5% and 6.3% greater, respectively. It is suggested that ACP-Dnnel can be helpful for the laboratory identification of ACovPs, speeding up the anti-coronavirus peptide drug discovery and development. We constructed the web server of anti-coronavirus peptides' prediction and it is available at http://150.158.148.228:5000/ .