Abstract
In the context of the COVID-19 pandemic, we developed a deep learning model that predicts the inhibitory activity of unknown compounds against the 3CLpro of severe acute respiratory syndrome coronavirus (SARS-CoV) during virtual screening. This paper proposes a novel deep learning-based virtual screening method built on a convolutional neural network (CNN) architecture. Chemical molecules are represented by descriptors, which are fed into the CNN to train a model that predicts active compounds. The CNN model was compared with other machine learning methods, including random forest, naive Bayes, decision tree, and support vector machine; on the test set, it achieved an accuracy of 0.86, a sensitivity (recall) of 0.45, a specificity of 0.96, a precision of 0.73, an F-measure of 0.55, and an area under the ROC curve of 0.71. The CNN model identified as anti-SARS-CoV agents 17 of 918 phytochemical compounds; 60 of 423 compounds from the natural product NCI divset IV; 17,831 of 112,267 compounds from the ZINC natural product database; and 315 of 1556 FDA-approved drugs. Further, to prioritize drug-like compounds, Lipinski's rule of five was applied to the predicted anti-SARS-CoV compounds (excluding the FDA-approved drugs), leaving 10, 59, and 14,025 hit molecules from the phytochemical, NCI divset IV, and ZINC sets, respectively. Of the 10 phytochemical hits, 9 belonged to the flavonoid group. In conclusion, the proposed CNN model may prove useful for developing novel target-specific anti-SARS-CoV compounds.
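The drug-likeness step can be illustrated with a minimal sketch of a Lipinski rule-of-five filter. It assumes that molecular weight, logP, and hydrogen-bond donor and acceptor counts have already been computed by a cheminformatics toolkit. The `Descriptors` class, the function names, and the three example compounds are hypothetical and are not taken from the study. The abstract does not say whether a single violation was tolerated, so the `max_violations` parameter is an assumption, with the strict form (no violations) as the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one compound (hypothetical container)."""
    name: str
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits the compound exceeds."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    )
    return sum(checks)


def filter_drug_like(compounds, max_violations=0):
    """Keep compounds with at most `max_violations` violations.

    Assumption: the source does not state the tolerance it used, so the
    default here is the strict form of the rule.
    """
    return [c for c in compounds if lipinski_violations(c) <= max_violations]


# Illustrative, made-up descriptor values (not data from the study).
hits = [
    Descriptors("hypothetical_A", 302.2, 1.5, 5, 7),
    Descriptors("hypothetical_B", 610.5, -1.3, 10, 16),
    Descriptors("hypothetical_C", 480.0, 5.6, 2, 6),
]

drug_like = filter_drug_like(hits)
print([c.name for c in drug_like])
```

With these made-up values, only `hypothetical_A` satisfies all four limits; passing `max_violations=1` would also keep `hypothetical_C`, which exceeds only the logP limit.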