Authors
Maximilian Grimm, Yang Liu, Xiaocong Yang, Chunya Bu, Zhi-Xiong Jim Xiao, Yang Cao
Abstract
Ligand-similarity-based virtual screening is one of the most widely applicable computer-aided drug design techniques. Current methods rely heavily on descriptors of molecular features, including atoms (zero-dimensional, 0D), the presence or absence of structural features (one-dimensional, 1D), topological descriptors (two-dimensional, 2D), geometry and volume (three-dimensional, 3D), and stereoelectronic and stereodynamic properties (four-dimensional, 4D). These descriptors are frequently used in virtual screening, but usually independently and without integration, which may hinder effective and precise screening. In this study, we developed a multifeature integration algorithm named LigMate, which employs Hungarian algorithm-based matching and a machine learning-based nonlinear combination of various descriptors. These include new descriptors focusing on the maximum common substructure (maximum common substructure score, MCSS), the relative distance of atoms from the ligand mass center (intraligand distance score, ILDS), and ring differences (ring score, RS). In benchmark tests, LigMate achieved an overall enrichment factor at the top 1% of the ranked list (EF1) of 36.14 and an area under the curve (AUC) of 0.81 on the DUD-E data set, as well as an EF1 of 15.44 and an AUC of 0.69 on the maximum unbiased validation (MUV) data set, outperforming control methods based on single descriptors. Thus, our study provides a new framework for multiple feature integration, which can benefit ligand-similarity-based virtual screening. LigMate is freely available for noncommercial users at http://cao.labshare.cn/ligmate/.
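
For readers unfamiliar with the two benchmark metrics quoted above, the following is a minimal sketch of how EF1 and AUC are commonly computed from a ranked screening list. It is not taken from LigMate. The function names `enrichment_factor` and `roc_auc`, the toy scores, and the choice to round the top-1% cutoff to the nearest whole compound are all assumptions made for illustration.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01) -> float:
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    labels: 1 for an active compound, 0 for a decoy.
    """
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Rank compounds by descending similarity score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Assumption: the cutoff is rounded to the nearest whole compound (min. 1).
    n_top = max(1, round(n * fraction))
    hits_top = sum(labels[i] for i in order[:n_top])
    return (hits_top / n_top) / (n_actives / n)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Toy library (hypothetical data): 200 compounds with strictly decreasing
# scores; the actives sit at ranks 0, 1, 50 and 150.
scores = [200.0 - i for i in range(200)]
labels = [1 if i in (0, 1, 50, 150) else 0 for i in range(200)]

ef1 = enrichment_factor(scores, labels, fraction=0.01)
auc = roc_auc(scores, labels)
print(f"EF1 = {ef1:.2f}, AUC = {auc:.3f}")
```

In this toy case the top 1% holds two compounds, both active, while actives make up 2% of the library, so EF1 is 50. With an active fraction of 2%, an EF1 of 50 is also the maximum attainable value, which is why EF1 figures are best compared within the same data set.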