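Substructure
Cluster analysis
Cluster (spacecraft)
Computer science
Similarity (geometry)
Prioritization
Fingerprint (computing)
Data mining
Database
Selection (genetic algorithm)
Pattern recognition (psychology)
Artificial intelligence
Engineering
Image (mathematics)
Programming language
Management science
Structural engineering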
Authors
Martin Ståhl, Harald Mauser
Abstract
We present an efficient method to cluster large chemical databases in a stepwise manner. Databases are first clustered with an extended exclusion sphere algorithm based on Tanimoto coefficients calculated from Daylight fingerprints. Substructures are then extracted from clusters by iterative application of a maximum common substructure algorithm. Clusters with common substructures are merged through a second application of an exclusion sphere algorithm. In a separate step, singletons are compared to cluster substructures and added to a cluster if similarity is sufficiently high. The method identifies tight clusters with conserved substructures and generates singletons only if structures are truly distinct from all other library members. The method has successfully been applied to identify the most frequently occurring scaffolds in databases, for the selection of analogues of screening hits and in the prioritization of chemical libraries offered by commercial vendors.
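The first step of the abstract, exclusion sphere clustering on Tanimoto coefficients, can be illustrated with a minimal sketch. It assumes each fingerprint is represented as a set of on-bit indices rather than an actual Daylight fingerprint. It implements only a basic exclusion sphere scheme, not the authors' extended variant, whose details the abstract does not give. The names `tanimoto` and `exclusion_sphere_clusters` are hypothetical, as is the choice to pick the compound with the most neighbours as the next centroid.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a binary fingerprint is modelled as the set of its on-bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def exclusion_sphere_clusters(
    fps: Sequence[Fingerprint], threshold: float
) -> List[List[int]]:
    """Basic exclusion sphere clustering (illustrative, not the paper's extension).

    Repeatedly take the unassigned compound with the most unassigned
    neighbours (similarity >= threshold) as a centroid, then remove it and
    those neighbours from the pool. Compounds left without neighbours end
    up as singletons. Returns lists of indices, centroid first.
    """
    n = len(fps)
    # Precompute neighbour sets; O(n^2) comparisons, fine for a small demo.
    neighbours = [
        {j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= threshold}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters: List[List[int]] = []
    while unassigned:
        # Ties are broken by lowest index because max() keeps the first maximum.
        centroid = max(
            sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned)
        )
        members = [centroid] + sorted(neighbours[centroid] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters
```

A cluster of length one in the returned list corresponds to a singleton. In the workflow described in the abstract, such singletons would later be compared against the cluster substructures, a step this sketch does not cover.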