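Computer science
Pairwise comparison
Similarity (geometry)
Protein structure database
Data mining
Information retrieval
Sequence (biology)
Computation
Scale (ratio)
Theoretical computer science
Artificial intelligence
Algorithm
Sequence database
Biochemistry
Quantum mechanics
Biology
Genetics
Gene
Image (mathematics)
Physics
Chemistry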
Authors
Matthias Leinweber,Thomas Fober,Marc Strickert,Lars Baumgärtner,G. Klebe,Bernd Freisleben,Eyke Hüllermeier
Source
Journal: IEEE Transactions on Knowledge and Data Engineering
[Institute of Electrical and Electronics Engineers]
Date: 2016-01-21
Volume (issue): 28 (6): 1423-1434
Citations: 8
Identifier
DOI: 10.1109/tkde.2016.2520484
Abstract
CavBase is a database containing information about the three-dimensional geometry and the physicochemical properties of putative protein binding sites. Analyzing CavBase data typically involves computing the similarity of pairs of binding sites. In contrast to sequence alignment, however, a structural comparison of protein binding sites is a computationally challenging problem, making large-scale studies difficult or even infeasible. One way to overcome this obstacle is to precompute pairwise similarities in an all-against-all comparison and to make these similarities subsequently accessible to data analysis methods. Once computed, pairwise similarities can also be used to equip CavBase with a neighborhood structure. Taking advantage of this structure, methods for problems such as similarity retrieval can be implemented efficiently. In this paper, we tackle the problem of performing an all-against-all comparison on CavBase, which consists of more than 200,000 protein cavities, by means of parallel computation and cloud computing techniques. We present the conceptual design and technical realization of a large-scale study to create a similarity database called CavSimBase. We illustrate how CavSimBase is constructed, accessed, and used to answer biological questions by data analysis and similarity retrieval.
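The precompute-then-retrieve idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the cavity identifiers and feature labels are made up, `toy_similarity` is a simple Jaccard index standing in for the much more expensive structural comparison, and a local thread pool stands in for the parallel and cloud infrastructure. It is not the CavSimBase implementation or its API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_similarity(a, b):
    """Jaccard index of two feature sets (placeholder for a structural comparison)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def all_against_all(cavities, workers=4):
    """Score every unordered pair once; keys are (id_i, id_j) with id_i < id_j."""
    pairs = list(combinations(sorted(cavities), 2))

    def score(pair):
        i, j = pair
        return pair, toy_similarity(cavities[i], cavities[j])

    # Each pair is independent, so the work can be split across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score, pairs))


def nearest_neighbors(sim_table, query, k=2):
    """Rank the other entries by their precomputed similarity to `query`."""
    scored = []
    for (i, j), s in sim_table.items():
        if query == i:
            scored.append((j, s))
        elif query == j:
            scored.append((i, s))
    # Highest similarity first; ties broken by identifier for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical cavity identifiers mapped to made-up feature labels.
cavities = {
    "cav_a": {"donor", "acceptor", "aromatic"},
    "cav_b": {"donor", "acceptor", "aliphatic"},
    "cav_c": {"aromatic", "pi"},
    "cav_d": {"metal"},
}

sim_table = all_against_all(cavities)
neighbors = nearest_neighbors(sim_table, "cav_a")
print(neighbors)
```

For n entries the table holds n(n-1)/2 scores, so with more than 200,000 cavities the number of pairs is on the order of 2 × 10^10. That is why the comparison is worth distributing and storing once, after which a retrieval query reduces to a lookup over precomputed values.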