Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms

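Smith-Waterman algorithm, Biology, Sequence alignment, Sequence database, Similarity (geometry), Protein sequencing, Sequence (biology), Multiple sequence alignment, Sensitivity (control systems), Bioinformatics, Computational biology, Algorithm, Computer science, Genetics, Peptide sequence, Artificial intelligence, Gene, Electronic engineering, Image (mathematics), Engineering
Author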
William H. Pearson
Source
Journal: Genomics [Elsevier]
Volume (issue): 11 (3): 635-650 Citations: 519
Identifier
DOI:10.1016/0888-7543(91)90071-l
Abstract

The sensitivity and selectivity of the FASTA and the Smith-Waterman protein sequence comparison algorithms were evaluated using the superfamily classification provided in the National Biomedical Research Foundation/Protein Identification Resource (PIR) protein sequence database. Sequences from each of the 34 superfamilies in the PIR database with 20 or more members were compared against the protein sequence database. The similarity scores of the related and unrelated sequences were determined using either the FASTA program or the Smith-Waterman local similarity algorithm. These two sets of similarity scores were used to evaluate the ability of the two comparison algorithms to identify distantly related protein sequences. The FASTA program using the ktup = 2 sensitivity setting performed as well as the Smith-Waterman algorithm for 19 of the 34 superfamilies. Increasing the sensitivity by setting ktup = 1 allowed FASTA to perform as well as Smith-Waterman on an additional 7 superfamilies. The rigorous Smith-Waterman method performed better than FASTA with ktup = 1 on 8 superfamilies, including the globins, immunoglobulin variable regions, calmodulins, and plastocyanins. Several strategies for improving the sensitivity of FASTA were examined. The greatest improvement in sensitivity was achieved by optimizing a band around the best initial region found for every library sequence. For every superfamily except the globins and immunoglobulin variable regions, this strategy was as sensitive as a full Smith-Waterman. For some sequences, additional sensitivity was achieved by including conserved but nonidentical residues in the lookup table used to identify the initial region.
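
The band strategy described in the abstract can be sketched in Python as follows. This is an illustrative sketch only, not the FASTA implementation: it assumes a simple match/mismatch score and a linear gap penalty in place of the scoring matrix and gap parameters used in the paper, it picks the initial region by counting exact ktup word matches per diagonal, and the function names `best_diagonal` and `sw_score`, the example sequences, and the band width are all hypothetical.

```python
from collections import defaultdict


def best_diagonal(query, subject, ktup=2):
    """Return the diagonal (j - i) sharing the most ktup-length words, or None.

    Assumption: exact word matches only; conserved but nonidentical
    residues are not added to the lookup table here.
    """
    words = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        words[query[i:i + ktup]].append(i)
    hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in words.get(subject[j:j + ktup], ()):
            hits[j - i] += 1
    return max(hits, key=hits.get) if hits else None


def sw_score(a, b, match=2, mismatch=-1, gap=-2, center=None, width=None):
    """Best local alignment score (Smith-Waterman recurrence, linear gaps).

    If center and width are given, only cells whose diagonal (j - i) lies
    within `width` of `center` are filled in, which gives a banded search.
    The scoring values are illustrative assumptions, not the paper's.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if center is not None and abs((j - i) - center) > width:
                continue  # outside the band: the cell keeps its score of 0
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical toy sequences, chosen only to exercise the functions.
query, subject = "ACDEF", "XXACDEF"
d = best_diagonal(query, subject, ktup=2)
full = sw_score(query, subject)                        # full Smith-Waterman
banded = sw_score(query, subject, center=d, width=1)   # band around diagonal d
print(d, full, banded)
```

Calling `sw_score` without `center` and `width` gives the full Smith-Waterman score, so the two results can be compared directly. The banded score can never exceed the full one, because it fills in a subset of the same cells; the abstract's finding is that for most superfamilies the band around the best initial region is enough to recover it.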