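Motif (music)
Pairwise comparison
Structural motif
Conserved sequence
Computational biology
Biology
Chemistry
Evolutionary biology
Physics
Genetics
Peptide sequence
Computer science
Biochemistry
Artificial intelligence
Gene
Acoustics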
Authors
Jackson C. Halpin, Amy E. Keating
Abstract
Protein–protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. To understand the features of SLiMs that are important for binding and to identify motif instances that are important for biological function, it is useful to examine the evolutionary conservation of motifs across homologous proteins. However, the intrinsically disordered regions (IDRs) in which SLiMs reside evolve rapidly. Consequently, multiple sequence alignment (MSA) of IDRs often misaligns SLiMs and underestimates their conservation. We present PairK (pairwise k‐mer alignment), an MSA‐free method to align and quantify the relative local conservation of subsequences within an IDR. Lacking a ground truth for conservation, we tested PairK on the task of distinguishing biologically important motif instances from background motifs, under the assumption that biologically important motifs are more conserved. The method outperforms both standard MSA‐based conservation scores and a modern LLM‐based conservation score predictor. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that some SLiMs are more conserved than MSA‐based metrics imply. PairK is available as an open‐source Python package at https://github.com/jacksonh1/pairk. It is designed to be easily adapted for use with other SLiM tools and for diverse applications.
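The core idea of an MSA-free, pairwise k-mer comparison can be illustrated with a minimal sketch. This is not the PairK package's API or its exact algorithm: the function names are hypothetical, scoring uses plain residue identity as an assumed stand-in for a substitution matrix, and "conservation" here is simply the fraction of homologs whose best-matching ungapped k-mer shares the query residue at each position.

```python
from typing import Dict, List, Tuple


def kmers(seq: str, k: int) -> List[Tuple[int, str]]:
    """Return (start_index, k-mer) for every length-k window of seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def identity_score(a: str, b: str) -> int:
    """Ungapped score: count of identical positions (assumed stand-in for a substitution matrix)."""
    return sum(x == y for x, y in zip(a, b))


def best_kmer_matches(query: str, homologs: Dict[str, str], k: int) -> Dict[int, List[str]]:
    """For each query k-mer, collect the best-scoring k-mer from each homolog IDR."""
    matches: Dict[int, List[str]] = {}
    for start, qk in kmers(query, k):
        best: List[str] = []
        for seq in homologs.values():
            candidates = kmers(seq, k)
            if not candidates:
                continue  # homolog shorter than k: nothing to compare
            # max() returns the first of any tied best-scoring windows
            _, top = max(candidates, key=lambda c: identity_score(qk, c[1]))
            best.append(top)
        matches[start] = best
    return matches


def kmer_conservation(query: str, homologs: Dict[str, str], k: int) -> Dict[int, List[float]]:
    """Per-position fraction of homolog best matches identical to the query residue."""
    scores: Dict[int, List[float]] = {}
    for start, best in best_kmer_matches(query, homologs, k).items():
        qk = query[start:start + k]
        if not best:
            scores[start] = [0.0] * k
            continue
        scores[start] = [sum(m[j] == qk[j] for m in best) / len(best) for j in range(k)]
    return scores


# Hypothetical toy IDRs: the motif PPLPPR sits at a different offset in each homolog.
query = "SSPPLPPRSS"
homologs = {"h1": "AGPPLPPRDEK", "h2": "PPLPPRTTGG"}
scores = kmer_conservation(query, homologs, k=6)
print(scores[2])  # → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Because each homolog is searched independently for its best ungapped match, a motif that appears at different offsets in different homologs is still recovered in full, which is the situation in which a global MSA of a rapidly evolving IDR can misalign it.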