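Nuclear localization sequence
NLS
Artificial intelligence
Discriminant
Computer science
Signal peptide
Support vector machine
Multivariate statistics
False positive paradox
Classifier (UML)
Identification (biology)
Computational biology
Pattern recognition (psychology)
Machine learning
Natural language processing
Peptide sequence
Nucleus
Biology
Biochemistry
Gene
Cell biology
Plant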
Authors
Yun Guo, Yang Yang, Yan Huang, Hong‐Bin Shen
Identifier
DOI: 10.1016/j.ab.2019.113565
Abstract
Nuclear localization signals (NLSs) are peptides that target proteins to the nucleus by binding to carrier proteins in the cytoplasm that transport their cargo across the nuclear membrane. Accurate identification of NLSs can help elucidate the functions of nuclear protein complexes. Currently known NLS predictors are usually specific to certain species or depend largely on prior knowledge of NLS basic residues. Thus, a more general predictor is highly desirable to reduce the potentially high rates of false positives and false negatives in discovering new NLSs. Here, we report a new method, INSP (Identification Nucleus Signal Peptide), that identifies NLSs based mainly on statistical knowledge and machine learning algorithms. In our NLS machine learning model, we treat the query protein sequence as text and extract sequence context features using a natural language model. These word-vector features encode discriminative knowledge of NLS motif frequency and are thus useful for model recognition. The output of the machine learning model is fused with statistical knowledge of the query sequence to build a final multivariate regression model for NLS peptide identification. The experimental results demonstrate promising performance of the new INSP approach. INSP is freely available at www.csbio.sjtu.edu.cn/bioinf/INSP/ for academic use.
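The idea of treating a protein sequence as text and fusing a model output with a sequence statistic can be illustrated with a minimal Python sketch. This is not the INSP implementation: the abstract does not specify the tokenization, the language model, the statistical features, or the regression weights. The function names, the overlapping 3-mer "words", the K/R fraction used as the statistic, and the equal fusion weights are all assumptions made here for illustration only.

```python
def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a protein sequence into overlapping k-mer 'words'.

    Assumption: overlapping k-mers stand in for the 'words' a
    natural language model would embed; the paper may tokenize differently.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def basic_residue_fraction(segment: str) -> float:
    """Fraction of lysine (K) and arginine (R) residues in a segment.

    Assumption: a simple stand-in for the 'statistical knowledge'
    mentioned in the abstract.
    """
    if not segment:
        return 0.0
    seg = segment.upper()
    return (seg.count("K") + seg.count("R")) / len(seg)


def fused_score(model_score: float, stat_score: float,
                w_model: float = 0.5, w_stat: float = 0.5,
                bias: float = 0.0) -> float:
    """Linear fusion of a model output with a statistical feature.

    Hypothetical weights; a real multivariate regression would fit
    them on labelled NLS data.
    """
    return bias + w_model * model_score + w_stat * stat_score


if __name__ == "__main__":
    peptide = "PKKKRKV"  # classical SV40 large T antigen NLS
    print(kmer_tokens(peptide))
    print(basic_residue_fraction(peptide))
    # 0.8 is a placeholder for a classifier output, not a real prediction
    print(fused_score(0.8, basic_residue_fraction(peptide)))
```

In a fuller pipeline the k-mer tokens would be fed to a word-embedding model to obtain the word-vector features, and a trained classifier would supply the model score that is set to a placeholder value here.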