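Principle of maximum entropy
Probabilistic logic
RNA splicing
Entropy (arrow of time)
Computer science
Splicing
Sequence (biology)
Kullback-Leibler divergence
Maximum-entropy Markov model
Algorithm
Artificial intelligence
Maximum entropy probability distribution
Computational biology
Sequence motif
Statistical model
Set (abstract data type)
Mathematics
Alternative splicing
RNA (ribonucleic acid)
Probability distribution
Pattern recognition (psychology)
Data mining
Joint entropy
Hidden Markov model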
Authors
G Yeo, Christopher B. Burge
Identifier
DOI:10.1089/1066527041410418
Abstract
We propose a framework for modeling sequence motifs based on the maximum entropy principle (MEP). We recommend approximating short sequence motif distributions with the maximum entropy distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between nonadjacent as well as adjacent positions. Many maximum entropy models (MEMs) are specified by simply changing the set of constraints. Such models can be utilized to discriminate between signals and decoys. Classification performance using different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models out-perform previous probabilistic models in the discrimination of human 5' (donor) and 3' (acceptor) splice sites from decoys. Finally, we discuss mechanistically motivated ways of comparing models.
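The central idea of the abstract, fitting the maximum entropy distribution consistent with a chosen set of low-order marginal constraints, can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes iterative proportional fitting over the full space of 4^L sequences, which is only practical for short motifs, and it smooths the empirical marginals with a pseudocount. The function names (`empirical_marginal`, `fit_maxent`, `log_odds`), the pseudocount value, and the toy sequences are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product
from math import log2

ALPHABET = "ACGT"

def empirical_marginal(seqs, positions, pseudocount=1.0):
    """Smoothed marginal over `positions` (a tuple of indices).

    The pseudocount mass is spread evenly over all cells, so marginals of
    different orders estimated this way stay mutually consistent.
    """
    counts = Counter(tuple(s[i] for i in positions) for s in seqs)
    keys = list(product(ALPHABET, repeat=len(positions)))
    total = len(seqs) + pseudocount
    return {k: (counts[k] + pseudocount / len(keys)) / total for k in keys}

def model_marginal(p, positions):
    """Marginal of the current model distribution `p` over `positions`."""
    m = defaultdict(float)
    for seq, prob in p.items():
        m[tuple(seq[i] for i in positions)] += prob
    return m

def fit_maxent(seqs, constraints, n_iter=200):
    """Iterative proportional fitting, starting from the uniform distribution.

    `constraints` is a list of position tuples, e.g. [(0, 1), (0, 2)];
    changing this list specifies a different maximum entropy model.
    """
    length = len(seqs[0])
    space = list(product(ALPHABET, repeat=length))
    p = {s: 1.0 / len(space) for s in space}
    targets = {c: empirical_marginal(seqs, c) for c in constraints}
    for _ in range(n_iter):
        for c in constraints:
            current = model_marginal(p, c)
            for s in p:
                key = tuple(s[i] for i in c)
                # Rescale so the model marginal on `c` matches the target.
                p[s] *= targets[c][key] / current[key]
    return p

def log_odds(seq, p_signal, p_decoy):
    """Log2 likelihood ratio used to separate signals from decoys."""
    key = tuple(seq)
    return log2(p_signal[key] / p_decoy[key])

# Toy data (invented for illustration, not real splice sites).
signals = ["CAG", "AAG", "CAG", "GAG", "CAT", "TGG"]
decoys = ["TTT", "ACG", "GCA", "TAC", "GGT", "CTA"]

# Adjacent dependencies only, versus adjacent plus one nonadjacent pair.
chain = [(0, 1), (1, 2)]
all_pairs = [(0, 1), (1, 2), (0, 2)]

p_signal = fit_maxent(signals, all_pairs)
p_decoy = fit_maxent(decoys, all_pairs)
score = log_odds("CAG", p_signal, p_decoy)
```

Swapping `all_pairs` for `chain` yields a different model from the same data, which mirrors the abstract's point that models are specified simply by changing the constraint set. Comparing classification performance across such constraint sets is what would indicate which position dependencies matter.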