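Keywords
DNA binding site
DNA sequencing
Sequence (biology)
Computer science
Genomics
DNA
Computational biology
Transcription factor
Feature (linguistics)
Artificial intelligence
Pattern recognition (psychology)
Binding site
Sequence motif
Data mining
Genome
Biology
Gene
Genetics
Promoter
Gene expression
Linguistics
Philosophy
Authors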
Yangyang Li,Jie Liu,Hao Liu
Identifier
DOI: 10.1145/3469877.3497696
Abstract
Identifying transcription factor (TF) binding sites (TFBSs) is essential for modeling the underlying binding mechanisms and cellular functions. Studies have shown that, in addition to DNA sequence, DNA shape is an important factor affecting binding activity. Here, we developed a convolutional neural network (CNN) model that integrates 3D DNA shape information, derived using a high-throughput method, to predict TFBSs. We identified the best-performing architectures by varying the CNN window size and the numbers of kernels, hidden nodes, and hidden layers. The performance of sequence data, shape data, and their combination was evaluated on 69 different ChIP-seq [1] experiments. Our results showed that the model integrating shape and sequence information compared favorably with the sequence-only model. This work combines knowledge from structural biology and genomics, and shows that DNA shape features improve the description of TF binding specificity.
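The abstract does not specify how sequence and shape are encoded or combined, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' model. It assumes that each base is one-hot encoded (A, C, G, T) and that one vector of shape values per base (for example, a minor groove width estimate) is appended to that row, so a single convolution kernel sees sequence and shape together. The kernel width plays the role of the "window size" mentioned above. The helper names `one_hot`, `combine`, and `conv1d_max`, the single shape feature, and all numeric values are hypothetical, and one shape vector per base is a simplification chosen here.

```python
# Hypothetical sketch (not the authors' architecture): one-hot sequence plus
# per-base shape features, scanned by one 1-D convolution kernel with ReLU
# and global max pooling.
from typing import List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as an L x 4 matrix with columns A, C, G, T."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def combine(seq: str, shape: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append per-base shape values (L x k) to the one-hot rows (L x 4)."""
    if len(shape) != len(seq):
        raise ValueError("expected exactly one shape vector per base")
    return [row + list(feats) for row, feats in zip(one_hot(seq), shape)]


def conv1d_max(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> float:
    """Slide a w x c kernel along x, apply ReLU, and return the max activation."""
    w = len(kernel)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe floor
    for start in range(len(x) - w + 1):
        s = bias
        for i in range(w):
            for c, weight in enumerate(kernel[i]):
                s += x[start + i][c] * weight
        best = max(best, max(0.0, s))
    return best


# Usage with made-up values: one hypothetical shape value (0.5) per base.
seq = "ACGTAC"
x = combine(seq, [[0.5] for _ in seq])  # 6 rows x 5 columns

# Width-3 kernel that matches "ACG" and ignores the shape column.
seq_kernel = [[1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0, 0.0]]
# Same kernel, but it also weights the shape column.
shape_kernel = [row[:4] + [1.0] for row in seq_kernel]

print(conv1d_max(x, seq_kernel))    # prints 3.0
print(conv1d_max(x, shape_kernel))  # prints 4.5
```

Because the shape values sit in extra input channels, a learned kernel can weight them alongside the bases; in this toy example the same sequence match scores higher once the shape column contributes. A real model would learn many such kernels and feed the pooled activations to hidden layers, which are the hyperparameters the abstract reports varying.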