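Transcription factor
Computational biology
Sequence motif
Convolutional neural network
Chromatin
DNA sequencing
Transcription (linguistics)
Biology
Genetics
Motif (music)
DNA
Computer science
DNA binding site
Artificial intelligence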
Promoter
Gene
Gene expression
Linguistics
Physics
Philosophy
Acoustics
Authors
An Zheng, Michael Lamkin, Hanqing Zhao, Cynthia Wu, Hao Su, Melissa Gymrek
Identifier
DOI:10.1038/s42256-020-00282-y
Abstract
Transcription factors bind DNA by recognizing specific sequence motifs, which are typically 6–12 bp long. A motif can occur many thousands of times in the human genome, but only a subset of those sites are actually bound. Here we present a machine-learning framework that leverages existing convolutional neural network architectures and model interpretation techniques to identify and interpret the sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 transcription factors in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution and characterize the context features most predictive of binding. We find that the choice of training data heavily influences classification accuracy and the relative importance of features such as open chromatin. Overall, our framework enables novel insights into features predictive of transcription factor binding and is likely to inform future deep learning applications for interpreting non-coding genetic variants.
DNA transcription is highly complex, and while the short DNA sequence motifs recognized by transcription factors are well known, less is known about the surrounding sequence context that determines whether a transcription factor actually binds a given instance of its motif. Zheng and colleagues present a method that uses convolutional neural networks to identify sequence features that help predict whether transcription factors bind their target motifs in DNA.
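The abstract outlines three computational steps: locating instances of a motif, scoring each instance from its surrounding sequence with a convolutional network, and assigning base-pair-resolution importance to that context. The Python sketch below illustrates these steps under explicit assumptions and is not the authors' code. It uses exact string matching on the forward strand in place of a position weight matrix scan, a hypothetical hand-built scoring function (`toy_binding_score`) in place of a trained CNN, and in silico mutagenesis as one common interpretation technique, which may differ from the interpretation method used in the paper. All function names, the example sequence and the flank size are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as a (len(seq), 4) float array; N and other symbols become all-zero rows."""
    index = {b: i for i, b in enumerate(BASES)}
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            x[pos, index[base]] = 1.0
    return x


def find_motif_instances(seq, motif):
    """Start positions of exact, forward-strand matches of `motif` in `seq` (simplification of a PWM scan)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def context_window(seq, start, motif_len, flank):
    """Motif instance plus `flank` bp on each side, padded with N beyond the sequence ends."""
    padded = "N" * flank + seq + "N" * flank
    # Position `start` in `seq` sits at `start + flank` in `padded`, so the window
    # [start + flank - flank, start + flank + motif_len + flank) reduces to this slice.
    return padded[start:start + motif_len + 2 * flank]


def conv1d_scan(x, filt):
    """Valid 1D cross-correlation of a (L, 4) input with a (k, 4) filter, as in a CNN's first layer."""
    k = filt.shape[0]
    return np.array([np.sum(x[i:i + k] * filt) for i in range(x.shape[0] - k + 1)])


def toy_binding_score(x, filters, bias=-3.0):
    """Hypothetical stand-in for a trained CNN: ReLU, global max-pool per filter, sum, then sigmoid."""
    activation = sum(float(np.max(np.maximum(conv1d_scan(x, f), 0.0))) for f in filters)
    return 1.0 / (1.0 + np.exp(-(activation + bias)))


def in_silico_mutagenesis(seq, score_fn):
    """Per-position importance: largest |score change| over all single-base substitutions."""
    ref = score_fn(one_hot(seq))
    importance = np.zeros(len(seq))
    for pos, base in enumerate(seq.upper()):
        for alt in BASES:
            if alt == base:
                continue
            mutated = seq[:pos] + alt + seq[pos + 1:]
            delta = abs(score_fn(one_hot(mutated)) - ref)
            importance[pos] = max(importance[pos], delta)
    return importance


# Made-up example; TGACTCA (an AP-1-like consensus) is used purely for illustration.
genome = "TTGACTCATAGGCGCGATGACTCAGCATTTT"
motif = "TGACTCA"
flank = 5

starts = find_motif_instances(genome, motif)
print(starts)  # → [1, 17]

window = context_window(genome, starts[0], len(motif), flank)
print(window)  # → NNNNTTGACTCATAGGC

filters = [one_hot(motif)]  # a single hypothetical filter that matches the motif exactly
importance = in_silico_mutagenesis(window, lambda x: toy_binding_score(x, filters))
print(np.round(importance, 3))
```

Because this toy model contains only a motif-matching filter, all importance falls on the seven motif positions and every flanking position scores zero. A trained model of the kind the abstract describes would be expected to assign nonzero importance to informative flanking positions as well, which is what makes context features interpretable.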