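Machine learning
Genomics
Artificial intelligence
Computer science
Measure (data warehouse)
Data science
Computational biology
Random forest
Selection (genetic algorithm)
Data mining
Biology
Genome
Gene
Genetics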
Identifier
DOI:10.1177/11779322241249562
Abstract
In recent years, several machine learning (ML) approaches have been proposed to predict gene expression signals and chromatin features from the DNA sequence alone. These models are often used to deduce and, to some extent, assess putative new biological insights about gene regulation, and they have led to notable advances in regulatory genomics. This article reviews a selection of these methods, ranging from linear models to random forests, kernel methods, and more advanced deep learning models. Specifically, we detail the different techniques and strategies that can be used to extract new gene-regulation hypotheses from these models. Furthermore, because these putative insights need to be validated with wet-lab experiments, we emphasize that it is important to have a measure of confidence associated with the extracted hypotheses. We review the procedures that have been proposed to measure this confidence for the different types of ML models, and we discuss the fact that they do not provide the same kind of information.