Genomics
Epigenetics
Computer science
Software
Machine learning
Functional genomics
Artificial intelligence
Enhanced Data Rates for GSM Evolution
Data science
Biology
Genome
Genetics
Gene
Gene expression
DNA methylation
Programming language
Authors
Sean Whalen, Jacob Schreiber, William Stafford Noble, Katherine S. Pollard
Identifier
DOI:10.1038/s41576-021-00434-9
Abstract
The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.