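Generalizability theory
Computer science
Representation (politics)
Set (abstract data type)
Training set
Domain (mathematical analysis)
Reactivity (psychology)
Scope of applicability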
Model home
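Bioinformatics
Machine learning
Biochemical engineering
Data mining
Artificial intelligence
Data science
Quantitative structure-activity relationship
Chemistry
Engineering
Mathematics
Law
Programming language
Alternative medicine
Quantum mechanics
Political science
Biochemistry
Pathology
Mathematical analysis
Physics
Statistics
Gene
Politics
Medicine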
Authors
Priyanka Raghavan, Brittany C. Haas, Madeline E. Ruos, Jules Schleinitz, Abigail G. Doyle, Sarah E. Reisman, Matthew S. Sigman, Connor W. Coley
Source
Journal: ACS Central Science
[American Chemical Society]
Date: 2023-12-08
Identifier
DOI:10.1021/acscentsci.3c01163
Abstract
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.