Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and Best Practices for Machine Learning in Chemistry

Authors
Gaurav Vishwakarma, Aditya Sonpal, Johannes Hachmann
Source
Journal: Trends in Chemistry [Elsevier BV]
Volume/Issue: 3 (2): 146-156 · Cited by: 49
Identifier
DOI: 10.1016/j.trechm.2020.12.004
Abstract

As machine learning (ML) is gaining an increasingly prominent role in chemical research, so is the need to assess the quality and applicability of ML models, compare different ML models, and develop best-practice guidelines for their design and utilization. Statistical loss function metrics and uncertainty quantification techniques are key issues in this context. Different analyses highlight different facets of a model's performance, and a compilation of metrics, as opposed to a single metric, allows for a well-rounded understanding of what can be expected from a model. Metrics also allow us to identify unexplored regions of chemical space and pursue their survey. Metrics can thus make an important contribution to further democratize ML in chemistry; promote best practices; provide context to predictions and methodological developments; lend trust, legitimacy, and transparency to results from ML studies; and ultimately advance chemical domain knowledge. This review aims to draw attention to two issues of concern when we set out to make machine learning work in the chemical and materials domain, that is, statistical loss function metrics for the validation and benchmarking of data-derived models, and the uncertainty quantification of predictions made by them. These are often overlooked or underappreciated topics, as chemists typically have only limited training in statistics. Aside from helping to assess the quality, reliability, and applicability of a given model, these metrics are also key to comparing the performance of different models and thus to developing guidelines and best practices for the successful application of machine learning in chemistry.

Glossary

Binary cross-entropy: in a binary classification problem, each sample belongs to either one class or the other (i.e., it has a known probability of 1.0 for one class and 0.0 for the other). A classifier model can estimate the probability of a sample belonging to each class. The binary cross-entropy is used as a metric to assess the difference between the two probability distributions and thus the uncertainty of a classifier's prediction. (Also see cross-entropy, categorical cross-entropy, and log loss.)

Categorical cross-entropy: for multiclass classification problems, that is, problems involving more than two categories (classes) of data, the cross-entropy measures the difference between the probability distribution of a sample belonging to one class and the probability distribution of that sample not belonging to that class (i.e., belonging to any of the other classes). (Also see binary cross-entropy.)

Cross-entropy: a measure of the difference between two probability distributions for a given set of samples. (Also see binary cross-entropy, categorical cross-entropy, and log loss.)

Evolutionary algorithm: a heuristic-based approach inspired by natural selection in biological processes (i.e., survival of the fittest). It is typically employed to tackle (combinatorial) optimization problems in which gradients (needed for gradient descent methods) are ill-defined (e.g., in problems involving discrete or categorical variables) or otherwise inaccessible. Each possible solution behaves as an individual in a population of solutions, and a fitness function (itself a loss function metric) is used to determine its quality. Evolutionary optimization of the population takes place via reproduction, mutation, crossover, and selection iterations.

Fitness function: a loss function metric that assesses the quality of a solution with respect to the objective of an optimization. Its output can be maximized or minimized (e.g., as part of an evolutionary algorithm).

Harmonic mean: one of multiple types of mean value metrics. Given a set of sample values, the harmonic mean is the inverse of the arithmetic mean of the inverses of the sample values.

Hyperparameters: in ML, hyperparameters are the parameters that define the structure of a model and control the learning process, as opposed to the parameters that are derived ('learned') from the data in the course of training the model.

Log loss: the negative logarithm of the likelihood of a set of observations given a model's parameters. While log loss and cross-entropy are not the same by definition, they calculate the same quantity when used as fitness functions; in practice, the two terms are thus often used interchangeably.

Loss function metrics: statistical error metrics used to assess the performance of ML models and the quality of their predictions.

Principal component analysis: a technique to transform the feature basis in which a set of data is described into a basis that is adapted to the nature of the given data. The principal components are the eigenvectors of the covariance matrix of the data set.

Tanimoto similarity: a metric used to assess the similarity between the finite feature (e.g., descriptor, fingerprint) vectors of two samples. The similarity ranges from 0 to 1, with 0 indicating no point of intersection between the two vectors and 1 revealing completely identical vectors.
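To make the binary cross-entropy / log loss definition concrete, here is a minimal sketch (not code from the review; `binary_cross_entropy` is a hypothetical helper name) of the metric as it is typically computed, averaging the negative log-likelihood over the samples:

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy (log loss) between known 0/1 labels
    and a classifier's predicted probabilities for the positive class."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# A confident, correct classifier incurs low loss; a confident,
# wrong one is penalized heavily.
good = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
bad = binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2])
```

Note how the clipping constant `eps` only guards the logarithm numerically; the metric itself is fully determined by the two probability distributions being compared.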
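The evolutionary algorithm loop described above (reproduction, mutation, crossover, and selection driven by a fitness function) can be sketched as a toy genetic algorithm over bitstrings; this is an illustrative assumption of one common variant (rank selection with one-point crossover), not the review's own implementation:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def evolve(fitness, length=20, pop_size=30, generations=60, mutation_rate=0.05):
    """Minimal genetic algorithm: rank selection, one-point crossover, bit-flip mutation."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # selection: rank by fitness
        parents = pop[: pop_size // 2]        # keep the fitter half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < mutation_rate)  # bit-flip mutation
                     for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# 'One-max' toy problem: the fitness function simply counts 1-bits,
# so the optimum is the all-ones string.
best = evolve(fitness=sum)
```

Because the fitness function here is maximized, a loss-style metric would be plugged in with its sign flipped; gradients are never needed, which is what makes the approach attractive for discrete or categorical search spaces.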
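The harmonic mean definition translates directly into code; as a point of contact with ML metrics, the F1 score is the harmonic mean of precision and recall (a minimal sketch, with `harmonic_mean` as a hypothetical helper name):

```python
def harmonic_mean(values):
    """Inverse of the arithmetic mean of the inverses of the sample values."""
    return len(values) / sum(1.0 / v for v in values)

# 3 / (1/1 + 1/4 + 1/4) = 3 / 1.5
harmonic_mean([1.0, 4.0, 4.0])  # -> 2.0

# F1 score as the harmonic mean of precision = 0.5 and recall = 1.0:
harmonic_mean([0.5, 1.0])  # -> 2/3
```

Unlike the arithmetic mean, the harmonic mean is dominated by the smallest value, which is exactly why F1 punishes a model that trades recall for precision (or vice versa).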
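The principal component analysis definition (principal components = eigenvectors of the data's covariance matrix) can be verified in a few lines of NumPy; this is a generic sketch under that textbook definition, with an arbitrary made-up 2D data set:

```python
import numpy as np

def principal_components(X):
    """Eigenvalues/eigenvectors of the covariance matrix of X
    (rows = samples, columns = features), sorted by decreasing variance."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric -> real eigendecomposition
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Projecting the centered data onto the eigenvectors yields the
# transformed (data-adapted) basis mentioned in the glossary.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
vals, vecs = principal_components(X)
scores = (X - X.mean(axis=0)) @ vecs
```

In the new basis the features are uncorrelated: the covariance matrix of `scores` is diagonal, with the eigenvalues (explained variances) on the diagonal.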
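For binary fingerprint vectors (the common case in cheminformatics), the Tanimoto similarity defined above reduces to the ratio of shared on-bits to total on-bits; a minimal sketch under that assumption (`tanimoto` is a hypothetical helper name, and real workflows would typically use a fingerprint library instead):

```python
def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprint vectors:
    |a AND b| / |a OR b|, ranging from 0 (disjoint) to 1 (identical)."""
    on_a = sum(a)
    on_b = sum(b)
    common = sum(x & y for x, y in zip(a, b))
    return common / (on_a + on_b - common)

tanimoto([1, 1, 0, 1], [1, 1, 0, 1])  # identical vectors -> 1.0
tanimoto([1, 0, 0, 0], [0, 1, 1, 0])  # no shared on-bits -> 0.0
```

Zero-valued positions shared by both vectors do not contribute, which is deliberate: sparse fingerprints would otherwise look artificially similar.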