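Overfitting
Computer science
Artificial intelligence
Machine learning
Set (abstract data type)
Artificial neural network
Redundancy (engineering)
Quality (philosophy)
Training set
Training (meteorology)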
Authors
Cheng Shang,Zhi-Pan Liu
Source
Journal: Elsevier eBooks
[Elsevier]
Date: 2023-01-01
Pages: 313-327
Identifier
DOI:10.1016/b978-0-323-90049-2.00018-4
Abstract
A high-quality training dataset is essential to the success of machine learning (ML) applications. For generating ML potentials that describe multidimensional potential energy surfaces (PESs), an ideal training set should be large enough to include all representative atomic configurations of interest, yet as small as possible to reduce the cost of the quantum chemistry calculations. The traditional way of generating a training set is often inefficient and empirical and requires intensive manual effort; it can introduce high redundancy in the geometrical features of low-energy structures, which in turn causes overfitting because of the imbalance in data density. In this chapter, we introduce the active learning (AL) algorithm, a subclass of supervised ML, for generating ML potentials, with the aim of automatically optimizing the quality of the training dataset. Three widely used strategies for expanding the training set in the AL algorithm are presented and discussed in connection with their applications to different ML potentials. We also illustrate how the AL algorithm helps to build a high-quality training dataset and thus train a global neural network (G-NN) potential, using the Li system as an example.
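The idea of expanding a training set only where the model is uncertain can be illustrated with a small sketch. The abstract does not name the three strategies, so the one used here, committee disagreement (query by committee), is an assumption chosen for illustration and is not claimed to be one of the chapter's three. Everything else is hypothetical as well: the 1D double-well `reference_energy` stands in for a quantum chemistry calculation, and cubic polynomial fits stand in for a neural network potential.

```python
import numpy as np

def reference_energy(x):
    # Hypothetical stand-in for an expensive quantum chemistry calculation:
    # a toy 1D double-well potential.
    return x**4 - 2.0 * x**2

def committee_predict(x_train, y_train, x_query, degree=3):
    # Leave-one-out committee: each member is a polynomial fitted to the
    # training set with one point removed. Returns shape (n_members, n_query).
    preds = []
    for i in range(len(x_train)):
        keep = np.arange(len(x_train)) != i
        coeffs = np.polyfit(x_train[keep], y_train[keep], degree)
        preds.append(np.polyval(coeffs, x_query))
    return np.array(preds)

def active_learning(pool, initial_idx, n_iterations, degree=3):
    # Grow the training set one configuration at a time, always labeling the
    # unlabeled pool candidate on which the committee disagrees the most.
    pool = np.asarray(pool, dtype=float)
    available = np.ones(len(pool), dtype=bool)
    available[initial_idx] = False
    x_train = pool[initial_idx]
    y_train = reference_energy(x_train)
    for _ in range(n_iterations):
        preds = committee_predict(x_train, y_train, pool, degree)
        disagreement = preds.std(axis=0)
        disagreement[~available] = -np.inf  # never re-select labeled points
        pick = int(np.argmax(disagreement))
        available[pick] = False
        x_train = np.append(x_train, pool[pick])
        y_train = np.append(y_train, reference_energy(pool[pick]))
    return x_train, y_train

# The initial data cover only x >= 0, mimicking a redundant, one-sided dataset.
pool = np.linspace(-1.5, 1.5, 61)
x_train, y_train = active_learning(pool, [30, 36, 42, 48, 54, 60], n_iterations=5)
print(np.sort(x_train))
```

In this sketch each new label is spent on the candidate with the largest spread among committee predictions, rather than on further points in the region that is already densely sampled. A real ML potential workflow would replace the polynomial committee with trained potentials and the toy function with actual reference calculations.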