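Support vector machine
Random forest
Computer science
Materials informatics
Property (philosophy)
Machine learning
Artificial intelligence
Kriging
Electronegativity
Parsing
Data mining
Ensemble learning
Chemistry
Health informatics
Epistemology
Public health
Philosophy
Engineering informatics
Nursing
Medicine
Organic chemistry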
Authors
Sangjoon Lee,Clio Chen,Griheydi Garcia,Anton O. Oliynyk
Source
Journal: Data in Brief
[Elsevier]
Date: 2024-02-09
Volume/Issue: 53: 110178-110178
Citations: 2
Identifier
DOI:10.1016/j.dib.2024.110178
Abstract
Materials informatics employs data-driven approaches for the analysis and discovery of materials. Features, also referred to as descriptors, are essential for building reliable and accurate machine-learning models. While general data can be obtained from public and commercial sources, features must be tailored to specific applications. Common featurizers suited to generic chemical problems may not be effective for feature-property mapping in solid-state materials with ML models. Here, we have assembled the Oliynyk property list for compositional feature generation, which performs well on limited datasets (50 to 1,000 training data points) in the solid-state materials domain. The dataset contains 98 elemental features for atomic numbers 1 to 92, including thermodynamic properties, electronic structure data, size, electronegativity, and bulk properties such as melting point, density, and conductivity. The dataset has been used in peer-reviewed publications on predicting material hardness, classification, discovery of novel Heusler compounds, band gap prediction, and determining the site preference of atoms, using machine-learning models including support vector machines and random forests for classification and support vector regression for regression problems. We compiled the dataset by parsing data from publicly available databases and the literature, and supplemented it by interpolating missing values with Gaussian process regression.
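As a rough illustration of what "compositional feature generation" from an elemental property table can look like, the sketch below turns a chemical formula into composition-weighted statistics of per-element properties. It is not the authors' code: the function names (`parse_formula`, `featurize`), the two-property table, and the choice of statistics (mean, min, max, range) are assumptions made for this example, and the numeric values are common textbook figures rather than entries copied from the Oliynyk property list.

```python
import re

# Hypothetical, tiny stand-in for an elemental property table.
# Values are illustrative textbook numbers, not taken from the dataset.
ELEMENT_FEATURES = {
    "Al": {"pauling_en": 1.61, "melting_point_K": 933.47},
    "Fe": {"pauling_en": 1.83, "melting_point_K": 1811.0},
    "Ni": {"pauling_en": 1.91, "melting_point_K": 1728.0},
}

# One element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2Al' into {element: amount}.

    Parentheses and charges are not supported in this sketch.
    """
    counts = {}
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # unparsed characters between tokens
            raise ValueError(f"cannot parse formula: {formula!r}")
        symbol, amount = match.groups()
        counts[symbol] = counts.get(symbol, 0.0) + float(amount or 1)
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def featurize(formula, table=ELEMENT_FEATURES):
    """Return composition-based features for one formula."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}

    feature_names = next(iter(table.values())).keys()
    features = {}
    for name in feature_names:
        values = {el: table[el][name] for el in fractions}  # KeyError if element missing
        # Atomic-fraction-weighted mean of the elemental property.
        features[f"{name}_mean"] = sum(fractions[el] * v for el, v in values.items())
        features[f"{name}_min"] = min(values.values())
        features[f"{name}_max"] = max(values.values())
        features[f"{name}_range"] = features[f"{name}_max"] - features[f"{name}_min"]
    return features


if __name__ == "__main__":
    for formula in ("NiAl", "Fe2Al"):
        print(formula, featurize(formula))
```

Each formula becomes a fixed-length feature vector regardless of how many elements it contains, which is what lets models such as support vector machines or random forests be trained on small compositional datasets. A real workflow would load the full elemental table in place of the hypothetical `ELEMENT_FEATURES` dictionary.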