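Feature selection
Interpretability
Feature (linguistics)
Computer science
Workflow
Selection (genetic algorithm)
Dimensionality reduction
Artificial intelligence
Theory (learning stability)
Machine learning
k-nearest neighbors algorithm
Data mining
Pattern recognition (psychology)
Database
Linguistics
Philosophy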
Authors
Wang Jun-ya, Pengcheng Xu, Xiaobo Ji, Minjie Li, Wencong Lu
Identifier
DOI:10.1016/j.mtcomm.2023.106910
Abstract
Feature selection has long played a significant role in the materials machine learning workflow, but most current materials machine learning studies tend to rely on single or stepwise feature selection methods. This work proposes MIC-SHAP, a new ensemble feature selection method that combines the SHapley Additive exPlanations (SHAP) method with the maximal information coefficient (MIC) method. The effectiveness of this ensemble method was evaluated on three different materials datasets collected from the literature. The results demonstrate that MIC-SHAP outperforms commonly used feature selection methods, preserving prediction accuracy while greatly reducing model complexity. The highest feature reduction rate is 91.67%, while the R² of 10-fold cross-validation reaches 0.98. MIC-SHAP can quickly and effectively select the optimal feature subset, avoiding repeated trials of different feature selection methods. Moreover, it can increase the stability and interpretability of feature selection, supporting the subsequent process of materials design and discovery.
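The abstract says that MIC-SHAP combines MIC and SHAP, but it does not say how the two are merged. The sketch below is therefore a hypothetical illustration of one generic way to build an ensemble of two feature rankings. It is not the authors' algorithm. It assumes that per-feature MIC scores (for example from a MIC implementation such as minepy) and mean absolute SHAP values (for example from the shap package) have already been computed. The sketch min-max normalizes each ranking, takes a weighted average, and keeps the features whose combined score meets a threshold. The function names, the `weight` and `threshold` parameters, the toy scores, and the reduction-rate definition (fraction of original features removed) are all assumptions made for illustration.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one ranking's scores to [0, 1] so MIC and SHAP are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all features tied: treat them as equally important
        return {name: 1.0 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}


def ensemble_select(mic: Dict[str, float], shap_importance: Dict[str, float],
                    weight: float = 0.5, threshold: float = 0.5) -> List[str]:
    """Hypothetical MIC + SHAP ensemble: weighted mean of normalized scores.

    mic: MIC between each feature and the target (assumed precomputed).
    shap_importance: mean |SHAP value| per feature (assumed precomputed).
    Returns features with combined score >= threshold, most important first.
    """
    if set(mic) != set(shap_importance):
        raise ValueError("both rankings must score the same features")
    mic_n = min_max_normalize(mic)
    shap_n = min_max_normalize(shap_importance)
    combined = {f: weight * mic_n[f] + (1 - weight) * shap_n[f] for f in mic}
    kept = [f for f, s in combined.items() if s >= threshold]
    return sorted(kept, key=lambda f: combined[f], reverse=True)


def feature_reduction_rate(n_original: int, n_selected: int) -> float:
    """Assumed definition: fraction of the original features removed."""
    return 1 - n_selected / n_original


if __name__ == "__main__":
    # Toy, made-up scores for four hypothetical features x1..x4.
    mic = {"x1": 0.9, "x2": 0.2, "x3": 0.6, "x4": 0.1}
    shap_imp = {"x1": 1.5, "x2": 0.1, "x3": 0.9, "x4": 0.4}
    selected = ensemble_select(mic, shap_imp)
    print(selected)  # → ['x1', 'x3']
    print(feature_reduction_rate(len(mic), len(selected)))  # → 0.5
```

Averaging normalized scores rewards features that both criteria rate highly. MIC captures general (including nonlinear) dependence on the target, while SHAP reflects how much a fitted model actually uses a feature. Under the assumed reduction-rate definition, a rate of 91.67% means that roughly eleven of every twelve candidate features were discarded.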