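Redundancy (engineering)
Computer science
Dimensionality reduction
Overfitting
Data mining
Curse of dimensionality
Bayesian information criterion
Feature selection
Machine learning
Artificial intelligence
Artificial neural network
Operating system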
Authors
Lei Luo, Ge He, Chen Chen, Xu Ji, Li Zhou, Yiyang Dai, Yagu Dang
Identifier
DOI:10.1021/acs.iecr.1c04926
Abstract
Chemical process modeling is the basis for research and applications in related fields. With the development of industrial informatization, data-driven process modeling techniques are increasingly applied to chemical processes, helping to obtain more accurate results at lower model development cost. However, because most chemical processes are high-dimensional and nonlinear, problems such as the "curse of dimensionality" and information redundancy make the models more prone to overfitting, with reduced accuracy and weaker generalization. Many data dimensionality reduction methods have been adopted to mitigate these problems, but most are limited by inaccurate association measurement and weak redundancy exclusion. In this paper, the widespread existence of data associations and information redundancies is first revealed through an information-theoretic analysis. Then, a feature selection method based on conditional refined maximal information coefficient maximization (CRMICM) is proposed to improve the consistency of association measurement and the accuracy of redundancy estimation with limited samples. A final prediction modeling test on an actual fluidized catalytic cracking (FCC) process confirms the widespread association between the variables and the targets. Only a few variables are essential for the modeling, while the rest are redundant. Compared with other methods, CRMICM achieves the best dimensionality reduction performance on the FCC process data in terms of the number of features and model prediction accuracy, showing its good applicability to chemical processes.
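The general idea of selecting relevant variables while excluding redundant ones can be illustrated with a small sketch. This is not the CRMICM method of the paper: it swaps in a plain histogram-based mutual information estimate in place of the refined maximal information coefficient, and a mean pairwise redundancy penalty (in the style of mRMR) in place of a conditional measure. The function names, the bin count, and the synthetic data are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Equal-width binning of a list of floats into integer labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in mutual information estimate (nats) for two label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def greedy_select(features, target, k, bins=8):
    """Greedy selection: relevance to the target minus mean redundancy
    with the features already selected (mRMR-style, not CRMICM)."""
    cols = {name: discretize(col, bins) for name, col in features.items()}
    t = discretize(target, bins)
    relevance = {name: mutual_information(c, t) for name, c in cols.items()}
    selected = []
    while len(selected) < min(k, len(cols)):
        best_name, best_score = None, -math.inf
        for name in cols:
            if name in selected:
                continue
            if selected:
                redundancy = sum(
                    mutual_information(cols[name], cols[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical synthetic data: x2 is a near copy of x1 (redundant),
# x3 carries independent information, and "noise" is unrelated to the target.
random.seed(0)
n = 500
x1 = [random.random() for _ in range(n)]
x2 = [v + 0.01 * random.gauss(0, 1) for v in x1]
x3 = [random.random() for _ in range(n)]
noise = [random.random() for _ in range(n)]
target = [a + b + 0.05 * random.gauss(0, 1) for a, b in zip(x1, x3)]

features = {"x1": x1, "x2": x2, "x3": x3, "noise": noise}
selected = greedy_select(features, target, k=2)
print(selected)
```

In this toy setting the redundancy penalty is what keeps x1 and x2 from being chosen together, even though both are strongly associated with the target. With few samples, a plug-in histogram estimate like this one is biased, which is the kind of limitation the abstract says CRMICM is meant to address.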