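Feature selection
Support vector machine
Partial least squares regression
Artificial intelligence
Artificial neural network
Multilayer perceptron
Linear regression
Computer science
Mean squared error
Machine learning
Regression
Extreme learning machine
Pattern recognition (psychology)
Data mining
Mathematics
Statistics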
Authors
Pál Péter Hanzelik,Szilveszter Gergely,C. Gaspar,László Győry
Abstract
Interest in applying chemometric and data science methods to laboratory techniques has grown rapidly over the last 10 years, because these methods are cheaper and faster than traditional analytical methods of material testing. This study uses 888 rock samples collected in the exploration and production (E&P) sector of the oil industry. Solubility predictions for these rock samples were developed from their Fourier-transform infrared (FT-IR) spectra and investigated with nine methods, both linear and non-linear. Two of them, Partial Least Squares Regression (PLSR) and Support Vector Regression (SVR), are available in a commercial software package. The other seven, Extreme Gradient Boosting (XGBoost), Ridge Regression (RR), k-nearest neighbours (k-NN), Decision Tree (DT), Multilayer Perceptron (MLP), a second implementation of Support Vector Regression (SVR), and an Artificial Neural Network (ANN) built with TensorFlow (TF), were coded by the authors on the basis of either commercial applications or open-source libraries. The investigation starts with spectral data pre-processing by standard normal variate (SNV) transformation, baseline correction and feature selection, which together create the feature set for all machine learning (ML) applications. The accuracy of each method's predictions was evaluated with mean squared error (MSE) as the performance metric. Comparing predicted values with measured values for the test samples showed that mineral solubility in acids can be predicted within the uncertainty of real laboratory measurements; the approach can therefore shorten the response time of these investigations and reduce risk in industrial applications. Where unknown samples have out-of-range features, the limits of prediction accuracy become clear.
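The abstract names SNV, baseline correction and mean squared error without giving their definitions. The following is a minimal pure-Python sketch of those steps, not the authors' code: it assumes a simple two-point linear baseline applied before SNV (the paper does not state which baseline method or order it used), uses the population standard deviation in SNV, and omits feature selection. It also includes a basic check for the "out-of-range features" situation the abstract mentions. The helper names `linear_baseline`, `snv`, `preprocess`, `mse` and `out_of_range_features` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def linear_baseline(spectrum: Sequence[float]) -> List[float]:
    """Subtract the straight line joining the first and last points.

    Assumption: a two-point linear baseline; the paper does not specify
    which baseline-correction method was used.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two spectral points")
    first, last = spectrum[0], spectrum[-1]
    slope = (last - first) / (n - 1)
    return [y - (first + slope * i) for i, y in enumerate(spectrum)]


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: centre one spectrum on its own mean and
    divide by its own (population) standard deviation."""
    mu = mean(spectrum)
    sigma = pstdev(spectrum)
    if sigma == 0:
        raise ValueError("a constant spectrum cannot be SNV-scaled")
    return [(y - mu) / sigma for y in spectrum]


def preprocess(spectrum: Sequence[float]) -> List[float]:
    """Assumed order: baseline correction first, then SNV."""
    return snv(linear_baseline(spectrum))


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between measured and predicted solubilities."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def out_of_range_features(train: Sequence[Sequence[float]],
                          sample: Sequence[float]) -> List[int]:
    """Indices of features where an unknown sample lies outside the
    min/max observed in the training set (a hypothetical range check)."""
    lows = [min(col) for col in zip(*train)]
    highs = [max(col) for col in zip(*train)]
    return [j for j, x in enumerate(sample) if x < lows[j] or x > highs[j]]


if __name__ == "__main__":
    print(preprocess([1.0, 2.0, 4.0, 3.0, 2.0]))
    print(mse([2.0, 4.0], [1.0, 5.0]))  # 1.0
    print(out_of_range_features([[0.0, 1.0], [2.0, 3.0]], [1.0, 5.0]))  # [1]
```

After `preprocess`, every spectrum has zero mean and unit standard deviation, which removes multiplicative scatter differences between samples; a real pipeline would apply the same steps to each FT-IR spectrum before feature selection and model training.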
We also identified limitations of the methodology and planned further steps to improve its prediction capability. The limited number of samples further emphasizes the need for database-building efforts, so that the real potential of big data and machine learning can be realized.