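Keywords
Preprocessor
Data pre-processing
Smoothing
Selection (genetic algorithm)
Computer science
Feature selection
Set (abstract data type)
Raw data
Data mining
Basis (linear algebra)
Calibration
Process (computing)
Artificial intelligence
Pattern recognition (psychology)
Statistics
Mathematics
Operating system
Geometry
Computer vision
Programming language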
Authors
Jan Gerretzen,Ewa Szymańska,Jeroen J. Jansen,Jacob Bart,Henk‐Jan van Manen,Edwin R. van den Heuvel,L.M.C. Buydens
Source
Journal: Analytical Chemistry
[American Chemical Society]
Date: 2015-11-19
Volume (issue): 87 (24), pages 12096-12103
Citations: 148
Identifiers
DOI:10.1021/acs.analchem.5b02832
Abstract
The selection of optimal preprocessing is among the main bottlenecks in chemometric data analysis. Preprocessing is currently a burden, since a multitude of preprocessing methods is available for, e.g., baseline correction, smoothing, and alignment, but it is not clear beforehand which method(s) should be used for which data set. The process of preprocessing selection is often limited to trial and error and is therefore considered somewhat subjective. In this paper, we present a novel, simple, and effective approach for preprocessing selection. The defining feature of this approach is a design of experiments. On the basis of the design, the model performance of a few well-chosen preprocessing methods and combinations thereof (called strategies) is evaluated. Interpretation of the main effects and interactions subsequently enables the selection of an optimal preprocessing strategy. The presented approach is applied to eight different spectroscopic data sets, covering both calibration and classification challenges. We show that the approach is able to select a preprocessing strategy which improves model performance by at least 50% compared to the raw data; in most cases, it leads to a strategy very close to the true optimum. Our approach makes preprocessing selection fast, insightful, and objective.
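The abstract does not spell out the experimental design, so the following is only a rough sketch of the general idea of reading main effects and interactions from a designed experiment. It assumes a two-level full factorial design in which each preprocessing step (hypothetically: baseline correction, smoothing, alignment) is either off (-1) or on (+1). The factor names, the two-level coding, and the response values (a made-up RMSE formula) are illustrative assumptions, not the paper's actual design, methods, or results.

```python
import itertools

import numpy as np


def full_factorial(k):
    """All 2**k runs of a two-level design, coded -1 (step off) / +1 (step on)."""
    return np.array(list(itertools.product([-1, 1], repeat=k)), dtype=float)


def main_effects(design, y):
    """Main effect of factor j: mean response at +1 minus mean response at -1."""
    return np.array([
        y[design[:, j] > 0].mean() - y[design[:, j] < 0].mean()
        for j in range(design.shape[1])
    ])


def interaction_effects(design, y):
    """Two-factor interaction effects, estimated from the product column x_i * x_j."""
    effects = {}
    for i, j in itertools.combinations(range(design.shape[1]), 2):
        col = design[:, i] * design[:, j]
        effects[(i, j)] = y[col > 0].mean() - y[col < 0].mean()
    return effects


# Hypothetical factors; the paper's actual choice of methods may differ.
factors = ["baseline_correction", "smoothing", "alignment"]
design = full_factorial(len(factors))

# Made-up response (e.g. a cross-validated RMSE, lower is better); NOT paper data.
x = design
y = 1.0 - 0.3 * x[:, 0] - 0.1 * x[:, 1] + 0.05 * x[:, 0] * x[:, 1]

effects = main_effects(design, y)
inter = interaction_effects(design, y)
best_run = design[np.argmin(y)]  # first run with the lowest response

for name, e in zip(factors, effects):
    print(f"{name}: main effect {e:+.3f}")
for (i, j), e in inter.items():
    print(f"{factors[i]} x {factors[j]}: interaction {e:+.3f}")
print("best strategy (on=+1, off=-1):", best_run)
```

In a real analysis, each entry of `y` would come from evaluating a calibration or classification model on the data preprocessed with that run's strategy. A negative main effect on an error measure suggests the step helps, while a sizeable interaction warns that the benefit of one step depends on another; a two-level full factorial needs 2**k model fits, which fractional designs could reduce.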