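Outlier
Computer science
Anomaly detection
Artificial intelligence
Hyperparameter
Machine learning
Detector
One-class classification
Supervised learning
Pattern recognition (psychology)
Estimator
Pipeline (software)
Binary classification
Set (abstract data type)
Data mining
Classifier (UML)
Support vector machine
Mathematics
Artificial neural network
Statistics
Telecommunications
Programming language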
Authors
Ángela Fernández, Juan Bella, José R. Dorronsoro
Source
Journal: Neurocomputing
[Elsevier BV]
Date: 2022-02-25
Volume: 486, pages 77-92
Citations: 24
Identifier
DOI: 10.1016/j.neucom.2022.02.047
Abstract
Outlier detection, i.e., the task of detecting points that are markedly different from the rest of the data sample, is an important challenge in machine learning. When a model is built, these points can skew training and result in less accurate predictions. It is therefore important to identify and remove them before building any supervised model, and this is often the first step when dealing with a machine learning problem. A very large number of outlier detection algorithms now exist that provide good results, but their main drawbacks are their unsupervised nature and the hyperparameters that must be properly set to obtain good performance. In this work, a new supervised outlier estimator is proposed. It is built by pipelining an outlier detector with a subsequent supervised model, in such a way that the targets of the latter supervise how all the hyperparameters of the outlier detector are optimally selected. This pipeline-based approach makes it very easy to combine different outlier detectors with different classifiers and regressors. In the experiments, nine relevant outlier detectors have been combined with three regressors over eight regression problems, as well as with two classifiers over another eight binary and multi-class classification problems. The usefulness of the proposal as an objective and automatic way to optimally determine detector hyperparameters has been proven, and the effectiveness of the nine outlier detectors has also been analyzed and compared.
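The pipeline idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the detector is a simple median/MAD rule on a single feature, the supervised model is an ordinary least-squares line, the validation score is the median absolute error, and the data are synthetic. The function names (`robust_inliers`, `fit_pipeline`, `cv_score`, `select_threshold`) are hypothetical. The only point it tries to show is that the detector's hyperparameter (here `threshold`) is chosen by the cross-validated error of the downstream supervised model rather than by hand.

```python
import statistics

def robust_inliers(xs, threshold):
    """Unsupervised detector (assumed): keep points within `threshold` MADs of the median."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs]) or 1e-12
    return [abs(x - med) / mad <= threshold for x in xs]

def fit_line(xs, ys):
    """Supervised model (assumed): ordinary least squares for y = a + b * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_pipeline(xs, ys, threshold):
    """Detector followed by regressor: drop flagged training points, then fit."""
    mask = robust_inliers(xs, threshold)
    kept = [(x, y) for x, y, keep in zip(xs, ys, mask) if keep]
    return fit_line([x for x, _ in kept], [y for _, y in kept])

def cv_score(xs, ys, threshold, n_folds=4):
    """Mean over folds of the median absolute error on the held-out fold."""
    fold_errors = []
    for k in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != k]
        valid = [i for i in range(len(xs)) if i % n_folds == k]
        a, b = fit_pipeline([xs[i] for i in train], [ys[i] for i in train], threshold)
        fold_errors.append(
            statistics.median([abs(ys[i] - (a + b * xs[i])) for i in valid])
        )
    return statistics.mean(fold_errors)

def select_threshold(xs, ys, grid):
    """The targets supervise the detector: pick the threshold with the lowest CV error."""
    scores = {t: cv_score(xs, ys, t) for t in grid}
    return min(scores, key=scores.get), scores

# Synthetic data: 40 points near y = 2x + 1 with small deterministic noise,
# plus 4 contaminated points far away in x whose targets ignore the trend.
xs = [i / 10 for i in range(40)] + [20.0, 22.0, 24.0, 26.0]
ys = [2 * x + 1 + 0.05 * ((7 * i) % 5 - 2) for i, x in enumerate(xs[:40])] + [0.0] * 4

best, scores = select_threshold(xs, ys, grid=[1.0, 3.0, 100.0])
intercept, slope = fit_pipeline(xs, ys, best)
print("selected threshold:", best)
print("CV scores:", scores)
print("refit slope:", slope)
```

In this toy setup the largest threshold keeps the contaminated points, so its validation error is much higher and it is not selected. A robust validation score is used because the held-out folds still contain outliers; that choice is part of the sketch, not a statement about how the paper evaluates its models.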