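Noise (video)
Computer science
Artificial intelligence
Filter (signal processing)
Noise measurement
Machine learning
Hyperparameter
Pattern recognition (psychology)
Set (abstract data type)
Data mining
Noise reduction
Computer vision
Image (mathematics)
Programming language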
Authors
Chuang Li,Zhizhong Mao
Identifier
DOI:10.1016/j.eswa.2023.120422
Abstract
The quality of training data plays a decisive role in building intelligent models. Because raw data obtained from the real world are usually contaminated with noise arising from a variety of causes, noise filtering has become an important part of machine learning practice. In contrast with the extensive research on noise elimination for classification, papers addressing this problem for regression tasks are scarce. In this paper, we propose a novel noise filter that cleans instances with real-valued label noise. To address the shortcomings of existing noise-determination criteria, we first propose a new adaptive threshold-based criterion. It defines a noisy instance adaptively, according to how difficult each dataset is to fit and to the density of the region in which the instance lies. Built on this criterion, an effective noise-filtering procedure is then designed: an ensemble filtering scheme and an iterative filtering process are combined to detect as many potential noisy samples in the original training set as possible. From the resulting noise-detection information, a noise score is developed to evaluate each sample's noise level. Potential noisy samples whose scores exceed a reasonable threshold are then filtered out; this step compensates for errors that may have occurred in the previous procedure and yields more reliable filtering results. The validity of the proposed method is examined in exhaustive experiments. We discuss reasonable hyperparameter settings and compare the method with several state-of-the-art noise filters. The results show that the prediction accuracy of the regressor used can benefit greatly from preprocessing the raw dataset with our method. At the same time, the method strikes a good balance between eliminating noisy samples and retaining clean ones, and consistently achieves better noise-filtering performance than the compared filters.
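The abstract describes the pipeline only at a high level: an adaptive noise threshold, ensemble plus iterative detection of potential noise, a noise score, and a final score threshold. The Python sketch below illustrates that general shape under our own assumptions; it is not the authors' algorithm. The ensemble members are hypothetical leave-one-out k-NN regressors (median of neighbour labels, one member per value of k). The adaptive threshold is simply `alpha` times the median residual on the current data, so it adapts to how hard a dataset is to fit but, unlike the paper's criterion, ignores local density. All names (`filter_regression_noise`, `ks`, `alpha`, `max_rounds`, `score_threshold`) are illustrative.

```python
import statistics
from typing import List, Sequence, Set, Tuple


def knn_median_predict(X: Sequence[Sequence[float]], y: Sequence[float],
                       active: Set[int], i: int, k: int) -> float:
    """Leave-one-out k-NN prediction for instance i: median label of its k nearest active neighbours."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
        for j in active if j != i
    )
    return statistics.median([y[j] for _, j in dists[:k]])


def filter_regression_noise(
    X: Sequence[Sequence[float]], y: Sequence[float],
    ks: Tuple[int, ...] = (3, 5, 7), alpha: float = 3.0,
    max_rounds: int = 5, score_threshold: float = 0.5,
) -> Tuple[Set[int], List[float]]:
    """Hypothetical sketch: ensemble + iterative detection, then noise-score filtering."""
    n = len(y)
    active = set(range(n))
    votes = [0] * n   # times each instance was flagged by an ensemble member
    checks = [0] * n  # times each instance was evaluated by an ensemble member
    potential: Set[int] = set()
    for _ in range(max_rounds):
        if len(active) <= max(ks):
            break  # too few instances left for the largest neighbourhood
        round_votes = {i: 0 for i in active}
        for k in ks:  # each k defines one ensemble member
            residuals = {i: abs(y[i] - knn_median_predict(X, y, active, i, k))
                         for i in active}
            # Dataset-adaptive threshold (assumption): a multiple of the typical
            # residual, so data that are harder to fit tolerate larger errors.
            threshold = alpha * statistics.median(residuals.values())
            for i, r in residuals.items():
                checks[i] += 1
                if r > threshold:
                    votes[i] += 1
                    round_votes[i] += 1
        # A majority vote of the ensemble marks this round's potential noise.
        flagged = {i for i, v in round_votes.items() if v > len(ks) / 2}
        if not flagged:
            break
        potential |= flagged
        active -= flagged  # the next round works on the cleaner remaining set
    # Noise score: fraction of all evaluations in which the instance was flagged.
    scores = [votes[i] / checks[i] if checks[i] else 0.0 for i in range(n)]
    # Final stage: only potential noise whose score is high enough is removed.
    removed = {i for i in potential if scores[i] > score_threshold}
    return removed, scores


if __name__ == "__main__":
    X = [[float(i)] for i in range(30)]
    y = [float(i) for i in range(30)]
    y[15] = 100.0  # inject one corrupted real-valued label
    removed, scores = filter_regression_noise(X, y)
    print(removed)  # {15}
```

Using the median of neighbour labels keeps one corrupted label from inflating its neighbours' residuals; with a plain mean, the points next to index 15 would tend to look noisy as well. The final score check only ever removes instances already flagged as potential noise, which loosely mirrors the abstract's description of the score as a correction step for the earlier stage.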