A label noise filtering method for regression based on adaptive threshold and noise score

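Keywords: Noise (video), Computer science, Artificial intelligence, Filter (signal processing), Noise measurement, Machine learning, Hyperparameter, Pattern recognition (psychology), Set (abstract data type), Data mining, Noise reduction, Computer vision, Image (mathematics), Programming language
Authors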
Chuang Li,Zhizhong Mao
Source
Journal: Expert Systems With Applications [Elsevier]
Volume/Issue: 228: 120422-120422 Citations: 9
Identifier
DOI:10.1016/j.eswa.2023.120422
Abstract

The quality of training data plays a decisive role in the establishment of intelligent models. Since raw data obtained from the real world are usually entwined with noise due to a variety of causes, noise filtering has become an important aspect of machine learning techniques. In contrast with the extensive research conducted on noise elimination for classification purposes, papers addressing this problem for regression tasks are rather scarce. In this paper, we propose a novel noise filter to clean noisy instances with real-valued label noise. To address the deficiency of the existing noise determination criterion, a new adaptive threshold-based method is first proposed. It allows a noisy instance to be adaptively defined according to the fitting difficulty levels of different datasets and of areas with different densities. Embedded with this criterion, an effective noise filtering procedure is also designed. An ensemble filtering scheme and an iterative filtering process are combined to detect as many potential noisy samples as possible from the original training set. According to the acquired noise detection information, a noise score for evaluating the noise level is specifically developed. The potential noisy samples whose scores exceed a reasonable threshold are further filtered, which can compensate for the possible errors incurred during the previous procedure and contribute to more reliable filtering results. The validity of the proposed method is studied in exhaustive experiments. We discuss reasonable hyperparameters and compare the developed method with several state-of-the-art noise filters. The outcomes show that the prediction accuracy of the utilized regressor can greatly benefit from preprocessing the given raw dataset with our method. At the same time, the method achieves a good balance between the elimination of noisy samples and the retention of clean samples, and consistently achieves better noise filtering performance.
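
The abstract does not give formulas, so the following is only a rough Python sketch of the general pipeline it describes (adaptive threshold, ensemble plus iterative detection, then a noise score), not the authors' algorithm. Everything specific here is an assumption made for illustration: the leave-one-out kNN-median regressors, the local threshold defined as `alpha` times the median residual of neighbouring samples, the vote-share score, and the names `nearest`, `noise_scores`, `filter_noise`, `ks`, `alpha`, `n_iter` and `cutoff`.

```python
import statistics

def nearest(data, pool, i, k):
    """Indices of the k samples in `pool` (excluding i) closest to sample i."""
    others = [j for j in pool if j != i]
    return sorted(others, key=lambda j: abs(data[j][0] - data[i][0]))[:k]

def noise_scores(data, ks=(3, 5), alpha=4.0, n_iter=2):
    """Score each (x, y) sample by the share of ensemble votes that flag it."""
    n = len(data)
    votes = [0] * n
    active = list(range(n))            # samples currently trusted for fitting
    for _ in range(n_iter):            # iterative filtering (assumed scheme)
        round_votes = [0] * n
        for k in ks:                   # ensemble: one kNN-median regressor per k
            neigh = [nearest(data, active, i, k) for i in range(n)]
            resid = [abs(data[i][1] - statistics.median(data[j][1] for j in neigh[i]))
                     for i in range(n)]
            for i in range(n):
                # assumed adaptive threshold: scaled median residual nearby
                local = statistics.median(resid[j] for j in neigh[i])
                if resid[i] > alpha * local:
                    round_votes[i] += 1
        for i in range(n):
            votes[i] += round_votes[i]
        # drop samples flagged by a strict majority before the next round
        active = [i for i in active if 2 * round_votes[i] <= len(ks)]
    return [v / (n_iter * len(ks)) for v in votes]

def filter_noise(data, cutoff=0.5, **kwargs):
    """Keep only samples whose noise score does not exceed `cutoff`."""
    scores = noise_scores(data, **kwargs)
    return [p for p, s in zip(data, scores) if s <= cutoff]

# Toy data: y = 2x with one corrupted label at x = 10.
data = [(float(i), 2.0 * i) for i in range(20)]
data[10] = (10.0, 70.0)
scores = noise_scores(data)
cleaned = filter_noise(data)
```

On this toy set the corrupted sample is flagged by every ensemble member in both rounds, while the clean samples receive no votes. Real data would need a proper regressor, multi-dimensional distances, and tuned hyperparameters.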
