Synthetic minority oversampling using edited displacement-based k-nearest neighbors

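Oversampling, Undersampling, Machine learning, Artificial intelligence, Computer science, Algorithm, Noise (video), Class (philosophy), Bootstrapping (finance), Benchmark (surveying), Data mining, Pattern recognition (psychology), Mathematics, Image (mathematics), Computer network, Geodesy, Bandwidth (computing), Econometrics, Geography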
Authors
Alex X. Wang,Stefanka Chukova,Binh P. Nguyen
Source
Journal: Applied Soft Computing [Elsevier]
Volume: 148, Article 110895
Identifier
DOI:10.1016/j.asoc.2023.110895
Abstract

Skewed class proportions in real-world datasets present a challenge for machine learning algorithms, which tend to categorize the majority class correctly while misclassifying the minority class. Such classification disparities hold significant implications, particularly in predictive scenarios involving minority groups, where misclassifying minority instances could lead to adverse outcomes. To tackle this, class imbalance learning has gained attention, with the Synthetic Minority Oversampling Technique (SMOTE) being a notable approach that addresses class imbalance by generating synthetic instances for the minority class based on their feature space neighbors. Despite its effectiveness and simplicity, SMOTE is known to suffer from a noise propagation issue where noisy and uninformative samples are introduced. While various SMOTE variants, including hybrids with undersampling, have been developed to tackle this problem, identifying noisy samples in complex real-world datasets remains a challenge. To address this, our study introduces a new SMOTE-based hybrid approach called SMOTE-centroid displacement-based k-NN (SMOTE-CDNN). SMOTE-CDNN employs centroid displacement for class prediction, which is more robust against noisy data. After SMOTE is applied, an instance is treated as noise and removed, yielding clearer decision boundaries, if the label predicted for it by our centroid displacement-based k-NN algorithm differs from its actual label. While our experiments on 24 imbalanced datasets demonstrate the resilience and efficiency of our proposed algorithm, which outperforms state-of-the-art resampling algorithms with various classification models, we acknowledge the need for further investigation into specific dataset characteristics and classification scenarios to determine the generalizability of our approach.
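
The two-stage pipeline described in the abstract (oversample with SMOTE, then drop every instance whose label disagrees with a centroid displacement-based k-NN prediction) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not spell out the displacement rule, so the sketch assumes one plausible reading: among the k nearest neighbors, pick the class whose neighbor centroid moves least when the query point is added. The function names (`smote`, `cdnn_predict`, `cdnn_filter`, `smote_cdnn`), the parameter defaults, the binary-class balancing target, and the single leave-one-out filtering pass are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Plain SMOTE: interpolate between a minority sample and one of its
    k nearest minority neighbors. Assumes at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def cdnn_predict(x, X, y, k=5):
    """ASSUMED rule: group the k nearest neighbors by class and return the
    class whose centroid is displaced least by adding x to the group."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    groups = {}
    for i in nearest:
        groups.setdefault(y[i], []).append(X[i])

    def displacement(pts):
        return math.dist(centroid(pts), centroid(pts + [x]))

    return min(groups, key=lambda label: displacement(groups[label]))


def cdnn_filter(X, y, k=5):
    """Keep samples whose label matches the prediction made from all other
    samples (a single leave-one-out pass; an assumption of this sketch)."""
    X, y = list(X), list(y)
    keep = [i for i in range(len(X))
            if cdnn_predict(X[i], X[:i] + X[i + 1:], y[:i] + y[i + 1:], k) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_cdnn(X, y, minority_label, k_smote=3, k_cdnn=5, seed=0):
    """Oversample the minority class to parity (binary case), then filter."""
    minority = [p for p, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)  # majority count minus minority count
    new_points = smote(minority, n_new, k_smote, random.Random(seed))
    return cdnn_filter(list(X) + new_points,
                       list(y) + [minority_label] * n_new, k_cdnn)
```

Calling `cdnn_filter` on a toy set where one minority-labeled point sits inside the majority cluster removes that point, because all of its nearest neighbors belong to the other class. In practice one would likely rely on an established SMOTE implementation and vectorized distance computations rather than these O(n²) loops.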