SMOTE-LOF for noise identification in imbalanced data classification

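Oversampling, Computer science, Outlier, Identification (biology), Data mining, Noise (video), Machine learning, Artificial intelligence, Pattern recognition (psychology), Computer network, Botany, Bandwidth (computing), Image (mathematics), Biology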
Authors
Asniar Asniar,Nur Ulfa Maulidevi,Kridanto Surendro
Source
Journal: Journal of King Saud University - Computer and Information Sciences [Elsevier BV]
Volume (issue): 34 (6), pp. 3413-3423; cited by: 76
Identifiers
DOI:10.1016/j.jksuci.2021.01.014
Abstract

Imbalanced data typically refers to a condition in which the data samples in a given problem are not equally distributed across classes, leaving one or more classes underrepresented in the dataset. The underrepresented classes are referred to as the minority, while the overrepresented ones are called the majority. This unequal distribution leaves classifiers unable to predict the minority classes accurately, which incurs various misclassification costs. Currently, the standard framework for learning from imbalanced data is the Synthetic Minority Oversampling Technique (SMOTE). However, SMOTE can produce synthetic minority samples that should be considered noise because they also lie within the majority classes. This study therefore aims to improve SMOTE by adding the Local Outlier Factor (LOF) to identify noise among the synthetic minority samples it produces when handling imbalanced data. The proposed method, called SMOTE-LOF, was evaluated on imbalanced datasets, and its performance was compared with that of SMOTE. The results showed that SMOTE-LOF achieves better accuracy and F-measure than SMOTE. On datasets with a large number of samples and a smaller imbalance ratio, SMOTE-LOF also produced a better AUC than SMOTE. However, on datasets with fewer samples, SMOTE arguably handles imbalanced data better in terms of AUC. Future research should therefore use different datasets that vary both the number of samples and the imbalance ratio.
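The abstract describes SMOTE-LOF only at a high level, so the following is a minimal pure-Python sketch of the general idea under stated assumptions, not the authors' implementation. SMOTE creates each synthetic sample by interpolating between a minority sample and one of its k nearest minority neighbours; LOF then scores each synthetic sample by comparing its local density with that of its neighbours, and samples scoring above a cut-off are discarded as noise. The helper names (`smote`, `lof_scores`, `smote_lof`), the choice to compute LOF over the pooled original and synthetic minority points, and the default `lof_threshold=1.5` are hypothetical choices made for illustration, not details taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself.
    Ties at the k-th distance are broken by index (a simplification)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: euclidean(points[i], points[j]))
    return others[:k]


def smote(minority, n_synthetic, k=5, rng=None):
    """SMOTE-style oversampling: each synthetic sample lies on the segment
    between a random minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def lof_scores(points, k=5):
    """Local Outlier Factor of every point (Breunig et al., 2000).
    A score near 1 means a density similar to its neighbours'; larger means more outlying."""
    k = min(k, len(points) - 1)
    knn = [k_nearest(points, i, k) for i in range(len(points))]
    # k-distance of each point: distance to its k-th nearest neighbour.
    k_dist = [euclidean(points[i], points[knn[i][-1]]) for i in range(len(points))]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[o], euclidean(points[i], points[o])) for o in knn[i]]
        return 1.0 / (sum(reach) / len(reach) + 1e-12)  # epsilon guards exact duplicates

    density = [lrd(i) for i in range(len(points))]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(density[o] for o in knn[i]) / (len(knn[i]) * density[i])
            for i in range(len(points))]


def smote_lof(minority, n_synthetic, k=5, lof_threshold=1.5, seed=0):
    """Hypothetical SMOTE-LOF sketch: oversample with SMOTE, then discard synthetic
    samples whose LOF over the pooled (original + synthetic) minority points exceeds
    lof_threshold. Both the pooling and the threshold are assumptions, not the paper's."""
    rng = random.Random(seed)
    synthetic = smote(minority, n_synthetic, k, rng)
    scores = lof_scores(minority + synthetic, k)
    synthetic_scores = scores[len(minority):]  # synthetic points come after the originals
    kept = [s for s, score in zip(synthetic, synthetic_scores) if score <= lof_threshold]
    return kept, synthetic


if __name__ == "__main__":
    # Toy minority class: a tight cluster plus one distant point; SMOTE samples
    # interpolated towards that point tend to fall in sparse space.
    minority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.15, 0.05], [0.05, 0.15], [3.0, 3.0]]
    kept, generated = smote_lof(minority, n_synthetic=20, k=3)
    print(f"generated {len(generated)} synthetic samples, kept {len(kept)} after LOF filtering")
```

The neighbour search is brute force and quadratic in the number of points, which is fine for illustration only. For real use, `imblearn.over_sampling.SMOTE` (imbalanced-learn) and `sklearn.neighbors.LocalOutlierFactor` (scikit-learn) provide tested implementations of the two building blocks; note that scikit-learn exposes the score as `negative_outlier_factor_`, i.e. the negated LOF.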
