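Oversampling
Overfitting
Computer science
Cluster analysis
Interpolation (computer graphics)
Artificial intelligence
Class (philosophy)
Data mining
Pattern recognition (psychology)
Machine learning
Artificial neural network
Computer network
Motion (physics)
Bandwidth (computing)
Authors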
Chunsheng An,Jingtong Sun,Yifeng Wang,Qingjie Wei
Identifier
DOI:10.1109/qrs54544.2021.00097
Abstract
CTGAN is a tabular data synthesis method designed for privacy preservation; this paper applies it to the data imbalance problem. The paper proposes a method for handling imbalanced datasets that combines K-means clustering with CTGAN, in order to address the uneven distribution of minority-class examples that results from oversampling with CTGAN. Experiments with the LightGBM algorithm on home loan and online shopping datasets show that the CTGAN method achieves better F1-score and G-mean than the interpolation-based oversampling technique represented by SMOTE. These results indicate that applying the proposed method to an imbalanced dataset yields a dataset with more examples, a more uniform distribution, and less overfitting, while still following the original dataset's probability distribution.
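The abstract reports results in F1-score and G-mean, two metrics commonly used for imbalanced classification because plain accuracy can look high even when the minority class is never detected. The following is a minimal sketch of both metrics computed from binary confusion-matrix counts, using their standard definitions. The function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall for the positive (minority) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(tp: int, fp: int, fn: int, tn: int) -> float:
    """G-mean = sqrt(sensitivity * specificity), i.e. the geometric mean of per-class recall."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the majority class
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: 10 minority and 90 majority examples.
balanced_f1 = f1_score(tp=8, fp=10, fn=2)
balanced_g = g_mean(tp=8, fp=10, fn=2, tn=80)

# A classifier that always predicts the majority class reaches 90% accuracy
# on the same data, yet both metrics collapse to zero.
trivial_f1 = f1_score(tp=0, fp=0, fn=10)
trivial_g = g_mean(tp=0, fp=0, fn=10, tn=90)
```

Both metrics penalize a model that ignores the minority class, which is why they suit a comparison of oversampling methods better than accuracy does.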