Computer Science
Classifier (UML)
Artificial Intelligence
Machine Learning
Autoencoder
Transfer of Learning
Data Mining
Generative Adversarial Network
Pattern Recognition (Psychology)
Artificial Neural Network
Deep Learning
Authors
Hongwei Ding, Yu Sun, Nana Huang, Zhidong Shen, Zhenyu Wang, Adnan Iftekhar, Xiaohui Cui
Identifiers
DOI: 10.1016/j.ins.2023.01.147
Abstract
Imbalanced data distribution is a major cause of performance degradation in most supervised classification algorithms. When dealing with imbalanced learning problems, the predictions of traditional classifiers tend to favor the majority class and ignore the minority class, which is often far more important. It is therefore necessary to balance the majority and minority data before classification, and a popular strategy for doing so is to synthesize minority data. In recent years, generative adversarial networks (GANs) have shown great potential in fitting sample distributions. Building on this, this paper proposes RVGAN-TL, a model that combines an improved GAN with transfer learning to address the imbalanced learning problem on tabular data. To improve the GAN, a variational autoencoder (VAE) is used to generate latent variables with a posterior distribution as the GAN's input, and a similarity-measure loss is introduced into the generator to improve the quality of the synthesized minority data. In addition, a roulette wheel selection method is applied to the selection of the GAN's training data to rebalance samples in the class-overlapping region. Once the data are balanced, the generated data serve as the source domain and the original data as the target domain, and transfer learning is used to train the final classifier. Experiments on 20 real datasets show that the classification performance of the proposed method is significantly better than that of other popular methods.
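The roulette wheel selection step mentioned in the abstract is standard fitness-proportionate sampling, and a minimal sketch of it is given below in plain NumPy. The abstract does not specify how RVGAN-TL scores samples in the overlapping region, so the `scores` argument and the toy values here are assumptions for illustration only, not the paper's exact procedure.

```python
import numpy as np

def roulette_wheel_select(scores, n_select, rng=None):
    """Fitness-proportionate (roulette wheel) sampling of indices.

    Each index i is drawn with probability scores[i] / sum(scores),
    so higher-scored samples are more likely to enter the training batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    probs = scores / scores.sum()  # normalise scores into one "wheel"
    return rng.choice(len(scores), size=n_select, replace=True, p=probs)

# Hypothetical usage: five candidate training samples; a higher score
# stands in for "lies in the class-overlapping region" in this sketch.
scores = [0.1, 0.9, 0.5, 0.2, 0.8]
print(roulette_wheel_select(scores, n_select=3))
```

In the context described by the abstract, such a weighted draw biases the GAN's training data toward overlapping-area samples instead of sampling uniformly, which is how the rebalancing effect is obtained.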