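Undersampling
Resampling
Computer science
Cluster analysis
Classifier (UML)
Artificial intelligence
Sensitivity (control systems)
Pattern recognition (psychology)
Boundary determination
Data mining
Machine learning
Mathematics
Engineering
Electronic engineering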
Authors
Wing W. Y. Ng, Junjie Hu, Daniel Yeung, Shengming Yin, Fabio Roli
Identifier
DOI: 10.1109/tcyb.2014.2372060
Abstract
Undersampling is a widely adopted method for dealing with imbalanced pattern classification problems. Current methods mainly rely on either random resampling of the majority class or resampling at the decision boundary. Random undersampling fails to take the informative samples in the data into account, while resampling at the decision boundary is sensitive to class overlap. Both techniques ignore the distribution information of the training dataset. In this paper, we propose a diversified sensitivity-based undersampling method. Samples of the majority class are clustered to capture distribution information and to enhance the diversity of the resampling. A stochastic sensitivity measure is applied to select samples both from the clusters of the majority class and from the minority class. By iteratively clustering and sampling, a balanced set of samples yielding high classifier sensitivity is selected. The proposed method shows good generalization capability on 14 UCI datasets.
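The abstract does not specify the classifier, the exact form of the stochastic sensitivity measure, or the selection schedule, so the following is only a minimal NumPy sketch of the idea under stated assumptions, not the authors' algorithm. It assumes (1) a logistic-regression stand-in classifier, (2) that the stochastic sensitivity of a sample x is the expected squared change in classifier output when x is perturbed uniformly within [-q, q] in every input dimension, estimated by Monte Carlo, and (3) that in each round the not-yet-selected majority samples are re-clustered with k-means, the most sensitive sample is taken from each cluster, and the most sensitive remaining minority samples are added alongside until both classes reach the same size. All function and parameter names (`sensitivity_undersample`, `n_per_class`, `n_clusters`, `q`) are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means (NumPy only); returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Stand-in classifier (assumption): logistic regression by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return lambda Z: 1.0 / (1.0 + np.exp(-np.clip(Z @ w + b, -30, 30)))


def stochastic_sensitivity(f, X, q, rng, n_draws=50):
    """Assumed measure: Monte Carlo estimate of E[(f(x + dx) - f(x))^2], dx ~ U[-q, q]^d."""
    dx = rng.uniform(-q, q, size=(n_draws,) + X.shape)
    return ((f(X[None, :, :] + dx) - f(X)[None, :]) ** 2).mean(axis=0)


def sensitivity_undersample(X_maj, X_min, n_per_class=None, n_clusters=3, q=0.1, seed=0):
    """Hypothetical sketch: returns index arrays into X_maj and X_min of a balanced selection."""
    rng = np.random.default_rng(seed)
    m = len(X_min) if n_per_class is None else n_per_class
    # Seed with one random sample per class so the classifier can be trained.
    sel_maj = [int(rng.integers(len(X_maj)))]
    sel_min = [int(rng.integers(len(X_min)))]
    while len(sel_maj) < m or len(sel_min) < m:
        X_sel = np.vstack([X_maj[sel_maj], X_min[sel_min]])
        y_sel = np.r_[np.zeros(len(sel_maj)), np.ones(len(sel_min))]
        f = fit_logistic(X_sel, y_sel)
        if len(sel_maj) < m:
            # Re-cluster the unselected majority samples; take the most sensitive per cluster.
            rest = np.setdiff1d(np.arange(len(X_maj)), sel_maj)
            k = min(n_clusters, len(rest), m - len(sel_maj))
            labels = kmeans_labels(X_maj[rest], k, rng)
            sens = stochastic_sensitivity(f, X_maj[rest], q, rng)
            for j in range(k):
                in_j = labels == j
                if np.any(in_j):
                    sel_maj.append(int(rest[in_j][np.argmax(sens[in_j])]))
        if len(sel_min) < m:
            # Add the most sensitive unselected minority samples (no clustering assumed here).
            rest_min = np.setdiff1d(np.arange(len(X_min)), sel_min)
            sens_min = stochastic_sensitivity(f, X_min[rest_min], q, rng)
            n_take = min(n_clusters, m - len(sel_min))
            sel_min.extend(int(i) for i in rest_min[np.argsort(sens_min)[::-1][:n_take]])
    return np.array(sel_maj), np.array(sel_min)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_maj = rng.normal(0.0, 1.0, size=(200, 2))  # synthetic majority class
    X_min = rng.normal(2.0, 1.0, size=(20, 2))   # synthetic minority class
    i_maj, i_min = sensitivity_undersample(X_maj, X_min)
    print(len(i_maj), len(i_min))  # 20 20
```

Because a logistic model's output changes fastest where its predicted probability is near 0.5, high-sensitivity samples in this sketch tend to lie close to the current decision boundary, while taking one sample per cluster spreads the majority selection across the class distribution instead of concentrating it in one overlapping region.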