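Computer science
Data mining
Class (philosophy)
Object (grammar)
Pattern recognition (psychology)
Value (mathematics)
Artificial intelligence
Machine learning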
Authors
Jaesub Yun,Jihyun Ha,Jong‐Seok Lee
Identifier
DOI:10.1145/2857546.2857648
Abstract
To handle the class imbalance problem, synthetic data generation methods such as SMOTE, ADASYN, and Borderline-SMOTE have been developed. These methods share a common parameter k, the number of nearest neighbors. Although the most effective value of k depends on the given dataset, there is no guideline for determining it. Moreover, if a dataset contains noise, small sub-clusters, or complex patterns, SMOTE and its variants show poor classification performance. Our method, named Automatic Neighborhood size Determination (AND), restricts the size of the neighborhood in SMOTE to preserve the original distribution of the data and helps SMOTE achieve its best performance. Defining and examining a minority region, consisting of a minority-class object and its neighbors, ensures that synthetic samples are generated safely. The proposed AND method determines the value of k for each minority object. By generating synthetic minority samples independently with these automatically determined k values, we aim to achieve better classification than the existing methods. Numerical experiments showed that the proposed method outperformed SMOTE, ADASYN, and Borderline-SMOTE.
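The idea of SMOTE-style interpolation with a separate k for each minority object can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the rule AND uses to choose k, so the function `per_sample_k` below substitutes a hypothetical stand-in (take the largest k, up to `k_max`, such that the k-th minority neighbor is closer than the nearest majority object). The names `per_sample_k`, `oversample`, and `k_max` are invented for this sketch and do not come from the paper.

```python
import numpy as np

def per_sample_k(X_min, X_maj, k_max=5):
    """Choose a neighborhood size for each minority point.

    ASSUMPTION (not the paper's rule): k is the number of minority
    neighbors, capped at k_max, that lie closer than the nearest
    majority point.
    """
    # Pairwise minority-to-minority distances; exclude self-matches.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    # Distance from each minority point to its nearest majority point.
    d_maj = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2)
    nearest_maj = d_maj.min(axis=1)

    order = np.argsort(d_min, axis=1)      # minority neighbors, nearest first
    k_max = min(k_max, len(X_min) - 1)
    ks = np.zeros(len(X_min), dtype=int)
    for i in range(len(X_min)):
        sorted_d = d_min[i, order[i, :k_max]]
        ks[i] = int(np.sum(sorted_d < nearest_maj[i]))
    return ks, order

def oversample(X_min, X_maj, n_new, k_max=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style
    interpolation, using each point's own k instead of a global k."""
    rng = np.random.default_rng(seed)
    ks, order = per_sample_k(X_min, X_maj, k_max)
    eligible = np.flatnonzero(ks > 0)      # points with k == 0 are skipped
    if eligible.size == 0:
        return np.empty((0, X_min.shape[1]))
    synth = []
    for _ in range(n_new):
        i = rng.choice(eligible)
        j = order[i, rng.integers(ks[i])]  # one of the k_i nearest neighbors
        gap = rng.random()                 # interpolation weight in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

if __name__ == "__main__":
    X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
    X_maj = np.array([[5.0, 5.0], [9.0, 9.0]])
    ks, _ = per_sample_k(X_min, X_maj, k_max=3)
    print("k per minority point:", ks)
    print(oversample(X_min, X_maj, n_new=5, k_max=3))
```

In this toy data the isolated minority point at (10, 10) sits next to a majority point, so the stand-in rule assigns it k = 0 and no synthetic samples are generated from it, while the three clustered points interpolate only among themselves.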