Computer science
Robustness (evolution)
Artificial intelligence
Feature (linguistics)
Class (philosophy)
Machine learning
Pattern recognition (psychology)
Extractor
Feature vector
Feature extraction
Data mining
Engineering
Linguistics
Biochemistry
Chemistry
Philosophy
Process engineering
Gene
Authors
Huiran Yan,Zenghao Cui,Xinyi Luo,Rui Wang,Yuan Yao
Identifier
DOI:10.1016/j.knosys.2023.110745
Abstract
Class imbalance hinders the performance of some standard classifiers. However, class imbalance may not be solely responsible for the drop in performance: research shows that imbalanced datasets also suffer from class overlap and borderline samples, which often deteriorate classification performance. Conventional imbalanced learning methods mainly focus on balancing the distribution between classes but ignore the difficulties caused by these problems, and thus underperform drastically. This paper proposes a hybrid network called SemiPro-Empha that alleviates these problems by learning a feature space with good inter-class separability and intra-class compactness. SemiPro-Empha comprises two modules. The first is a feature-learning loss, the Semi-Prototype contrastive loss (Semi-Proto), which guides the feature extractor toward a feature space in which the projections of originally overlapping classes can be separated, thereby improving classification performance. The second is a robust strategy for mining valuable borderline samples, called Emphasizing (Empha). Emphasizing identifies "valuable" borderline samples and eliminates noisy samples to create an auxiliary training dataset in each training epoch, providing up-to-date global classification-boundary information to the training model while ensuring its robustness. Extensive experiments on a breast cancer dataset and seven imbalanced datasets demonstrate the effectiveness of SemiPro-Empha.
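The abstract does not give the exact formulation of Semi-Proto, but prototype-based contrastive losses generally pull each sample's embedding toward its own class prototype and push it away from the prototypes of other classes, which is one way to obtain the inter-class separability and intra-class compactness described above. A minimal, dependency-free sketch of such a loss (a generic softmax-over-prototypes formulation with a temperature `tau` — an illustrative assumption, not the paper's actual method) could look like this:

```python
import math

def prototype_contrastive_loss(embeddings, labels, prototypes, tau=0.1):
    """Generic prototype contrastive loss (illustrative, not the paper's Semi-Proto).

    embeddings: list of feature vectors (lists of floats)
    labels:     list of int class ids, one per embedding
    prototypes: dict mapping class id -> prototype vector (e.g. class mean)
    tau:        temperature scaling the cosine similarities
    Returns the mean negative log-likelihood of each sample's own class
    under a softmax over similarities to all class prototypes.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    total = 0.0
    for z, y in zip(embeddings, labels):
        # Similarity of this sample to every class prototype, exponentiated.
        sims = {k: math.exp(cosine(z, c) / tau) for k, c in prototypes.items()}
        # Cross-entropy against the sample's own class prototype.
        total += -math.log(sims[y] / sum(sims.values()))
    return total / len(embeddings)
```

Minimizing this loss drives embeddings close to their own prototype (low loss) and far from other prototypes, so overlapping classes become separable in the learned space; a sample lying near a wrong-class prototype incurs a large loss.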