Undersampling
Entropy (arrow of time)
Computer science
k-nearest neighbors algorithm
Artificial intelligence
Pattern recognition (psychology)
Ranking (information retrieval)
Information loss
Machine learning
Data mining
Quantum mechanics
Physics
Authors
Anil Kumar, Dinesh Singh, Rama Shankar Yadav
Abstract
Many real-world application datasets, such as medical diagnostics, fraud detection, biological classification and risk analysis, suffer from class imbalance and class overlapping. These problems seriously affect the learning of classification models because minority instances are not visible to the learner in the overlapped region, and the performance of learners is biased towards the majority class. Undersampling-based methods are the most commonly used techniques for handling these problems. Their major shortcoming is excessive elimination and information loss, that is, they fail to retain potentially informative majority instances. We propose a novel entropy and neighborhood-based undersampling (ENU) method that removes only those majority instances in the overlapped region whose informativeness (entropy) score falls below a threshold entropy. Most existing methods of this kind improve sensitivity significantly but not many other performance measures. ENU first computes the entropy and threshold score for majority instances, and a local density-based improved KNN search is then used to identify overlapped majority instances. To tackle the problem effectively, ENU defines four improved KNN-based undersampling procedures (ENUB, ENUT, ENUC, and ENUR). ENU outperforms existing state-of-the-art methods in average ranking of sensitivity, G-mean, and F1-score, with reduced information loss.
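To make the described idea more concrete, the following is a minimal, hypothetical sketch of entropy-and-neighborhood-based undersampling loosely following the abstract. The threshold rule (mean neighborhood entropy), the value of k, and the use of a plain KNN search instead of the paper's local density-based improved KNN are assumptions for illustration only, not the authors' actual ENU procedures.

```python
# Hypothetical sketch: entropy-and-neighborhood-based undersampling.
# Assumptions (not from the paper): plain KNN neighborhoods, k=5,
# and the mean neighborhood entropy of overlapped majority instances
# used as the removal threshold.
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def neighborhood_entropy(labels):
    """Shannon entropy of the class distribution among a point's neighbors."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def enu_like_undersample(X, y, majority_label, k=5):
    """Remove overlapped majority instances whose neighborhood entropy is
    below an assumed threshold (mean entropy over overlapped majority points)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)

    entropies = {}
    for i in np.where(y == majority_label)[0]:
        neigh_labels = y[idx[i, 1:]]                 # skip the point itself
        if np.any(neigh_labels != majority_label):   # mixed neighborhood -> overlapped region
            entropies[i] = neighborhood_entropy(neigh_labels)

    if not entropies:
        return X, y
    threshold = np.mean(list(entropies.values()))     # assumed threshold definition
    drop = {i for i, e in entropies.items() if e < threshold}
    keep = np.array([i for i in range(len(y)) if i not in drop])
    return X[keep], y[keep]
```

Usage would mirror any resampling step before training, e.g. `X_res, y_res = enu_like_undersample(X_train, y_train, majority_label=0)` followed by fitting a classifier on the resampled data.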