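Cluster analysis
Artificial intelligence
Computer science
Machine learning
Decision tree
Benchmark (surveying)
Data mining
Class (philosophy)
Statistical classification
Pattern recognition (psychology)
Geodesy
Geography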
Authors
Jin Xiao,Yuhang Tian,Ling Xie,Xiaoyi Jiang,Jing Huang
Identifier
DOI:10.1109/tii.2019.2933675
Abstract
The traditional supervised classification algorithms tend to focus on uncovering the relationship between sample attributes and the class labels; they seldom consider the potential structural characteristics of the sample space, often leading to unsatisfactory classification results. To improve the performance of classification models, many scholars have sought to construct hybrid models by combining both supervised and unsupervised learning. Although the existing hybrid models have shown significant potential in industrial applications, our experiments indicate that some shortcomings remain. With the aim of overcoming such shortcomings of the existing hybrid models, this article proposes a hybrid classification framework based on clustering (HCFC). First, it applies a clustering algorithm to partition the training samples into K clusters. It then constructs a clustering-based attribute selection measure—namely, the hybrid information gain ratio, based upon which it then trains a C4.5 decision tree. Depending on the differences in the clustering algorithms used, this article constructs two different versions of the HCFC (HCFC-K and HCFC-D) and tests them on eight benchmark datasets in the healthcare and disease diagnosis industries and on 15 datasets from other fields. The results indicate that both versions of the HCFC achieve a comparable or even better classification performance than the other three hybrid and six single models considered. In addition, the HCFC-D has a stronger ability to resist class noise compared with the HCFC-K.
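The abstract names a clustering-based attribute selection measure, the hybrid information gain ratio, but does not give its formula. As a rough illustration only, the sketch below computes the standard C4.5 gain ratio of a categorical attribute twice, once against the class labels and once against cluster labels assumed to come from an earlier clustering step, and blends the two. The function `hybrid_gain_ratio`, the weight `alpha`, and the weighted-average combination are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute, target):
    """Standard C4.5 gain ratio of a categorical attribute w.r.t. a target."""
    n = len(attribute)
    # Group the target values by attribute value.
    groups = {}
    for a, t in zip(attribute, target):
        groups.setdefault(a, []).append(t)
    # Information gain = H(target) - H(target | attribute).
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(target) - conditional
    # Split information is the entropy of the attribute itself.
    split_info = entropy(attribute)
    return gain / split_info if split_info > 0 else 0.0


def hybrid_gain_ratio(attribute, class_labels, cluster_labels, alpha=0.5):
    """HYPOTHETICAL blend of class-based and cluster-based gain ratios.

    ASSUMPTION: the weighted average and the `alpha` parameter are
    illustrative; the paper's actual definition may differ.
    """
    by_class = gain_ratio(attribute, class_labels)
    by_cluster = gain_ratio(attribute, cluster_labels)
    return alpha * by_class + (1 - alpha) * by_cluster


if __name__ == "__main__":
    # Toy data: the attribute separates the classes but not the clusters.
    attribute = ["a", "a", "b", "b"]
    class_labels = [0, 0, 1, 1]
    cluster_labels = [0, 1, 0, 1]  # assumed output of a clustering step
    print(hybrid_gain_ratio(attribute, class_labels, cluster_labels))
```

In a C4.5-style tree builder, a measure of this kind would be evaluated for each candidate attribute at a node, and the attribute with the highest score would be chosen for the split.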