判别式
过度拟合
计算机科学
人工智能
特征选择
特征(语言学)
机器学习
模式识别(心理学)
维数之咒
班级(哲学)
数据挖掘
人工神经网络
哲学
语言学
作者
Yaojin Lin,Haoyang Liu,Hong Zhao,Qinghua Hu,Xingquan Zhu,Xindong Wu
出处
期刊:IEEE Transactions on Knowledge and Data Engineering
[Institute of Electrical and Electronics Engineers]
日期:2022-01-01
卷期号:: 1-1
被引量:49
标识
DOI:10.1109/tkde.2022.3177246
摘要
Hierarchical classification learning, which organizes data categories into a hierarchical structure, is an effective approach for large-scale classification tasks. The high dimensionality of data feature space, represented in hierarchical class structures, is one of the main research challenges. In addition, the class hierarchy often introduces imbalanced class distributions and causes overfitting. In this paper, we propose a feature selection method based on label distribution learning to address the above challenges. The crux is to alleviate the class imbalance problem and learn a discriminative feature subset for hierarchical classification process. Due to correlation between different class categories in the hierarchical tree structure, sibling categories can provide additional supervisory information for each learning sub tasks, which, in turn, alleviates the problem of under-sampling of minority categories. Therefore, we transform hierarchical labels to a hierarchical label distribution to represent this correlation. After that, a discriminative feature subset is selected recursively, by the common features and label-specific feature constraints, to ensure that downstream classification tasks can achieve the best performance. Experiments and comparisons, using seven well-established feature selection algorithms on six real data sets with different degrees of imbalance, demonstrate the superiority of the proposed method.
科研通智能强力驱动
Strongly Powered by AbleSci AI