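Computer science
Feature selection
Data mining
Pareto principle
Mutual information
Machine learning
Feature (linguistics)
Artificial intelligence
Redundancy (engineering)
Set (abstract data type)
Data stream mining
Selection (genetic algorithm)
Process (computing)
Feature vector
Mathematical optimization
Operating system
Philosophy
Linguistics
Programming language
Mathematics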
Authors
Azar Rafie, Parham Moradi, Abdulbaghi Ghaderzadeh
Identifier
DOI:10.1016/j.eswa.2022.119428
Abstract
Multi-label classification methods aim to assign more than one label to each instance. In many real-world classification problems, such as multi-label image classification (for example, cancer detection) and text classification, we are faced with many thousands of features. The performance of machine learning methods degrades on such high-dimensional problems. To tackle this issue, feature selection methods choose a small set of prominent features that best describe the data. Traditional multi-label feature selection methods require access to the whole feature space, whereas on online platforms such as Facebook and Twitter we are faced with streams of data added by users over time. Traditional multi-label feature selection methods fail when applied to data streams. To solve this issue, online methods have been introduced to deal with data streams. Existing streaming multi-label feature selection methods treat the task as a single optimization process, although there are several conflicting objectives that need to be optimized simultaneously. To address this, this paper uses a multi-objective search strategy to choose streaming features using mutual information and Pareto optimal set theory. Several objectives, such as minimizing the redundancy among features and maximizing the relevance of features to the set of labels, need to be optimized during the feature selection process. Here, we use Pareto set theory to identify a set of non-dominated solutions that best describe the problem. The proposed method is compared with a set of state-of-the-art online feature selection methods, and the results demonstrate the effectiveness of the proposed strategy.
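The two objectives named in the abstract (relevance to the labels, redundancy among features) and the non-dominated set can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the exact objective definitions, so the sketch assumes relevance is the sum of mutual information between a feature and each label, and redundancy is the mean mutual information between a feature and the already selected features. The function names `mutual_information`, `score`, and `pareto_front`, and the toy data, are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def score(feature, labels, selected):
    """Return (relevance, redundancy) for one candidate feature.

    Assumed definitions (not taken from the paper):
    relevance  = sum of MI between the feature and each label column,
    redundancy = mean MI between the feature and already selected features.
    """
    relevance = sum(mutual_information(feature, lab) for lab in labels)
    if not selected:
        return relevance, 0.0
    redundancy = sum(mutual_information(feature, s) for s in selected) / len(selected)
    return relevance, redundancy


def pareto_front(candidates):
    """Return names of non-dominated candidates.

    candidates maps name -> (relevance, redundancy); relevance is
    maximized and redundancy is minimized.
    """
    def dominates(a, b):
        no_worse = a[0] >= b[0] and a[1] <= b[1]
        strictly_better = a[0] > b[0] or a[1] < b[1]
        return no_worse and strictly_better

    return [
        name for name, s in candidates.items()
        if not any(dominates(other, s)
                   for other_name, other in candidates.items()
                   if other_name != name)
    ]


# Toy data: four instances, two labels, one feature already selected.
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
selected = [[0, 0, 1, 1]]
arriving = {
    "f_a": [0, 0, 1, 1],  # relevant, but duplicates the selected feature
    "f_b": [0, 1, 0, 1],  # relevant and not redundant
    "f_c": [0, 0, 0, 0],  # constant, carries no information
    "f_e": [0, 1, 2, 3],  # most relevant, but overlaps the selected feature
}
scores = {name: score(f, labels, selected) for name, f in arriving.items()}
front = pareto_front(scores)
print(front)  # ['f_b', 'f_e']
```

In this toy run `f_b` and `f_e` are both kept because neither is better on both objectives at once, which is the kind of trade-off a single scalar objective would hide.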