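Cluster analysis
Feature selection
Selection (genetic algorithm)
Feature (linguistics)
Computer science
Artificial intelligence
Pattern recognition (psychology)
Data mining
Philosophy
Linguistics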
Authors
Michal Moran, Goren Gordon
Source
Journal: IEEE Transactions on Artificial Intelligence
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Pages: 1-13
Identifier
DOI:10.1109/tai.2024.3407034
Abstract
In tabular data, certain challenges can negatively affect the quality of machine learning models: high dimensionality; noisy, irrelevant, or repetitive features; interactions between features; and the fact that instances often come from different sources or distributions. Feature selection, instance selection, and clustering algorithms address some of these challenges. Here, we propose a new holistic framework that assists in clarifying the structure of tabular datasets and enables the production of higher-quality machine learning models. The framework, based on intrinsic-reward deep reinforcement learning loops, uses curious feature selection as the basis for clustering data instances, effectively creating blocks within the tabular data with the most relevant features for each cluster. The framework results in a clustering algorithm, wherein the instances are clustered based on their predicted optimal informative features. We show that this framework makes it possible to improve the accuracy of learning models on artificial and real datasets and to provide important insights into the data themselves.