False positive paradox
Feature selection
Computer science
False positives and false negatives
Artificial intelligence
Feature (linguistics)
Benchmark (surveying)
Machine learning
Pattern recognition (psychology)
Conditional independence
Selection (genetic algorithm)
Data mining
Philosophy
Linguistics
Geodesy
Geography
Authors
Xianjie Guo, Kui Yu, Lin Liu, Fuyuan Cao, Jiuyong Li
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 35 (1): 938-951
Citations: 11
Identifier
DOI: 10.1109/tnnls.2022.3178075
Abstract
Causal feature selection methods aim to identify a Markov boundary (MB) of a class variable, and almost all existing causal feature selection algorithms use conditional independence (CI) tests to learn the MB. However, in real-world applications, due to data issues (e.g., noisy or small samples), CI tests can be unreliable; thus, causal feature selection algorithms relying on CI tests encounter two types of errors: false positives (i.e., selecting false MB features) and false negatives (i.e., discarding true MB features). Existing algorithms tackle only false positives or only false negatives; they cannot deal with both types of errors at the same time, leading to unsatisfactory results. To address this issue, we propose a dual-correction-strategy-based MB learning (DCMB) algorithm to correct the two types of errors simultaneously. Specifically, DCMB selectively removes false positives from the MB features currently selected, while selectively retrieving false negatives from the features currently discarded. To automatically determine the optimal number of selected features for the selective removal and retrieval in the dual correction strategy, we design the simulated-annealing-based DCMB (SA-DCMB) algorithm. Using benchmark Bayesian network (BN) datasets, the experimental results demonstrate that DCMB achieves substantial improvements in MB learning accuracy compared with existing MB learning methods. Empirical studies on real-world datasets validate the effectiveness of SA-DCMB for classification against state-of-the-art causal and traditional feature selection algorithms.
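The abstract only sketches the dual-correction idea at a high level. The following Python snippet is a minimal illustrative sketch of that idea, not the published DCMB/SA-DCMB procedure: it assumes a Fisher-z partial-correlation CI test, linear-Gaussian toy data, and a fixed significance threshold in place of the paper's simulated-annealing tuning. The function names (`ci_independent`, `dual_correction`) and all numeric choices are hypothetical.

```python
import numpy as np

def ci_independent(data, x, y, cond, z_crit=1.96):
    """Fisher-z partial-correlation CI test: returns True when the data
    cannot reject 'column x is independent of column y given cond'."""
    n = data.shape[0]
    corr = np.corrcoef(data[:, [x, y] + list(cond)], rowvar=False)
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = float(np.clip(r, -0.999999, 0.999999))
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    return abs(z) < z_crit

def dual_correction(data, target, selected, discarded, z_crit=1.96):
    """Hypothetical dual-correction pass (a sketch, not the authors' method):
    phase 1 removes selected features that test independent of the target
    given the rest of the selection (candidate false positives); phase 2
    retrieves discarded features that test dependent on the target given
    the current selection (candidate false negatives)."""
    for f in list(selected):                      # phase 1: selective removal
        rest = [g for g in selected if g != f]
        if ci_independent(data, f, target, rest, z_crit):
            selected.remove(f)
            discarded.append(f)
    for f in list(discarded):                     # phase 2: selective retrieval
        if f not in selected and not ci_independent(data, f, target, selected, z_crit):
            discarded.remove(f)
            selected.append(f)
    return selected

# Toy demo: the target depends on features 0 and 1; feature 2 is pure noise.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))
t = x[:, 0] + x[:, 1] + 0.1 * rng.normal(size=n)
data = np.column_stack([x, t])

# Start from a deliberately wrong MB: one false positive, two false negatives.
mb = dual_correction(data, target=3, selected=[2], discarded=[0, 1], z_crit=3.0)
print(sorted(mb))  # the noise feature should be dropped, the true parents recovered
```

Note that unlike this fixed-threshold sketch, the paper reports using simulated annealing (SA-DCMB) to decide how many features to remove and retrieve in each correction phase.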