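Feature selection
Benchmark (surveying)
Computer science
Artificial intelligence
Machine learning
Conditional independence
Feature (linguistics)
Algorithm
Data mining
Geodesy
Linguistics
Philosophy
Geography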
Authors
Xingyu Wu, Bingbing Jiang, Ke Yu, Miao Chen
Source
Journal: IEEE Transactions on Cybernetics
[Institute of Electrical and Electronics Engineers]
Date: 2020-12-01
Volume (issue): 50 (12): 4983-4996
Citations: 52
Identifiers
DOI:10.1109/tcyb.2019.2940509
Abstract
Causal feature selection has attracted much attention in recent years. It discovers a Markov boundary (MB) of the class attribute. The MB of the class attribute implies local causal relations between the class attribute and the features, and therefore leads to more interpretable and robust prediction models than those built on features selected by traditional feature selection algorithms. Many causal feature selection methods have been proposed, and almost all of them employ conditional independence (CI) tests to identify MBs. However, many real-world datasets may suffer from incorrect CI tests due to noise or small sample sizes, which lowers the MB discovery accuracy of these existing algorithms. To tackle this issue, in this article we first introduce a new concept, PCMasking, to explain one type of incorrect CI test in MB discovery, and then propose a cross-check and complement MB discovery (CCMB) algorithm that repairs this type of incorrect CI test for accurate MB discovery. To improve the efficiency of CCMB, we further design a pipeline machine-based CCMB (PM-CCMB) algorithm. Experiments on benchmark Bayesian network datasets demonstrate that both CCMB and PM-CCMB significantly improve MB discovery accuracy compared with existing methods, and that PM-CCMB further improves computational efficiency. An empirical study on real-world datasets validates the effectiveness of CCMB and PM-CCMB against state-of-the-art causal and traditional feature selection algorithms.
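To make the CI-test-based MB discovery described in the abstract concrete, here is a minimal sketch of the general pattern. It is not CCMB or PM-CCMB, whose details the abstract does not give. It implements IAMB, a classic grow-shrink Markov boundary algorithm, with a Fisher z partial-correlation CI test on simulated linear Gaussian data. Everything specific here is a hypothetical assumption chosen for illustration: the DAG A → T → C ← S, C → D with E isolated (T plays the class attribute), the 0.8 coefficients, the sample size, `alpha`, and the function names `simulate`, `fisher_z`, `iamb`, and `true_markov_boundary`. In a faithful Bayesian network the MB of T is its parents, children, and the children's other parents, so here it is {A, C, S}.

```python
import math
import random

# Hypothetical DAG (an assumption for illustration): A -> T -> C <- S, C -> D, E isolated.
# T plays the role of the class attribute.
PARENTS = {"A": [], "S": [], "E": [], "T": ["A"], "C": ["T", "S"], "D": ["C"]}


def true_markov_boundary(parents, target):
    """MB in a faithful Bayesian network: parents, children, and the children's other parents."""
    children = [v for v, ps in parents.items() if target in ps]
    spouses = {p for c in children for p in parents[c] if p != target}
    return sorted(set(parents[target]) | set(children) | spouses)


def simulate(n=2000, seed=0):
    """Sample linear Gaussian data from the hypothetical DAG (coefficients 0.8 are assumptions)."""
    rng = random.Random(seed)
    names = ("A", "S", "E", "T", "C", "D")
    data = {name: [] for name in names}
    for _ in range(n):
        a, s, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        t = 0.8 * a + rng.gauss(0, 1)
        c = 0.8 * t + 0.8 * s + rng.gauss(0, 1)
        d = 0.8 * c + rng.gauss(0, 1)
        for name, value in zip(names, (a, s, e, t, c, d)):
            data[name].append(value)
    return data


def corr_matrix(cols):
    """Pearson correlation matrix of a list of equal-length columns."""
    means = [sum(c) / len(c) for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(x * y for x, y in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]


def fisher_z(data, x, y, cond):
    """Fisher z-test of X independent of Y given cond; returns (|z|, two-sided p-value)."""
    cols = [data[x], data[y]] + [data[v] for v in cond]
    prec = invert(corr_matrix(cols))
    r = -prec[0][1] / math.sqrt(prec[0][0] * prec[1][1])  # partial correlation of x, y given cond
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = abs(math.atanh(r)) * math.sqrt(len(data[x]) - len(cond) - 3)
    return z, math.erfc(z / math.sqrt(2))


def iamb(data, target, alpha=1e-6):
    """IAMB: grow the MB with the most dependent candidate, then shrink false positives."""
    mb = []
    while True:  # growing phase
        best, best_z = None, -1.0
        for v in data:
            if v == target or v in mb:
                continue
            z, p = fisher_z(data, target, v, mb)
            if p < alpha and z > best_z:
                best, best_z = v, z
        if best is None:
            break
        mb.append(best)
    for v in list(mb):  # shrinking phase: drop v if independent of target given the rest
        rest = [u for u in mb if u != v]
        if fisher_z(data, target, v, rest)[1] >= alpha:
            mb.remove(v)
    return sorted(mb)


if __name__ == "__main__":
    print(true_markov_boundary(PARENTS, "T"))  # ['A', 'C', 'S']
    print(iamb(simulate(), "T"))
```

The spouse S is marginally independent of T and only becomes dependent once the common child C is in the conditioning set, which is why MB discovery must chain many CI tests and why one wrong test can propagate. Shrinking `n` to a few dozen samples or adding noise tends to make individual Fisher z decisions unreliable, the failure mode the abstract attributes to noisy or small-sample data. The very small `alpha` is only meant to keep this large-sample demo stable.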