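Feature selection
Markov blanket
Missing data
Imputation (statistics)
Computer science
Artificial intelligence
Machine learning
Data mining
Feature (linguistics)
Estimator
Pattern recognition (psychology)
Markov chain
Markov model
Mathematics
Markov property
Statistics
Philosophy
Linguistics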
Authors
Ke Yu,Yajing Yang,Wei Ding
Source
Journal: ACM Transactions on Knowledge Discovery from Data
Publisher: Association for Computing Machinery
Date: 2022-01-08
Volume 16, Issue 4, pp. 1-24
Citations: 13
Abstract
Causal feature selection aims to learn the Markov blanket (MB) of a class variable for feature selection. The MB of a class variable reflects the local causal structure among the class variable and its MB, and all other features are probabilistically independent of the class variable conditioned on its MB. This enables causal feature selection to identify potential causal features for building robust and physically meaningful prediction models. Missing data, ubiquitous in many real-world applications, remain an open research problem in causal feature selection due to their technical complexity. In this article, we present a novel multiple imputation MB (MimMB) framework for causal feature selection with missing data. MimMB integrates Data Imputation with MB Learning in a unified framework so that the two key components engage with each other: MB Learning enables Data Imputation in a potentially causal feature space, which makes the imputation more accurate, while accurate Data Imputation in turn helps MB Learning identify a reliable MB of the class variable. We further design an enhanced kNN estimator for imputing missing values and use it to instantiate MimMB. In a comprehensive experimental evaluation on synthetic and real-world datasets, the new approach effectively learns the MB of a given variable in a Bayesian network and outperforms rival algorithms.
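To make the interplay between imputation and MB learning concrete, here is a minimal Python sketch that alternates kNN imputation with Markov blanket learning until the learned MB stops changing. It is not the authors' MimMB, and several parts are assumptions: it performs a single (not multiple) imputation, fills each gap with a plain average of the k nearest donors rather than the paper's enhanced kNN estimator, and learns the MB with an IAMB-style grow-shrink search driven by a Fisher-z partial-correlation test, which presumes roughly linear Gaussian data. The search relies on the MB property stated above: every feature X outside MB(T) is independent of T given MB(T). All names (`knn_impute`, `ci_pvalue`, `iamb`, `mimmb_sketch`) and parameters (`k`, `alpha`, `max_iter`) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def _invert(m):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def _corr(data, i, j):
    """Pearson correlation between two fully observed columns."""
    xi, xj = [r[i] for r in data], [r[j] for r in data]
    mi, mj = sum(xi) / len(xi), sum(xj) / len(xj)
    cov = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
    vi = sum((a - mi) ** 2 for a in xi)
    vj = sum((b - mj) ** 2 for b in xj)
    return cov / math.sqrt(vi * vj)


def ci_pvalue(data, x, y, z):
    """Fisher-z p-value for X independent of Y given Z (assumes Gaussian data)."""
    idx = [x, y] + list(z)
    prec = _invert([[_corr(data, a, b) for b in idx] for a in idx])
    r = -prec[0][1] / math.sqrt(prec[0][0] * prec[1][1])  # partial correlation
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(len(data) - len(z) - 3)
    return 2 * (1 - NormalDist().cdf(abs(stat)))


def iamb(data, target, alpha=0.01):
    """IAMB-style grow-shrink search for the MB of column `target`."""
    mb, others = [], [j for j in range(len(data[0])) if j != target]
    while True:  # grow: add the candidate with the smallest p-value given mb
        scores = [(ci_pvalue(data, target, j, mb), j) for j in others if j not in mb]
        if not scores or min(scores)[0] >= alpha:
            break
        mb.append(min(scores)[1])
    for j in list(mb):  # shrink: drop features independent of target given the rest
        if ci_pvalue(data, target, j, [q for q in mb if q != j]) >= alpha:
            mb.remove(j)
    return sorted(mb)


def knn_impute(data, space, k=5):
    """Fill each None with the mean of its k nearest donors (distance over `space`)."""
    cols = range(len(data[0]))
    obs = [[r[c] for r in data if r[c] is not None] for c in cols]
    means = [sum(v) / len(v) if v else 0.0 for v in obs]  # fallback values
    out = [row[:] for row in data]
    for i, row in enumerate(data):
        for c in cols:
            if row[c] is not None:
                continue
            cand = []
            for j, other in enumerate(data):
                if j == i or other[c] is None:
                    continue
                shared = [f for f in space
                          if f != c and row[f] is not None and other[f] is not None]
                if shared:  # mean squared difference over the shared columns
                    d = sum((row[f] - other[f]) ** 2 for f in shared) / len(shared)
                    cand.append((d, other[c]))
            nearest = sorted(cand)[:k]
            out[i][c] = sum(v for _, v in nearest) / len(nearest) if nearest else means[c]
    return out


def mimmb_sketch(data, target, k=5, alpha=0.01, max_iter=5):
    """Hypothetical loop (not the paper's MimMB): alternate kNN imputation and
    MB learning until the learned MB stops changing."""
    mb = [j for j in range(len(data[0])) if j != target]  # start: all features
    for _ in range(max_iter):
        # Always impute from the original data; only the neighbour space changes.
        filled = knn_impute(data, space=mb + [target], k=k)
        new_mb = iamb(filled, target, alpha)
        if new_mb == mb:
            break
        mb = new_mb
    return mb, filled


# Toy data: y = 2*x0 + 2*x1 + noise, x2 is irrelevant, ~10% of x0 is missing.
rng, rows = random.Random(0), []
for _ in range(400):
    x0, x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    y = 2 * x0 + 2 * x1 + rng.gauss(0, 0.5)
    rows.append([x0 if rng.random() > 0.1 else None, x1, x2, y])
mb, filled = mimmb_sketch(rows, target=3)
print("learned MB of y:", mb)
```

Restricting the neighbour search to the current MB plus the class variable is this sketch's stand-in for "imputation in a potentially causal feature space". Because each round re-imputes from the original incomplete data, imputed values never feed back into later donors; a multiple-imputation variant would repeat the imputation several times and pool the resulting tests.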