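Spurious relationship
Feature selection
Computer science
Selection bias
Artificial intelligence
Feature (linguistics)
Pattern recognition (psychology)
Sample (material)
Entropy (arrow of time)
Selection (genetic algorithm)
Sample size determination
Data mining
Machine learning
Statistics
Mathematics
Linguistics
Philosophy
Chemistry
Physics
Chromatography
Quantum mechanics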
Authors
Shuai Yang, Xianjie Guo, Ke Yu, Xiaoling Huang, Tingting Jiang, He Jiang, Lichuan Gu
Source
Journal: ACM Transactions on Intelligent Systems and Technology
Publisher: Association for Computing Machinery
Date: 2023-08-11
Volume (issue): 14 (5), pp. 1-18
Cited by: 3
Abstract
Almost all existing causal feature selection methods ignore the problem of sample selection bias. In practice, however, the data-gathering process cannot be fully controlled, so sample selection bias often occurs. It induces spurious correlations between features and the class variable, which seriously degrades the performance of existing methods. In this article, we study causal feature selection under sample selection bias and propose a novel Progressive Causal Feature Selection (PCFS) algorithm with three phases. First, PCFS learns sample weights that balance the distributions of the treated and control groups defined by each feature, removing spurious correlations. Second, using these sample weights, PCFS fits a weighted cross-entropy model to estimate the causal effect of each feature and removes some irrelevant features from the confounder set. Third, PCFS repeats the first two phases progressively to remove further irrelevant features, finally obtaining a causal feature set. Experiments on synthetic and real-world datasets validate the effectiveness of PCFS in comparison with several state-of-the-art classical and causal feature selection methods.
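The three-phase loop described in the abstract can be illustrated with a small sketch. This is not the authors' PCFS implementation. It swaps in standard stand-ins, all of which are assumptions: inverse propensity weights (from a logistic propensity model) play the role of the learned balancing weights in phase 1; a weighted logistic regression of the class on a single treated feature (i.e. minimising weighted cross-entropy) gives a log-odds "causal effect" in phase 2; and phase 3 drops every feature whose absolute effect falls below a hypothetical `threshold`, repeating until the confounder set stops shrinking. The function names, the threshold value and the synthetic data generator are all hypothetical.

```python
import numpy as np


def fit_logistic(A, t, w, iters=25, ridge=1e-6):
    """Minimise the weighted cross-entropy of a logistic model t ~ A using Newton steps."""
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (w * (t - p))                        # gradient of the weighted log-likelihood
        hess = (A * (w * p * (1.0 - p))[:, None]).T @ A   # weighted Fisher information
        # A tiny diagonal term keeps the Newton system well conditioned.
        beta += np.linalg.solve(hess + ridge * np.eye(A.shape[1]), grad)
    return beta


def balancing_weights(T, C, clip=0.01):
    """Phase 1 stand-in: inverse propensity weights balancing confounders C between T=1 and T=0."""
    A = np.hstack([np.ones((len(T), 1)), C])
    e = 1.0 / (1.0 + np.exp(-A @ fit_logistic(A, T, np.ones(len(T)))))
    e = np.clip(e, clip, 1.0 - clip)                      # avoid extreme weights
    return T / e + (1.0 - T) / (1.0 - e)


def causal_effect(X, y, j, current):
    """Phase 2 stand-in: weighted log-odds effect of feature j, adjusting for the other current features."""
    others = np.asarray([k for k in current if k != j], dtype=int)
    T = X[:, j]
    w = balancing_weights(T, X[:, others])
    A = np.column_stack([np.ones(len(T)), T])
    return fit_logistic(A, y, w)[1]                       # coefficient on the treated feature


def progressive_select(X, y, threshold=0.5, max_rounds=10):
    """Phase 3: drop weak features from the confounder set and repeat until it stops shrinking."""
    current = list(range(X.shape[1]))
    for _ in range(max_rounds):
        effects = {j: causal_effect(X, y, j, current) for j in current}
        kept = [j for j in current if abs(effects[j]) >= threshold]
        if kept == current:
            break
        current = kept
    return sorted(current)


def make_synthetic(n=2000, seed=0):
    """Hypothetical toy data: x0 causes y, x1 only copies x0, x2 is noise."""
    rng = np.random.default_rng(seed)
    x0 = rng.binomial(1, 0.5, n)                          # true cause of y
    x1 = np.where(rng.random(n) < 0.8, x0, 1 - x0)        # copies x0 80% of the time, no effect on y
    x2 = rng.binomial(1, 0.5, n)                          # pure noise
    y = (rng.random(n) < 0.2 + 0.6 * x0).astype(float)
    return np.column_stack([x0, x1, x2]).astype(float), y


if __name__ == "__main__":
    X, y = make_synthetic()
    print(progressive_select(X, y))
```

In this toy setup `x1` is correlated with the class only through `x0`, so an unweighted logistic fit finds a sizeable association for it, while the balanced estimate should typically shrink towards zero and leave only feature 0 selected. IPW was chosen here only because it is the simplest well-known balancing scheme; the paper's own weight-learning objective may differ.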