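Observational study
Confounding
Causal inference
Notation
Variance (accounting)
Inference
Mathematics
Variable (mathematics)
Statistics
Propensity score matching
Computer science
Algorithm
Econometrics
Artificial intelligence
Mathematical analysis
Business
Accounting
Arithmetic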
Authors
Kun Kuang,Peng Cui,Hao Zou,Bo Li,Jianrong Tao,Fei Wu,Shiqiang Yang
Source
Journal: IEEE Transactions on Knowledge and Data Engineering
[Institute of Electrical and Electronics Engineers]
Date: 2020-07-03
Volume (issue), pages: 34 (5): 2120-2134
Citations: 12
Identifier
DOI:10.1109/tkde.2020.3006898
Abstract
Causal inference plays an important role in decision making in many fields, such as social marketing, healthcare, and public policy. One fundamental problem in causal inference is treatment effect estimation in observational studies when variables are confounded. Confounding is generally controlled for with the propensity score. However, propensity score methods treat all observed variables as confounders and ignore adjustment variables, which have no influence on treatment but are predictive of the outcome. Recently, it has been demonstrated that adjustment variables are effective in reducing the variance of the estimated treatment effect. How to automatically separate confounders from adjustment variables in observational studies nevertheless remains an open problem, especially with high-dimensional variables, which are common in the big data era. In this paper, we first propose a Data-Driven Variable Decomposition (D$^2$VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data-driven approach, and 2) simultaneously estimate the treatment effect in observational studies with high-dimensional variables. Under standard assumptions, we theoretically prove that our D$^2$VD algorithm yields an unbiased estimate of the treatment effect and achieves lower variance than traditional propensity score based methods. Moreover, to address the challenges of high-dimensional variables and nonlinearity, we extend D$^2$VD to a nonlinear version, the Nonlinear-D$^2$VD (N-D$^2$VD) algorithm. To validate the effectiveness of the proposed algorithms, we conduct extensive experiments on both synthetic and real-world datasets. The experimental results demonstrate that our D$^2$VD and N-D$^2$VD algorithms can automatically separate the variables precisely, and estimate the treatment effect more accurately and with tighter confidence intervals than state-of-the-art methods. We also demonstrate that the features ranked highest by our algorithm have the best prediction performance on an online advertising dataset.
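
The variance-reduction role of adjustment variables described in the abstract can be illustrated with a small simulation. The sketch below is not the D$^2$VD algorithm. It is a toy example under stated assumptions: one confounder `x`, one adjustment variable `z`, a linear outcome with a true treatment effect of 1.0, and a propensity score that is known exactly rather than estimated. All function names (`simulate`, `ipw_ate`, `adjusted_ipw_ate`, `compare`) are hypothetical. It compares a plain inverse-propensity-weighted (IPW) estimate with one computed after subtracting the part of the outcome that a least-squares fit attributes to `z`.

```python
import math
import random
import statistics


def simulate(n, rng):
    """Toy data: x drives both treatment and outcome; z drives the outcome only."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                  # confounder
        z = rng.gauss(0.0, 1.0)                  # adjustment variable
        p = 1.0 / (1.0 + math.exp(-x))           # true propensity (assumed known)
        t = 1 if rng.random() < p else 0         # treatment assignment
        y = 1.0 * t + x + 3.0 * z + rng.gauss(0.0, 1.0)  # true effect is 1.0
        rows.append((z, t, y, p))
    return rows


def ipw_ate(rows):
    """Inverse-propensity-weighted estimate of the average treatment effect."""
    return statistics.fmean(
        y * t / p - y * (1 - t) / (1 - p) for _, t, y, p in rows
    )


def adjusted_ipw_ate(rows):
    """IPW estimate after removing the outcome component linear in z."""
    z_mean = statistics.fmean(r[0] for r in rows)
    y_mean = statistics.fmean(r[2] for r in rows)
    # Simple least-squares slope of y on z.
    slope = sum((z - z_mean) * (y - y_mean) for z, _, y, _ in rows) / sum(
        (z - z_mean) ** 2 for z, _, _, _ in rows
    )
    adjusted = [(z, t, y - slope * z, p) for z, t, y, p in rows]
    return ipw_ate(adjusted)


def compare(reps=200, n=1000, seed=0):
    """Repeat the simulation and collect both estimates for each replicate."""
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(reps):
        rows = simulate(n, rng)
        plain.append(ipw_ate(rows))
        adjusted.append(adjusted_ipw_ate(rows))
    return plain, adjusted


if __name__ == "__main__":
    plain, adjusted = compare()
    print("plain IPW:    mean", statistics.fmean(plain), "sd", statistics.stdev(plain))
    print("adjusted IPW: mean", statistics.fmean(adjusted), "sd", statistics.stdev(adjusted))
```

In this toy setting both estimators should center near the true effect, while the adjusted one should show a smaller spread across replicates, because `z` contributes outcome noise that is unrelated to treatment. The example treats the split between confounder and adjustment variable as given, which is exactly the part the paper proposes to learn from data.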