Inference in High-Dimensional Multivariate Response Regression with Hidden Variables

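Topics: Estimator; Mathematics; Multivariate statistics; Statistics; Multivariate normal distribution; Inference; Confidence interval; Applied mathematics; Computer science; Artificial intelligence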

Authors

Xin Bing, Wei Cheng, Huijie Feng, Yang Ning

Identifier

DOI: 10.1080/01621459.2023.2241701

Abstract

This article studies the inference of the regression coefficient matrix under multivariate response linear regressions in the presence of hidden variables. A novel procedure for constructing confidence intervals of entries of the coefficient matrix is proposed. Our method first uses the multivariate nature of the responses by estimating and adjusting the hidden effect to construct an initial estimator of the coefficient matrix. By further deploying a low-dimensional projection procedure to reduce the bias introduced by the regularization in the previous step, a refined estimator is proposed and shown to be asymptotically normal. The asymptotic variance of the resulting estimator is derived with closed-form expression and can be consistently estimated. In addition, we propose a testing procedure for the existence of hidden effects and provide its theoretical justification. Both our procedures and their analyses are valid even when the feature dimension and the number of responses exceed the sample size. Our results are further backed up via extensive simulations and a real data analysis.
Supplementary materials for this article are available online.

Keywords

Confidence intervals; Confounding; Hidden variables; High-dimensional regression; Hypothesis testing; Multivariate response regression; Surrogate variable analysis

Supplementary Materials

The supplement contains the rate of $\max_j \|X\hat{F}_j - XF_j\|_2$, the statement of asymptotic normality of multiple components of $\tilde{\Theta} - \Theta$ and all the proofs.

Acknowledgments

The authors would like to thank the Associate Editor and two reviewers for their insightful comments which have improved the manuscript substantially.

Disclosure Statement

The authors report there are no competing interests to declare.

Notes

1. A centered random vector $X \in \mathbb{R}^d$ is $\gamma$ sub-Gaussian if $\mathbb{E}[\exp(\langle u, X \rangle)] \le \exp(\|u\|_2^2 \gamma^2 / 2)$ for any $u \in \mathbb{R}^d$.
2. If $D_K$ is not invertible, we use its Moore-Penrose inverse instead.
3. Since Guo, Ćevid, and Bühlmann (2020) only provides guarantees of DDL for large $p$, we compare with DDL in the high-dimensional scenarios. Also due to the long running time of DDL, we only report its results for $m = 20$ and $p = 500$.

Funding

Ning was supported by the NSF grant CAREER Award DMS-1941945 and DMS-2311291, and NIH 1RF1AG077820-01A1. Bing was partially supported by a discovery grant from the Natural Sciences and Engineering Research Council of Canada.
