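Estimator
Mathematics
Multivariate statistics
Statistics
Multivariate normal distribution
Inference
Confidence interval
Applied mathematics
Computer science
Artificial intelligence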
Authors
Xin Bing, Wei Cheng, Huijie Feng, Yang Ning
Identifier
DOI:10.1080/01621459.2023.2241701
Abstract
This article studies the inference of the regression coefficient matrix under multivariate response linear regressions in the presence of hidden variables. A novel procedure for constructing confidence intervals of entries of the coefficient matrix is proposed. Our method first uses the multivariate nature of the responses by estimating and adjusting the hidden effect to construct an initial estimator of the coefficient matrix. By further deploying a low-dimensional projection procedure to reduce the bias introduced by the regularization in the previous step, a refined estimator is proposed and shown to be asymptotically normal. The asymptotic variance of the resulting estimator is derived with closed-form expression and can be consistently estimated. In addition, we propose a testing procedure for the existence of hidden effects and provide its theoretical justification. Both our procedures and their analyses are valid even when the feature dimension and the number of responses exceed the sample size. Our results are further backed up via extensive simulations and a real data analysis.
Supplementary materials for this article are available online.
Keywords
Confidence intervals; Confounding; Hidden variables; High-dimensional regression; Hypothesis testing; Multivariate response regression; Surrogate variable analysis
Supplementary Materials
The supplement contains the rate of $\max_j \|X\hat{F}_j - XF_j\|_2$, the statement of asymptotic normality of multiple components of $\widetilde{\Theta} - \Theta$, and all the proofs.
Acknowledgments
The authors would like to thank the Associate Editor and two reviewers for their insightful comments which have improved the manuscript substantially.
Disclosure Statement
The authors report there are no competing interests to declare.
Notes
1 A centered random vector $X \in \mathbb{R}^d$ is $\gamma$ sub-Gaussian if $\mathbb{E}[\exp(\langle u, X \rangle)] \le \exp(\|u\|_2^2 \gamma^2 / 2)$ for any $u \in \mathbb{R}^d$.
2 If $D_K$ is not invertible, we use its Moore-Penrose inverse instead.
3 Since Guo, Ćevid, and Bühlmann (2020) only provides guarantees of DDL for large $p$, we compare with DDL in the high-dimensional scenarios. Also due to the long running time of DDL, we only report its results for $m = 20$ and $p = 500$.
Additional information
Funding
Ning was supported by the NSF grant CAREER Award DMS-1941945 and DMS-2311291, and NIH 1RF1AG077820-01A1. Bing was partially supported by a discovery grant from the Natural Sciences and Engineering Research Council of Canada.