Joint modeling and inference of multiple-subject high-dimensional sparse vector autoregressive models
Kim, Fisher, Pipiras
The multiple-subject vector autoregression (multi-VAR) model captures heterogeneous network Granger causality across subjects by decomposing individual sparse VAR transition matrices into commonly shared and subject-unique paths. The model has been applied to characterize hidden shared and unique paths among subjects and has performed well compared with methods commonly used in psychology and neuroscience. Despite this innovation, the model uses a weighted median to identify the common effects. This is statistically inefficient because the convergence rates of the common and unique paths are then determined by the least sparse subject and the smallest sample size across all subjects. We propose a new identifiability condition for the multi-VAR model based on a communication-efficient data integration framework. We show that this approach achieves convergence rates tailored to each subject's sparsity level and sample size. Furthermore, we develop hypothesis tests to assess the nullity and homogeneity of individual paths, using Wald-type test statistics constructed from individual debiased estimators. A test for the significance of the common paths can also be derived through the framework. Simulation studies under various heterogeneity scenarios and a real data application demonstrate the performance of the proposed method compared with the existing benchmark across standard evaluation metrics.
Authors: Younghoon Kim (Cornell University), Zachary F. Fisher (University of North Carolina at Chapel Hill), Vladas Pipiras (University of North Carolina at Chapel Hill)
The multi-subject vector autoregressive (multi-VAR) model captures heterogeneous network Granger causality across subjects by decomposing each subject's sparse VAR transition matrix into commonly shared pathways and subject-specific pathways. The model has been applied to characterize hidden shared and unique pathways across subjects and has demonstrated superior performance compared with methods commonly used in psychology and neuroscience. However, it identifies the common effects through a weighted median, which is statistically inefficient: the convergence rates of both the common and the unique pathways are determined by the least sparse subject and the smallest sample size across all subjects. This paper proposes new identifiability conditions for the multi-VAR model based on a communication-efficient data integration framework, yielding convergence rates tailored to each subject's sparsity level and sample size. It also develops a hypothesis testing framework that assesses the nullity and homogeneity of individual pathways using Wald-type test statistics constructed from subject-specific debiased estimators; significance tests for the common pathways follow from the same framework.
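For reference, the decomposition can be written for a lag-1 model as below. The notation ($y_t^{(k)}$, $\Phi^{(k)}$, $\Phi^{\mathrm{c}}$, $\Phi^{\mathrm{u},(k)}$, $T_k$, $K$) is introduced here for illustration and may not match the paper's; higher-order lags would add further transition matrices of the same form.

$$
y_t^{(k)} = \Phi^{(k)} y_{t-1}^{(k)} + \varepsilon_t^{(k)}, \qquad
\Phi^{(k)} = \Phi^{\mathrm{c}} + \Phi^{\mathrm{u},(k)}, \qquad
t = 1, \dots, T_k, \quad k = 1, \dots, K.
$$

Here a nonzero entry $\Phi^{(k)}_{ij}$ means that series $j$ Granger-causes series $i$ in subject $k$, $\Phi^{\mathrm{c}}$ collects the common paths, and $\Phi^{\mathrm{u},(k)}$ collects the paths unique to subject $k$. The split is not identified without an extra condition: for any matrix $\Delta$, the pair $(\Phi^{\mathrm{c}} + \Delta,\ \Phi^{\mathrm{u},(k)} - \Delta)$ yields the same $\Phi^{(k)}$ for every $k$. The weighted-median rule and the proposed identifiability conditions are two different ways of pinning down this choice.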
The core problems addressed in this research concern statistical efficiency and inference in multi-subject high-dimensional sparse vector autoregressive modeling, specifically:
Statistical Efficiency Issue: The existing multi-VAR model identifies common effects through a weighted median, so its convergence rates are limited by the least sparse subject and the smallest sample size. It therefore fails to exploit the differing sparsity levels and sample sizes of individual subjects.
Missing Inference Framework: No formal hypothesis testing framework existed for multi-subject VAR models, so the significance, nullity, and homogeneity of individual pathways could not be assessed.
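The weighted-median identification described in the first point can be sketched as an entrywise rule applied to already-estimated subject-level transition matrices. This is a minimal illustration under stated assumptions: the matrices, the weights (standing in for sample sizes), and the function names `weighted_median` and `split_common_unique` are hypothetical, and the original multi-VAR estimator is not claimed to work exactly this way.

```python
from typing import List

Matrix = List[List[float]]


def weighted_median(values: List[float], weights: List[float]) -> float:
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]  # only reached through floating-point round-off


def split_common_unique(mats: List[Matrix], weights: List[float]):
    """Hypothetical entrywise weighted-median split of subject-level matrices.

    common[i][j] is the weighted median of mats[k][i][j] over subjects k,
    and unique[k] = mats[k] - common, so mats[k] == common + unique[k].
    """
    p = len(mats[0])
    common = [
        [weighted_median([m[i][j] for m in mats], weights) for j in range(p)]
        for i in range(p)
    ]
    unique = [
        [[m[i][j] - common[i][j] for j in range(p)] for i in range(p)]
        for m in mats
    ]
    return common, unique


# Hypothetical lag-1 estimates for three subjects; weights play the role of sample sizes.
A = [
    [[0.5, 0.0], [0.2, 0.3]],
    [[0.5, 0.1], [0.0, 0.3]],
    [[0.4, 0.0], [0.0, 0.3]],
]
common, unique = split_common_unique(A, weights=[100, 100, 50])
print(common)  # [[0.5, 0.0], [0.0, 0.3]]
```

Each common entry depends on every subject's estimate of that entry, which gives one loose intuition for why, under this rule, the accuracy of the common and unique paths is tied to the hardest subject rather than adapting to each subject's own sparsity and sample size.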
The main contributions are:
Proposes New Identifiability Conditions: Based on a communication-efficient data integration framework, these conditions avoid the statistical inefficiency of the weighted-median approach.
Achieves Subject-Specific Convergence Rates: Convergence rates depend on each subject's own sparsity level and sample size rather than on the worst case across subjects.
Constructs a Complete Inference Framework: Develops three classes of hypothesis tests (nullity tests, homogeneity tests, and significance tests for common pathways).
Provides Theoretical Guarantees: Establishes convergence rates for the estimators and the asymptotic distributions of the test statistics.
Improves Computational Efficiency: Estimates each subject separately and then aggregates the results, significantly reducing computational complexity.
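The nullity and homogeneity tests listed above are Wald-type statistics built from debiased estimates. Below is a minimal sketch for a single path, assuming each subject supplies an asymptotically normal debiased estimate with a known standard error and that subjects are independent. The function names and numbers are hypothetical, and these are the textbook Wald forms; the paper's exact statistics and variance estimators may differ. The chi-square tail is computed from the power series for the regularized incomplete gamma function so that the sketch needs only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), via the power series for the
    regularized lower incomplete gamma function P(df/2, x/2).
    Adequate for moderate x; very small p-values lose relative precision."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def nullity_test(est, se):
    """H0: the path is zero in every subject; statistic ~ chi-square(K)."""
    stat = sum((b / s) ** 2 for b, s in zip(est, se))
    df = len(est)
    return stat, df, chi2_sf(stat, df)


def homogeneity_test(est, se):
    """H0: the path has the same value in every subject; ~ chi-square(K - 1)."""
    w = [1.0 / s ** 2 for s in se]  # inverse-variance weights
    pooled = sum(wk * b for wk, b in zip(w, est)) / sum(w)
    stat = sum(wk * (b - pooled) ** 2 for wk, b in zip(w, est))
    df = len(est) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical debiased estimates and standard errors of one path for K = 3 subjects.
est = [0.30, 0.28, 0.35]
se = [0.05, 0.04, 0.06]
print(nullity_test(est, se))      # large statistic, p-value near 0
print(homogeneity_test(est, se))  # small statistic, p-value around 0.62
```

In this toy example the path is clearly nonzero but shows no evidence of heterogeneity, which is the situation in which it would be reasonable to treat the path as common.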
The paper draws on a broad literature spanning high-dimensional statistics, time series analysis, and robust estimation, which provides a solid theoretical foundation for the research.