Local Causal Discovery for Statistically Efficient Causal Inference
Schubert, Claassen, Magliacane
Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighborhood of the target variables, but are restricted to statistically suboptimal adjustment sets. In this work, we propose Local Optimal Adjustments Discovery (LOAD), a sound and complete causal discovery approach that combines the computational efficiency of local methods with the statistical optimality of global methods. First, LOAD identifies the causal relation between the targets and tests if the causal effect is identifiable by using only local information. If it is identifiable, it then finds the optimal adjustment set by leveraging local causal discovery to infer the mediators and their parents. Otherwise, it returns the locally valid parent adjustment sets based on the learned local structure. In our experiments on synthetic and realistic data LOAD outperforms global methods in scalability, while providing more accurate effect estimation than local methods.
academic
Local Causal Discovery for Statistically Efficient Causal Inference
Causal discovery methods can identify valid adjustment sets for estimating causal effects between a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the entire causal graph and thus can recover optimal adjustment sets (i.e., those with the lowest asymptotic variance), but they quickly become computationally intractable as the number of variables grows. Local causal discovery methods offer more scalable alternatives by focusing on the local neighborhood of target variables, but are limited to statistically suboptimal adjustment sets. In this work, the authors propose Local Optimal Adjustment Discovery (LOAD), a reliable and complete causal discovery method that combines the computational efficiency of local methods with the statistical optimality of global methods.
In causal inference, estimating the causal effect between two variables is a core task. When the underlying causal graph is unknown, causal discovery methods are needed to identify valid adjustment sets for causal effect estimation. Existing methods face a fundamental trade-off:
The Dilemma of Global Methods: Global causal discovery methods (e.g., PC algorithm) can learn the complete causal graph and recover optimal adjustment sets, but their computational complexity grows exponentially with the number of variables, making them infeasible for large-scale problems.
Limitations of Local Methods: Local causal discovery methods (e.g., MB-by-MB, LDECC) are computationally efficient but can only recover suboptimal adjustment sets, resulting in higher asymptotic variance in causal effect estimation.
The authors identify the following problems with existing local methods:
The LocalPC algorithm is unreliable in identifying adjacent variables and may incorrectly identify non-adjacent spouses as adjacent
The LDECC algorithm is incomplete and cannot orient all orientable edges in some cases
The LDP algorithm may incorrectly report effects as unidentifiable when they are actually identifiable with zero effect
Therefore, there is a need for a new method that maintains the computational efficiency of local methods while achieving the statistical optimality of global methods.
Developed methods for determining causal effect identifiability based on local information: Proposed necessary and sufficient conditions for determining whether a causal effect is identifiable using only local information.
Proposed the LOAD algorithm: A reliable and complete method that identifies optimal adjustment sets using only local information around variables.
Comprehensive experimental evaluation: Evaluated LOAD on synthetic and real data, demonstrating its ability to recover high-quality adjustment sets at low computational cost.
Theoretical guarantees: Proved the reliability and completeness of LOAD in determining causal effect identifiability and finding optimal adjustment sets.
Using the LocalRelate algorithm (Algorithm 1), relationships are determined through the following theorems:
Explicit Ancestor Relationship (Theorem 4.1): For any two distinct nodes X and Y in CPDAG G, X ∈ ExplAn_G(Y) if and only if X ⊥̸⊥ Y | Pa_G(X) ∪ Sib_G(X)
Definite Non-Ancestor Relationship (Theorem 4.2): X is a definite non-ancestor of Y if and only if X ⊥⊥ Y | Pa_G(X)
Local Amenability Test: First proposes necessary and sufficient conditions for testing amenability using only local information, avoiding the need to check all possible directed paths.
Caching Mechanism: An improved MB-by-MB algorithm uses caching to reuse Markov blankets and local structures identified in previous runs, significantly improving computational efficiency.
Theoretical Completeness: Proves that LOAD is reliable and complete in determining causal relationships, identifiability, and optimal adjustment sets.
The paper cites important literature in causal inference, including:
Pearl (2009): Causality - classic textbook on causal inference
Spirtes et al. (2000): Foundational work on constraint-based causal discovery
Henckel et al. (2022): Graphical criteria for optimal adjustment sets
Perković et al. (2015): Definition and properties of amenability
Overall Assessment: This is a high-quality causal inference paper with important contributions at both theoretical and practical levels. The LOAD algorithm elegantly resolves the trade-off between computational efficiency and statistical optimality in causal discovery, with significant academic value and promising application prospects.