Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables

Li, Guo, Xie et al.
Estimating causal effects from nonexperimental data is a fundamental problem in many fields of science. A key component of this task is selecting an appropriate set of covariates for confounding adjustment to avoid bias. Most existing methods for covariate selection often assume the absence of latent variables and rely on learning the global network structure among variables. However, identifying the global structure can be unnecessary and inefficient, especially when our primary interest lies in estimating the effect of a treatment variable on an outcome variable. To address this limitation, we propose a novel local learning approach for covariate selection in nonparametric causal effect estimation, which accounts for the presence of latent variables. Our approach leverages testable independence and dependence relationships among observed variables to identify a valid adjustment set for a target causal relationship, ensuring both soundness and completeness under standard assumptions. We validate the effectiveness of our algorithm through extensive experiments on both synthetic and real-world data.

Basic Information

  • Paper ID: 2411.16315
  • Title: Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables
  • Authors: Zheng Li, Xichen Guo, Feng Xie, Zeng Yan, Hao Zhang, Zhi Geng
  • Classification: cs.LG math.ST stat.ML stat.TH
  • Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
  • Paper Link: https://arxiv.org/abs/2411.16315

Abstract

Estimating causal effects from non-experimental data is a fundamental problem across many scientific disciplines. A critical component of this task is selecting an appropriate set of covariates for confounding adjustment to avoid bias. Existing covariate selection methods typically assume the absence of latent variables and rely on learning the global network structure among variables. However, when the primary focus is on estimating the effect of a treatment variable on an outcome variable, identifying the global structure may be unnecessary and inefficient. To address this limitation, this paper proposes a novel local learning approach for covariate selection in nonparametric causal effect estimation in the presence of latent variables. The method leverages testable independence and dependence relationships among observed variables to identify valid adjustment sets for the target causal relationship, ensuring completeness and correctness under standard assumptions.

Research Background and Motivation

Problem Definition

The core problem addressed by this research is: How can we efficiently select a set of covariates to estimate the causal effect of a specific treatment variable X on an outcome variable Y in the presence of latent variables?

Problem Significance

  1. Broad Applicability: Causal effect estimation is crucial in epidemiology, social sciences, economics, and artificial intelligence
  2. Practical Necessity: In real-world applications, ideal randomized controlled trials are often difficult to implement
  3. Bias Control: Incorrect covariate selection leads to biased causal effect estimates

Limitations of Existing Methods

  1. Global Structure Learning: Existing methods such as IDA and LV-IDA require learning the complete causal graph structure, resulting in high computational complexity
  2. Neglect of Latent Variables: Many methods assume the absence of latent confounding variables, which is unrealistic in practical applications
  3. Incompleteness of Local Methods: Methods like CEELS, while more efficient, may miss valid adjustment sets

Research Motivation

The starting point of this paper is to develop a covariate selection method that maintains the efficiency advantages of local learning while ensuring completeness and correctness, particularly in complex scenarios with latent variables.

Core Contributions

  1. Proposes LSAS Algorithm: Designs a fully local covariate selection algorithm that leverages testable independence and dependence relationships, allowing for the existence of latent variables
  2. Theoretical Guarantees: Proves the completeness and correctness of the proposed algorithm under standard assumptions, enabling identification of valid adjustment sets for target causal relationships
  3. Efficiency Improvement: Substantially reduces time complexity relative to global methods, from O(t × 2^t) to O((|MB(X)| - 1) × 2^(|MB(Y)| - 1) + n)
  4. Experimental Validation: Verifies algorithm effectiveness on both synthetic and real data
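The sketch below gives a rough sense of where the savings come from. It counts the (S, Z) pairs that the double loop of the LSAS listing enumerates, (|MB(X)| - 1) × 2^(|MB(Y)| - 1), and compares that count with the t × 2^t term quoted for global search. It assumes Y ∈ MB(X) and X ∈ MB(Y), which is what the "- 1" terms suggest. It ignores the additive n term and the cost of Markov blanket discovery, and it treats t as a given number because its meaning is not spelled out here. The function names and the sizes in the example are hypothetical.

```python
def lsas_candidate_pairs(mb_x_size: int, mb_y_size: int) -> int:
    # S ranges over MB(X) \ {Y}: mb_x_size - 1 choices (assumes Y is in MB(X)).
    # Z ranges over all subsets of MB(Y) \ {X}: 2 ** (mb_y_size - 1) choices
    # (assumes X is in MB(Y)).
    return (mb_x_size - 1) * 2 ** (mb_y_size - 1)


def global_search_term(t: int) -> int:
    # The t * 2**t term quoted for global methods; t is taken as given.
    return t * 2 ** t


# Hypothetical sizes: |MB(X)| = 5, |MB(Y)| = 6, and t = 30.
print(lsas_candidate_pairs(5, 6))  # 128
print(global_search_term(30))      # 32212254720
```

The local term grows exponentially only in the size of MB(Y), not in the total number of covariates.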

Methodology Details

Task Definition

Input: Observed dataset D containing treatment variable X, outcome variable Y, and covariate set O

Output:

  • Scenario S1: Causal effect estimate θ of X on Y
  • Scenario S2: Determination that X has no causal effect on Y (θ=0)
  • Scenario S3: Unable to determine whether causal effect exists (θ=∅)

Constraints:

  • Y is not a causal ancestor of X
  • O is a set of pre-treatment variables (X and Y are not causal ancestors of any variable in O)

Core Theoretical Foundation

AMB Definition

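AMB(X,Y) denotes an adjustment set within the Markov blanket, i.e., a set Z satisfying:

  • Z ⊆ MB(Y) \ {X}
  • Z ∩ Forb(X,Y) = ∅, where Forb(X,Y) is the forbidden set (descendants of the nodes, other than X, that lie on proper causal paths from X to Y)
  • Z blocks all non-causal paths from X to Y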

Key Theorems

Theorem 1 (AMB Existence): A subset of O serves as an adjustment set for (X,Y) if and only if a subset of MB(Y)\{X} serves as an adjustment set.

Theorem 2 (Rule R1): For Z ⊆ MB(Y)\{X}, if there exists S ∈ MB(X)\{Y} satisfying:

  • S ⊥̸⊥ Y | Z (condition i)
  • S ⊥⊥ Y | Z∪{X} (condition ii)

then Z is an AMB(X,Y), and X has a causal effect on Y.

Theorem 3 (Rule R2): If there exist Z ⊆ MB(Y)\{X} and S ∈ MB(X)\{Y} satisfying at least one of the following conditions:

  • X ⊥⊥ Y | Z (condition i)
  • S ⊥̸⊥ X | Z and S ⊥⊥ Y | Z (condition ii)

then X has no causal effect on Y.
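Rules R1 and R2 use only the answers of conditional independence tests, so they can be written against an abstract test. The sketch below assumes a hypothetical callable indep(a, b, cond) that returns True when a and b are judged independent given the set cond. It checks both rules with a toy oracle for the hypothetical DAG S → X → Y, W → X, W → Y, where W is an observed confounder. In that graph Z = {W} satisfies R1. Z = ∅ does not, because conditioning on X opens the path S → X ← W → Y.

```python
from typing import AbstractSet, Callable, FrozenSet

# Hypothetical CI-test interface: indep(a, b, cond) is True when a and b
# are judged conditionally independent given the set cond.
Indep = Callable[[str, str, FrozenSet[str]], bool]


def rule_r1(indep: Indep, x: str, y: str, s: str, z: AbstractSet[str]) -> bool:
    """Rule R1: S dependent on Y given Z (i), independent given Z plus {X} (ii)."""
    z = frozenset(z)
    return (not indep(s, y, z)) and indep(s, y, z | {x})


def rule_r2(indep: Indep, x: str, y: str, s: str, z: AbstractSet[str]) -> bool:
    """Rule R2: X independent of Y given Z (i), or S dependent on X but independent of Y given Z (ii)."""
    z = frozenset(z)
    return indep(x, y, z) or ((not indep(s, x, z)) and indep(s, y, z))


# Toy oracle: these are all the independences of the hypothetical DAG
# S -> X -> Y, W -> X, W -> Y; every other statement is a dependence.
FACTS = {
    (frozenset({"S", "W"}), frozenset()),
    (frozenset({"S", "Y"}), frozenset({"W", "X"})),
}


def toy_indep(a: str, b: str, cond: FrozenSet[str]) -> bool:
    return (frozenset({a, b}), frozenset(cond)) in FACTS


print(rule_r1(toy_indep, "X", "Y", "S", {"W"}))  # True: Z = {W} is an AMB(X,Y)
print(rule_r1(toy_indep, "X", "Y", "S", set()))  # False: condition (ii) fails
print(rule_r2(toy_indep, "X", "Y", "S", {"W"}))  # False: no evidence of a null effect
```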

LSAS Algorithm Flow

Algorithm 1: Local Search Adjustment Sets (LSAS)
Input: Observed dataset D, treatment variable X, outcome variable Y
1: MB(X), MB(Y) ← Markov Blanket Discovery(X,Y,D)
2: Θ ← ∅ // Initialize causal effect estimate
3: for each S ∈ MB(X)\{Y}, each Z ⊆ MB(Y)\{X} do
4:   if S and Z satisfy Rule R1 then
5:     Estimate causal effect θ of X on Y, Θ ← θ // Scenario S1
6:   end if
7:   if S and Z satisfy Rule R2 then
8:     return Θ ← 0 // No causal effect, Scenario S2
9:   end if
10: end for
Output: Estimated causal effect Θ // If ∅ then Scenario S3
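The listing above can be sketched in Python as follows, under several assumptions. MB(X) and MB(Y) are assumed to be already discovered. The tests use a hypothetical indep(a, b, cond) interface, and estimate(z) is a hypothetical placeholder for whatever estimator is applied after adjusting for Z. Skipping pairs with S ∈ Z is a choice made in this sketch, not something stated in the listing. The demo oracle encodes every independence of the hypothetical DAG S → X ← W → Y, which has no X → Y edge. In that graph the empty Z already triggers Rule R2.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, Optional, Set, Tuple

Indep = Callable[[str, str, FrozenSet[str]], bool]


def subsets(items: Set[str]) -> Iterator[FrozenSet[str]]:
    """Yield every subset of items, smallest first, starting with the empty set."""
    ordered = sorted(items)
    for k in range(len(ordered) + 1):
        for combo in combinations(ordered, k):
            yield frozenset(combo)


def lsas(x: str, y: str, mb_x: Set[str], mb_y: Set[str], indep: Indep,
         estimate: Callable[[FrozenSet[str]], object]) -> Tuple[str, Optional[object]]:
    """Sketch of Algorithm 1 given MB(X) and MB(Y); returns (scenario, value)."""
    theta = None  # plays the role of the initial empty Theta
    for s in sorted(mb_x - {y}):
        for z in subsets(mb_y - {x}):
            if s in z:
                continue  # sketch choice: never condition S on itself
            # Rule R1: S dependent on Y given Z, independent given Z plus {X}
            if (not indep(s, y, z)) and indep(s, y, z | {x}):
                theta = estimate(z)  # Scenario S1: adjust for Z
            # Rule R2: X independent of Y given Z, or S dependent on X but not on Y
            if indep(x, y, z) or ((not indep(s, x, z)) and indep(s, y, z)):
                return "S2", 0.0  # Scenario S2: no causal effect
    return ("S1", theta) if theta is not None else ("S3", None)


# All independences of the hypothetical DAG S -> X <- W -> Y (no X -> Y edge);
# any statement not listed is treated as a dependence.
FACTS = {
    (frozenset({"S", "W"}), frozenset()),
    (frozenset({"S", "W"}), frozenset({"Y"})),
    (frozenset({"S", "Y"}), frozenset()),
    (frozenset({"S", "Y"}), frozenset({"W"})),
    (frozenset({"S", "Y"}), frozenset({"W", "X"})),
    (frozenset({"X", "Y"}), frozenset({"W"})),
    (frozenset({"X", "Y"}), frozenset({"S", "W"})),
}


def toy_indep(a: str, b: str, cond: FrozenSet[str]) -> bool:
    return (frozenset({a, b}), frozenset(cond)) in FACTS


print(lsas("X", "Y", {"S", "W"}, {"W"}, toy_indep, estimate=lambda z: None))
# ('S2', 0.0)
```

Note that this sketch keeps the listing's behavior: a later R1 match overwrites Θ, and any R2 match returns immediately.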

Technical Innovations

  1. Local Markov Blanket Utilization: Requires only Markov blanket information for X and Y, avoiding global graph learning
  2. Rule-Driven Identification: Directly identifies causal relationships from conditional independence tests through R1 and R2 rules
  3. Latent Variable Handling: Addresses latent confounding variables within the maximal ancestral graph (MAG) framework
  4. Completeness Guarantee: The completeness proof ensures that whenever the observed covariates contain a valid adjustment set, the algorithm identifies one
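Step 1 of Algorithm 1 needs only the two Markov blankets. The listing does not say which discovery procedure to use. As one possible stand-in, the sketch below implements the classic Grow-Shrink procedure (Margaritis and Thrun) against a hypothetical indep(a, b, cond) interface, assuming reliable tests. Grow-Shrink is a substitute chosen for illustration, not necessarily the routine used in the paper. On the hypothetical DAG S → X → Y, W → X, W → Y, the grow phase first admits S, and the shrink phase then removes it, giving MB(Y) = {W, X}.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

Indep = Callable[[str, str, FrozenSet[str]], bool]


def grow_shrink_mb(target: str, variables: Iterable[str], indep: Indep) -> Set[str]:
    """Grow-Shrink Markov blanket discovery (assumes reliable CI tests)."""
    candidates = [v for v in variables if v != target]
    mb: List[str] = []
    # Grow: keep adding variables that are dependent on target given the current set.
    changed = True
    while changed:
        changed = False
        for v in candidates:
            if v not in mb and not indep(target, v, frozenset(mb)):
                mb.append(v)
                changed = True
    # Shrink: drop variables that are independent of target given the others.
    for v in list(mb):
        rest = frozenset(m for m in mb if m != v)
        if indep(target, v, rest):
            mb.remove(v)
    return set(mb)


# All independences of the hypothetical DAG S -> X -> Y, W -> X, W -> Y.
FACTS = {
    (frozenset({"S", "W"}), frozenset()),
    (frozenset({"S", "Y"}), frozenset({"W", "X"})),
}


def toy_indep(a: str, b: str, cond: FrozenSet[str]) -> bool:
    return (frozenset({a, b}), frozenset(cond)) in FACTS


print(sorted(grow_shrink_mb("Y", ["S", "W", "X", "Y"], toy_indep)))  # ['W', 'X']
```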

Experimental Setup

Datasets

  1. Synthetic Data:
    • Random graphs: Erdős-Rényi model G(n,d) with 20-50 nodes, average degree 3-9
    • Specific structures: DAGs based on Figures 3(a) and 4(a)
    • Benchmark networks: INSURANCE (27 nodes), MILDEW (35 nodes), WIN95PTS (76 nodes), ANDES (223 nodes)
  2. Real Data: Cattaneo2 dataset containing 4,642 singleton birth records from Pennsylvania
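The random-graph setting G(n, d) can be reproduced roughly as follows. This sketch assumes that G(n, d) means an Erdős-Rényi skeleton with edge probability d / (n - 1), so that the expected average degree is d. Edges are oriented along a random node order, which guarantees acyclicity. The paper's exact generator may differ, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def random_er_dag(n_nodes: int, avg_degree: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Sample a DAG whose skeleton is Erdos-Renyi with the given expected average degree.

    Each unordered pair gets an edge with probability p = avg_degree / (n_nodes - 1),
    so the expected edge count is avg_degree * n_nodes / 2. Edges point from the
    earlier to the later node in a random permutation, so the graph is acyclic.
    """
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    p = avg_degree / (n_nodes - 1)
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < p:
                edges.append((order[i], order[j]))  # earlier node -> later node
    return edges


edges = random_er_dag(40, 5)
print(len(edges), 2 * len(edges) / 40)  # realised edge count and average degree
```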

Evaluation Metrics

  • Relative Error (RE): |(Estimated Value - True Value)/True Value| × 100%
  • Number of Tests (nTest): Number of conditional independence tests executed by the algorithm
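Both metrics are straightforward to compute. The sketch below implements RE as defined above. It counts nTest by wrapping a hypothetical indep(a, b, cond) test in a counter. The class name CountingTest and the numbers in the example are illustrative.

```python
from typing import Callable, FrozenSet


def relative_error(estimate: float, truth: float) -> float:
    """RE = |(estimate - truth) / truth| * 100, in percent; truth must be nonzero."""
    if truth == 0:
        raise ValueError("relative error is undefined when the true effect is 0")
    return abs((estimate - truth) / truth) * 100.0


class CountingTest:
    """Wraps a CI test indep(a, b, cond) and counts how often it is called (nTest)."""

    def __init__(self, indep: Callable[[str, str, FrozenSet[str]], bool]):
        self.indep = indep
        self.n_tests = 0

    def __call__(self, a: str, b: str, cond: FrozenSet[str]) -> bool:
        self.n_tests += 1
        return self.indep(a, b, cond)


counted = CountingTest(lambda a, b, cond: False)  # hypothetical always-dependent test
counted("S", "Y", frozenset())
counted("S", "Y", frozenset({"W"}))
print(counted.n_tests)                 # 2
print(relative_error(-220.0, -200.0))  # RE in percent for a hypothetical estimate
```

Because the counter wraps the test itself, it can be passed to any of the search routines unchanged. nTest then falls out of a single run.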

Comparison Methods

  • LV-IDA: Global graph learning method based on RFCI algorithm
  • EHS: Global search method with pre-treatment assumption
  • CEELS: Local search method with pre-treatment assumption
  • LDP: Local search method relaxing pre-treatment assumption

Implementation Details

  • Sample sizes: 1K, 5K, 10K, 15K
  • Linear Gaussian causal models with edge weights sampled from Uniform[0.5, 1.5]
  • Conditional independence test significance level: 0.01
  • Maximum conditioning set size: 3-7 (depending on network complexity)
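Under the linear Gaussian setting above, a common choice of conditional independence test is the Fisher z-test on partial correlations. The effect of X on Y given an adjustment set Z can then be read off as the OLS coefficient of X in a regression of Y on X and Z. The sketch assumes both, since the summary names neither the test nor the estimator. It simulates a hypothetical four-variable DAG S → X → Y, W → X, W → Y with edge weights drawn from Uniform[0.5, 1.5], standard normal noise, and a test at significance level 0.01. The graph and the noise distribution are assumptions.

```python
import math

import numpy as np


def fisher_z_indep(data, i, j, cond=(), alpha=0.01):
    """Fisher z-test: True if column i is judged independent of column j given cond."""
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    # Partial correlation of i and j given cond, read off the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.9999999), 0.9999999)  # keep the log finite
    stat = math.sqrt(data.shape[0] - len(cond) - 3) * 0.5 * math.log((1 + r) / (1 - r))
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


rng = np.random.default_rng(0)
n = 5000
b_sx, b_wx, b_wy, b_xy = rng.uniform(0.5, 1.5, size=4)
S = rng.normal(size=n)
W = rng.normal(size=n)
X = b_sx * S + b_wx * W + rng.normal(size=n)
Y = b_xy * X + b_wy * W + rng.normal(size=n)
data = np.column_stack([S, W, X, Y])  # columns: 0=S, 1=W, 2=X, 3=Y

# Rule R1 with Z = {W}: S and Y should test dependent given {W}
# and (up to test error) independent given {W, X}.
print(fisher_z_indep(data, 0, 3, [1]), fisher_z_indep(data, 0, 3, [1, 2]))

# Adjusting for Z = {W}: OLS coefficient of X in Y ~ 1 + X + W.
design = np.column_stack([np.ones(n), X, W])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(f"true effect {b_xy:.3f}, estimate {coef[1]:.3f}")
```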

Experimental Results

Main Results

Specific Structure Experiments

On MAG structures corresponding to Figures 3(b) and 4(b):

  • Relative Error: LSAS significantly outperforms other methods across all sample sizes
  • Test Efficiency: LSAS's nTest is far lower than LV-IDA and EHS
  • Completeness Advantage: CEELS and LDP fail to find valid adjustment sets on certain structures due to incompleteness

Benchmark Network Experiments

On MILDEW and WIN95PTS networks:

  • LSAS achieves the best performance on almost all evaluation metrics and sample sizes
  • LSAS outperforms other methods even when pre-treatment assumptions are violated
  • EHS cannot complete on large networks due to excessive runtime

Real Data Validation

On Cattaneo2 dataset studying the effect of maternal smoking during pregnancy on infant birth weight:

  • LSAS and EHS effect estimates both fall within the benchmark interval [-250 g, -200 g]
  • LSAS requires only 158 conditional independence tests, compared to 1,284 for CEELS and 266 for LDP
  • Validates method effectiveness in practical applications

Ablation Studies

The paper validates method robustness through experiments with varying network densities:

  • Performance of all methods declines with increasing graph density, but LSAS maintains clear advantages
  • In G(40,9) networks, LDP requires fewer tests (lower nTest), but LSAS achieves a substantially lower RE

Runtime Analysis

LSAS has the lowest runtime on most networks and sample sizes. The only exception is the WIN95PTS network at the largest sample size (15K), where LDP is faster but LSAS is substantially more accurate.

Related Work

Methods with Known Causal Graphs

  • Classical Adjustment Criteria: Backdoor criterion, generalized backdoor criterion
  • Optimal Adjustment Sets: Finding adjustment sets with minimal asymptotic variance

Methods with Unknown Causal Graphs

  • Global Learning: IDA series methods requiring complete CPDAG/PAG learning
  • Local Learning: CovSel, EHS and other methods, but most assume no latent variables
  • Latent Variable Handling: LV-IDA, CE-SAT and other methods, but with high computational complexity

Advantages of This Work

Compared with existing work, this method combines the efficiency of local learning with the completeness of global methods, and it is particularly well suited to settings with latent variables.

Conclusions and Discussion

Main Conclusions

  1. Proposes the first covariate selection algorithm that maintains locality while ensuring completeness in the presence of latent variables
  2. Theoretically proves the correctness and completeness of the method
  3. Experimental validation demonstrates significant advantages in efficiency and accuracy

Limitations

  1. Pre-treatment Assumption: Still relies on the pre-treatment assumption, although it performs well under some violations
  2. Descendant Identification: Cannot locally identify descendants of the treatment variable without recovering the complete graph
  3. Conditional Independence Testing: Depends on accurate conditional independence tests, which may have errors with finite samples

Future Directions

  1. Relaxing Assumptions: Develop methods not dependent on pre-treatment assumptions
  2. Background Knowledge Integration: Utilize domain knowledge to assist causal identification
  3. Multi-environment Data: Leverage multi-environment data to enhance causal identification
  4. Descendant Identification: Study methods for locally identifying treatment variable descendants

In-Depth Evaluation

Strengths

  1. Theoretical Contribution: Provides a complete theoretical framework proving the feasibility of local methods
  2. Practical Value: Significantly reduces computational complexity, enabling large-scale applications
  3. Comprehensive Experiments: Thorough validation across multiple data types
  4. Clear Writing: Well-structured paper with rigorous theoretical exposition

Weaknesses

  1. Assumption Limitations: Pre-treatment assumptions may not hold in certain application scenarios
  2. Test Dependence: Method performance heavily depends on accuracy of conditional independence tests
  3. Scalability: Scalability to ultra-large networks requires further validation

Impact

  1. Academic Value: Provides new theoretical and methodological frameworks for causal inference
  2. Practical Significance: Offers efficient solutions for covariate selection in real-world applications
  3. Reproducibility: Open-source code and detailed experimental settings ensure good reproducibility

Applicable Scenarios

This method is particularly suitable for:

  • Causal effect estimation on large-scale observational data
  • Complex systems with latent confounding variables
  • Real-time applications with computational efficiency requirements
  • Research designs with relatively complete pre-treatment variable collection

References

The paper cites important literature in causal inference, including Pearl's classical works, the PC algorithm by Spirtes et al., and recent local learning methods, demonstrating comprehensive understanding and deep engagement with related work.