Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables

Li, Guo, Xie et al.

Estimating causal effects from nonexperimental data is a fundamental problem in many fields of science. A key component of this task is selecting an appropriate set of covariates for confounding adjustment to avoid bias. Most existing methods for covariate selection assume the absence of latent variables and rely on learning the global network structure among variables. However, identifying the global structure can be unnecessary and inefficient, especially when the primary interest lies in estimating the effect of a treatment variable on an outcome variable. To address this limitation, we propose a novel local learning approach for covariate selection in nonparametric causal effect estimation, which accounts for the presence of latent variables. Our approach leverages testable independence and dependence relationships among observed variables to identify a valid adjustment set for a target causal relationship, ensuring both soundness and completeness under standard assumptions. We validate the effectiveness of our algorithm through extensive experiments on both synthetic and real-world data.

Basic Information

Paper ID: 2411.16315
Title: Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables
Authors: Zheng Li, Xichen Guo, Feng Xie, Zeng Yan, Hao Zhang, Zhi Geng
Classification: cs.LG math.ST stat.ML stat.TH
Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Paper Link: https://arxiv.org/abs/2411.16315

Abstract

Estimating causal effects from non-experimental data is a fundamental problem across many scientific disciplines. A critical component of this task is selecting an appropriate set of covariates for confounding adjustment to avoid bias. Existing covariate selection methods typically assume the absence of latent variables and rely on learning the global network structure among variables. However, when the primary focus is on estimating the effect of a treatment variable on an outcome variable, identifying the global structure may be unnecessary and inefficient. To address this limitation, this paper proposes a novel local learning approach for covariate selection in nonparametric causal effect estimation in the presence of latent variables. The method leverages testable independence and dependence relationships among observed variables to identify valid adjustment sets for the target causal relationship, ensuring completeness and correctness under standard assumptions.

Research Background and Motivation

Problem Definition

The core problem addressed by this research is: How can we efficiently select a set of covariates to estimate the causal effect of a specific treatment variable X on an outcome variable Y in the presence of latent variables?

Problem Significance

Broad Applicability: Causal effect estimation is crucial in epidemiology, social sciences, economics, and artificial intelligence
Practical Necessity: In real-world applications, ideal randomized controlled trials are often difficult to implement
Bias Control: Incorrect covariate selection leads to biased causal effect estimates

Limitations of Existing Methods

Global Structure Learning: Existing methods such as IDA and LV-IDA require learning the complete causal graph structure, resulting in high computational complexity
Neglect of Latent Variables: Many methods assume the absence of latent confounding variables, which is unrealistic in practical applications
Incompleteness of Local Methods: Methods like CEELS, while more efficient, may miss valid adjustment sets

Research Motivation

The starting point of this paper is to develop a covariate selection method that maintains the efficiency advantages of local learning while ensuring completeness and correctness, particularly in complex scenarios with latent variables.

Core Contributions

Proposes LSAS Algorithm: Designs a fully local covariate selection algorithm that leverages testable independence and dependence relationships, allowing for the existence of latent variables
Theoretical Guarantees: Proves the completeness and correctness of the proposed algorithm under standard assumptions, enabling identification of valid adjustment sets for target causal relationships
Efficiency Improvement: Significantly reduces computational complexity compared to global methods, from O(t × 2^t) to O((|MB(X)|-1) × 2^(|MB(Y)|-1) + n)
Experimental Validation: Verifies algorithm effectiveness on both synthetic and real data
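The size of the gap between the two search spaces is easy to check by hand. The sketch below is only an illustration: it assumes t is the number of observed variables, counts just the (S, Z) pairs that a local search over MB(X) and MB(Y) would enumerate, and leaves out the additive n term. The function names and the example sizes are hypothetical and do not come from the paper.

```python
def global_bound(t: int) -> int:
    """t * 2^t, the global-search term quoted in the summary."""
    return t * 2 ** t


def local_pairs(mb_x_size: int, mb_y_size: int) -> int:
    """Number of (S, Z) pairs with S in MB(X) minus Y and Z a subset of MB(Y) minus X.

    Assumes Y is in MB(X) and X is in MB(Y), so one element is removed from each.
    """
    return (mb_x_size - 1) * 2 ** (mb_y_size - 1)


# Illustrative sizes only: 20 observed variables, blankets of size 5 and 6.
global_count = global_bound(20)
local_count = local_pairs(5, 6)
print(global_count, local_count)
```

With these made-up sizes the global term is 20,971,520 while the local search enumerates 128 pairs, which is the kind of reduction the complexity claim describes.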

Methodology Details

Task Definition

Input: Observed dataset D containing treatment variable X, outcome variable Y, and covariate set O

Output:

Scenario S1: Causal effect estimate θ of X on Y
Scenario S2: Determination that X has no causal effect on Y (θ=0)
Scenario S3: Unable to determine whether causal effect exists (θ=∅)

Constraints:

Y is not a causal ancestor of X
O is a set of pre-treatment variables (X and Y are not causal ancestors of any variable in O)
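For Scenario S1, the estimate θ is obtained by adjusting for the selected covariates. The sketch below shows why the choice of adjustment set matters, under a simplifying assumption that is not part of the method itself: a linear Gaussian model, where adjustment reduces to reading off the coefficient of X in a least-squares regression. The data-generating weights (0.7, 1.0, 0.8), the variable names, and the small `ols` helper are all hypothetical.

```python
import random


def ols(rows, y):
    """Least-squares coefficients for y ~ rows via the normal equations.

    Each row must already contain a leading 1.0 for the intercept.
    """
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[piv] = a[piv], a[col]
        c[col], c[piv] = c[piv], c[col]
        for row in range(col + 1, k):
            f = a[row][col] / a[col][col]
            for j in range(col, k):
                a[row][j] -= f * a[col][j]
            c[row] -= f * c[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(a[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (c[i] - tail) / a[i][i]
    return beta


# Toy model: C confounds X and Y; the true effect of X on Y is 1.0.
rng = random.Random(0)
n = 5000
c_obs = [rng.gauss(0, 1) for _ in range(n)]
x = [0.7 * ci + rng.gauss(0, 1) for ci in c_obs]
y = [1.0 * xi + 0.8 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c_obs)]

naive = ols([[1.0, xi] for xi in x], y)[1]                         # no adjustment
adjusted = ols([[1.0, xi, ci] for xi, ci in zip(x, c_obs)], y)[1]  # adjust for C
print(f"naive={naive:.3f} adjusted={adjusted:.3f}")
```

In this toy setup the unadjusted coefficient is biased upward by the confounder, while adjusting for C recovers a value close to the true effect of 1.0.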

Core Theoretical Foundation

AMB Definition

A set Z is an adjustment set within the Markov blanket of Y, denoted AMB(X,Y), if it satisfies three conditions:

Z ⊆ MB(Y) \ {X}
Z ∩ Forb(X,Y) = ∅
Z blocks all non-causal paths from X to Y

Key Theorems

Theorem 1 (AMB Existence): A subset of O serves as an adjustment set for (X,Y) if and only if a subset of MB(Y)\{X} serves as an adjustment set.

Theorem 2 (Rule R1): For Z ⊆ MB(Y)\{X}, if there exists S ∈ MB(X)\{Y} satisfying:

S ⊥̸⊥ Y | Z (condition i)
S ⊥⊥ Y | Z∪{X} (condition ii)

then Z is an AMB(X,Y), and X has a causal effect on Y.
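Rule R1 can be read as a two-test check on a candidate pair (S, Z). The sketch below expresses it against an abstract independence oracle `ci(a, b, cond)` that returns True when a and b are independent given cond. The oracle here is a hand-coded table for a hypothetical toy DAG (S → X → Y, C → X, C → Y, all observed); in practice it would be replaced by a statistical conditional independence test. All names are illustrative.

```python
def rule_r1(ci, s, x, y, z):
    """R1: S is dependent on Y given Z, and independent of Y given Z plus X."""
    z = frozenset(z)
    return (not ci(s, y, z)) and ci(s, y, z | {x})


# Independence facts between S and Y in the toy DAG, written out by hand.
# Any triple not listed is treated as dependent.
INDEPENDENT = {
    ("S", "Y", frozenset({"C", "X"})),
}


def ci(a, b, cond):
    return (a, b, frozenset(cond)) in INDEPENDENT


r1_with_c = rule_r1(ci, "S", "X", "Y", {"C"})   # Z = {C}
r1_with_empty = rule_r1(ci, "S", "X", "Y", set())  # Z = {}
print(r1_with_c, r1_with_empty)
```

With Z = {C} both conditions hold, so {C} is certified as an adjustment set. With Z = {} the second condition fails: conditioning on X alone opens the collider S → X ← C, leaving S and Y dependent through C.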

Theorem 3 (Rule R2): If there exist Z ⊆ MB(Y)\{X} and S ∈ MB(X)\{Y} satisfying either:

X ⊥⊥ Y | Z (condition i)
S ⊥̸⊥ X | Z and S ⊥⊥ Y | Z (condition ii)

then X has no causal effect on Y.
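Rule R2 is a disjunction, and either branch alone is enough to conclude that there is no effect. The sketch below again uses a hand-coded independence oracle, this time for a hypothetical toy DAG with no edge from X to Y (S → X ← C → Y). The oracle table and all names are assumptions made for illustration.

```python
def rule_r2(ci, s, x, y, z):
    """R2: (i) X independent of Y given Z, or
    (ii) S dependent on X given Z and S independent of Y given Z."""
    z = frozenset(z)
    cond_i = ci(x, y, z)
    cond_ii = (not ci(s, x, z)) and ci(s, y, z)
    return cond_i or cond_ii


# Independence facts for the toy DAG S -> X <- C -> Y, written out by hand.
# Any triple not listed is treated as dependent.
INDEPENDENT = {
    ("X", "Y", frozenset({"C"})),
    ("S", "Y", frozenset()),
    ("S", "Y", frozenset({"C"})),
}


def ci(a, b, cond):
    return (a, b, frozenset(cond)) in INDEPENDENT


r2_via_i = rule_r2(ci, "S", "X", "Y", {"C"})   # fires through condition (i)
r2_via_ii = rule_r2(ci, "S", "X", "Y", set())  # fires through condition (ii)
r2_never = rule_r2(lambda a, b, cond: False, "S", "X", "Y", set())
print(r2_via_i, r2_via_ii, r2_never)
```

Condition (ii) is the useful one when no subset of MB(Y)\{X} separates X from Y directly: S acts like an instrument whose influence on X never reaches Y.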

LSAS Algorithm Flow

Algorithm 1: Local Search Adjustment Sets (LSAS)
Input: Observed dataset D, treatment variable X, outcome variable Y
1: MB(X), MB(Y) ← Markov Blanket Discovery(X,Y,D)
2: Θ ← ∅ // Initialize causal effect estimate
3: for each S ∈ MB(X)\{Y}, each Z ⊆ MB(Y)\{X} do
4:   if S and Z satisfy Rule R1 then
5:     Estimate causal effect θ of X on Y, Θ ← θ // Scenario S1
6:   end if
7:   if S and Z satisfy Rule R2 then
8:     return Θ ← 0 // No causal effect, Scenario S2
9:   end if
10: end for
Output: Estimated causal effect Θ // If ∅ then Scenario S3
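The loop in Algorithm 1 can be sketched in a few lines once the Markov blankets and an independence oracle are given. The version below is a simplified illustration rather than the authors' implementation: it returns the certified adjustment sets instead of an effect estimate, skips candidates S that already belong to Z, and uses d-separation on a known toy DAG as a perfect oracle. The node L is treated as latent (it never appears in a blanket or a conditioning set). The graphs, blankets, and function names are all hypothetical.

```python
from itertools import combinations


def d_separated(parents, a, b, cond):
    """d-separation test via the moralized ancestral graph."""
    cond = set(cond)
    keep, stack = set(), [a, b, *cond]
    while stack:  # ancestral closure of {a, b} and cond
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))
    nbrs = {v: set() for v in keep}
    for v in keep:  # moralize: link child-parent and parent-parent
        ps = list(parents.get(v, ()))
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {a}, [a]
    while stack:  # search from a to b while avoiding cond
        v = stack.pop()
        for w in nbrs[v] - cond - seen:
            if w == b:
                return False
            seen.add(w)
            stack.append(w)
    return True


def lsas(ci, x, y, mb_x, mb_y):
    """Return (scenario, adjustment_sets) following the R1/R2 loop."""
    cand_s = sorted(set(mb_x) - {y})
    pool = sorted(set(mb_y) - {x})
    valid = []
    for size in range(len(pool) + 1):
        for z in map(frozenset, combinations(pool, size)):
            if ci(x, y, z):  # R2, condition (i)
                return "S2", []
            for s in cand_s:
                if s in z:
                    continue
                if not ci(s, y, z) and ci(s, y, z | {x}):  # R1
                    if z not in valid:
                        valid.append(z)
                if not ci(s, x, z) and ci(s, y, z):  # R2, condition (ii)
                    return "S2", []
    return ("S1", valid) if valid else ("S3", [])


def oracle(parents):
    return lambda a, b, cond: d_separated(parents, a, b, cond)


# Toy DAG with an effect: S -> X -> Y, C -> X, and latent L -> C, L -> Y.
effect_dag = {"X": {"S", "C"}, "Y": {"X", "L"}, "C": {"L"}}
result_effect = lsas(oracle(effect_dag), "X", "Y", {"S", "C", "Y"}, {"X", "C"})

# Same graph without the edge X -> Y.
null_dag = {"X": {"S", "C"}, "Y": {"L"}, "C": {"L"}}
result_null = lsas(oracle(null_dag), "X", "Y", {"S", "C"}, {"C"})
print(result_effect, result_null)
```

In the first graph the search certifies {C} as an adjustment set through S (Scenario S1); in the second, S is associated with X but not with Y, so R2 fires and the search stops with Scenario S2.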

Technical Innovations

Local Markov Blanket Utilization: Requires only Markov blanket information for X and Y, avoiding global graph learning
Rule-Driven Identification: Directly identifies causal relationships from conditional independence tests through R1 and R2 rules
Latent Variable Handling: Addresses latent confounding variables within the MAG framework
Completeness Guarantee: Theoretical proof of method completeness ensures no valid adjustment sets are missed

Experimental Setup

Datasets

Synthetic Data:
- Random graphs: Erdős-Rényi model G(n,d) with 20-50 nodes, average degree 3-9
- Specific structures: DAGs based on Figures 3(a) and 4(a)
- Benchmark networks: INSURANCE (27 nodes), MILDEW (35 nodes), WIN95PTS (76 nodes), ANDES (223 nodes)
Real Data: Cattaneo2 dataset containing 4,642 singleton birth records from Pennsylvania

Evaluation Metrics

Relative Error (RE): |(Estimated Value - True Value) / True Value| × 100%
Number of Tests (nTest): Number of conditional independence tests executed by the algorithm
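The RE metric is a one-line computation. A minimal sketch follows; the function name is hypothetical, and raising an error for a zero true effect is a choice made here because the ratio is undefined in that case.

```python
def relative_error(estimate: float, truth: float) -> float:
    """Relative error in percent: |(estimate - truth) / truth| * 100."""
    if truth == 0:
        raise ValueError("relative error is undefined when the true effect is 0")
    return abs((estimate - truth) / truth) * 100.0


re_example = relative_error(1.1, 1.0)
print(f"{re_example:.1f}%")
```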

Comparison Methods

LV-IDA: Global graph learning method based on RFCI algorithm
EHS: Global search method with pre-treatment assumption
CEELS: Local search method with pre-treatment assumption
LDP: Local search method relaxing pre-treatment assumption

Implementation Details

Sample sizes: 1K, 5K, 10K, 15K
Linear Gaussian causal models with edge weights sampled from Uniform[0.5, 1.5]
Conditional independence test significance level: 0.01
Maximum conditioning set size: 3-7 (depending on network complexity)
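To make this setup concrete, the sketch below simulates a small linear Gaussian chain with weights drawn from the stated range and runs a conditional independence test at the 0.01 level. The summary does not name the test used, so the Fisher z test on partial correlations shown here is an assumption, chosen because it is a common choice for linear Gaussian data. The chain S → X → Y, the seed, and all function names are illustrative.

```python
import math
import random


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def partial_corr(data, a, b, cond):
    """Partial correlation of a and b given cond, via the recursive formula."""
    if not cond:
        return pearson(data[a], data[b])
    c, rest = cond[0], cond[1:]
    r_ab = partial_corr(data, a, b, rest)
    r_ac = partial_corr(data, a, c, rest)
    r_bc = partial_corr(data, b, c, rest)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def fisher_z_pvalue(data, a, b, cond=()):
    """Two-sided p-value for the null hypothesis of zero partial correlation."""
    cond = tuple(cond)
    n = len(data[a])
    r = partial_corr(data, a, b, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


ALPHA = 0.01
rng = random.Random(1)
n = 5000
w_sx, w_xy = rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)
s = [rng.gauss(0, 1) for _ in range(n)]
x = [w_sx * si + rng.gauss(0, 1) for si in s]
y = [w_xy * xi + rng.gauss(0, 1) for xi in x]
data = {"S": s, "X": x, "Y": y}

p_marginal = fisher_z_pvalue(data, "S", "Y")        # S and Y are dependent
p_given_x = fisher_z_pvalue(data, "S", "Y", ["X"])  # S and Y are independent given X
print(p_marginal < ALPHA, p_given_x)
```

Independence is accepted when the p-value exceeds the significance level; with finite samples this decision can be wrong, which is the error source noted under Limitations.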

Experimental Results

Main Results

Specific Structure Experiments

On MAG structures corresponding to Figures 3(b) and 4(b):

Relative Error: LSAS significantly outperforms other methods across all sample sizes
Test Efficiency: LSAS's nTest is far lower than LV-IDA and EHS
Completeness Advantage: CEELS and LDP fail to find valid adjustment sets on certain structures due to incompleteness

Benchmark Network Experiments

On MILDEW and WIN95PTS networks:

LSAS performs best on almost all evaluation metrics and sample sizes
LSAS outperforms other methods even when pre-treatment assumptions are violated
EHS cannot complete on large networks due to excessive runtime

Real Data Validation

On Cattaneo2 dataset studying the effect of maternal smoking during pregnancy on infant birth weight:

LSAS and EHS effect estimates both fall within the benchmark interval [-250g, -200g]
LSAS requires only 158 conditional independence tests, compared to 1,284 for CEELS and 266 for LDP
Validates method effectiveness in practical applications

Ablation Studies

The paper validates method robustness through experiments with varying network densities:

Performance of all methods declines with increasing graph density, but LSAS maintains clear advantages
In G(40,9) networks, although LDP has lower nTest, LSAS's RE is significantly superior

Runtime Analysis

LSAS has the shortest runtime on most networks and sample sizes. The only exception is the WIN95PTS network at the largest sample size (15K), where LDP is faster but LSAS is significantly more accurate.

Related Work

Methods with Known Causal Graphs

Classical Adjustment Criteria: Backdoor criterion, generalized backdoor criterion
Optimal Adjustment Sets: Finding adjustment sets with minimal asymptotic variance

Methods with Unknown Causal Graphs

Global Learning: IDA series methods requiring complete CPDAG/PAG learning
Local Learning: CovSel, EHS and other methods, but most assume no latent variables
Latent Variable Handling: LV-IDA, CE-SAT and other methods, but with high computational complexity

Advantages of This Work

Compared to existing work, this method achieves a unification of local learning efficiency and global method completeness, with particular advantages in handling latent variables.

Conclusions and Discussion

Main Conclusions

Proposes the first covariate selection algorithm that maintains locality while ensuring completeness in the presence of latent variables
Theoretically proves the correctness and completeness of the method
Experimental validation demonstrates significant advantages in efficiency and accuracy

Limitations

Pre-treatment Assumption: Still relies on pre-treatment assumption, though performs well under some violations
Descendant Identification: Cannot locally identify descendants of treatment variables without recovering the complete graph
Conditional Independence Testing: Depends on accurate conditional independence tests, which may have errors with finite samples

Future Directions

Relaxing Assumptions: Develop methods not dependent on pre-treatment assumptions
Background Knowledge Integration: Utilize domain knowledge to assist causal identification
Multi-environment Data: Leverage multi-environment data to enhance causal identification
Descendant Identification: Study methods for locally identifying treatment variable descendants

In-Depth Evaluation

Strengths

Theoretical Contribution: Provides a complete theoretical framework proving the feasibility of local methods
Practical Value: Significantly reduces computational complexity, enabling large-scale applications
Comprehensive Experiments: Thorough validation across multiple data types
Clear Writing: Well-structured paper with rigorous theoretical exposition

Weaknesses

Assumption Limitations: Pre-treatment assumptions may not hold in certain application scenarios
Test Dependence: Method performance heavily depends on accuracy of conditional independence tests
Scalability: Scalability to ultra-large networks requires further validation

Impact

Academic Value: Provides new theoretical and methodological frameworks for causal inference
Practical Significance: Offers efficient solutions for covariate selection in real-world applications
Reproducibility: Open-source code and detailed experimental settings ensure good reproducibility

Applicable Scenarios

This method is particularly suitable for:

Causal effect estimation on large-scale observational data
Complex systems with latent confounding variables
Real-time applications with computational efficiency requirements
Research designs with relatively complete pre-treatment variable collection

References

The paper cites important literature in causal inference, including Pearl's classical works, the PC algorithm by Spirtes et al., and recent local learning methods, demonstrating comprehensive understanding and deep engagement with related work.