2025-11-14T09:04:13.401384

Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods

Yang, Xiong, He
There is a growing interest in studying sequential neural posterior estimation (SNPE) techniques due to their advantages for simulation-based models with intractable likelihoods. The methods aim to learn the posterior from adaptively proposed simulations using neural network-based conditional density estimators. As an SNPE technique, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs well and scales to high-dimensional data. However, the APT method requires computing the expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic proposals were used to render an analytical normalizing constant, it remains challenging to analyze the convergence of learning. In this paper, we reformulate APT as a nested estimation problem. Building on this, we construct several multilevel Monte Carlo (MLMC) estimators for the loss function and its gradients to accommodate different scenarios, including two unbiased estimators, and a biased estimator that trades a small bias for reduced variance and controlled runtime and memory usage. We also provide convergence results of stochastic gradient descent to quantify the interaction of the bias and variance of the gradient estimator. Numerical experiments for approximating complex posteriors with multimodality in moderate dimensions are provided to examine the effectiveness of the proposed methods.
academic

Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods

Basic Information

  • Paper ID: 2401.16776
  • Title: Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods
  • Authors: Xiliang Yang (South China University of Technology), Yifei Xiong (Purdue University), Zhijian He (South China University of Technology, Corresponding Author)
  • Classification: stat.CO cs.LG stat.ML
  • Publication Date: January 2024, arXiv Preprint
  • Paper Link: https://arxiv.org/abs/2401.16776

Abstract

This paper investigates the application of Sequential Neural Posterior Estimation (SNPE) techniques for simulation models with intractable likelihood functions. Addressing the nested expectation problem in the Automatic Posterior Transformation (APT) method, which requires computing the logarithmic expectation of an intractable normalization constant, the paper reformulates APT as a nested estimation problem and constructs several Multilevel Monte Carlo (MLMC) estimators, including two unbiased estimators and one biased estimator. The biased estimator trades a small bias for reduced variance and controlled runtime and memory usage. The paper also provides convergence results for stochastic gradient descent, quantifying the interaction between gradient estimator bias and variance.

Research Background and Motivation

Problem Background

  1. Challenges of Simulation Models: Simulation models are widely used in neuroscience, physics, and biology, but traditional Bayesian inference faces challenges with intractable likelihood functions and expensive simulators.
  2. Demand for SNPE Methods: Sequential neural posterior estimation methods avoid direct likelihood computation by learning posterior distributions from simulations with adaptive proposals using neural network conditional density estimators.
  3. Limitations of APT Methods: Although the Automatic Posterior Transformation (APT) method proposed by Greenberg et al. performs well and scales to high-dimensional data, it requires computing the logarithmic expectation of an intractable normalization constant, forming a nested expectation problem.

Shortcomings of Existing Methods

  • Limitations of Atomic Proposals: While using atomic proposals yields analytical normalization constants, this complicates convergence analysis
  • Missing Theoretical Analysis: Existing techniques struggle to explain APT's poor performance on certain tasks
  • Computational Complexity Issues: Single-layer nested estimators have computational complexity O(ε^-3), which is inefficient

Core Contributions

  1. APT Problem Reformulation: Reformulates the APT method as a nested estimation problem, providing a framework for rigorous convergence analysis
  2. MLMC Estimator Construction: Develops three MLMC estimators:
    • RU-MLMC: Randomized Unbiased Multilevel Monte Carlo
    • GRR-MLMC: Generalized Russian Roulette Method
    • TGRR-MLMC: Truncated Generalized Russian Roulette Method
  3. Theoretical Analysis: Provides theoretical upper bounds on bias, variance, and average cost, proving that MLMC methods achieve optimal complexity O(ε^-2)
  4. Convergence Guarantees: Establishes convergence theorems for stochastic gradient descent, quantifying the impact of bias and variance on optimization
  5. Experimental Validation: Verifies method effectiveness on multiple benchmark tasks

Methodology Details

Task Definition

Given prior distribution p(θ) and observed data x_o, the goal is to approximate the posterior distribution p(θ|x_o) ∝ p(θ)p(x_o|θ), where the likelihood function p(x|θ) is intractable but can be sampled via a simulator.

Nested APT Reformulation

Loss Function Reformulation

The APT loss function is rewritten as:

L(φ) = -E_p̃(θ,x)[log g_φ(x,θ)] + E_p̃(x)[log E_p̃(θ')[g_φ(x,θ')]]

where g_φ(x,θ) = q_F(x,φ)(θ)/p(θ) is the importance weight.

Gradient Expression

The gradient is:

∇_φL(φ) = -E_p̃(θ,x)[∇_φ log g_φ(x,θ)] + E_p̃(x)[∇_φ log E_p̃(θ')[g_φ(x,θ')]]

MLMC Estimator Design

1. RU-MLMC (Randomized Unbiased MLMC)

Uses geometric distribution Ge(p) to randomly select level L, with estimator:

V_RU = ω_L^{-1}Δρ_{φ,L}

2. GRR-MLMC (Generalized Russian Roulette)

Introduces base level m, ensuring the first m levels are always computed:

V_GRR = ρ_{φ,M_m} + Σ_{j=m+1}^L (Δρ_{φ,j}/p_j)

3. TGRR-MLMC (Truncated GRR)

Controls computational cost and memory usage through distribution truncation:

V_TGRR = ρ_{φ,M_m} + Σ_{j=m+1}^L (Δρ_{φ,j}/p_j)

where L is restricted to the range m,m̄.

Reverse Coupling Construction

Uses reverse coupling technique to construct difference estimators:

Δρ_{φ,ℓ} = ρ_{φ,M_ℓ} - (1/2)(ρ_{φ,M_{ℓ-1}}^{(a)} + ρ_{φ,M_{ℓ-1}}^{(b)})

Theoretical Analysis

Complexity Analysis

Theorems 3.1 and 3.2: Under appropriate conditions, difference estimators satisfy:

  • Bias rate: α = 1
  • Variance rate: r ∈ (1,2]
  • Cost rate: γ = 1

Since r > γ, MLMC achieves optimal complexity O(ε^{-2}), a significant improvement over single-layer nested estimators' O(ε^{-3}).

Convergence Analysis

Theorem 4.2: Under Lipschitz continuity and strong convexity conditions, the optimal gap for SGD satisfies:

G_T ≤ (1-γμ)^T G_0 + (1/2μ)(U_b + U_η)

where U_b and U_η are upper bounds on bias and variance, respectively.

Experimental Setup

Datasets

  1. Two-Moon Model: Toy model with 2D parameter space and multimodal posterior
  2. Lotka-Volterra Model: Predator-prey dynamics model with 4D parameter space
  3. M/G/1 Queue Model: Single-server queue system with 3D parameter space
  4. Hodgkin-Huxley Neuron Model: High-dimensional neuron model with 8D parameter space

Evaluation Metrics

  • MMD (Maximum Mean Discrepancy): Measures distribution divergence
  • C2ST (Classifier Two-Sample Test): Binary classifier test
  • LMD (Logarithmic Median Distance): Logarithmic median distance
  • NLOG (Negative Log-density): Negative log-density at true parameters

Implementation Details

  • Neural Spline Flow (NSF) as conditional density estimator, 8 layers with 50 units each
  • Adam optimizer with learning rate 1×10^{-4}, batch size 100
  • N=1000 samples per round, R=20 rounds total
  • M_0 = 8, truncation level m̄ = 4, base level m = 2

Experimental Results

Main Results

  1. Performance Comparison: TGRR-MLMC performs best on complex tasks (e.g., Lotka-Volterra), with superior C2ST means across three tasks compared to SNSE
  2. Computational Efficiency: Although MLMC requires 1.2-1.5× computation time, GPU memory usage is only 1/12 of SNSE (5GB vs 60GB)
  3. Method Selection Guidance:
    • Simple tasks: RU-MLMC
    • Moderate complexity: GRR-MLMC
    • Complex tasks: TGRR-MLMC

Ablation Studies

  • Hyperparameter α Selection: Optimal α determined by minimizing asymptotic inefficiency
  • Impact of Truncation Level: Appropriate truncation significantly reduces variance and improves training stability

High-Dimensional Experiments

On the 8D Hodgkin-Huxley model, TGRR-MLMC shows improvements over atomic APT in both LMD and NLOG metrics, validating method scalability.

Likelihood-Free Bayesian Computation

  • ABC Methods: Approximate Bayesian Computation
  • Synthetic Likelihood: Summary statistic-based methods
  • Ratio Estimation: Inference via likelihood ratios

Neural Posterior Estimation

  • NPE: Neural Posterior Estimation baseline
  • SNPE: Sequential Neural Posterior Estimation framework
  • APT: Automatic Posterior Transformation

MLMC Methods

  • Nested Simulation: Applications in Bayesian experimental design
  • Unbiased Estimation: Russian roulette and stochastic truncation methods

Conclusions and Discussion

Main Conclusions

  1. Nested MLMC methods provide a theoretically analyzable alternative to APT
  2. Three MLMC variants offer flexible choices in bias-variance-cost tradeoffs
  3. Theoretical analysis reveals that variance typically outweighs bias in neural network training

Limitations

  1. High-Dimensional Challenges: May suffer from excessive variance in high-dimensional problems and complex neural network structures
  2. Computational Overhead: MLMC requires more computation time than atomic APT due to multi-level gradient calculations
  3. Parameter Tuning: Requires careful selection of level parameters and truncation settings

Future Directions

  1. Quasi-Monte Carlo: Use low-discrepancy sequences to reduce MLMC estimator variance
  2. Algorithm Acceleration: Develop more efficient MLMC algorithm implementations
  3. Adaptive Strategies: Automatically select optimal MLMC variants and parameters

In-Depth Evaluation

Strengths

  1. Theoretical Contribution: Reformulates APT as a nested estimation problem, providing a rigorous theoretical framework
  2. Methodological Innovation: Designs three MLMC estimators providing optimal choices for different scenarios
  3. Comprehensive Experiments: Validates method effectiveness across multiple benchmark tasks from simple to complex
  4. Practical Value: Significantly reduces GPU memory requirements, improving practical applicability

Weaknesses

  1. Computational Complexity: Although theoretically superior, actual runtime remains lengthy
  2. Parameter Sensitivity: Requires careful tuning of multiple hyperparameters (α, m, m̄, etc.)
  3. Scalability: Performance on extremely high-dimensional problems requires further verification

Impact

  1. Theoretical Impact: Provides new theoretical analysis framework for SNPE methods
  2. Practical Value: Memory efficiency improvements make the method more suitable for real applications
  3. Reproducibility: Provides detailed implementation details and algorithm descriptions

Applicable Scenarios

  • Scientific computing problems with expensive simulators
  • Large-scale inference tasks requiring memory control
  • Bayesian inference applications requiring theoretical guarantees

References

  • Greenberg et al. (2019): Automatic posterior transformation for likelihood-free inference
  • Giles (2015): Multilevel Monte Carlo methods
  • Rhee & Glynn (2015): Unbiased estimation with square root convergence for SDE models
  • Papamakarios & Murray (2016): Fast ε-free inference of simulation models

Summary: This is an important paper with significant theoretical and practical value in the field of likelihood-free Bayesian inference. By cleverly reformulating APT as a nested estimation problem and introducing MLMC techniques, it addresses theoretical analysis difficulties and computational efficiency issues of the original method. While there remains room for improvement in computation time, its memory efficiency and theoretical guarantees make it an important contribution to the field.