2025-11-14T09:04:13.401384

Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods

Yang, Xiong, He

There is a growing interest in studying sequential neural posterior estimation (SNPE) techniques due to their advantages for simulation-based models with intractable likelihoods. The methods aim to learn the posterior from adaptively proposed simulations using neural network-based conditional density estimators. As an SNPE technique, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs well and scales to high-dimensional data. However, the APT method requires computing the expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic proposals were used to render an analytical normalizing constant, it remains challenging to analyze the convergence of learning. In this paper, we reformulate APT as a nested estimation problem. Building on this, we construct several multilevel Monte Carlo (MLMC) estimators for the loss function and its gradients to accommodate different scenarios, including two unbiased estimators, and a biased estimator that trades a small bias for reduced variance and controlled runtime and memory usage. We also provide convergence results of stochastic gradient descent to quantify the interaction of the bias and variance of the gradient estimator. Numerical experiments for approximating complex posteriors with multimodality in moderate dimensions are provided to examine the effectiveness of the proposed methods.

academic

Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods

Basic Information

Paper ID: 2401.16776
Title: Leveraging Nested MLMC for Sequential Neural Posterior Estimation with Intractable Likelihoods
Authors: Xiliang Yang (South China University of Technology), Yifei Xiong (Purdue University), Zhijian He (South China University of Technology, Corresponding Author)
Classification: stat.CO cs.LG stat.ML
Publication Date: January 2024, arXiv Preprint
Paper Link: https://arxiv.org/abs/2401.16776

Abstract

This paper investigates the application of Sequential Neural Posterior Estimation (SNPE) techniques for simulation models with intractable likelihood functions. Addressing the nested expectation problem in the Automatic Posterior Transformation (APT) method, which requires computing the logarithmic expectation of an intractable normalization constant, the paper reformulates APT as a nested estimation problem and constructs several Multilevel Monte Carlo (MLMC) estimators, including two unbiased estimators and one biased estimator. The biased estimator trades a small bias for reduced variance and controlled runtime and memory usage. The paper also provides convergence results for stochastic gradient descent, quantifying the interaction between gradient estimator bias and variance.

Research Background and Motivation

Problem Background

Challenges of Simulation Models: Simulation models are widely used in neuroscience, physics, and biology, but traditional Bayesian inference faces challenges with intractable likelihood functions and expensive simulators.
Demand for SNPE Methods: Sequential neural posterior estimation methods avoid direct likelihood computation by learning posterior distributions from simulations with adaptive proposals using neural network conditional density estimators.
Limitations of APT Methods: Although the Automatic Posterior Transformation (APT) method proposed by Greenberg et al. performs well and scales to high-dimensional data, it requires computing the logarithmic expectation of an intractable normalization constant, forming a nested expectation problem.

Shortcomings of Existing Methods

Limitations of Atomic Proposals: While using atomic proposals yields analytical normalization constants, this complicates convergence analysis
Missing Theoretical Analysis: Existing techniques struggle to explain APT's poor performance on certain tasks
Computational Complexity Issues: Single-layer nested estimators have computational complexity O(ε^-3), which is inefficient

Core Contributions

APT Problem Reformulation: Reformulates the APT method as a nested estimation problem, providing a framework for rigorous convergence analysis
MLMC Estimator Construction: Develops three MLMC estimators:
- RU-MLMC: Randomized Unbiased Multilevel Monte Carlo
- GRR-MLMC: Generalized Russian Roulette Method
- TGRR-MLMC: Truncated Generalized Russian Roulette Method
Theoretical Analysis: Provides theoretical upper bounds on bias, variance, and average cost, proving that MLMC methods achieve optimal complexity O(ε^-2)
Convergence Guarantees: Establishes convergence theorems for stochastic gradient descent, quantifying the impact of bias and variance on optimization
Experimental Validation: Verifies method effectiveness on multiple benchmark tasks

Methodology Details

Task Definition

Given prior distribution p(θ) and observed data x_o, the goal is to approximate the posterior distribution p(θ|x_o) ∝ p(θ)p(x_o|θ), where the likelihood function p(x|θ) is intractable but can be sampled via a simulator.

Nested APT Reformulation

Loss Function Reformulation

The APT loss function is rewritten as:

L(φ) = -E_p̃(θ,x)[log g_φ(x,θ)] + E_p̃(x)[log E_p̃(θ')[g_φ(x,θ')]]

where g_φ(x,θ) = q_F(x,φ)(θ)/p(θ) is the importance weight.

Gradient Expression

The gradient is:

∇_φL(φ) = -E_p̃(θ,x)[∇_φ log g_φ(x,θ)] + E_p̃(x)[∇_φ log E_p̃(θ')[g_φ(x,θ')]]

MLMC Estimator Design

1. RU-MLMC (Randomized Unbiased MLMC)

Uses geometric distribution Ge(p) to randomly select level L, with estimator:

V_RU = ω_L^{-1}Δρ_{φ,L}

2. GRR-MLMC (Generalized Russian Roulette)

Introduces base level m, ensuring the first m levels are always computed:

V_GRR = ρ_{φ,M_m} + Σ_{j=m+1}^L (Δρ_{φ,j}/p_j)

3. TGRR-MLMC (Truncated GRR)

Controls computational cost and memory usage through distribution truncation:

V_TGRR = ρ_{φ,M_m} + Σ_{j=m+1}^L (Δρ_{φ,j}/p_j)

where L is restricted to the range m,m̄.

Reverse Coupling Construction

Uses reverse coupling technique to construct difference estimators:

Δρ_{φ,ℓ} = ρ_{φ,M_ℓ} - (1/2)(ρ_{φ,M_{ℓ-1}}^{(a)} + ρ_{φ,M_{ℓ-1}}^{(b)})

Theoretical Analysis

Complexity Analysis

Theorems 3.1 and 3.2: Under appropriate conditions, difference estimators satisfy:

Bias rate: α = 1
Variance rate: r ∈ (1,2]
Cost rate: γ = 1

Since r > γ, MLMC achieves optimal complexity O(ε^{-2}), a significant improvement over single-layer nested estimators' O(ε^{-3}).

Convergence Analysis

Theorem 4.2: Under Lipschitz continuity and strong convexity conditions, the optimal gap for SGD satisfies:

G_T ≤ (1-γμ)^T G_0 + (1/2μ)(U_b + U_η)

where U_b and U_η are upper bounds on bias and variance, respectively.

Experimental Setup

Datasets

Two-Moon Model: Toy model with 2D parameter space and multimodal posterior
Lotka-Volterra Model: Predator-prey dynamics model with 4D parameter space
M/G/1 Queue Model: Single-server queue system with 3D parameter space
Hodgkin-Huxley Neuron Model: High-dimensional neuron model with 8D parameter space

Evaluation Metrics

MMD (Maximum Mean Discrepancy): Measures distribution divergence
C2ST (Classifier Two-Sample Test): Binary classifier test
LMD (Logarithmic Median Distance): Logarithmic median distance
NLOG (Negative Log-density): Negative log-density at true parameters

Implementation Details

Neural Spline Flow (NSF) as conditional density estimator, 8 layers with 50 units each
Adam optimizer with learning rate 1×10^{-4}, batch size 100
N=1000 samples per round, R=20 rounds total
M_0 = 8, truncation level m̄ = 4, base level m = 2

Experimental Results

Main Results

Performance Comparison: TGRR-MLMC performs best on complex tasks (e.g., Lotka-Volterra), with superior C2ST means across three tasks compared to SNSE
Computational Efficiency: Although MLMC requires 1.2-1.5× computation time, GPU memory usage is only 1/12 of SNSE (5GB vs 60GB)
Method Selection Guidance:
- Simple tasks: RU-MLMC
- Moderate complexity: GRR-MLMC
- Complex tasks: TGRR-MLMC

Ablation Studies

Hyperparameter α Selection: Optimal α determined by minimizing asymptotic inefficiency
Impact of Truncation Level: Appropriate truncation significantly reduces variance and improves training stability

High-Dimensional Experiments

On the 8D Hodgkin-Huxley model, TGRR-MLMC shows improvements over atomic APT in both LMD and NLOG metrics, validating method scalability.

Likelihood-Free Bayesian Computation

ABC Methods: Approximate Bayesian Computation
Synthetic Likelihood: Summary statistic-based methods
Ratio Estimation: Inference via likelihood ratios

Neural Posterior Estimation

NPE: Neural Posterior Estimation baseline
SNPE: Sequential Neural Posterior Estimation framework
APT: Automatic Posterior Transformation

MLMC Methods

Nested Simulation: Applications in Bayesian experimental design
Unbiased Estimation: Russian roulette and stochastic truncation methods

Conclusions and Discussion

Main Conclusions

Nested MLMC methods provide a theoretically analyzable alternative to APT
Three MLMC variants offer flexible choices in bias-variance-cost tradeoffs
Theoretical analysis reveals that variance typically outweighs bias in neural network training

Limitations

High-Dimensional Challenges: May suffer from excessive variance in high-dimensional problems and complex neural network structures
Computational Overhead: MLMC requires more computation time than atomic APT due to multi-level gradient calculations
Parameter Tuning: Requires careful selection of level parameters and truncation settings

Future Directions

Quasi-Monte Carlo: Use low-discrepancy sequences to reduce MLMC estimator variance
Algorithm Acceleration: Develop more efficient MLMC algorithm implementations
Adaptive Strategies: Automatically select optimal MLMC variants and parameters

In-Depth Evaluation

Strengths

Theoretical Contribution: Reformulates APT as a nested estimation problem, providing a rigorous theoretical framework
Methodological Innovation: Designs three MLMC estimators providing optimal choices for different scenarios
Comprehensive Experiments: Validates method effectiveness across multiple benchmark tasks from simple to complex
Practical Value: Significantly reduces GPU memory requirements, improving practical applicability

Weaknesses

Computational Complexity: Although theoretically superior, actual runtime remains lengthy
Parameter Sensitivity: Requires careful tuning of multiple hyperparameters (α, m, m̄, etc.)
Scalability: Performance on extremely high-dimensional problems requires further verification

Impact

Theoretical Impact: Provides new theoretical analysis framework for SNPE methods
Practical Value: Memory efficiency improvements make the method more suitable for real applications
Reproducibility: Provides detailed implementation details and algorithm descriptions

Applicable Scenarios

Scientific computing problems with expensive simulators
Large-scale inference tasks requiring memory control
Bayesian inference applications requiring theoretical guarantees

References

Greenberg et al. (2019): Automatic posterior transformation for likelihood-free inference
Giles (2015): Multilevel Monte Carlo methods
Rhee & Glynn (2015): Unbiased estimation with square root convergence for SDE models
Papamakarios & Murray (2016): Fast ε-free inference of simulation models

Summary: This is an important paper with significant theoretical and practical value in the field of likelihood-free Bayesian inference. By cleverly reformulating APT as a nested estimation problem and introducing MLMC techniques, it addresses theoretical analysis difficulties and computational efficiency issues of the original method. While there remains room for improvement in computation time, its memory efficiency and theoretical guarantees make it an important contribution to the field.