2025-11-14T13:34:11.421709

Explaining Models under Multivariate Bernoulli Distribution via Hoeffding Decomposition

Ferrere, Bousquet, Gamboa et al.
Explaining the behavior of predictive models with random inputs can be achieved through sub-models decomposition, where such sub-models have easier interpretable features. Arising from the uncertainty quantification community, recent results have demonstrated the existence and uniqueness of a generalized Hoeffding decomposition for such predictive models when the stochastic input variables are correlated, based on concepts of oblique projection onto L 2 subspaces. This article focuses on the case where the input variables have Bernoulli distributions and provides a complete description of this decomposition. We show that in this case the underlying L 2 subspaces are one-dimensional and that the functional decomposition is explicit. This leads to a complete interpretability framework and theoretically allows reverse engineering. Explicit indicators of the influence of inputs on the output prediction (exemplified by Sobol' indices and Shapley effects) can be explicitly derived. Illustrated by numerical experiments, this type of analysis proves useful for addressing decision-support problems, based on binary decision diagrams, Boolean networks or binary neural networks. The article outlines perspectives for exploring high-dimensional settings and, beyond the case of binary inputs, extending these findings to models with finite countable inputs.
academic

Explaining Models under Multivariate Bernoulli Distribution via Hoeffding Decomposition

Basic Information

  • Paper ID: 2510.07088
  • Title: Explaining Models under Multivariate Bernoulli Distribution via Hoeffding Decomposition
  • Authors: Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré
  • Classification: stat.ML cs.LG
  • Publication Date: October 10, 2025 (arXiv v2)
  • Paper Link: https://arxiv.org/abs/2510.07088

Abstract

This paper investigates the interpretability of predictive models with stochastic inputs through sub-model decomposition to understand model behavior. Building on recent advances in uncertainty quantification, the paper provides a complete characterization of the generalized Hoeffding decomposition for the case where input variables follow a multivariate Bernoulli distribution. The research demonstrates that in this case, the underlying L² subspace is one-dimensional and the function decomposition is explicit, establishing the foundation for a complete interpretability framework that theoretically enables reverse engineering. The paper derives explicit indices measuring the impact of inputs on output predictions (such as Sobol indices and Shapley effects) and validates the method's effectiveness in decision support problems through numerical experiments.

Research Background and Motivation

Problem Definition

  1. Core Problem: How to explain the behavior of complex predictive models with correlated binary input variables
  2. Practical Need: In machine learning and uncertainty quantification, input variables are often not independent. Traditional Hoeffding decomposition assumes independence, which is overly restrictive in practical applications
  3. Application Scenarios: Binary decision diagrams, Boolean networks, binary neural networks, molecular structure representations, probabilistic Boolean networks, etc.

Research Motivation

Traditional Hoeffding decomposition (HD) requires input variables to be mutually independent, which is unrealistic in many practical applications. While theoretical frameworks for generalized Hoeffding decomposition (GHD) exist, explicit construction methods for specific distributions are lacking. The multivariate Bernoulli distribution, as an important special case, has widespread applications in many fields.

Limitations of Existing Methods

  1. Independence Assumption: Classical HD requires independent input variables, limiting its applicability
  2. Computational Complexity: Existing GHD methods lack explicit constructions, making practical computation difficult
  3. Insufficient Interpretability: Lack of a complete interpretability framework for binary inputs

Core Contributions

  1. Theoretical Contribution: Proves that the L² subspace of GHD under multivariate Bernoulli distribution is one-dimensional, providing an explicit function decomposition representation
  2. Constructive Method: Based on Fourier-Walsh-Hadamard basis transformation, provides explicit computation methods for decomposition coefficients
  3. Interpretability Framework: Derives explicit expressions for generalized Sobol indices and Shapley effects
  4. Algorithm Implementation: Provides truncation approximation methods for high-dimensional cases and statistical estimation guarantees
  5. Application Validation: Validates the method's effectiveness on synthetic and real datasets

Methodology Details

Task Definition

Given a d-dimensional multivariate Bernoulli random vector X = (X₁, ..., Xd) and a square-integrable function G: {0,1}^d → ℝ, the goal is to find a unique function decomposition:

G(X) = ∑_{A∈P_D} G_A(X_A)

where P_D is the power set of {1,...,d}, and the decomposition satisfies hierarchical orthogonality conditions.

Core Theoretical Framework

Multivariate Bernoulli Hoeffding Decomposition (MBHD)

The paper's central theoretical result is Theorem 2.2, which establishes an explicit decomposition representation:

Theorem 2.2: Let G: {0,1}^d → ℝ. Define:

  • g(X) := (e_A(X_A)G(X)){A∈P_D}, where e_A(X_A) := (-1)^{∑{j∈A} X_j}/P_A(X_A)
  • Γ = (Γ_{A,B}){A,B∈P_D} as the Gram matrix, Γ{A,B} := Ee_A(X_A)e_B(X_B)
  • μ as the mean of g(X)

Then the GHD is given by: G(X) = ∑_{A∈P_D} β_A e_A(X_A)

where the coefficients β satisfy the linear system: Γβ = μ

Geometric Interpretation

The paper also provides a geometric perspective (Corollary 2.3):

G(X) = ∑_{A∈P_D} ⟨G(X), e*_A(X)⟩e_A(X_A)

where e*_A(X) is the skew-dual vector of e_A(X_A).

Technical Innovations

  1. One-Dimensional Subspace Property: Proves that each Hoeffding decomposition space V_A is one-dimensional under multivariate Bernoulli distribution
  2. Explicit Basis Construction: The transformed Fourier-Walsh-Hadamard basis {e_A(X_A)}_{A∈P_D} forms a hierarchical orthogonal basis
  3. Linear System Solution: Transforms the decomposition problem into solving a 2^d-dimensional linear system Γβ = μ
  4. Exclusion Property: Proves that if certain variables have no causal impact on predictions, the corresponding β coefficients must be zero

Sensitivity Analysis Indices

Generalized Sobol Indices

The paper derives explicit expressions for generalized Sobol indices:

S_A := CovG(X), G_A(X_A)/VarG(X) = β_A β_B Γ_{A,B}/VarG(X)

These indices satisfy the normalization condition ∑_{A∈P_D} S_A = 1, but may take negative values (when strong negative correlations exist).

Generalized Shapley Effects

Based on the Harsanyi dividend definition of Shapley effects:

Sh_i = ∑_{A⊆D: i∈A} S_A/|A|

This has explicit expressions under multivariate Bernoulli distribution.

Experimental Setup

Synthetic Experiments

  1. Linear Threshold Functions: Designed 10-dimensional binary classifier G(X) = sign(W^T X + b)
  2. Correlation Control: Generated binary vectors with different correlation levels by thresholding multivariate Gaussian distributions
  3. Three Dependence Levels: High dependence (ρ=0.9), moderate dependence (ρ=0.5), weak dependence (ρ=0.1)

Decision Tree Applications

  1. Two-Dimensional Parametric Study: Used Farlie-Gumbel-Morgenstern copula to control dependence structure
  2. Mushroom Classification Dataset: Agaricus-Lepiota dataset from UCI Machine Learning Repository, 8,124 samples, 22 categorical attributes

Evaluation Metrics

  • Variance decomposition error: ‖S^ρ - S^ρ_⊥‖₁, ‖S^ρ - S^ρ_⊥‖₂
  • Relative error: Normalized error relative to true values
  • Classification performance: Precision, recall, F1 score

Experimental Results

Main Findings

Impact of Dependence on Variance Decomposition

Experiments show that ignoring input dependence leads to significant approximation errors:

  • Under high dependence, relative variance error reaches 87%
  • Relative error in Sobol matrix reaches 75% under high dependence
  • Error decreases significantly as correlation decreases

Decision Tree Analysis Results

  1. Two-Dimensional Case: Successfully recovered the theoretical conjunctive rule X₁X₂
  2. Mushroom Classification: Identified 5 critical binary rules, with odor rule accounting for 78.2% of total variance
  3. Feature Importance Hierarchy: X₁(odor) ≫ X₂(stem root) > {X₃,X₄,X₅}(other features)

Statistical Guarantees

The paper provides theoretical guarantees for estimators:

  • Strong consistency: Ĝₙ(x) →^{a.s.} G(x)
  • Asymptotic normality: Central limit theorem
  • Non-asymptotic concentration bounds: Bernstein-type inequalities

Computational Complexity and High-Dimensional Approximation

Curse of Dimensionality

Complete decomposition requires solving a 2^d-dimensional linear system, which is infeasible in high dimensions.

Truncation Approximation

Proposes a truncation method retaining only low-order terms: G_(x) := ∑_{A∈P_D, |A|≤c} G_A(x_A)

Complexity reduces from O(2^d) to O(d^c), with c ∈ {1,2,3} typically chosen in practice.

Error Decomposition

Total error decomposes into bias and variance components: E(G(x) - Ĝₙ,c(x))² = Bias² + Variance

Hoeffding Decomposition Theory

  • Classical HD (Hoeffding 1948): Independent input assumption
  • Generalized HD (Chastaing et al. 2012): Theoretical framework for correlated inputs
  • Recent advances (Il Idrissi et al. 2025): Oblique projection theory

Sensitivity Analysis

  • Sobol indices: Variance decomposition methods
  • Shapley values: Cooperative game theory methods
  • Kernel methods: Alternative approaches for handling dependence structures

Machine Learning Interpretability

  • SHAP: Shapley value-based explanation methods
  • LIME: Local interpretability methods
  • Attention mechanisms: Interpretability in deep learning

Conclusions and Discussion

Main Conclusions

  1. GHD under multivariate Bernoulli distribution has an explicit one-dimensional subspace structure
  2. Provides a complete constructive decomposition method and computational framework
  3. Generalized sensitivity indices can be computed explicitly with good theoretical properties
  4. The method has practical value in decision support and model interpretation

Limitations

  1. Full Support Assumption: Requires all 2^d configurations to have positive probability, potentially too restrictive in high dimensions
  2. Computational Complexity: Exponential complexity of complete decomposition limits high-dimensional applications
  3. Truncation Bias: Bias introduced by high-dimensional approximation requires further investigation

Future Directions

  1. Theoretical Extensions: Relax full support assumptions, extend to finite countable inputs
  2. Algorithm Optimization: Develop more efficient high-dimensional computation methods
  3. Application Expansion: Explore applications in deep learning and other machine learning models

In-Depth Evaluation

Strengths

  1. Theoretical Rigor: Provides a complete mathematical theoretical framework with proofs
  2. Methodological Innovation: First explicit decomposition for multivariate Bernoulli case
  3. Practical Value: Direct application value in binary input model interpretation
  4. Completeness: Forms a complete chain from theory to algorithms to applications

Weaknesses

  1. Limited Applicability: Only applies to binary inputs with full support assumption
  2. High-Dimensional Challenges: Exponential complexity limits large-scale applications
  3. Limited Experimental Validation: Primarily validated in low-dimensional and specific scenarios

Impact

  1. Theoretical Contribution: Provides important special case for function decomposition theory
  2. Methodological Value: Offers new tools for model interpretation with correlated binary inputs
  3. Application Potential: Broad application prospects in Boolean functions, decision trees, and related fields

Applicable Scenarios

  1. Binary Decision Systems: Such as medical diagnosis, credit assessment
  2. Boolean Network Analysis: Gene regulatory networks, logic circuits
  3. Decision Tree Interpretation: Random forests, gradient boosting trees and other ensemble methods
  4. Binary Neural Networks: Interpretability analysis of quantized neural networks

References

The paper cites 50 related references covering multiple domains including Hoeffding decomposition theory, sensitivity analysis, and machine learning interpretability, providing a solid theoretical foundation for the research.


Overall Assessment: This is a high-quality paper with rigorous theory and methodological innovation, making important contributions to function decomposition theory under multivariate Bernoulli distribution. While facing challenges in high-dimensional applications, it provides powerful theoretical tools for interpretability analysis of binary input models.