2025-11-14T13:34:11.421709

Explaining Models under Multivariate Bernoulli Distribution via Hoeffding Decomposition

Ferrere, Bousquet, Gamboa et al.

Explaining the behavior of predictive models with random inputs can be achieved through sub-model decomposition, where the sub-models have more easily interpretable features. Arising from the uncertainty quantification community, recent results have demonstrated the existence and uniqueness of a generalized Hoeffding decomposition for such predictive models when the stochastic input variables are correlated, based on concepts of oblique projection onto L² subspaces. This article focuses on the case where the input variables have Bernoulli distributions and provides a complete description of this decomposition. We show that in this case the underlying L² subspaces are one-dimensional and that the functional decomposition is explicit. This leads to a complete interpretability framework and theoretically allows reverse engineering. Indicators of the influence of inputs on the output prediction (exemplified by Sobol' indices and Shapley effects) can be derived explicitly. Illustrated by numerical experiments, this type of analysis proves useful for addressing decision-support problems, based on binary decision diagrams, Boolean networks or binary neural networks. The article outlines perspectives for exploring high-dimensional settings and, beyond the case of binary inputs, extending these findings to models with finite countable inputs.
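To make the claim about one-dimensional L² subspaces concrete, here is a minimal Python sketch of the classical Hoeffding decomposition in the special case of mutually independent Bernoulli inputs. This is an assumption-laden toy, not the paper's construction for correlated inputs (which relies on oblique projections). It assumes X_i ~ Bernoulli(p_i) independent with 0 < p_i < 1 and enumerates {0,1}^d exhaustively. For each subset A of inputs, the component is a scalar multiple of phi_A(x) = prod over i in A of (x_i - p_i), so each component space is one-dimensional; the Sobol' index of A is that component's share of Var f. The function names hoeffding_independent_bernoulli and sobol_indices are hypothetical.

```python
from itertools import combinations, product
from math import prod


def hoeffding_independent_bernoulli(f, p):
    """Classical Hoeffding decomposition of f: {0,1}^d -> R, ASSUMING the
    inputs X_i ~ Bernoulli(p[i]) are mutually independent.

    Returns {A: c_A} with f(x) = sum over A of c_A * phi_A(x), where
    phi_A(x) = prod_{i in A} (x_i - p_i) spans the one-dimensional space of A.
    """
    d = len(p)
    points = list(product((0, 1), repeat=d))
    # Probability of each point under the product (independence) measure.
    weight = {x: prod(p[i] if x[i] else 1 - p[i] for i in range(d)) for x in points}
    coeffs = {}
    for k in range(d + 1):
        for A in combinations(range(d), k):
            phi = {x: prod(x[i] - p[i] for i in A) for x in points}
            # The phi_A are mutually orthogonal under independence, so the
            # coefficient is the projection <f, phi_A> / <phi_A, phi_A>.
            num = sum(weight[x] * f(x) * phi[x] for x in points)
            den = sum(weight[x] * phi[x] ** 2 for x in points)
            coeffs[A] = num / den
    return coeffs


def sobol_indices(coeffs, p):
    """Sobol' index of each non-empty subset A: Var(c_A * phi_A) / Var(f)."""
    var = {A: c ** 2 * prod(p[i] * (1 - p[i]) for i in A)
           for A, c in coeffs.items() if A}
    total = sum(var.values())  # equals Var(f) by orthogonality of the phi_A
    return {A: v / total for A, v in var.items()}


# Example: Boolean AND of two fair, independent bits.
p = [0.5, 0.5]
c = hoeffding_independent_bernoulli(lambda x: float(x[0] and x[1]), p)
print(c)                    # {(): 0.25, (0,): 0.5, (1,): 0.5, (0, 1): 1.0}
print(sobol_indices(c, p))  # each of the three subsets gets 1/3
```

Exhaustive enumeration costs O(4^d) here, so this only illustrates the structure of the decomposition on small d.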


Explaining Models under the Multivariate Bernoulli Distribution: An Approach Using Hoeffding Decomposition

Basic Information

Paper ID: 2510.07088
Title: Explaining Models under Multivariate Bernoulli Distribution via Hoeffding Decomposition
Authors: Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré
Categories: stat.ML cs.LG
Published: October 10, 2025 (arXiv v2)
Paper link: https://arxiv.org/abs/2510.07088

Summary

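This paper studies the interpretability of predictive models with stochastic inputs and explains model behavior through a decomposition into sub-models. Building on recent advances in uncertainty quantification, it gives a complete description of the generalized Hoeffding decomposition when the input variables follow a multivariate Bernoulli distribution. It shows that in this case the underlying L² subspaces are one-dimensional and the functional decomposition is explicit, which provides the basis for a complete interpretability framework. In principle, this also allows reverse engineering of the model. The paper further derives explicit indicators of the influence of the inputs on the output prediction (such as Sobol' indices and Shapley effects) and demonstrates the usefulness of the approach for decision-support problems through numerical experiments.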
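As a reference point for the Shapley effects mentioned above, the following Python sketch computes them by brute force for an arbitrary, possibly correlated, joint pmf on {0,1}^d. It assumes the value function A -> Var(E[f(X) | X_A]) / Var(f), a common choice in global sensitivity analysis. This is plain enumeration over all subsets, exponential in d, and not the paper's explicit formulas obtained from the generalized Hoeffding decomposition; the names closed_variance and shapley_effects are hypothetical.

```python
from itertools import combinations
from math import factorial


def closed_variance(f, pmf, A):
    """Var(E[f(X) | X_A]) for a joint pmf on {0,1}^d given as {x: prob}."""
    mean = sum(w * f(x) for x, w in pmf.items())
    mass, fmass = {}, {}
    for x, w in pmf.items():
        key = tuple(x[i] for i in A)  # value taken by the sub-vector X_A
        mass[key] = mass.get(key, 0.0) + w
        fmass[key] = fmass.get(key, 0.0) + w * f(x)
    # E[f | X_A = key] = fmass[key] / mass[key]; skip zero-probability keys.
    return sum(m * (fmass[k] / m - mean) ** 2 for k, m in mass.items() if m > 0)


def shapley_effects(f, pmf, d):
    """Normalized Shapley effects with value function Var(E[f | X_A]) / Var(f)."""
    total = closed_variance(f, pmf, tuple(range(d)))  # equals Var(f)
    effects = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        sh = 0.0
        for k in range(d):
            # Shapley weight for coalitions of size k that exclude player i.
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for B in combinations(others, k):
                with_i = tuple(sorted(B + (i,)))
                sh += weight * (closed_variance(f, pmf, with_i) - closed_variance(f, pmf, B))
        effects.append(sh / total)
    return effects


# Example: f depends only on X0, but X0 and X1 are correlated (rho = 0.6).
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(shapley_effects(lambda x: float(x[0]), pmf, d=2))  # approximately [0.82, 0.18]
```

Here X1 receives rho²/2 = 0.18 even though f ignores it, because any function of a binary X1 is affine, so Var(E[X0 | X1]) / Var(X0) = rho². This is how Shapley effects share credit among correlated inputs, and the effects always sum to 1.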

Research Background and Motivation

Problem Definition

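Core challenge: how to explain the behavior of complex predictive models whose binary input variables are correlated
Practical need: in machine learning and uncertainty quantification, input variables are often not independent, so the classical Hoeffding decomposition, which assumes independence, is too restrictive for real applications
Application scenarios: binary decision diagrams, Boolean networks, binary neural networks, molecular structure representations, probabilistic Boolean networks, and others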
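The practical limitation of the independence assumption can be seen on a toy case. The Python sketch below (function names and numbers are illustrative assumptions) takes an additive function of two binary inputs, which has no interaction term, and computes the closed first-order indices Var(E[f | X_i]) / Var(f). With independent inputs they sum to 1, as the classical Hoeffding/Sobol' decomposition guarantees; with positively correlated inputs they sum to more than 1, so the usual reading of the indices as shares of the variance breaks down.

```python
def closed_first_order(f, pmf, d):
    """Closed first-order indices Var(E[f | X_i]) / Var(f) for a pmf on {0,1}^d."""
    mean = sum(w * f(x) for x, w in pmf.items())
    var = sum(w * (f(x) - mean) ** 2 for x, w in pmf.items())
    indices = []
    for i in range(d):
        # Probability mass and f-weighted mass for X_i = 0 and X_i = 1.
        mass, fmass = [0.0, 0.0], [0.0, 0.0]
        for x, w in pmf.items():
            mass[x[i]] += w
            fmass[x[i]] += w * f(x)
        v = sum(mass[b] * (fmass[b] / mass[b] - mean) ** 2 for b in (0, 1) if mass[b] > 0)
        indices.append(v / var)
    return indices


def additive(x):
    """f(x) = x0 + x1: purely additive, no interaction."""
    return float(x[0] + x[1])


independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # rho = 0.6

print(sum(closed_first_order(additive, independent, 2)))  # 1.0: variance splits cleanly
print(sum(closed_first_order(additive, correlated, 2)))   # about 1.6: the shares overlap
```

The excess over 1 comes from the covariance term 2 Cov(X0, X1), which both first-order indices absorb; handling such overlap consistently is what a decomposition designed for dependent inputs has to address.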

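Research Motivation

The classical Hoeffding decomposition (HD) requires the input variables to be mutually independent, which is unrealistic in many practical applications. A theoretical framework for the generalized Hoeffding decomposition (GHD) exists, but explicit constructions for specific distributions have been lacking. The multivariate Bernoulli distribution is an important special case that is widely used in many fields.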
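A multivariate Bernoulli distribution on {0,1}^d is simply a probability table with 2^d - 1 free entries, and independence is the special case in which that table factorizes into its marginals. The hypothetical helper below builds the d = 2 case from two marginals p1, p2 and a Pearson correlation rho. It rejects correlations that violate the Fréchet bounds max(0, p1 + p2 - 1) <= P(X1 = 1, X2 = 1) <= min(p1, p2), which are exactly what keep all four cell probabilities non-negative. It is an illustrative sketch, not code from the paper.

```python
from math import isclose, sqrt


def bivariate_bernoulli(p1, p2, rho):
    """Joint pmf {(x1, x2): prob} with X1 ~ Bernoulli(p1), X2 ~ Bernoulli(p2)
    and Pearson correlation rho (hypothetical helper, assumes 0 < p1, p2 < 1)."""
    cov = rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    p11 = p1 * p2 + cov  # P(X1 = 1, X2 = 1)
    lo, hi = max(0.0, p1 + p2 - 1.0), min(p1, p2)  # Fréchet bounds on p11
    if p11 < lo - 1e-12 or p11 > hi + 1e-12:
        raise ValueError(f"rho={rho} is infeasible for p1={p1}, p2={p2}")
    p11 = min(max(p11, lo), hi)  # absorb rounding error at the bounds
    return {(1, 1): p11, (1, 0): p1 - p11, (0, 1): p2 - p11, (0, 0): 1.0 - p1 - p2 + p11}


def is_independent(pmf):
    """True if the 2x2 table factorizes into the product of its marginals."""
    p1 = pmf[(1, 0)] + pmf[(1, 1)]
    p2 = pmf[(0, 1)] + pmf[(1, 1)]
    return all(
        isclose(pmf[(a, b)], (p1 if a else 1 - p1) * (p2 if b else 1 - p2), abs_tol=1e-12)
        for a in (0, 1) for b in (0, 1)
    )


print(bivariate_bernoulli(0.5, 0.5, 0.6))                  # p11 = p00 = 0.4, p10 = p01 = 0.1, up to rounding
print(is_independent(bivariate_bernoulli(0.3, 0.8, 0.0)))  # True
```

Even for d = 2 the admissible correlation range depends on the marginals (for example, rho = 0.9 is impossible when p1 = 0.2 and p2 = 0.5), which is one reason dependent binary inputs need their own treatment rather than a generic Gaussian-style parameterization.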