2025-11-13T01:28:10.704881

Revisiting Madigan and Mosurski: Collapsibility via Minimal Separators

Heng, Sun, He et al.

Collapsibility provides a principled approach for dimension reduction in contingency tables and graphical models. Madigan and Mosurski (1990) pioneered the study of minimal collapsible sets in decomposable models, but existing algorithms for general graphs remain computationally demanding. We show that a model is collapsible onto a target set precisely when that set contains all minimal separators between its non-adjacent vertices. This insight motivates the Close Minimal Separator Absorption (CMSA) algorithm, which constructs minimal collapsible sets using only local separator searches at very low costs. Simulations confirm substantial efficiency gains, making collapsibility analysis practical in high-dimensional settings.

academic

Revisiting Madigan and Mosurski: Collapsibility via Minimal Separators

Basic Information

Paper ID: 2510.09024
Title: Revisiting Madigan and Mosurski: Collapsibility via Minimal Separators
Authors: Pei Heng (Northeast Normal University), Yi Sun (Xinjiang University), Shiyuan He, Jianhua Guo (Beijing Technology and Business University)
Classification: stat.ME (Statistics - Methodology)
Published Journal: Biometrika (2025), 103, 1, p. 1
Paper Link: https://arxiv.org/abs/2510.09024

Abstract

Collapsibility provides a principled approach for dimensionality reduction in contingency tables and graphical models. Madigan and Mosurski (1990) pioneered the study of minimal collapsible sets in decomposable models, yet existing general graph algorithms remain computationally demanding. This paper establishes that a model is collapsible to a target set if and only if the target set contains all minimal separators between its non-adjacent vertices. This insight motivates the Close Minimal Separator Absorption (CMSA) algorithm, which constructs minimal collapsible sets using only computationally inexpensive local separator searches. Simulations confirm substantial efficiency gains, making collapsibility analysis practical in high-dimensional settings.

Research Background and Motivation

Problem Background

Collapsibility is a classical concept in multivariate statistical analysis, originally introduced by Yule (1903) and Simpson (1951). Within the framework of log-linear models, it provides a principled method for removing variables to simplify statistical analysis without distorting marginal associations.

Core Problem

For a given target variable set, how can one find the minimal superset to which the model can be collapsed without compromising inferential validity? Such a superset is termed a minimal collapsible set.

Limitations of Existing Methods

Madigan & Mosurski (1990)'s Selective Acyclic Hypergraph Reduction (SAHR) algorithm applies only to decomposable graphical models
Wang et al. (2011)'s convex hull approach and Heng & Sun (2023)'s path absorption method typically require global graph operations, which are computationally expensive in high-dimensional models
Lack of efficient algorithms based on local graph properties

Research Motivation

This paper revisits minimal collapsibility from a new perspective, aiming to:

Provide a separator-based characterization of collapsibility
Develop efficient algorithms based on local operations
Make collapsibility analysis practical in high-dimensional graphical models

Core Contributions

Theoretical Contribution: Establishes that a graphical model is collapsible to a target set if and only if that set contains all minimal separators between its non-adjacent vertices
Algorithmic Innovation: Proposes the Close Minimal Separator Absorption (CMSA) algorithm, which constructs minimal collapsible sets through local separator searches
Computational Efficiency: The CMSA algorithm achieves O(nm) time complexity and O(n) space complexity, outperforming existing methods
Practical Value: Makes collapsibility analysis practically feasible in high-dimensional settings

Methodology Details

Task Definition

Input: Hierarchical log-linear model L and its interaction graph G=(V,E), target variable set A⊆V Output: Minimal collapsible set μ containing A Constraint: Model L is collapsible to μ, and μ is the minimal set satisfying this condition

Core Theory

Key Lemma

Lemma 1 (Asmussen & Edwards, 1983): A graphical model L is collapsible to a subset A⊆V if and only if for any X,Y⊆A, X⊥Y|SG implies X⊥Y|S∩AG.

Main Theorem

Theorem 1: A graphical model L is collapsible to a subset A⊆V if and only if A contains every minimal xy-separator for each pair of non-adjacent vertices x,y in A.

Corollary 1: A graphical model L is collapsible to a subset A⊆V if and only if A contains at least one minimal xy-separator for each pair of non-adjacent vertices x,y in A.

CMSA Algorithm Architecture

Key Concepts

Close Minimal Separator (Definition 2): For any two non-adjacent vertices x,y∈V, if a minimal xy-separator S is completely contained in the neighborhood of x, i.e., S⊆N_G(x), then S is called a separator close to x, denoted S_G(x,y).

Algorithm Flow

The CMSA algorithm comprises the following main steps:

Component Identification: Identify all connected components M₁,...,M_K of G_{V\A}
Local Processing: For each connected component M_i:
- Initialize μᵢ := A
- Iteratively identify non-adjacent vertex pairs in the neighborhoods of connected components of G_{Mᵢ}
- Absorb their close minimal separators into μᵢ
- Terminate when the neighborhoods of all connected components form complete subsets
Result Merging: Merge all μᵢ to obtain the final minimal collapsible set μ = ⋃ᵢμᵢ

Technical Innovations

Localization Strategy: Transforms global graph operations into local separator searches
Close Separator Exploitation: Leverages properties of close separators to avoid full graph traversal
Component Decomposition: Reduces problem complexity through connected component decomposition
Incremental Construction: Iteratively absorbs separators until termination conditions are satisfied

Experimental Setup

Datasets

Decomposable Graphical Models:
- Graph scale: n ∈ {250, 500, 750, 1000}
- Edge probability: p ∈ {0.1, 0.01}
- 100 random chordal graphs generated for each configuration
General Graphical Models:
- Graph scale: n ∈ {2500, 5000, 7500, 10000}
- Edge probability: p ∈ {0.1, 0.01, 0.005, 0.001}
- Random graphs generated by adding edges to random trees

Evaluation Metrics

Execution Time: Average time for algorithm execution (seconds)
Efficiency Comparison: Relative performance against baseline methods

Comparison Methods

SAHR (Madigan & Mosurski, 1990): Applicable to decomposable graphs
IPA (Heng & Sun, 2023): Induced Path Absorption algorithm, applicable to general graphs

Implementation Details

Programming Language: C for core algorithms, Python interface
Hardware Environment: Intel Xeon Silver 4215R CPU, 128 GB RAM
10 random target vertices tested for each graph

Experimental Results

Decomposable Graphical Model Results

Nodes	250	500	750	1000
Avg Edges	529/3334	1812/12912	3567/28652	6062/52959
CMSA	0.0007/0.0012	0.0021/0.0047	0.0044/0.0112	0.0072/0.0248
SAHR	0.0113/0.0611	0.0681/0.5455	0.1876/2.1648	0.3808/6.6983

Key Findings:

CMSA significantly outperforms SAHR across all graph scales and densities
CMSA's advantage becomes increasingly pronounced as nodes and edges grow
On the largest-scale graph (1000 nodes, high density), CMSA is approximately 270 times faster than SAHR

General Graphical Model Results

Experimental results demonstrate that CMSA is significantly more efficient than IPA on dense graphs, with performance advantages increasing with node count. On sparse graphs, both algorithms show substantially reduced execution times, but CMSA consistently maintains superior efficiency.

Case Analysis

Example 1: Consider graph G and target set A = {c, b}

Initial connected components: M₁ = {x}, M₂ = {a, d}, M₃ = {g, l, t}
When processing M₂, non-adjacent pair {c, b} is discovered, separator {a} is absorbed
When processing M₃, the same pair {c, b} is handled, separator {l} is absorbed
Final minimal collapsible set obtained: {a, c, l, b}

Development of Collapsibility Theory

Yule (1903), Simpson (1951): First introduced the concept of collapsibility
Asmussen & Edwards (1983): Provided rigorous theoretical exposition in Biometrika
Madigan & Mosurski (1990): Posed the minimal collapsible set problem in Biometrika

Algorithm Development Timeline

SAHR Algorithm: Applicable only to decomposable graphs, efficient but limited in scope
Convex Hull Method (Wang et al., 2011): Extended to general graphs but computationally expensive
Path Absorption Method (Heng & Sun, 2023): Improved efficiency but still requires global operations

Theoretical Breakthrough: Establishes equivalence between collapsibility and minimal separators
Algorithmic Innovation: CMSA algorithm achieves a paradigm shift from global to local operations
Efficiency Gains: Achieves substantial computational efficiency improvements across various graphical models
Practical Value: Makes collapsibility analysis of high-dimensional graphical models practically feasible

Limitations

Theoretical Assumptions: Based on the hierarchical log-linear model framework
Graph Structure Dependence: Algorithm efficiency may be affected by specific graph structures
Implementation Complexity: Requires efficient separator search implementation

Future Directions

Extension to mixed graphical models (discrete and continuous variables)
Investigation of collapsibility analysis for online/dynamic graphs
Exploration of separator perspective applications in other graph inference problems

In-Depth Evaluation

Strengths

Theoretical Depth: Provides a novel theoretical perspective on collapsibility, transforming global problems into local separator problems
Algorithmic Innovation: CMSA algorithm is ingeniously designed, fully exploiting the local properties of close separators
Comprehensive Experiments: Thorough performance evaluation across multiple graph scales and densities
Practical Value: Substantial efficiency gains make the method more valuable for real-world applications

Weaknesses

Scope of Applicability: Primarily targets undirected graphical models; extensibility to directed graphs is unclear
Baseline Comparisons: For general graphical models, comparison is limited to the IPA algorithm; more baseline methods would be beneficial
Theoretical Analysis: Lacks average-case complexity analysis
Real-World Applications: Lacks application cases on real datasets

Impact

Academic Contribution: Provides a new theoretical framework for collapsibility research in graphical models
Practical Value: Substantial algorithmic efficiency gains enable practical applications in large-scale data analysis
Reproducibility: Authors provide complete open-source code, enhancing result reproducibility
Future Research: The separator perspective may inspire research on other graph inference problems

Applicable Scenarios

High-Dimensional Contingency Table Analysis: When variable dimensionality reduction is needed
Large-Scale Graphical Model Inference: Under computationally constrained conditions
Causal Inference: Identifying minimal sufficient sets for causal effect estimation
Data Mining: Feature selection and dimensionality reduction tasks

References

This paper builds primarily on the following key references:

Asmussen, S. & Edwards, D. (1983). Collapsibility and response variables in contingency tables. Biometrika.
Madigan, D. & Mosurski, K. (1990). An extension of the results of asmussen and edwards on collapsibility in contingency tables. Biometrika.
Takata, K. (2010). Space-optimal, backtracking algorithms to list the minimal vertex separators of a graph.
Wang, X., Guo, J. & He, X. (2011). Finding the minimal set for collapsible graphical models.