Collapsibility provides a principled approach for dimension reduction in contingency tables and graphical models. Madigan and Mosurski (1990) pioneered the study of minimal collapsible sets in decomposable models, but existing algorithms for general graphs remain computationally demanding. We show that a model is collapsible onto a target set precisely when that set contains all minimal separators between its non-adjacent vertices. This insight motivates the Close Minimal Separator Absorption (CMSA) algorithm, which constructs minimal collapsible sets using only local separator searches at very low costs. Simulations confirm substantial efficiency gains, making collapsibility analysis practical in high-dimensional settings.
Revisiting Madigan and Mosurski: Collapsibility via Minimal Separators
- Paper ID: 2510.09024
- Title: Revisiting Madigan and Mosurski: Collapsibility via Minimal Separators
- Authors: Pei Heng (Northeast Normal University), Yi Sun (Xinjiang University), Shiyuan He, Jianhua Guo (Beijing Technology and Business University)
- Classification: stat.ME (Statistics - Methodology)
- Published Journal: Biometrika (2025), 103, 1, p. 1
- Paper Link: https://arxiv.org/abs/2510.09024
Collapsibility provides a principled approach for dimensionality reduction in contingency tables and graphical models. Madigan and Mosurski (1990) pioneered the study of minimal collapsible sets in decomposable models, yet existing general graph algorithms remain computationally demanding. This paper establishes that a model is collapsible to a target set if and only if the target set contains all minimal separators between its non-adjacent vertices. This insight motivates the Close Minimal Separator Absorption (CMSA) algorithm, which constructs minimal collapsible sets using only computationally inexpensive local separator searches. Simulations confirm substantial efficiency gains, making collapsibility analysis practical in high-dimensional settings.
Collapsibility is a classical concept in multivariate statistical analysis, originally introduced by Yule (1903) and Simpson (1951). Within the framework of log-linear models, it provides a principled method for removing variables to simplify statistical analysis without distorting marginal associations.
For a given target variable set, how can one find the minimal superset to which the model can be collapsed without compromising inferential validity? Such a superset is termed a minimal collapsible set.
- Madigan & Mosurski (1990)'s Selective Acyclic Hypergraph Reduction (SAHR) algorithm applies only to decomposable graphical models
- Wang et al. (2011)'s convex hull approach and Heng & Sun (2023)'s path absorption method typically require global graph operations, which are computationally expensive in high-dimensional models
- Lack of efficient algorithms based on local graph properties
This paper revisits minimal collapsibility from a new perspective, aiming to:
- Provide a separator-based characterization of collapsibility
- Develop efficient algorithms based on local operations
- Make collapsibility analysis practical in high-dimensional graphical models
- Theoretical Contribution: Establishes that a graphical model is collapsible to a target set if and only if that set contains all minimal separators between its non-adjacent vertices
- Algorithmic Innovation: Proposes the Close Minimal Separator Absorption (CMSA) algorithm, which constructs minimal collapsible sets through local separator searches
- Computational Efficiency: The CMSA algorithm achieves O(nm) time complexity and O(n) space complexity, outperforming existing methods
- Practical Value: Makes collapsibility analysis practically feasible in high-dimensional settings
Input: Hierarchical log-linear model L and its interaction graph G=(V,E), target variable set A⊆V
Output: Minimal collapsible set μ containing A
Constraint: Model L is collapsible to μ, and μ is the minimal set satisfying this condition
Lemma 1 (Asmussen & Edwards, 1983): A graphical model L is collapsible to a subset A⊆V if and only if for any X,Y⊆A, X⊥Y|SG implies X⊥Y|S∩AG.
Theorem 1: A graphical model L is collapsible to a subset A⊆V if and only if A contains every minimal xy-separator for each pair of non-adjacent vertices x,y in A.
Corollary 1: A graphical model L is collapsible to a subset A⊆V if and only if A contains at least one minimal xy-separator for each pair of non-adjacent vertices x,y in A.
Close Minimal Separator (Definition 2): For any two non-adjacent vertices x,y∈V, if a minimal xy-separator S is completely contained in the neighborhood of x, i.e., S⊆N_G(x), then S is called a separator close to x, denoted S_G(x,y).
The CMSA algorithm comprises the following main steps:
- Component Identification: Identify all connected components M₁,...,M_K of G_{V\A}
- Local Processing: For each connected component M_i:
- Initialize μᵢ := A
- Iteratively identify non-adjacent vertex pairs in the neighborhoods of connected components of G_{Mᵢ}
- Absorb their close minimal separators into μᵢ
- Terminate when the neighborhoods of all connected components form complete subsets
- Result Merging: Merge all μᵢ to obtain the final minimal collapsible set μ = ⋃ᵢμᵢ
- Localization Strategy: Transforms global graph operations into local separator searches
- Close Separator Exploitation: Leverages properties of close separators to avoid full graph traversal
- Component Decomposition: Reduces problem complexity through connected component decomposition
- Incremental Construction: Iteratively absorbs separators until termination conditions are satisfied
- Decomposable Graphical Models:
- Graph scale: n ∈ {250, 500, 750, 1000}
- Edge probability: p ∈ {0.1, 0.01}
- 100 random chordal graphs generated for each configuration
- General Graphical Models:
- Graph scale: n ∈ {2500, 5000, 7500, 10000}
- Edge probability: p ∈ {0.1, 0.01, 0.005, 0.001}
- Random graphs generated by adding edges to random trees
- Execution Time: Average time for algorithm execution (seconds)
- Efficiency Comparison: Relative performance against baseline methods
- SAHR (Madigan & Mosurski, 1990): Applicable to decomposable graphs
- IPA (Heng & Sun, 2023): Induced Path Absorption algorithm, applicable to general graphs
- Programming Language: C for core algorithms, Python interface
- Hardware Environment: Intel Xeon Silver 4215R CPU, 128 GB RAM
- 10 random target vertices tested for each graph
| Nodes | 250 | 500 | 750 | 1000 |
|---|
| Avg Edges | 529/3334 | 1812/12912 | 3567/28652 | 6062/52959 |
| CMSA | 0.0007/0.0012 | 0.0021/0.0047 | 0.0044/0.0112 | 0.0072/0.0248 |
| SAHR | 0.0113/0.0611 | 0.0681/0.5455 | 0.1876/2.1648 | 0.3808/6.6983 |
Key Findings:
- CMSA significantly outperforms SAHR across all graph scales and densities
- CMSA's advantage becomes increasingly pronounced as nodes and edges grow
- On the largest-scale graph (1000 nodes, high density), CMSA is approximately 270 times faster than SAHR
Experimental results demonstrate that CMSA is significantly more efficient than IPA on dense graphs, with performance advantages increasing with node count. On sparse graphs, both algorithms show substantially reduced execution times, but CMSA consistently maintains superior efficiency.
Example 1: Consider graph G and target set A = {c, b}
- Initial connected components: M₁ = {x}, M₂ = {a, d}, M₃ = {g, l, t}
- When processing M₂, non-adjacent pair {c, b} is discovered, separator {a} is absorbed
- When processing M₃, the same pair {c, b} is handled, separator {l} is absorbed
- Final minimal collapsible set obtained: {a, c, l, b}
- Yule (1903), Simpson (1951): First introduced the concept of collapsibility
- Asmussen & Edwards (1983): Provided rigorous theoretical exposition in Biometrika
- Madigan & Mosurski (1990): Posed the minimal collapsible set problem in Biometrika
- SAHR Algorithm: Applicable only to decomposable graphs, efficient but limited in scope
- Convex Hull Method (Wang et al., 2011): Extended to general graphs but computationally expensive
- Path Absorption Method (Heng & Sun, 2023): Improved efficiency but still requires global operations
This paper unifies collapsibility theory through a separator perspective, providing the first efficient algorithm based on purely local operations.
- Theoretical Breakthrough: Establishes equivalence between collapsibility and minimal separators
- Algorithmic Innovation: CMSA algorithm achieves a paradigm shift from global to local operations
- Efficiency Gains: Achieves substantial computational efficiency improvements across various graphical models
- Practical Value: Makes collapsibility analysis of high-dimensional graphical models practically feasible
- Theoretical Assumptions: Based on the hierarchical log-linear model framework
- Graph Structure Dependence: Algorithm efficiency may be affected by specific graph structures
- Implementation Complexity: Requires efficient separator search implementation
- Extension to mixed graphical models (discrete and continuous variables)
- Investigation of collapsibility analysis for online/dynamic graphs
- Exploration of separator perspective applications in other graph inference problems
- Theoretical Depth: Provides a novel theoretical perspective on collapsibility, transforming global problems into local separator problems
- Algorithmic Innovation: CMSA algorithm is ingeniously designed, fully exploiting the local properties of close separators
- Comprehensive Experiments: Thorough performance evaluation across multiple graph scales and densities
- Practical Value: Substantial efficiency gains make the method more valuable for real-world applications
- Scope of Applicability: Primarily targets undirected graphical models; extensibility to directed graphs is unclear
- Baseline Comparisons: For general graphical models, comparison is limited to the IPA algorithm; more baseline methods would be beneficial
- Theoretical Analysis: Lacks average-case complexity analysis
- Real-World Applications: Lacks application cases on real datasets
- Academic Contribution: Provides a new theoretical framework for collapsibility research in graphical models
- Practical Value: Substantial algorithmic efficiency gains enable practical applications in large-scale data analysis
- Reproducibility: Authors provide complete open-source code, enhancing result reproducibility
- Future Research: The separator perspective may inspire research on other graph inference problems
- High-Dimensional Contingency Table Analysis: When variable dimensionality reduction is needed
- Large-Scale Graphical Model Inference: Under computationally constrained conditions
- Causal Inference: Identifying minimal sufficient sets for causal effect estimation
- Data Mining: Feature selection and dimensionality reduction tasks
This paper builds primarily on the following key references:
- Asmussen, S. & Edwards, D. (1983). Collapsibility and response variables in contingency tables. Biometrika.
- Madigan, D. & Mosurski, K. (1990). An extension of the results of asmussen and edwards on collapsibility in contingency tables. Biometrika.
- Takata, K. (2010). Space-optimal, backtracking algorithms to list the minimal vertex separators of a graph.
- Wang, X., Guo, J. & He, X. (2011). Finding the minimal set for collapsible graphical models.