Coagulation-Fragmentation Duality of Infinitely Exchangeable Partitions from Coupled Mixed Poisson Species Sampling Models

Lancelot F. James
Jim Pitman's (1999) celebrated coagulation-fragmentation duality for the PD($α$,$θ$) family of laws of Pitman and Marc Yor (1997) has resisted generalization beyond its canonical setting. We resolve this by introducing a novel, four-part coupled process built upon the Poisson Hierarchical Indian Buffet Process (PHIBP), a framework developed for modeling microbiome species sampling. This approach yields a tractable generalization of the duality in two fundamental directions: to processes driven by arbitrary subordinators and to the previously uncharacterised multi-group ($J \ge 1$) setting, providing explicit laws for both. The static, fixed-time partitions are revealed to be a single projection of an inherently dynamic system. This new construction simultaneously defines: (i) the fine-grained partition, (ii) its coagulation operator, (iii) a forward-in-time system of coupled, time-homogeneous fragmentation processes in the sense of Jean Bertoin (2006), and (iv) a dual, backward-in-time structured coalescent that drives simultaneous, across-group merger events. All four components are governed by a unified compositional structure, yielding their exact compound Poisson representations. The hallmark of this work is its circumvention of direct, and often intractable, analysis on mass and integer partition spaces. By shifting the problem to this transparent framework, the generalized duality emerges as a natural consequence of the architecture itself.

Basic Information

  • Paper ID: 2508.18668
  • Title: Coagulation-Fragmentation Duality of Infinitely Exchangeable Partitions from Coupled Mixed Poisson Species Sampling Models
  • Author: Lancelot F. James (Hong Kong University of Science and Technology)
  • Classification: math.PR (Probability Theory)
  • Publication Date: October 13, 2025 (arXiv version 3)
  • Paper Link: https://arxiv.org/abs/2508.18668

Abstract

This paper generalizes Jim Pitman's (1999) celebrated coagulation-fragmentation duality for the PD(α,θ) family of distributions beyond its classical setting. By introducing a novel four-component coupled process based on the Poisson Hierarchical Indian Buffet Process (PHIBP), the author obtains a tractable generalization in two fundamental directions: to processes driven by arbitrary subordinators, and to the previously uncharacterized multi-population (J≥1) setting. The construction simultaneously defines four components: a fine-grained partition, its coagulation operator, a forward-in-time system of coupled homogeneous fragmentation processes, and a dual backward-in-time structured coalescent.

Research Background and Motivation

Core Problem

The core problem addressed in this paper is the generalization of Pitman's classical coagulation-fragmentation duality from its specific PD(α,θ) distribution family setting to more general cases. This duality establishes deep structural relationships between two different Poisson-Dirichlet distributions, yet has remained ungeneralized for over two decades.

Problem Significance

  1. Theoretical Importance: Coagulation-fragmentation duality is a foundational result in combinatorial stochastic process theory; its generalization will substantially expand the theoretical framework
  2. Applied Value: Widespread applications in population genetics, Bayesian statistics, machine learning, and other fields
  3. Mathematical Challenge: Involves complex analysis on mass partition and integer partition spaces; traditional methods are difficult to apply

Limitations of Existing Methods

  1. Dependence on Special Algebraic Structure: Classical duality relies on special properties of stable-beta-gamma algebra
  2. Single Population Restriction: Existing theory applies only to the J=1 case
  3. Analytical Complexity: Direct analysis on partition spaces is often intractable and opaque

Research Motivation

Motivated by the practical demands of modeling microbiome species sampling, the author observes that the PHIBP framework implicitly defines a fully coupled dynamical system, which offers a new route to the classical problem.

Core Contributions

  1. Unified Framework: Proposes a four-component coupled process based on PHIBP that treats static partitions as projections of a dynamical system
  2. Generalized Duality: Gives the first generalization of coagulation-fragmentation duality to arbitrary subordinators and multi-population settings
  3. Explicit Characterization: Provides exact compound Poisson representations and joint exchangeable partition probability functions (EPPFs) for all four components
  4. Dynamic Theory: Embeds the static duality in a continuous-time dynamical framework, revealing new classes of processes
  5. New Duality Relation: Proves a simultaneous duality between the Kingman coalescent and an α-stable homogeneous fragmentation process

Methodology Details

Task Definition

Construct a four-component coupled system $(I_j, A_j, F_{j,\ell}, Z_j)$ that simultaneously defines:

  • Fine-grained partition and its coagulation operator
  • Forward fragmentation process system
  • Backward structured coalescent
  • Explicit probability distributions for all components

Core Architecture

1. Subordinator Construction

Define J+1 independent subordinators:

  • Population-specific subordinators: $\sigma_j$, $j \in \{1,\dots,J\}$
  • Global tether subordinator: $\sigma_0$

2. Four-Component Coupled Process

Theorem 3.1 (Unified Compound Poisson Representation): For each population $j$, the joint process vector is:

$$\bigl(I_j(\gamma_j, y),\ A_j(\gamma_j, y),\ \bigl(F_{j,\ell}^{(H_\ell)}(\gamma_j, y)\bigr)_{\ell \ge 1},\ Z_j(\gamma_j, y)\bigr)$$

where:

  • $I_j$: Fine-grained counting process
  • $A_j$: Allocation process (key to the coagulation operator)
  • $F_{j,\ell}$: Family of fragmentation processes
  • $Z_j$: Coarse-grained counting process

3. Key Distribution Components

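  • Number of coarse blocks: $\varphi \sim \mathrm{Poisson}\bigl(\Psi_0\bigl(\sum_{j=1}^{J} \psi_j(\gamma_j)\bigr)\bigr)$
  • Fine block counts: $(X_{j,\ell}) \sim \mathrm{MtP}\bigl(\tau_0, \sum_{j=1}^{J} \psi_j(\gamma_j)\bigr)$
  • Individual counts: $(C_{j,k}) \sim \mathrm{MtP}(\tau_j, \gamma_j)$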
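
As a rough illustration of the first component, the sketch below computes the Poisson mean for the number of coarse blocks and draws one sample. It rests on assumptions this summary does not state: that Ψ0 and ψj are the Laplace exponents of σ0 and σj, and that every subordinator is a unit-rate gamma subordinator, whose Laplace exponent is shape · log(1 + s). The function names and parameter values are hypothetical.

```python
import math
import random


def gamma_laplace_exponent(s, shape):
    """Laplace exponent of a unit-rate gamma subordinator: shape * log(1 + s)."""
    return shape * math.log1p(s)


def coarse_block_rate(gammas, shapes, shape0):
    """Poisson mean Psi_0(sum_j psi_j(gamma_j)) under the gamma assumption."""
    inner = sum(gamma_laplace_exponent(g, a) for g, a in zip(gammas, shapes))
    return gamma_laplace_exponent(inner, shape0)


def sample_poisson(rate, rng):
    """Knuth's multiplication sampler; adequate for small rates only."""
    limit, count, product = math.exp(-rate), 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


# Hypothetical parameter values for J = 2 populations (not from the paper).
rate = coarse_block_rate(gammas=[1.0, 2.0], shapes=[1.0, 1.0], shape0=2.0)
phi = sample_poisson(rate, random.Random(0))
print(f"Poisson mean = {rate:.4f}, sampled number of coarse blocks = {phi}")
```

Any other subordinator with a closed-form Laplace exponent could be swapped in by replacing `gamma_laplace_exponent`; the MtP components are not sampled here because they depend on the Lévy densities τ0 and τj.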

Technical Innovations

1. Architectural Innovation

The allocation process $A_j$ acts as a structural linking variable that places all four components on a single probability space, avoiding the "black box" opacity of traditional methods.

2. Poissonization Perspective

Transfers the problem to the "Poissonized world," where every component has an explicit distribution and the complex marginal dependencies arise naturally through integration.

3. Pointwise Coupling

Provides a pointwise coupling rather than mere distributional equivalence, which makes the link between the fragmentation and coagulation operators tractable in the partition-valued setting.

Core Theorems and Results

Main Duality Identity

Theorem 3.2 (Unified Poissonized Duality Identity):

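$$p_{\mathrm{coag}}(\pi_n^{(2)} \mid \pi_n^{(1)}, \gamma) \cdot \bigl(p_{\mathrm{fine}}(\pi_n^{(1)} \mid \gamma) \cdot f_{T_{1,n}}(\gamma)\bigr)$$
$$= p_{\mathrm{frag}}(\pi_n^{(1)} \mid \pi_n^{(2)}, \gamma) \cdot \bigl(p_{\mathrm{coarse}}(\pi_n^{(2)} \mid \gamma) \cdot f_{T_{1,n}}(\gamma)\bigr)$$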
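
One way to read the identity in Theorem 3.2, offered as an interpretation rather than the paper's own derivation: each side factorizes the same joint law of the fine and coarse partitions, so wherever the common density factor is positive it cancels and leaves a Bayes-type relation. The symbol $p_{\mathrm{joint}}$ is introduced here only for illustration.

```latex
\begin{align*}
p_{\mathrm{joint}}(\pi_n^{(1)}, \pi_n^{(2)} \mid \gamma)
  &= p_{\mathrm{coag}}(\pi_n^{(2)} \mid \pi_n^{(1)}, \gamma)\,
     p_{\mathrm{fine}}(\pi_n^{(1)} \mid \gamma) \\
  &= p_{\mathrm{frag}}(\pi_n^{(1)} \mid \pi_n^{(2)}, \gamma)\,
     p_{\mathrm{coarse}}(\pi_n^{(2)} \mid \gamma).
\end{align*}
```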

Master Equation for Stable Case

Theorem 5.1: Under the stable subordinator setting, the joint distribution satisfies:

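$$p_{\beta/\alpha}(x_1,\dots,x_r) \cdot p_{\alpha}(c_1,\dots,c_K) \cdot f_{G_{K_n^{[\beta]}}}(\zeta)$$
$$= \prod_{\ell=1}^{r} p_{\alpha,-\beta}(c_\ell) \cdot p_{\beta}(n_1,\dots,n_r) \cdot f_{G_{K_n^{[\beta]}}}(\zeta)$$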
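
The classical single-population analogue of this equation can be checked numerically. The sketch below uses the standard two-parameter (Pitman-Yor) EPPF and verifies that coagulating a PD(α,θ) partition with PD(β/α, θ/α) gives the same joint probability as fragmenting a PD(β,θ) partition with PD(α,−β). The explicit θ parametrization comes from the standard PD(α,θ) theory rather than from this summary, the Poissonized density factor is omitted, and the function names and example values are hypothetical.

```python
import math


def rising(x, m):
    """Rising factorial (x)_m = x (x+1) ... (x+m-1), with (x)_0 = 1."""
    return math.prod(x + i for i in range(m))


def eppf(sizes, alpha, theta):
    """Two-parameter EPPF evaluated at the given block sizes."""
    n, k = sum(sizes), len(sizes)
    weight = math.prod(theta + i * alpha for i in range(1, k))
    blocks = math.prod(rising(1 - alpha, s - 1) for s in sizes)
    return weight * blocks / rising(theta + 1, n - 1)


def duality_sides(groups, alpha, beta, theta):
    """Return (coagulation side, fragmentation side) of the joint law.

    `groups` lists, for each coarse block, the sizes of the fine blocks
    it contains; requires 0 < beta < alpha < 1 and theta > -beta.
    """
    fine = [c for g in groups for c in g]   # fine block sizes c_1..c_K
    counts = [len(g) for g in groups]       # x_l: fine blocks per coarse block
    coarse = [sum(g) for g in groups]       # n_l: coarse block sizes
    lhs = eppf(counts, beta / alpha, theta / alpha) * eppf(fine, alpha, theta)
    rhs = math.prod(eppf(g, alpha, -beta) for g in groups) * eppf(coarse, beta, theta)
    return lhs, rhs


# Hypothetical example: n = 10 items, K = 6 fine blocks, r = 3 coarse blocks.
lhs, rhs = duality_sides([[3, 1], [2], [1, 1, 2]], alpha=0.6, beta=0.3, theta=1.0)
print(lhs, rhs, math.isclose(lhs, rhs, rel_tol=1e-12))
```

The two printed values should agree up to floating-point error for any admissible parameters, because the powers of α and the rising factorials cancel term by term.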

Dynamic Extension

Theorem 6.1: Establishes a simultaneous duality between the Kingman coalescent and an α-stable homogeneous fragmentation process, a relationship reported here for the first time.

Experimental Verification and Applications

Theoretical Verification

  1. Cross-Validation: Verifies the stable case through two independent approaches: the marginal change-of-measure method (Section 2) and the coupled Poisson construction (Section 5)
  2. Consistency Check: Proves that the J=1 case recovers Pitman's classical duality
  3. Limiting Behavior: Verifies convergence to Kingman-HFG duality as β→0

Computational Implementation

Detailed calculations for the generalized gamma family and applications to microbiome datasets are given in the accompanying work [22], including:

  • Prediction rule derivation
  • Large-scale dataset validation
  • Computational efficiency analysis

Related Work

Classical Theoretical Foundations

  1. Pitman-Yor Processes: Two-parameter Poisson-Dirichlet distribution family and its properties
  2. Bertoin Fragmentation Theory: General theoretical framework for homogeneous fragmentation processes
  3. Kingman Paintbox Construction: Foundational theory for infinitely exchangeable random partitions

Modern Developments

  1. Poisson-Kingman Distributions: Distribution families generated by general subordinators
  2. Structured Coalescents: Multi-type and fine-grained population models
  3. Microbiome Modeling: Probabilistic frameworks for complex count data

Innovations in This Paper

Compared to existing work, this paper is the first to:

  • Provide tractable duality for arbitrary subordinators
  • Establish complete theory for multi-population settings
  • Reveal deep connections between static and dynamic theory

Conclusions and Discussion

Main Conclusions

  1. Theoretical Breakthrough: Successfully generalizes Pitman duality to arbitrary subordinators and multi-population settings
  2. Methodological Innovation: PHIBP framework provides transparent tools for analyzing complex partition structures
  3. Application Prospects: Provides new modeling tools for population genetics, microbiome analysis, and other fields

Limitations

  1. Technical Complexity: Despite providing a unified framework, concrete calculations remain complex
  2. Application Verification: Requires more empirical validation of theoretical predictions
  3. Computational Efficiency: Computational complexity for large-scale applications requires further optimization

Future Directions

  1. Extended Applications: Apply the framework to broader scientific domains
  2. Algorithm Optimization: Develop more efficient computational algorithms
  3. Theoretical Deepening: Explore connections with other stochastic process theories

In-Depth Evaluation

Strengths

  1. Theoretical Depth: Resolves a problem that had remained open for over two decades, with significant theoretical value
  2. Methodological Innovation: PHIBP framework provides a novel analytical perspective
  3. Complete Results: Provides explicit distributional characterizations and computational formulas
  4. Application Potential: Significant application prospects across multiple fields

Weaknesses

  1. Technical Threshold: Requires deep background in probability theory and stochastic processes
  2. Notational Complexity: Extensive technical notation may impact readability
  3. Computational Challenges: Relatively high computational complexity in practical applications

Impact

  1. Theoretical Impact: Will advance development of combinatorial stochastic process theory
  2. Applied Value: Provides new tools for complex data modeling
  3. Methodological Contribution: Demonstrates a pathway from applied problems to theoretical breakthroughs

Applicable Scenarios

  1. Population Genetics: Multi-population evolution and coalescent modeling
  2. Microbiome Research: Complex community structure analysis
  3. Bayesian Statistics: Prior construction for infinite-dimensional parameter spaces
  4. Machine Learning: Hierarchical feature learning and clustering

References

The paper cites 55 important references, primarily including:

  • Pitman, J. (1999). Coalescents with multiple collisions. Original classical duality paper
  • Bertoin, J. (2006). Random Fragmentation and Coagulation Processes. Foundational fragmentation theory
  • Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution. PD distribution theory
  • James, L.F. et al. (2025). Poisson Hierarchical Indian Buffet Processes. PHIBP framework

This paper represents a significant advance in combinatorial stochastic process theory. Through clever construction, it resolves a long-standing open problem while providing powerful tools for practical applications. Its theoretical depth and broad applicability make it an important contribution to the field.