2025-11-10T02:53:59.476691

Chromatic correlation clustering via cluster LP

Abbasi, An, Byrka et al.

Correlation Clustering is a fundamental clustering problem, and there has been a line of work on improving the approximation ratio for this problem in recent years. A key algorithmic component in these works is the cluster LP. Chromatic Correlation Clustering is an interesting generalization that has also been intensively studied. In light of success of the cluster LP in Correlation Clustering, it would be an interesting question whether the cluster LP can be used in Chromatic Correlation Clustering. We answer this question with affirmatives by presenting a $(2+\varepsilon)$-approximation algorithm for Chromatic Correlation Clustering using a chromatic cluster LP.

academic

Chromatic Correlation Clustering via Cluster LP

Basic Information

Paper ID: 2510.13446
Title: Chromatic Correlation Clustering via Cluster LP
Authors: Fateme Abbasi, Hyung-Chan An, Jarosław Byrka, Changyeol Lee, Yongho Shin
Classification: cs.DS (Data Structures and Algorithms)
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.13446

Abstract

Correlation clustering is a fundamental clustering problem, and recent years have witnessed a series of works improving its approximation ratio. A key algorithmic component in these works is the cluster LP. Chromatic correlation clustering is an interesting generalization that has also received extensive study. Given the success of cluster LP in correlation clustering, an interesting question is whether cluster LP can be applied to chromatic correlation clustering. This paper affirmatively answers this question by proposing a $(2+\varepsilon)$ -approximation algorithm using chromatic cluster LP.

Research Background and Motivation

Problem Background

Correlation Clustering Problem: Correlation clustering is a fundamental problem in combinatorial optimization and machine learning, aiming to partition vertices into clusters such that positive edges (+edges) have endpoints in the same cluster and negative edges (-edges) have endpoints in different clusters.
Chromatic Correlation Clustering: This is a generalization of correlation clustering where each positive edge has a color label, and vertices within the same cluster must be connected by edges of the same color.
Research Motivation:
- The approximation ratio for correlation clustering has continuously improved in recent years, from an initial 3-approximation to the current 1.437-approximation
- Cluster LP is a key technical component in these improvements
- Existing methods for chromatic correlation clustering are limited to color-blind algorithms, reductions to standard correlation clustering, or standard LP relaxations
- The latest 2.15-approximation algorithm still relies on reduction methods

Research Significance

Exploring whether cluster LP techniques can be directly applied to chromatic correlation clustering to achieve better approximation ratios is significant for both theory and practice.

Core Contributions

Chromatic Cluster LP Formulation: Designs a natural generalization of cluster LP from correlation clustering, applicable to chromatic correlation clustering problems
Polynomial-Time Approximation: Proves that chromatic cluster LP can be approximately solved optimally in polynomial time
2-Approximation Rounding Algorithm: Designs an algorithm for rounding feasible solutions of chromatic cluster LP to integer solutions with approximation ratio 2
$(2+\varepsilon)$ -Approximation Algorithm: Combines the above two results to obtain a $(2+\varepsilon)$ -approximation algorithm for chromatic correlation clustering, improving upon the previous 2.15-approximation
Preclustering Technique: Generalizes the preclustering concept from correlation clustering to the chromatic setting, which is key to achieving polynomial-time solvability

Methodology Details

Problem Definition

Input:

Color set $L$
Complete graph with edges marked as +edges or -edges
Each +edge $e$ associated with a color $c_e \in L$

Output:

Vertex partition $C$
Coloring function $\chi: C \to L$ assigning a color to each cluster

Objective: Minimize the number of inconsistent edges, where inconsistent edges are defined as:

-edges with both endpoints in the same cluster
+edges with endpoints in different clusters
+edges with both endpoints in the same cluster, but the cluster's color does not match the edge's color

Chromatic Cluster LP

The core linear programming relaxation is defined as:

$\min \sum_{S\subseteq V,\ell\in L} \left(\frac{|\delta^+(S)|}{2} + |E^{-\ell}[S]|\right) z^\ell_S$

Subject to: $\sum_{S\ni v,\ell\in L} z^\ell_S = 1, \quad \forall v \in V$ $z^\ell_S \geq 0, \quad \forall S \subseteq V, \forall\ell \in L$

Where:

$z^\ell_S$ indicates whether set $S$ is a cluster of color $\ell$
$\delta^+(S)$ is the set of +edges crossing $S$
$E^{-\ell}[S]$ is the set of all edges in $S$ except +edges of color $\ell$

Algorithm Framework

Step 1: Preclustering Construction

Obtain initial solution $(C^{init}, \chi^{init})$ using a constant-factor approximation algorithm
Mark vertices satisfying specific conditions (based on parameters $\alpha, \beta$ )
Construct preclusters $K$ and color assignment $\chi^{pre}$

Step 2: Bounded Subcluster LP

Restrict the search space to clusters of size at most $r = \Theta(\varepsilon^{-12})$
Construct and solve a polynomial-size LP

Step 3: Monte Carlo Sampling

Sample $\Delta y_\emptyset$ colored clusters from the LP solution
Apply the Raghavendra-Tan rounding algorithm
Construct the final feasible solution

Key Technical Innovations

Chromatic Preclustering:
- Generalizes the preclustering concept to the chromatic setting
- Proves that optimal solutions must respect the preclustering structure
- Controls the number of admissible edges to $O(\varepsilon^{-2})\text{opt}$
Cluster-Based Rounding Algorithm:
- Designs a specialized probabilistic rounding procedure
- Analyzes the probability of different edge types becoming inconsistent
- Proves a 2-approximation ratio

Experimental Setup

This is a theoretical computer science paper whose main contributions lie in algorithm design and theoretical analysis, without an experimental evaluation section. The research focus is on:

Theoretical Analysis: Proving algorithm correctness and approximation ratio
Complexity Analysis: Proving polynomial-time complexity
Mathematical Proofs: Verifying results through rigorous mathematical derivation

Experimental Results

Main Theoretical Results

Theorem 3.2: Given a feasible solution to chromatic cluster LP, the cluster-based algorithm outputs an integer solution whose expected cost is at most twice the LP solution cost.

Theorem 4.3: There exists a polynomial-time algorithm constructing a preclustering instance such that:

There exists a respecting solution with cost at most $(1+O(\varepsilon))\text{opt}$
The number of admissible edges $|E_{adm}| \leq O(\varepsilon^{-2})\text{opt}$

Main Result: Chromatic correlation clustering admits a $(2+\varepsilon)$ -approximation algorithm, improving upon the previous 2.15-approximation.

Complexity Analysis

Preclustering construction: $O(n^2)$ time
LP solving: $\text{poly}(n, \varepsilon^{-1})$ time
Monte Carlo sampling: $n^{O(\varepsilon^{-12})}$ time

Correlation Clustering Research

Classical Results: Bansal et al.'s 3-approximation algorithm
Recent Progress: Improving approximation ratio to 1.437 through cluster LP techniques
Key Techniques: Sherali-Adams hierarchy, preclustering techniques

Chromatic Correlation Clustering Research

Color-Blind Algorithms: Methods ignoring color information
Reduction Methods: Transforming to standard correlation clustering
LP Rounding: Rounding algorithms based on standard LP relaxations
Latest Results: Lee et al.'s 2.15-approximation and 1.64-approximation algorithms

Chromatic cluster LP can be approximately solved in polynomial time
A 2-approximation rounding algorithm exists
Overall achieves a $(2+\varepsilon)$ -approximation algorithm, improving upon the existing best 2.15-approximation

Limitations

Time Complexity: Although polynomial, it has exponential dependence on $\varepsilon^{-1}$
Approximation Ratio: Room for improvement remains, with a gap from the 1.437-approximation for standard correlation clustering
Practicality: Lacks experimental validation of algorithm performance

Future Directions

Explore whether the same approximation ratio as standard correlation clustering can be achieved
Improve the algorithm's time complexity
Study theoretical separations between approximation ratios for the two problems

In-Depth Evaluation

Strengths

Theoretical Innovation: First successful application of cluster LP techniques to chromatic correlation clustering
Technical Depth: The generalization of preclustering techniques demonstrates high technical difficulty
Result Improvement: Theoretically improves upon existing best results
Rigorous Proofs: Complete and rigorous mathematical analysis

Weaknesses

Missing Experiments: As an algorithmic paper, lacks experimental validation
High Complexity: The algorithm's actual running time may be prohibitive
Limited Improvement: The improvement from 2.15 to 2 is relatively modest
Generalizability: The method's extensibility remains to be verified

Impact

Theoretical Contribution: Provides a new technical pathway for chromatic correlation clustering
Methodological Value: The generalization of cluster LP techniques has methodological significance
Foundation for Future Work: Lays groundwork for further improvements in approximation ratios

Applicable Scenarios

Theoretical Research: Provides new case studies for approximation algorithm theory
Practical Applications: Applicable to clustering problems requiring edge color constraints
Algorithm Design: Provides technical reference for related optimization problems

References

The paper cites 24 important references, primarily including:

Bansal, Blum, Chawla (2004) - Foundational work on correlation clustering
Cao et al. (2024) - 1.437-approximation algorithm
Cohen-Addad et al. (2023) - Preclustering techniques
Lee et al. (2025) - Latest results on chromatic correlation clustering
Raghavendra, Tan (2012) - Rounding algorithm techniques

These references form the important theoretical foundation and technical support for this paper.