2025-11-25T10:28:17.626083

Smoothed analysis for graph isomorphism

Anastos, Kwan, Moore
There is no known polynomial-time algorithm for graph isomorphism testing, but elementary combinatorial "refinement" algorithms seem to be very efficient in practice. Some philosophical justification is provided by a classical theorem of Babai, Erdős and Selkow: an extremely simple polynomial-time combinatorial algorithm (variously known as "naïve refinement", "naïve vertex classification", "colour refinement" or the "1-dimensional Weisfeiler-Leman algorithm") yields a so-called canonical labelling scheme for "almost all graphs". More precisely, for a typical outcome of a random graph $G(n,1/2)$, this simple combinatorial algorithm assigns labels to vertices in a way that easily permits isomorphism-testing against any other graph. We improve the Babai-Erdős-Selkow theorem in two directions. First, we consider randomly perturbed graphs, in accordance with the smoothed analysis philosophy of Spielman and Teng: for any graph $G$, naïve refinement becomes effective after a tiny random perturbation to $G$ (specifically, the addition and removal of $O(n\log n)$ random edges). Actually, with a twist on naïve refinement, we show that $O(n)$ random additions and removals suffice. These results significantly improve on previous work of Gaudio-Rácz-Sridhar, and are in certain senses best-possible. Second, we complete a long line of research on canonical labelling of random graphs: for any $p$ (possibly depending on $n$), we prove that a random graph $G(n,p)$ can typically be canonically labelled in polynomial time. This is most interesting in the extremely sparse regime where $p$ has order of magnitude $c/n$; denser regimes were previously handled by Bollobás, Czajka-Pandurangan, and Linial-Mosheiff. Our proof also provides a description of the automorphism group of a typical outcome of $G(n,p_n)$ (slightly correcting a prediction of Linial-Mosheiff).
academic

Smoothed analysis for graph isomorphism

Basic Information

  • Paper ID: 2410.06095
  • Title: Smoothed analysis for graph isomorphism
  • Authors: Michael Anastos, Matthew Kwan, Benjamin Moore
  • Classification: math.CO cs.CC cs.DS
  • Publication Date: October 2024
  • Paper Link: https://arxiv.org/abs/2410.06095v3

Abstract

The graph isomorphism testing problem lacks a known polynomial-time algorithm, yet basic combinatorial "refinement" algorithms perform remarkably efficiently in practice. The classical theorem of Babai, Erdős, and Selkow provides a philosophical explanation: an extremely simple polynomial-time combinatorial algorithm (called "naive refinement," "naive vertex classification," "color refinement," or "1-dimensional Weisfeiler-Leman algorithm") provides a canonical labeling scheme for "almost all graphs."

This paper improves the Babai-Erdős-Selkow theorem in two directions: first, by considering randomly perturbed graphs according to the smoothed analysis framework of Spielman and Teng; second, by completing a long-standing research line on canonical labelings of random graphs.

Research Background and Motivation

Problem Background

  1. Importance of the graph isomorphism problem: Graph isomorphism testing is a central problem in computational complexity theory, occupying a special position between P and NP-complete
  2. Gap between practice and theory: Despite requiring exponential time in the worst case, the color refinement algorithm performs excellently in practice
  3. Limitations of the Babai-Erdős-Selkow theorem: The classical theorem applies only to random graphs G(n,1/2) and performs poorly on structured graphs

Research Motivation

  1. Application of smoothed analysis: Applying the Spielman-Teng smoothed analysis framework to the graph isomorphism problem
  2. Extending applicability: Proving that minor random perturbations enable the color refinement algorithm to work on arbitrary graphs
  3. Perfecting the theoretical framework: Providing complete canonical labeling theory for random graphs of all densities

Core Contributions

  1. Smoothed analysis results: Proving that after O(n log n) random edge perturbations to an arbitrary graph G₀, the color refinement algorithm almost always succeeds
  2. Improved perturbation bounds: Through a modified algorithm, reducing the required perturbations to O(n) random edges
  3. Complete theory for sparse random graphs: Providing polynomial-time canonical labeling schemes for random graphs G(n,p) of arbitrary density p
  4. Characterization of automorphism groups: Describing the automorphism group structure of typical random graphs, correcting predictions by Linial-Mosheiff

Detailed Methodology

Task Definition

Given two n-vertex graphs G₁ and G₂, the graph isomorphism problem requires determining whether there exists a bijection between vertex sets preserving the adjacency relations. Canonical labeling is a method for assigning a standard form to each graph such that isomorphic graphs have identical labelings.

Core Algorithm: Color Refinement

Basic Framework

The color refinement algorithm is an iterative process:

  1. Initialization: All vertices are assigned the same color
  2. Refinement step: Update each vertex's color based on the color distribution of its neighbors
  3. Stabilization: Repeat until the color assignment no longer changes

Mathematical Description

For a graph G and coloring c : V(G) → Ω, the refinement operation is defined as:

R_G c(v) = (c(v), (d_ω(v))_{ω∈Ω})

where d_ω(v) is the number of neighbors of vertex v with color ω.

Views and Universal Covers

The algorithm's effectiveness is analyzed through the concept of "views":

  • View T_G(v) encodes all possible paths starting from vertex v
  • Two vertices have the same color if and only if their views are isomorphic

Technical Innovations

1. Expansion and Anti-Concentration Techniques

  • Expansion properties: Utilizing strong expansion properties of random graphs to prove that small vertex sets grow rapidly
  • Anti-concentration inequalities: Applying Erdős-Littlewood-Offord type inequalities to control random fluctuations

2. Core Structure Analysis

  • k-core: The k-core of a graph is the maximal subgraph with minimum degree at least k
  • Special properties of 2-cores: In 2-cores, vertices with degree at least 3 can typically be distinguished by color refinement

3. Sprinkling Technique

Decomposing random perturbations into multiple independent sparse perturbations:

  • Each round of perturbation assigns unique colors to new vertices
  • Utilizing monotonicity to progressively improve graph properties

4. Disparity Graph

Defining disparity graph D(G,c) to analyze the effectiveness of color refinement:

  • Capturing mismatches between graph structure and color classes
  • Small connected components corresponding to effective canonical labelings

Main Theorems

Theorem 1.2 (Smoothed Analysis - Basic Version)

For constant ε > 0 and p ≥ (1+ε)log n/n, for any graph G₀ and random graph G_rand ~ G(n,p), the color refinement algorithm almost always distinguishes all vertices of G₀△G_rand.

Theorem 1.3 (Improved Smoothed Analysis)

There exist a graph class H and a polynomial-time canonical labeling algorithm such that for p ≥ 100/n, for any graph G₀ and G_rand ~ G(n,p), almost always G₀△G_rand ∈ H.

Theorem 1.4 (Sparse Random Graphs)

For any sequence (p_n), random graphs G_n ~ G(n,p_n) can almost always be canonically labeled in polynomial time.

Proof Techniques

Key Lemma 4.1

This is the core technical result, proving that in appropriately randomly perturbed graphs, when S^{≤i}({u,v}) is sufficiently large, vertices u and v are almost always distinguished by color refinement.

Proof Strategy

  1. Exploration process: Gradually revealing random edges and studying the evolution of view difference sets
  2. Expansion lemma: Proving that small difference sets grow exponentially
  3. Anti-concentration analysis: Using anti-concentration properties of independent random variables

2-Dimensional Weisfeiler-Leman Algorithm

For finer analysis, introducing the 2-dimensional version:

  • Coloring vertex pairs rather than individual vertices
  • Detecting distance information
  • Providing stronger distinguishing power

Experimental Setup

Theory-Focused Analysis

This paper primarily conducts theoretical analysis, proving algorithm effectiveness through probabilistic methods:

  1. Probabilistic model: Using the Erdős-Rényi random graph model G(n,p)
  2. Asymptotic analysis: Studying behavior as n → ∞
  3. High-probability events: Proving required properties hold with probability 1-o(1)

Complexity Analysis

  • Color refinement: O((n+m)log n) time
  • 2-dimensional algorithm: O(n³log n) time
  • Overall algorithm: Polynomial time

Main Results

Smoothed Analysis Results

  1. Perturbation threshold: Proving that p ≥ (1+ε)log n/n is the threshold for color refinement success
  2. Optimality: This threshold is optimal in a certain sense
  3. Improved algorithm: Using 2-dimensional Weisfeiler-Leman algorithm to reduce the threshold to p ≥ 100/n

Sparse Random Graph Results

  1. Complete characterization: Providing canonical labeling schemes for all densities p
  2. Phase transition phenomenon: Discovering critical phase transitions near p ≈ 1/n
  3. Automorphism groups: Completely describing the automorphism group structure of sparse random graphs

Technical Contributions

  1. New analytical tools: Developing new techniques for analyzing randomly perturbed graphs
  2. Unified framework: Unifying results across different density regimes under one framework
  3. Precise constants: Providing exact constant bounds in multiple results

Historical Development

  1. Classical results: Babai-Erdős-Selkow (1980) establishing foundational theory
  2. Dense case: Bollobás et al. addressing denser random graphs
  3. Sparse case: Linial-Mosheiff addressing partial sparse cases

Smoothed Analysis Background

  1. Spielman-Teng framework: Introducing smoothed analysis to discrete problems
  2. Graph algorithm applications: Previous applications to graph coloring, matching, and other problems
  3. This paper's contribution: First systematic application of smoothed analysis to graph isomorphism

Algorithmic Complexity

  1. Babai's breakthrough: Quasipolynomial-time algorithm
  2. Practical algorithms: Individualization-refinement paradigm
  3. Theory and practice: Theoretical work explaining practical algorithm effectiveness

Conclusions and Discussion

Main Conclusions

  1. Theoretical explanation: Providing theoretical explanation for the practical effectiveness of color refinement algorithms
  2. Power of perturbation: Demonstrating the significant impact of minor random perturbations
  3. Complete picture: Providing a complete theoretical landscape for random graph isomorphism

Limitations

  1. Perturbation requirements: Still requiring a certain amount of random perturbation
  2. Constant optimization: Some constants may not be optimal
  3. Practical application: Further work needed to translate theoretical results to practical algorithms

Future Directions

  1. Perturbation models: Considering other types of random perturbations
  2. Algorithm improvement: Developing more efficient practical algorithms
  3. Generalized applications: Applying techniques to other graph algorithm problems

In-Depth Evaluation

Strengths

  1. Theoretical depth: Providing profound theoretical insights explaining an important practical phenomenon
  2. Technical innovation: Developing multiple new analytical techniques, particularly view analysis and sprinkling methods
  3. Completeness: Providing a relatively complete theoretical landscape for a classical problem
  4. Precision: Providing exact thresholds and constants in multiple results

Technical Contributions

  1. Methodology: Successfully applying smoothed analysis to discrete structure problems
  2. Analytical tools: Systematic use of disparity graphs, views, 2-dimensional Weisfeiler-Leman and other concepts
  3. Probabilistic techniques: Clever combination of expansion properties and anti-concentration inequalities

Shortcomings

  1. Complexity: Proof techniques are relatively complex with room for improved readability
  2. Practicality: Transition from theoretical results to practical algorithms is not sufficiently direct
  3. Constant optimization: Some technical constants may have room for improvement

Impact Assessment

  1. Academic impact: Making important contributions to graph isomorphism and random graph theory
  2. Methodological impact: Demonstrating application of smoothed analysis in discrete mathematics
  3. Practical potential: Providing theoretical guidance for developing better graph isomorphism algorithms

Applicable Scenarios

  1. Theoretical research: Graph isomorphism complexity and random graph theory research
  2. Algorithm design: Inspiring new graph isomorphism algorithm designs
  3. Related problems: Techniques potentially applicable to other graph algorithm problems

Technical Details Supplement

Key Inequalities

The paper employs several important probabilistic inequalities:

  • Chernoff bounds for concentration analysis
  • Erdős-Littlewood-Offord type anti-concentration inequalities
  • Precise estimates of modal probabilities

Graph-Theoretic Structures

  • Properties and computation of k-cores
  • Bare paths and core structures
  • Evolution of connected components

Algorithm Complexity

Detailed analysis of time complexity for each algorithm component, proving overall polynomial-time properties.


This paper makes important contributions to theoretical research on the graph isomorphism problem, particularly in explaining the effectiveness of practical algorithms and perfecting random graph theory. Although the techniques are relatively complex, it provides new perspectives and profound insights into this classical problem.