2025-11-25T10:28:17.626083

Smoothed analysis for graph isomorphism

Anastos, Kwan, Moore

There is no known polynomial-time algorithm for graph isomorphism testing, but elementary combinatorial "refinement" algorithms seem to be very efficient in practice. Some philosophical justification is provided by a classical theorem of Babai, ErdÅs and Selkow: an extremely simple polynomial-time combinatorial algorithm (variously known as "naÃ¯ve refinement", "naÃ¯ve vertex classification", "colour refinement" or the "1-dimensional Weisfeiler-Leman algorithm") yields a so-called canonical labelling scheme for "almost all graphs". More precisely, for a typical outcome of a random graph $G(n,1/2)$, this simple combinatorial algorithm assigns labels to vertices in a way that easily permits isomorphism-testing against any other graph. We improve the Babai-ErdÅs-Selkow theorem in two directions. First, we consider randomly perturbed graphs, in accordance with the smoothed analysis philosophy of Spielman and Teng: for any graph $G$, naÃ¯ve refinement becomes effective after a tiny random perturbation to $G$ (specifically, the addition and removal of $O(n\log n)$ random edges). Actually, with a twist on naÃ¯ve refinement, we show that $O(n)$ random additions and removals suffice. These results significantly improve on previous work of Gaudio-RÃ¡cz-Sridhar, and are in certain senses best-possible. Second, we complete a long line of research on canonical labelling of random graphs: for any $p$ (possibly depending on $n$), we prove that a random graph $G(n,p)$ can typically be canonically labelled in polynomial time. This is most interesting in the extremely sparse regime where $p$ has order of magnitude $c/n$; denser regimes were previously handled by BollobÃ¡s, Czajka-Pandurangan, and Linial-Mosheiff. Our proof also provides a description of the automorphism group of a typical outcome of $G(n,p_n)$ (slightly correcting a prediction of Linial-Mosheiff).

academic

Smoothed analysis for graph isomorphism

Basic Information

Paper ID: 2410.06095
Title: Smoothed analysis for graph isomorphism
Authors: Michael Anastos, Matthew Kwan, Benjamin Moore
Classification: math.CO cs.CC cs.DS
Publication Date: October 2024
Paper Link: https://arxiv.org/abs/2410.06095v3

Abstract

The graph isomorphism testing problem lacks a known polynomial-time algorithm, yet basic combinatorial "refinement" algorithms perform remarkably efficiently in practice. The classical theorem of Babai, Erdős, and Selkow provides a philosophical explanation: an extremely simple polynomial-time combinatorial algorithm (called "naive refinement," "naive vertex classification," "color refinement," or "1-dimensional Weisfeiler-Leman algorithm") provides a canonical labeling scheme for "almost all graphs."

This paper improves the Babai-Erdős-Selkow theorem in two directions: first, by considering randomly perturbed graphs according to the smoothed analysis framework of Spielman and Teng; second, by completing a long-standing research line on canonical labelings of random graphs.

Research Background and Motivation

Problem Background

Importance of the graph isomorphism problem: Graph isomorphism testing is a central problem in computational complexity theory, occupying a special position between P and NP-complete
Gap between practice and theory: Despite requiring exponential time in the worst case, the color refinement algorithm performs excellently in practice
Limitations of the Babai-Erdős-Selkow theorem: The classical theorem applies only to random graphs G(n,1/2) and performs poorly on structured graphs

Research Motivation

Application of smoothed analysis: Applying the Spielman-Teng smoothed analysis framework to the graph isomorphism problem
Extending applicability: Proving that minor random perturbations enable the color refinement algorithm to work on arbitrary graphs
Perfecting the theoretical framework: Providing complete canonical labeling theory for random graphs of all densities

Core Contributions

Smoothed analysis results: Proving that after O(n log n) random edge perturbations to an arbitrary graph G₀, the color refinement algorithm almost always succeeds
Improved perturbation bounds: Through a modified algorithm, reducing the required perturbations to O(n) random edges
Complete theory for sparse random graphs: Providing polynomial-time canonical labeling schemes for random graphs G(n,p) of arbitrary density p
Characterization of automorphism groups: Describing the automorphism group structure of typical random graphs, correcting predictions by Linial-Mosheiff

Detailed Methodology

Task Definition

Given two n-vertex graphs G₁ and G₂, the graph isomorphism problem requires determining whether there exists a bijection between vertex sets preserving the adjacency relations. Canonical labeling is a method for assigning a standard form to each graph such that isomorphic graphs have identical labelings.

Core Algorithm: Color Refinement

Basic Framework

The color refinement algorithm is an iterative process:

Initialization: All vertices are assigned the same color
Refinement step: Update each vertex's color based on the color distribution of its neighbors
Stabilization: Repeat until the color assignment no longer changes

Mathematical Description

For a graph G and coloring c : V(G) → Ω, the refinement operation is defined as:

R_G c(v) = (c(v), (d_ω(v))_{ω∈Ω})

where d_ω(v) is the number of neighbors of vertex v with color ω.

Views and Universal Covers

The algorithm's effectiveness is analyzed through the concept of "views":

View T_G(v) encodes all possible paths starting from vertex v
Two vertices have the same color if and only if their views are isomorphic

Technical Innovations

1. Expansion and Anti-Concentration Techniques

Expansion properties: Utilizing strong expansion properties of random graphs to prove that small vertex sets grow rapidly
Anti-concentration inequalities: Applying Erdős-Littlewood-Offord type inequalities to control random fluctuations

2. Core Structure Analysis

k-core: The k-core of a graph is the maximal subgraph with minimum degree at least k
Special properties of 2-cores: In 2-cores, vertices with degree at least 3 can typically be distinguished by color refinement

3. Sprinkling Technique

Decomposing random perturbations into multiple independent sparse perturbations:

Each round of perturbation assigns unique colors to new vertices
Utilizing monotonicity to progressively improve graph properties

4. Disparity Graph

Defining disparity graph D(G,c) to analyze the effectiveness of color refinement:

Capturing mismatches between graph structure and color classes
Small connected components corresponding to effective canonical labelings

Main Theorems

Theorem 1.2 (Smoothed Analysis - Basic Version)

For constant ε > 0 and p ≥ (1+ε)log n/n, for any graph G₀ and random graph G_rand ~ G(n,p), the color refinement algorithm almost always distinguishes all vertices of G₀△G_rand.

Theorem 1.3 (Improved Smoothed Analysis)

There exist a graph class H and a polynomial-time canonical labeling algorithm such that for p ≥ 100/n, for any graph G₀ and G_rand ~ G(n,p), almost always G₀△G_rand ∈ H.

Theorem 1.4 (Sparse Random Graphs)

For any sequence (p_n), random graphs G_n ~ G(n,p_n) can almost always be canonically labeled in polynomial time.

Proof Techniques

Key Lemma 4.1

This is the core technical result, proving that in appropriately randomly perturbed graphs, when S^{≤i}({u,v}) is sufficiently large, vertices u and v are almost always distinguished by color refinement.

Proof Strategy

Exploration process: Gradually revealing random edges and studying the evolution of view difference sets
Expansion lemma: Proving that small difference sets grow exponentially
Anti-concentration analysis: Using anti-concentration properties of independent random variables

2-Dimensional Weisfeiler-Leman Algorithm

For finer analysis, introducing the 2-dimensional version:

Coloring vertex pairs rather than individual vertices
Detecting distance information
Providing stronger distinguishing power

Experimental Setup

Theory-Focused Analysis

This paper primarily conducts theoretical analysis, proving algorithm effectiveness through probabilistic methods:

Probabilistic model: Using the Erdős-Rényi random graph model G(n,p)
Asymptotic analysis: Studying behavior as n → ∞
High-probability events: Proving required properties hold with probability 1-o(1)

Complexity Analysis

Color refinement: O((n+m)log n) time
2-dimensional algorithm: O(n³log n) time
Overall algorithm: Polynomial time

Main Results

Smoothed Analysis Results

Perturbation threshold: Proving that p ≥ (1+ε)log n/n is the threshold for color refinement success
Optimality: This threshold is optimal in a certain sense
Improved algorithm: Using 2-dimensional Weisfeiler-Leman algorithm to reduce the threshold to p ≥ 100/n

Sparse Random Graph Results

Complete characterization: Providing canonical labeling schemes for all densities p
Phase transition phenomenon: Discovering critical phase transitions near p ≈ 1/n
Automorphism groups: Completely describing the automorphism group structure of sparse random graphs

Technical Contributions

New analytical tools: Developing new techniques for analyzing randomly perturbed graphs
Unified framework: Unifying results across different density regimes under one framework
Precise constants: Providing exact constant bounds in multiple results

Historical Development

Classical results: Babai-Erdős-Selkow (1980) establishing foundational theory
Dense case: Bollobás et al. addressing denser random graphs
Sparse case: Linial-Mosheiff addressing partial sparse cases

Smoothed Analysis Background

Spielman-Teng framework: Introducing smoothed analysis to discrete problems
Graph algorithm applications: Previous applications to graph coloring, matching, and other problems
This paper's contribution: First systematic application of smoothed analysis to graph isomorphism

Algorithmic Complexity

Babai's breakthrough: Quasipolynomial-time algorithm
Practical algorithms: Individualization-refinement paradigm
Theory and practice: Theoretical work explaining practical algorithm effectiveness

Conclusions and Discussion

Main Conclusions

Theoretical explanation: Providing theoretical explanation for the practical effectiveness of color refinement algorithms
Power of perturbation: Demonstrating the significant impact of minor random perturbations
Complete picture: Providing a complete theoretical landscape for random graph isomorphism

Limitations

Perturbation requirements: Still requiring a certain amount of random perturbation
Constant optimization: Some constants may not be optimal
Practical application: Further work needed to translate theoretical results to practical algorithms

Future Directions

Perturbation models: Considering other types of random perturbations
Algorithm improvement: Developing more efficient practical algorithms
Generalized applications: Applying techniques to other graph algorithm problems

In-Depth Evaluation

Strengths

Theoretical depth: Providing profound theoretical insights explaining an important practical phenomenon
Technical innovation: Developing multiple new analytical techniques, particularly view analysis and sprinkling methods
Completeness: Providing a relatively complete theoretical landscape for a classical problem
Precision: Providing exact thresholds and constants in multiple results

Technical Contributions

Methodology: Successfully applying smoothed analysis to discrete structure problems
Analytical tools: Systematic use of disparity graphs, views, 2-dimensional Weisfeiler-Leman and other concepts
Probabilistic techniques: Clever combination of expansion properties and anti-concentration inequalities

Shortcomings

Complexity: Proof techniques are relatively complex with room for improved readability
Practicality: Transition from theoretical results to practical algorithms is not sufficiently direct
Constant optimization: Some technical constants may have room for improvement

Impact Assessment

Academic impact: Making important contributions to graph isomorphism and random graph theory
Methodological impact: Demonstrating application of smoothed analysis in discrete mathematics
Practical potential: Providing theoretical guidance for developing better graph isomorphism algorithms

Applicable Scenarios

Theoretical research: Graph isomorphism complexity and random graph theory research
Algorithm design: Inspiring new graph isomorphism algorithm designs
Related problems: Techniques potentially applicable to other graph algorithm problems

Technical Details Supplement

Key Inequalities

The paper employs several important probabilistic inequalities:

Chernoff bounds for concentration analysis
Erdős-Littlewood-Offord type anti-concentration inequalities
Precise estimates of modal probabilities

Graph-Theoretic Structures

Properties and computation of k-cores
Bare paths and core structures
Evolution of connected components

Algorithm Complexity

Detailed analysis of time complexity for each algorithm component, proving overall polynomial-time properties.

This paper makes important contributions to theoretical research on the graph isomorphism problem, particularly in explaining the effectiveness of practical algorithms and perfecting random graph theory. Although the techniques are relatively complex, it provides new perspectives and profound insights into this classical problem.