2025-11-18T02:13:13.860390

Planted clique recovery in random geometric graphs

Avrachenkov, Bobu, Litvak et al.
We investigate the problem of identifying planted cliques in random geometric graphs, focusing on two distinct algorithmic approaches: the first based on vertex degrees (VD) and the other on common neighbors (CN). We analyze the performance of these methods under varying regimes of key parameters, namely the average degree of the graph and the size of the planted clique. We demonstrate that exact recovery is achieved with high probability as the graph size increases, in a specific set of parameters. Notably, our results reveal that the CN-algorithm significantly outperforms the VD-algorithm. In particular, in the connectivity regime, tiny planted cliques (even edges) are correctly identified by the CN-algorithm, yielding a significant impact on anomaly detection. Finally, our results are confirmed by a series of numerical experiments, showing that the devised algorithms are effective in practice.
academic

Planted clique recovery in random geometric graphs

Basic Information

  • Paper ID: 2510.12365
  • Title: Planted clique recovery in random geometric graphs
  • Authors: Konstantin Avrachenkov, Andrei Bobu, Nelly Litvak, Riccardo Michielan
  • Classification: math.PR (Probability Theory), cs.DS (Data Structures and Algorithms)
  • Publication Date: October 15, 2025
  • Paper Link: https://arxiv.org/abs/2510.12365

Abstract

This paper investigates the problem of planted clique recovery in random geometric graphs, focusing on two distinct algorithmic approaches: the vertex degree (VD) method and the common neighbors (CN) method. The authors analyze the performance of these methods across different parameter regimes, including the average degree of the graph and the size of the planted clique. The research demonstrates that under specific parameter sets, exact recovery can be achieved with high probability as the graph size increases. Notably, the CN algorithm significantly outperforms the VD algorithm. Particularly within the connectivity regime, the CN algorithm can correctly identify minute planted cliques (even single edges), which has important implications for anomaly detection. Finally, numerical experiments validate the theoretical results, demonstrating the practical effectiveness of the designed algorithms.

Research Background and Motivation

Problem Definition

The planted clique problem is a classical problem in graph theory, originally formulated for Erdős-Rényi random graphs. The problem can be formalized as follows: given a random graph, randomly select k vertices and force them to form a complete subgraph (clique), then design a polynomial-time algorithm to identify this planted clique.

Research Significance

  1. Practical Application Value: Planted clique detection has important applications in multiple domains, including:
    • Community detection in social networks
    • Identification of functional modules in biological networks
    • Network anomaly detection
    • Detection of hidden information in steganography
  2. Theoretical Importance: Random geometric graphs better simulate real-world networks compared to Erdős-Rényi graphs due to their clustering tendency and spatial structure properties.

Limitations of Existing Methods

  • Classical vertex degree-based algorithms (VD algorithm) require planted clique size k = Ω(√n log n) to succeed in Erdős-Rényi graphs
  • Lack of systematic research on the planted clique problem for random geometric graphs
  • Existing methods struggle to detect small-scale planted structures

Research Motivation

The authors argue that the geometric structure of random geometric graphs makes the detection of artificial structures (such as planted cliques) more effective than in Erdős-Rényi graphs, and may overcome the theoretical limitations of traditional algorithms.

Core Contributions

  1. Theoretical Analysis of VD Algorithm: First systematic analysis of the recovery threshold of vertex degree-based algorithms in random geometric graphs, determining the parameter regimes for algorithm success.
  2. Proposal and Analysis of CN Algorithm: Introduction of a polynomial-time algorithm based on common neighbors and proof of its effectiveness over a broader range of parameter regimes.
  3. Breakthrough Theoretical Results: Proof that the CN algorithm can recover extremely small planted cliques, even planted edges (k=2 case), which is impossible in Erdős-Rényi graphs.
  4. Experimental Validation: Numerical experiments verify the theoretical results, demonstrating the practical effectiveness of the algorithms.

Methodology Details

Task Definition

Input: Random geometric graph G_k(n,r_n) containing a planted clique of size k Output: Accurately identify the vertex set of the planted clique K Objective: Achieve exact recovery, i.e., lim_{n→∞} P(K_n = K̂_n) = 1

Random Geometric Graph Model

Construction of random geometric graph G(n,r_n):

  • Vertex positions: X_i uniformly distributed on the d-dimensional unit torus 0,1^d
  • Edge rule: Vertices i and j are connected if and only if d_T(X_i, X_j) ≤ r_n
  • Average degree: μ_n = nφ_d r_n^d, where φ_d is the volume of the d-dimensional unit ball

VD Algorithm (Vertex Degree Algorithm)

Algorithm Procedure:

  1. Compute the degree Z_i = |N(i)| for all vertices
  2. Select the k vertices with the largest degrees as the estimate of the planted clique

Theoretical Results:

  • Positive Result (Theorem 2): When k > (1+ε)(T(n)-t(n)), the VD algorithm successfully recovers the planted clique with high probability
  • Negative Result (Theorem 3): In certain parameter regimes, the VD algorithm necessarily fails

CN Algorithm (Common Neighbors Algorithm)

Algorithm Procedure:

  1. Iterate over all edges (i,j) ∈ E
  2. Check whether i and j have exactly k-2 common neighbors
  3. Verify whether these k-2 common neighbors form a clique
  4. If conditions are satisfied, return the k-clique consisting of {i,j} and its common neighbors

Core Idea: Exploit the geometric structure properties of random geometric graphs. As shown in Figure 1, naturally formed edges have common neighbors distributed in two disjoint regions R₁ and R₂, where vertices in these regions cannot connect to each other and thus cannot form a clique. In contrast, edges within the planted clique are not subject to this constraint.

Technical Innovations

  1. Geometric Structure Exploitation: The CN algorithm cleverly utilizes the spatial constraints of random geometric graphs
  2. Threshold Breakthrough: The CN algorithm can detect planted cliques much smaller than natural clique sizes
  3. Parameter Regime Extension: Compared to the VD algorithm, the CN algorithm is effective over a broader μ-k parameter plane

Experimental Setup

Experimental Parameters

  • Graph size: n = 10⁴
  • Average degree: μ ∈ {1, 5, 20}
  • Planted clique size: k varies to larger values
  • Number of repetitions: 1000 independent experiments for each parameter combination

Evaluation Metrics

Success rate: Proportion of experimental runs where the algorithm correctly recovers the planted clique

Comparison Methods

Direct comparison between VD algorithm and CN algorithm

Experimental Results

Main Results

Experimental results (Figure 3) fully validate the theoretical predictions:

  1. When μ = 1: Both algorithms perform similarly, succeeding at larger k values
  2. When μ = 5: CN algorithm begins to show advantages, recovering smaller planted cliques
  3. When μ = 20: CN algorithm significantly outperforms VD algorithm, nearly recovering all tested planted clique sizes

Key Findings

  • CN algorithm outperforms VD algorithm in all tested scenarios
  • As average degree μ increases, VD algorithm performance degrades while CN algorithm remains stable
  • CN algorithm successfully detects planted edges (k=2), providing experimental verification of theoretical results

Theoretical Analysis

VD Algorithm Analysis

Success Condition: min_{i∈K} Z_i > max_{i∈V\K} Z_i

Through analysis of the asymptotic behavior of maximum degree Δ_n and minimum degree δ_n in random geometric graphs:

  • When α = μ_n/log(n) ∈ [0,∞): requires k > (1+ε)(T(n)-t(n))
  • When α = ∞: requires k > εμ_n

CN Algorithm Analysis

Failure Condition Analysis: Algorithm fails if and only if one of the following events occurs:

  • Event A: All edge pairs in the planted clique have common neighbors outside the clique
  • Event B₁∩B₂: There exists an edge outside the clique with exactly k-2 common neighbors that form a clique

Success Regime (Theorem 4):

  1. When k_n ≤ αn and μ_n ne^{-c₁,d μ_n} = o(1)
  2. Or when more complex conditions (8) are satisfied

Classical Planted Clique Problem

  • Kučera (1995): First proposed the VD algorithm, applicable when k = Ω(√n log n)
  • Alon et al. (1998): Proved existence of polynomial algorithms succeeding when k > c√n

Random Geometric Graph Research

  • Asymptotic behavior of clique numbers (McDiarmid, Penrose, et al.)
  • Application domains: social networks, biological networks, machine learning

Contributions of This Paper

First extension of the planted clique problem to random geometric graphs, discovering the advantages brought by geometric structure.

Conclusions and Discussion

Main Conclusions

  1. CN algorithm significantly outperforms traditional VD algorithm in random geometric graphs
  2. Geometric structure enables detection of extremely small planted cliques (even planted edges)
  3. Theoretical results are fully validated by experiments

Limitations

  1. Analysis limited to hard random geometric graph models
  2. Theoretical guarantees for certain parameter regimes remain incomplete
  3. Algorithm complexity may be high: CN algorithm worst-case complexity is O(μ_n n(n + k²))

Future Directions

  1. Extension to soft random geometric graphs (e.g., Waxman graphs)
  2. Investigation of performance in high-dimensional settings
  3. Consideration of geometrically-defined planted cliques (e.g., all vertices within a circular region)
  4. Optimization of algorithm complexity and practical implementation

In-Depth Evaluation

Strengths

  1. Theoretical Innovation: First systematic study of planted clique problem in random geometric graphs, filling an important theoretical gap
  2. Method Superiority: CN algorithm demonstrates breakthrough performance, capable of detecting minute structures
  3. Analysis Depth: Provides comprehensive theoretical analysis framework, including both positive and negative results
  4. Experimental Validation: High consistency between theory and experiments, enhancing result credibility

Weaknesses

  1. Model Limitations: Only considers hard random geometric graphs; real networks may be more complex
  2. Theoretical Gaps: Theoretical guarantees incomplete for certain parameter regimes (beige region in Figure 2)
  3. Algorithm Complexity: CN algorithm has high complexity, potentially limiting practical applications
  4. Dimension Limitations: Analysis primarily focuses on low-dimensional cases

Impact

  1. Academic Value: Provides new perspectives for random graph theory and algorithm design
  2. Application Prospects: Potential applications in network anomaly detection, community discovery, and related fields
  3. Theoretical Significance: Demonstrates the importance of geometric structure in graph algorithms

Applicable Scenarios

  1. Network Security: Detection of anomalous connection patterns in networks
  2. Social Network Analysis: Identification of artificially constructed false communities
  3. Bioinformatics: Discovery of functional modules in protein interaction networks
  4. Data Mining: Detection of anomalous patterns in spatially-structured data

References

The paper cites 24 important references covering classical works in random graph theory, algorithm design, network analysis, and other related fields, providing a solid theoretical foundation for the research.


Overall Assessment: This is a high-quality paper with important contributions in both theory and practice. By extending the classical planted clique problem to random geometric graphs, the authors not only fill a theoretical gap but also discover significant advantages brought by geometric structure. The superior performance and theoretical guarantees of the CN algorithm make it highly promising for practical applications.