2025-11-17T05:52:13.175509

Accelerated Evolving Set Processes for Local PageRank Computation

Huang, Luo, Xiao et al.
This work proposes a novel framework based on nested evolving set processes to accelerate Personalized PageRank (PPR) computation. At each stage of the process, we employ a localized inexact proximal point iteration to solve a simplified linear system. We show that the time complexity of such localized methods is upper bounded by $\min\{\tilde{\mathcal{O}}(R^2/ε^2), \tilde{\mathcal{O}}(m)\}$ to obtain an $ε$-approximation of the PPR vector, where $m$ denotes the number of edges in the graph and $R$ is a constant defined via nested evolving set processes. Furthermore, the algorithms induced by our framework require solving only $\tilde{\mathcal{O}}(1/\sqrtα)$ such linear systems, where $α$ is the damping factor. When $1/ε^2\ll m$, this implies the existence of an algorithm that computes an $\ epsilon $-approximation of the PPR vector with an overall time complexity of $\tilde{\mathcal{O}}\left(R^2 / (\sqrtαε^2)\right)$, independent of the underlying graph size. Our result resolves an open conjecture from existing literature. Experimental results on real-world graphs validate the efficiency of our methods, demonstrating significant convergence in the early stages.
academic

Accelerated Evolving Set Processes for Local PageRank Computation

Basic Information

  • Paper ID: 2510.08010
  • Title: Accelerated Evolving Set Processes for Local PageRank Computation
  • Authors: Binbin Huang, Baojian Zhou, Luo Luo, Deqing Yang, Yanghua Xiao (Fudan University)
  • Category: cs.LG
  • Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
  • Paper Link: https://arxiv.org/abs/2510.08010v2

Abstract

This paper proposes a novel framework based on nested evolving set processes to accelerate personalized PageRank (PPR) computation. At each stage, localized inexact proximal point iterations are employed to solve simplified linear systems. The research demonstrates that the time complexity upper bound for such localized methods is min{O~(R2/ε2),O~(m)}\min\{\tilde{\mathcal{O}}(R^2/ε^2), \tilde{\mathcal{O}}(m)\} to obtain an ε-approximation of the PPR vector, where m denotes the number of edges in the graph and R is a constant defined through nested evolving set processes. The framework-induced algorithm requires solving only O~(1/α)\tilde{\mathcal{O}}(1/\sqrt{α}) such linear systems, where α is the damping factor. When 1/ε2m1/ε^2 \ll m, this implies an algorithm that computes an ε-approximation of the PPR vector with total time complexity O~(R2/(αε2))\tilde{\mathcal{O}}(R^2/(\sqrt{α}ε^2)), independent of the underlying graph size.

Research Background and Motivation

Problem Definition

The personalized PageRank (PPR) vector π ∈ ℝⁿ is defined as:

(I - (1-α)(I + AD⁻¹)/2)π = αeₛ

where eₛ is the standard basis vector corresponding to source node s, α ∈ (0,1) is the damping factor, and A and D are the adjacency matrix and degree matrix of undirected graph G(V,E), respectively.

Research Motivation

  1. Importance: PPR is a core tool for graph analysis, widely applied in local clustering, diffusion process modeling, graph neural networks, and other domains
  2. Existing Limitations:
    • Standard methods such as APPR have time complexity O(1/(αε))
    • Accelerated methods face challenges where momentum terms destroy residual monotonicity
    • Existing accelerated methods may access the entire graph, resulting in O(m/√α) time complexity
  3. Open Question: Whether there exists a localized accelerated method with time complexity dependent on 1/√α rather than 1/α

Core Contributions

  1. AESP Framework: Proposes the Accelerated Evolving Set Processes (AESP) framework, using O~(1/α)\tilde{O}(1/\sqrt{α}) short evolving set processes rather than a single long process
  2. Theoretical Guarantees: Establishes time complexity of O~(vol(St)/(αγt))\tilde{O}(\text{vol}(S_t)/(\sqrt{α}γ_t)), matching the conjectured acceleration bounds in existing literature
  3. Locality Guarantees: Proves an upper bound on vol(St)/γt\text{vol}(S_t)/γ_t of min{O(R2/ε2),2m}\min\{O(R^2/ε^2), 2m\}, achieving graph-size-independent complexity when 1/ε2m1/ε^2 \ll m
  4. Experimental Validation: Verifies method effectiveness on large-scale real graphs, demonstrating significant acceleration in early stages

Methodology Details

Task Definition

Given precision parameter ε, design a local algorithm to compute ε-approximation π̂ satisfying D1(π^π)ε\|D^{-1}(π̂ - π)\|_∞ ≤ ε, while avoiding access to the entire graph.

Core Idea: Nested Evolving Set Processes

1. Problem Reformulation

Reformulate the linear system as a strongly convex optimization problem:

min f(x) = (1/2)x⊤Qx - αx⊤D^(-1/2)b

where Q = ((1+α)/2)I - ((1-α)/2)D^(-1/2)AD^(-1/2), with eigenvalues λ(Q) ∈ α,1.

2. Nested ESP Definition

In outer loop iteration t, local solver M maintains a sequence of active sets {S_t^(k)}_{k≥0} on inner loop iterations k, with updates restricted to nodes in the active set.

3. AESP Update Rule

x^(t) = M(φ_t, y^(t-1), η, α, b, G)
y^(t) = x^(t) + β_t(x^(t) - x^(t-1))

where β_t is the momentum weight and M is the local operator.

Localized Inexact Proximal Operators

LOCGD (Local Gradient Descent)

z_t^(k+1) = z_t^(k) - (2∇h_t(z_t^(k)) ∘ 1_{S_t^k})/(1 + α + 2η)

Active node set: Stk={u:uht1/2(zt(k))εt}S_t^k = \{u : |∇_u h_t^{-1/2}(z_t^{(k)})| ≥ ε_t\}

LOCAPPR (Local APPR)

Coordinate-level updates for uiStku_i ∈ S_t^k:

z_t^(k_{i+1}) = z_t^(k_i) - (2∇h_t(z_t^{(k_i)}) ∘ 1_{u_i})/(1 + α + 2η)

Technical Innovations

  1. Monotonicity Preservation: By solving regularized PPR linear systems with constant condition number, ensures monotonic decrease of ℓ₁ norm of D^{1/2}-scaled gradients
  2. Locality Control: Introduces constant R to bound gradient ratios in nested ESP processes
  3. Acceleration-Locality Balance: Achieves trade-off between condition number 1/α dependency and per-round time complexity O(R²/ε²)

Experimental Setup

Datasets

Experiments use 19 real-world graphs ranging from small to ultra-large scale:

  • Medium-scale: com-dblp (317K nodes, 1M edges)
  • Large-scale: ogb-mag240m (244M nodes, 1.7B edges), ogbn-papers100M (111M nodes, 1.6B edges)
  • Others: com-friendster, wiki-en21, etc.

Evaluation Metrics

  • Error Measure: logD1(π^π)\log \|D^{-1}(π̂-π)\|_∞
  • Efficiency Measure: Number of operations, running time
  • Acceleration Ratio: Performance improvement relative to baseline methods

Comparison Methods

  • APPR: Standard approximate personalized PageRank algorithm
  • APPR-opt: APPR with optimal step size
  • LOCGD: Local gradient descent
  • ASPR: Accelerated sparse personalized PageRank
  • FISTA: Fast iterative shrinkage-thresholding algorithm

Implementation Details

  • Damping factor: α = 0.01, 0.1
  • Precision threshold: ε = 10⁻⁶
  • Random selection of 5 source nodes for testing
  • Python implementation with numba acceleration

Experimental Results

Main Results

Large-scale Graph Performance

On four large-scale graphs (ogb-mag240m, ogbn-papers100M, com-friendster, wiki-en21):

  • AESP-LOCAPPR performs best, benefiting from online coordinate updates
  • Early-stage Acceleration: AESP methods achieve faster convergence than baseline methods in early stages
  • Operation Reduction: 2-10x fewer operations compared to APPR

α Dependency Analysis

When α is small, AESP significantly accelerates standard local methods:

  • Testing in range α ∈ 10⁻³, 10⁻¹
  • Acceleration ratio increases as α decreases, validating √α dependency
  • Both running time and operation count are significantly reduced

Initialization Strategy Comparison

Comparison of three initialization strategies for z_t^(0):

  1. Cold Start: z_t^(0) = 0
  2. Previous Estimate: z_t^(0) = x^(t-1)
  3. Momentum Initialization: z_t^(0) = y^(t-1) (recommended)

Momentum initialization strategy performs best, requiring the fewest outer loop iterations.

Ablation Studies

Constant R Analysis

  • R remains a small constant (1.0-1.4) across 19 graphs
  • R is insensitive to graph size and condition number
  • Validates the reasonableness of bounded R assumption in theoretical analysis

vol(S_t)/γ_t Ratio Analysis

Experiments verify the theoretical bound vol(St)/γtmin{Ct0/εt,2m}\text{vol}(S_t)/γ_t ≤ \min\{C_t^0/ε_t, 2m\}.

PPR Computation Methods

  • Iterative Methods: Conjugate gradient, Chebyshev methods, complexity O(m/√α)
  • Local Methods: APPR (O(1/(αε))), variational methods (O(1/((α+μ²)ε)))
  • Acceleration Attempts: FISTA, linear coupling destroy monotonicity, cannot guarantee locality

Evolving Set Processes

  • Concept proposed by Morris and Peres for analyzing mixing times
  • Zhou et al.'s localized Chebyshev method, but lacking convergence guarantees

Accelerated Optimization

  • Catalyst framework: inexact accelerated proximal point methods
  • This paper adapts it to local methods while preserving monotonicity

Conclusions and Discussion

Main Conclusions

  1. Theoretical Breakthrough: First to achieve provably local acceleration for PPR computation, improving time complexity from O(1/α) to O(1/√α)
  2. Practicality: When ε ≥ 1/√m, algorithm complexity is independent of graph size
  3. Generality: Framework extends to variational forms and other related problems

Limitations

  1. Constant R Bound: Cannot universally bound R by graph size or input parameters
  2. ε² Dependency: When ε < 1/√m, local bound degrades to O(m/√α)
  3. Theory-Practice Gap: Conservative estimates of ε_t may lead to suboptimal bounds

Future Directions

  1. Improved Bounds: Explore whether conjectured O(1/(√αε)) complexity is achievable
  2. Eliminate R Dependency: Remove constant R through constraints or adaptive restarts
  3. Extended Applications: Apply techniques to bidirectional PPR, single-source PPR estimation, etc.

In-Depth Evaluation

Strengths

  1. Significant Theoretical Contribution: Resolves open conjecture in literature, establishes rigorous theoretical guarantees
  2. Strong Method Innovation: Cleverly combines Catalyst framework with local evolving set processes
  3. Comprehensive Experiments: Validation on 19 graphs of varying scales, from small to ultra-large
  4. Clear Presentation: Rigorous mathematical derivations, detailed algorithm descriptions

Weaknesses

  1. Practical Limitations: Advantages not apparent when ε is very small; R bounding issue affects practical application
  2. Implementation Complexity: Nested structure and parameter tuning increase implementation difficulty
  3. Incomplete Comparisons: Lacks detailed comparison with latest acceleration methods

Impact

  1. Academic Value: Provides new insights for graph algorithm acceleration, significant theoretical importance
  2. Practical Value: Potential applications in large-scale graph analysis, recommendation systems, etc.
  3. Reproducibility: Code publicly available, experimental setup detailed

Applicable Scenarios

  1. Large-scale Graph Analysis: Especially when precision requirements are not extreme (ε ≥ 1/√m)
  2. Real-time Recommendation Systems: Requires fast computation of local importance scores
  3. Graph Neural Networks: As preprocessing step for computing node importance

References

This paper cites 52 relevant references covering multiple domains including PPR computation, accelerated optimization, and local algorithms, providing solid theoretical foundation for the research.


Overall Assessment: This is a high-quality paper balancing theory and practice, achieving significant progress on the important problem of PPR computation acceleration. Despite some theoretical limitations, its innovation and practicality make it an important contribution to the field.