The development of self-supervised graph pre-training methods is a crucial ingredient in recent efforts to design robust graph foundation models (GFMs). Structure-based pre-training methods are under-explored yet crucial for downstream applications which rely on underlying graph structure. In addition, pre-training traditional message passing GNNs to capture global and regional structure is often challenging due to the risk of oversmoothing as network depth increases. We address these gaps by proposing the Laplacian Eigenvector Learning Module (LELM), a novel pre-training module for graph neural networks (GNNs) based on predicting the low-frequency eigenvectors of the graph Laplacian. Moreover, LELM introduces a novel architecture that overcomes oversmoothing, allowing the GNN model to learn long-range interdependencies. Empirically, we show that models pre-trained via our framework outperform baseline models on downstream molecular property prediction tasks.
- Paper ID: 2509.02803
- Title: A Graph Laplacian Eigenvector-based Pre-training Method for Graph Neural Networks
- Authors: Howard Dai, Nyambura Njenga, Hiren Madhu, Siddharth Viswanath, Ryan Pellico, Ian Adelstein, Smita Krishnaswamy
- Classification: cs.LG (Machine Learning)
- Publication Date: October 11, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2509.02803v2
This paper proposes a pre-training method for Graph Neural Networks (GNNs) based on graph Laplacian eigenvectors. To address the scarcity of structure-based pre-training methods for graph foundation models (GFMs), the authors develop the Laplacian Eigenvector Learning Module (LELM), which pre-trains a GNN by having it predict the low-frequency eigenvectors of the graph Laplacian. The method also introduces an architectural design that overcomes the over-smoothing problem, enabling GNN models to learn long-range dependencies. Experiments show that models pre-trained with this framework outperform baseline models on molecular property prediction tasks.
- Scarcity of Structure-based Pre-training Methods: Existing GNN pre-training approaches primarily rely on feature reconstruction and contrastive learning, while pre-training methods based on graph structural properties remain relatively unexplored.
- Over-smoothing Problem: Traditional message-passing GNNs face challenges in capturing global and regional structures, and tend to exhibit over-smoothing phenomena as network depth increases.
- Difficulty in Learning Long-range Dependencies: Existing GNN architectures have limited expressiveness in learning long-range interdependencies within graphs.
- Development of graph foundation models requires effective self-supervised pre-training tasks
- Structure-aware downstream applications require pre-training methods capable of capturing underlying graph structures
- Applications such as molecular property prediction depend on understanding global graph structures
- Contrastive Methods: Primarily use Jensen-Shannon estimators or InfoNCE objectives, lacking direct modeling of structural information
- Prediction Methods: Mostly focus on graph reconstruction tasks, with few methods based on graph property prediction
- Structural Representation Capacity: Existing methods struggle to effectively capture global graph structure information
- Proposes LELM Framework: The first method using graph Laplacian eigenvectors as pre-training targets
- Innovative Architectural Design: Introduces graph-level MLP heads enabling GNNs to capture large-scale structures without requiring excessively deep networks
- Node Feature Enhancement: Proposes enhanced node features based on graph diffusion operators, overcoming GNN expressiveness limitations
- Experimental Validation: Demonstrates method effectiveness on molecular datasets, serving as both a standalone pre-training method and a plug-in for existing pipelines
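Given a graph $G=(V,E)$, the objective is to pre-train a GNN model to predict the $k$ lowest-frequency eigenvectors $\psi_1, \psi_2, \dots, \psi_k$ of the graph Laplacian $L = D - A$, where $D$ is the degree matrix, $A$ is the adjacency matrix, and $L\psi_i = \lambda_i \psi_i$.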
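As a minimal sketch of how such targets could be obtained, the snippet below builds $L = D - A$ for a small graph and takes the eigenvectors with the smallest eigenvalues. The function name `lowest_laplacian_eigenvectors` and the 6-node cycle are illustrative assumptions, not taken from the paper, and whether the constant eigenvector is counted among the $k$ targets is left open here.

```python
import numpy as np

def lowest_laplacian_eigenvectors(adj: np.ndarray, k: int):
    """Return the k smallest eigenvalues and matching eigenvectors of L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(lap)
    return eigvals[:k], eigvecs[:, :k]

# Hypothetical example graph: an undirected cycle on 6 nodes.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

eigvals, eigvecs = lowest_laplacian_eigenvectors(adj, k=3)

# Check the defining relation L psi_i = lambda_i psi_i for every returned column.
lap = np.diag(adj.sum(axis=1)) - adj
residual = np.linalg.norm(lap @ eigvecs - eigvecs * eigvals)
print("eigenvalues:", np.round(eigvals, 6))
print("residual:", residual)
```

For a connected graph the first eigenvalue is 0 with a constant eigenvector; the following ones vary slowly across the graph, which is why they are called low-frequency.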
The LELM framework comprises three core components:
Wavelet Positional Encoding: Encodes relative position information between nodes
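- Randomly select two nodes $i, j$ and construct Dirac signals $\delta_i, \delta_j$
- Apply the wavelet operators $\Psi_k = P^{2^{k-1}} - P^{2^k}$ for scales $k = 1, \dots, J$ (here $k$ indexes the wavelet scale), where $P = D^{-1}A$ is the diffusion operator
- Wavelet positional encoding for node $m$: $w_m = [w_{m,1}, \dots, w_{m,J}]$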
Diffusion Dirac Encoding: Encodes local connectivity structure
- For each node $m$, compute $d_{m,k} = \Psi_k(m,\cdot)\,P(m,\cdot)^T$
- Diffusion Dirac encoding: $d_m = [d_{m,1}, \dots, d_{m,J}]$
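The diffusion Dirac encoding can be sketched in a few lines of NumPy. This sketch assumes dyadic diffusion wavelets, that is, differences of the diffusion operator $P = D^{-1}A$ raised to consecutive powers of two, and a graph without isolated nodes. The function names and the 5-node path graph are hypothetical, and the sketch covers only the diffusion Dirac encoding.

```python
import numpy as np

def diffusion_wavelets(adj: np.ndarray, num_scales: int):
    """Return P = D^{-1} A and the wavelets Psi_k = P^(2^(k-1)) - P^(2^k), k = 1..J."""
    deg = adj.sum(axis=1)
    P = adj / deg[:, None]      # row-stochastic; assumes every node has degree > 0
    wavelets = []
    low = P.copy()              # P^(2^0)
    for _ in range(num_scales):
        high = low @ low        # squaring doubles the exponent
        wavelets.append(low - high)
        low = high
    return P, wavelets

def diffusion_dirac_encoding(adj: np.ndarray, num_scales: int) -> np.ndarray:
    """Entry [m, k-1] is the dot product of row m of Psi_k with row m of P."""
    P, wavelets = diffusion_wavelets(adj, num_scales)
    return np.stack([(psi * P).sum(axis=1) for psi in wavelets], axis=1)

# Hypothetical example graph: a path on 5 nodes, with J = 3 scales.
n, J = 5, 3
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

P, wavelets = diffusion_wavelets(adj, J)
enc = diffusion_dirac_encoding(adj, J)
print(enc.shape)  # one J-dimensional encoding per node
```

Repeated squaring keeps the cost at one matrix product per scale, and the wavelets telescope: their sum equals $P - P^{2^J}$.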
- Base GNN: Processes the enhanced feature graph to generate node representations
- Graph-level Aggregation: Concatenates all node representations into a graph-level vector $Z = [z_1, \dots, z_n] \in \mathbb{R}^{nd}$
- MLP Prediction Head: $\tilde{U} = \mathrm{MLP}(Z)$ outputs predicted eigenvectors
Orthogonality constraints are imposed via QR decomposition: $\hat{U} = \mathrm{QR}(\tilde{U})$
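The aggregation, prediction head, and orthogonalization steps can be illustrated with a rough NumPy sketch. It assumes a fixed node count `n`, random stand-ins for the base-GNN outputs, and an untrained two-layer MLP. The sizes, the weights, and the reading of $\mathrm{QR}(\cdot)$ as the Q factor of a reduced QR decomposition are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, hidden = 8, 4, 3, 32    # hypothetical sizes

# Stand-in for the node representations z_1..z_n produced by the base GNN.
node_repr = rng.normal(size=(n, d))

# Graph-level aggregation: concatenate all node vectors into one vector in R^{nd}.
Z = node_repr.reshape(-1)

# Untrained two-layer MLP with random weights (illustration only).
W1 = rng.normal(size=(n * d, hidden)) / np.sqrt(n * d)
W2 = rng.normal(size=(hidden, n * k)) / np.sqrt(hidden)
U_tilde = (np.maximum(Z @ W1, 0.0) @ W2).reshape(n, k)

# Reduced QR: the Q factor has orthonormal columns spanning the columns of U_tilde.
U_hat, _ = np.linalg.qr(U_tilde)

print(np.round(U_hat.T @ U_hat, 6))
```

Because the head sees the concatenated vector `Z`, every output entry can depend on every node, regardless of how many message-passing layers the base GNN has.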
Loss Function:
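- Energy Loss: $\mathcal{L}_{\text{energy}} = \frac{1}{k}\sum_{i=1}^{k} \hat{u}_i^T L \hat{u}_i$
- Eigenvector Loss: $\mathcal{L}_{\text{eigvec}} = \frac{1}{k}\sum_{i=1}^{k} \lVert L\hat{u}_i - \lambda_i \hat{u}_i \rVert$
- Total Loss: $\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{energy}} + \beta \cdot \mathcal{L}_{\text{eigvec}}$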
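The two loss terms can be written down directly. The sketch below assumes the norm is Euclidean and uses a hypothetical 6-node path graph. It also checks numerically how each term behaves: flipping the sign of an eigenvector changes neither term, while rotating the basis within the same subspace leaves the energy term unchanged but raises the eigenvector term.

```python
import numpy as np

def energy_loss(lap, U_hat):
    """(1/k) * sum_i u_i^T L u_i, computed as a trace."""
    return np.trace(U_hat.T @ lap @ U_hat) / U_hat.shape[1]

def eigvec_loss(lap, U_hat, eigvals):
    """(1/k) * sum_i ||L u_i - lambda_i u_i|| with the Euclidean norm (assumed)."""
    residual = lap @ U_hat - U_hat * eigvals   # column i is L u_i - lambda_i u_i
    return np.linalg.norm(residual, axis=0).sum() / U_hat.shape[1]

def total_loss(lap, U_hat, eigvals, alpha=2.0, beta=1.0):
    return alpha * energy_loss(lap, U_hat) + beta * eigvec_loss(lap, U_hat, eigvals)

# Hypothetical example graph: a path on 6 nodes (all eigenvalues are distinct).
n, k = 6, 3
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
lap = np.diag(adj.sum(axis=1)) - adj
all_vals, all_vecs = np.linalg.eigh(lap)
vals, U = all_vals[:k], all_vecs[:, :k]

# Same eigenvectors, with the sign of the second one flipped.
flipped = U * np.array([1.0, -1.0, 1.0])

# Same subspace, but the first two basis vectors are rotated by 45 degrees.
theta = np.pi / 4
rot = np.eye(k)
rot[:2, :2] = [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
rotated = U @ rot

for name, cand in [("true", U), ("flipped", flipped), ("rotated", rotated)]:
    print(name, energy_loss(lap, cand), eigvec_loss(lap, cand, vals))
```

At the true eigenvectors the eigenvector term is zero and the energy term equals the mean of the $k$ smallest eigenvalues.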
- Graph-level MLP Design: Avoids the problem of node-level MLPs failing to learn long-range interactions
- Eigenvector Objectives: Low-frequency Laplacian eigenvectors naturally encode global, regional, and local graph structures
- Diffusion Operator Enhancement: Provides structural context information, enhancing GNN expressiveness
- Dual Loss Mechanism: Energy loss ensures subspace correctness, eigenvector loss ensures proper ordering
- ZINC-12k: 12,000 molecular graphs
- ZINC-250k: 250,000 molecular graphs
- QM9: 134,000 molecular graphs with multiple quantum chemical properties
- MAE (Mean Absolute Error): Primary evaluation metric
- ROC-AUC: Used for binary classification tasks
- Baseline Models: GIN and GPS models trained without pre-training
- Alternative Pre-training Targets: Node degree, local clustering coefficient, cycle counting, Laplacian eigenvalues
- Existing Pre-training Methods: ContextPred, Masking, etc.
- Pre-training Epochs: 100-200
- Fine-tuning Epochs: 150-500
- Number of Eigenvectors: k=6
- Loss Weights: α=2,β=1 (main experiments)
- Optimizer: Adam
- Learning Rate: 0.001
Performance Comparison on ZINC and QM9 Datasets:
| Model | ZINC full | ZINC subset | QM9 μ | QM9 α | QM9 εHOMO |
|---|---|---|---|---|---|
| GIN + LELM | 0.130 | 0.353 | 0.484 | 0.489 | 0.00353 |
| GIN (baseline) | 0.228 | 0.438 | 0.472 | 1.132 | 0.00386 |
| GPS + LELM | 0.104 | 0.210 | 0.502 | 0.592 | 0.00372 |
| GPS (baseline) | 0.150 | 0.358 | 0.413 | 0.718 | 0.00434 |
LELM lowers the error on four of the five tasks for both GIN and GPS, with notable improvements on both ZINC benchmarks. The exception is QM9 μ, where both pre-trained models trail their baselines.
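To make the size of these differences concrete, the short script below computes the relative change for each column of the table above. It assumes every column reports MAE (lower is better); the dictionary layout and the function name `relative_change` are illustrative.

```python
# Values copied from the performance table above; assumed to be MAE (lower is better).
columns = ["ZINC full", "ZINC subset", "QM9 mu", "QM9 alpha", "QM9 eps_HOMO"]
results = {
    "GIN": {"baseline": [0.228, 0.438, 0.472, 1.132, 0.00386],
            "lelm":     [0.130, 0.353, 0.484, 0.489, 0.00353]},
    "GPS": {"baseline": [0.150, 0.358, 0.413, 0.718, 0.00434],
            "lelm":     [0.104, 0.210, 0.502, 0.592, 0.00372]},
}

def relative_change(baseline: float, pretrained: float) -> float:
    """Percent change in error; a negative value means pre-training reduced it."""
    return 100.0 * (pretrained - baseline) / baseline

changes = {
    model: [relative_change(b, p) for b, p in zip(rows["baseline"], rows["lelm"])]
    for model, rows in results.items()
}

for model, deltas in changes.items():
    for column, delta in zip(columns, deltas):
        print(f"{model:3s} {column:12s} {delta:+6.1f}%")
```

Since no error bars are reported, these percentages describe single reported numbers rather than statistically tested differences.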
Graph-level MLP vs Node-level MLP:
| Model | ZINC full | ZINC subset |
|---|---|---|
| GIN + LELM (graph-level) | 0.130 | 0.353 |
| GIN + LELM (node-level) | 0.152 | 0.435 |
| GPS + LELM (graph-level) | 0.104 | 0.210 |
| GPS + LELM (node-level) | 0.126 | 0.261 |
The graph-level MLP outperforms the node-level MLP on both ZINC benchmarks for both GIN and GPS.
Comparison of Alternative Structural Pre-training Targets:
| Pre-training Target | ZINC full | ZINC subset |
|---|---|---|
| LELM | 0.130 | 0.353 |
| Node Degree | 0.238 | 0.471 |
| Local Clustering Coefficient | 1.493 | 1.551 |
| Cycle Counting | 0.285 | 0.420 |
| Laplacian Eigenvalues | 0.250 | 0.520 |
LELM clearly outperforms other structured pre-training targets.
Adding LELM as a plug-in to existing pre-training pipelines on molecular prediction tasks:
- Masking + LELM: Shows improvements across all 5 datasets
- ContextPred + LELM: Shows improvements on most tasks
- Importance of Graph-level Architecture: Graph-level MLP effectively learns long-range dependencies
- Superiority of Eigenvectors: Laplacian eigenvectors are more suitable for pre-training than other structural targets
- Generalizability: LELM can be combined with existing pre-training methods
- Scalability: The method applies to different GNN architectures (GIN, GPS)
- Contrastive Methods:
  - Graph-node contrast (Deep Graph Infomax, etc.)
  - Subgraph-node contrast (InfoGraph, etc.)
  - Subgraph-subgraph contrast (GraphCL, etc.)
- Prediction Methods:
  - Graph reconstruction (node/edge masking, autoencoders)
  - Property prediction (k-hop connectivity, meta-paths)
- Positional Encoding: Standard positional encoding in graph Transformers
- Spectral Graph Neural Networks: Learning filters in the spectral domain
- Spectral Clustering: Generating low-dimensional embeddings for clustering
- Graph Partitioning: Fiedler vector generating optimal graph cuts
LELM is the first property prediction method to use graph Laplacian eigenvectors as pre-training targets, filling a gap in structure-based pre-training methods.
- Effectiveness Validation: LELM significantly improves GNN performance on molecular property prediction tasks
- Architectural Innovation: Graph-level MLP effectively addresses the over-smoothing problem
- Universal Framework: Can serve as both a standalone method and an enhancement component for existing pipelines
- Theoretical Guarantees: Loss function possesses necessary sign and basis invariance properties
- Unexplored Transfer Learning Capability: Currently validated only on same or related domain datasets
- Computational Complexity: Requires Laplacian eigendecomposition, which may be challenging for large graphs
- Cross-domain Generalization: Effects on synthetic or cross-domain datasets remain unknown
- Statistical Significance: Error bars not reported due to computational cost constraints
- Cross-domain Pre-training: Explore pre-training effects on synthetic or cross-domain datasets
- Large-scale Applications: Investigate scalability on larger-scale graphs
- Theoretical Analysis: Provide deeper analysis of why Laplacian eigenvectors are good pre-training targets
- Architecture Optimization: Further optimize graph-level MLP design
- Strong Novelty: First use of Laplacian eigenvectors as GNN pre-training targets
- Solid Theoretical Foundation: Laplacian eigenvectors have deep theoretical foundations in graph theory
- Clever Architectural Design: Graph-level MLP effectively addresses long-range dependency learning
- Comprehensive Experiments: Includes multiple comparative, ablation, and enhancement experiments
- Good Generalizability: Compatible with different GNN architectures and existing pre-training methods
- Limited Application Domains: Primarily validated on molecular data; effects on other graph types unknown
- Computational Overhead: Eigendecomposition computational cost may limit large-scale applications
- Hyperparameter Sensitivity: Lack of systematic analysis for hyperparameter selection (e.g., loss weights)
- Insufficient Theoretical Explanation: Lacks deep theoretical analysis of why the method is effective
- Academic Value: Provides new research directions for graph pre-training
- Practical Value: Potential value in practical applications such as molecular property prediction
- Reproducibility: Complete code and experimental settings provided
- Inspirational Value: May inspire more pre-training methods based on graph spectral properties
- Molecular Property Prediction: Validated effective application scenario
- Social Network Analysis: Tasks requiring understanding of global structure
- Knowledge Graphs: Graph reasoning tasks where structural information is important
- Biological Networks: Biological applications such as protein interaction networks
The paper cites multiple important related works, including:
- Hu et al. (2019): "Strategies for pre-training graph neural networks" - Classical work on graph pre-training
- Shaham et al. (2018): "SpectralNet" - Neural network approach to spectral clustering
- Dwivedi et al. (2021): "Graph neural networks with learnable structural and positional representations" - Structural positional representation learning
- Rampášek et al. (2022): "Recipe for a general, powerful, scalable graph transformer" - GPS architecture
Overall Assessment: This is a high-quality research paper proposing an innovative graph neural network pre-training method. While there is room for improvement in certain aspects, its core ideas are novel, experimental validation is comprehensive, and it makes important contributions to the graph pre-training field. The method's generalizability and extensibility demonstrate good application prospects.