2025-11-23T22:46:17.287043

Beyond Single-Granularity Prompts: A Multi-Scale Chain-of-Thought Prompt Learning for Graph

Zheng, Yang, Guan et al.

The "pre-train, prompt'' paradigm, designed to bridge the gap between pre-training tasks and downstream objectives, has been extended from the NLP domain to the graph domain and has achieved remarkable progress. Current mainstream graph prompt-tuning methods modify input or output features using learnable prompt vectors. However, existing approaches are confined to single-granularity (e.g., node-level or subgraph-level) during prompt generation, overlooking the inherently multi-scale structural information in graph data, which limits the diversity of prompt semantics. To address this issue, we pioneer the integration of multi-scale information into graph prompt and propose a Multi-Scale Graph Chain-of-Thought (MSGCOT) prompting framework. Specifically, we design a lightweight, low-rank coarsening network to efficiently capture multi-scale structural features as hierarchical basis vectors for prompt generation. Subsequently, mimicking human cognition from coarse-to-fine granularity, we dynamically integrate multi-scale information at each reasoning step, forming a progressive coarse-to-fine prompt chain. Extensive experiments on eight benchmark datasets demonstrate that MSGCOT outperforms the state-of-the-art single-granularity graph prompt-tuning method, particularly in few-shot scenarios, showcasing superior performance.

academic

Beyond Single-Granularity Prompts: A Multi-Scale Chain-of-Thought Prompt Learning for Graph

Basic Information

Paper ID: 2510.09394
Title: Higher-order interactions of multi-layer prompt (Beyond Single-Granularity Prompts: A Multi-Scale Chain-of-Thought Prompt Learning for Graph)
Authors: Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Xinyan Huang, Weigang Lu
Classification: cs.CL, cs.AI
Publication Time/Conference: Conference acronym 'XX, June 03–05, 2018, Woodstock, NY (Forthcoming)
Paper Link: https://arxiv.org/abs/2510.09394

Abstract

The "pretraining-prompting" paradigm aims to bridge the gap between pretraining tasks and downstream objectives, and has been successfully extended from the NLP domain to the graph domain with significant progress. Current mainstream graph prompt tuning methods employ learnable prompt vectors to modify input or output features. However, existing approaches are constrained to single-granularity prompts during generation (e.g., node-level or subgraph-level), overlooking the inherent multi-scale structural information in graph data, which limits the semantic diversity of prompts. To address this issue, this paper for the first time integrates multi-scale information into graph prompts, proposing a Multi-Scale Graph Chain-of-Thought (MSGCOT) prompt framework. Specifically, we design a lightweight low-rank coarsening network to efficiently capture multi-scale structural features as hierarchical basis vectors for prompt generation. Subsequently, simulating the human cognitive process from coarse to fine granularity, we dynamically integrate multi-scale information at each reasoning step, forming a progressive coarse-to-fine prompt chain. Extensive experiments on eight benchmark datasets demonstrate that MSGCOT surpasses state-of-the-art single-granularity graph prompt tuning methods, exhibiting superior performance particularly in few-shot scenarios.

Research Background and Motivation

Problem Definition

The core problem addressed in this research is the single-granularity limitation in prompt generation processes of existing graph neural network prompt learning methods. Specifically manifested as:

Single-granularity constraint: Existing methods (e.g., GPF+, GCOT) focus only on information at a single level (node-level, edge-level, or subgraph-level), overlooking the co-existing multi-scale information from nodes to hierarchical subgraphs in graph data
Insufficient semantic diversity: Single-granularity prompt generation limits the expressiveness and semantic richness of prompts
Underutilization of structural information: Failure to fully leverage the hierarchical structural features inherent in graph data

Importance Analysis

The significance of this problem is reflected in:

Practical application demands: Real-world graph data (social networks, molecular graphs, recommendation systems, etc.) inherently contain multi-level structural information
Performance improvement potential: Effective utilization of multi-scale information can significantly enhance model generalization in few-shot learning scenarios
Theoretical completeness: Fills the gap in graph prompt learning theoretical frameworks regarding multi-granularity information modeling

Limitations of Existing Methods

GCOT: Although employing multi-step prompt generation, each step remains constrained to node-level granularity
Single-step prompt methods (GPF+, EdgePrompt, etc.): Generate complete prompts directly, lacking progressive optimization mechanisms
Pretraining-dependent methods: Require specific pretraining strategies with limited generalizability

Core Contributions

First multi-scale graph chain-of-thought framework: Proposes the first graph chain-of-thought prompt learning framework integrating multi-granularity information, breaking through the single-granularity limitation of existing methods
Innovative cognitive simulation mechanism: Designs low-rank coarsening networks for multi-scale feature extraction and proposes a backtracking prompt mechanism enabling progressive prompt generation, simulating human coarse-to-fine cognitive processes
Lightweight efficient design: Significantly reduces parameter count (47.1%-85.7% reduction compared to GCOT) through low-rank decomposition while maintaining superior performance
Comprehensive experimental validation: Achieves optimal performance on 8 benchmark datasets across node classification and graph classification tasks, with particularly pronounced advantages in few-shot scenarios

Methodology Details

Task Definition

Input: Graph $G = (V, E)$ , where $V$ is the node set, $E$ is the edge set, node feature matrix $X \in \mathbb{R}^{N \times F}$ , adjacency matrix $A \in \mathbb{R}^{N \times N}$

Output: Optimized representations for downstream tasks (node classification/graph classification)

Constraint: Pretraining model parameters are frozen; only lightweight prompt parameters are updated

Model Architecture

1. Overall Framework

The MSGCOT framework comprises three core modules:

Node-level prompt generation: Generates task-specific node prompt vectors
Multi-scale thought construction: Constructs hierarchical representations through coarsening networks
Coarse-to-fine backtracking prompts: Progressive multi-scale prompt integration

2. Node-level Prompt Generation

P_x = \text{CONDNET}(H)

\hat{H} = \text{GNN}(X \odot P_x, A)

where $H$ is the pretraining embedding, $P_x$ is the node-level prompt, and $\hat{H}$ is the prompted embedding.

3. Multi-scale Thought Construction

Employs low-rank decomposition to design lightweight coarsening networks:

S^l = \text{Softmax}(W_{up}^l(\sigma(W_{down}^{lT} T^{l-1})))

T^l = S^{lT} T^{l-1}

where $W_{down} \in \mathbb{R}^{d \times r}$ , $W_{up} \in \mathbb{R}^{r \times C_l}$ ( $r \ll d$ ), and $T^l$ is the $l$ -th layer coarsened representation.

4. Coarse-to-Fine Backtracking Prompt Mechanism

p_i^{l+1} = \sum_{j=1}^{C_l} \alpha_{ij}^{l+1} t_j^l

\alpha_{ij}^{l+1} = \frac{\exp(t_j^l \hat{h}_i^l)}{\sum_k \exp(t_k^l \hat{h}_i^l)}

\hat{h}_i^{l+1} = \hat{h}_i^l + p_i^{l+1}

Technical Innovations

1. Low-rank Coarsening Network Design

Parameter efficiency: Reduces parameter count from $O(d \times C_l)$ to $O(d \times r + r \times C_l)$ through low-rank decomposition
Multi-scale capture: Progressive coarsening generates structural representations at different granularities
Task adaptability: Learnable assignment matrices adapt to different downstream tasks

2. Cognition-inspired Prompt Chain

Simulates human cognition: Progressive understanding from global topology to local details
Structured reasoning: Employs hierarchical coarsened representations as "structured reasoning" replacing text templates
Dynamic integration: Dynamically selects and integrates information at different granularities at each step

3. Constraint Mechanisms

Introduces cosine reconstruction loss to prevent node information loss:

L_r = \frac{1}{N}(1 - \frac{\hat{h}_i \cdot h_i}{||\hat{h}_i|| \cdot ||h_i||})^\gamma

Experimental Setup

Datasets

Node Classification:

Cora (2,708 nodes, 7 classes)
Citeseer (3,327 nodes, 6 classes)
Pubmed (19,717 nodes, 3 classes)
Photo (7,650 nodes, 8 classes)

Graph Classification:

MUTAG (188 graphs, molecular compounds)
COX2 (467 graphs, cyclooxygenase inhibitors)
BZR (405 graphs, benzodiazepine receptor ligands)
PROTEINS (1,113 graphs, protein structures)

Evaluation Metrics

Accuracy: Standard evaluation metric for classification tasks
Statistical significance: Mean and variance across 100 random samplings

Baseline Methods

Supervised learning: GCN, GAT
Pretraining + fine-tuning: LP, GraphCL, DGI/InfoGraph
Pretraining + prompting:
- Single-step: All-in-One, GPF+, SUPT, GraphPrompt, EdgePrompt+, DAGPrompT
- Multi-step: GCOT

Implementation Details

Backbone network: GCN (hidden dimension 256)
Coarsening layers: 2 layers
Coarsening ratios: {0.01, 0.1, 0.2, 0.3}
Low-rank dimension: r=8 for node tasks, r=1 for graph tasks
Constraint weight: α=1 for node classification, α=0 for graph classification

Experimental Results

Main Results

Single-sample Classification Performance

MSGCOT achieves optimal performance across all 8 datasets:

Node Classification:

Cora: 62.13% (vs GCOT 59.54%, +4.35%)
Citeseer: 49.05% (vs GCOT 48.13%, +1.91%)
Pubmed: 64.67% (vs GCOT 63.38%, +2.04%)
Photo: 68.01% (vs GCOT 66.98%, +1.54%)

Graph Classification:

MUTAG: 63.54% (vs GCOT 60.34%, +5.30%)
COX2: 73.62% (vs DAGPrompt 55.00%, +33.85%)
BZR: 69.85% (vs DAGPrompt 55.49%, +25.87%)
PROTEINS: 57.83% (vs DAGPrompt 56.22%, +2.86%)

Few-shot Learning Performance

Under 1-3 sample settings, MSGCOT surpasses baseline methods by 5-8% on average, demonstrating superior few-shot generalization capability.

Ablation Study

Systematic ablation experiments validate the contribution of each component:

Multi-scale prompts (MSP): Average decline of 5.52% for node tasks and 17.7% for graph tasks upon removal
Reconstruction loss (RE): Significant impact on node classification; graph classification focuses on global information
Backtracking mechanism (TB): Particularly critical for graph classification; unidirectional prompts result in 12-15% performance degradation
Incremental updates (IU): Progressive updates yield 2-5% performance improvement

Parameter Efficiency Analysis

Significant parameter reduction compared to GCOT:

Node classification: 47.1%-68.3% parameter reduction
Graph classification: 29.1%-85.7% parameter reduction
Time efficiency: Average training time per epoch reduced by 34.8% for graph classification tasks

Hyperparameter Sensitivity

Coarsening ratio: Optimal range 0.1-0.3 for node tasks; stable within 0.05-0.3 for graph tasks
Coarsening layers: Optimal at 2 layers for node tasks; supports deeper layers for graph tasks
Hidden dimension: Optimal at r=8 for node tasks; already exhibits superior performance at r=1 for graph tasks

Graph Pretraining

Contrastive learning: GraphCL, DGI, etc. learn representations through positive-negative sample contrasts
Generative learning: Pretraining through node feature or graph structure reconstruction
Limitations: Gap between pretraining objectives and downstream tasks limits performance

Graph Prompt Learning

Pretraining-dependent methods: GPPT, GraphPrompt, All-in-One
Pretraining-independent methods: GPF+, SUPT, EdgePrompt
Multi-step prompting: GCOT introduces chain-of-thought concepts but remains constrained to single-granularity

Graph Coarsening Techniques

Traditional methods: Spectral clustering, non-negative matrix factorization
Learnable methods: DiffPool and similar approaches achieve hierarchical representations through learnable assignment matrices
This work's contribution: Combines graph coarsening with prompt learning to enable multi-scale prompt generation

Conclusions and Discussion

Main Conclusions

Importance of multi-scale information: Experiments demonstrate that multi-scale structural information is crucial for graph prompt learning
Effectiveness of cognition-inspired design: Simulating human coarse-to-fine cognitive processes significantly enhances performance
Balance between parameter efficiency and performance: Low-rank design maintains superior performance while substantially reducing parameters
Few-shot learning advantages: Multi-scale prompts demonstrate particularly outstanding performance in data-scarce scenarios

Limitations

Computational complexity: Multi-step reasoning introduces certain computational overhead
Hyperparameter sensitivity: Coarsening ratios and layer numbers require task-specific tuning
Insufficient theoretical analysis: Lacks theoretical guarantees for multi-scale prompt effectiveness

Future Directions

Adaptive coarsening strategies: Investigate task-adaptive coarsening mechanisms
Theoretical analysis: Establish theoretical frameworks for multi-scale prompt learning
Extended applications: Explore application potential in more graph learning tasks

In-depth Evaluation

Strengths

Strong novelty: First systematic integration of multi-scale information into graph prompt learning
Reasonable design: Ingenious design of low-rank coarsening networks and backtracking mechanisms, balancing efficiency and effectiveness
Comprehensive experiments: 8 datasets, multiple baseline methods, detailed ablation studies
High practical value: Pronounced advantages in few-shot scenarios, aligning with practical application needs

Weaknesses

Weak theoretical foundation: Lacks theoretical analysis and guarantees for method effectiveness
Insufficient computational overhead analysis: While complexity analysis is provided, actual runtime comparisons are limited
Inadequate applicability discussion: Insufficient analysis of applicability to different types of graph data

Impact

Academic contribution: Provides new research directions for the graph prompt learning field
Practical value: Important application value in resource-constrained few-shot learning scenarios
Reproducibility: Provides detailed implementation details and hyperparameter settings

Applicable Scenarios

Few-shot graph learning: Graph analysis tasks with scarce labeled data
Multi-scale graph analysis: Applications requiring capture of multi-level structural information
Resource-constrained environments: Deployment scenarios with parameter efficiency requirements

References

This paper cites 38 relevant references covering multiple related domains including graph neural networks, graph pretraining, prompt learning, and graph coarsening, providing a solid theoretical foundation for the research.

Overall Assessment: This is a high-quality graph neural network prompt learning paper that innovatively addresses the single-granularity limitation of existing methods. The method design is sound, experimental validation is comprehensive, and it holds important significance in both theoretical contribution and practical value. Although there remains room for improvement in theoretical analysis, the paper makes important contributions to the graph prompt learning field overall.