2025-11-23T22:46:17.287043

Beyond Single-Granularity Prompts: A Multi-Scale Chain-of-Thought Prompt Learning for Graph

Zheng, Yang, Guan et al.
The "pre-train, prompt'' paradigm, designed to bridge the gap between pre-training tasks and downstream objectives, has been extended from the NLP domain to the graph domain and has achieved remarkable progress. Current mainstream graph prompt-tuning methods modify input or output features using learnable prompt vectors. However, existing approaches are confined to single-granularity (e.g., node-level or subgraph-level) during prompt generation, overlooking the inherently multi-scale structural information in graph data, which limits the diversity of prompt semantics. To address this issue, we pioneer the integration of multi-scale information into graph prompt and propose a Multi-Scale Graph Chain-of-Thought (MSGCOT) prompting framework. Specifically, we design a lightweight, low-rank coarsening network to efficiently capture multi-scale structural features as hierarchical basis vectors for prompt generation. Subsequently, mimicking human cognition from coarse-to-fine granularity, we dynamically integrate multi-scale information at each reasoning step, forming a progressive coarse-to-fine prompt chain. Extensive experiments on eight benchmark datasets demonstrate that MSGCOT outperforms the state-of-the-art single-granularity graph prompt-tuning method, particularly in few-shot scenarios, showcasing superior performance.
academic

Beyond Single-Granularity Prompts: A Multi-Scale Chain-of-Thought Prompt Learning for Graph

Basic Information

  • Paper ID: 2510.09394
  • Title: Higher-order interactions of multi-layer prompt (Beyond Single-Granularity Prompts: A Multi-Scale Chain-of-Thought Prompt Learning for Graph)
  • Authors: Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Xinyan Huang, Weigang Lu
  • Classification: cs.CL, cs.AI
  • Publication Time/Conference: Conference acronym 'XX, June 03–05, 2018, Woodstock, NY (Forthcoming)
  • Paper Link: https://arxiv.org/abs/2510.09394

Abstract

The "pretraining-prompting" paradigm aims to bridge the gap between pretraining tasks and downstream objectives, and has been successfully extended from the NLP domain to the graph domain with significant progress. Current mainstream graph prompt tuning methods employ learnable prompt vectors to modify input or output features. However, existing approaches are constrained to single-granularity prompts during generation (e.g., node-level or subgraph-level), overlooking the inherent multi-scale structural information in graph data, which limits the semantic diversity of prompts. To address this issue, this paper for the first time integrates multi-scale information into graph prompts, proposing a Multi-Scale Graph Chain-of-Thought (MSGCOT) prompt framework. Specifically, we design a lightweight low-rank coarsening network to efficiently capture multi-scale structural features as hierarchical basis vectors for prompt generation. Subsequently, simulating the human cognitive process from coarse to fine granularity, we dynamically integrate multi-scale information at each reasoning step, forming a progressive coarse-to-fine prompt chain. Extensive experiments on eight benchmark datasets demonstrate that MSGCOT surpasses state-of-the-art single-granularity graph prompt tuning methods, exhibiting superior performance particularly in few-shot scenarios.

Research Background and Motivation

Problem Definition

The core problem addressed in this research is the single-granularity limitation in prompt generation processes of existing graph neural network prompt learning methods. Specifically manifested as:

  1. Single-granularity constraint: Existing methods (e.g., GPF+, GCOT) focus only on information at a single level (node-level, edge-level, or subgraph-level), overlooking the co-existing multi-scale information from nodes to hierarchical subgraphs in graph data
  2. Insufficient semantic diversity: Single-granularity prompt generation limits the expressiveness and semantic richness of prompts
  3. Underutilization of structural information: Failure to fully leverage the hierarchical structural features inherent in graph data

Importance Analysis

The significance of this problem is reflected in:

  1. Practical application demands: Real-world graph data (social networks, molecular graphs, recommendation systems, etc.) inherently contain multi-level structural information
  2. Performance improvement potential: Effective utilization of multi-scale information can significantly enhance model generalization in few-shot learning scenarios
  3. Theoretical completeness: Fills the gap in graph prompt learning theoretical frameworks regarding multi-granularity information modeling

Limitations of Existing Methods

  1. GCOT: Although employing multi-step prompt generation, each step remains constrained to node-level granularity
  2. Single-step prompt methods (GPF+, EdgePrompt, etc.): Generate complete prompts directly, lacking progressive optimization mechanisms
  3. Pretraining-dependent methods: Require specific pretraining strategies with limited generalizability

Core Contributions

  1. First multi-scale graph chain-of-thought framework: Proposes the first graph chain-of-thought prompt learning framework integrating multi-granularity information, breaking through the single-granularity limitation of existing methods
  2. Innovative cognitive simulation mechanism: Designs low-rank coarsening networks for multi-scale feature extraction and proposes a backtracking prompt mechanism enabling progressive prompt generation, simulating human coarse-to-fine cognitive processes
  3. Lightweight efficient design: Significantly reduces parameter count (47.1%-85.7% reduction compared to GCOT) through low-rank decomposition while maintaining superior performance
  4. Comprehensive experimental validation: Achieves optimal performance on 8 benchmark datasets across node classification and graph classification tasks, with particularly pronounced advantages in few-shot scenarios

Methodology Details

Task Definition

Input: Graph G=(V,E)G = (V, E), where VV is the node set, EE is the edge set, node feature matrix XRN×FX \in \mathbb{R}^{N \times F}, adjacency matrix ARN×NA \in \mathbb{R}^{N \times N}

Output: Optimized representations for downstream tasks (node classification/graph classification)

Constraint: Pretraining model parameters are frozen; only lightweight prompt parameters are updated

Model Architecture

1. Overall Framework

The MSGCOT framework comprises three core modules:

  • Node-level prompt generation: Generates task-specific node prompt vectors
  • Multi-scale thought construction: Constructs hierarchical representations through coarsening networks
  • Coarse-to-fine backtracking prompts: Progressive multi-scale prompt integration

2. Node-level Prompt Generation

Px=CONDNET(H)P_x = \text{CONDNET}(H) H^=GNN(XPx,A)\hat{H} = \text{GNN}(X \odot P_x, A)

where HH is the pretraining embedding, PxP_x is the node-level prompt, and H^\hat{H} is the prompted embedding.

3. Multi-scale Thought Construction

Employs low-rank decomposition to design lightweight coarsening networks:

Sl=Softmax(Wupl(σ(WdownlTTl1)))S^l = \text{Softmax}(W_{up}^l(\sigma(W_{down}^{lT} T^{l-1}))) Tl=SlTTl1T^l = S^{lT} T^{l-1}

where WdownRd×rW_{down} \in \mathbb{R}^{d \times r}, WupRr×ClW_{up} \in \mathbb{R}^{r \times C_l} (rdr \ll d), and TlT^l is the ll-th layer coarsened representation.

4. Coarse-to-Fine Backtracking Prompt Mechanism

pil+1=j=1Clαijl+1tjlp_i^{l+1} = \sum_{j=1}^{C_l} \alpha_{ij}^{l+1} t_j^l αijl+1=exp(tjlh^il)kexp(tklh^il)\alpha_{ij}^{l+1} = \frac{\exp(t_j^l \hat{h}_i^l)}{\sum_k \exp(t_k^l \hat{h}_i^l)} h^il+1=h^il+pil+1\hat{h}_i^{l+1} = \hat{h}_i^l + p_i^{l+1}

Technical Innovations

1. Low-rank Coarsening Network Design

  • Parameter efficiency: Reduces parameter count from O(d×Cl)O(d \times C_l) to O(d×r+r×Cl)O(d \times r + r \times C_l) through low-rank decomposition
  • Multi-scale capture: Progressive coarsening generates structural representations at different granularities
  • Task adaptability: Learnable assignment matrices adapt to different downstream tasks

2. Cognition-inspired Prompt Chain

  • Simulates human cognition: Progressive understanding from global topology to local details
  • Structured reasoning: Employs hierarchical coarsened representations as "structured reasoning" replacing text templates
  • Dynamic integration: Dynamically selects and integrates information at different granularities at each step

3. Constraint Mechanisms

Introduces cosine reconstruction loss to prevent node information loss:

Lr=1N(1h^ihih^ihi)γL_r = \frac{1}{N}(1 - \frac{\hat{h}_i \cdot h_i}{||\hat{h}_i|| \cdot ||h_i||})^\gamma

Experimental Setup

Datasets

Node Classification:

  • Cora (2,708 nodes, 7 classes)
  • Citeseer (3,327 nodes, 6 classes)
  • Pubmed (19,717 nodes, 3 classes)
  • Photo (7,650 nodes, 8 classes)

Graph Classification:

  • MUTAG (188 graphs, molecular compounds)
  • COX2 (467 graphs, cyclooxygenase inhibitors)
  • BZR (405 graphs, benzodiazepine receptor ligands)
  • PROTEINS (1,113 graphs, protein structures)

Evaluation Metrics

  • Accuracy: Standard evaluation metric for classification tasks
  • Statistical significance: Mean and variance across 100 random samplings

Baseline Methods

  1. Supervised learning: GCN, GAT
  2. Pretraining + fine-tuning: LP, GraphCL, DGI/InfoGraph
  3. Pretraining + prompting:
    • Single-step: All-in-One, GPF+, SUPT, GraphPrompt, EdgePrompt+, DAGPrompT
    • Multi-step: GCOT

Implementation Details

  • Backbone network: GCN (hidden dimension 256)
  • Coarsening layers: 2 layers
  • Coarsening ratios: {0.01, 0.1, 0.2, 0.3}
  • Low-rank dimension: r=8 for node tasks, r=1 for graph tasks
  • Constraint weight: α=1 for node classification, α=0 for graph classification

Experimental Results

Main Results

Single-sample Classification Performance

MSGCOT achieves optimal performance across all 8 datasets:

Node Classification:

  • Cora: 62.13% (vs GCOT 59.54%, +4.35%)
  • Citeseer: 49.05% (vs GCOT 48.13%, +1.91%)
  • Pubmed: 64.67% (vs GCOT 63.38%, +2.04%)
  • Photo: 68.01% (vs GCOT 66.98%, +1.54%)

Graph Classification:

  • MUTAG: 63.54% (vs GCOT 60.34%, +5.30%)
  • COX2: 73.62% (vs DAGPrompt 55.00%, +33.85%)
  • BZR: 69.85% (vs DAGPrompt 55.49%, +25.87%)
  • PROTEINS: 57.83% (vs DAGPrompt 56.22%, +2.86%)

Few-shot Learning Performance

Under 1-3 sample settings, MSGCOT surpasses baseline methods by 5-8% on average, demonstrating superior few-shot generalization capability.

Ablation Study

Systematic ablation experiments validate the contribution of each component:

  1. Multi-scale prompts (MSP): Average decline of 5.52% for node tasks and 17.7% for graph tasks upon removal
  2. Reconstruction loss (RE): Significant impact on node classification; graph classification focuses on global information
  3. Backtracking mechanism (TB): Particularly critical for graph classification; unidirectional prompts result in 12-15% performance degradation
  4. Incremental updates (IU): Progressive updates yield 2-5% performance improvement

Parameter Efficiency Analysis

Significant parameter reduction compared to GCOT:

  • Node classification: 47.1%-68.3% parameter reduction
  • Graph classification: 29.1%-85.7% parameter reduction
  • Time efficiency: Average training time per epoch reduced by 34.8% for graph classification tasks

Hyperparameter Sensitivity

  1. Coarsening ratio: Optimal range 0.1-0.3 for node tasks; stable within 0.05-0.3 for graph tasks
  2. Coarsening layers: Optimal at 2 layers for node tasks; supports deeper layers for graph tasks
  3. Hidden dimension: Optimal at r=8 for node tasks; already exhibits superior performance at r=1 for graph tasks

Graph Pretraining

  • Contrastive learning: GraphCL, DGI, etc. learn representations through positive-negative sample contrasts
  • Generative learning: Pretraining through node feature or graph structure reconstruction
  • Limitations: Gap between pretraining objectives and downstream tasks limits performance

Graph Prompt Learning

  • Pretraining-dependent methods: GPPT, GraphPrompt, All-in-One
  • Pretraining-independent methods: GPF+, SUPT, EdgePrompt
  • Multi-step prompting: GCOT introduces chain-of-thought concepts but remains constrained to single-granularity

Graph Coarsening Techniques

  • Traditional methods: Spectral clustering, non-negative matrix factorization
  • Learnable methods: DiffPool and similar approaches achieve hierarchical representations through learnable assignment matrices
  • This work's contribution: Combines graph coarsening with prompt learning to enable multi-scale prompt generation

Conclusions and Discussion

Main Conclusions

  1. Importance of multi-scale information: Experiments demonstrate that multi-scale structural information is crucial for graph prompt learning
  2. Effectiveness of cognition-inspired design: Simulating human coarse-to-fine cognitive processes significantly enhances performance
  3. Balance between parameter efficiency and performance: Low-rank design maintains superior performance while substantially reducing parameters
  4. Few-shot learning advantages: Multi-scale prompts demonstrate particularly outstanding performance in data-scarce scenarios

Limitations

  1. Computational complexity: Multi-step reasoning introduces certain computational overhead
  2. Hyperparameter sensitivity: Coarsening ratios and layer numbers require task-specific tuning
  3. Insufficient theoretical analysis: Lacks theoretical guarantees for multi-scale prompt effectiveness

Future Directions

  1. Adaptive coarsening strategies: Investigate task-adaptive coarsening mechanisms
  2. Theoretical analysis: Establish theoretical frameworks for multi-scale prompt learning
  3. Extended applications: Explore application potential in more graph learning tasks

In-depth Evaluation

Strengths

  1. Strong novelty: First systematic integration of multi-scale information into graph prompt learning
  2. Reasonable design: Ingenious design of low-rank coarsening networks and backtracking mechanisms, balancing efficiency and effectiveness
  3. Comprehensive experiments: 8 datasets, multiple baseline methods, detailed ablation studies
  4. High practical value: Pronounced advantages in few-shot scenarios, aligning with practical application needs

Weaknesses

  1. Weak theoretical foundation: Lacks theoretical analysis and guarantees for method effectiveness
  2. Insufficient computational overhead analysis: While complexity analysis is provided, actual runtime comparisons are limited
  3. Inadequate applicability discussion: Insufficient analysis of applicability to different types of graph data

Impact

  1. Academic contribution: Provides new research directions for the graph prompt learning field
  2. Practical value: Important application value in resource-constrained few-shot learning scenarios
  3. Reproducibility: Provides detailed implementation details and hyperparameter settings

Applicable Scenarios

  1. Few-shot graph learning: Graph analysis tasks with scarce labeled data
  2. Multi-scale graph analysis: Applications requiring capture of multi-level structural information
  3. Resource-constrained environments: Deployment scenarios with parameter efficiency requirements

References

This paper cites 38 relevant references covering multiple related domains including graph neural networks, graph pretraining, prompt learning, and graph coarsening, providing a solid theoretical foundation for the research.


Overall Assessment: This is a high-quality graph neural network prompt learning paper that innovatively addresses the single-granularity limitation of existing methods. The method design is sound, experimental validation is comprehensive, and it holds important significance in both theoretical contribution and practical value. Although there remains room for improvement in theoretical analysis, the paper makes important contributions to the graph prompt learning field overall.