2025-11-20T17:34:15.321910

ReMindRAG: Low-Cost LLM-Guided Knowledge Graph Traversal for Efficient RAG

Hu, Zhu, Tang et al.
Knowledge graphs (KGs), with their structured representation capabilities, offer promising avenue for enhancing Retrieval Augmented Generation (RAG) systems, leading to the development of KG-RAG systems. Nevertheless, existing methods often struggle to achieve effective synergy between system effectiveness and cost efficiency, leading to neither unsatisfying performance nor excessive LLM prompt tokens and inference time. To this end, this paper proposes REMINDRAG, which employs an LLM-guided graph traversal featuring node exploration, node exploitation, and, most notably, memory replay, to improve both system effectiveness and cost efficiency. Specifically, REMINDRAG memorizes traversal experience within KG edge embeddings, mirroring the way LLMs "memorize" world knowledge within their parameters, but in a train-free manner. We theoretically and experimentally confirm the effectiveness of REMINDRAG, demonstrating its superiority over existing baselines across various benchmark datasets and LLM backbones. Our code is available at https://github.com/kilgrims/ReMindRAG.
academic

ReMindRAG: Low-Cost LLM-Guided Knowledge Graph Traversal for Efficient RAG

Basic Information

  • Paper ID: 2510.13193
  • Title: ReMindRAG: Low-Cost LLM-Guided Knowledge Graph Traversal for Efficient RAG
  • Authors: Yikuan Hu, Jifeng Zhu, Lanrui Tang, Chen Huang
  • Category: cs.IR (Information Retrieval)
  • Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
  • Paper Link: https://arxiv.org/abs/2510.13193
  • Code Link: https://github.com/kilgrims/ReMindRAG

Abstract

Knowledge graphs (KG), with their structured representation capabilities, offer promising avenues for enhancing retrieval-augmented generation (RAG) systems, driving the development of KG-RAG systems. However, existing methods often struggle to achieve effective synergy between system effectiveness and cost efficiency, resulting in poor performance or excessive LLM prompt tokens and inference time. To address this, we propose REMINDRAG, which employs LLM-guided graph traversal incorporating node exploration, node exploitation, and most importantly, a memory replay mechanism to enhance both system effectiveness and cost efficiency. Specifically, REMINDRAG memorizes traversal experiences in KG edge embeddings, similar to how LLMs "memorize" world knowledge in their parameters, but in a training-free manner. We validate REMINDRAG's effectiveness both theoretically and empirically, demonstrating its superiority over existing baselines across various benchmark datasets and LLM backbones.

Research Background and Motivation

Problem Definition

Traditional RAG methods primarily rely on dense vector retrieval to identify relevant text passages but show limited performance on complex tasks requiring multi-hop reasoning or capturing long-range dependencies. Knowledge graphs, with their structured representation of entities and relations, provide a new avenue for addressing this challenge.

Limitations of Existing Methods

  1. Traditional graph search algorithms: Methods such as PageRank and GNNs struggle to capture fine-grained semantic relationships in graphs, resulting in insufficient system effectiveness
  2. LLM-guided graph traversal methods: While demonstrating superior performance, they require extensive LLM calls, significantly increasing costs and inference time
  3. Efficiency-effectiveness trade-off: Existing KG-RAG systems struggle to find an effective balance between system effectiveness and cost efficiency

Research Motivation

This work aims to address the synergistic optimization problem between system effectiveness and cost efficiency in KG-RAG systems, which represents a major challenge for practical deployment and scalability.

Core Contributions

  1. Identifying key challenges: Clearly articulates the challenge of synergistic optimization between system effectiveness and cost efficiency in KG-RAG systems
  2. Proposing the REMINDRAG framework: Employs LLM-guided KG traversal incorporating node exploration, node exploitation, and memory replay mechanisms
  3. Theoretical analysis: Theoretically proves the effectiveness of graph traversal memory replay
  4. Experimental validation: Validates REMINDRAG's superiority across multiple benchmark datasets and LLM backbones

Methodology Details

Task Definition

Given unstructured text documents and user queries, the objective is to construct a knowledge graph and retrieve relevant information through an efficient graph traversal mechanism to generate accurate answers while minimizing LLM invocation costs.

Model Architecture

1. Knowledge Graph Construction

REMINDRAG constructs a heterogeneous knowledge graph containing:

  • Entity nodes: Named entities extracted from text
  • Anchor nodes: Storing text chunk titles
  • Text chunk collection: Segmented original documents
  • Relation connections: Entity-relation-entity triples and contextual skeleton networks

2. LLM-Guided Knowledge Graph Traversal

Node Exploration Strategy:

  • Prioritizes exploring potential nodes likely to lead to answers
  • In each iteration, the LLM evaluates all nodes in subgraph S, selecting the target node a most likely to lead to the answer

Node Exploitation Strategy:

  • Focuses on exploiting previously explored nodes, extending paths along these nodes
  • Given selected node a, the LLM selects the optimal expansion node p from its adjacency set Sa

3. Memory Replay Mechanism

Memory Content:

  • Effective paths: Paths leading to correct answers (positive reinforcement)
  • Ineffective paths: Paths not leading to answers (negative reinforcement)

Memory Method: Updates edge embeddings using closed-form equations:

Weight function: δ(x) = (2/π)cos(π||x||₂/2)
Enhance effective paths: v̂ = v + δ(v) · q/||q||₂
Penalize ineffective paths: v̂ = v - δ(v·q/||q||₂) · v·q/||q||₂

Fast Awakening and Damped Updates:

  • Fast awakening: When edge embedding v has small norm, δ function produces large directional updates
  • Damped updates: When edge embedding v has large norm, δ function produces only small updates, maintaining stability

Technical Innovations

  1. Training-free memory mechanism: Memorizes traversal experiences through edge embeddings without additional training
  2. Balanced exploration and exploitation: Combines node exploration and exploitation strategies to achieve both global and local optimal search
  3. Adaptive weight updates: Adaptive update strategy based on vector norms, balancing rapid learning and long-term stability

Experimental Setup

Datasets

  1. Long-dependency QA: LooGLE dataset, testing long-range semantic retrieval capabilities
  2. Multi-hop QA: HotpotQA dataset, evaluating multi-step reasoning abilities
  3. Simple QA: LooGLE short-dependency QA, testing direct information extraction capabilities

Evaluation Metrics

  1. Effectiveness evaluation: Uses GPT-4o as LLM judge to assess answer accuracy
  2. Cost efficiency evaluation: Average LLM tokens consumed per query during traversal

Baseline Methods

  1. Traditional retrieval methods: BM25, NaiveRAG
  2. KG-RAG systems using graph search algorithms: GraphRAG, LightRAG, HippoRAG2
  3. LLM-guided KG-RAG systems: Plan-on-Graph

Implementation Details

  • LLM backbone: GPT-4o-mini, Deepseek-V3
  • Embedding model: nomic-ai/nomic-embed-text-v2-moe
  • Text chunking: 750 token length
  • Key parameters: α=0.1 (node relevance weight), λ=0.55 (strong connection threshold)

Experimental Results

Main Results

QA TypeGPT-4o-miniDeepseek-V3
Long-dependency QA57.04%59.73%
Multi-hop QA74.22%79.38%
Simple QA76.67%77.01%

REMINDRAG significantly outperforms baseline methods across all tasks:

  • Long-dependency QA: Average improvement of 12.08%
  • Multi-hop QA: Average improvement of 10.31%
  • Simple QA: Average improvement of 4.66%

Cost Efficiency Analysis

Setting TypeAccuracyToken ConsumptionCost Reduction
No memory57.04%14.91K-
1-round memory56.48%9.68K35.1%
2-round memory58.01%7.55K49.4%
3-round memory60.31%6.71K55.0%

After multiple rounds of memory, REMINDRAG achieves an average token consumption reduction of 58.8%.

Ablation Studies

Impact of contextual skeleton networks:

  • Removing contextual skeleton networks reduces long-dependency QA performance from 57.04% to 51.01%
  • Validates the importance of contextual information capture

Impact of hop count settings:

  • System performance monotonically increases with maximum hop count
  • Larger hop counts enable nodes to access broader neighborhood information

Case Analysis

Self-correction capability:

  • After initial incorrect answers, the system penalizes irrelevant nodes based on memory rules
  • Subsequent queries switch to memory-optimized subgraphs, achieving error self-correction

Memory stability:

  • Maintains stable performance under complex multi-round memory settings
  • Demonstrates robustness when handling heterogeneous datasets alternately

Theoretical Analysis

Memory Capacity Theorem

For a collection of queries with certain semantic similarity, when embedding dimension d is sufficiently large, edge embeddings can effectively memorize query information under the condition:

θ ≤ lim[d→∞] [2 arcsin(√(1/2 sin(arccos(λ))))]

where θ is the maximum angle between query embedding pairs, and λ is the strong connection threshold.

Theoretical Guarantees

  • The theoretical upper bound for λ is 0.775, consistent with existing semantic similarity threshold research of 0.6
  • When embedding dimension exceeds 100, theoretical approximations demonstrate significant practical utility

Development of KG-RAG Systems

  1. Traditional graph search methods: PageRank, Random Walk, GNN, etc., struggle to capture fine-grained semantic relationships
  2. LLM-guided methods: Leverage LLM's semantic understanding capabilities but are cost-prohibitive
  3. Graph pruning and path planning: Existing optimization methods show limited effectiveness

Memory Replay Mechanisms

  • Draws inspiration from memory replay concepts in reinforcement learning
  • Innovatively embeds memory as weights in graphs rather than explicit sample replay

Conclusions and Discussion

Main Conclusions

  1. REMINDRAG successfully achieves synergistic optimization of system effectiveness and cost efficiency
  2. The memory replay mechanism significantly improves efficiency for subsequent queries
  3. Self-correction capability enhances system robustness

Limitations

  1. Initial graph traversal cost: First traversal still requires multiple LLM calls
  2. Large-scale document processing: Constructing knowledge graphs requires substantial time and computational resources
  3. Memory capacity constraints: Theoretical analysis assumes infinite dimensionality, potentially limiting practical applications

Future Directions

  1. Pre-trained memory initialization: Use domain-specific FAQ to pre-initialize model memory
  2. Distributed graph construction: Optimize graph construction efficiency for large-scale documents
  3. Dynamic memory management: Investigate forgetting and update mechanisms for long-term memory

In-Depth Evaluation

Strengths

  1. Strong novelty: First to propose a training-free graph traversal memory mechanism
  2. Solid theory: Provides theoretical analysis and guarantees for memory capacity
  3. Comprehensive experiments: Full evaluation across multiple datasets and LLM backbones
  4. High practical value: Significant performance improvements and cost reduction

Weaknesses

  1. Parameter sensitivity: Multiple hyperparameter settings may affect performance
  2. Scalability concerns: Applicability to ultra-large-scale knowledge graphs remains insufficiently validated
  3. Memory update strategy: Simple linear updates may not suit all scenarios

Impact

  1. Academic contribution: Provides new optimization perspectives for the KG-RAG field
  2. Practical applications: Broad application prospects in QA systems, information retrieval, and related domains
  3. Reproducibility: Open-source code facilitates verification and extension by the research community

Applicable Scenarios

  1. Multi-turn dialogue systems: Can memorize historical interactions to improve response efficiency
  2. Domain-specific QA: Capable of accumulating and leveraging traversal experience within specific domains
  3. Cost-sensitive applications: Suitable for scenarios with strict requirements on LLM invocation costs

References

This paper cites important works from multiple domains including RAG, knowledge graphs, and graph neural networks, including:

  • Lewis et al. (2020): Retrieval-augmented generation for knowledge-intensive NLP tasks
  • Edge et al. (2024): GraphRAG approach to query-focused summarization
  • Guo et al. (2024): LightRAG simple and fast retrieval-augmented generation
  • And 55 additional related references

Overall Assessment: REMINDRAG is a high-quality research work that proposes an innovative solution in the KG-RAG field. The method not only achieves technical breakthroughs but, more importantly, addresses a critical practical problem—balancing effectiveness and efficiency. The theoretical analysis is rigorous, experimental design is sound, and results are convincing. While certain limitations exist, its contributions are significant and hold important implications for advancing the practical application of KG-RAG technology.