2025-11-20T17:34:15.321910

ReMindRAG: Low-Cost LLM-Guided Knowledge Graph Traversal for Efficient RAG

Hu, Zhu, Tang et al.

Knowledge graphs (KGs), with their structured representation capabilities, offer promising avenue for enhancing Retrieval Augmented Generation (RAG) systems, leading to the development of KG-RAG systems. Nevertheless, existing methods often struggle to achieve effective synergy between system effectiveness and cost efficiency, leading to neither unsatisfying performance nor excessive LLM prompt tokens and inference time. To this end, this paper proposes REMINDRAG, which employs an LLM-guided graph traversal featuring node exploration, node exploitation, and, most notably, memory replay, to improve both system effectiveness and cost efficiency. Specifically, REMINDRAG memorizes traversal experience within KG edge embeddings, mirroring the way LLMs "memorize" world knowledge within their parameters, but in a train-free manner. We theoretically and experimentally confirm the effectiveness of REMINDRAG, demonstrating its superiority over existing baselines across various benchmark datasets and LLM backbones. Our code is available at https://github.com/kilgrims/ReMindRAG.

academic

ReMindRAG: Low-Cost LLM-Guided Knowledge Graph Traversal for Efficient RAG

Basic Information

Paper ID: 2510.13193
Title: ReMindRAG: Low-Cost LLM-Guided Knowledge Graph Traversal for Efficient RAG
Authors: Yikuan Hu, Jifeng Zhu, Lanrui Tang, Chen Huang
Category: cs.IR (Information Retrieval)
Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Paper Link: https://arxiv.org/abs/2510.13193
Code Link: https://github.com/kilgrims/ReMindRAG

Abstract

Knowledge graphs (KG), with their structured representation capabilities, offer promising avenues for enhancing retrieval-augmented generation (RAG) systems, driving the development of KG-RAG systems. However, existing methods often struggle to achieve effective synergy between system effectiveness and cost efficiency, resulting in poor performance or excessive LLM prompt tokens and inference time. To address this, we propose REMINDRAG, which employs LLM-guided graph traversal incorporating node exploration, node exploitation, and most importantly, a memory replay mechanism to enhance both system effectiveness and cost efficiency. Specifically, REMINDRAG memorizes traversal experiences in KG edge embeddings, similar to how LLMs "memorize" world knowledge in their parameters, but in a training-free manner. We validate REMINDRAG's effectiveness both theoretically and empirically, demonstrating its superiority over existing baselines across various benchmark datasets and LLM backbones.

Research Background and Motivation

Problem Definition

Traditional RAG methods primarily rely on dense vector retrieval to identify relevant text passages but show limited performance on complex tasks requiring multi-hop reasoning or capturing long-range dependencies. Knowledge graphs, with their structured representation of entities and relations, provide a new avenue for addressing this challenge.

Limitations of Existing Methods

Traditional graph search algorithms: Methods such as PageRank and GNNs struggle to capture fine-grained semantic relationships in graphs, resulting in insufficient system effectiveness
LLM-guided graph traversal methods: While demonstrating superior performance, they require extensive LLM calls, significantly increasing costs and inference time
Efficiency-effectiveness trade-off: Existing KG-RAG systems struggle to find an effective balance between system effectiveness and cost efficiency

Research Motivation

This work aims to address the synergistic optimization problem between system effectiveness and cost efficiency in KG-RAG systems, which represents a major challenge for practical deployment and scalability.

Core Contributions

Identifying key challenges: Clearly articulates the challenge of synergistic optimization between system effectiveness and cost efficiency in KG-RAG systems
Proposing the REMINDRAG framework: Employs LLM-guided KG traversal incorporating node exploration, node exploitation, and memory replay mechanisms
Theoretical analysis: Theoretically proves the effectiveness of graph traversal memory replay
Experimental validation: Validates REMINDRAG's superiority across multiple benchmark datasets and LLM backbones

Methodology Details

Task Definition

Given unstructured text documents and user queries, the objective is to construct a knowledge graph and retrieve relevant information through an efficient graph traversal mechanism to generate accurate answers while minimizing LLM invocation costs.

Model Architecture

1. Knowledge Graph Construction

REMINDRAG constructs a heterogeneous knowledge graph containing:

Entity nodes: Named entities extracted from text
Anchor nodes: Storing text chunk titles
Text chunk collection: Segmented original documents
Relation connections: Entity-relation-entity triples and contextual skeleton networks

2. LLM-Guided Knowledge Graph Traversal

Node Exploration Strategy:

Prioritizes exploring potential nodes likely to lead to answers
In each iteration, the LLM evaluates all nodes in subgraph S, selecting the target node a most likely to lead to the answer

Node Exploitation Strategy:

Focuses on exploiting previously explored nodes, extending paths along these nodes
Given selected node a, the LLM selects the optimal expansion node p from its adjacency set Sa

3. Memory Replay Mechanism

Memory Content:

Effective paths: Paths leading to correct answers (positive reinforcement)
Ineffective paths: Paths not leading to answers (negative reinforcement)

Memory Method: Updates edge embeddings using closed-form equations:

Weight function: δ(x) = (2/π)cos(π||x||₂/2)
Enhance effective paths: v̂ = v + δ(v) · q/||q||₂
Penalize ineffective paths: v̂ = v - δ(v·q/||q||₂) · v·q/||q||₂

Fast Awakening and Damped Updates:

Fast awakening: When edge embedding v has small norm, δ function produces large directional updates
Damped updates: When edge embedding v has large norm, δ function produces only small updates, maintaining stability

Technical Innovations

Training-free memory mechanism: Memorizes traversal experiences through edge embeddings without additional training
Balanced exploration and exploitation: Combines node exploration and exploitation strategies to achieve both global and local optimal search
Adaptive weight updates: Adaptive update strategy based on vector norms, balancing rapid learning and long-term stability

Experimental Setup

Datasets

Long-dependency QA: LooGLE dataset, testing long-range semantic retrieval capabilities
Multi-hop QA: HotpotQA dataset, evaluating multi-step reasoning abilities
Simple QA: LooGLE short-dependency QA, testing direct information extraction capabilities

Evaluation Metrics

Effectiveness evaluation: Uses GPT-4o as LLM judge to assess answer accuracy
Cost efficiency evaluation: Average LLM tokens consumed per query during traversal

Baseline Methods

Traditional retrieval methods: BM25, NaiveRAG
KG-RAG systems using graph search algorithms: GraphRAG, LightRAG, HippoRAG2
LLM-guided KG-RAG systems: Plan-on-Graph

Implementation Details

LLM backbone: GPT-4o-mini, Deepseek-V3
Embedding model: nomic-ai/nomic-embed-text-v2-moe
Text chunking: 750 token length
Key parameters: α=0.1 (node relevance weight), λ=0.55 (strong connection threshold)

Experimental Results

Main Results

QA Type	GPT-4o-mini	Deepseek-V3
Long-dependency QA	57.04%	59.73%
Multi-hop QA	74.22%	79.38%
Simple QA	76.67%	77.01%

REMINDRAG significantly outperforms baseline methods across all tasks:

Long-dependency QA: Average improvement of 12.08%
Multi-hop QA: Average improvement of 10.31%
Simple QA: Average improvement of 4.66%

Cost Efficiency Analysis

Setting Type	Accuracy	Token Consumption	Cost Reduction
No memory	57.04%	14.91K	-
1-round memory	56.48%	9.68K	35.1%
2-round memory	58.01%	7.55K	49.4%
3-round memory	60.31%	6.71K	55.0%

After multiple rounds of memory, REMINDRAG achieves an average token consumption reduction of 58.8%.

Ablation Studies

Impact of contextual skeleton networks:

Removing contextual skeleton networks reduces long-dependency QA performance from 57.04% to 51.01%
Validates the importance of contextual information capture

Impact of hop count settings:

System performance monotonically increases with maximum hop count
Larger hop counts enable nodes to access broader neighborhood information

Case Analysis

Self-correction capability:

After initial incorrect answers, the system penalizes irrelevant nodes based on memory rules
Subsequent queries switch to memory-optimized subgraphs, achieving error self-correction

Memory stability:

Maintains stable performance under complex multi-round memory settings
Demonstrates robustness when handling heterogeneous datasets alternately

Theoretical Analysis

Memory Capacity Theorem

For a collection of queries with certain semantic similarity, when embedding dimension d is sufficiently large, edge embeddings can effectively memorize query information under the condition:

θ ≤ lim[d→∞] [2 arcsin(√(1/2 sin(arccos(λ))))]

where θ is the maximum angle between query embedding pairs, and λ is the strong connection threshold.

Theoretical Guarantees

The theoretical upper bound for λ is 0.775, consistent with existing semantic similarity threshold research of 0.6
When embedding dimension exceeds 100, theoretical approximations demonstrate significant practical utility

Development of KG-RAG Systems

Traditional graph search methods: PageRank, Random Walk, GNN, etc., struggle to capture fine-grained semantic relationships
LLM-guided methods: Leverage LLM's semantic understanding capabilities but are cost-prohibitive
Graph pruning and path planning: Existing optimization methods show limited effectiveness

Memory Replay Mechanisms

Draws inspiration from memory replay concepts in reinforcement learning
Innovatively embeds memory as weights in graphs rather than explicit sample replay

Conclusions and Discussion

Main Conclusions

REMINDRAG successfully achieves synergistic optimization of system effectiveness and cost efficiency
The memory replay mechanism significantly improves efficiency for subsequent queries
Self-correction capability enhances system robustness

Limitations

Initial graph traversal cost: First traversal still requires multiple LLM calls
Large-scale document processing: Constructing knowledge graphs requires substantial time and computational resources
Memory capacity constraints: Theoretical analysis assumes infinite dimensionality, potentially limiting practical applications

Future Directions

Pre-trained memory initialization: Use domain-specific FAQ to pre-initialize model memory
Distributed graph construction: Optimize graph construction efficiency for large-scale documents
Dynamic memory management: Investigate forgetting and update mechanisms for long-term memory

In-Depth Evaluation

Strengths

Strong novelty: First to propose a training-free graph traversal memory mechanism
Solid theory: Provides theoretical analysis and guarantees for memory capacity
Comprehensive experiments: Full evaluation across multiple datasets and LLM backbones
High practical value: Significant performance improvements and cost reduction

Weaknesses

Parameter sensitivity: Multiple hyperparameter settings may affect performance
Scalability concerns: Applicability to ultra-large-scale knowledge graphs remains insufficiently validated
Memory update strategy: Simple linear updates may not suit all scenarios

Impact

Academic contribution: Provides new optimization perspectives for the KG-RAG field
Practical applications: Broad application prospects in QA systems, information retrieval, and related domains
Reproducibility: Open-source code facilitates verification and extension by the research community

Applicable Scenarios

Multi-turn dialogue systems: Can memorize historical interactions to improve response efficiency
Domain-specific QA: Capable of accumulating and leveraging traversal experience within specific domains
Cost-sensitive applications: Suitable for scenarios with strict requirements on LLM invocation costs

References

This paper cites important works from multiple domains including RAG, knowledge graphs, and graph neural networks, including:

Lewis et al. (2020): Retrieval-augmented generation for knowledge-intensive NLP tasks
Edge et al. (2024): GraphRAG approach to query-focused summarization
Guo et al. (2024): LightRAG simple and fast retrieval-augmented generation
And 55 additional related references

Overall Assessment: REMINDRAG is a high-quality research work that proposes an innovative solution in the KG-RAG field. The method not only achieves technical breakthroughs but, more importantly, addresses a critical practical problem—balancing effectiveness and efficiency. The theoretical analysis is rigorous, experimental design is sound, and results are convincing. While certain limitations exist, its contributions are significant and hold important implications for advancing the practical application of KG-RAG technology.