MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

Zhao, Ji, Niu et al.
The traditional RAG paradigm, which typically engages in the comprehension of relevant text chunks in response to received queries, inherently restricts both the depth of knowledge internalization and reasoning capabilities. To address this limitation, our research transforms the text processing in RAG from passive chunking to proactive understanding, defining this process as document memory extraction with the objective of simulating human cognitive processes during reading. Building upon this, we propose the Mixtures of scenario-aware document Memories (MoM) framework, engineered to efficiently handle documents from multiple domains and train small language models (SLMs) to acquire the ability to proactively explore and construct document memories. The MoM initially instructs large language models (LLMs) to simulate domain experts in generating document logical outlines, thereby directing structured chunking and core content extraction. It employs a multi-path sampling and multi-perspective evaluation mechanism, specifically designing comprehensive metrics that represent chunk clarity and extraction completeness to select the optimal document memories. Additionally, to infuse deeper human-like reading abilities during the training of SLMs, we incorporate a reverse reasoning strategy, which deduces refined expert thinking paths from high-quality outcomes. Finally, leveraging diverse forms of content generated by MoM, we develop a three-layer document memory retrieval mechanism, which is grounded in our theoretical proof from the perspective of probabilistic modeling. Extensive experimental results across three distinct domains demonstrate that the MoM framework not only resolves text chunking challenges in existing RAG systems, providing LLMs with semantically complete document memories, but also paves the way for SLMs to achieve human-centric intelligent text processing.

Basic Information

  • Paper ID: 2510.14252
  • Title: MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems
  • Authors: Jihao Zhao, Zhiyuan Ji, Simin Niu, Hanyu Wang, Feiyu Xiong, Zhiyu Li
  • Category: cs.CL (Computation and Language)
  • Publication Date: October 16, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.14252
  • Code Link: https://github.com/MemTensor/MoM

Abstract

Traditional retrieval-augmented generation (RAG) pipelines typically answer a query by comprehending the retrieved text chunks, an approach that inherently limits the depth of knowledge internalization and reasoning. To address this, the paper shifts text processing in RAG from passive chunking to proactive understanding, framed as document memory extraction that simulates the cognitive processes of human reading. On this basis, the authors propose the Mixtures of scenario-aware document Memories (MoM) framework, designed to efficiently process multi-domain documents and to train small language models (SLMs) to proactively explore and construct document memories.

Research Background and Motivation

Core Problem

Traditional RAG systems suffer from a fundamental cognitive gap: they reduce document processing to a mechanical preprocessing step and follow a passive "chunk first, understand later" approach that runs contrary to how human experts read.

Problem Significance

  1. Loss of Semantic Integrity: Traditional chunking methods (fixed-length, recursive chunking, etc.) ignore the deep semantic coherence and logical structure of documents
  2. Knowledge Fragmentation: Existing methods follow bottom-up construction logic, lacking macroscopic understanding of document architecture
  3. Limited Reasoning Capacity: Passive chunking constrains the model's knowledge internalization depth and reasoning capabilities

Limitations of Existing Methods

  • Rule-Based Methods: Completely ignore semantic coherence, segmenting based on fixed sizes or syntactic boundaries
  • Semantic Chunking Methods: While preserving local semantics, still lack global document understanding
  • LLM Iterative Segmentation: Computationally expensive, essentially still seeking breakpoints locally

Research Motivation

The goal is to simulate how human experts read complex documents: they first grasp the macroscopic logical structure, then identify key arguments, and finally form structured, hierarchical memories.

Core Contributions

  1. Active Memory Extraction Paradigm: Proposes replacing passive text chunking with active memory extraction, constructing structured document memories through global understanding
  2. Three-Layer Document Memory Retrieval Mechanism: Develops a retrieval mechanism, backed by a proof based on probabilistic modeling, that reduces information loss more effectively than traditional fusion strategies
  3. Reverse Reasoning Strategy: Constructs Chain of Memory (CoM) data by deducing expert reasoning paths from high-quality extraction outcomes, enabling SLMs to carry out complex memory extraction autonomously
  4. Multi-Domain Validation: Validates the MoM framework on datasets from three domains, constructing 40K training samples and training multiple MemReader models

Methodology Details

Task Definition

Document memory is defined as a triplet: Mdoc = {O, C, A}, where:

  • O (Outline): The macroscopic logical structure of the document, an ordered set of core topics
  • C (Core Content): Core viewpoints of the document, highly condensed knowledge points corresponding to each outline node
  • A (Atomic Chunks): Structured, fine-grained content segmentation guided by O
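
As a rough illustration of the triplet above, the following minimal sketch models a document memory as a plain data container. The class name `DocumentMemory`, its field names, and the one-to-one alignment check between O and C are assumptions made for illustration and are not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentMemory:
    """Hypothetical container for the triplet M_doc = {O, C, A}."""

    outline: List[str]        # O: ordered core topics of the document
    core_content: List[str]   # C: one condensed knowledge point per outline node
    atomic_chunks: List[str]  # A: fine-grained chunks whose boundaries follow O

    def __post_init__(self) -> None:
        # Assumption: C is aligned one-to-one with O, since each core
        # content entry is described as corresponding to an outline node.
        if len(self.outline) != len(self.core_content):
            raise ValueError("core_content must have one entry per outline node")


memory = DocumentMemory(
    outline=["Background", "Method"],
    core_content=["RAG chunking loses structure.", "Extract memories top-down."],
    atomic_chunks=["Chunk about the background.", "Chunk about the method."],
)
```

Keeping the three parts as separate fields mirrors the later retrieval design, in which each part is indexed and searched on its own.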

Model Architecture

1. Scenario-Aware Document Memory Extraction

Expert Simulation: Uses a large language model MG to simulate domain-specific experts, generating document logical outlines O through scenario-aware prompting.

Multi-Path Sampling: Adjusts MG's decoding parameters to generate N candidate document memory sets for the same document D.

Multi-Dimensional Evaluation: Designs two key quantitative evaluation metrics:

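  • Atomic Chunk Clarity:
$$S_{\text{clarity}}(M_{\text{doc}}) = \frac{1}{n-1} \sum_{i=1}^{n-1} P_{M_{\text{eval}}}\left(b_{i,i+1} \mid a_i, a_{i+1}\right)$$
  • Core Content Completeness:
$$S_{\text{comp}}(M_{\text{doc}}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\text{PPL}(a_i \mid c_i) \cdot \log(|c_i|)}$$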
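
The two metrics can be sketched in a few lines once the evaluation model's outputs are available. This sketch assumes the boundary probabilities and perplexities have already been computed elsewhere; the function names, the use of token counts for |c_i|, and the natural logarithm are all assumptions.

```python
import math
from typing import Sequence


def clarity_score(boundary_probs: Sequence[float]) -> float:
    """Mean of the n-1 probabilities that a boundary belongs between adjacent chunks."""
    if not boundary_probs:
        raise ValueError("need at least two chunks, i.e. one boundary")
    return sum(boundary_probs) / len(boundary_probs)


def completeness_score(perplexities: Sequence[float], core_lengths: Sequence[int]) -> float:
    """Mean of 1 / (PPL(a_i | c_i) * log|c_i|) over the n chunks."""
    if not perplexities or len(perplexities) != len(core_lengths):
        raise ValueError("need one perplexity and one length per chunk")
    total = 0.0
    for ppl, length in zip(perplexities, core_lengths):
        if length <= 1:
            # log|c_i| must be positive for the ratio to be well defined.
            raise ValueError("core content length must exceed 1")
        total += 1.0 / (ppl * math.log(length))
    return total / len(perplexities)
```

Under this formula, a core content entry scores higher when it makes its chunk easier to predict (lower perplexity) while staying short (smaller log length).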

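Optimal Selection: Uses the reciprocal rank fusion (RRF) algorithm to compute a composite score for each candidate i, where the two rank terms are the candidate's ranks under the clarity and completeness metrics and k is a smoothing constant:

$$S_{\text{RRF}}\left(M^{(i)}_{\text{doc}}\right) = \frac{1}{k + \text{rank}^{(i)}_{\text{clarity}}} + \frac{1}{k + \text{rank}^{(i)}_{\text{comp}}}$$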
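
The selection step can be sketched as follows. The function names are hypothetical, and `k = 60` is the value commonly used for RRF rather than one stated in this summary; ties are broken by input order, which is also an assumption.

```python
from typing import List, Sequence


def ranks_descending(scores: Sequence[float]) -> List[int]:
    """Rank 1 goes to the highest score; ties keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rrf_select(clarity: Sequence[float], completeness: Sequence[float], k: int = 60) -> int:
    """Return the index of the candidate with the highest fused RRF score."""
    if not clarity or len(clarity) != len(completeness):
        raise ValueError("need one clarity and one completeness score per candidate")
    clarity_ranks = ranks_descending(clarity)
    comp_ranks = ranks_descending(completeness)
    fused = [
        1.0 / (k + rc) + 1.0 / (k + rp)
        for rc, rp in zip(clarity_ranks, comp_ranks)
    ]
    return max(range(len(fused)), key=fused.__getitem__)


# Three sampled candidates; the second one ranks first on both metrics.
best = rrf_select(clarity=[0.8, 0.9, 0.5], completeness=[0.3, 0.4, 0.1])
```

Because RRF works on ranks rather than raw values, the two metrics do not need to share a scale.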

2. CoM Reverse Construction

The guidance model MG takes the original document D and the selected optimal document memory Mdoc as input and generates a reasoning path P that leads from D to Mdoc. These reasoning paths constitute the high-quality CoM data.

3. MemReader Training

Trains an SLM on triplets (D, P, Mdoc) with the loss function:

$$\mathcal{L}_F(\theta) = -\frac{1}{\tau} \sum_{t=1}^{\tau} \log P\left(o_t \mid o_{<t}, s; \theta\right)$$

Three-Layer Document Memory Retrieval Mechanism

Theoretical Foundation

Assumption 1 (Semantic Divergence Hypothesis): Global queries and local queries have significantly separated semantic centers in embedding space:

$$\left\| \mu_{\text{abs}} - \mu_{\text{query}} \right\|_2 > 0$$

Theorem 1: For user queries, hierarchical multi-vector (HMV) outperforms single-vector fusion (SVF) in expected similarity.

Theorem 2: HMV strategy has lower probability of deviating from ideal cases than SVF strategy, providing stronger probabilistic guarantees.

Retrieval Algorithm

Constructs a three-layer retrieval mechanism corresponding to O, C, and A, retrieving independently and then fusing results, theoretically proven to more effectively avoid information loss.
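
A minimal sketch of the retrieve-independently-then-fuse idea is shown below. It assumes each layer is a small in-memory list of (id, embedding) pairs and fuses results by simply pooling the per-layer top hits and sorting by cosine similarity; the paper's actual fusion rule is not described in this summary, so that step and all names here are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Index = List[Tuple[str, Vector]]  # (memory id, embedding)


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query: Vector, index: Index, k: int) -> List[Tuple[str, float]]:
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def three_layer_retrieve(
    query: Vector, layers: Dict[str, Index], k: int = 1
) -> List[Tuple[str, float]]:
    """Search every layer on its own, then pool the hits (assumed fusion rule)."""
    pooled: List[Tuple[str, float]] = []
    for name, index in layers.items():
        for doc_id, score in top_k(query, index, k):
            pooled.append((f"{name}:{doc_id}", score))
    return sorted(pooled, key=lambda item: item[1], reverse=True)


# Toy 2-D embeddings standing in for outline (O), core content (C), atomic chunks (A).
layers = {
    "outline": [("o1", [1.0, 0.0]), ("o2", [0.0, 1.0])],
    "core": [("c1", [0.9, 0.1])],
    "atomic": [("a1", [0.7, 0.7]), ("a2", [-1.0, 0.0])],
}
hits = three_layer_retrieve([1.0, 0.0], layers, k=1)
```

Searching the layers separately keeps each embedding close to its own semantic center, rather than averaging outline-level and chunk-level content into a single vector.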

Experimental Setup

Datasets

  1. CRUD: News domain, focused on long-form answer generation
  2. OmniEval: Financial domain, containing 5 task types and 16 financial topics
  3. MultiFieldQA_zh: Multi-domain dataset sourced from LongBench benchmark

Evaluation Metrics

  • BLEU Series: Measures n-gram overlap
  • ROUGE-L: Longest common subsequence
  • METEOR: Synonym and syntactic variation matching
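
To make the ROUGE-L entry concrete, here is a small sketch of the longest-common-subsequence computation behind it. It assumes whitespace tokenization and the balanced F1 form of the score; the evaluation in the paper may use a different tokenizer (especially for Chinese text) or a different weighting.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using one row of DP state at a time."""
    prev: List[int] = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed weighting)."""
    cand_tokens = candidate.split()
    ref_tokens = reference.split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the candidate "a b c d" and the reference "a x c y" share the subsequence "a c", giving precision and recall of 0.5 each.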

Baseline Methods

  1. Original chunking: Fixed-length chunking
  2. Llama_index: Chunking preserving sentence boundaries
  3. Similarity chunking: Segmentation based on semantic similarity
  4. LumberChunker: First method introducing LLM-based segmentation
  5. MoC MetaChunker: Parameter-efficient chunking balancing accuracy and efficiency

Implementation Details

  • Guidance Model: DeepSeek-R1
  • Base Models: Qwen2.5 series (1.5B, 3B, 7B, 14B)
  • Embedding Model: bge-base-zh-v1.5
  • Hardware: NVIDIA A800 80G (training), MetaX C500 64G (evaluation)

Experimental Results

Main Results

Method              CRUD (ROUGE-L)   OmniEval (ROUGE-L)   MultiFieldQA (ROUGE-L)
Original            0.5654           0.2254               0.2315
Llama_index         0.5896           0.2350               0.2363
Semantic Chunking   0.5823           0.2240               0.2191
LumberChunker       0.5701           0.2375               0.2426
MoC MetaChunker     0.6031           0.2457               0.2255
MemReader-7B        0.6152           0.2500               0.2637

Key Findings

  1. Scale Effects: Even smaller MemReader-3B and MemReader-1.5B outperform all baseline methods
  2. Domain Adaptability: Encounters challenges in the financial domain (OmniEval), but MemReader-7B still performs well across three metrics
  3. Semantic Advantages: Excels in ROUGE-L and METEOR metrics, demonstrating advantages in semantic similarity

Ablation Studies

Evaluation Metric Effectiveness

Atomic chunk clarity shows correlation coefficients with ROUGE-L of 0.7044, 0.7585, and 0.7248 across three evaluation models, demonstrating strong positive correlation.

Information Support Analysis

Designs information support score to evaluate how retrieved content supports answers:

$$S_{\text{support}}(A \mid C) = -\frac{1}{m} \sum_{i=1}^{m} \log P\left(a_i \mid a_1, \ldots, a_{i-1}, C\right)$$

MemReader-3B achieves the best score across all evaluation models, which indicates that the extracted memories give downstream tasks more supporting information.
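
The information support score defined above can be sketched as a mean negative log-probability over the answer tokens. The sketch assumes the per-token probabilities of the answer, conditioned on the retrieved content, have already been obtained from an evaluation model; the function name and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def support_score(answer_token_probs: Sequence[float]) -> float:
    """-(1/m) * sum_i log P(a_i | a_1..a_{i-1}, C) for an answer of m tokens."""
    if not answer_token_probs:
        raise ValueError("answer must contain at least one token")
    total_log_prob = sum(math.log(p) for p in answer_token_probs)
    return -total_log_prob / len(answer_token_probs)


# Toy comparison: the same answer scored under two hypothetical retrieved contexts.
with_good_context = support_score([0.9, 0.8, 0.9])
with_poor_context = support_score([0.4, 0.3, 0.5])
```

With the formula as written, a smaller value means the answer is more probable given the retrieved content, so the first context in the toy comparison supports the answer better.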

Related Work

Text Chunking in RAG

  • Traditional Methods: Fixed-size chunking, recursive chunking, syntax boundary-based segmentation
  • Semantic Chunking: Merging text based on sentence embedding similarity or decomposing into atomic facts
  • Limitations: Lack macroscopic understanding of document architecture

Memory Systems in RAG

  • Conversational Memory: Mem0, LangMem, MemoryScope and other systems focus on conversational scenarios
  • Document Memory: Existing approaches are relatively simple, such as MemGPT's pagination mechanism and MemoRAG's pointer navigation
  • Research Gap: Lack of advanced mechanisms for actively constructing structured, semantically coherent document memories

Conclusions and Discussion

Main Conclusions

  1. The MoM framework successfully elevates document processing from surface operations to deep cognition
  2. The three-layer document memory retrieval mechanism outperforms traditional methods both theoretically and practically
  3. SLMs empowered by MoM demonstrate superior multi-domain document understanding and organization capabilities

Limitations

  1. Domain Dependency: Performance is constrained in information-dense discrete domains like finance
  2. Computational Cost: Multi-path sampling and evaluation increase computational overhead
  3. Training Data: Depends on high-quality expert simulation data

Future Directions

  1. Extend adaptability to more specialized domains
  2. Optimize computational efficiency and inference speed
  3. Explore more complex memory structures and retrieval strategies

In-Depth Evaluation

Strengths

  1. Strong Innovation: First to propose active memory extraction paradigm, breaking through traditional RAG limitations
  2. Solid Theory: Provides complete probabilistic modeling theoretical proofs
  3. Comprehensive Experiments: Full evaluation across three domains with detailed ablation studies
  4. High Practical Value: Open-source code directly applicable to existing RAG systems

Weaknesses

  1. Evaluation Limitations: Primarily validated on Chinese datasets with limited internationalization
  2. Baseline Comparisons: Lacks comparison with latest SOTA methods
  3. Computational Analysis: Lacks detailed analysis of computational complexity and inference efficiency

Impact

  1. Academic Contribution: Provides new research paradigm for RAG field
  2. Engineering Value: Can significantly enhance performance of existing RAG systems
  3. Reproducibility: Provides complete code and detailed implementation details

Applicable Scenarios

  1. Knowledge-Intensive Applications: Legal document analysis, academic paper understanding
  2. Multi-Domain QA Systems: Applications requiring cross-domain document understanding
  3. Enterprise Knowledge Management: Intelligent retrieval and question-answering for internal documents

References

The paper cites 32 related references covering RAG foundational theory, text chunking methods, memory system design and other key areas, providing solid theoretical foundation for the research.


Overall Assessment: This paper makes a notable contribution to the RAG field. By bringing a cognitive-science perspective to document processing, it advances both the theory and the practice of chunking and retrieval. Despite the limitations listed above, its approach and its experimental validation across three domains make it a valuable reference for the field.