
MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

Zhao, Ji, Niu et al.

The traditional RAG paradigm, which typically engages in the comprehension of relevant text chunks in response to received queries, inherently restricts both the depth of knowledge internalization and reasoning capabilities. To address this limitation, our research transforms the text processing in RAG from passive chunking to proactive understanding, defining this process as document memory extraction with the objective of simulating human cognitive processes during reading. Building upon this, we propose the Mixtures of scenario-aware document Memories (MoM) framework, engineered to efficiently handle documents from multiple domains and train small language models (SLMs) to acquire the ability to proactively explore and construct document memories. The MoM initially instructs large language models (LLMs) to simulate domain experts in generating document logical outlines, thereby directing structured chunking and core content extraction. It employs a multi-path sampling and multi-perspective evaluation mechanism, specifically designing comprehensive metrics that represent chunk clarity and extraction completeness to select the optimal document memories. Additionally, to infuse deeper human-like reading abilities during the training of SLMs, we incorporate a reverse reasoning strategy, which deduces refined expert thinking paths from high-quality outcomes. Finally, leveraging diverse forms of content generated by MoM, we develop a three-layer document memory retrieval mechanism, which is grounded in our theoretical proof from the perspective of probabilistic modeling. Extensive experimental results across three distinct domains demonstrate that the MoM framework not only resolves text chunking challenges in existing RAG systems, providing LLMs with semantically complete document memories, but also paves the way for SLMs to achieve human-centric intelligent text processing.


Basic Information

Paper ID: 2510.14252
Title: MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems
Authors: Jihao Zhao, Zhiyuan Ji, Simin Niu, Hanyu Wang, Feiyu Xiong, Zhiyu Li
Category: cs.CL (Computation and Language)
Publication Date: October 16, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.14252
Code Link: https://github.com/MemTensor/MoM

Abstract

Traditional retrieval-augmented generation (RAG) paradigms typically respond to queries by understanding relevant text chunks, an approach that inherently limits the depth of knowledge internalization and reasoning capabilities. To address this limitation, this research transforms text processing in RAG from passive chunking to active comprehension, defined as a document memory extraction process that aims to simulate the cognitive processes of human reading. Based on this, the authors propose the Mixtures of scenario-aware document Memories (MoM) framework, designed to efficiently process multi-domain documents and train small language models (SLMs) to actively explore and construct document memories.

Research Background and Motivation

Core Problem

Traditional RAG systems suffer from a fundamental cognitive gap: they reduce document processing to a mechanical preprocessing step and follow a passive "chunk first, understand later" approach that contradicts the way human experts read.

Problem Significance

Loss of Semantic Integrity: Traditional chunking methods (fixed-length, recursive chunking, etc.) ignore the deep semantic coherence and logical structure of documents
Knowledge Fragmentation: Existing methods follow bottom-up construction logic, lacking macroscopic understanding of document architecture
Limited Reasoning Capacity: Passive chunking constrains the model's knowledge internalization depth and reasoning capabilities

Limitations of Existing Methods

Rule-Based Methods: Completely ignore semantic coherence, segmenting based on fixed sizes or syntactic boundaries
Semantic Chunking Methods: While preserving local semantics, still lack global document understanding
LLM Iterative Segmentation: Computationally expensive, essentially still seeking breakpoints locally

Research Motivation

The goal is to simulate how human experts read complex documents: first grasp the macroscopic logical structure, then identify the key arguments, and finally form structured, hierarchical memories.

Core Contributions

Active Memory Extraction Paradigm: Proposes replacing passive text chunking with active memory extraction, constructing structured document memories through global understanding
Three-Layer Document Memory Retrieval Mechanism: Develops a retrieval mechanism backed by a proof based on probabilistic modeling, which reduces information loss more effectively than traditional fusion strategies
Reverse Reasoning Strategy: Designs a reverse construction method for Chain of Memory (CoM) data, enabling SLMs to autonomously execute complex memory extraction tasks
Multi-Domain Validation: Validates the MoM framework effectiveness on three different domain datasets, constructing 40K training samples and training multiple MemReader models

Methodology Details

Task Definition

Document memory is defined as a triplet $M_{\text{doc}} = \{O, C, A\}$, where:

O (Outline): The macroscopic logical structure of the document, an ordered set of core topics
C (Core Content): Core viewpoints of the document, highly condensed knowledge points corresponding to each outline node
A (Atomic Chunks): Structured, fine-grained content segmentation guided by O
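
The triplet above maps naturally onto a small record type. The following is a minimal sketch, assuming plain strings for every element; the class name `DocumentMemory` and its field names are illustrative choices, not identifiers from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentMemory:
    """Illustrative container for the triplet {O, C, A}; field names are assumptions."""

    outline: List[str]        # O: ordered core topics of the document
    core_content: List[str]   # C: one condensed knowledge point per outline node
    atomic_chunks: List[str]  # A: fine-grained chunks segmented under the guidance of O

    def __post_init__(self) -> None:
        # The text ties each core-content entry to an outline node, so check that here.
        if len(self.outline) != len(self.core_content):
            raise ValueError("core_content needs exactly one entry per outline node")


memory = DocumentMemory(
    outline=["Background", "Method"],
    core_content=["RAG chunking loses structure.", "Extract memories top-down."],
    atomic_chunks=["Paragraph on RAG.", "Paragraph on outlines.", "Paragraph on training."],
)
```

Only the alignment between O and C is checked, because that is the only correspondence the definition states explicitly.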

Model Architecture

1. Scenario-Aware Document Memory Extraction

Expert Simulation: Uses a large language model $M_G$ to simulate domain-specific experts, generating document logical outlines $O$ through scenario-aware prompting.

Multi-Path Sampling: Adjusts the decoding parameters of $M_G$ to generate $N$ candidate document memory sets for the same document $D$.

Multi-Dimensional Evaluation: Designs two key quantitative evaluation metrics:

Atomic Chunk Clarity:

$$S_{\text{clarity}}(M_{\text{doc}}) = \frac{1}{n-1} \sum_{i=1}^{n-1} P_{M_{\text{eval}}}(b_{i,i+1} \mid a_i, a_{i+1})$$

Core Content Completeness:

$$S_{\text{comp}}(M_{\text{doc}}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\mathrm{PPL}(a_i \mid c_i) \cdot \log(|c_i|)}$$
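
Both metrics reduce to simple averages once the evaluation model has produced its per-chunk numbers. The sketch below assumes those numbers are already available: boundary probabilities for the n-1 adjacent chunk pairs, and a perplexity plus a core-content length for each of the n chunks. The function names, the use of the natural logarithm, and the unit of the length |c_i| (tokens or characters) are assumptions, since the summary does not specify them.

```python
import math
from typing import Sequence


def clarity_score(boundary_probs: Sequence[float]) -> float:
    """Mean boundary probability over the n-1 adjacent chunk pairs."""
    if not boundary_probs:
        raise ValueError("need at least two chunks, i.e. one adjacent pair")
    return sum(boundary_probs) / len(boundary_probs)


def completeness_score(perplexities: Sequence[float], core_lengths: Sequence[int]) -> float:
    """Mean of 1 / (PPL(a_i | c_i) * log|c_i|) over the n chunks."""
    if not perplexities or len(perplexities) != len(core_lengths):
        raise ValueError("need one perplexity and one length per chunk")
    total = 0.0
    for ppl, length in zip(perplexities, core_lengths):
        if ppl <= 0 or length <= 1:
            # log|c_i| must be strictly positive for the ratio to be defined.
            raise ValueError("need PPL > 0 and |c_i| > 1")
        total += 1.0 / (ppl * math.log(length))
    return total / len(perplexities)
```

Under this formula, a chunk scores higher when a short core content makes it easy to predict, which is why both low perplexity and a small |c_i| raise the completeness score.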

Optimal Selection: Uses reciprocal rank fusion (RRF) algorithm to compute composite scores:

$$S_{\text{RRF}}(M_{\text{doc}}^{(i)}) = \frac{1}{k + \mathrm{rank}_{\text{clarity}}^{(i)}} + \frac{1}{k + \mathrm{rank}_{\text{comp}}^{(i)}}$$
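
The selection step can be sketched as follows, assuming that a higher clarity or completeness score earns a better (smaller) rank and that ranks start at 1. The constant `k=60` is the value commonly used with RRF and is an assumption here; the summary does not state which value the authors chose. Function names are illustrative.

```python
from typing import List, Sequence, Tuple


def ranks_descending(scores: Sequence[float]) -> List[int]:
    """1-based ranks; the highest score gets rank 1, ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def rrf_select(
    clarity: Sequence[float], comp: Sequence[float], k: int = 60
) -> Tuple[int, List[float]]:
    """Return the index of the best candidate and all fused RRF scores."""
    if not clarity or len(clarity) != len(comp):
        raise ValueError("need one clarity and one completeness score per candidate")
    rank_clarity = ranks_descending(clarity)
    rank_comp = ranks_descending(comp)
    fused = [1 / (k + rc) + 1 / (k + rp) for rc, rp in zip(rank_clarity, rank_comp)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


best_index, fused_scores = rrf_select([0.8, 0.9, 0.7], [0.05, 0.06, 0.04])
```

Because RRF works on ranks rather than raw values, the two metrics do not need to share a scale before they are combined.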

2. CoM Reverse Construction

The guidance model $M_G$ takes the original document $D$ and the selected optimal document memory $M_{\text{doc}}$ as input and generates the reasoning paths $P$, which constitute high-quality CoM data.

3. MemReader Training

Trains the SLM on triplets $(D, P, M_{\text{doc}})$ with the loss function:

$$\mathcal{L}_F(\theta) = -\frac{1}{\tau} \sum_{t=1}^{\tau} \log P(o_t \mid o_{<t}, s; \theta)$$

Three-Layer Document Memory Retrieval Mechanism

Theoretical Foundation

Assumption 1 (Semantic Divergence Hypothesis): Global queries and local queries have significantly separated semantic centers in embedding space:

$$\lVert \mu_{\text{abs}} - \mu_{\text{query}} \rVert_2 > 0$$

Theorem 1: For user queries, hierarchical multi-vector (HMV) outperforms single-vector fusion (SVF) in expected similarity.

Theorem 2: HMV strategy has lower probability of deviating from ideal cases than SVF strategy, providing stronger probabilistic guarantees.

Retrieval Algorithm

Constructs a three-layer retrieval mechanism corresponding to O, C, and A, retrieving independently and then fusing results, theoretically proven to more effectively avoid information loss.
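
The retrieve-independently-then-fuse idea can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' algorithm: it uses cosine similarity over small hand-written vectors, takes the top `k` entries from each layer, and fuses by simply pooling the hits and sorting them by score. The summary does not describe the actual fusion rule, and the names `top_k` and `three_layer_retrieve` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query: Vector, index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Score every entry of one layer against the query and keep the k best."""
    scored = [(key, cosine(query, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def three_layer_retrieve(
    query: Vector, layers: Dict[str, Dict[str, Vector]], k: int = 2
) -> List[Tuple[str, str, float]]:
    """Retrieve from each layer independently, then pool and sort the hits."""
    pooled = []
    for layer_name, index in layers.items():
        for key, score in top_k(query, index, k):
            pooled.append((layer_name, key, score))
    return sorted(pooled, key=lambda hit: hit[2], reverse=True)


layers = {
    "outline": {"o1": [1.0, 0.0]},
    "core": {"c1": [0.0, 1.0]},
    "atomic": {"a1": [1.0, 1.0], "a2": [-1.0, 0.0]},
}
hits = three_layer_retrieve([1.0, 0.0], layers, k=1)
```

Keeping one index per layer means a query is compared against each layer's vectors directly, rather than against a single fused vector per document, which is the contrast Theorems 1 and 2 draw between HMV and SVF.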

Experimental Setup

Datasets

CRUD: News domain, focused on long-form answer generation
OmniEval: Financial domain, containing 5 task types and 16 financial topics
MultiFieldQA_zh: Multi-domain dataset sourced from LongBench benchmark

Evaluation Metrics

BLEU Series: Measures n-gram overlap
ROUGE-L: Longest common subsequence
METEOR: Synonym and syntactic variation matching
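
To make the ROUGE-L entry concrete, the sketch below computes the longest common subsequence (LCS) between two token lists and turns it into an F1 score. It is a simplified illustration: tokenization is left to the caller (the example uses whitespace splitting, which would not suit the Chinese datasets listed above), and the balanced F1 is an assumption, since ROUGE-L implementations may weight recall differently.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Balanced F1 of LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat sat on the mat".split(), "the cat is on the mat".split())
```

Here the LCS is "the cat on the mat" (5 tokens out of 6 on each side), so precision, recall, and F1 all equal 5/6.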

Baseline Methods

Original chunking: Fixed-length chunking
Llama_index: Chunking preserving sentence boundaries
Similarity chunking: Segmentation based on semantic similarity
LumberChunker: First method introducing LLM-based segmentation
MoC MetaChunker: Parameter-efficient chunking balancing accuracy and efficiency

Implementation Details

Guidance Model: DeepSeek-R1
Base Models: Qwen2.5 series (1.5B, 3B, 7B, 14B)
Embedding Model: bge-base-zh-v1.5
Hardware: NVIDIA A800 80G (training), MetaX C500 64G (evaluation)

Experimental Results

Main Results

Method	CRUD (ROUGE-L)	OmniEval (ROUGE-L)	MultiFieldQA (ROUGE-L)
Original	0.5654	0.2254	0.2315
Llama_index	0.5896	0.2350	0.2363
Similarity chunking	0.5823	0.2240	0.2191
LumberChunker	0.5701	0.2375	0.2426
MoC MetaChunker	0.6031	0.2457	0.2255
MemReader-7B	0.6152	0.2500	0.2637

Key Findings

Scale Effects: Even the smaller MemReader-3B and MemReader-1.5B models outperform all baseline methods
Domain Adaptability: The framework encounters challenges in the financial domain (OmniEval), but MemReader-7B still performs well across three metrics
Semantic Advantages: Excels in ROUGE-L and METEOR metrics, demonstrating advantages in semantic similarity

Ablation Studies

Evaluation Metric Effectiveness

Atomic chunk clarity shows correlation coefficients with ROUGE-L of 0.7044, 0.7585, and 0.7248 across three evaluation models, demonstrating strong positive correlation.

Information Support Analysis

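Designs an information support score to evaluate how well the retrieved content supports the answer. In this formula, A denotes the answer with tokens a_1, ..., a_m and C the retrieved content, not the atomic chunks and core content of the memory triplet:

$$S_{\text{support}}(A \mid C) = -\frac{1}{m} \sum_{i=1}^{m} \log P(a_i \mid a_1, \ldots, a_{i-1}, C)$$

MemReader-3B achieves the best score across all evaluation models, indicating that the extracted memories provide more information for downstream tasks.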
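
Given per-token log-probabilities of the answer from an evaluation model, the support score is a negative mean. The sketch below assumes those log-probabilities have already been obtained with the retrieved content in the prompt; the function names and the example numbers are illustrative, not results from the paper. With the formula as written, a lower score means the answer is more likely given the retrieved content.

```python
from typing import Dict, List, Sequence


def support_score(answer_token_logprobs: Sequence[float]) -> float:
    """Negative mean log-probability of the answer tokens given the retrieved content."""
    if not answer_token_logprobs:
        raise ValueError("answer must contain at least one token")
    return -sum(answer_token_logprobs) / len(answer_token_logprobs)


def rank_by_support(logprobs_by_method: Dict[str, Sequence[float]]) -> List[str]:
    """Order chunking methods from lowest (most supportive) to highest score."""
    return sorted(logprobs_by_method, key=lambda name: support_score(logprobs_by_method[name]))


# Made-up log-probabilities for one answer under two hypothetical retrieval contexts.
ordering = rank_by_support({"method_a": [-2.0, -2.0], "method_b": [-0.5, -1.5]})
```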

Related Work

Text Chunking in RAG

Traditional Methods: Fixed-size chunking, recursive chunking, syntax boundary-based segmentation
Semantic Chunking: Merging text based on sentence embedding similarity or decomposing into atomic facts
Limitations: Lack macroscopic understanding of document architecture

Memory Systems in RAG

Conversational Memory: Mem0, LangMem, MemoryScope and other systems focus on conversational scenarios
Document Memory: Relatively simple, such as MemGPT's pagination mechanism, MemoRAG's pointer navigation
Research Gap: Lack of advanced mechanisms for actively constructing structured, semantically coherent document memories

Conclusions and Discussion

Main Conclusions

The MoM framework successfully elevates document processing from surface operations to deep cognition
The three-layer document memory retrieval mechanism outperforms traditional methods both theoretically and practically
SLMs empowered by MoM demonstrate superior multi-domain document understanding and organization capabilities

Limitations

Domain Dependency: Performance is constrained in information-dense discrete domains like finance
Computational Cost: Multi-path sampling and evaluation increase computational overhead
Training Data: Depends on high-quality expert simulation data

Future Directions

Extend adaptability to more specialized domains
Optimize computational efficiency and inference speed
Explore more complex memory structures and retrieval strategies

In-Depth Evaluation

Strengths

Strong Innovation: First to propose active memory extraction paradigm, breaking through traditional RAG limitations
Solid Theory: Provides complete probabilistic modeling theoretical proofs
Comprehensive Experiments: Full evaluation across three domains with detailed ablation studies
High Practical Value: Open-source code directly applicable to existing RAG systems

Weaknesses

Evaluation Limitations: Primarily validated on Chinese-language datasets, with limited coverage of other languages
Baseline Comparisons: Lacks comparison with latest SOTA methods
Computational Analysis: Lacks detailed analysis of computational complexity and inference efficiency

Impact

Academic Contribution: Provides new research paradigm for RAG field
Engineering Value: Can significantly enhance performance of existing RAG systems
Reproducibility: Provides complete code and detailed implementation details

Applicable Scenarios

Knowledge-Intensive Applications: Legal document analysis, academic paper understanding
Multi-Domain QA Systems: Applications requiring cross-domain document understanding
Enterprise Knowledge Management: Intelligent retrieval and question-answering for internal documents

References

The paper cites 32 related references covering RAG foundational theory, text chunking methods, memory system design and other key areas, providing solid theoretical foundation for the research.

Overall Assessment: This is an important paper with significant innovation in the RAG field. By introducing a cognitive science perspective to redefine document processing paradigms, it achieves breakthroughs both theoretically and practically. Despite some limitations, its pioneering approach and rigorous experimental validation make it an important contribution to the field.