2025-11-13T01:58:10.933950

MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity

Tang, Gao, Li et al.
Retrieval Augmented Generation (RAG) has proven to be highly effective in boosting the generative performance of language model in knowledge-intensive tasks. However, existing RAG framework either indiscriminately perform retrieval or rely on rigid single-class classifiers to select retrieval methods, leading to inefficiencies and suboptimal performance across queries of varying complexity. To address these challenges, we propose a reinforcement learning-based framework that dynamically selects the most suitable retrieval strategy based on query complexity. % our solution Our approach leverages a multi-armed bandit algorithm, which treats each retrieval method as a distinct ``arm'' and adapts the selection process by balancing exploration and exploitation. Additionally, we introduce a dynamic reward function that balances accuracy and efficiency, penalizing methods that require more retrieval steps, even if they lead to a correct result. Our method achieves new state of the art results on multiple single-hop and multi-hop datasets while reducing retrieval costs. Our code are available at https://github.com/FUTUREEEEEE/MBA .
academic

MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity

Basic Information

  • Paper ID: 2412.01572
  • Title: MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity
  • Authors: Xiaqiang Tang, Qiang Gao, Jian Li, Nan Du, Qi Li, Sihong Xie
  • Affiliations: Hong Kong University of Science and Technology (Guangzhou), Tencent Hunyuan, Wuhan University, Iowa State University
  • Classification: cs.AI
  • Publication Date: January 1, 2025 (arXiv v4)
  • Paper Link: https://arxiv.org/abs/2412.01572
  • Code Link: https://github.com/FUTUREEEEEE/MBA

Abstract

Retrieval-Augmented Generation (RAG) significantly enhances the generation performance of language models in knowledge-intensive tasks. However, existing RAG frameworks either indiscriminately perform retrieval or rely on rigid single-class classifiers to select retrieval methods, resulting in inefficiency and suboptimal performance across queries of varying complexity. To address these challenges, this paper proposes a reinforcement learning-based framework that dynamically selects the most appropriate retrieval strategy based on query complexity. The method leverages multi-armed bandit algorithms, treating each retrieval method as a different "arm," and balances exploration and exploitation to adapt the selection process. Additionally, a dynamic reward function is introduced that balances accuracy and efficiency, penalizing methods requiring more retrieval steps even when obtaining correct results. The method achieves new state-of-the-art results on multiple single-hop and multi-hop datasets while reducing retrieval costs.

Research Background and Motivation

Problem Definition

Existing RAG systems face the following core issues:

  1. Inappropriate Retrieval Strategy Selection: Most RAG frameworks perform retrieval indiscriminately across all queries, potentially introducing unnecessary or off-topic passages
  2. Single Method Limitations: Using a single retrieval method for all queries is inefficient—simple queries incur unnecessary computational overhead, while complex queries may not receive sufficient processing
  3. Inaccurate Supervision Signals: Existing adaptive methods such as AdaptiveRAG use heuristic supervision, assuming each query has a single optimal strategy and tending to select the path with minimal retrieval cost

Research Motivation

The core motivation of this paper is to develop a framework that can:

  1. Dynamically Adapt to Query Complexity: Intelligently select retrieval strategies based on problem complexity
  2. Balance Accuracy and Efficiency: Minimize computational costs while ensuring answer quality
  3. Support Multi-Strategy Exploration: Allow multiple strategies to potentially produce correct answers rather than forcing selection of a single "optimal" path

Core Contributions

  1. Proposes MBA-RAG Framework: First application of multi-armed bandit algorithms to RAG system retrieval strategy selection, enabling dynamic adaptive retrieval
  2. Designs Dynamic Reward Function: Innovatively combines accuracy and computational efficiency by penalizing high-cost methods to optimize resource utilization
  3. Achieves State-of-the-Art Performance: Obtains best results on 6 datasets while reducing retrieval costs by 20%
  4. Provides Flexible Supervision Mechanism: Uses partial information supervision instead of strict single-label supervision, allowing the model to explore multiple effective strategies

Methodology Details

Task Definition

Given a query x, the RAG system needs to:

  1. Retrieval Phase: Module R retrieves relevant documents D for query x
  2. Generation Phase: LLM generates response ā = LLM(yt|x,D) using x and D

This paper redefines this as a multi-armed bandit problem, where each retrieval method (no retrieval, single retrieval, multiple retrievals) serves as an "arm."

Model Architecture

1. Query Encoding and Arm Selection

  • Encoder: Uses DistilBERT to encode user queries, generating action distribution z = fθ(x)
  • Selection Strategy: Employs ε-greedy strategy to balance exploration and exploitation:
    • Select a = argmax(z) with probability (1-ε)
    • Randomly select generation method with probability ε

2. Learning Algorithm

The objective function minimizes squared error between actual reward ra and predicted reward fθ(x)a:

min_θ (ra - fθ(x)a)²

Parameter update rule:

θt+1 = θt - α∇θ((ra - fθ(x)a)²)

3. Dynamic Reward Function

ra = A(y, ŷa) - λC(a)

Where:

  • A(y, ŷa): Generation quality metric (e.g., exact match)
  • C(a): Computational cost of method a (e.g., number of retrieval steps)
  • λ: Scaling factor balancing accuracy and efficiency

Technical Innovations

  1. Multi-Armed Bandit Adaptation: Models retrieval strategy selection as a multi-armed bandit problem, with each retrieval method corresponding to an "arm"
  2. Partial Information Supervision: Provides feedback only for selected strategies without penalizing unselected ones
  3. Cost-Aware Rewards: Dynamic reward function simultaneously considers accuracy and computational efficiency
  4. Exploration-Exploitation Balance: Uses ε-greedy strategy to avoid premature convergence to suboptimal solutions

Experimental Setup

Datasets

Single-Hop QA Datasets:

  • SQuAD v1.1: Reading comprehension task
  • Natural Questions: Open-domain question answering
  • TriviaQA: Knowledge-based question answering

Multi-Hop QA Datasets:

  • MuSiQue: Multi-step reasoning question answering
  • HotpotQA: Multi-hop reasoning question answering
  • 2WikiMultiHopQA: Wikipedia-based multi-hop question answering

Evaluation Metrics

Performance Metrics:

  • EM (Exact Match): Predicted result exactly matches ground truth
  • F1: Lexical overlap between predicted and ground truth answers
  • Acc (Accuracy): Whether predicted answer contains ground truth

Efficiency Metrics:

  • Step: Number of retrieval steps required by selected strategy

Baseline Methods

  1. No-Retrieval: Direct answer generation without retrieval
  2. Adaptive-Retrieval: Dynamically determines whether retrieval is needed
  3. Self-RAG: Dynamically decides retrieval needs through self-reflection
  4. DRAGIN: Activates retrieval based on token uncertainty
  5. SEAKR: Determines retrieval based on self-aware uncertainty
  6. Adaptive-RAG: Uses classifier to select retrieval strategies based on query complexity

Implementation Details

  • Query Encoding Model: DistilBERT
  • Retrieval Model: BM25
  • Generation Model: FLAN-T5-XL (3B)
  • Learning Rate: 5e-5
  • Exploration Strategy: ε-greedy algorithm

Experimental Results

Main Results

MethodEMF1AccStep
No Retrieval14.8721.1215.970.00
Adaptive Retrieval23.8732.2426.730.50
Self-RAG9.9020.7931.570.72
Adaptive-RAG37.1746.9442.102.17
MBA-RAG (Ours)38.8048.6143.571.80

Key Findings

  1. Performance Improvement: MBA-RAG surpasses baseline methods on all performance metrics
  2. Efficiency Optimization: Compared to Adaptive-RAG, retrieval steps reduced by approximately 17% (from 2.17 to 1.80)
  3. Single-Hop Dataset Performance: Achieves significant improvements on SQuAD and TriviaQA with substantially reduced retrieval costs
  4. Multi-Hop Dataset Performance: Achieves outstanding improvements on 2WikiMultiHopQA with retrieval cost reduction exceeding 20%

Classification Accuracy Analysis

MBA-RAG achieves classification accuracy of 56.1%, significantly higher than:

  • Adaptive Retrieval: 42.0%
  • Self-RAG: 41.5%
  • Adaptive-RAG: 54.0%

Ablation Study

Comparison with multi-label classifier results shows that while traditional multi-label methods achieve good performance, retrieval costs are excessive (Step reaches 4.514), whereas MBA-RAG achieves optimal balance between performance and efficiency.

RAG System Development

  1. Traditional RAG: Retrieval-generation framework proposed by Lewis et al. (2020)
  2. Adaptive Retrieval: Methods such as SEAKR and FLARE implement on-demand retrieval
  3. Complexity-Aware Approaches: AdaptiveRAG selects strategies based on query complexity

Multi-Armed Bandit Applications

This paper is the first to apply multi-armed bandit algorithms to RAG systems, providing a new theoretical framework for retrieval strategy selection.

Conclusions and Discussion

Main Conclusions

  1. Effectiveness Validation: MBA-RAG achieves state-of-the-art performance on multiple datasets
  2. Efficiency Improvement: Significantly reduces retrieval costs with average reduction of 20%
  3. Strong Adaptability: Dynamically adjusts strategies based on query complexity

Limitations

  1. Algorithm Dependency: Framework relies on specific multi-armed bandit algorithm structure
  2. Scalability Challenges: May face adaptability issues when encountering new unseen query types
  3. Computational Requirements: Reinforcement learning approach may introduce additional computational overhead

Future Directions

  1. Algorithm Optimization: Explore more efficient algorithms to reduce computational requirements
  2. Generalization Capability: Improve adaptability to new query types
  3. Application Extension: Apply method to broader range of NLP tasks

In-Depth Evaluation

Strengths

  1. Strong Innovation: First to introduce multi-armed bandits to RAG systems with solid theoretical foundation
  2. High Practical Value: Simultaneously optimizes accuracy and efficiency with important application value
  3. Comprehensive Experiments: Thorough evaluation across 6 different types of datasets
  4. Reasonable Methodology: Cleverly designed dynamic reward function balancing multiple objectives

Weaknesses

  1. Increased Complexity: Introduces additional algorithmic complexity compared to simple classification methods
  2. Parameter Sensitivity: Balance parameter λ in reward function requires adjustment for different datasets
  3. Insufficient Theoretical Analysis: Lacks theoretical guarantees for convergence and optimality

Impact

  1. Academic Contribution: Provides new research direction for RAG system optimization
  2. Practical Application: Method demonstrates strong practicality applicable to real systems
  3. Reproducibility: Provides complete code implementation facilitating reproduction and extension

Applicable Scenarios

  1. Knowledge-Intensive Question Answering: Particularly suitable for scenarios requiring balance between accuracy and efficiency
  2. Multi-Complexity Query Processing: Handles queries ranging from simple to complex
  3. Resource-Constrained Environments: Optimizes retrieval costs when computational resources are limited

References

  1. Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. NeurIPS.
  2. Jeong, S., et al. (2024). Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity. arXiv preprint.
  3. Katehakis, M. N., & Veinott Jr, A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research.

Overall Assessment: This paper proposes an innovative and practical RAG optimization framework that achieves dynamic retrieval strategy selection through multi-armed bandit algorithms, significantly reducing computational costs while maintaining high accuracy. The method has solid theoretical foundations, convincing experimental results, and provides valuable insights for further development of RAG systems.