MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity
Tang, Gao, Li et al.
Retrieval Augmented Generation (RAG) has proven to be highly effective in boosting the generative performance of language model in knowledge-intensive tasks. However, existing RAG framework either indiscriminately perform retrieval or rely on rigid single-class classifiers to select retrieval methods, leading to inefficiencies and suboptimal performance across queries of varying complexity. To address these challenges, we propose a reinforcement learning-based framework that dynamically selects the most suitable retrieval strategy based on query complexity. % our solution Our approach leverages a multi-armed bandit algorithm, which treats each retrieval method as a distinct ``arm'' and adapts the selection process by balancing exploration and exploitation. Additionally, we introduce a dynamic reward function that balances accuracy and efficiency, penalizing methods that require more retrieval steps, even if they lead to a correct result. Our method achieves new state of the art results on multiple single-hop and multi-hop datasets while reducing retrieval costs. Our code are available at https://github.com/FUTUREEEEEE/MBA .
academic
MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity
Retrieval-Augmented Generation (RAG) significantly enhances the generation performance of language models in knowledge-intensive tasks. However, existing RAG frameworks either indiscriminately perform retrieval or rely on rigid single-class classifiers to select retrieval methods, resulting in inefficiency and suboptimal performance across queries of varying complexity. To address these challenges, this paper proposes a reinforcement learning-based framework that dynamically selects the most appropriate retrieval strategy based on query complexity. The method leverages multi-armed bandit algorithms, treating each retrieval method as a different "arm," and balances exploration and exploitation to adapt the selection process. Additionally, a dynamic reward function is introduced that balances accuracy and efficiency, penalizing methods requiring more retrieval steps even when obtaining correct results. The method achieves new state-of-the-art results on multiple single-hop and multi-hop datasets while reducing retrieval costs.
Existing RAG systems face the following core issues:
Inappropriate Retrieval Strategy Selection: Most RAG frameworks perform retrieval indiscriminately across all queries, potentially introducing unnecessary or off-topic passages
Single Method Limitations: Using a single retrieval method for all queries is inefficient—simple queries incur unnecessary computational overhead, while complex queries may not receive sufficient processing
Inaccurate Supervision Signals: Existing adaptive methods such as AdaptiveRAG use heuristic supervision, assuming each query has a single optimal strategy and tending to select the path with minimal retrieval cost
The core motivation of this paper is to develop a framework that can:
Dynamically Adapt to Query Complexity: Intelligently select retrieval strategies based on problem complexity
Balance Accuracy and Efficiency: Minimize computational costs while ensuring answer quality
Support Multi-Strategy Exploration: Allow multiple strategies to potentially produce correct answers rather than forcing selection of a single "optimal" path
Proposes MBA-RAG Framework: First application of multi-armed bandit algorithms to RAG system retrieval strategy selection, enabling dynamic adaptive retrieval
Designs Dynamic Reward Function: Innovatively combines accuracy and computational efficiency by penalizing high-cost methods to optimize resource utilization
Achieves State-of-the-Art Performance: Obtains best results on 6 datasets while reducing retrieval costs by 20%
Provides Flexible Supervision Mechanism: Uses partial information supervision instead of strict single-label supervision, allowing the model to explore multiple effective strategies
Retrieval Phase: Module R retrieves relevant documents D for query x
Generation Phase: LLM generates response ā = LLM(yt|x,D) using x and D
This paper redefines this as a multi-armed bandit problem, where each retrieval method (no retrieval, single retrieval, multiple retrievals) serves as an "arm."
Multi-Armed Bandit Adaptation: Models retrieval strategy selection as a multi-armed bandit problem, with each retrieval method corresponding to an "arm"
Partial Information Supervision: Provides feedback only for selected strategies without penalizing unselected ones
Cost-Aware Rewards: Dynamic reward function simultaneously considers accuracy and computational efficiency
Exploration-Exploitation Balance: Uses ε-greedy strategy to avoid premature convergence to suboptimal solutions
Comparison with multi-label classifier results shows that while traditional multi-label methods achieve good performance, retrieval costs are excessive (Step reaches 4.514), whereas MBA-RAG achieves optimal balance between performance and efficiency.
Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. NeurIPS.
Jeong, S., et al. (2024). Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity. arXiv preprint.
Katehakis, M. N., & Veinott Jr, A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research.
Overall Assessment: This paper proposes an innovative and practical RAG optimization framework that achieves dynamic retrieval strategy selection through multi-armed bandit algorithms, significantly reducing computational costs while maintaining high accuracy. The method has solid theoretical foundations, convincing experimental results, and provides valuable insights for further development of RAG systems.