2025-11-14T03:13:11.609221

Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning

Zhao, Yu, Xu

Reasoning-augmented search agents, such as Search-R1, are trained to reason, search, and generate the final answer iteratively. Nevertheless, due to their limited capabilities in reasoning and search, their performance on multi-hop QA benchmarks remains far from satisfactory. To handle complex or compound queries, we use reinforcement learning to train an LLM-based search agent with a native query-expansion capability. In each turn, our search agent proposes several query variants, which are searched simultaneously to cover more relevant information. Meanwhile, given limited post-training data and computing resources, it is very challenging for a search agent to master multiple tasks, including query generation, understanding of retrieved information, and answer generation. Therefore, we propose incorporating a pre-trained squeezer model that helps the search agent understand the retrieved documents, allowing the search agent to focus on query generation for high retrieval recall. With the assistance of the squeezer model, we find that even a small-scale 3B LLM can demonstrate a strong capability of query expansion and achieve state-of-the-art accuracy on multi-hop QA benchmarks. Specifically, our experiments across seven question-answering benchmarks demonstrate that our method, named ExpandSearch, achieves an average improvement of 4.4% compared to state-of-the-art baselines, with strong gains on multi-hop reasoning tasks requiring diverse evidence aggregation.
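The abstract describes one turn of the agent as: propose several query variants, search them simultaneously, and hand the retrieved documents to a squeezer so the agent sees a condensed context. It does not specify the retriever, how hits from different variants are merged, or what the squeezer outputs, so the Python sketch below fills those in with loudly labeled assumptions: a toy token-overlap retriever, a hard-coded list of variants standing in for the ones the agent LLM would generate, concurrent search with order-preserving deduplication, and a "best sentence per document" heuristic as a placeholder for the pre-trained squeezer model. The corpus and the names `retrieve`, `expanded_search`, and `squeeze` are hypothetical and not taken from the paper.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus: doc id -> text.
CORPUS = {
    "inception": "Inception stars Leonardo DiCaprio. Inception was released in 2010.",
    "interstellar": "Interstellar was released in 2014. Its score was composed by Hans Zimmer.",
    "tenet": "Tenet was released in 2020.",
}


def tokenize(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a crude stand-in for a real retriever's analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
    """Toy retriever (assumption): rank docs by token overlap with the query, return top-k ids."""
    q = tokenize(query)
    scored = [(len(q & tokenize(text)), doc_id) for doc_id, text in corpus.items()]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # higher overlap first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def expanded_search(variants: list[str], corpus: dict[str, str], k: int = 1) -> list[str]:
    """Search every query variant concurrently, then merge hits in order without duplicates."""
    with ThreadPoolExecutor() as pool:
        per_query = list(pool.map(lambda q: retrieve(q, corpus, k), variants))
    merged: list[str] = []
    for hits in per_query:
        for doc_id in hits:
            if doc_id not in merged:
                merged.append(doc_id)
    return merged


def squeeze(variants: list[str], doc_ids: list[str], corpus: dict[str, str]) -> str:
    """Placeholder for the squeezer model: keep the most query-relevant sentence per document."""
    query_tokens = [tokenize(v) for v in variants]
    kept = []
    for doc_id in doc_ids:
        sentences = re.split(r"(?<=\.)\s+", corpus[doc_id].strip())
        best = max(sentences, key=lambda s: max(len(tokenize(s) & q) for q in query_tokens))
        kept.append(best)
    return " ".join(kept)


question = "Which film came out first, Inception or Interstellar?"
# In ExpandSearch the agent LLM proposes the variants; here they are hard-coded.
variants = ["Inception released", "Interstellar released"]

print(retrieve(question, CORPUS))           # ['inception']
hits = expanded_search(variants, CORPUS)
print(hits)                                 # ['inception', 'interstellar']
print(squeeze(variants, hits, CORPUS))      # Inception was released in 2010. Interstellar was released in 2014.
```

On this toy corpus the single original question retrieves only the Inception document at `k = 1`, while the two variants together recover both documents needed to answer the comparison. This is the recall effect the abstract attributes to query expansion, with the squeezer stand-in reducing the merged documents to the two sentences the answer depends on.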

academic

Basic Information

Paper ID: 2510.10009
Title: Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning
Authors: Shu Zhao (NVIDIA & Pennsylvania State University), Tan Yu (NVIDIA), Anbang Xu (NVIDIA)
Categories: cs.CL cs.AI cs.IR
Published: 2025-10-14 (arXiv preprint)
Paper link: https://arxiv.org/abs/2510.10009

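Abstract

Reasoning-augmented search agents (such as Search-R1) are trained to iteratively reason, search, and generate a final answer. However, because their reasoning and search capabilities are limited, their performance on multi-hop QA benchmarks remains unsatisfactory. To handle complex or compound queries, the authors use reinforcement learning to train an LLM-based search agent with a native query-expansion capability. In each turn, the search agent proposes multiple query variants, which are searched simultaneously to cover more relevant information. At the same time, given limited post-training data and computing resources, it is difficult for a search agent to master multiple tasks, including query generation, understanding of retrieved information, and answer generation. The authors therefore propose incorporating a pre-trained squeezer model that helps the search agent understand the retrieved documents, allowing the agent to focus on query generation for high retrieval recall. With the help of the squeezer model, the authors find that even a small 3B LLM can exhibit a strong query-expansion capability and achieve state-of-the-art accuracy on multi-hop QA benchmarks. Specifically, experiments on seven question-answering benchmarks show that the method, ExpandSearch, achieves an average improvement of 4.4% over state-of-the-art baselines, with notable gains on multi-hop reasoning tasks that require aggregating diverse evidence.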