HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search

Jin, Li, Dong et al.
Complex information needs in real-world search scenarios demand deep reasoning and knowledge synthesis across diverse sources, which traditional retrieval-augmented generation (RAG) pipelines struggle to address effectively. Current reasoning-based approaches suffer from a fundamental limitation: they use a single model to handle both high-level planning and detailed execution, leading to inefficient reasoning and limited scalability. In this paper, we introduce HiRA, a hierarchical framework that separates strategic planning from specialized execution. Our approach decomposes complex search tasks into focused subtasks, assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities, and coordinates the results through a structured integration mechanism. This separation prevents execution details from disrupting high-level reasoning while enabling the system to leverage specialized expertise for different types of information processing. Experiments on four complex, cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems. Our results show improvements in both answer quality and system efficiency, highlighting the effectiveness of decoupled planning and execution for multi-step information seeking tasks. Our code is available at https://github.com/ignorejjj/HiRA.

Basic Information

  • Paper ID: 2507.02652
  • Title: HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search
  • Authors: Jiajie Jin, Xiaoxi Li, Yuyao Zhang, Guanting Dong, Yutao Zhu, Zhao Yang, Hongjin Qian, Zhicheng Dou
  • Classification: cs.AI cs.CL cs.IR
  • Publication Date/Conference: 2025 (Submitted to AAAI 2026)
  • Paper Link: https://arxiv.org/abs/2507.02652

Abstract

Complex information needs in real-world search scenarios require deep reasoning and knowledge synthesis across multiple sources, which traditional retrieval-augmented generation (RAG) pipelines struggle to address effectively. Current reasoning-based methods suffer from a fundamental limitation: they employ a single model to simultaneously handle high-level planning and detailed execution, resulting in inefficient reasoning and limited scalability. This paper proposes HiRA, a hierarchical framework that decouples strategic planning from specialized execution. The approach decomposes complex search tasks into focused subtasks, assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities, and coordinates results through structured integration mechanisms. This decoupling prevents execution details from interfering with high-level reasoning while enabling the system to leverage specialized expertise for different types of information processing. Experiments on four complex cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems.

Research Background and Motivation

Problem Definition

Traditional search engines return ranked web pages based solely on keyword matching, requiring users to manually filter and collect information. Large language models (LLMs) equipped with web search can answer directly, but they typically use only the surface information in search results and lack deep reasoning and comprehensive synthesis capabilities.

Problem Significance

With the explosion of internet information, finding answers to complex queries has become increasingly difficult, driving rapid development of deep search tasks that require understanding complex information needs and synthesizing accurate answers from multiple sources.

Limitations of Existing Methods

  1. Monolithic Architecture Constraints: Existing methods rely on a single reasoning model to handle all tasks, triggering tool activation through special tokens generated by the reasoning model
  2. Limited Capability Scalability: Adding new tools or capabilities requires careful prompt redesign to teach the model how to use new token patterns
  3. Reasoning Interference: External execution results are directly injected into the main reasoning chain, introducing noise that disrupts the core reasoning process

Research Motivation

The authors argue that effective agent execution should follow a hierarchical structure comprising a meta-agent for high-level planning, a coordinator that transfers reasoning between the planner and the executors, and specialized execution agents for specific operations.

Core Contributions

  1. Hierarchical Reasoning Architecture: Proposes a novel hierarchical reasoning framework that integrates specialized tool-augmented reasoning agents as modules, eliminating the external tool orchestration and rigid predefined pipelines that existing methods depend on
  2. Enhanced Capability Integration: Domain-specialized executors support plug-and-play integration of diverse reasoning capabilities and tools. Existing search agents can be directly integrated without prompt engineering or model retraining
  3. Superior Empirical Performance: Experiments on four complex cross-modal search tasks demonstrate significant improvements compared to traditional RAG and current agent-based methods

Methodology Details

Task Definition

Given a complex question $q$ requiring information search and a predefined external environment $E$, the goal is to design a framework that generates a final solution containing an answer $a$ and the corresponding reasoning process $R$. The generation process is formulated as:

$$P(R, a \mid q, E) = \prod_{t=1}^{T_R} P(R_t \mid R_{<t}, q, E_{<t}) \cdot P(a \mid q, R)$$

where $T_R$ denotes the number of token generation steps in the reasoning process, and $E_{<t} = \{E(R_{<s})\}_{s<t}$ represents the set of all environment interaction results before timestep $t$.
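
The factorization can be read as a loop: each reasoning step is conditioned on the question, the earlier steps, and the environment results collected so far, and the answer is then produced from the question and the finished reasoning alone. The following is a minimal sketch of that reading, not the authors' implementation. It works at the level of whole steps rather than tokens, and `generate_solution`, the `"DONE"` stop signal, and the three toy callables are all hypothetical names introduced for illustration.

```python
from typing import Callable, List, Optional, Tuple


def generate_solution(
    question: str,
    reason_step: Callable[[str, List[str], List[str]], str],
    environment: Callable[[str], Optional[str]],
    answer_from: Callable[[str, List[str]], str],
    max_steps: int = 8,
) -> Tuple[List[str], str]:
    """Alternate reasoning steps with environment calls, then answer from (q, R)."""
    reasoning: List[str] = []     # R_{<t}: reasoning steps produced so far
    observations: List[str] = []  # E_{<t}: environment results seen so far
    for _ in range(max_steps):
        step = reason_step(question, reasoning, observations)
        reasoning.append(step)
        if step == "DONE":  # hypothetical stop signal
            break
        result = environment(step)
        if result is not None:
            observations.append(result)
    # The answer depends only on the question and the reasoning: P(a | q, R).
    return reasoning, answer_from(question, reasoning)


# Toy stand-ins (assumptions): a scripted reasoner, a lookup environment,
# and an answer reader that picks the last noted finding.
def toy_reason(question: str, reasoning: List[str], observations: List[str]) -> str:
    if not observations:
        return "search: capital of France"
    if len(reasoning) == 1:
        return f"note: {observations[-1]}"
    return "DONE"


def toy_environment(step: str) -> Optional[str]:
    return "Paris" if step.startswith("search:") else None


def toy_answer(question: str, reasoning: List[str]) -> str:
    notes = [s[len("note: "):] for s in reasoning if s.startswith("note: ")]
    return notes[-1] if notes else ""


reasoning, answer = generate_solution(
    "What is the capital of France?", toy_reason, toy_environment, toy_answer
)
print(reasoning, answer)
```

Because `toy_answer` sees only the question and the reasoning steps, any environment result that matters has to be written into the reasoning first, which mirrors the conditioning in the final factor of the formula.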

Model Architecture

The HiRA framework comprises three core modules:

1. Meta Reasoning Planner

  • Responsible for planning, reasoning, and answer generation
  • Decomposes tasks into high-level subtasks containing strategic instructions for expert agents
  • Uses special tokens for dynamic subtask generation:

$$P_M(s_k) = P_M(s_k \mid q, O_{<t}, \{E(s_j)\}_{j<k})$$
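
One way to picture token-delimited subtask generation is a parser that pulls subtask instructions out of the planner's output. The sketch below is illustrative only: the delimiters `<subtask>` and `</subtask>` are hypothetical placeholders (the actual special tokens are not given in this summary), and the cap of 10 reuses the maximum-subtasks value listed under Implementation Details.

```python
import re
from typing import List

# Hypothetical delimiters; the framework's real special tokens are not stated here.
SUBTASK_OPEN = "<subtask>"
SUBTASK_CLOSE = "</subtask>"
MAX_SUBTASKS = 10  # value listed under Implementation Details

_SUBTASK_RE = re.compile(
    re.escape(SUBTASK_OPEN) + r"(.*?)" + re.escape(SUBTASK_CLOSE), re.DOTALL
)


def extract_subtasks(planner_output: str, limit: int = MAX_SUBTASKS) -> List[str]:
    """Return non-empty subtask instructions in order of appearance, capped at `limit`."""
    found = [text.strip() for text in _SUBTASK_RE.findall(planner_output)]
    return [text for text in found if text][:limit]


sample = (
    "I need the release year first. "
    "<subtask>Find the year the film was released.</subtask> "
    "Then compare it. <subtask>\nCompute the difference from 2020.\n</subtask>"
)
subtasks = extract_subtasks(sample)
print(subtasks)
```

In a full system the planner would pause after each emitted subtask and resume once the executor's result is returned; this sketch only shows the extraction step.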

2. Adaptive Reasoning Coordinator

Contains three core functions:

Reasoning Transfer Process: $A^*_k = \arg\max_{A \in E} P_C(O^{(k)}_{dele}, A \mid s_k, I_E, I_{select})$

Reasoning Distillation Process: $P_C(O^{(k)}_{dist}, R^{(k)}_{dist} \mid s_k, O^{(k)}_{expert}) = P_C(O^{(k)}_{dist} \mid O^{(k)}_{expert}, \cdot) \cdot P_C(R^{(k)}_{dist} \mid O^{(k)}_{dist}, O^{(k)}_{expert}, \cdot)$

Dual-Channel Memory Mechanism: Includes factual memory $M_f$ and resource memory $M_r$
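
The three coordinator functions can be sketched in a few lines under strong simplifying assumptions. In the framework these decisions are made by a language model; here expert selection is replaced by a word-overlap score, distillation by filtering lines tagged `FACT:` or `URL:`, and the two memory channels by deduplicated lists. All names (`select_expert`, `distill`, `DualChannelMemory`) and the tagging scheme are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DualChannelMemory:
    facts: List[str] = field(default_factory=list)      # M_f: factual memory
    resources: List[str] = field(default_factory=list)  # M_r: resource memory

    def update(self, facts: List[str], resources: List[str]) -> None:
        """Append new entries to each channel, skipping duplicates."""
        for item in facts:
            if item not in self.facts:
                self.facts.append(item)
        for item in resources:
            if item not in self.resources:
                self.resources.append(item)


def select_expert(subtask: str, expert_descriptions: Dict[str, str]) -> str:
    """Stand-in for the arg max over experts: pick the largest word overlap."""
    words = set(subtask.lower().split())
    return max(
        expert_descriptions,
        key=lambda name: len(words & set(expert_descriptions[name].lower().split())),
    )


def distill(expert_trace: List[str]) -> Tuple[str, List[str], List[str]]:
    """Toy distillation: keep tagged lines and drop the rest of the trace."""
    facts = [line[len("FACT: "):] for line in expert_trace if line.startswith("FACT: ")]
    resources = [line[len("URL: "):] for line in expert_trace if line.startswith("URL: ")]
    conclusion = facts[-1] if facts else ""
    return conclusion, facts, resources


experts = {
    "search": "search the web for information",
    "code": "run python code and compute numbers",
}
chosen = select_expert("compute the sum with python code", experts)

trace = ["opened page", "URL: https://example.com/a", "FACT: The sum is 42", "closing"]
conclusion, facts, resources = distill(trace)

memory = DualChannelMemory()
memory.update(facts, resources)
memory.update(facts, resources)  # repeated update leaves memory unchanged
print(chosen, conclusion, memory)
```

The point of the sketch is the data flow: only the distilled conclusion and the two memory channels travel back toward the planner, while the raw expert trace stays behind.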

3. Domain-Specialized Executors

Designed based on three orthogonal agent capability dimensions:

  • Information Acquisition: Responsible for retrieving and integrating information from the web
  • Cross-Modal Understanding: Handles understanding and fusion of multi-modal information
  • Computational Reasoning: Handles computational reasoning tasks such as mathematical calculations and file processing
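
A registry keyed by capability is one simple way to obtain the plug-and-play property described for the executors. The sketch below is an assumption about how such a registry could look, not the project's actual interface: `register`, `dispatch`, and the three stub executors are hypothetical, and each stub merely echoes its input instead of calling real tools.

```python
from typing import Callable, Dict

Executor = Callable[[str], str]
EXECUTORS: Dict[str, Executor] = {}


def register(name: str) -> Callable[[Executor], Executor]:
    """Decorator that adds an executor to the registry under `name`."""
    def decorator(fn: Executor) -> Executor:
        EXECUTORS[name] = fn
        return fn
    return decorator


@register("information_acquisition")
def search_executor(subtask: str) -> str:
    return f"[search] {subtask}"  # stub: a real agent would query the web


@register("cross_modal_understanding")
def multimodal_executor(subtask: str) -> str:
    return f"[multimodal] {subtask}"  # stub: a real agent would read images or audio


@register("computational_reasoning")
def code_executor(subtask: str) -> str:
    return f"[code] {subtask}"  # stub: a real agent would run code or parse files


def dispatch(name: str, subtask: str) -> str:
    """Route a subtask to the named executor, failing loudly on unknown names."""
    if name not in EXECUTORS:
        raise KeyError(f"no executor registered for {name!r}")
    return EXECUTORS[name](subtask)


result = dispatch("computational_reasoning", "sum the column in data.csv")
print(result)
```

Adding a new capability then amounts to decorating one more function; the planner and coordinator code does not change.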

Technical Innovations

  1. Decoupled Design: Separates high-level strategic planning from low-level execution details, preventing execution noise from interfering with the planning process
  2. Dynamic Task Assignment: Intelligently selects the most suitable expert agent based on task complexity and required capabilities
  3. Bidirectional Reasoning Transfer: Supports reasoning delegation from the meta-agent to expert agents, as well as reverse reasoning distillation
  4. Modular Extensibility: New expert agents can be seamlessly integrated without redesigning the entire system

Experimental Setup

Datasets

  1. GAIA: Covers multi-step reasoning and retrieval, using all validation samples (text, multi-modal, file-based)
  2. WebWalkerQA: Tests web navigation and extraction in English and Chinese, sampling 200 questions
  3. SimpleQA: Evaluates factual and broad knowledge, sampling 200 questions
  4. Humanity's Last Exam: High-difficulty benchmark requiring complex reasoning and external retrieval, using 500 validation samples

Evaluation Metrics

Accuracy is computed using Qwen2.5-72B-Instruct as the LLM judge.
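
Judge-based accuracy reduces to counting how many predictions the judge accepts. The sketch below assumes the judge is a callable returning a boolean; the `exact_match_judge` stub is a hypothetical placeholder for a call to the judge model, and the prompt actually used for judging is not reproduced here.

```python
from typing import Callable, Sequence


def judged_accuracy(
    predictions: Sequence[str],
    references: Sequence[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Percentage of predictions that the judge marks as correct."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    correct = sum(1 for pred, ref in zip(predictions, references) if judge(pred, ref))
    return 100.0 * correct / len(predictions)


def exact_match_judge(prediction: str, reference: str) -> bool:
    """Stub judge (assumption): case-insensitive exact match instead of an LLM call."""
    return prediction.strip().lower() == reference.strip().lower()


score = judged_accuracy(
    ["Paris", "berlin ", "Rome"],
    ["paris", "Berlin", "Madrid"],
    exact_match_judge,
)
print(round(score, 1))
```

Swapping the stub for a model-backed judge changes only the `judge` argument, which keeps the metric code independent of the judging model.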

Baseline Methods

  1. Direct Reasoning: Uses the model's native reasoning capabilities (Qwen3-32B, QwQ-32B, DeepSeek-R1-32B, GPT-4o, etc.)
  2. Single-Capability Enhancement: Augments reasoning with a single specialized tool (Search-o1, WebThinker, CodeAct, etc.)
  3. Multi-Capability Reasoning: Integrates multiple tools or structured workflows (Plan-and-Solve, ReAct)

Implementation Details

  • Base Model: QwQ-32B
  • Coordinator: Qwen2.5-Instruct
  • Temperature: 0.7, top_p: 0.95, top_k: 20
  • Context Window: 128k tokens
  • Maximum Subtasks: 10
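
For reference, the settings above can be collected into a single configuration object. This is a convenience sketch: the class and field names are hypothetical, the values are copied from the list, and "128k" is read as 128,000 tokens, which is an assumption about the exact figure.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class HiRAConfig:
    base_model: str = "QwQ-32B"
    coordinator_model: str = "Qwen2.5-Instruct"
    temperature: float = 0.7
    top_p: float = 0.95
    top_k: int = 20
    context_window_tokens: int = 128_000  # assumption: "128k" taken as 128,000
    max_subtasks: int = 10

    def sampling_kwargs(self) -> Dict[str, Union[int, float]]:
        """Sampling settings grouped for passing to a generation call."""
        return {
            "temperature": self.temperature,
            "top_p": self.top_p,
            "top_k": self.top_k,
        }


config = HiRAConfig()
print(config.sampling_kwargs())
```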

Experimental Results

Main Results

| Method Category | GAIA Avg | WebWalkerQA Avg | HLE Avg | SimpleQA |
|---|---|---|---|---|
| Direct Reasoning (Best) | 25.2 | 10.0 | 11.1 | 42.7 |
| Single-Capability Enhancement (WebThinker) | 36.2 | 52.5 | 13.0 | 78.0 |
| Multi-Capability Enhancement (ReAct) | 30.7 | 35.0 | 13.8 | 73.5 |
| HiRA (This Work) | 42.5 | 54.5 | 14.2 | 81.5 |
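
To make the size of the gains concrete, the short script below computes HiRA's margin over the strongest listed baseline on each benchmark. The scores are copied from the main results; the variable names are illustrative.

```python
# Scores from the main results, in the order:
# Direct Reasoning (Best), WebThinker, ReAct.
baselines = {
    "GAIA": [25.2, 36.2, 30.7],
    "WebWalkerQA": [10.0, 52.5, 35.0],
    "HLE": [11.1, 13.0, 13.8],
    "SimpleQA": [42.7, 78.0, 73.5],
}
hira = {"GAIA": 42.5, "WebWalkerQA": 54.5, "HLE": 14.2, "SimpleQA": 81.5}

# Margin in accuracy points over the best baseline, rounded to one decimal.
margins = {
    name: round(hira[name] - max(scores), 1) for name, scores in baselines.items()
}
print(margins)
```

On these numbers the margin is 6.3 points on GAIA, 2.0 on WebWalkerQA, 0.4 on HLE, and 3.5 on SimpleQA.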

Key Findings

  1. Overall Performance Advantage: HiRA outperforms baseline methods on all tasks
  2. Pronounced Advantage on Complex Tasks: More significant improvements on complex tasks (GAIA, HLE)
  3. Hierarchical Design Benefits: The hierarchical design outperforms methods that use the same tool set

Ablation Study

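| Component | GAIA-B | GAIA-F | WebWalker | HLE | SimpleQA |
|---|---|---|---|---|---|
| Complete HiRA | 42.5 | 42.1 | 54.5 | 14.2 | 81.5 |
| Without Reasoning Transfer | 33.9 | 36.8 | 44.5 | 10.4 | 76.5 |
| Without Memory Mechanism | 37.8 | 31.6 | 52.0 | 11.8 | 79.0 |
| Without Search Agent | 15.7 | 31.6 | 4.0 | 12.4 | 9.5 |
| Without Code Agent | 33.9 | 28.9 | 51.5 | 12.8 | 76.5 |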
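
The ablation numbers are easier to compare as drops relative to the complete system. The script below, using the scores from the ablation study, finds which removed component causes the largest drop on each benchmark; the variable names are illustrative.

```python
benchmarks = ["GAIA-B", "GAIA-F", "WebWalker", "HLE", "SimpleQA"]
full = [42.5, 42.1, 54.5, 14.2, 81.5]
ablations = {
    "Without Reasoning Transfer": [33.9, 36.8, 44.5, 10.4, 76.5],
    "Without Memory Mechanism": [37.8, 31.6, 52.0, 11.8, 79.0],
    "Without Search Agent": [15.7, 31.6, 4.0, 12.4, 9.5],
    "Without Code Agent": [33.9, 28.9, 51.5, 12.8, 76.5],
}

# Drop in accuracy points for every ablation on every benchmark.
drops = {
    name: [round(f - s, 1) for f, s in zip(full, scores)]
    for name, scores in ablations.items()
}

# For each benchmark, the ablation with the largest drop and its size.
largest = {}
for i, bench in enumerate(benchmarks):
    worst = max(drops, key=lambda name: drops[name][i])
    largest[bench] = (worst, drops[worst][i])
print(largest)
```

On these numbers, removing the search agent hurts most on GAIA-B (26.8), WebWalker (50.5), and SimpleQA (72.0); removing the code agent hurts most on GAIA-F (13.2); and removing reasoning transfer hurts most on HLE (3.8).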

Efficiency Analysis

  1. Reasoning Length: HiRA's reasoning chain is shorter than WebThinker's, indicating more efficient subtask invocation
  2. Interaction Count: HiRA requires fewer environment interactions compared to methods that directly integrate tools
  3. Computational Overhead: The hierarchical structure achieves more targeted tool usage

Related Work

Retrieval-Augmented Generation

RAG methods have developed from single-step retrieval into iterative pipelines with query decomposition, document refinement, and multi-round search. However, they rely on predefined workflows, which limits adaptive decision-making.

Planning-Execution Separation Methods

  • Action-Level Separation: Assigns executors for single-step tasks (Plan-Act, CoAct)
  • Query-Level Separation: Decomposes problems at higher granularity (REMA, LLMCompiler)

This work addresses limitations of these methods through dynamic reasoning delegation and domain-specialized agents within a hierarchical framework.

Conclusion and Discussion

Main Conclusions

HiRA effectively addresses limitations of monolithic models in deep search tasks by separating strategic planning from specialized execution. The multi-agent architecture supports scalable and modular reasoning.

Limitations

  1. Computational Overhead: The multi-agent architecture may increase computational costs
  2. Coordination Complexity: Coordination mechanisms between agents require careful design
  3. Error Propagation: Errors in subtask execution may impact overall performance

Future Directions

  1. Further optimize coordination mechanisms between agents
  2. Explore additional domain-specialized executors
  3. Investigate dynamic agent selection strategies

In-Depth Evaluation

Strengths

  1. Innovative Architecture Design: The hierarchical decoupled design has both theoretical and practical value
  2. Comprehensive Experimental Validation: Systematic evaluation on multiple complex benchmarks
  3. Strong Practicality: The framework supports plug-and-play integration of existing agents
  4. Thorough Analysis: Provides detailed ablation studies and efficiency analysis

Weaknesses

  1. Baseline Selection: Some baseline methods may not represent the latest SOTA
  2. Evaluation Limitations: Primarily uses LLM-as-Judge, which may introduce evaluation bias
  3. Scalability Verification: Lacks validation on larger scales or more domains

Impact

  1. Academic Contribution: Provides a new design paradigm for multi-agent reasoning systems
  2. Practical Value: Can be directly applied to complex information retrieval scenarios
  3. Reproducibility: Provides detailed implementation details and code

Applicable Scenarios

  1. Complex question-answering systems requiring multi-step reasoning
  2. Cross-modal information retrieval and synthesis
  3. Research and analysis tasks requiring specialized tool support
  4. Enterprise-level knowledge management and decision support systems

References

The paper cites multiple important works, including foundational RAG work (Lewis et al. 2020), recent reasoning models (OpenAI o1, DeepSeek-R1), and related research on multi-agent systems. These citations reflect the authors' deep understanding of the field's development trajectory.


Overall Assessment: This is a high-quality research paper that proposes an innovative hierarchical reasoning framework with solid theoretical design and experimental validation. This work has significant value for the development of multi-agent reasoning systems, particularly with broad application prospects in the complex information retrieval domain.