HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search

Jin, Li, Dong et al.

Complex information needs in real-world search scenarios demand deep reasoning and knowledge synthesis across diverse sources, which traditional retrieval-augmented generation (RAG) pipelines struggle to address effectively. Current reasoning-based approaches suffer from a fundamental limitation: they use a single model to handle both high-level planning and detailed execution, leading to inefficient reasoning and limited scalability. In this paper, we introduce HiRA, a hierarchical framework that separates strategic planning from specialized execution. Our approach decomposes complex search tasks into focused subtasks, assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities, and coordinates the results through a structured integration mechanism. This separation prevents execution details from disrupting high-level reasoning while enabling the system to leverage specialized expertise for different types of information processing. Experiments on four complex, cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems. Our results show improvements in both answer quality and system efficiency, highlighting the effectiveness of decoupled planning and execution for multi-step information seeking tasks. Our code is available at https://github.com/ignorejjj/HiRA.

Basic Information

Paper ID: 2507.02652
Title: HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search
Authors: Jiajie Jin, Xiaoxi Li, Yuyao Zhang, Guanting Dong, Yutao Zhu, Zhao Yang, Hongjin Qian, Zhicheng Dou
Classification: cs.AI cs.CL cs.IR
Publication Date/Conference: 2025 (Submitted to AAAI 2026)
Paper Link: https://arxiv.org/abs/2507.02652

Abstract

Complex information needs in real-world search scenarios require deep reasoning and knowledge synthesis across multiple sources, which traditional retrieval-augmented generation (RAG) pipelines struggle to address effectively. Current reasoning-based methods suffer from a fundamental limitation: they employ a single model to simultaneously handle high-level planning and detailed execution, resulting in inefficient reasoning and limited scalability. This paper proposes HiRA, a hierarchical framework that decouples strategic planning from specialized execution. The approach decomposes complex search tasks into focused subtasks, assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities, and coordinates results through structured integration mechanisms. This decoupling prevents execution details from interfering with high-level reasoning while enabling the system to leverage specialized expertise for different types of information processing. Experiments on four complex cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems.

Research Background and Motivation

Problem Definition

Traditional search engines return ranked web pages based solely on keyword matching, leaving users to filter and collect information manually. Large language models (LLMs) equipped with web search can provide direct answers, but they typically use only the surface information in search results and lack deep reasoning and comprehensive synthesis capabilities.

Problem Significance

With the explosion of internet information, finding answers to complex queries has become increasingly difficult, driving rapid development of deep search tasks that require understanding complex information needs and synthesizing accurate answers from multiple sources.

Limitations of Existing Methods

Monolithic Architecture Constraints: Existing methods rely on a single reasoning model to handle all tasks, triggering tool activation through special tokens generated by the reasoning model
Limited Capability Scalability: Adding new tools or capabilities requires careful prompt redesign to teach the model how to use new token patterns
Reasoning Interference: External execution results are directly injected into the main reasoning chain, introducing noise that disrupts the core reasoning process

Research Motivation

The authors argue that effective agent execution should follow a hierarchical structure with three roles: a meta-agent for high-level planning, a coordinator that transfers reasoning between planner and executors, and specialized execution agents for specific operations.

Core Contributions

Hierarchical Reasoning Architecture: Proposes a novel hierarchical reasoning framework that integrates specialized tool-augmented reasoning agents as modules, eliminating the external tool orchestration and rigid predefined pipelines that existing methods rely on
Enhanced Capability Integration: Domain-specialized executors support plug-and-play integration of diverse reasoning capabilities and tools. Existing search agents can be directly integrated without prompt engineering or model retraining
Superior Empirical Performance: Experiments on four complex cross-modal search tasks demonstrate significant improvements compared to traditional RAG and current agent-based methods

Methodology Details

Task Definition

Given a complex question q requiring information search and a predefined external environment E, the goal is to design a framework that generates a final solution containing an answer a and the corresponding reasoning process R. The generation process is formulated as:

$P(R, a | q, E) = \prod_{t=1}^{T_R} P(R_t | R_{<t}, q, E_{<t}) \cdot P(a | q, R)$

where $T_R$ denotes the token generation steps in the reasoning process, and $E_{<t} = \{E(R_{<s})\}_{s<t}$ represents the set of all environment interaction results before timestep t.
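
The factorization above can be read as a sum of log-probabilities: one term per reasoning token and one for the final answer. The following is a minimal sketch of that arithmetic only; the function name and the probability values are made up for illustration and do not come from the paper.

```python
import math
from typing import Sequence


def solution_log_prob(step_probs: Sequence[float], answer_prob: float) -> float:
    """Return log P(R, a | q, E) for given per-step and answer probabilities.

    step_probs[t] stands in for P(R_t | R_<t, q, E_<t), and answer_prob
    stands in for P(a | q, R). Both are illustrative inputs.
    """
    all_probs = [*step_probs, answer_prob]
    if not all(0.0 < p <= 1.0 for p in all_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # A product of probabilities becomes a sum of logs.
    return sum(math.log(p) for p in step_probs) + math.log(answer_prob)


# Three reasoning steps followed by the answer (made-up probabilities).
log_p = solution_log_prob([0.9, 0.8, 0.5], 0.5)
joint = math.exp(log_p)  # equals 0.9 * 0.8 * 0.5 * 0.5
```

Working in log space avoids numerical underflow when $T_R$ is large, which is the usual reason to write the product this way.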

Model Architecture

The HiRA framework comprises three core modules:

1. Meta Reasoning Planner

Responsible for planning, reasoning, and answer generation
Decomposes tasks into high-level subtasks containing strategic instructions for expert agents
Uses special tokens for dynamic subtask generation:

$P_M(s_k) = P_M(s_k | q, O_{<t}, \{E(s_j)\}_{j<k})$
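
As a rough illustration of special-token subtask generation, the sketch below extracts the most recent subtask from a planner's output. This summary does not name the actual tokens, so the `<subtask>` and `</subtask>` delimiters and the function name are hypothetical placeholders.

```python
import re
from typing import Optional

# Hypothetical delimiters: the source only says "special tokens" are used.
BEGIN_TOKEN = "<subtask>"
END_TOKEN = "</subtask>"
_SUBTASK_PATTERN = re.compile(
    re.escape(BEGIN_TOKEN) + r"(.*?)" + re.escape(END_TOKEN), re.DOTALL
)


def extract_latest_subtask(planner_output: str) -> Optional[str]:
    """Return the text of the last complete subtask call, or None if absent."""
    matches = _SUBTASK_PATTERN.findall(planner_output)
    return matches[-1].strip() if matches else None


example_output = (
    "I need the publication year first. "
    "<subtask> Find the year the paper was published </subtask>"
)
latest = extract_latest_subtask(example_output)
```

In a setup like this, decoding would pause once the closing token appears, the subtask would be handed to the coordinator, and the distilled result would be appended to the planner's context before decoding resumes.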

2. Adaptive Reasoning Coordinator

Contains three core functions:

Reasoning Transfer Process: $A^*_k = \arg\max_{A \in E} P_C(O^{(k)}_{dele}, A | s_k, I_E, I_{select})$

Reasoning Distillation Process: $P_C(O^{(k)}_{dist}, R^{(k)}_{dist} | s_k, O^{(k)}_{expert}) = P_C(O^{(k)}_{dist} | O^{(k)}_{expert}, \cdot) \cdot P_C(R^{(k)}_{dist} | O^{(k)}_{dist}, O^{(k)}_{expert}, \cdot)$

Dual-Channel Memory Mechanism: Includes factual memory $M_f$ and resource memory $M_r$
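
The coordinator's expert selection and memory can be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: the scoring functions stand in for $P_C$, and the sketch assumes that $M_f$ holds discovered facts while $M_r$ holds resource locators such as URLs, which this summary does not spell out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class DualChannelMemory:
    """Two append-only channels: facts (M_f) and resources (M_r)."""

    facts: List[str] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)

    def update(self, facts: Iterable[str] = (), resources: Iterable[str] = ()) -> None:
        # Append only unseen entries so repeated subtasks do not bloat memory.
        for item in facts:
            if item not in self.facts:
                self.facts.append(item)
        for item in resources:
            if item not in self.resources:
                self.resources.append(item)


def select_expert(subtask: str, scorers: Dict[str, Callable[[str], float]]) -> str:
    """Pick the expert with the highest score for the subtask (an argmax)."""
    if not scorers:
        raise ValueError("at least one expert is required")
    return max(scorers, key=lambda name: scorers[name](subtask))


# Toy keyword scorers standing in for a model-based selection probability.
toy_scorers = {
    "search": lambda s: float("web" in s.lower()),
    "code": lambda s: float("compute" in s.lower()),
}
chosen = select_expert("Search the web for the release year", toy_scorers)

memory = DualChannelMemory()
memory.update(facts=["Base model is QwQ-32B"], resources=["https://arxiv.org/abs/2507.02652"])
memory.update(facts=["Base model is QwQ-32B"])  # duplicate is ignored
```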

3. Domain-Specialized Executors

Designed based on three orthogonal agent capability dimensions:

Information Acquisition: Responsible for retrieving and integrating information from the web
Cross-Modal Understanding: Handles understanding and fusion of multi-modal information
Computational Reasoning: Handles computational reasoning tasks such as mathematical calculations and file processing
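
One way to picture the three capability dimensions is a registry in which each executor declares the dimension it covers. The class and method names below are hypothetical and chosen for illustration; the executors are plain functions standing in for tool-augmented agents.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Capability(Enum):
    INFORMATION_ACQUISITION = "information_acquisition"
    CROSS_MODAL_UNDERSTANDING = "cross_modal_understanding"
    COMPUTATIONAL_REASONING = "computational_reasoning"


Executor = Callable[[str], str]  # takes a subtask, returns a result string


class ExecutorRegistry:
    """Maps executor names to (capability, callable) pairs."""

    def __init__(self) -> None:
        self._executors: Dict[str, Tuple[Capability, Executor]] = {}

    def register(self, name: str, capability: Capability, executor: Executor) -> None:
        if name in self._executors:
            raise ValueError(f"executor {name!r} is already registered")
        self._executors[name] = (capability, executor)

    def by_capability(self, capability: Capability) -> List[str]:
        return [n for n, (c, _) in self._executors.items() if c is capability]

    def run(self, name: str, subtask: str) -> str:
        _, executor = self._executors[name]
        return executor(subtask)


registry = ExecutorRegistry()
# Stub executors; real ones would call search, vision, or code tools.
registry.register("search", Capability.INFORMATION_ACQUISITION, lambda s: f"searched: {s}")
registry.register("code", Capability.COMPUTATIONAL_REASONING, lambda s: f"computed: {s}")
output = registry.run("search", "HiRA benchmarks")
```

Adding a new expert in this sketch is a single `register` call, which mirrors the plug-and-play property the paper claims for its executors.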

Technical Innovations

Decoupled Design: Separates high-level strategic planning from low-level execution details, preventing execution noise from interfering with the planning process
Dynamic Task Assignment: Intelligently selects the most suitable expert agent based on task complexity and required capabilities
Bidirectional Reasoning Transfer: Supports reasoning delegation from the meta-agent to expert agents, as well as reverse reasoning distillation
Modular Extensibility: New expert agents can be seamlessly integrated without redesigning the entire system

Experimental Setup

Datasets

GAIA: Covers multi-step reasoning and retrieval, using all validation samples (text, multi-modal, file-based)
WebWalkerQA: Tests web navigation and extraction in English and Chinese, sampling 200 questions
SimpleQA: Evaluates factual and broad knowledge, sampling 200 questions
Humanity's Last Exam: High-difficulty benchmark requiring complex reasoning and external retrieval, using 500 validation samples

Evaluation Metrics

Accuracy is computed using Qwen2.5-72B-Instruct as the LLM judge
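
A judge-based accuracy metric reduces to averaging binary verdicts. The sketch below assumes a judge is any callable that returns True or False for a (question, prediction, reference) triple. The exact-match judge is a stand-in so the example runs without a model; the paper's actual judge prompt is not given in this summary.

```python
from typing import Callable, Iterable, Tuple

# (question, prediction, reference) -> is the prediction judged correct?
Judge = Callable[[str, str, str], bool]


def judged_accuracy(examples: Iterable[Tuple[str, str, str]], judge: Judge) -> float:
    """Return accuracy in percent over all judged examples."""
    verdicts = [judge(q, pred, ref) for q, pred, ref in examples]
    if not verdicts:
        raise ValueError("no examples to evaluate")
    return 100.0 * sum(verdicts) / len(verdicts)


def exact_match_judge(question: str, prediction: str, reference: str) -> bool:
    # Stand-in for an LLM call: case-insensitive exact match.
    return prediction.strip().lower() == reference.strip().lower()


samples = [
    ("Capital of France?", "Paris", "paris"),
    ("2 + 2?", "5", "4"),
    ("Largest planet?", " Jupiter ", "Jupiter"),
    ("Author of Hamlet?", "Marlowe", "Shakespeare"),
]
accuracy = judged_accuracy(samples, exact_match_judge)
```

An LLM judge would replace `exact_match_judge` with a function that prompts the judge model and parses its verdict, which allows paraphrased but correct answers to count.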

Baseline Methods

Direct Reasoning: Using the model's native reasoning capabilities (Qwen3-32B, QwQ-32B, DeepSeek-R1-32B, GPT-4o, etc.)
Single-Capability Enhancement: Using single specialized tool-augmented reasoning (Search-o1, WebThinker, CodeAct, etc.)
Multi-Capability Reasoning: Integrating multiple tools or structured workflows (Plan-and-Solve, ReAct)

Implementation Details

Base Model: QwQ-32B
Coordinator: Qwen2.5-Instruct
Temperature: 0.7, top_p: 0.95, top_k: 20
Context Window: 128k tokens
Maximum Subtasks: 10
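
The settings listed above can be gathered into a single configuration object, with the subtask cap enforced by the planning loop. This is a sketch under stated assumptions: the field names, the loop structure, and the reading of "128k" as 128,000 tokens are illustrative choices, not details confirmed by the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class HiRAConfig:
    base_model: str = "QwQ-32B"
    coordinator_model: str = "Qwen2.5-Instruct"
    temperature: float = 0.7
    top_p: float = 0.95
    top_k: int = 20
    context_window_tokens: int = 128_000  # "128k" in the source; exact count assumed
    max_subtasks: int = 10


def run_planner_loop(
    next_subtask: Callable[[List[str]], Optional[str]],
    execute: Callable[[str], str],
    config: HiRAConfig = HiRAConfig(),
) -> List[str]:
    """Dispatch subtasks until the planner stops or the cap is reached."""
    results: List[str] = []
    while len(results) < config.max_subtasks:
        subtask = next_subtask(results)  # planner sees the results so far
        if subtask is None:              # planner is ready to answer
            break
        results.append(execute(subtask))
    return results


# A planner stub that never stops, to show the cap taking effect.
capped = run_planner_loop(lambda done: f"subtask {len(done) + 1}", lambda s: s.upper())
```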

Experimental Results

Main Results

Method Category	GAIA Avg	WebWalkerQA Avg	HLE Avg	SimpleQA
Direct Reasoning (Best)	25.2	10.0	11.1	42.7
Single-Capability Enhancement (WebThinker)	36.2	52.5	13.0	78.0
Multi-Capability Reasoning (ReAct)	30.7	35.0	13.8	73.5
HiRA (This Work)	42.5	54.5	14.2	81.5
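
To make the table easier to compare, the short script below computes HiRA's margin over the strongest listed baseline on each benchmark. The numbers are copied from the table above; the function and variable names are illustrative.

```python
from typing import Dict, Tuple

results: Dict[str, Dict[str, float]] = {
    "Direct Reasoning (Best)": {"GAIA": 25.2, "WebWalkerQA": 10.0, "HLE": 11.1, "SimpleQA": 42.7},
    "WebThinker": {"GAIA": 36.2, "WebWalkerQA": 52.5, "HLE": 13.0, "SimpleQA": 78.0},
    "ReAct": {"GAIA": 30.7, "WebWalkerQA": 35.0, "HLE": 13.8, "SimpleQA": 73.5},
    "HiRA": {"GAIA": 42.5, "WebWalkerQA": 54.5, "HLE": 14.2, "SimpleQA": 81.5},
}


def margins_over_best_baseline(
    table: Dict[str, Dict[str, float]], system: str = "HiRA"
) -> Dict[str, Tuple[str, float]]:
    """For each benchmark, return (strongest baseline, system score minus its score)."""
    margins: Dict[str, Tuple[str, float]] = {}
    for benchmark, score in table[system].items():
        baselines = {name: row[benchmark] for name, row in table.items() if name != system}
        best_name = max(baselines, key=baselines.get)
        # Round to one decimal to match the table's precision.
        margins[benchmark] = (best_name, round(score - baselines[best_name], 1))
    return margins


margins = margins_over_best_baseline(results)
# GAIA: +6.3 over WebThinker; HLE: +0.4 over ReAct
```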

Key Findings

Overall Performance Advantage: HiRA outperforms baseline methods on all tasks
Pronounced Advantage on Complex Tasks: The gain over the strongest listed baseline is largest on GAIA (42.5 vs. 36.2 for WebThinker); on HLE the margin is narrower (14.2 vs. 13.8 for ReAct)
Hierarchical Design Benefits: Through its hierarchical design, HiRA outperforms methods that use the same tool set

Ablation Study

Variant	GAIA-B	GAIA-F	WebWalker	HLE	SimpleQA
Complete HiRA	42.5	42.1	54.5	14.2	81.5
Without Reasoning Transfer	33.9	36.8	44.5	10.4	76.5
Without Memory Mechanism	37.8	31.6	52.0	11.8	79.0
Without Search Agent	15.7	31.6	4.0	12.4	9.5
Without Code Agent	33.9	28.9	51.5	12.8	76.5
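
The ablation rows are easier to interpret as drops relative to the complete system. The sketch below computes those differences from the numbers in the table above; the names are illustrative.

```python
from typing import Dict, Tuple

full: Dict[str, float] = {
    "GAIA-B": 42.5, "GAIA-F": 42.1, "WebWalker": 54.5, "HLE": 14.2, "SimpleQA": 81.5,
}
ablated: Dict[str, Dict[str, float]] = {
    "Reasoning Transfer": {"GAIA-B": 33.9, "GAIA-F": 36.8, "WebWalker": 44.5, "HLE": 10.4, "SimpleQA": 76.5},
    "Memory Mechanism": {"GAIA-B": 37.8, "GAIA-F": 31.6, "WebWalker": 52.0, "HLE": 11.8, "SimpleQA": 79.0},
    "Search Agent": {"GAIA-B": 15.7, "GAIA-F": 31.6, "WebWalker": 4.0, "HLE": 12.4, "SimpleQA": 9.5},
    "Code Agent": {"GAIA-B": 33.9, "GAIA-F": 28.9, "WebWalker": 51.5, "HLE": 12.8, "SimpleQA": 76.5},
}


def accuracy_drops(reference: Dict[str, float], variant: Dict[str, float]) -> Dict[str, float]:
    """Points lost on each benchmark when a component is removed."""
    return {bench: round(reference[bench] - variant[bench], 1) for bench in reference}


def largest_drop(drops_by_bench: Dict[str, float]) -> Tuple[str, float]:
    bench = max(drops_by_bench, key=drops_by_bench.get)
    return bench, drops_by_bench[bench]


drops = {component: accuracy_drops(full, scores) for component, scores in ablated.items()}
worst = {component: largest_drop(d) for component, d in drops.items()}
```

Removing the search agent costs the most on the retrieval-heavy benchmarks (72.0 points on SimpleQA and 50.5 on WebWalker), while removing the code agent hurts most on GAIA-F (13.2 points).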

Efficiency Analysis

Reasoning Length: HiRA's reasoning chain is shorter than WebThinker's, indicating more efficient subtask invocation
Interaction Count: HiRA requires fewer environment interactions compared to methods that directly integrate tools
Computational Overhead: The hierarchical structure achieves more targeted tool usage

Related Work

Evolution from Retrieval-Augmented Generation to Deep Search

RAG has developed from single-step retrieval into iterative pipelines with query decomposition, document refinement, and multi-round search. However, RAG methods rely on predefined workflows, which limits adaptive decision-making.

Planning-Execution Separation Methods

Action-Level Separation: Assigns executors for single-step tasks (Plan-Act, CoAct)
Query-Level Separation: Decomposes problems at higher granularity (REMA, LLMCompiler)

This work addresses limitations of these methods through dynamic reasoning delegation and domain-specialized agents within a hierarchical framework.

Conclusion and Discussion

Main Conclusions

HiRA effectively addresses limitations of monolithic models in deep search tasks by separating strategic planning from specialized execution. The multi-agent architecture supports scalable and modular reasoning.

Limitations

Computational Overhead: The multi-agent architecture may increase computational costs
Coordination Complexity: Coordination mechanisms between agents require careful design
Error Propagation: Errors in subtask execution may impact overall performance

Future Directions

Further optimize coordination mechanisms between agents
Explore additional domain-specialized executors
Investigate dynamic agent selection strategies

In-Depth Evaluation

Strengths

Innovative Architecture Design: The hierarchical decoupled design has both theoretical and practical value
Comprehensive Experimental Validation: Systematic evaluation on multiple complex benchmarks
Strong Practicality: The framework supports plug-and-play integration of existing agents
Thorough Analysis: Provides detailed ablation studies and efficiency analysis

Weaknesses

Baseline Selection: Some baseline methods may not represent the latest SOTA
Evaluation Limitations: Primarily uses LLM-as-Judge, which may introduce evaluation bias
Scalability Verification: Lacks validation on larger scales or more domains

Impact

Academic Contribution: Provides a new design paradigm for multi-agent reasoning systems
Practical Value: Can be directly applied to complex information retrieval scenarios
Reproducibility: Provides detailed implementation details and code

Applicable Scenarios

Complex question-answering systems requiring multi-step reasoning
Cross-modal information retrieval and synthesis
Research and analysis tasks requiring specialized tool support
Enterprise-level knowledge management and decision support systems

References

The paper cites multiple important works, including foundational RAG work (Lewis et al. 2020), recent reasoning models (OpenAI o1, DeepSeek-R1), and related research on multi-agent systems. These citations reflect the authors' deep understanding of the field's development trajectory.

Overall Assessment: This is a high-quality research paper that proposes an innovative hierarchical reasoning framework, supported by solid theoretical design and experimental validation. The work is valuable for the development of multi-agent reasoning systems and has broad application prospects in complex information retrieval.