The generate-filter-refine (iterative paradigm) based on large language models (LLMs) has achieved progress in reasoning, programming, and program discovery in AI+Science. However, the effectiveness of search depends on where to search, namely, how to encode the domain prior into an operationally structured hypothesis space. To this end, this paper proposes a compact formal theory that describes and measures LLM-assisted iterative search guided by domain priors. We represent an agent as a fuzzy relation operator on inputs and outputs to capture feasible transitions; the agent is thereby constrained by a fixed safety envelope. To describe multi-step reasoning/search, we weight all reachable paths by a single continuation parameter and sum them to obtain a coverage generating function; this induces a measure of reachability difficulty; and it provides a geometric interpretation of search on the graph induced by the safety envelope. We further provide the simplest testable inferences and validate them via a majority-vote instantiation. This theory offers a workable language and operational tools to measure agents and their search spaces, proposing a systematic formal description of iterative search constructed by LLMs.
- Paper ID: 2510.14846
- Title: Where to Search: Measure the Prior-Structured Search Space of LLM Agents
- Author: Zhuo-Yang Song
- Classification: cs.AI cs.CL cs.LO
- Publication Date: October 16, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.14846
The generate-filter-refine iterative paradigm based on Large Language Models (LLMs) has made progress in reasoning, programming, and program discovery in AI+Science. However, the effectiveness of search depends on where to search, i.e., how to encode domain priors into actionable structured hypothesis spaces. To this end, this paper proposes a compact formal theory to describe and measure LLM-assisted iterative search guided by domain priors. The authors represent agents as fuzzy relational operators on inputs and outputs to capture feasible transformations; agents are thus constrained by fixed safety envelopes. To describe multi-step reasoning/search, the authors weight and sum all reachable paths by a single continuation parameter, yielding a covering generating function; this induces a metric of reachability difficulty; and provides a geometric interpretation of search on graphs induced by safety envelopes.
The core problem this research addresses is: How to systematically measure and describe the search space of LLM agents. Specifically, in LLM-based iterative search processes, search efficiency is fundamentally limited by the question of "where to search," i.e., how to encode domain priors into operationalizable spaces for agents.
- Long-horizon task requirements: Long-horizon tasks impose higher demands on safety and controllability, requiring operation within verifiable and controllable boundaries
- Complexity challenges: Long-horizon problems often involve combinatorial explosions and sparse rewards; pure heuristics or 0/1 scoring are insufficient to quantify reachability difficulty
- Theoretical gap: Current practice relies primarily on engineering heuristics (prompt design, filters, scoring functions, etc.), lacking a unified language and quantitative tools
- Lack of unified agent-space-search measurement language
- Difficulty in comparably measuring reachability-safety tradeoffs across different agents
- Absence of clear characterization and explanation of agent long-horizon behavioral features
Establish a concise, computable, model-agnostic formal theory that unifies safety and reachability measurement, provides testable predictions, and offers engineering-applicable design principles.
- Proposes a compact formal theory: Formalizes agents as fuzzy relational operators and describes iterative search processes through covering generating functions
- Establishes a unified measurement framework: Introduces continuation parameters and covering indices, providing unified quantification of safety and reachability
- Provides geometric interpretation: Defines geometric quantities on directed graphs induced by safety envelopes, offering geometric interpretation of search processes
- Validates theoretical predictions: Verifies testable conclusions through majority voting instantiation, providing external validation
- Input space: C1 (agent input space)
- Output space: C2 (agent output space, satisfying C2⊆C1 to support iteration)
- Objective: Measure and describe iterative search processes under safety constraints
Ideal agent defined as a fuzzy relational operator:
T(f,g):=μf(g),μf:C2→[0,1]
Crisp ideal agent (safety envelope):
μf(g)∈{0,1},0≤T(f,g)≤T0(f,g)
Introducing continuation parameter p∈[0,1], define the covering generating function from f to g:
Pf,g(p):=∑n=0∞∑ST:f(0)=f,f(n)=gpn∏i=0n−1μf(i)(f(i+1))
When C1,C2 are countable, it can be expressed in matrix form:
P(p)=∑n≥0pnMn=(I−pM)−1
- Shortest distance: d0(f,g):=inf{n∈N:Nn(f,g)≥1}
- Number of shortest paths: Nd0(f,g)
- Critical parameter: pc(f,g):=inf{p∈[0,1]:Pf,gideal(p)≥1}
- Covering index: Rc(f,g):=1−pc(f,g)
Through fuzzy relational operators, agents are uniformly represented, enabling safety and reachability to be measured using identical mathematical notation and geometric quantities.
Introducing a single continuation parameter p to weight trajectory lengths, avoiding the complexity of probabilistic interpretation while providing computable measurement methods.
Defining search geometry on directed graphs induced by safety envelopes, transforming abstract search processes into concrete graph-theoretic problems.
Proposes two key hypotheses for iterative agents constructed for LLMs:
- Hypothesis 1: Approximately unidirectional search (closed-loop paths are sparse)
- Hypothesis 2: Low-order terms dominate (excessively long trajectories are relatively rare)
- Search space: Two-dimensional grid GN:={0,…,N−1}2
- Grid scales: N=3,5,8
- Target points: (1,2),(3,4),(6,7) respectively
- LLM model ensemble: gpt-4-mini, gpt-4, qwen3, qwen-plus, gemini-2.5-flash, deepseek-v3, grok-4, doubao
- Majority voting mechanism: For each position f, independently sample m=5 times and take the mode as decision
- Ideal agent: μf(t)(g):=n1∑Lμf(L,t)(g)
- Safety envelope: μf0,(t)(g):=1{μf(t)(g)>0}
- Shortest distance d0(f,t)
- Number of shortest paths Nd0(f,t)
- Verification inequality: logNd0(f,g)≪d0(f,g)
Experiments show that LLM-induced safety envelopes produce unidirectional and anisotropic reachable structures on 2D grids, with strictly decreasing Manhattan distance to the target, consistent with Hypothesis 1's finite-term premise.
Figure 2 shows the relationship between (d0,Nd0) across three grid scales:
- Data points lie below the theoretically predicted empirical upper bounds
- When d0 is larger, the inequality logNd0≪d0 fits better
- Supports empirical regularities in the small Rc limit
- Unidirectional graph structure: Observed graphs exhibit unidirectional features, supporting Hypothesis 1
- Finite path counting: Finite path counts are consistent with Hypothesis 2's setting
- Complexity dominance: Verifies that complexity (shortest distance) dominates while path diversity is limited
- Threshold behavior: Under small continuation parameters, search is in an insufficiently expanded state, with shortest path terms dominating Pf,g(p) behavior
- Geometric constraints: Semantic constraints from LLMs cause graphs to exhibit unidirectional structure, effectively limiting the search space
- Reachability patterns: Observed (d0,Nd0) relationships conform to theoretical prediction upper bound trends
- LLM reasoning paradigms: ReAct, Tree of Thoughts, Chain-of-Thought and other iterative reasoning methods
- Planning and tool use: Plan-and-Solve, Toolformer, Voyager and other agent frameworks
- AI+Science applications: LLM applications in program search, algorithm discovery, scientific computing and other domains
- Provides a unified theoretical framework, whereas existing methods are mostly empirical heuristics
- Establishes measurable safety-reachability tradeoff mechanisms
- Offers model-agnostic formal description
- Theoretical contribution: Establishes a compact formal theory of LLM-assisted iterative search
- Measurement tools: Provides operational tools for unified measurement of safety and reachability
- Geometric insights: Reveals geometric structures and constraint mechanisms of search processes
- Empirical validation: Verifies testable theoretical predictions through majority voting instantiation
- Experimental scale: Current validation is limited to small-scale 2D grids, requiring verification on larger scales and more complex tasks
- Model coverage: While multiple LLMs are used, broader model and task coverage is needed
- Theoretical completeness: Some theoretical predictions (such as direct estimation of Rc) have not been fully verified experimentally
- Detailed experimental validation: Test theoretical validity on more complex tasks
- Reinforcement learning connections: Link theoretical metrics to reinforcement learning rewards and training processes
- Practical applications: Apply measurement tools to agent design and training for complex tasks
- Strong theoretical innovation: First formal measurement theory of LLM agent search spaces
- Rigorous mathematical framework: Solid mathematical foundation based on fuzzy relational operators and generating functions
- High practical value: Provides operational measurement tools and design guidance principles
- Sufficient validation: Provides external validation of theory through concrete instantiation
- Limited experimental scale: Validation experiments are relatively simple, lacking tests on complex real-world tasks
- Hypothesis dependency: Theoretical predictions depend on the satisfaction of specific assumptions (unidirectionality, low-order dominance)
- Computational complexity: For large-scale problems, computation of generating functions may face complexity challenges
- Academic contribution: Provides new theoretical foundation and analytical tools for LLM agent research
- Practical value: Offers quantitative guidance for agent design in complex tasks
- Reproducibility: Provides detailed experimental setup and code with good reproducibility
- LLM agent design requiring safety constraints
- Performance analysis of long-horizon reasoning and planning tasks
- Structured analysis and optimization of complex search spaces
- Comparison and evaluation of multi-agent systems
The paper cites 32 related references covering important works in LLM reasoning, reinforcement learning, constrained optimization, fuzzy systems and other domains, providing a solid foundation for theoretical construction.