2025-11-19T01:19:13.619140

An approach for systematic decomposition of complex llm tasks

Zhou, Xu, Liu et al.
Large Language Models (LLMs) suffer from reliability issues on complex tasks, as existing decomposition methods are heuristic and rely on agent or manual decomposition. This work introduces a novel, systematic decomposition framework that we call Analysis of CONstraint-Induced Complexity (ACONIC), which models the task as a constraint problem and leveraging formal complexity measures to guide decomposition. On combinatorial (SATBench) and LLM database querying tasks (Spider), we find that by decomposing the tasks following the measure of complexity, agent can perform considerably better (10-40 percentage point).
academic

An Approach for Systematic Decomposition of Complex LLM Tasks

Basic Information

  • Paper ID: 2510.07772
  • Title: An Approach for Systematic Decomposition of Complex LLM Tasks
  • Authors: Tianle Zhou, Jiakai Xu, Guanhong Liu, Jiaxiang Liu, Haonan Wang, Eugene Wu (Columbia University)
  • Classification: cs.AI
  • Publication Date: October 13, 2025 (arXiv v2)
  • Paper Link: https://arxiv.org/abs/2510.07772v2

Abstract

Large Language Models (LLMs) suffer from reliability issues on complex tasks, and existing decomposition methods are heuristic-based, relying on agents or manual decomposition. This work introduces a novel systematic decomposition framework called Constraint-Induced Complexity Analysis (ACONIC), which models tasks as constraint satisfaction problems and leverages formal complexity metrics to guide decomposition. On combinatorial problems (SAT-Bench) and LLM database query tasks (Spider), decomposing tasks according to complexity metrics significantly improves agent performance (10-40 percentage points).

Research Background and Motivation

1. Problem Statement

Large Language Models often fail to produce correct results in a single forward pass when handling complex tasks requiring deep multi-step reasoning or combinatorial search, exhibiting reliability issues.

2. Problem Significance

With the widespread application of LLMs in various reasoning, programming, and problem-solving tasks, how to systematically decompose complex tasks to improve model performance has become a critical challenge. Existing methods lack principled complexity metrics and decomposition strategies.

3. Limitations of Existing Methods

  • Heuristic Decomposition: Existing methods such as Chain-of-Thought primarily rely on LLMs themselves for decomposition, lacking theoretical foundations
  • Manual Decomposition: Depends on domain experts to manually design workflows, lacking systematicity
  • Absence of Complexity Metrics: Unable to quantify task complexity, making it difficult to determine when and how to decompose

4. Research Motivation

Establish a formal task complexity framework that enables systematic decomposition strategies, provides the ability to study comparably difficult tasks, and guides when tool assistance is needed.

Core Contributions

  1. Proposes ACONIC Framework: The first formal complexity framework that systematically reduces LLM tasks to constraint satisfaction problems
  2. Establishes Complexity Metrics: Uses constraint graph size and treewidth as task complexity measures
  3. Systematic Decomposition Method: A decomposition strategy based on tree decomposition that minimizes subtask complexity while maintaining global satisfiability
  4. Empirical Validation: Verifies the difficulty boundaries defined by complexity metrics and decomposition effectiveness on SAT-Bench and Spider benchmarks
  5. Performance Improvements: Achieves 9-15% improvement on SAT-Bench and 30-40% improvement on Spider compared to Chain-of-Thought methods

Methodology Details

Task Definition

ACONIC defines LLM tasks as: given a context describing a set of constraints and a query that must reason over these constraints, reduce it to a formal constraint satisfaction problem, then decompose and construct a subtask workflow.

Model Architecture

1. Reduction to Planning Problems

Uses a state-based agent action framework to formalize tasks as Planning as Satisfiability (PaS) problems:

P = ⟨F, A, I, G⟩

Where:

  • F: Finite set of propositional fluents describing world facts
  • A: Finite set of actions
  • I, G: Initial and goal fluents
  • For action a: P(a) determines preconditions, A(a) determines fluents becoming true, D(a) determines fluents becoming false

2. Reduction to Constraint Satisfaction Problems

Reduces PaS problems to CSP instances by encoding:

  • Preconditions fp ∈ P(a)
  • Add effects fa ∈ A(a)
  • Delete effects fd ∈ D(a) As boolean dependency constraints between fluents and actions.

3. Tree Decomposition Strategy

Leverages Bodlaender's (1998) tree decomposition theory:

  • Finds tree decomposition D* with minimal maximum bag size (treewidth)
  • Treewidth characterizes intrinsic problem complexity
  • Local consistency ensures global consistency

Technical Innovations

  1. Formal Complexity Metrics: First application of treewidth from graph theory as a quantitative indicator of LLM task complexity
  2. Global Consistency Guarantee: Tree decomposition ensures that consistency on local subgraphs implies consistency of global CSP solutions
  3. Optimal Decomposition Strategy: Decomposition based on minimal treewidth minimizes local complexity
  4. Automatic Reduction Procedures: Develops automatic reduction procedures for specific benchmarks, reducing manual modeling

Experimental Setup

Datasets

1. SAT-Bench

  • Natural language story problems constructed from SAT problems
  • Contains CNF representations, natural language descriptions, and entity-to-SAT alignment mappings
  • Evaluates Claude 3.5-Sonnet (random sampling of half tasks) and Llama-3-70B (all tasks)

2. Spider

  • Popular NL2SQL benchmark dataset
  • Contains hundreds of databases, each with up to 37 tables, 90 foreign keys, and over 100 columns
  • Tasks include database schema S, natural language query q, and ground-truth SQL query q*

Evaluation Metrics

  • SAT-Bench: Task completion rate (success/failure)
  • Spider: SQL query accuracy, evaluated separately by difficulty levels (Easy/Medium/Hard/Extra)

Baseline Methods

  • Chain-of-Thought (CoT): Standard chain-of-thought prompting as baseline
  • Full Observation vs. Decomposed Observation: Compares global information access with local decomposed information access

Implementation Details

  • Uses SageMaker to compute tree decomposition, employing minimum fill heuristic and exact solvers
  • SAT-Bench employs incremental variable assignment strategy
  • Spider employs incremental construction strategy using WITH clauses

Experimental Results

Main Results

1. SAT-Bench Results

  • Claude 3.5-Sonnet: Improves from 49.3% to 58.1% (+8.8%)
  • Llama-3-70B: Improves from 21.5% to 36.5% (+15.0%)
  • Complexity metrics clearly define difficulty boundaries, with ACONIC pushing boundaries toward more complex problems

2. Spider Results

Compared to CoT baseline, ACONIC shows significant improvements across all difficulty levels:

  • Easy: Improves from 42.7% to 75.8% (+33.1%)
  • Medium: Improves from 38.1% to 58.1% (+20.0%)
  • Hard: Improves from 36.2% to 62.7% (+26.5%)
  • Extra: Improves from 19.3% to 37.9% (+18.6%)

Experimental Findings

  1. Complexity Boundaries: Experiments reveal fixed "total task complexity" boundaries based on problem treewidth and bag count
  2. Consistent Improvements: ACONIC decomposition shows consistent performance gains across two different models (Claude and LLaMA)
  3. Difficulty Gradient: Stronger models (e.g., Claude) push boundaries toward more complex problems
  4. Decomposition Effectiveness: Increasing trajectory count slightly improves accuracy, but complexity-guided decomposition yields more significant improvements

1. Task Decomposition Methods

  • Chain-of-Thought Series: Wei et al. (2022), Yao et al. (2023), Khot et al. (2023)
  • Tool-Assisted Methods: Wang et al. (2024), Singh et al. (2024)
  • Domain-Specific Decomposition: Pourreza and Rafiei (2023), Chen et al. (2024)

2. Constraint Satisfaction and Planning

  • Planning as Satisfiability: Classical work by Selman et al.
  • Tree Decomposition Theory: Graph-theoretic foundations by Bodlaender (1998)
  • Multi-Agent Path Planning: Surynek et al. (2016)

3. Database Theory Applications

  • Constraint Graph Modeling: Gottlob et al. (2001)
  • NL2SQL Methods: Relation-aware encoding by Wang et al. (2019)

Conclusions and Discussion

Main Conclusions

  1. Framework Effectiveness: ACONIC provides the first constraint satisfaction-based quantification framework for LLM task complexity
  2. Systematic Decomposition Advantages: Complexity-based decomposition significantly outperforms heuristic methods
  3. Generality: The framework is effective across different task types (combinatorial problems and database queries)
  4. Theory Guiding Practice: Graph-theoretic concepts like treewidth provide theoretical foundations for LLM task decomposition

Limitations

  1. Scope Restrictions: Only applicable to tasks conveniently modeled as constraint satisfaction problems
  2. Complete Representation Challenges: Real-world problems often cannot be fully logically represented due to problem ambiguity, opaque agent actions, or vague contextual information
  3. Non-Fully Autonomous: ACONIC does not constitute a fully autonomous decomposition or reasoning system
  4. Benchmark Specificity: Evaluated tasks can be solved directly using constraint solvers or simple algorithms

Future Directions

  1. Hybrid Decomposition Methods: Investigate hybrid decomposition methods combining logical and commonsense constraints
  2. Broader Task Types: Extend to more practical problems such as deadlock detection and resource scheduling
  3. Fully Autonomous Systems: Develop toward fully autonomous decomposition and reasoning systems
  4. Learning-Based Decomposition: Comparative studies with other theoretically-grounded or learning-based decomposition frameworks

In-Depth Evaluation

Strengths

  1. Theoretical Innovation: First systematic application of tree decomposition theory from graph theory to LLM task decomposition
  2. Formal Rigor: Provides a strict mathematical framework with complete reduction chains from PaS to CSP to tree decomposition
  3. Sufficient Empirical Validation: Verified on two different benchmark types with consistent and significant results
  4. Strong Interpretability: Complexity metrics provide intuitive understanding of task difficulty
  5. General Framework: Not limited to specific task types, with good generality

Shortcomings

  1. Modeling Complexity: Reducing practical tasks to CSP requires specialized knowledge and manual engineering
  2. Computational Overhead: Tree decomposition computation itself may have high complexity
  3. Limited Baseline Comparisons: Primarily compared with CoT, lacking comparison with other systematic decomposition methods
  4. Task Type Limitations: Validated only on two task types, generalization ability requires broader verification

Impact

  1. Theoretical Contribution: Provides new theoretical perspective for LLM task decomposition
  2. Methodological Value: ACONIC framework may inspire more formal method-based LLM research
  3. Practical Value: Significant performance improvements on specific task types have practical application value
  4. Research Direction: May open new research directions combining LLMs with traditional symbolic AI methods

Applicable Scenarios

  1. Combinatorial Optimization Problems: Scheduling, resource allocation, and other problems modelable as CSP
  2. Structured Query Tasks: Database queries, knowledge graph reasoning, etc.
  3. Multi-Constraint Planning: Planning tasks requiring satisfaction of multiple constraints
  4. Logical Reasoning Tasks: Reasoning problems formalizable as logical constraints

References

  1. Bodlaender, H. L. (1998). A partial k-arboretum of graphs with bounded treewidth. Theoretical computer science, 209(1-2):1–45.
  2. Wei, J., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
  3. Yu, T., et al. (2019). Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task.
  4. Gottlob, G., Leone, N., & Scarcello, F. (2001). Hypertree decompositions: A survey. International Symposium on Mathematical Foundations of Computer Science.

Summary: The ACONIC framework proposed in this paper represents an important theoretical advance in LLM task decomposition. By introducing formal complexity metrics and systematic decomposition strategies, it provides new insights for solving complex LLM tasks. Despite limitations in applicable scope and modeling complexity, its significant performance improvements on specific tasks and theoretical contributions make it an important work in this field.