2025-11-19T01:19:13.619140

An approach for systematic decomposition of complex llm tasks

Zhou, Xu, Liu et al.

Large Language Models (LLMs) suffer from reliability issues on complex tasks, as existing decomposition methods are heuristic and rely on agent or manual decomposition. This work introduces a novel, systematic decomposition framework that we call Analysis of CONstraint-Induced Complexity (ACONIC), which models the task as a constraint problem and leveraging formal complexity measures to guide decomposition. On combinatorial (SATBench) and LLM database querying tasks (Spider), we find that by decomposing the tasks following the measure of complexity, agent can perform considerably better (10-40 percentage point).

academic

An Approach for Systematic Decomposition of Complex LLM Tasks

Basic Information

Paper ID: 2510.07772
Title: An Approach for Systematic Decomposition of Complex LLM Tasks
Authors: Tianle Zhou, Jiakai Xu, Guanhong Liu, Jiaxiang Liu, Haonan Wang, Eugene Wu (Columbia University)
Classification: cs.AI
Publication Date: October 13, 2025 (arXiv v2)
Paper Link: https://arxiv.org/abs/2510.07772v2

Abstract

Large Language Models (LLMs) suffer from reliability issues on complex tasks, and existing decomposition methods are heuristic-based, relying on agents or manual decomposition. This work introduces a novel systematic decomposition framework called Constraint-Induced Complexity Analysis (ACONIC), which models tasks as constraint satisfaction problems and leverages formal complexity metrics to guide decomposition. On combinatorial problems (SAT-Bench) and LLM database query tasks (Spider), decomposing tasks according to complexity metrics significantly improves agent performance (10-40 percentage points).

Research Background and Motivation

1. Problem Statement

Large Language Models often fail to produce correct results in a single forward pass when handling complex tasks requiring deep multi-step reasoning or combinatorial search, exhibiting reliability issues.

2. Problem Significance

With the widespread application of LLMs in various reasoning, programming, and problem-solving tasks, how to systematically decompose complex tasks to improve model performance has become a critical challenge. Existing methods lack principled complexity metrics and decomposition strategies.

3. Limitations of Existing Methods

Heuristic Decomposition: Existing methods such as Chain-of-Thought primarily rely on LLMs themselves for decomposition, lacking theoretical foundations
Manual Decomposition: Depends on domain experts to manually design workflows, lacking systematicity
Absence of Complexity Metrics: Unable to quantify task complexity, making it difficult to determine when and how to decompose

4. Research Motivation

Establish a formal task complexity framework that enables systematic decomposition strategies, provides the ability to study comparably difficult tasks, and guides when tool assistance is needed.

Core Contributions

Proposes ACONIC Framework: The first formal complexity framework that systematically reduces LLM tasks to constraint satisfaction problems
Establishes Complexity Metrics: Uses constraint graph size and treewidth as task complexity measures
Systematic Decomposition Method: A decomposition strategy based on tree decomposition that minimizes subtask complexity while maintaining global satisfiability
Empirical Validation: Verifies the difficulty boundaries defined by complexity metrics and decomposition effectiveness on SAT-Bench and Spider benchmarks
Performance Improvements: Achieves 9-15% improvement on SAT-Bench and 30-40% improvement on Spider compared to Chain-of-Thought methods

Methodology Details

Task Definition

ACONIC defines LLM tasks as: given a context describing a set of constraints and a query that must reason over these constraints, reduce it to a formal constraint satisfaction problem, then decompose and construct a subtask workflow.

Model Architecture

1. Reduction to Planning Problems

Uses a state-based agent action framework to formalize tasks as Planning as Satisfiability (PaS) problems:

P = ⟨F, A, I, G⟩

Where:

F: Finite set of propositional fluents describing world facts
A: Finite set of actions
I, G: Initial and goal fluents
For action a: P(a) determines preconditions, A(a) determines fluents becoming true, D(a) determines fluents becoming false

2. Reduction to Constraint Satisfaction Problems

Reduces PaS problems to CSP instances by encoding:

Preconditions fp ∈ P(a)
Add effects fa ∈ A(a)
Delete effects fd ∈ D(a) As boolean dependency constraints between fluents and actions.

3. Tree Decomposition Strategy

Leverages Bodlaender's (1998) tree decomposition theory:

Finds tree decomposition D* with minimal maximum bag size (treewidth)
Treewidth characterizes intrinsic problem complexity
Local consistency ensures global consistency

Technical Innovations

Formal Complexity Metrics: First application of treewidth from graph theory as a quantitative indicator of LLM task complexity
Global Consistency Guarantee: Tree decomposition ensures that consistency on local subgraphs implies consistency of global CSP solutions
Optimal Decomposition Strategy: Decomposition based on minimal treewidth minimizes local complexity
Automatic Reduction Procedures: Develops automatic reduction procedures for specific benchmarks, reducing manual modeling

Experimental Setup

Datasets

1. SAT-Bench

Natural language story problems constructed from SAT problems
Contains CNF representations, natural language descriptions, and entity-to-SAT alignment mappings
Evaluates Claude 3.5-Sonnet (random sampling of half tasks) and Llama-3-70B (all tasks)

2. Spider

Popular NL2SQL benchmark dataset
Contains hundreds of databases, each with up to 37 tables, 90 foreign keys, and over 100 columns
Tasks include database schema S, natural language query q, and ground-truth SQL query q*

Evaluation Metrics

SAT-Bench: Task completion rate (success/failure)
Spider: SQL query accuracy, evaluated separately by difficulty levels (Easy/Medium/Hard/Extra)

Baseline Methods

Chain-of-Thought (CoT): Standard chain-of-thought prompting as baseline
Full Observation vs. Decomposed Observation: Compares global information access with local decomposed information access

Implementation Details

Uses SageMaker to compute tree decomposition, employing minimum fill heuristic and exact solvers
SAT-Bench employs incremental variable assignment strategy
Spider employs incremental construction strategy using WITH clauses

Experimental Results

Main Results

1. SAT-Bench Results

Claude 3.5-Sonnet: Improves from 49.3% to 58.1% (+8.8%)
Llama-3-70B: Improves from 21.5% to 36.5% (+15.0%)
Complexity metrics clearly define difficulty boundaries, with ACONIC pushing boundaries toward more complex problems

2. Spider Results

Compared to CoT baseline, ACONIC shows significant improvements across all difficulty levels:

Easy: Improves from 42.7% to 75.8% (+33.1%)
Medium: Improves from 38.1% to 58.1% (+20.0%)
Hard: Improves from 36.2% to 62.7% (+26.5%)
Extra: Improves from 19.3% to 37.9% (+18.6%)

Experimental Findings

Complexity Boundaries: Experiments reveal fixed "total task complexity" boundaries based on problem treewidth and bag count
Consistent Improvements: ACONIC decomposition shows consistent performance gains across two different models (Claude and LLaMA)
Difficulty Gradient: Stronger models (e.g., Claude) push boundaries toward more complex problems
Decomposition Effectiveness: Increasing trajectory count slightly improves accuracy, but complexity-guided decomposition yields more significant improvements

1. Task Decomposition Methods

Chain-of-Thought Series: Wei et al. (2022), Yao et al. (2023), Khot et al. (2023)
Tool-Assisted Methods: Wang et al. (2024), Singh et al. (2024)
Domain-Specific Decomposition: Pourreza and Rafiei (2023), Chen et al. (2024)

2. Constraint Satisfaction and Planning

Planning as Satisfiability: Classical work by Selman et al.
Tree Decomposition Theory: Graph-theoretic foundations by Bodlaender (1998)
Multi-Agent Path Planning: Surynek et al. (2016)

3. Database Theory Applications

Constraint Graph Modeling: Gottlob et al. (2001)
NL2SQL Methods: Relation-aware encoding by Wang et al. (2019)

Conclusions and Discussion

Main Conclusions

Framework Effectiveness: ACONIC provides the first constraint satisfaction-based quantification framework for LLM task complexity
Systematic Decomposition Advantages: Complexity-based decomposition significantly outperforms heuristic methods
Generality: The framework is effective across different task types (combinatorial problems and database queries)
Theory Guiding Practice: Graph-theoretic concepts like treewidth provide theoretical foundations for LLM task decomposition

Limitations

Scope Restrictions: Only applicable to tasks conveniently modeled as constraint satisfaction problems
Complete Representation Challenges: Real-world problems often cannot be fully logically represented due to problem ambiguity, opaque agent actions, or vague contextual information
Non-Fully Autonomous: ACONIC does not constitute a fully autonomous decomposition or reasoning system
Benchmark Specificity: Evaluated tasks can be solved directly using constraint solvers or simple algorithms

Future Directions

Hybrid Decomposition Methods: Investigate hybrid decomposition methods combining logical and commonsense constraints
Broader Task Types: Extend to more practical problems such as deadlock detection and resource scheduling
Fully Autonomous Systems: Develop toward fully autonomous decomposition and reasoning systems
Learning-Based Decomposition: Comparative studies with other theoretically-grounded or learning-based decomposition frameworks

In-Depth Evaluation

Strengths

Theoretical Innovation: First systematic application of tree decomposition theory from graph theory to LLM task decomposition
Formal Rigor: Provides a strict mathematical framework with complete reduction chains from PaS to CSP to tree decomposition
Sufficient Empirical Validation: Verified on two different benchmark types with consistent and significant results
Strong Interpretability: Complexity metrics provide intuitive understanding of task difficulty
General Framework: Not limited to specific task types, with good generality

Shortcomings

Modeling Complexity: Reducing practical tasks to CSP requires specialized knowledge and manual engineering
Computational Overhead: Tree decomposition computation itself may have high complexity
Limited Baseline Comparisons: Primarily compared with CoT, lacking comparison with other systematic decomposition methods
Task Type Limitations: Validated only on two task types, generalization ability requires broader verification

Impact

Theoretical Contribution: Provides new theoretical perspective for LLM task decomposition
Methodological Value: ACONIC framework may inspire more formal method-based LLM research
Practical Value: Significant performance improvements on specific task types have practical application value
Research Direction: May open new research directions combining LLMs with traditional symbolic AI methods

Applicable Scenarios

Combinatorial Optimization Problems: Scheduling, resource allocation, and other problems modelable as CSP
Structured Query Tasks: Database queries, knowledge graph reasoning, etc.
Multi-Constraint Planning: Planning tasks requiring satisfaction of multiple constraints
Logical Reasoning Tasks: Reasoning problems formalizable as logical constraints

References

Bodlaender, H. L. (1998). A partial k-arboretum of graphs with bounded treewidth. Theoretical computer science, 209(1-2):1–45.
Wei, J., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
Yu, T., et al. (2019). Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task.
Gottlob, G., Leone, N., & Scarcello, F. (2001). Hypertree decompositions: A survey. International Symposium on Mathematical Foundations of Computer Science.

Summary: The ACONIC framework proposed in this paper represents an important theoretical advance in LLM task decomposition. By introducing formal complexity metrics and systematic decomposition strategies, it provides new insights for solving complex LLM tasks. Despite limitations in applicable scope and modeling complexity, its significant performance improvements on specific tasks and theoretical contributions make it an important work in this field.