2025-11-12T10:58:10.220342

AI Agents as Universal Task Solvers

Achille, Soatto

AI reasoning agents are already able to solve a variety of tasks by deploying tools, simulating outcomes of multiple hypotheses and reflecting on them. In doing so, they perform computation, although not in the classical sense -- there is no program being executed. Still, if they perform computation, can AI agents be universal? Can chain-of-thought reasoning solve any computable task? How does an AI Agent learn to reason? Is it a matter of model size? Or training dataset size? In this work, we reinterpret the role of learning in the context of AI Agents, viewing them as compute-capable stochastic dynamical systems, and highlight the role of time in a foundational principle for learning to reason. In doing so, we propose a shift from classical inductive learning to transductive learning -- where the objective is not to approximate the distribution of past data, but to capture their algorithmic structure to reduce the time needed to find solutions to new tasks. Transductive learning suggests that, counter to Shannon's theory, a key role of information in learning is about reduction of time rather than reconstruction error. In particular, we show that the optimal speed-up that a universal solver can achieve using past data is tightly related to their algorithmic information. Using this, we show a theoretical derivation for the observed power-law scaling of inference time versus training time. We then show that scaling model size can lead to behaviors that, while improving accuracy on benchmarks, fail any reasonable test of intelligence, let alone super-intelligence: In the limit of infinite space and time, large models can behave as savants, able to brute-force through any task without any insight. Instead, we argue that the key quantity to optimize when scaling reasoning models is time, whose critical role in learning has so far only been indirectly considered.

academic

AI Agents as Universal Task Solvers: It's All About Time

Basic Information

Paper ID: 2510.12066
Title: AI Agents as Universal Task Solvers: It's All About Time
Authors: Alessandro Achille, Stefano Soatto (AWS Agentic AI)
Classification: cs.AI, cs.LG
Publication Date: September 12, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.12066

Abstract

This paper revisits the role of AI agents in learning to reason, conceptualizing them as stochastic dynamical systems with computational capacity, and emphasizes the critical role of time in the foundational principles of reasoning learning. The authors propose a shift from classical inductive learning to transductive learning, where the goal is not to approximate the distribution of historical data, but to capture the algorithmic structure within data to reduce the time required to solve new tasks. The research demonstrates that the optimal speedup achievable by universal solvers using historical data is closely related to algorithmic information, and provides theoretical derivation for the observed power-law scaling between inference time and training time.

Research Background and Motivation

Core Questions

Universality of AI Agents: Can chain-of-thought reasoning solve any computable task?
Learning Mechanisms: How do AI agents learn to reason? Is it a matter of model scale or training data scale?
Nature of Scaling Laws: Do current accuracy-based scaling laws truly reflect intelligence?

Research Motivation

Traditional machine learning focuses on inductive learning—fitting functions to labeled data and expecting generalization to similar inputs. However, in agent settings, we need pretrained models capable of handling specific instances of new tasks and solving those instances. This process is called transduction: at test time, the model leverages all available data and actively reasons to solve the task at hand.

Limitations of Existing Approaches

Current scaling laws use prediction error as a proxy for intelligence, ignoring time costs
As models become more powerful, learning becomes unnecessary because models can rely on exhaustive computation rather than insights derived from data structure
In the limit of infinite resources, models can brute-force any task without any learning

Core Contributions

Theoretical Framework: Models AI agents as stochastic dynamical systems, extending universal solver theory from Turing machines to general dynamical systems
Redefinition of Time: Introduces the concept of "proper time," addressing the non-trivial problem of time definition in stochastic systems
Information-Speed Equivalence: Proves that information equals speed (Theorem 1.1: log speed-up = I(h : D))
Scaling Law Theory: Provides theoretical derivation for the observed power-law scaling between inference time and training time in reasoning models
Inversion of Scaling Laws: Reveals the misleading nature of accuracy-scale plots and proposes the importance of time optimization

Methodology Details

Task Definition

The research focuses on verifiable tasks: each problem instance x is paired with a task-specific function f(x,y) that can interactively verify or score any candidate solution y.

Core Theoretical Construction

1. Dynamical Systems as Computation

Models LLM chain-of-thought reasoning as stochastic dynamical systems:

State space: states s in S
Trajectories: h = (s₁, ..., sₙ), with length T(h) = n
Transition probabilities: ν(sₜ₊₁|sₜ)
Trajectory probability: ν(h) = ∏ν(sₜ₊₁|sₜ)

2. Proper Time Definition

Definition 2.3: For a stochastic dynamical system, the proper time from input x to output a is defined as:

τᵥ(x ↓ a) = min[T(h)/ν(h|x)]

where the minimum is taken over all trajectories h starting from enc(x) and terminating with output a.

Theorem 2.4: There exists a deterministic Turing machine Mᵥ such that:

T_Mᵥ(x ↓ a) ≤ 2τᵥ(x ↓ a)

3. Existence of Universal Solvers

Theorem 3.2: Given any distribution m over encoded programs, there exists a dynamical system Uₘ such that for any solver A:

τ_Uₘ(x ↓ y) ≤ C'_A 2^(-log m(A)) τ_A(x ↓ y)

Technical Innovations

1. Information-Speed Equivalence

Theorem 4.2: The log speedup of a search algorithm after observing data is:

log[τᵥ(h)/τᵥ(h|D)] = Iᵥ(h : D)

where Iᵥ(h : D) is the ν-algorithmic mutual information.

2. Generalization of the Hilberg Conjecture

Definition 4.4: Generalized Hilberg Conjecture (GHC) scaling:

I(Xₙ : Yₘ) ∝ n^β + m^β - (m+n)^β

3. Time Scaling Laws

Theorem 4.5: The log speedup obtained from training on a sufficiently large dataset D (n tokens) is:

log[τᵥ(h)/τᵥ(h|D)] = T(h)^β - βT(h)/n^(1-β)

Experimental Setup

Theoretical Verification

The paper is primarily theoretical, with verification of theorems through mathematical proofs. Experimental verification is mainly demonstrated through:

Santa Fe Process Construction: Explicitly constructs data generation processes satisfying GHC scaling
Theoretical Derivation of Power-Law Scaling: Provides theoretical foundation for empirically observed power-law relationships between inference and training time

Key Parameters

β ∈ (0,1): Complexity parameter controlling the heavy-tail distribution of "useful facts"
For natural language: β ≈ 0.5, implying n ∝ L² scaling relationship

Experimental Results

Main Theoretical Results

1. Maximum Speedup Bounds

Theorem 4.3: The maximum speedup achievable using data generated by process q is:

log[τᵥ(h)/τᵥ(h|D)] ≤ K(q)

where K(q) is the Kolmogorov complexity of q.

2. Learning and Time Optimization

Theorem 1.5:

Without time penalties, optimal reasoning can be achieved through brute-force without learning
Any system optimizing for time must learn at least I(h : D) = log speed-up bits from historical data

3. Memory-Time Tradeoff

Corollary 4.7: Assuming optimal memory usage, the speedup as a function of memory used is:

log[τᵥ(h)/τᵥ(h|D)] = T(h)^β - T(h)/M^(1/β-1)

Key Findings

Complexity Paradox: Contrary to Occam's Razor, complex data generation processes actually facilitate learning
Inversion of Scaling Laws: As model scale increases, systems may enter a "savant regime," achieving high accuracy through brute-force computation but lacking genuine insight
Centrality of Time: Intelligent behavior should be measured by error reduction per unit time/computation, not merely by accuracy

Main Research Areas

Solomonoff Induction and Universal Search: Based on classical work by Levin and Solomonoff
Transductive Learning: Vapnik's transductive reasoning framework
In-Context Learning: Modern LLM in-context learning capabilities
Algorithmic Information Theory: Kolmogorov complexity and algorithmic mutual information

Contributions of This Paper

Extends universal search theory from Turing machines to general stochastic dynamical systems
Proposes the foundational role of time in learning, challenging traditional Shannon information theory perspectives
Provides theoretical foundation for LLM reasoning capabilities

Conclusions and Discussion

Main Conclusions

Time is Central to Intelligence: True intelligence should optimize for time efficiency, not merely pursue accuracy
The Essence of Learning is Acceleration: In transductive settings, the value of learning lies in reducing time to solve unseen tasks
Value of Complexity: Complex data generation processes provide more opportunities for learning
Rethinking Scaling Strategies: Should optimize for time rather than purely model scale

Limitations

Theoretical Nature: Primarily theoretical work lacking large-scale empirical verification
Assumption Constraints: Relies on GHC scaling assumptions; actual data may not fully conform
Computability Issues: Some theoretical results involve uncomputable quantities (e.g., Kolmogorov complexity)

Future Directions

Empirical Verification: Validate theoretical predictions in actual LLM systems
Algorithm Design: Design better training and inference algorithms based on theoretical insights
Evaluation Metrics: Develop intelligence assessment metrics accounting for time costs

In-Depth Evaluation

Strengths

Theoretical Depth: Provides deep theoretical foundation for AI agent reasoning capabilities
Conceptual Innovation: Redefines learning objectives (from accuracy to time efficiency)
Mathematical Rigor: Complete proofs with clear logic
Practical Significance: Offers important reflection on current LLM scaling strategies

Weaknesses

Empirical Gaps: Theoretical results require more experimental validation
Complexity: Mathematical content is quite abstract with high barriers to practical application
Assumption Universality: The generalizability of key assumptions (e.g., GHC) remains to be verified

Impact

Theoretical Contribution: Provides new theoretical framework for AI reasoning research
Practical Value: Guides design and evaluation of future AI systems
Paradigm Shift: May promote transition from accuracy-oriented to efficiency-oriented research

Applicable Scenarios

Training strategy design for large-scale language models
Assessment of AI agent reasoning capabilities
Model optimization in computationally constrained environments
Theoretical analysis of automated reasoning systems

References

The paper cites rich related work, including:

Levin (1973): Universal sequential search problems
Solomonoff (1964): A formal theory of inductive inference
Hilberg (1990): Classical work on text redundancy information
Contemporary deep learning and LLM research

This paper provides profound theoretical insights into AI agent reasoning capabilities, particularly emphasizing the central role of time in learning. While primarily theoretical, its perspectives may significantly influence the design of future AI systems.