AI reasoning agents are already able to solve a variety of tasks by deploying tools, simulating outcomes of multiple hypotheses and reflecting on them. In doing so, they perform computation, although not in the classical sense -- there is no program being executed. Still, if they perform computation, can AI agents be universal? Can chain-of-thought reasoning solve any computable task? How does an AI Agent learn to reason? Is it a matter of model size? Or training dataset size?
In this work, we reinterpret the role of learning in the context of AI Agents, viewing them as compute-capable stochastic dynamical systems, and highlight the role of time in a foundational principle for learning to reason. In doing so, we propose a shift from classical inductive learning to transductive learning -- where the objective is not to approximate the distribution of past data, but to capture their algorithmic structure to reduce the time needed to find solutions to new tasks.
Transductive learning suggests that, counter to Shannon's theory, a key role of information in learning is about reduction of time rather than reconstruction error. In particular, we show that the optimal speed-up that a universal solver can achieve using past data is tightly related to their algorithmic information. Using this, we show a theoretical derivation for the observed power-law scaling of inference time versus training time. We then show that scaling model size can lead to behaviors that, while improving accuracy on benchmarks, fail any reasonable test of intelligence, let alone super-intelligence: In the limit of infinite space and time, large models can behave as savants, able to brute-force through any task without any insight. Instead, we argue that the key quantity to optimize when scaling reasoning models is time, whose critical role in learning has so far only been indirectly considered.
AI Agents as Universal Task Solvers: It's All About Time
This paper revisits how AI agents learn to reason, modeling them as stochastic dynamical systems with computational capacity, and emphasizes the critical role of time in the foundational principles of learning to reason. The authors propose a shift from classical inductive learning to transductive learning, where the goal is not to approximate the distribution of historical data but to capture the algorithmic structure of that data in order to reduce the time required to solve new tasks. The paper shows that the optimal speed-up a universal solver can achieve using historical data is closely tied to the algorithmic information in that data, and provides a theoretical derivation for the observed power-law scaling between inference time and training time.
Traditional machine learning focuses on inductive learning: fitting functions to labeled data and expecting generalization to similar inputs. In agent settings, however, a pretrained model must solve specific instances of tasks it has not seen before. This process is called transduction: at test time, the model leverages all available data and actively reasons to solve the task at hand.
Current scaling laws use prediction error as a proxy for intelligence, ignoring time costs
As models become more powerful, learning becomes unnecessary because models can rely on exhaustive computation rather than insights derived from data structure
In the limit of infinite resources, models can brute-force any task without any learning
Theoretical Framework: Models AI agents as stochastic dynamical systems, extending universal solver theory from Turing machines to general dynamical systems
Redefinition of Time: Introduces the concept of "proper time," addressing the non-trivial problem of defining time in stochastic systems
Information-Speed Equivalence: Proves that information translates directly into speed, in the sense that the logarithm of the achievable speed-up equals I(h : D) (Theorem 1.1: log speed-up = I(h : D))
Scaling Law Theory: Provides a theoretical derivation for the observed power-law scaling between inference time and training time in reasoning models
Inversion of Scaling Laws: Reveals the misleading nature of accuracy-scale plots and proposes the importance of time optimization
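To make the information-speed relation in Theorem 1.1 concrete, the identity can be rewritten in exponential form. This is only a reading aid under stated assumptions: the logarithm is assumed to be base 2 (information measured in bits), and this summary does not say whether the relation holds exactly or up to lower-order terms. The value of 10 bits is purely illustrative.

```latex
% Theorem 1.1 as quoted above, with the logarithm assumed to be base 2:
\log_2(\text{speed-up}) = I(h : D)
\quad\Longleftrightarrow\quad
\text{speed-up} = 2^{\,I(h : D)}

% Illustrative arithmetic (the value 10 is an assumption, not from the paper):
% I(h : D) = 10 \text{ bits} \;\Rightarrow\; \text{speed-up} = 2^{10} = 1024
```

Read this way, each additional bit of information shared between h and D doubles the attainable speed-up, which is why the paper treats information as a resource for saving time rather than for reducing reconstruction error.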
The research focuses on verifiable tasks: each problem instance x is paired with a task-specific function f(x,y) that can interactively verify or score any candidate solution y.
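The verifiable-task setting can be sketched as a generate-and-verify loop. The example below is a minimal illustration, not the paper's construction: the factoring task, the names `verify_factor` and `search`, and the two candidate orderings are all hypothetical. It shows a verifier f(x, y) and counts verifier calls as a crude stand-in for time, comparing blind enumeration with an ordering that encodes a hint about where solutions lie.

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def verify_factor(x: int, y: int) -> float:
    """Hypothetical verifier f(x, y): 1.0 if y is a nontrivial factor of x, else 0.0."""
    return 1.0 if 1 < y < x and x % y == 0 else 0.0


def search(
    x: int,
    candidates: Iterable[int],
    f: Callable[[int, int], float],
) -> Tuple[Optional[int], int]:
    """Try candidates in order; return (first verified solution, verifier calls used)."""
    calls = 0
    for y in candidates:
        calls += 1
        if f(x, y) == 1.0:
            return y, calls
    return None, calls


x = 8633  # equals 89 * 97; chosen only for illustration

# Blind enumeration: correct, but uses no knowledge about the task.
blind = range(2, x)

# Informed ordering: encodes the (assumed) hint that the factors lie near sqrt(x).
informed = range(math.isqrt(x), 1, -1)

y_blind, t_blind = search(x, blind, verify_factor)
y_informed, t_informed = search(x, informed, verify_factor)

print(f"blind:    solution={y_blind}, verifier calls={t_blind}")
print(f"informed: solution={y_informed}, verifier calls={t_informed}")
```

Both searches return the same verified answer; they differ only in how many candidates they test first. That gap is the kind of time saving the paper attributes to information extracted from past data.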
The paper is primarily theoretical, and its theorems are established through mathematical proofs rather than experiments. Supporting evidence is provided mainly through:
Santa Fe Process Construction: Explicitly constructs data generation processes satisfying GHC scaling
Theoretical Derivation of Power-Law Scaling: Provides a theoretical foundation for empirically observed power-law relationships between inference and training time
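A power-law relationship of the kind mentioned above appears as a straight line in log-log coordinates, so its exponent can be estimated with ordinary least squares on the logarithms. The sketch below assumes a relation of the form inference_time = c * train_time ** alpha and uses synthetic data. The constants 50.0 and -0.5 are arbitrary illustrative choices, not values from the paper, and `fit_power_law` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = c * x**alpha in log-log space; returns (alpha, c)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # The slope of the regression line in log-log space is the exponent alpha.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    alpha = sxy / sxx
    # The intercept is log(c).
    c = math.exp(mean_y - alpha * mean_x)
    return alpha, c


# Synthetic, noise-free data (illustrative only): more training time, less inference time.
train_time = [1.0, 10.0, 100.0, 1000.0, 10000.0]
inference_time = [50.0 * t ** -0.5 for t in train_time]

alpha, c = fit_power_law(train_time, inference_time)
print(f"estimated exponent alpha = {alpha:.3f}, prefactor c = {c:.3f}")
```

With real measurements the points would scatter around the line, and the fitted exponent would be the quantity to compare against the theoretical prediction.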
Complexity Paradox: Contrary to Occam's Razor, complex data generation processes actually facilitate learning
Inversion of Scaling Laws: As model scale increases, systems may enter a "savant regime," achieving high accuracy through brute-force computation but lacking genuine insight
Centrality of Time: Intelligent behavior should be measured by error reduction per unit time/computation, not merely by accuracy
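The difference between ranking systems by accuracy and ranking them by error reduction per unit of time can be shown with a small calculation. The metric below is one simple way to operationalize the idea and is an assumption of this sketch, not the paper's definition. The two systems and all of their numbers are invented for illustration.

```python
def error_reduction_rate(baseline_error: float, final_error: float, compute_time: float) -> float:
    """Error removed per unit of compute time (hypothetical metric for illustration)."""
    if compute_time <= 0:
        raise ValueError("compute_time must be positive")
    return (baseline_error - final_error) / compute_time


# Invented numbers: both systems start from the same baseline error of 0.50.
# The "savant" reaches a lower final error but only by spending far more compute.
savant_final, savant_time = 0.01, 1000.0
insightful_final, insightful_time = 0.05, 10.0

savant_rate = error_reduction_rate(0.50, savant_final, savant_time)
insightful_rate = error_reduction_rate(0.50, insightful_final, insightful_time)

print(f"savant:     final error={savant_final}, rate={savant_rate:.5f}")
print(f"insightful: final error={insightful_final}, rate={insightful_rate:.5f}")
```

On final error alone the savant looks better; on error reduction per unit time the ordering reverses. This is the inversion that accuracy-versus-scale plots can hide.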
Solomonoff (1964): A formal theory of inductive inference
Hilberg (1990): Classic work on redundancy and information in text
Contemporary deep learning and LLM research
This paper provides profound theoretical insights into AI agent reasoning capabilities, particularly emphasizing the central role of time in learning. While primarily theoretical, its perspectives may significantly influence the design of future AI systems.