2025-11-16T06:22:12.451775

To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models

Malach, Saremi, Williamson et al.

State Space Models (SSMs) have become the leading alternative to Transformers for sequence modeling. Their primary advantage is efficiency in long-context and long-form generation, enabled by fixed-size memory and linear scaling of computational complexity. We begin this work by showing a simple theoretical result stating that SSMs cannot accurately solve any "truly long-form" generation problem (in a sense we formally define), undermining their main competitive advantage. However, we show that this limitation can be mitigated by allowing SSMs interactive access to external tools. In fact, we show that given the right choice of tool access and problem-dependent training data, SSMs can learn to solve any tractable problem and generalize to arbitrary problem length/complexity (i.e., achieve length generalization). Following our theoretical finding, we demonstrate that tool-augmented SSMs achieve remarkable length generalization on a variety of arithmetic, reasoning, and coding tasks. These findings highlight SSMs as a potential efficient alternative to Transformers in interactive tool-based and agentic settings.
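
To make the efficiency claim concrete, here is a minimal sketch, assuming a generic diagonal linear recurrence h_t = a ⊙ h_{t-1} + b x_t, y_t = c · h_t with fixed parameters. Real SSMs differ (Mamba makes some of its parameters input-dependent, and DeltaNet uses a delta-rule update of a matrix-valued state), so `diagonal_ssm` and `causal_attention_counts` are hypothetical illustrative helpers, not any library's API. The counts ignore layers, heads, and hidden dimensions: the recurrence keeps a state whose size is set by its parameters no matter how many tokens it reads, while causal attention computes n(n+1)/2 query-key scores and caches n key/value pairs for n generated tokens.

```python
from typing import List, Tuple


def diagonal_ssm(xs: List[float], a: List[float], b: List[float],
                 c: List[float]) -> Tuple[List[float], int]:
    """Run h_t = a*h_{t-1} + b*x_t (elementwise) and y_t = sum(c*h_t).

    Hypothetical toy recurrence with fixed parameters (no input-dependent gating).
    Returns the outputs and the number of floats held in the state.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length")
    h = [0.0] * len(a)                      # state size is fixed by the parameters
    ys: List[float] = []
    for x in xs:                            # one constant-cost update per token
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys, len(h)


def causal_attention_counts(n: int) -> Tuple[int, int]:
    """Query-key scores computed and key/value pairs cached when generating n tokens."""
    scores = n * (n + 1) // 2               # token t attends to t positions
    return scores, n                        # the cache keeps one entry per token


ys, state_size = diagonal_ssm([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])
print(ys, state_size)                       # [1.0, 0.5, 0.25] 1
print(causal_attention_counts(1000))        # (500500, 1000)
```

Feeding `diagonal_ssm` a million tokens instead of three leaves `state_size` unchanged, whereas both numbers returned by `causal_attention_counts` keep growing with n.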

Basic Information

Paper ID: 2510.14826
Title: To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
Authors: Eran Malach, Omid Saremi, Sinead Williamson, Arwen Bradley, Aryo Lotfi, Emmanuel Abbe, Josh Susskind, Etai Littwin
Affiliation: Apple
Category: cs.LG
Published: October 17, 2025
Paper link: https://arxiv.org/abs/2510.14826

Abstract

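State space models (SSMs) have become the main alternative to Transformers for sequence modeling. Their key advantage is efficiency in long-context and long-form generation, which comes from a fixed-size memory and computational cost that grows linearly with sequence length. The paper first presents a simple theoretical result showing that SSMs cannot accurately solve any "truly long-form" generation problem (in a formally defined sense), which undercuts this main competitive advantage. The authors then show that this limitation can be mitigated by giving SSMs interactive access to external tools. In fact, with the right choice of tool access and problem-dependent training data, SSMs can learn to solve any tractable problem and generalize to arbitrary problem length/complexity. Building on this theory, the authors demonstrate that tool-augmented SSMs achieve remarkable length generalization on a range of arithmetic, reasoning, and coding tasks.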
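
The core intuition, that the information which grows with problem size can live in an external tool rather than in the model's fixed-size state, can be illustrated with a toy. The sketch below is hypothetical: `DigitTape`, `read_next`, `write`, and `add_with_tool` are invented for illustration and are not the paper's tool interface, model, or training setup. The controller adds two decimal numbers of any length while its own persistent state is only the carry; the tool holds the operands, the read cursor, and the output.

```python
from typing import List, Optional, Tuple


class DigitTape:
    """Hypothetical external tool: stores both operands, a read cursor, and the output.

    The cursor lives inside the tool, so the controller never stores a position.
    """

    def __init__(self, a: str, b: str) -> None:
        self._a, self._b = a[::-1], b[::-1]   # least significant digit first
        self._pos = 0
        self._out: List[str] = []             # digits written so far, least significant first

    def read_next(self) -> Optional[Tuple[int, int]]:
        """Return the next digit pair (zero-padded) or None once both operands are exhausted."""
        if self._pos >= max(len(self._a), len(self._b)):
            return None
        da = int(self._a[self._pos]) if self._pos < len(self._a) else 0
        db = int(self._b[self._pos]) if self._pos < len(self._b) else 0
        self._pos += 1
        return da, db

    def write(self, digit: int) -> None:
        self._out.append(str(digit))

    def result(self) -> str:
        return "".join(reversed(self._out)) or "0"


def add_with_tool(tape: DigitTape) -> str:
    """Toy controller: its only state carried between tool calls is a carry of 0 or 1."""
    carry = 0
    while (pair := tape.read_next()) is not None:
        total = pair[0] + pair[1] + carry     # at most 9 + 9 + 1 = 19
        tape.write(total % 10)
        carry = total // 10
    if carry:
        tape.write(carry)
    return tape.result()


print(add_with_tool(DigitTape("987", "45")))  # 1032
```

Because the carry never exceeds 1, the same loop handles 5-digit and 5,000-digit operands; only the tool's storage grows with the input.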

Research Background and Motivation

Problem Background

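Computational bottleneck of Transformers: Because of the attention mechanism, a Transformer's computation grows quadratically with sequence length and its memory grows linearly (during generation the key-value cache stores one entry per token), which becomes a major limitation for long-context and long-form generation tasks.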
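The rise of SSMs: To address this, researchers have proposed a range of alternative architectures, such as linear Transformers and state space models (SSMs), including Mamba and DeltaNet. These architectures use a fixed amount of memory and have computational cost that scales linearly with sequence length.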
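Limitations of SSMs: Despite their efficiency advantage, several studies have found that SSMs have significant limitations on tasks that require long-range memory and in-context learning.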
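
The memory limitation can be illustrated with a generic counting (pigeonhole) argument. This is a sketch under stated assumptions, not the paper's formal definition of "truly long-form" generation or its proof: `make_recurrent_encoder` is a hypothetical model that folds an n-bit input into a state of `state_bits` bits with an arbitrary update rule, and `find_collision` simply enumerates all inputs. When n exceeds the number of state bits, two different inputs must end in the same state. Any model whose continuation depends only on that state then produces the same output for both inputs, so it cannot copy both of them correctly.

```python
from itertools import product
from typing import Callable, Dict, Optional, Tuple

Bits = Tuple[int, ...]


def make_recurrent_encoder(state_bits: int) -> Callable[[Bits], int]:
    """Hypothetical model that reads bits one at a time into a state of `state_bits` bits."""
    modulus = 2 ** state_bits

    def encode(x: Bits) -> int:
        h = 0
        for bit in x:
            h = (3 * h + bit + 1) % modulus   # arbitrary update rule; only the state size matters
        return h

    return encode


def find_collision(encode: Callable[[Bits], int], n: int) -> Optional[Tuple[Bits, Bits]]:
    """Search all 2**n inputs for two distinct ones that end in the same state."""
    seen: Dict[int, Bits] = {}
    for x in product((0, 1), repeat=n):
        state = encode(x)
        if state in seen:
            return seen[state], x
        seen[state] = x
    return None


# 2**10 possible inputs but only 2**4 possible states: a collision is guaranteed.
pair = find_collision(make_recurrent_encoder(state_bits=4), n=10)
print(pair is not None)  # True
```

Changing the update rule does not help: for n larger than `state_bits`, `find_collision` always succeeds, because the outcome depends only on how many states exist, not on how they are computed.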