ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models
Zheng
Large Reasoning Language Models (LRLMs or LRMs) demonstrate remarkable capabilities in complex reasoning tasks, but suffer from significant computational inefficiencies due to overthinking phenomena. Existing efficient reasoning methods face the challenge of balancing reasoning quality with inference cost reduction. We propose Adaptive Reasoning Suppression (ARS), a novel training-free approach that dynamically suppresses redundant reasoning steps while preserving accuracy through adaptive certainty monitoring. ARS introduces a multi-checkpoint certainty estimation mechanism with progressive suppression thresholds, achieving superior efficiency compared to static suppression methods. Our extensive evaluation across mathematical reasoning benchmarks using multiple model architectures demonstrates that ARS achieves up to 53%, 46.1%, and 57.9% in token, latency and energy reduction, while maintaining or improving accuracy.
Large Reasoning Language Models (LRLMs) demonstrate exceptional capabilities in complex reasoning tasks, but suffer from significant computational inefficiency due to the "overthinking" phenomenon. Existing efficient reasoning methods struggle to balance reasoning quality against reduced inference cost. This paper proposes Adaptive Reasoning Suppression (ARS), a novel training-free method that dynamically suppresses redundant reasoning steps through adaptive certainty monitoring while maintaining accuracy. ARS introduces a multi-checkpoint certainty estimation mechanism with progressive suppression thresholds, achieving superior efficiency compared to static suppression methods. On mathematical reasoning benchmarks across multiple model architectures, ARS achieves reductions of up to 53%, 46.1%, and 57.9% in tokens, latency, and energy consumption respectively, while maintaining or improving accuracy.
Large Reasoning Models (LRMs) such as OpenAI's o1/o3 and DeepSeek-R1 have achieved revolutionary progress in complex tasks including mathematics, programming, and scientific reasoning through sophisticated chain-of-thought (CoT) reasoning mechanisms. However, these models suffer from a severe "overthinking" phenomenon: they continue generating redundant reasoning steps even after arriving at correct intermediate solutions.
Given a reasoning query q and a large reasoning language model π, the standard generation process produces output tokens o = {o_1, o_2, ..., o_T}, where each token is sampled as o_t ~ π(· | q, o_{<t}) and T is the output length. The objective is to minimize the expected output length E[T] while maintaining reasoning accuracy:
min E[T] subject to E[L(f(o), y)] ≤ ε
where f(o) extracts the final answer from output o, y is the ground truth answer, L is the loss function, and ε is the acceptable accuracy degradation threshold.
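As a rough illustration of how this constrained objective could be checked on a finite evaluation set, the sketch below estimates E[T] and E[L(f(o), y)] from recorded runs. It assumes a 0-1 loss (1 when the extracted answer differs from the ground truth); the `Run` record and the function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Run:
    """One evaluated query (hypothetical record, not from the paper)."""
    num_tokens: int  # T: number of output tokens generated for this query
    correct: bool    # True when f(o) equals the ground truth y


def empirical_objective(runs: Sequence[Run]) -> Tuple[float, float]:
    """Return (mean output length, mean 0-1 loss) over the given runs."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    mean_length = sum(r.num_tokens for r in runs) / n
    mean_loss = sum(0 if r.correct else 1 for r in runs) / n
    return mean_length, mean_loss


def is_feasible(runs: Sequence[Run], epsilon: float) -> bool:
    """Check the constraint E[L(f(o), y)] <= epsilon on the sample."""
    _, mean_loss = empirical_objective(runs)
    return mean_loss <= epsilon
```

Under these assumptions, two suppression settings can be compared by keeping only those for which `is_feasible` holds and then choosing the one with the smaller mean length.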
ARS addresses key limitations of existing methods by integrating adaptive certainty monitoring, progressive threshold adjustment, and dynamic suppression intensity control. Experiments demonstrate that ARS achieves significant computational efficiency improvements while maintaining or improving accuracy.
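The summary does not spell out how the checkpoints, certainty scores, or thresholds are computed, so the following is only a minimal sketch of the general idea of checkpoint-based stopping with a progressive threshold. The linear schedule, its default values, and the names `threshold_at` and `first_stop_checkpoint` are all assumptions made for illustration; they are not ARS's actual procedure.

```python
from typing import Optional, Sequence


def threshold_at(k: int, start: float = 0.95, step: float = 0.05,
                 floor: float = 0.70) -> float:
    """Hypothetical progressive schedule: the certainty required to stop
    relaxes by `step` at each later checkpoint, but never below `floor`."""
    return max(floor, start - step * k)


def first_stop_checkpoint(certainties: Sequence[float]) -> Optional[int]:
    """Return the index of the first checkpoint whose certainty estimate
    meets that checkpoint's threshold, or None if reasoning never stops
    early. `certainties[k]` is an assumed certainty score in [0, 1]
    measured at checkpoint k."""
    for k, certainty in enumerate(certainties):
        if certainty >= threshold_at(k):
            return k
    return None
```

In this sketch, a run with certainty estimates of 0.5, 0.6, and 0.9 at successive checkpoints would stop at the third checkpoint, because only there does the estimate exceed the relaxed threshold. A static suppression method would correspond to a constant threshold (`step = 0`).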
The paper cites 21 references covering important work on large language model reasoning, chain-of-thought prompting, mathematical problem solving, and related fields, providing a solid theoretical foundation for the research.
Overall Assessment: This is an important paper making significant contributions to efficiency optimization in large reasoning models. The ARS method is cleverly designed with convincing experimental results, providing an effective solution to the overthinking problem in reasoning models. Despite some limitations, its innovation and practical value make it an important advance in this field.