Shifting AI Efficiency From Model-Centric to Data-Centric Compression
Liu, Wen, Wang et al.
The advancement of large language models (LLMs) and multi-modal LLMs (MLLMs) has historically relied on scaling model parameters. However, as hardware limits constrain further model growth, the primary computational bottleneck has shifted to the quadratic cost of self-attention over increasingly long sequences, driven by ultra-long text contexts, high-resolution images, and extended videos. In this position paper, we argue that the focus of research for efficient artificial intelligence (AI) is shifting from model-centric compression to data-centric compression. We position data-centric compression as the emerging paradigm, which improves AI efficiency by directly compressing the volume of data processed during model training or inference. To formalize this shift, we establish a unified framework for existing efficiency strategies and demonstrate why it constitutes a crucial paradigm change for long-context AI. We then systematically review the landscape of data-centric compression methods, analyzing their benefits across diverse scenarios. Finally, we outline key challenges and promising future research directions. Our work aims to provide a novel perspective on AI efficiency, synthesize existing efforts, and catalyze innovation to address the challenges posed by ever-increasing context lengths.
As Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have developed, the traditional approach of expanding model parameters to improve performance has run into hardware constraints. The primary computational bottleneck has shifted from model scale to the quadratic cost of self-attention when processing ultra-long text contexts, high-resolution images, and extended videos. This paper argues that the focus of AI efficiency research is shifting from model-centric compression to data-centric compression, which improves efficiency by directly reducing the volume of data processed during training or inference. The paper establishes a unified framework for efficiency strategies, systematically reviews data-centric compression methods, analyzes their advantages across different scenarios, and outlines key challenges and future research directions.
The core problem addressed in this paper is: as the context length processed by AI models grows dramatically, how can we effectively address the resulting computational efficiency challenges?
Technological Trend Shifts: From 2022 to 2024, AI performance improvements relied primarily on scaling model size, but by 2024 model scale had plateaued at approximately 1T parameters, while context length continues to grow exponentially
Computational Bottleneck Migration: The dominant computational cost has shifted from the parameter count, in which cost grows linearly, to self-attention, whose cost grows quadratically, O(n²), in the sequence length n
Cross-domain Requirements: Language models require processing longer reasoning chains, vision models need to handle higher-resolution images and longer videos, and generative models need to create higher-quality content
Traditional model-centric compression methods (quantization, pruning, distillation, low-rank decomposition) primarily optimize model parameters W, but cannot effectively address challenges posed by growing context lengths. These methods still require processing complete input data X when facing long sequences, failing to fundamentally solve the quadratic complexity problem.
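The contrast can be illustrated with a rough cost model. The sketch below is an assumption of this summary, not the paper's formulation: it counts approximate multiply-accumulate operations for one Transformer layer with sequence length n and hidden size d, and uses a hypothetical `weight_scale` knob as a stand-in for the fraction of weight-dependent compute left after model-centric compression (for example, pruning).

```python
def layer_cost(n: int, d: int, weight_scale: float = 1.0) -> dict:
    """Approximate multiply-accumulate counts for one Transformer layer.

    Simplifying assumptions (not taken from the paper):
      - attention term: Q @ K^T plus the attention-weighted sum over V,
        each about n * n * d operations, so 2 * n^2 * d in total
      - weight term: four d x d projections (Q, K, V, output) plus an MLP
        with 4x expansion, about 12 * n * d^2 operations in total
      - weight_scale: hypothetical fraction of the weight term that remains
        after model-centric compression
    """
    attention = 2 * n * n * d
    weights = weight_scale * 12 * n * d * d
    return {"attention": attention, "weights": weights, "total": attention + weights}


n, d = 32_768, 4_096

baseline = layer_cost(n, d)
# Model-centric: halve the weight-dependent compute, keep the full input X.
model_centric = layer_cost(n, d, weight_scale=0.5)
# Data-centric: keep the weights W, halve the number of tokens processed.
data_centric = layer_cost(n // 2, d)

for name, cost in [("baseline", baseline),
                   ("model-centric", model_centric),
                   ("data-centric", data_centric)]:
    share = cost["attention"] / cost["total"]
    print(f"{name:>14}: total={cost['total']:.3e}  attention share={share:.2f}")
```

Under these assumptions, shrinking the weight term leaves the n² attention term untouched, whereas halving the token count cuts the attention term to a quarter and the weight term to a half.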
Based on in-depth analysis of AI development trends, the authors propose data-centric compression as an emerging paradigm that addresses long-context challenges by directly reducing the volume of processed data, offering superior generality, efficiency, and compatibility.
Paradigm Shift Analysis: Analyzes the critical transition in AI efficiency research from parameter-centric to context-centric computational bottlenecks, arguing for the necessity of efficiency optimization paradigm transformation
Unified Theoretical Framework: Establishes a unified mathematical formulation framework encompassing architectural design, model-centric compression, and data-centric compression
Systematic Survey: Conducts comprehensive investigation of data-centric compression methods, constructs a unified classification framework, and analyzes advantages across different scenarios
Challenges and Directions: Provides in-depth analysis of current challenges and proposes promising future research directions, aiming to catalyze innovation in this field
Data-centric compression aims to transform the original input sequence X into a compressed representation X' = Φ(X) through a compression operation Φ, such that |X'| < |X|, while preserving model performance as far as possible.
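As a minimal sketch of one possible Φ, the example below keeps only the highest-scoring tokens and preserves their original order. The function name `compress_tokens`, the `keep_ratio` parameter, and the importance scores are all hypothetical; the summary does not specify how the surveyed methods score or select tokens, and in practice scores might come from attention weights or a learned predictor.

```python
from typing import List, Sequence


def compress_tokens(tokens: Sequence[str],
                    scores: Sequence[float],
                    keep_ratio: float) -> List[str]:
    """Hypothetical compression operator: keep the top-scoring tokens.

    Returns a subsequence X' of the input X in its original order.
    For inputs with at least two tokens, |X'| < |X| always holds.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio < 1.0:
        raise ValueError("keep_ratio must be strictly between 0 and 1")

    # Number of tokens to keep; never drop everything.
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k highest-scoring tokens.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore the original order so the compressed sequence stays readable.
    return [tokens[i] for i in sorted(top)]


tokens = ["the", "cat", "sat", "on", "the", "mat"]
scores = [0.1, 0.9, 0.7, 0.2, 0.1, 0.8]  # illustrative importance scores
compressed = compress_tokens(tokens, scores, keep_ratio=0.5)
print(compressed)
```

Because self-attention cost is quadratic in sequence length, keeping half of the tokens in this way would reduce the attention cost to roughly a quarter, at the risk of discarding information the model needs.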
The paper cites extensive related work, primarily including:
Transformer architectures and variants (Vaswani et al., 2017)
Large language model series (OpenAI GPT, Meta LLaMA, Qwen, etc.)
Multimodal models (LLaVA, InternVL, etc.)
Efficiency optimization methods (classical works on quantization, pruning, distillation, etc.)
Representative works on data-centric compression
This paper offers a theoretical framework and practical guidance for AI efficiency research as context lengths continue to grow.