Large Language Models (LLMs) present significant computational and memory challenges due to their extensive size, making pruning essential for their efficient deployment. Existing one-shot pruning methods often apply uniform sparsity constraints across layers or within each layer, resulting in suboptimal performance, especially at high sparsity ratios. This work introduces TRIM (Targeted Row-wise Iterative Metric-driven pruning), a novel approach that applies varying sparsity ratios to individual output dimensions (rows) within each layer. TRIM employs an iterative adjustment process guided by quality metrics to optimize dimension-wise sparsity allocation, focusing on reducing variance in quality retention across outputs to preserve critical information. TRIM can be seamlessly integrated with existing layer-wise pruning strategies. Our evaluations on perplexity and zero-shot tasks across diverse LLM families (Qwen2.5, LLaMA-2, and OPT) and sparsity levels demonstrate that TRIM achieves new state-of-the-art results and enhances stability. For instance, at 80% sparsity, TRIM reduces perplexity by 48% for Qwen2.5-14B and over 90% for OPT-13B compared to baseline methods. We conclude that fine-grained, dimension-wise sparsity adaptation is crucial for pushing the limits of extreme LLM compression. Code available at: https://github.com/flobk/TRIM
Paper ID: 2505.16743
Title: TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning
Authors: Florentin Beck (University of Tübingen), William Rudman (University of Texas at Austin), Carsten Eickhoff (University of Tübingen)
Categories: cs.CL cs.AI cs.LG
Published: 11 October 2025 (arXiv v2)
Paper link: https://arxiv.org/abs/2505.16743
Code link: https://github.com/flobk/TRIM
As the parameter counts of large language models grow exponentially, deploying them runs into severe memory and compute constraints. Larger models bring better performance and emergent abilities, but they also make inference difficult in resource-constrained environments.
Uniform sparsity constraints: existing one-shot pruning methods (e.g. Wanda, OWL, AlphaPruning) typically apply the same sparsity ratio across all layers, or across all output dimensions within a layer.
Sharp performance drop at high sparsity: at extreme sparsity (>70%), uniform strategies cause significant performance degradation.
Dimension heterogeneity is ignored: output dimensions differ substantially in their sensitivity to pruning and in their importance.

The paper observes that LLMs have distinctive weight and activation characteristics, such as prominent outlier features and highly skewed activation distributions. These characteristics indicate that different output dimensions within a layer have different pruning sensitivities, so a more fine-grained sparsity allocation strategy is needed.
First dimension-wise sparsity allocation: proposes the first algorithm that computes a separate sparsity ratio for each output dimension within a layer.
SOTA performance at extreme sparsity: at 80% sparsity, perplexity drops significantly relative to existing methods (48% lower for Qwen2.5-14B, over 90% lower for OPT-13B).
In-depth empirical analysis: reveals how output dimensions differ in pruning sensitivity and in importance for downstream tasks.
Plug-and-play design: TRIM can be combined with any importance-score-based pruning algorithm, which makes it broadly applicable.

Given a weight matrix $W \in \mathbb{R}^{D \times N}$, where $D$ is the number of output dimensions and $N$ is the number of input dimensions, the goal is to determine the best sparsity ratio $S_i$ for each output dimension (row) $W_{i,:}$, so that overall layer quality is maximized while the average sparsity constraint is satisfied.
TRIM defines a dimension-wise sparsity vector $S = [S_1, S_2, \ldots, S_D]$, where $S_i \in [0, 1]$ specifies the target sparsity ratio of the $i$-th output dimension. The constraint is:

$$\frac{1}{D} \sum_{i=1}^{D} S_i = T$$

where $T$ is the target sparsity ratio of the layer.
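A minimal sketch of what this constraint means in practice. The helper `prune_rows`, the use of weight magnitude as the importance score, and the toy sizes are assumptions for illustration and are not taken from the TRIM code base. Each row loses its own fraction of lowest-scoring weights, and because the per-row ratios average to the target, the layer as a whole ends up at the target sparsity.

```python
import numpy as np

def prune_rows(W, scores, S):
    """Zero the lowest-scoring weights of each row i so that a fraction S[i] is removed."""
    D, N = W.shape
    W_pruned = W.copy()
    for i in range(D):
        k = int(round(S[i] * N))  # number of weights to drop in row i
        if k > 0:
            drop = np.argsort(scores[i], kind="stable")[:k]  # k least important weights
            W_pruned[i, drop] = 0.0
    return W_pruned

rng = np.random.default_rng(0)
D, N, T = 4, 10, 0.5
W = rng.normal(size=(D, N))

# Hypothetical per-row targets; they differ per row but average to T.
S = np.array([0.3, 0.5, 0.7, 0.5])

W_pruned = prune_rows(W, np.abs(W), S)          # magnitude scores (assumed choice)
row_sparsity = (W_pruned == 0).mean(axis=1)     # achieved sparsity of each row
layer_sparsity = (W_pruned == 0).mean()         # achieved sparsity of the whole layer
```

Because every row has the same length, averaging the row ratios is the same as measuring the sparsity of the full matrix, which is why the constraint is stated as a plain mean.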
Algorithm 1: Iterative dimension-wise sparsity adjustment

Initialization: compute the unpruned output $Y \leftarrow WX$ and set $S_i = T$ for every row (uniform allocation).
Iterative optimization (repeat $K$ times):
1. Prune according to the current $S$ to obtain $W_{pruned}$.
2. Compute the pruned output $\hat{Y} \leftarrow W_{pruned} X$.
3. Evaluate overall quality $q_k \leftarrow \mathrm{Qmetric}(Y, \hat{Y})$.
4. Update the best configuration if $q_k > q_{best}$.
5. Compute the quality of each dimension $c_i \leftarrow \mathrm{QmetricDimwise}(Y_{i,:}, \hat{Y}_{i,:})$.
6. Normalize the quality scores to the range $[0, 1]$, giving $c'_i$.
7. Scale by the learning rate $\alpha$: $\delta_i \leftarrow \alpha c'_i$.
8. Re-center to preserve the average constraint: $S_i \leftarrow \delta_i - \frac{1}{D} \sum_{j} \delta_j + T$.
Return: the best sparsity allocation $S_{best}$.

Layer-level quality: cosine similarity is used to evaluate the pruning quality of the whole layer.
Dimension-level quality: cosine similarity is computed for each output dimension and guides the sparsity adjustment.
Adaptive learning rate: both positive and negative learning rates are supported. A positive learning rate reduces quality variance; a negative learning rate is used for outlier-dominated layers.
Quality variance minimization: overall performance improves when the variance in quality degradation across dimensions is reduced.
Compatible design: can be combined with existing scoring rules (Wanda, Magnitude, SparseGPT, GBLM).

Models: Qwen2.5 (3B/7B/14B/32B/72B), LLaMA-2 (7B/13B), OPT (6.7B/13B)
Evaluation data: WikiText validation set (perplexity), C4 and Pile (generalization checks)
Downstream tasks: BoolQ, RTE, HellaSwag, WinoGrande, ARC Easy/Challenge, OpenBookQA
Perplexity: measures language modeling ability on the WikiText validation set
Zero-shot accuracy: average performance over the 7 downstream tasks
Baselines: OWL, AlphaPruning (Wanda-based)
Ablation studies: effect of different quality metrics, learning rate settings, and number of iterations
Calibration samples: randomly drawn from the C4 dataset, sequence length 2048
Sparsity cap: at most 95% for any single dimension, to prevent overfitting
Hyperparameters: K=10 iterations; learning rate α chosen by grid search

Perplexity results (lower is better):

Model          OWL baseline    OWL+TRIM    Improvement
Qwen2.5-14B    348.48          180.67      -48%
OPT-13B        6461.43         324.14      -95%
LLaMA-2-13B    225.04          154.83      -31%
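The steps of Algorithm 1 can be sketched in NumPy roughly as follows. This is an illustrative reading of the summary, not the authors' implementation: the function names, the Wanda-style score (weight magnitude times the L2 norm of the matching input feature), the default `alpha`, and the final clipping step are all assumptions made for the example.

```python
import numpy as np

def prune_rows(W, scores, S):
    """Zero the round(S[i] * N) lowest-scoring weights in each row i."""
    N = W.shape[1]
    k = np.rint(np.asarray(S) * N).astype(int)       # weights to drop per row
    ranks = scores.argsort(axis=1).argsort(axis=1)    # rank 0 = least important in its row
    return np.where(ranks < k[:, None], 0.0, W)

def layer_cosine(Y, Y_hat, eps=1e-12):
    """Layer-level quality: cosine similarity of the flattened outputs."""
    return float((Y * Y_hat).sum() / (np.linalg.norm(Y) * np.linalg.norm(Y_hat) + eps))

def row_cosine(Y, Y_hat, eps=1e-12):
    """Dimension-level quality: cosine similarity of each output row."""
    num = (Y * Y_hat).sum(axis=1)
    den = np.linalg.norm(Y, axis=1) * np.linalg.norm(Y_hat, axis=1) + eps
    return num / den

def trim_layer(W, X, T, alpha=0.1, K=10, s_max=0.95):
    """Search for per-row sparsity ratios with mean T (sketch of Algorithm 1)."""
    D = W.shape[0]
    Y = W @ X                                          # unpruned output
    scores = np.abs(W) * np.linalg.norm(X, axis=1)     # Wanda-style score (assumed choice)
    S = np.full(D, float(T))                           # start from the uniform allocation
    q_best, S_best = -np.inf, S.copy()
    for _ in range(K):
        Y_hat = prune_rows(W, scores, S) @ X           # pruned output
        q = layer_cosine(Y, Y_hat)                     # overall quality
        if q > q_best:
            q_best, S_best = q, S.copy()               # keep the best allocation seen
        c = row_cosine(Y, Y_hat)                       # per-row quality
        c_norm = (c - c.min()) / (c.max() - c.min() + 1e-12)   # normalize to [0, 1]
        delta = alpha * c_norm                         # scale by the learning rate
        S = delta - delta.mean() + T                   # re-center so the mean stays T
        S = np.clip(S, 0.0, s_max)                     # per-row cap; may shift the mean slightly
    return S_best, q_best

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 32))     # D=8 output rows, N=32 inputs
X = rng.normal(size=(32, 64))    # 64 calibration tokens
S_best, q_best = trim_layer(W, X, T=0.7)
```

With a positive `alpha`, rows whose outputs survive pruning well are pruned harder and fragile rows are pruned less, which is how the update pushes per-row quality toward a common level. Since the first iteration evaluates the uniform allocation, the returned allocation is never worse than uniform under the layer-level metric.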
TRIM improves performance on all tested models and sparsity levels, with an average gain of 0.46 to 0.65 percentage points at 80% sparsity.
Layer-level quality: cosine similarity gives the most stable performance.
Dimension-level quality: cosine similarity is more reliable than MSE and PSNR.

TRIM also yields improvements with other scoring rules such as Magnitude, SparseGPT, and GBLM, which supports the generality of the method.
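For reference, the three dimension-level metrics compared in the ablation can be written as below. This is a sketch under assumptions: the function names are hypothetical, and the PSNR peak is taken as the largest absolute value in each unpruned row, which the summary does not specify. The toy data illustrates one difference between the metrics, namely sensitivity to the scale of a row; the summary does not state that this is the reason cosine similarity worked best.

```python
import numpy as np

def dimwise_cosine(Y, Y_hat, eps=1e-12):
    """Cosine similarity per output row; unaffected by rescaling a row and its error together."""
    num = (Y * Y_hat).sum(axis=1)
    den = np.linalg.norm(Y, axis=1) * np.linalg.norm(Y_hat, axis=1) + eps
    return num / den

def dimwise_mse(Y, Y_hat):
    """Mean squared error per output row; grows with the square of the row's scale."""
    return ((Y - Y_hat) ** 2).mean(axis=1)

def dimwise_psnr(Y, Y_hat, eps=1e-12):
    """PSNR per output row, using max |Y| of the row as the peak (assumed convention)."""
    peak = np.abs(Y).max(axis=1)
    return 10.0 * np.log10(peak ** 2 / (dimwise_mse(Y, Y_hat) + eps))

# Row 1 is row 0 multiplied by 10, with the same relative error.
Y = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
Y_hat = np.array([[1.0, 2.0, 2.0], [10.0, 20.0, 20.0]])

cos = dimwise_cosine(Y, Y_hat)   # identical for both rows
mse = dimwise_mse(Y, Y_hat)      # 100 times larger for the scaled row
psnr = dimwise_psnr(Y, Y_hat)
```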
Gini coefficient analysis shows that the concentration of importance scores differs markedly across output dimensions, which leads to differences in pruning sensitivity.
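A short sketch of how a Gini coefficient can be computed per output dimension. The function name and the toy score matrix are assumptions; the formula is the standard one for non-negative values sorted in ascending order. A value near 0 means importance is spread evenly over a row's weights, and a value near 1 means it is concentrated in a few weights.

```python
import numpy as np

def gini(x):
    """Gini coefficient of non-negative values: 0 = perfectly even, (n-1)/n = one value holds everything."""
    x = np.sort(np.abs(np.asarray(x, dtype=float)))
    n = x.size
    total = x.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * (ranks * x).sum() / (n * total) - (n + 1) / n)

# Hypothetical importance scores for three output rows.
scores = np.array([
    [1.0, 1.0, 1.0, 1.0],   # evenly spread importance
    [0.0, 0.0, 0.0, 1.0],   # importance concentrated in one weight
    [0.1, 0.2, 0.3, 0.4],   # in between
])
row_gini = np.array([gini(row) for row in scores])
```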
As the sparsity ratio increases, quality degradation accelerates, which makes fine-grained allocation even more important.
Experiments show that the effect of removing a single dimension entirely varies enormously:
Dimension with the smallest L2 norm: perplexity increases by only 0.16.
Dimension with the largest L2 norm: perplexity rises to 273.10.

Gradient-based methods: SNIP, GraSP, SynFlow, etc.; these require gradient information and retraining.
One-shot pruning methods: SparseGPT, Wanda, etc.; no retraining is needed, but performance is limited.
Layer-wise adaptive methods: OWL, AlphaPruning, etc.; these assign different sparsity ratios to different layers.

TRIM is the first method to allocate sparsity at the level of individual dimensions within a layer, filling the gap that existing methods leave in fine-grained control.
Main findings:
Necessity of dimension-wise sparsity allocation: at extreme sparsity, fine-grained control is important for preserving model performance.
Effectiveness of quality variance minimization: balancing quality degradation across dimensions significantly improves overall performance.
Generality of the method: TRIM can be combined with several existing pruning algorithms and extends well.

Limitations:
Complexity of learning rate selection: outlier-dominated layers need a negative learning rate, which makes hyperparameter tuning more complex.
Unstructured sparsity: the current method does not directly support structured sparsity patterns such as n:m.
Computational overhead: the iterative process adds roughly 8% to the running time.

Future directions:
Structured sparsity support: extend TRIM to support hardware-friendly sparsity patterns.
Automatic learning rate selection: develop an adaptive mechanism to reduce the need for hyperparameter tuning.
Theoretical analysis: establish a theoretical framework for dimension importance and pruning sensitivity.

Strengths:
Strong novelty: the first proposal of dimension-wise sparsity allocation, a new line of thinking.
Thorough experiments: validates the method across multiple model families and tasks.
Analytical support: in-depth analysis explains why the method works.
High practical value: the plug-and-play design integrates easily into existing systems.

Weaknesses:
Method complexity: adds algorithmic complexity and hyperparameters compared with baseline methods.
Hardware adaptability: unstructured sparsity limits acceleration on specialized hardware.
Insufficient theoretical analysis: no theoretical guarantee for the optimal sparsity allocation.

Impact:
Academic contribution: offers a new research direction for LLM pruning.
Practical value: significant for deploying large models in resource-constrained environments.
Reproducibility: open-source code is provided, which eases follow-up research.

Applicable scenarios:
Extreme sparsity requirements: especially suited to scenarios that need sparsity above 70%.
Resource-constrained environments: edge devices, mobile devices, and other settings with limited compute.
Research use: provides a new benchmark and new ideas for research on pruning algorithms.

The paper cites key work in the pruning literature, including:
Classic pruning methods: LeCun et al. (1989), Han et al. (2015)
Modern LLM pruning: Sun et al. (2024) Wanda, Frantar and Alistarh (2023) SparseGPT
Layer-wise adaptive methods: Yin et al. (2024) OWL, Lu et al. (2024) AlphaPruning

Summary: TRIM introduces dimension-wise sparsity allocation and significantly improves LLM pruning performance at extreme sparsity. The method has both theoretical value and practical significance, and it opens a new research direction for large-model compression. Despite some limitations, its novelty and effectiveness make it an important contribution to the field.