2025-11-10T02:44:53.419690

Group-Wise Optimization for Self-Extensible Codebooks in Vector Quantized Models

Zheng, Li

Vector Quantized Variational Autoencoders (VQ-VAEs) leverage self-supervised learning through reconstruction tasks to represent continuous vectors using the closest vectors in a codebook. However, issues such as codebook collapse persist in VQ models. To address these issues, existing approaches employ implicit static codebooks or jointly optimize the entire codebook, but these methods constrain the codebook's learning capability, leading to reduced reconstruction quality. In this paper, we propose Group-VQ, which performs group-wise optimization on the codebook. Each group is optimized independently, with joint optimization performed within groups. This approach improves the trade-off between codebook utilization and reconstruction performance. Additionally, we introduce a training-free codebook resampling method, allowing post-training adjustment of the codebook size. In image reconstruction experiments under various settings, Group-VQ demonstrates improved performance on reconstruction metrics, and the post-training codebook resampling method achieves the desired flexibility in adjusting the codebook size.

academic


Basic Information

Paper ID: 2510.13331
Title: Group-Wise Optimization for Self-Extensible Codebooks in Vector Quantized Models
Authors: Hong-Kai Zheng, Piji Li (Nanjing University of Aeronautics and Astronautics)
Category: cs.CV
Publication date/venue: ICLR 2026
Paper link: https://arxiv.org/abs/2510.13331

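Abstract

Vector Quantized Variational Autoencoders (VQ-VAEs) use self-supervised learning on reconstruction tasks to represent continuous vectors by the closest vectors in a codebook. However, issues such as codebook collapse persist in VQ models. To address them, existing approaches employ implicit static codebooks or jointly optimize the entire codebook, but these methods constrain the codebook's learning capability and reduce reconstruction quality. This paper proposes Group-VQ, which performs group-wise optimization on the codebook: each group is optimized independently, with joint optimization performed within each group. This approach improves the trade-off between codebook utilization and reconstruction performance. The paper also introduces a training-free codebook resampling method that allows the codebook size to be adjusted after training. In image reconstruction experiments under various settings, Group-VQ shows improved performance on reconstruction metrics.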

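Background and Motivation

Problem Description

Vector quantization (VQ) maps continuous features to discrete tokens and is widely used in VQ-VAEs. Conventional VQ training, however, suffers from low codebook utilization: only a subset of the code vectors is ever selected and updated. This "codebook collapse" limits the model's encoding capacity.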
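
Nearest-code assignment and codebook utilization can be illustrated with a small NumPy sketch. It is not taken from the paper: the function names `quantize` and `utilization`, the array sizes, and the artificial shift that pushes most code vectors away from the data are all assumptions chosen to mimic a collapsed codebook.

```python
import numpy as np

def quantize(z, codebook):
    """Return the index of the nearest code vector for each row of z."""
    # Squared Euclidean distance between every feature and every code vector.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def utilization(indices, codebook_size):
    """Fraction of code vectors that were selected at least once."""
    return np.unique(indices).size / codebook_size

rng = np.random.default_rng(0)
features = rng.normal(size=(512, 8))   # stand-in for encoder outputs z
codebook = rng.normal(size=(64, 8))    # hypothetical codebook with n = 64

# Assumption for illustration: move all but 4 code vectors far from the data,
# so they are never the nearest neighbour of any feature.
codebook[4:] += 100.0

idx = quantize(features, codebook)
print(f"utilization = {utilization(idx, len(codebook)):.3f}")
```

Because only the first four code vectors lie near the features, the reported utilization cannot exceed 4/64. In a real VQ-VAE the same measurement could be taken over the encoder outputs of a held-out set.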

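Limitations of Existing Methods

Vanilla VQ: each code vector is updated independently, which makes the codebook prone to collapse.
Joint VQ methods (e.g., SimVQ, VQGAN-LC): the entire codebook is optimized jointly through shared parameters. This can reach 100% utilization, but it constrains the codebook's learning capability.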
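
The difference between the two update schemes can be made concrete with a toy gradient step. The sketch below is an illustration under assumptions, not the authors' code: it uses a SimVQ-style linear reparameterization $C = BW$, with a frozen matrix $B$ and a shared trainable matrix $W$, as a stand-in for Joint VQ, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lr = 16, 4, 0.01
z = rng.normal(size=(4, d))        # a small batch of stand-in encoder features

def nearest(z, C):
    """Index of the nearest code vector for each feature."""
    return ((z[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

def codebook_grad(z, C, idx):
    """Gradient of sum_j ||z_j - C[idx_j]||^2 with respect to C."""
    G = np.zeros_like(C)
    np.add.at(G, idx, 2.0 * (C[idx] - z))   # unselected rows keep zero gradient
    return G

# Vanilla VQ: the code vectors themselves are the trainable parameters.
C = rng.normal(size=(n, d))
idx = nearest(z, C)
C_vanilla = C - lr * codebook_grad(z, C, idx)
moved_vanilla = int(np.any(C_vanilla != C, axis=1).sum())

# Joint VQ (assumed form): C = B @ W, B frozen, W shared by all code vectors.
B, W = C.copy(), np.eye(d)
G = codebook_grad(z, B @ W, idx)
W_new = W - lr * (B.T @ G)          # chain rule: dL/dW = B^T dL/dC
moved_joint = int(np.any((B @ W_new) != (B @ W), axis=1).sum())

print(f"vanilla step moved {moved_vanilla} of {n} code vectors")
print(f"joint step moved {moved_joint} of {n} code vectors")
```

A batch of four features can select at most four code vectors, so the vanilla step leaves the rest untouched, whereas a single update of the shared matrix shifts code vectors that were never selected. The same coupling is what ties every code vector to one set of shared parameters.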

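Research Motivation

The authors find experimentally that although Joint VQ quickly reaches 100% codebook utilization, its reconstruction quality at the same utilization is worse than that of Vanilla VQ. This indicates a trade-off between codebook utilization and reconstruction performance, and it motivates a strategy that balances the two more effectively.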

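Core Contributions

Group-VQ: a group-based codebook optimization method that balances utilization and reconstruction performance in VQ models.
Generalization of Joint VQ: Joint VQ is reinterpreted from the perspective of shared parameters, and a post-training codebook sampling method is introduced.
Training-free codebook adjustment: the codebook size can be changed flexibly after training, without retraining the model.
Comprehensive experimental validation: image reconstruction experiments confirm the effectiveness of Group-VQ and of codebook resampling.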
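
One way to picture group-wise shared parameters and a training-free change of codebook size is sketched below. This is a hypothetical reading, not the paper's formulation: it assumes that each group generates its code vectors as a frozen Gaussian latent basis multiplied by that group's own projection matrix, and that resampling means drawing fresh latent vectors while reusing the projections. The names `build_codebook` and `resample`, the random stand-ins for trained weights, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_groups, per_group = 4, 3, 8

# Assumed setup: every group owns a frozen latent basis and its own projection.
bases = [rng.normal(size=(per_group, d)) for _ in range(num_groups)]
# Random matrices stand in for projections that would be learned in training.
projections = [rng.normal(size=(d, d)) for _ in range(num_groups)]

def build_codebook(bases, projections):
    """Concatenate the code vectors generated by each group."""
    return np.concatenate([B @ W for B, W in zip(bases, projections)], axis=0)

def resample(projections, per_group, rng):
    """Assumed resampling: new latent vectors, unchanged projections."""
    new_bases = [rng.normal(size=(per_group, W.shape[0])) for W in projections]
    return build_codebook(new_bases, projections)

codebook = build_codebook(bases, projections)   # 3 groups x 8 code vectors
larger = resample(projections, 32, rng)         # 3 groups x 32 code vectors

print(codebook.shape, larger.shape)
```

Under this assumption, parameters are shared only within a group, so an update to one group's projection leaves the code vectors of the other groups unchanged, and the codebook size becomes a choice made at sampling time rather than at training time.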

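Method Details

Task Definition

Given an image $I \in \mathbb{R}^{H \times W \times 3}$, a VQ-VAE first uses an encoder to obtain a feature map $Z \in \mathbb{R}^{h \times w \times d}$. A quantizer then replaces each feature vector $z \in \mathbb{R}^d$ with the nearest code vector in the codebook $C = \{q_i \mid q_i \in \mathbb{R}^d,\ i = 0, 1, \ldots, n-1\}$: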