NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models
Barmpas, Lee, Koliousis et al.
Electroencephalography (EEG) captures neural activity across multiple temporal and spectral scales, yielding signals that are rich but complex for representation learning. Recently, EEG foundation models trained to predict masked signal-tokens have shown promise for learning generalizable representations. However, their performance is hindered by their signal tokenization modules. Existing neural tokenizers fail to preserve high-frequency dynamics, limiting their ability to reconstruct EEG signals with high fidelity. We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer. Our tokenizer integrates: (i) multi-scale feature extraction modules that capture the full frequency neural spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and, (iii) an EEG signal phase- and amplitude-aware loss function for efficient training. This design enables efficient EEG compression while supporting accurate reconstruction across all frequency bands, leading to robust generative masked modeling. Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks. More broadly, NeuroRVQ tokenizer establishes a strong prior for codebook-based general-purpose brainwave models, enabling advances in neural decoding, generative modeling and multimodal biosignal integration.
Electroencephalography (EEG) captures neural activity across multiple temporal and spectral scales, producing rich yet complex signals that are challenging for representation learning. EEG foundation models trained through masked signal-token prediction have recently shown promise in learning generalizable representations, but their performance is limited by the signal tokenization module: existing neural tokenizers fail to preserve high-frequency dynamics, which limits how faithfully they can reconstruct EEG signals. This paper introduces NeuroRVQ, a scalable large brainwave model (LBM) centered on a codebook-based tokenizer. The tokenizer integrates: (i) multi-scale feature extraction modules that capture the full neural frequency spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and (iii) a phase- and amplitude-aware loss function for efficient training.
Brain-computer interface (BCI) systems enable direct communication between the brain and the external world by analyzing brainwaves recorded by EEG devices. EEG signals reflect a wide range of human states, from sleep and emotion to motor activity. However, existing large brainwave models (LBMs) face a fundamental bottleneck: signal tokenization.
Multi-scale Characteristics: Brain activity unfolds across multiple frequency scales, including delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (>30 Hz) bands
Tokenization Quality: Existing tokenizers struggle to preserve complete structural information, particularly high-frequency components, which are crucial for robust generative masked modeling
Reconstruction Fidelity: Direct adoption of discrete codebook tokenizers from computer vision (e.g., VQ-VAE) fails to achieve faithful reconstruction of brain signals
The authors argue that unlocking EEG foundation-scale masked modeling hinges on tokenizer design. A well-designed tokenizer should not only compress continuous neural signals into discrete tokens but also faithfully reconstruct the original waveform across all important frequency scales.
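The fidelity requirement can be made concrete with a per-band error measure. The following is a minimal sketch, not the paper's evaluation protocol: it compares the spectra of an original and a reconstructed signal within each band listed above. The function name `band_errors`, the 45 Hz gamma ceiling, the `eps` guard, and the toy signal are all assumptions made for illustration.

```python
import numpy as np

# Band edges in Hz as listed above; the 45 Hz gamma ceiling is an assumption.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_errors(original, reconstructed, fs, eps=1e-12):
    """Relative spectral error per band between two 1-D signals."""
    freqs = np.fft.rfftfreq(len(original), d=1.0 / fs)
    spec_orig = np.fft.rfft(original)
    spec_rec = np.fft.rfft(reconstructed)
    errors = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        num = np.sum(np.abs(spec_orig[mask] - spec_rec[mask]) ** 2)
        den = np.sum(np.abs(spec_orig[mask]) ** 2)
        # eps keeps the ratio finite when a band carries no energy.
        errors[name] = float(num / (den + eps))
    return errors


# Toy signal: a 10 Hz (alpha) plus a 40 Hz (gamma) sinusoid, 2 s at 200 Hz.
fs = 200
t = np.arange(2 * fs) / fs
alpha_part = np.sin(2 * np.pi * 10 * t)
gamma_part = 0.5 * np.sin(2 * np.pi * 40 * t)
signal = alpha_part + gamma_part

# A hypothetical reconstruction that has lost the high-frequency component.
lossy = alpha_part

errs = band_errors(signal, lossy, fs)
for name, value in errs.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the alpha-band error is near zero while the gamma-band error is close to one. This mirrors the failure mode attributed to existing tokenizers: low frequencies survive reconstruction, high frequencies do not.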
Proposed the NeuroRVQ Tokenizer: Captures multi-scale frequency features by applying temporal convolutions with different kernel sizes
Designed a Hierarchical RVQ Codebook Structure: One codebook per frequency scale, utilizing 32 codebooks (2³² parameters) to capture complex patterns necessary for high-fidelity signal reconstruction
Introduced Phase and Amplitude-Aware Loss Function: Grounded in signal processing principles, it captures EEG signal amplitude and wrapped phase information, with phase represented through its sine and cosine
Achieved SOTA Performance: 15% higher accuracy than existing LBMs on four BCI classification tasks
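The residual vector quantization idea behind the codebook contribution can be sketched in a few lines. This is a generic RVQ illustration, assuming random codebooks in place of learned ones; the stage count, codebook size, and the helper names `rvq_encode` and `rvq_decode` are hypothetical and do not reflect the paper's configuration. Each stage quantizes whatever the previous stages left unexplained.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Quantize vector x stage by stage; return one code index per stage."""
    residual = x.copy()
    indices = []
    for codebook in codebooks:  # each codebook has shape (num_codes, dim)
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))  # nearest code to the current residual
        indices.append(idx)
        residual = residual - codebook[idx]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected code from each stage."""
    out = np.zeros_like(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = out + codebook[idx]
    return out


rng = np.random.default_rng(0)
dim, num_codes, num_stages = 8, 16, 4

# Random codebooks with shrinking scale stand in for learned ones (assumption).
codebooks = []
for stage in range(num_stages):
    cb = rng.normal(scale=0.5 ** stage, size=(num_codes, dim))
    cb[0] = 0.0  # a zero code guarantees a stage never increases the residual
    codebooks.append(cb)

x = rng.normal(size=dim)
indices = rvq_encode(x, codebooks)

# Reconstruction error when decoding with only the first k stages.
errors = [
    float(np.linalg.norm(x - rvq_decode(indices[:k], codebooks[:k])))
    for k in range(1, num_stages + 1)
]
print(indices, errors)
```

The zero code in each codebook is an illustration device: it makes the error non-increasing across stages even with untrained codebooks. In a trained RVQ the same coarse-to-fine behaviour comes from learning each stage on the residual of the previous one.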
Effectiveness of Multi-Scale Design: Temporal convolutions with different kernel sizes successfully capture multi-frequency characteristics of EEG signals
Importance of Phase-Aware Loss: A unit-circle constraint on the predicted sine and cosine values keeps phase predictions geometrically meaningful
Parameter Efficiency: NeuroRVQ achieves better performance than NeuroGPT (79.5M parameters) with only 5.9M parameters
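To illustrate why a sine and cosine representation with a unit-circle constraint suits wrapped phase, here is a hedged sketch of a phase- and amplitude-aware loss. The exact formulation in the paper may differ; the function name `phase_amplitude_loss`, the use of a plain FFT, the equal weighting of terms, and the `unit_weight` parameter are assumptions for illustration.

```python
import numpy as np


def phase_amplitude_loss(pred_amp, pred_sin, pred_cos, target, unit_weight=1.0):
    """Sketch of a loss on amplitude and on (sin, cos) of the wrapped phase."""
    spectrum = np.fft.rfft(target)
    amp = np.abs(spectrum)
    phase = np.angle(spectrum)  # wrapped to (-pi, pi]

    amp_loss = np.mean((pred_amp - amp) ** 2)
    # Comparing sin and cos avoids the discontinuity of raw phase at +/- pi.
    phase_loss = np.mean(
        (pred_sin - np.sin(phase)) ** 2 + (pred_cos - np.cos(phase)) ** 2
    )
    # Penalty pulling each (pred_sin, pred_cos) pair toward the unit circle.
    unit_loss = np.mean((pred_sin ** 2 + pred_cos ** 2 - 1.0) ** 2)
    return float(amp_loss + phase_loss + unit_weight * unit_loss)


# Toy target: a 10 Hz sinusoid, 1 s at 200 Hz.
fs = 200
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 10 * t)

spec = np.fft.rfft(target)
amp, phase = np.abs(spec), np.angle(spec)

# Exact prediction.
perfect = phase_amplitude_loss(amp, np.sin(phase), np.cos(phase), target)
# Same phase shifted by a full turn: identical on the circle.
wrapped = phase_amplitude_loss(
    amp, np.sin(phase + 2 * np.pi), np.cos(phase + 2 * np.pi), target
)
# A (sin, cos) pair scaled off the unit circle is penalized.
off_circle = phase_amplitude_loss(amp, 2 * np.sin(phase), 2 * np.cos(phase), target)

print(perfect, wrapped, off_circle)
```

A phase shifted by a full turn yields essentially the same loss as the exact prediction, which a direct error on raw phase values would not. The off-circle prediction is penalized even though its direction is correct, which is one way to read the unit-circle constraint described above.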
Early approaches relied on hand-crafted features such as power spectral density (PSD) and independent component analysis (ICA), but suffered from limited generalization due to large inter-subject variability and noise characteristics of EEG signals.
Models such as EEGNet, EEGInception, and EEGConformer reduced dependence on hand-crafted features but still required carefully annotated data and task-specific training.
LaBraM, NeuroGPT, and CBraMod represent the current direction of EEG foundation models, but all share the signal tokenization bottleneck. NeuroRVQ addresses this issue through improved codebook design.
The paper cites 68 relevant references covering multiple domains including EEG analysis, deep learning, and foundation models, providing a solid theoretical foundation for the research.
Overall Assessment: This is a high-quality paper with significant contributions to the EEG signal processing and foundation model domains. Through innovative design tailored to EEG signal characteristics, it substantially improves upon existing methods and provides important momentum for the field's development.