NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models

Barmpas, Lee, Koliousis et al.

Electroencephalography (EEG) captures neural activity across multiple temporal and spectral scales, yielding signals that are rich but complex for representation learning. Recently, EEG foundation models trained to predict masked signal-tokens have shown promise for learning generalizable representations. However, their performance is hindered by their signal tokenization modules. Existing neural tokenizers fail to preserve high-frequency dynamics, limiting their ability to reconstruct EEG signals with high fidelity. We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer. Our tokenizer integrates: (i) multi-scale feature extraction modules that capture the full frequency neural spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and, (iii) an EEG signal phase- and amplitude-aware loss function for efficient training. This design enables efficient EEG compression while supporting accurate reconstruction across all frequency bands, leading to robust generative masked modeling. Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks. More broadly, NeuroRVQ tokenizer establishes a strong prior for codebook-based general-purpose brainwave models, enabling advances in neural decoding, generative modeling and multimodal biosignal integration.

Basic Information

Paper ID: 2510.13068
Title: NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models
Authors: Konstantinos Barmpas, Na Lee, Alexandros Koliousis, Yannis Panagakis, Dimitrios Adamos, Nikolaos Laskaris, Stefanos Zafeiriou
Classification: cs.LG cs.AI cs.HC
Publication Date: October 15, 2025 (Preprint)
Paper Link: https://arxiv.org/abs/2510.13068

Abstract

Electroencephalography (EEG) signals capture neural activity across multiple temporal and spectral scales, producing rich yet complex signals that pose challenges for representation learning. Recently, EEG foundation models trained through masked signal-token prediction have shown promise in learning generalizable representations, but their performance is limited by the signal tokenization module. Existing neural tokenizers fail to preserve high-frequency dynamics, limiting their ability to reconstruct EEG signals with high fidelity. This paper introduces NeuroRVQ, a scalable large brainwave model (LBM) centered on a codebook-based tokenizer. The tokenizer integrates: (i) a multi-scale feature extraction module capturing the full neural frequency spectrum; (ii) a hierarchical residual vector quantization (RVQ) codebook for high-resolution encoding; (iii) an EEG phase- and amplitude-aware loss function for efficient training.

Research Background and Motivation

Problem Definition

Brain-computer interface (BCI) systems enable direct communication between the brain and the external world by analyzing brainwaves recorded by EEG devices. EEG signals reflect a wide range of brain states, from sleep and emotion to motor activity. However, existing large brainwave models (LBMs) face a fundamental bottleneck: signal tokenization.

Core Challenges

Multi-scale Characteristics: Brain activity unfolds across multiple frequency scales, including delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (>30 Hz) bands
Tokenization Quality: Existing tokenizers struggle to preserve complete structural information, particularly high-frequency components, which are crucial for robust generative masked modeling
Reconstruction Fidelity: Direct adoption of discrete codebook tokenizers from computer vision (e.g., VQ-VAE) fails to achieve faithful reconstruction of brain signals

Research Motivation

The authors argue that unlocking EEG foundation-scale masked modeling hinges on tokenizer design. A well-designed tokenizer should not only compress continuous neural signals into discrete tokens but also faithfully reconstruct the original waveform across all important frequency scales.

Core Contributions

Proposed the NeuroRVQ Tokenizer: Captures multi-scale frequency features by applying temporal convolutions with different kernel sizes
Designed a Hierarchical RVQ Codebook Structure: One RVQ codebook per temporal scale, each made of 8 residual layers, for 32 single codebooks in total (2³² parameters) to capture the complex patterns necessary for high-fidelity signal reconstruction
Introduced Phase and Amplitude-Aware Loss Function: Based on strong signal processing principles, capturing EEG signal amplitude and wrapped phase information through sine and cosine representations
Achieved SOTA Performance: Up to 15% higher accuracy than existing LBMs across four BCI classification tasks

Methodology Details

Task Definition

Given a multivariate EEG time series X ∈ R^(C×T) (where T is the number of time points and C is the number of electrodes), the objectives are:

Tokenize continuous EEG signals into discrete neural tokens
Support accurate reconstruction across all frequency bands
Enable robust generative masked modeling

Model Architecture

1. Patch Generation

Divide input EEG signals into P temporal patches of length w (corresponding to 1-second time windows), yielding segmented input samples x ∈ R^(P×w).
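
The patching step can be sketched for a single channel as follows. This is a minimal illustration, assuming non-overlapping windows and that a trailing partial window is dropped; the function name `make_patches` and both of those choices are assumptions, while the 200 Hz rate comes from the Datasets section.

```python
def make_patches(signal, sfreq=200, window_sec=1.0):
    """Split one channel into non-overlapping patches of w = sfreq * window_sec samples."""
    w = int(sfreq * window_sec)
    n_patches = len(signal) // w  # assumption: a trailing partial window is dropped
    return [signal[p * w:(p + 1) * w] for p in range(n_patches)]


# 2.5 s of dummy single-channel data at 200 Hz gives P = 2 full patches of length w = 200
channel = [float(t) for t in range(500)]
patches = make_patches(channel)
print(len(patches), len(patches[0]))
```

For a recording with C electrodes, the same split would be applied to each channel independently.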

2. Multi-Scale Temporal Encoder

Extract features at S different temporal scales using an inception-style module:

Apply 1-D temporal convolutions with different kernel sizes: K_temporal1, K_temporal2, ..., K_temporalS
Each temporal branch contains: 1-D convolution → group normalization → GELU activation → pooling (repeated twice)
Produce S outputs: F1, F2, ..., FS, where Fi ∈ R^w
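
The idea behind the parallel branches can be illustrated with a toy sketch. It assumes fixed averaging kernels in place of learned convolution weights, and the kernel sizes (3, 7, 15, 31) are hypothetical; group normalization, GELU, and pooling are omitted. It only shows that branches with different kernel lengths respond differently to the same patch.

```python
def conv1d_same(x, kernel):
    """Zero-padded 'same' 1-D correlation of x with an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]


def multi_scale_features(patch, kernel_sizes=(3, 7, 15, 31)):
    """One output per temporal scale; averaging kernels stand in for learned ones."""
    return [conv1d_same(patch, [1.0 / k] * k) for k in kernel_sizes]


# A patch oscillating at the fastest representable rate (sign flips every sample)
patch = [1.0 if t % 2 == 0 else -1.0 for t in range(200)]
features = multi_scale_features(patch)
for k, f in zip((3, 7, 15, 31), features):
    print(k, abs(f[100]))  # longer kernels suppress the fast oscillation more strongly
```

In this toy setting, short kernels keep more of the fast oscillation while long kernels smooth it away, which is the intuition for giving each frequency scale its own branch.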

3. Transformer Encoder

Introduce learnable temporal embeddings TE and spatial embeddings SE
Add the embeddings to the multi-scale features and pass the result through shared Transformer layers
Generate multi-scale patch representations: p1, p2, ..., pS ∈ R^D

4. RVQ Codebook

For each temporal branch, discretize using RVQ codebook R:

R = {Vi | i = 1, ..., N}
Vi = {vj | j = 1, ..., K} ∈ R^(K×D)

Iterative quantization process:

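Starting from the branch's patch representation as the first residual p1, each layer i = 1, ..., N selects the nearest code and passes on what remains (l2(·) denotes L2 normalization):

zi = arg min_{v∈Vi} ||l2(pi) - l2(v)||
pi+1 = pi - zi
p̂ = Σ(i=1 to N) zi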
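
The iterative quantization can be sketched in plain Python. This is a minimal illustration of the residual scheme written above, not the authors' implementation; the function names and the tiny two-layer codebooks are hypothetical.

```python
import math


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def nearest_code(p, codebook):
    """Index of the code whose L2-normalized direction is closest to that of p."""
    p_unit = l2_normalize(p)
    return min(range(len(codebook)),
               key=lambda j: math.dist(p_unit, l2_normalize(codebook[j])))


def rvq_quantize(p, codebooks):
    """Return (token indices, reconstruction) for residual codebooks V_1..V_N."""
    residual = list(p)
    recon = [0.0] * len(p)
    tokens = []
    for codebook in codebooks:
        j = nearest_code(residual, codebook)                      # z_i = nearest code in V_i
        z = codebook[j]
        tokens.append(j)
        recon = [r + c for r, c in zip(recon, z)]                 # running sum of z_i
        residual = [r - c for r, c in zip(residual, z)]           # p_{i+1} = p_i - z_i
    return tokens, recon


# Hypothetical codebooks: N = 2 layers, K = 2 codes each, D = 2 dimensions
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],    # V_1: coarse directions
    [[0.2, 0.0], [0.0, 0.25]],   # V_2: refines what V_1 left over
]
tokens, recon = rvq_quantize([1.0, 0.25], codebooks)
print(tokens, recon)
```

Each additional layer only has to encode the error left by the previous ones, which is why stacking layers raises reconstruction resolution without enlarging any single codebook.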

5. Tokenizer Decoder

Reconstruct the original signal based on learned codebook tokens, using Fourier spectrum as reconstruction target with three prediction heads:

log(1 + Â): log magnitude
sin φ̂: phase sine component
cos φ̂: phase cosine component
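
How the three targets could be derived from a patch is sketched below. It assumes a one-sided, unnormalized DFT computed naively with the standard library; the summary does not state the exact transform or normalization, so those choices and the name `fourier_targets` are assumptions.

```python
import cmath
import math


def fourier_targets(patch):
    """Naive DFT of one patch, returning (log(1 + A), sin(phi), cos(phi)) per frequency bin."""
    n = len(patch)
    log_amp, sin_phi, cos_phi = [], [], []
    for k in range(n // 2 + 1):  # assumption: one-sided spectrum
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(patch))
        amplitude, phase = abs(coeff), cmath.phase(coeff)
        log_amp.append(math.log1p(amplitude))
        sin_phi.append(math.sin(phase))
        cos_phi.append(math.cos(phase))
    return log_amp, sin_phi, cos_phi


# 1-second patch at 200 Hz containing a 10 Hz (alpha-band) cosine
patch = [math.cos(2 * math.pi * 10 * t / 200) for t in range(200)]
log_amp, sin_phi, cos_phi = fourier_targets(patch)
peak_bin = max(range(len(log_amp)), key=log_amp.__getitem__)
print(peak_bin)  # index of the strongest bin; bins are 1 Hz apart for a 1 s window
```

Splitting the phase into sine and cosine keeps both targets continuous, whereas the raw angle jumps between -π and π.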

Technical Innovations

1. Unit Circle-Aware Phase Loss

Traditional methods applying MSE directly to phase suffer from periodicity boundary discontinuity issues. NeuroRVQ introduces unit circle-aware loss:

L_unit-loss = 1 - Σ_i [cos φ̂i cos φi + sin φ̂i sin φi] / [√(cos²φ̂i + sin²φ̂i) √(cos²φi + sin²φi)]
             + λ_circle · Σ_i (cos²φ̂i + sin²φ̂i - 1)²
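
A minimal sketch of this loss follows. It averages the cosine-similarity and circle-penalty terms over frequency bins, which is an assumption (the formula above writes a plain sum), and the `eps` guard is added only for numerical safety. The default λ_circle = 0.4 is the value listed under Implementation Details.

```python
import math


def unit_circle_loss(sin_pred, cos_pred, sin_true, cos_true, lam_circle=0.4, eps=1e-8):
    """Cosine-similarity phase loss plus a penalty for predictions that leave the unit circle."""
    n = len(sin_pred)
    similarity = 0.0
    penalty = 0.0
    for sp, cp, st, ct in zip(sin_pred, cos_pred, sin_true, cos_true):
        dot = cp * ct + sp * st
        norm_pred = math.sqrt(cp * cp + sp * sp)
        norm_true = math.sqrt(ct * ct + st * st)
        similarity += dot / (norm_pred * norm_true + eps)
        penalty += (cp * cp + sp * sp - 1.0) ** 2
    return 1.0 - similarity / n + lam_circle * penalty / n  # assumption: mean over bins


# Two phases on opposite sides of the +/- pi boundary are only 0.02 rad apart on the circle
phi_true, phi_pred = math.pi - 0.01, -math.pi + 0.01
wrapped = unit_circle_loss([math.sin(phi_pred)], [math.cos(phi_pred)],
                           [math.sin(phi_true)], [math.cos(phi_true)])
naive_mse = (phi_pred - phi_true) ** 2  # direct MSE on the angle treats them as far apart
print(wrapped, naive_mse)
```

The comparison shows the boundary problem directly: the circle-aware loss is close to zero for this pair, while the squared difference of raw angles is large.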

2. Comprehensive Training Objective

LT = ||log(1 + Âi) - log(1 + Ai)||²₂ + L_unit-loss + ||X̂i - Xi||²₂ + LQ

where LQ is the quantization loss.

Experimental Setup

Datasets

Utilize 13 large-scale EEG datasets (approximately 235 hours), including:

Public Datasets: BCI Competition IV-1, Grasp and Lift, Physionet MI, and 9 others
Self-Collected Dataset: Approximately 235 hours of motor imagery data (29 channels)
All data resampled to 200 Hz

Evaluation Metrics

Reconstruction Quality: Mean squared error (MSE) across frequency bands
Downstream Tasks: Balanced accuracy using 10-fold subject-independent cross-validation
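
Both ingredients of the downstream protocol can be sketched briefly. Balanced accuracy is taken here as the mean per-class recall; the round-robin assignment of subjects to folds is a hypothetical choice, since the summary does not describe how the paper builds its folds.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a majority class cannot dominate the score."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def subject_folds(subject_ids, n_folds=10):
    """Assign whole subjects to folds so no subject is split between train and test."""
    subjects = sorted(set(subject_ids))
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}  # assumption: round-robin
    return [[i for i, s in enumerate(subject_ids) if fold_of[s] == f]
            for f in range(n_folds)]


# Imbalanced toy labels: plain accuracy is 5/6, balanced accuracy is (1.0 + 0.5) / 2
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)

# Sample indices per fold for five trials recorded from three subjects
folds = subject_folds(["s1", "s1", "s2", "s3", "s3"], n_folds=3)
print(score, folds)
```

Keeping each subject in exactly one fold means the test score reflects generalization to unseen people rather than to unseen trials from known people.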

Comparison Methods

Tokenizer Comparisons: LaBraM
Foundation Model Comparisons: NeuroGPT, CBraMod, LaBraM, EEGPT, BIOT

Implementation Details

Tokenizer Training: 100 epochs, S=4 temporal branches, 4 RVQ codebooks, each containing 8 single codebooks Vi ∈ R^(8192×128)
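
The token budget implied by this configuration can be worked out in a few lines. The sketch assumes that every residual layer of every branch emits one index per patch and that a raw 1-second patch at 200 Hz is stored as float32; both assumptions are for illustration only.

```python
import math

branches, layers_per_branch, codes_per_layer = 4, 8, 8192

n_codebooks = branches * layers_per_branch      # single codebooks in total
bits_per_index = math.log2(codes_per_layer)     # bits needed to address one code
bits_per_patch = n_codebooks * bits_per_index   # assumption: one index per layer per patch
raw_bits_per_patch = 200 * 32                   # assumption: 200 float32 samples

print(n_codebooks, bits_per_index, bits_per_patch, raw_bits_per_patch / bits_per_patch)
```

Under these assumptions each patch is described by 32 indices of 13 bits each, noticeably fewer bits than the raw samples.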
Foundation Model Training: 50 epochs, λ_circle = 0.4
Hardware: NVIDIA DGX with 4 NVIDIA Tesla V100 GPUs

Experimental Results

Main Results

1. Tokenizer Reconstruction Performance

In-Distribution Evaluation (Table 1, reconstruction MSE; lower is better):

Tokenizer	Raw Signal	Delta	Theta	Alpha	Beta	Gamma
LaBraM	1.071	1.561	0.184	0.099	0.122	0.020
NeuroRVQ	0.016	0.006	0.002	0.002	0.005	0.002

NeuroRVQ achieves one to two orders of magnitude lower reconstruction error in every band, from about 10× lower in the gamma band to about 260× lower in the delta band.
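
The size of the gap can be read off the Table 1 values directly. The snippet below only divides the reported numbers and introduces no new measurements.

```python
# Reconstruction MSE values copied from Table 1
labram = {"raw": 1.071, "delta": 1.561, "theta": 0.184,
          "alpha": 0.099, "beta": 0.122, "gamma": 0.020}
neurorvq = {"raw": 0.016, "delta": 0.006, "theta": 0.002,
            "alpha": 0.002, "beta": 0.005, "gamma": 0.002}

# How many times lower the NeuroRVQ error is in each band
ratios = {band: labram[band] / neurorvq[band] for band in labram}
for band, ratio in ratios.items():
    print(f"{band:>5}: {ratio:6.1f}x")
```

The smallest improvement is in the gamma band and the largest in the delta band.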

Out-of-Distribution Evaluation:

Consistently outperforms both versions of LaBraM on memory and motor tasks
Demonstrates superior generalization capability

2. Downstream Task Performance

Model	Motor	Memory	Sleep	Eyes	Mean	Parameters
NeuroGPT	0.682±0.083	0.597±0.029	0.674±0.033	0.827±0.036	0.695±0.045	79.5M
CBraMod	0.614±0.104	0.574±0.038	0.635±0.041	0.839±0.041	0.666±0.056	4.9M
LaBraM	0.630±0.076	0.526±0.026	0.652±0.037	0.799±0.047	0.652±0.047	5.8M
NeuroRVQ	0.700±0.073	0.574±0.027	0.728±0.028	0.869±0.026	0.717±0.038	5.9M

NeuroRVQ ranks first on Motor, Sleep, and Eyes and has the highest mean score (0.717). On Memory it ties CBraMod at 0.574 and trails NeuroGPT (0.597).

Ablation Studies

RVQ Layers: Experiments demonstrate that using 8 layers Vi ∈ R^(8192×128) achieves optimal reconstruction performance
Phase Representation: Sine-cosine representation significantly improves training stability compared to direct phase prediction

Experimental Findings

Effectiveness of Multi-Scale Design: Temporal convolutions with different kernel sizes successfully capture multi-frequency characteristics of EEG signals
Importance of Phase-Aware Loss: Unit circle constraints ensure geometric significance of phase predictions
Parameter Efficiency: With only 5.9M parameters, NeuroRVQ reaches a higher mean balanced accuracy than the 79.5M-parameter NeuroGPT (0.717 vs. 0.695)

Related Work

Traditional EEG Analysis Methods

Early approaches relied on hand-crafted features such as power spectral density (PSD) and independent component analysis (ICA), but suffered from limited generalization due to large inter-subject variability and noise characteristics of EEG signals.

Deep Learning Era

Models such as EEGNet, EEGInception, and EEGConformer reduced dependence on hand-crafted features but still required carefully annotated data and task-specific training.

Foundation Models

LaBraM, NeuroGPT, and CBraMod represent the development direction of EEG foundation models but all face the bottleneck of signal tokenization. NeuroRVQ addresses this critical issue through improved codebook design.

Conclusions and Discussion

Main Conclusions

NeuroRVQ tokenizer achieves SOTA EEG signal reconstruction performance
Multi-scale feature extraction and hierarchical RVQ design effectively capture complex patterns in EEG signals
Phase and amplitude-aware training significantly improves tokenization quality
Achieves best performance on multiple downstream BCI tasks

Limitations

Computational Complexity: Multi-scale encoder and multiple RVQ codebooks increase computational overhead
Data Dependency: Performance still depends on the quality and diversity of large-scale pretraining data
Fixed Frequency Bands: Current design targets traditional EEG frequency bands and may not apply to other biosignals

Future Directions

Causal Inference Integration: Incorporate more targeted spatiotemporal masking strategies
Multimodal Extension: Extend principles to other biosignals
Architecture Optimization: Explore integration with larger-scale LBM architectures

In-Depth Evaluation

Strengths

Strong Technical Innovation: Multi-scale RVQ design and phase-aware loss represent important innovations tailored to EEG signal characteristics
Comprehensive Experiments: Include in-distribution and out-of-distribution evaluation, ablation studies, and multi-task validation
Solid Theoretical Foundation: Design based on signal processing principles has strong theoretical support
High Practical Value: Significantly improves performance of EEG foundation models

Weaknesses

Limited Baseline Comparisons: The tokenizer is compared only with LaBraM, without other codebook-based tokenizers as baselines
Missing Computational Cost Analysis: Lacks detailed analysis of computational complexity and inference time
Insufficient Generalization Validation: Primarily validated on BCI tasks with limited verification on other EEG application scenarios

Impact

Academic Contribution: Provides important tokenization solution for EEG foundation models
Practical Value: Can be directly applied to improve existing BCI systems
Reproducibility: Provides detailed implementation details and hyperparameter settings

Applicable Scenarios

Applications requiring high-fidelity EEG signal reconstruction
Pretraining and fine-tuning of large-scale EEG data
Multi-task BCI system development
Biosignal foundation model research

References

The paper cites 68 relevant references covering multiple domains including EEG analysis, deep learning, and foundation models, providing a solid theoretical foundation for the research.

Overall Assessment: This is a high-quality paper with significant contributions to the EEG signal processing and foundation model domains. Through innovative design tailored to EEG signal characteristics, it substantially improves upon existing methods and provides important momentum for the field's development.