NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models

Barmpas, Lee, Koliousis et al.
Electroencephalography (EEG) captures neural activity across multiple temporal and spectral scales, yielding signals that are rich but complex for representation learning. Recently, EEG foundation models trained to predict masked signal-tokens have shown promise for learning generalizable representations. However, their performance is hindered by their signal tokenization modules. Existing neural tokenizers fail to preserve high-frequency dynamics, limiting their ability to reconstruct EEG signals with high fidelity. We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer. Our tokenizer integrates: (i) multi-scale feature extraction modules that capture the full frequency neural spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and, (iii) an EEG signal phase- and amplitude-aware loss function for efficient training. This design enables efficient EEG compression while supporting accurate reconstruction across all frequency bands, leading to robust generative masked modeling. Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks. More broadly, NeuroRVQ tokenizer establishes a strong prior for codebook-based general-purpose brainwave models, enabling advances in neural decoding, generative modeling and multimodal biosignal integration.

Basic Information

  • Paper ID: 2510.13068
  • Title: NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models
  • Authors: Konstantinos Barmpas, Na Lee, Alexandros Koliousis, Yannis Panagakis, Dimitrios Adamos, Nikolaos Laskaris, Stefanos Zafeiriou
  • Classification: cs.LG cs.AI cs.HC
  • Publication Date: October 15, 2025 (Preprint)
  • Paper Link: https://arxiv.org/abs/2510.13068

Abstract

Electroencephalography (EEG) signals capture neural activity across multiple temporal and spectral scales, producing rich yet complex signals that pose challenges for representation learning. Recently, EEG foundation models trained through masked signal token prediction have shown promise in learning generalizable representations, but their performance is limited by the signal tokenization module. Existing neural tokenizers fail to preserve high-frequency dynamics, limiting their ability to reconstruct EEG signals with high fidelity. This paper introduces NeuroRVQ, a scalable large brainwave model (LBM) centered on a codebook-based tokenizer. The tokenizer integrates: (i) a multi-scale feature extraction module capturing the complete frequency neural spectrum; (ii) a hierarchical residual vector quantization (RVQ) codebook for high-resolution encoding; (iii) a phase and amplitude-aware loss function for efficient training of EEG signals.

Research Background and Motivation

Problem Definition

Brain-computer interface (BCI) systems enable direct communication between the brain and the external world by analyzing brainwaves recorded by EEG devices. EEG signals reflect a wide range of human states, from sleep and emotion to motor activity. However, existing large brainwave models (LBMs) face a fundamental bottleneck: signal tokenization.

Core Challenges

  1. Multi-scale Characteristics: Brain activity unfolds across multiple frequency scales, including delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (>30 Hz) bands
  2. Tokenization Quality: Existing tokenizers struggle to preserve complete structural information, particularly high-frequency components, which are crucial for robust generative masked modeling
  3. Reconstruction Fidelity: Direct adoption of discrete codebook tokenizers from computer vision (e.g., VQ-VAE) fails to achieve faithful reconstruction of brain signals

Research Motivation

The authors argue that unlocking EEG foundation-scale masked modeling hinges on tokenizer design. A well-designed tokenizer should not only compress continuous neural signals into discrete tokens but also faithfully reconstruct the original waveform across all important frequency scales.

Core Contributions

  1. Proposed the NeuroRVQ Tokenizer: Captures multi-scale frequency features by applying temporal convolutions with different kernel sizes
  2. Designed a Hierarchical RVQ Codebook Structure: One RVQ codebook per frequency scale, for 32 single codebooks in total (4 scales × 8 residual layers, as listed under Implementation Details), to capture the complex patterns necessary for high-fidelity signal reconstruction
  3. Introduced Phase and Amplitude-Aware Loss Function: Based on strong signal processing principles, capturing EEG signal amplitude and wrapped phase information through sine and cosine representations
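  4. Achieved SOTA Performance: Up to about 15% higher accuracy than existing LBMs across four BCI classification tasks (the largest gap in the downstream results is on Sleep, 0.728 vs. 0.635 for CBraMod)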

Methodology Details

Task Definition

Given a multivariate EEG time series X ∈ R^(C×T) (where T is the number of time points and C is the number of electrodes), the objectives are:

  1. Tokenize continuous EEG signals into discrete neural tokens
  2. Support accurate reconstruction across all frequency bands
  3. Enable robust generative masked modeling

Model Architecture

1. Patch Generation

Divide input EEG signals into P temporal patches of length w (corresponding to 1-second time windows), yielding segmented input samples x ∈ R^(P×w).
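The patching step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function name `segment_into_patches`, the decision to keep the channel axis, and dropping an incomplete trailing window are all assumptions of this sketch. The 200 Hz sampling rate comes from the dataset description.

```python
import numpy as np

def segment_into_patches(x, fs=200, window_s=1.0):
    """Split a (C, T) EEG array into non-overlapping patches of shape (C, P, w)."""
    w = int(round(fs * window_s))        # samples per patch (200 at 200 Hz)
    n_channels, n_samples = x.shape
    n_patches = n_samples // w           # assumption: drop the incomplete tail
    return x[:, : n_patches * w].reshape(n_channels, n_patches, w)

# Hypothetical recording: 29 channels, 5.25 seconds at 200 Hz
eeg = np.random.default_rng(0).standard_normal((29, 1050))
patches = segment_into_patches(eeg)
print(patches.shape)  # (29, 5, 200)
```

Each channel contributes P = 5 patches of w = 200 samples here, matching the per-channel shape x ∈ R^(P×w).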

2. Multi-Scale Temporal Encoder

Extract features at S different temporal scales using an inception-style module:

  • Apply 1-D temporal convolutions with different kernel sizes: K_temporal1, K_temporal2, ..., K_temporalS
  • Each temporal branch contains: 1-D convolution → group normalization → GELU activation → pooling (repeated twice)
  • Produce S outputs: F1, F2, ..., FS, where Fi ∈ R^w
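The branch structure above might look roughly like this for a single channel. It is a heavily simplified sketch under stated assumptions: the kernel sizes are hypothetical, the kernels are random rather than learned, per-patch standardization stands in for group normalization, average pooling is assumed, and the block is applied once rather than twice.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def temporal_branch(patch, kernel, pool=2):
    """One scale: 1-D convolution -> normalization -> GELU -> average pooling."""
    h = np.convolve(patch, kernel, mode="same")       # 1-D temporal convolution
    h = (h - h.mean()) / (h.std() + 1e-5)             # stand-in for group normalization
    h = gelu(h)
    usable = (h.size // pool) * pool
    return h[:usable].reshape(-1, pool).mean(axis=1)  # non-overlapping average pooling

rng = np.random.default_rng(0)
patch = rng.standard_normal(200)                      # one 1-second patch at 200 Hz
kernel_sizes = [5, 11, 21, 41]                        # hypothetical sizes, one per branch (S = 4)
features = [temporal_branch(patch, rng.standard_normal(k) / k) for k in kernel_sizes]
print([f.shape for f in features])
```

Short kernels respond to fast (high-frequency) structure and long kernels to slow structure, which is the intuition behind using several kernel sizes in parallel.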

3. Transformer Encoder

  • Introduce learnable temporal embeddings TE and spatial embeddings SE
  • Add multi-scale features with embeddings and pass through shared Transformer layers
  • Generate multi-scale patch representations: p1, p2, ..., pS ∈ R^D

4. RVQ Codebook

For each temporal branch, discretize using RVQ codebook R:

R = {Vi | i = 1, ..., N}
Vi = {vj | j = 1, ..., K} ∈ R^(K×D)

Iterative quantization process:

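z_i = arg min_{v∈V_i} ||l2(p_i) - l2(v)||,   i = 1, ..., N
p_(i+1) = p_i - z_i
p̂ = Σ(i=1 to N) z_i

Here p_1 is the patch representation of the branch being quantized and l2(·) denotes l2 normalization.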
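The iterative quantization can be sketched as a short loop. This is an illustrative NumPy version under assumptions: the names `rvq_encode` and `l2_normalize` are hypothetical, the codebooks are random rather than learned, and the sizes (K = 64, D = 16) are toy values, not the 8192×128 codebooks reported in the implementation details.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def rvq_encode(p, codebooks):
    """Quantize vector p with a list of N codebooks, each of shape (K, D).

    Returns the selected code indices and the reconstruction p_hat = sum_i z_i.
    """
    residual = p.copy()
    p_hat = np.zeros_like(p)
    indices = []
    for V in codebooks:
        # nearest code under the l2-normalized distance
        dists = np.linalg.norm(l2_normalize(residual) - l2_normalize(V), axis=1)
        j = int(np.argmin(dists))
        z = V[j]
        indices.append(j)
        p_hat += z
        residual = residual - z          # next layer quantizes what is left over
    return indices, p_hat

rng = np.random.default_rng(0)
D, K, N = 16, 64, 8                      # toy sizes for illustration only
codebooks = [rng.standard_normal((K, D)) * 0.5**i for i in range(N)]
p = rng.standard_normal(D)
indices, p_hat = rvq_encode(p, codebooks)
print(indices)
```

Each patch representation is thus described by N integer indices, one per residual layer, and the decoder only needs those indices plus the codebooks to rebuild p̂.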

5. Tokenizer Decoder

Reconstruct the original signal from the learned codebook tokens, using the Fourier spectrum as the reconstruction target, with three prediction heads:

  • log(1 + Â): log magnitude
  • sin φ̂: phase sine component
  • cos φ̂: phase cosine component
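To see why these three quantities are sufficient targets, the sketch below computes them for one patch and inverts them back to the waveform. The use of a real-input FFT (`np.fft.rfft`) and the function names are assumptions of this sketch; the source only states that the Fourier spectrum is the target.

```python
import numpy as np

def fourier_targets(patch):
    """Return log(1 + A), sin(phi), cos(phi) for the spectrum of a real patch."""
    spectrum = np.fft.rfft(patch)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    return np.log1p(amplitude), np.sin(phase), np.cos(phase)

def reconstruct(log_amp, sin_phi, cos_phi, n):
    """Invert the three targets back to a time-domain patch of length n."""
    amplitude = np.expm1(log_amp)            # undo log(1 + A)
    phase = np.arctan2(sin_phi, cos_phi)     # recover the wrapped phase
    return np.fft.irfft(amplitude * np.exp(1j * phase), n=n)

rng = np.random.default_rng(0)
patch = rng.standard_normal(200)             # one 1-second patch at 200 Hz
log_amp, sin_phi, cos_phi = fourier_targets(patch)
recovered = reconstruct(log_amp, sin_phi, cos_phi, n=patch.size)
print(np.allclose(recovered, patch))  # True
```

Predicting sin and cos separately avoids regressing the wrapped angle directly, while the log compresses the large dynamic range of EEG amplitudes.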

Technical Innovations

1. Unit Circle-Aware Phase Loss

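Applying MSE directly to phase is problematic because phase is periodic: values just below +π and just above -π are neighbors on the unit circle but far apart numerically, so the loss is discontinuous at the wrap-around boundary. NeuroRVQ introduces a unit circle-aware loss: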

L_unit-loss = 1 - Σ_i [cos φ̂i cos φi + sin φ̂i sin φi] / [√(cos²φ̂i + sin²φ̂i) √(cos²φi + sin²φi)]
             + λ_circle · Σ_i (cos²φ̂i + sin²φ̂i - 1)²
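A minimal NumPy sketch of this loss follows. Two things are assumptions of the sketch rather than statements from the paper: the terms are averaged over frequency bins instead of summed, and a small `eps` is added for numerical safety. The default λ_circle = 0.4 is the value listed in the implementation details.

```python
import numpy as np

def unit_circle_phase_loss(sin_pred, cos_pred, phase_true, lam_circle=0.4, eps=1e-8):
    sin_true, cos_true = np.sin(phase_true), np.cos(phase_true)
    # cosine similarity between predicted and true (cos, sin) vectors
    dot = cos_pred * cos_true + sin_pred * sin_true
    norm_pred = np.sqrt(cos_pred**2 + sin_pred**2)
    norm_true = np.sqrt(cos_true**2 + sin_true**2)    # equals 1 for true phases
    alignment = 1.0 - np.mean(dot / (norm_pred * norm_true + eps))
    # penalty pulling predictions onto the unit circle
    circle_penalty = np.mean((cos_pred**2 + sin_pred**2 - 1.0) ** 2)
    return alignment + lam_circle * circle_penalty

# Two phases on either side of the wrap-around point, only 0.02 rad apart
phase_true = np.array([np.pi - 0.01])
phase_pred = np.array([-np.pi + 0.01])
naive_mse = float(np.mean((phase_pred - phase_true) ** 2))
circle_loss = float(unit_circle_phase_loss(np.sin(phase_pred), np.cos(phase_pred), phase_true))
print(naive_mse, circle_loss)
```

The naive phase MSE is large for this pair even though the angles nearly coincide, whereas the unit-circle loss stays close to zero.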

2. Comprehensive Training Objective

LT = ||log(1 + Âi) - log(1 + Ai)||²₂ + L_unit-loss + ||X̂i - Xi||²₂ + LQ

where LQ is the quantization loss.

Experimental Setup

Datasets

Utilize 13 large-scale EEG datasets (approximately 235 hours), including:

  • Public Datasets: BCI Competition IV-1, Grasp and Lift, Physionet MI, and 9 others
  • Self-Collected Dataset: Approximately 235 hours of motor imagery data (29 channels)
  • All data resampled to 200 Hz

Evaluation Metrics

  • Reconstruction Quality: Mean squared error (MSE) across frequency bands
  • Downstream Tasks: Balanced accuracy using 10-fold subject-independent cross-validation
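One way to compute a per-band reconstruction MSE is sketched below. The band edges follow the list under Core Challenges; isolating bands by zeroing FFT bins is an assumption of this sketch (the paper may use band-pass filters instead), as are the function names.

```python
import numpy as np

BANDS_HZ = {
    "delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
    "beta": (13, 30), "gamma": (30, float("inf")),
}

def band_limit(x, fs, low, high):
    """Keep only frequency content in [low, high) Hz by masking FFT bins."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[(freqs < low) | (freqs >= high)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)

def band_mse(original, reconstructed, fs=200):
    scores = {"raw": float(np.mean((original - reconstructed) ** 2))}
    for name, (low, high) in BANDS_HZ.items():
        diff = band_limit(original, fs, low, high) - band_limit(reconstructed, fs, low, high)
        scores[name] = float(np.mean(diff**2))
    return scores

# Toy check: a 10 Hz signal whose "reconstruction" has a spurious 20 Hz component
t = np.arange(200) / 200.0
original = np.sin(2 * np.pi * 10 * t)
reconstructed = original + 0.1 * np.sin(2 * np.pi * 20 * t)
scores = band_mse(original, reconstructed)
print(scores)
```

In this toy case the error shows up in the beta band (20 Hz) and not in alpha, which is the kind of band-specific diagnosis the metric is meant to provide.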

Comparison Methods

  • Tokenizer Comparisons: LaBraM
  • Foundation Model Comparisons: NeuroGPT, CBraMod, LaBraM, EEGPT, BIOT

Implementation Details

  • Tokenizer Training: 100 epochs, S=4 temporal branches, 4 RVQ codebooks, each containing 8 single codebooks Vi ∈ R^(8192×128)
  • Foundation Model Training: 50 epochs, λ_circle = 0.4
  • Hardware: NVIDIA DGX with 4 NVIDIA Tesla V100 GPUs
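The codebook sizes implied by these settings can be worked out directly. The variable names are illustrative, and the last two quantities assume one index per single codebook per patch, which the summary does not state explicitly.

```python
S = 4              # temporal branches, one RVQ codebook per branch
N = 8              # single codebooks (residual layers) per RVQ codebook
K, D = 8192, 128   # entries per single codebook, embedding dimension

total_codebooks = S * N                          # 32
codebook_entries = total_codebooks * K * D       # stored codebook values
bits_per_index = (K - 1).bit_length()            # bits needed to address one of K codes
bits_per_patch = total_codebooks * bits_per_index  # assumption: one index per codebook per patch

print(total_codebooks, codebook_entries, bits_per_index, bits_per_patch)
# 32 33554432 13 416
```

Under that assumption, a 1-second patch is summarized by 32 indices of 13 bits each.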

Experimental Results

Main Results

1. Tokenizer Reconstruction Performance

In-Distribution Evaluation (Table 1):

Model      Raw Signal   Delta   Theta   Alpha   Beta    Gamma
LaBraM     1.071        1.561   0.184   0.099   0.122   0.020
NeuroRVQ   0.016        0.006   0.002   0.002   0.005   0.002

NeuroRVQ achieves roughly one to two orders of magnitude lower reconstruction MSE in every band, from a 10× reduction on gamma (0.020 vs. 0.002) to about 260× on delta (1.561 vs. 0.006).

Out-of-Distribution Evaluation:

  • Consistently outperforms both versions of LaBraM on memory and motor tasks
  • Demonstrates superior generalization capability

2. Downstream Task Performance

Model      Motor         Memory        Sleep         Eyes          Mean          Parameters
NeuroGPT   0.682±0.083   0.597±0.029   0.674±0.033   0.827±0.036   0.695±0.045   79.5M
CBraMod    0.614±0.104   0.574±0.038   0.635±0.041   0.839±0.041   0.666±0.056   4.9M
LaBraM     0.630±0.076   0.526±0.026   0.652±0.037   0.799±0.047   0.652±0.047   5.8M
NeuroRVQ   0.700±0.073   0.574±0.027   0.728±0.028   0.869±0.026   0.717±0.038   5.9M

NeuroRVQ ranks first on Motor, Sleep, and Eyes and has the highest mean balanced accuracy (0.717). On Memory it ties CBraMod (0.574) and trails NeuroGPT (0.597).
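The ranking and relative gaps can be recomputed from the table values. This is plain arithmetic on the reported means (standard deviations are ignored), and treating the gap as a relative rather than absolute improvement is an assumption of this sketch.

```python
scores = {
    "NeuroGPT": {"Motor": 0.682, "Memory": 0.597, "Sleep": 0.674, "Eyes": 0.827},
    "CBraMod":  {"Motor": 0.614, "Memory": 0.574, "Sleep": 0.635, "Eyes": 0.839},
    "LaBraM":   {"Motor": 0.630, "Memory": 0.526, "Sleep": 0.652, "Eyes": 0.799},
    "NeuroRVQ": {"Motor": 0.700, "Memory": 0.574, "Sleep": 0.728, "Eyes": 0.869},
}
tasks = ["Motor", "Memory", "Sleep", "Eyes"]

# Mean balanced accuracy per model and the top model per task
means = {m: sum(s[t] for t in tasks) / len(tasks) for m, s in scores.items()}
best_per_task = {t: max(scores, key=lambda m: scores[m][t]) for t in tasks}

# Relative gain of NeuroRVQ over each baseline on each task
baselines = [m for m in scores if m != "NeuroRVQ"]
gains = {(m, t): scores["NeuroRVQ"][t] / scores[m][t] - 1.0 for m in baselines for t in tasks}
largest = max(gains, key=gains.get)

print(best_per_task)
print(largest, round(gains[largest], 3))  # ('CBraMod', 'Sleep') 0.146
```

The largest relative gain, about 14.6% over CBraMod on Sleep, is in line with the headline figure of roughly 15%.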

Ablation Studies

  • RVQ Layers: Experiments demonstrate that using 8 layers Vi ∈ R^(8192×128) achieves optimal reconstruction performance
  • Phase Representation: Sine-cosine representation significantly improves training stability compared to direct phase prediction

Experimental Findings

  1. Effectiveness of Multi-Scale Design: Temporal convolutions with different kernel sizes successfully capture multi-frequency characteristics of EEG signals
  2. Importance of Phase-Aware Loss: Unit circle constraints ensure geometric significance of phase predictions
  3. Parameter Efficiency: With only 5.9M parameters, NeuroRVQ achieves a higher mean balanced accuracy than NeuroGPT (0.717 vs. 0.695), which has 79.5M parameters

Related Work

Traditional EEG Analysis Methods

Early approaches relied on hand-crafted features such as power spectral density (PSD) and independent component analysis (ICA), but suffered from limited generalization due to large inter-subject variability and noise characteristics of EEG signals.

Deep Learning Era

Models such as EEGNet, EEGInception, and EEGConformer reduced dependence on hand-crafted features but still required carefully annotated data and task-specific training.

Foundation Models

LaBraM, NeuroGPT, and CBraMod represent the development direction of EEG foundation models but all face the bottleneck of signal tokenization. NeuroRVQ addresses this critical issue through improved codebook design.

Conclusions and Discussion

Main Conclusions

  1. NeuroRVQ tokenizer achieves SOTA EEG signal reconstruction performance
  2. Multi-scale feature extraction and hierarchical RVQ design effectively capture complex patterns in EEG signals
  3. Phase and amplitude-aware training significantly improves tokenization quality
  4. Achieves best performance on multiple downstream BCI tasks

Limitations

  1. Computational Complexity: Multi-scale encoder and multiple RVQ codebooks increase computational overhead
  2. Data Dependency: Performance still depends on the quality and diversity of large-scale pretraining data
  3. Fixed Frequency Bands: Current design targets traditional EEG frequency bands and may not apply to other biosignals

Future Directions

  1. Causal Inference Integration: Incorporate more targeted spatiotemporal masking strategies
  2. Multimodal Extension: Extend principles to other biosignals
  3. Architecture Optimization: Explore integration with larger-scale LBM architectures

In-Depth Evaluation

Strengths

  1. Strong Technical Innovation: Multi-scale RVQ design and phase-aware loss represent important innovations tailored to EEG signal characteristics
  2. Comprehensive Experiments: Include in-distribution and out-of-distribution evaluation, ablation studies, and multi-task validation
  3. Solid Theoretical Foundation: Design based on signal processing principles has strong theoretical support
  4. High Practical Value: Significantly improves performance of EEG foundation models

Weaknesses

  1. Limited Baseline Comparisons: Primarily compared with LaBraM, lacking comparison with more codebook methods
  2. Missing Computational Cost Analysis: Lacks detailed analysis of computational complexity and inference time
  3. Insufficient Generalization Validation: Primarily validated on BCI tasks with limited verification on other EEG application scenarios

Impact

  1. Academic Contribution: Provides important tokenization solution for EEG foundation models
  2. Practical Value: Can be directly applied to improve existing BCI systems
  3. Reproducibility: Provides detailed implementation details and hyperparameter settings

Applicable Scenarios

  • Applications requiring high-fidelity EEG signal reconstruction
  • Pretraining and fine-tuning of large-scale EEG data
  • Multi-task BCI system development
  • Biosignal foundation model research

References

The paper cites 68 relevant references covering multiple domains including EEG analysis, deep learning, and foundation models, providing a solid theoretical foundation for the research.


Overall Assessment: This is a high-quality paper with significant contributions to the EEG signal processing and foundation model domains. Through innovative design tailored to EEG signal characteristics, it substantially improves upon existing methods and provides important momentum for the field's development.