Diffusion Generative Recommendation with Continuous Tokens

Qu, Lin, Ding et al.
Recent advances in generative artificial intelligence, particularly large language models (LLMs), have opened new opportunities for enhancing recommender systems (RecSys). Most existing LLM-based RecSys approaches operate in a discrete space, using vector-quantized tokenizers to align with the inherent discrete nature of language models. However, these quantization methods often result in lossy tokenization and suboptimal learning, primarily due to inaccurate gradient propagation caused by the non-differentiable argmin operation in standard vector quantization. Inspired by the emerging trend of embracing continuous tokens in language models, we propose ContRec, a novel framework that seamlessly integrates continuous tokens into LLM-based RecSys. Specifically, ContRec consists of two key modules: a sigma-VAE Tokenizer, which encodes users/items with continuous tokens; and a Dispersive Diffusion module, which captures implicit user preference. The tokenizer is trained with a continuous Variational Auto-Encoder (VAE) objective, where three effective techniques are adopted to avoid representation collapse. By conditioning on the previously generated tokens of the LLM backbone during user modeling, the Dispersive Diffusion module performs a conditional diffusion process with a novel Dispersive Loss, enabling high-quality user preference generation through next-token diffusion. Finally, ContRec leverages both the textual reasoning output from the LLM and the latent representations produced by the diffusion model for Top-K item retrieval, thereby delivering comprehensive recommendation results. Extensive experiments on four datasets demonstrate that ContRec consistently outperforms both traditional and SOTA LLM-based recommender systems. Our results highlight the potential of continuous tokenization and generative modeling for advancing the next generation of recommender systems.

Basic Information

  • Paper ID: 2504.12007
  • Title: Diffusion Generative Recommendation with Continuous Tokens
  • Authors: Haohao Qu, Shanru Lin, Yujuan Ding, Yiqi Wang, Wenqi Fan
  • Classification: cs.IR cs.AI
  • Publication Date/Venue: arXiv Preprint (Revised October 10, 2025)
  • Paper Link: https://arxiv.org/abs/2504.12007

Abstract

This paper addresses the limitations of discrete tokenization in Large Language Model (LLM)-based recommender systems by proposing ContRec, a framework that integrates continuous tokens into LLM-based recommendation. ContRec comprises two core modules: a σ-VAE Tokenizer, which encodes users and items as continuous tokens, and a Dispersive Diffusion module, which captures implicit user preferences. For Top-K item retrieval, ContRec combines the LLM's textual reasoning output with the latent representations generated by the diffusion model. Experiments on four datasets show that ContRec consistently outperforms both traditional and state-of-the-art LLM-based recommender systems.

Research Background and Motivation

Problem Definition

Existing LLM-based recommendation systems face two critical challenges:

  1. Lossy Tokenization: Vector quantization methods inevitably lose information during compression
  2. Inaccurate Gradient Propagation: The non-differentiable argmin operation in standard vector quantization necessitates "straight-through" tricks, producing inaccurate gradients
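
To make the two challenges concrete, the sketch below performs the nearest-codeword lookup at the core of vector quantization. The two-dimensional codebook and the input values are made up for illustration and are not taken from the paper.

```python
def nearest_code(x, codebook):
    """Return (index, codeword) of the codebook entry closest to x (squared L2)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    idx = min(range(len(codebook)), key=dists.__getitem__)  # the argmin step
    return idx, codebook[idx]

# Hypothetical codebook with three entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

x = [0.4, 0.3]
idx, x_q = nearest_code(x, codebook)

# Lossy tokenization: the codeword differs from the input it replaces.
recon_error = sum((a - b) ** 2 for a, b in zip(x, x_q))

# A small perturbation that stays in the same cell leaves the index unchanged,
# so the selected index is piecewise constant in x and has no useful gradient.
idx_shifted, _ = nearest_code([0.41, 0.3], codebook)
```

The input is mapped to the first codeword with a nonzero reconstruction error, and nudging the input does not change the selected index. This is why straight-through training copies the gradient from the quantized vector back to the input instead of computing a true derivative.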

Research Significance

  • LLMs demonstrate strong generalization and in-context learning capabilities in recommendation systems
  • User and item sets often number in the millions, making traditional indexing methods inefficient
  • Quantization methods are practical but limited in reconstruction quality and generation performance

Limitations of Existing Methods

  1. Discrete Methods: Approaches like TIGER and UTGRec use VQ-VAE to construct discrete vocabularies, suffering from information compression loss
  2. Continuous Projection Methods: Methods such as CoLLM and LLaRA use continuous tokens only on the input side, while outputs still rely on discrete generators, creating a mismatch between continuous inputs and discrete outputs

Research Motivation

Inspired by the trend of embracing continuous tokens in language models, this work explores the potential of using continuous tokens and diffusion models in recommendation scenarios to achieve higher-quality user preference modeling.

Core Contributions

  1. Proposes ContRec Framework: The first framework to seamlessly integrate continuous tokens into LLM recommendation systems, breaking through quantization limitations
  2. Designs Two Key Modules:
    • σ-VAE Tokenizer: A robust continuous tokenizer employing three techniques to prevent representation collapse
    • Dispersive Diffusion Module: Generates implicit user preference representations through contrastive self-supervised learning
  3. Introduces Dispersive Loss: A contrastive learning mechanism without explicit positive sample pairs
  4. Experimental Validation: Achieves average improvements of 11.76% HR@10 and 10.11% NDCG@10 across four datasets

Methodology Details

Task Definition

Given a user set U = {u₁, u₂, ..., uₙ} and item set V = {v₁, v₂, ..., vₘ}, the objective is to predict future user preferences by analyzing historical interactions, reformulating sequential recommendation as a language model paradigm:

Yᵢ = LLM(P(Tᵢ, {Tⱼ|vⱼ ∈ V(uᵢ)}))

Model Architecture

1. σ-VAE Tokenizer

Employs a VAE framework for non-quantized tokenization, incorporating three key techniques:

Masking Operation: Element-level masking strategy based on Bernoulli distribution

μₖ = Encₖ(Mask(x, ρ))

K-Path Encoder: Parallel encoding channels for implicit encoding

zₖ = μₖ + σₖ ⊙ ε, where ε ~ N(0,1), σₖ ~ N(0,Σ)

Gaussian Kernel: Prevents variance collapse

x̂ = Dec(Concat{zₖ}ᴷ)

Loss Function:

Lvae = ||x̂ - x||₂² + (β/K) ∑ₖ₌₁ᴷ ||μₖ||₂²
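
The three tokenizer ingredients and the loss above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: masking is assumed to zero each element with probability ρ, `sigma_spread` is a hypothetical standard deviation standing in for Σ, and the encoders and decoder are replaced by fixed stand-in values.

```python
import random

def bernoulli_mask(x, rho, rng):
    """Zero each element of x independently with probability rho (assumed form)."""
    return [0.0 if rng.random() < rho else v for v in x]

def reparameterize(mu, sigma_spread, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) per element.

    sigma is drawn once per path from N(0, sigma_spread**2) instead of being
    predicted by the encoder; this is one reading of sigma_k ~ N(0, Sigma).
    """
    sigma = rng.gauss(0.0, sigma_spread)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mu]

def sigma_vae_loss(x, x_hat, mus, beta):
    """Squared reconstruction error plus (beta / K) * sum_k ||mu_k||^2."""
    recon = sum((a - b) ** 2 for a, b in zip(x_hat, x))
    reg = sum(sum(m * m for m in mu) for mu in mus) / len(mus)
    return recon + beta * reg

rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
masked = bernoulli_mask(x, rho=0.2, rng=rng)

# Stand-ins for the outputs of K = 2 encoder paths (hypothetical values).
mus = [[0.5, -0.5], [1.0, 0.0]]
zs = [reparameterize(mu, sigma_spread=0.5, rng=rng) for mu in mus]

# x_hat would come from Dec(Concat(zs)); a fixed vector stands in for it here.
loss = sigma_vae_loss(x, x_hat=[1.0, 2.0, 3.0, 3.0], mus=mus, beta=0.1)
```

In this toy case the reconstruction term is 1.0 and the regularizer is 0.1 × 0.75, which shows how β trades off reconstruction against the norm of the K means.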

2. LLM User Modeling

Combines discrete semantic information with continuous collaborative knowledge:

Xᵢ := P(Tᵢ, {Tⱼ|vⱼ ∈ V(uᵢ)})

Special tokens ⟨z_start⟩ and ⟨z_end⟩ mark the beginning and end of continuous token sequences.

3. Dispersive Diffusion Module

Conditional Diffusion Process:

Ldiff = E_{yᵢ, cᵢ, t} ||ε - ε_θ(yᵢᵗ, cᵢ, t)||₂²
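
A single-sample version of this objective can be sketched as follows. The forward noising step follows the standard DDPM form; the linear β schedule and its endpoints are assumptions made for illustration, and the noise predictor ε_θ is replaced by a placeholder because the summary does not describe its architecture.

```python
import math
import random

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(y0, eps, t, num_steps):
    """Forward process: y_t = sqrt(a_bar) * y0 + sqrt(1 - a_bar) * eps."""
    a = alpha_bar(t, num_steps)
    return [math.sqrt(a) * y + math.sqrt(1.0 - a) * e for y, e in zip(y0, eps)]

def diffusion_loss(eps, eps_pred):
    """Squared L2 distance between the true and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred))

rng = random.Random(0)
y0 = [0.3, -1.2, 0.8]                    # clean target token y_i (made-up values)
eps = [rng.gauss(0.0, 1.0) for _ in y0]  # noise drawn for this sample
t = rng.randint(1, 1000)                 # timestep sampled uniformly
y_t = add_noise(y0, eps, t, num_steps=1000)

# eps_theta(y_t, c_i, t) would be a network conditioned on the LLM output c_i;
# an all-zero prediction stands in for it here.
eps_pred = [0.0 for _ in y0]
loss = diffusion_loss(eps, eps_pred)
```

Averaging this quantity over samples, conditions, and timesteps gives the expectation in Ldiff.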

Dispersive Loss:

Ldisp = log E_{i,j}[exp(-D(hᵢ, hⱼ)/τ)]

This is a "contrastive loss without positive sample pairs" that encourages dispersed representations within a batch.
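
The formula can be evaluated directly on a small batch. The sketch below assumes that D is the squared Euclidean distance and that the expectation runs over all ordered pairs, including i = j; the summary does not specify either choice.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance, used here as the distance D (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dispersive_loss(h, tau=1.0):
    """log of the batch mean of exp(-D(h_i, h_j) / tau) over all pairs (i, j)."""
    n = len(h)
    total = sum(
        math.exp(-sq_dist(h[i], h[j]) / tau)
        for i in range(n)
        for j in range(n)
    )
    return math.log(total / (n * n))

# Three identical representations versus three well-separated ones.
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
spread = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]

loss_collapsed = dispersive_loss(collapsed)
loss_spread = dispersive_loss(spread)
```

A fully collapsed batch gives the maximum value of zero, and the loss decreases as representations move apart, so minimizing it pushes samples away from each other without requiring any positive pairs.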

Technical Innovations

  1. Continuous Tokenization: Completely avoids quantization operations, preserving information integrity
  2. Hybrid Retrieval Mechanism: Combines LLM text reasoning with implicit representations generated by diffusion
  3. End-to-End Optimization: Unified optimization objective integrating three loss functions
  4. Classifier-Free Guidance: Controls personalization intensity during inference
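
As a brief illustration of the last item, classifier-free guidance blends a conditional and an unconditional noise prediction at each denoising step. The sketch uses one common convention, ε_uncond + ω (ε_cond - ε_uncond); another widely used form is (1 + ω) ε_cond - ω ε_uncond, and the summary does not say which one the paper adopts.

```python
def guided_noise(eps_cond, eps_uncond, omega):
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the conditional one with strength omega."""
    return [u + omega * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Made-up predictions for a two-dimensional token.
eps_cond = [1.0, 2.0]
eps_uncond = [0.0, 1.0]

no_guidance = guided_noise(eps_cond, eps_uncond, omega=1.0)
stronger = guided_noise(eps_cond, eps_uncond, omega=2.0)
```

Under this convention ω = 1 reproduces the conditional prediction, ω = 0 ignores the condition, and larger values push the sample further toward the user-specific condition.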

Experimental Setup

Datasets

Four benchmark datasets are employed:

Dataset   Users    Items    Interactions   Avg Length   Density (%)
LastFM    1,091    3,685    52,670         48.3         1.31
ML1M      6,040    3,416    447,294        165.5        2.17
Beauty    22,363   12,101   278,641        8.9          0.07
Games     47,568   16,834   266,139        9.5          0.03

Evaluation Metrics

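  • HR@K (Hit Ratio): Fraction of test cases in which the ground-truth item appears among the top K recommendations
  • NDCG@K (Normalized Discounted Cumulative Gain): Position-aware score that gives more credit when the ground-truth item is ranked closer to the top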
  • K values set to 10 and 20
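
Both metrics are easy to compute per test case. The sketch below assumes a single held-out target item per user, which is the usual setup in sequential recommendation but is not stated explicitly in this summary; with one relevant item the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 1).

```python
import math

def hit_ratio_at_k(ranked_items, target, k):
    """1.0 if the target is among the first k ranked items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG for a single relevant item: 1 / log2(rank + 1), or 0.0 on a miss."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)

ranked = ["a", "b", "c", "d"]  # hypothetical recommendation list, best first
hr_hit = hit_ratio_at_k(ranked, "c", k=3)
ndcg_hit = ndcg_at_k(ranked, "c", k=3)
hr_miss = hit_ratio_at_k(ranked, "c", k=2)
ndcg_miss = ndcg_at_k(ranked, "c", k=2)
```

Dataset-level HR@K and NDCG@K are the averages of these per-user values.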

Baseline Methods

  • Traditional Sequential Recommendation: GRU4Rec, SASRec, SSD4Rec, DreamRec
  • LLM-Based Recommendation Systems: P5, CoLLM, TIGER, TokenRec, LLaRA

Implementation Details

  • Base Model: Llama-3.2-1B-Instruct
  • Optimizer: AdamW (learning rate 1e-5/1e-4)
  • Batch Size: 24
  • Maximum Sequence Length: 20
  • Diffusion Steps: 1000 for training, 100 for inference

Experimental Results

Main Results

ContRec achieves best performance across all datasets:

Dataset   Metric   Best Baseline   ContRec           Improvement
Beauty    HR@10    0.0442          0.0473 ± 0.0017   7.74%
Games     HR@10    0.1018          0.1041 ± 0.0036   8.66%
LastFM    HR@10    0.0525          0.0539 ± 0.0034   15.42%
ML1M      HR@10    0.1076          0.1099 ± 0.0066   15.20%

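Compared to TIGER, a representative discrete method, ContRec improves HR@10 by 11.76% and NDCG@10 by 10.11% on average. The four Improvement values above average to 11.76%, so that column appears to report gains over TIGER rather than over the Best Baseline column.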
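
The arithmetic behind these percentages can be checked with a few lines of Python, using only the numbers listed in the table. The helper name `relative_gain` is introduced here for illustration.

```python
def relative_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return 100.0 * (new - old) / old

# Improvement column as listed (percent).
listed_gain = {"Beauty": 7.74, "Games": 8.66, "LastFM": 15.42, "ML1M": 15.20}
mean_listed_gain = sum(listed_gain.values()) / len(listed_gain)

# HR@10 values from the Best Baseline and ContRec columns.
best_baseline = {"Beauty": 0.0442, "Games": 0.1018, "LastFM": 0.0525, "ML1M": 0.1076}
contrec = {"Beauty": 0.0473, "Games": 0.1041, "LastFM": 0.0539, "ML1M": 0.1099}

gain_over_best = {
    name: relative_gain(contrec[name], best_baseline[name]) for name in contrec
}
```

The mean of the listed gains is 11.755, consistent with the reported 11.76% average, while the gains computed against the Best Baseline column are smaller (roughly 7% on Beauty and 2 to 3% on the other three datasets).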

Ablation Study

Analysis of key component contributions:

Component             Beauty HR@10   ML1M HR@10   Impact
Full Model            0.0473         0.1099       -
w/o Diffusion         0.0431         0.1007       Significant Drop
w/o Dispersive Loss   0.0448         0.1042       Notable Drop
w/o σ                 0.0457         0.1051       Performance Decline
w/ VQ-VAE             0.0426         0.0974       Substantial Drop

Reconstruction Evaluation

On item embedding reconstruction tasks, continuous methods significantly outperform discrete methods:

  • Diffusion model achieves lowest reconstruction error
  • VAE outperforms various quantization methods (VQ-VAE, RQ-VAE, MQ-VAE)
  • Loss convergence is smoother

Hyperparameter Sensitivity

  • Masking Ratio ρ: Optimal at 0.2
  • Token Count K: Best performance with 3-4 tokens
  • Guidance Strength ω: Small values (ω=2) provide improvements
  • Weight Parameters: Optimal performance at γ₁=1, γ₂=0.5

Related Work

LLM-Based Recommendation Systems

  1. Discrete Tokenization: P5 unifies multi-task as text generation; TIGER/TokenRec employ vector quantization
  2. Continuous Projection: CoLLM/LLaRA directly project collaborative representations, suffering from discrete-continuous discrepancies

Diffusion Models and Continuous Tokens

  1. Image Generation: VAE-MAR, Next-Token Diffusion demonstrate potential of continuous tokens
  2. Multimodal Modeling: DEEM and others employ diffusion as "eyes" for LLMs
  3. Protein Modeling: DPLM and similar works show success in continuous structure embeddings

Conclusions and Discussion

Main Conclusions

  1. Advantages of Continuous Tokens Validated: Avoids quantization loss, achieving more precise representation learning
  2. Diffusion Models Applicable to Recommendation: Demonstrates strong capability in user preference modeling
  3. Hybrid Retrieval Mechanism Effective: Combines advantages of explicit reasoning and implicit representations
  4. End-to-End Optimization Feasible: Unified framework enables coordinated optimization of all components

Limitations

  1. Computational Overhead: LLM inference accounts for approximately 88.6% of total inference time
  2. User Preference Shifts: Limited adaptability to sudden preference changes
  3. Application Scenarios: Better suited for personalized conversational recommendation rather than large-scale online systems
  4. Data Dependency: Requires rich textual information about items

Future Directions

  1. Efficiency Optimization: Explore more efficient continuous token generation methods
  2. Dynamic Modeling: Enhance modeling of user preference evolution
  3. Multimodal Extension: Integrate multimodal information such as images and videos
  4. Theoretical Analysis: Deepen understanding of theoretical foundations of continuous tokens in recommendation

In-Depth Evaluation

Strengths

  1. Strong Novelty: First systematic introduction of continuous tokens into LLM recommendation systems
  2. Technical Rigor: Ingenious σ-VAE design effectively prevents representation collapse
  3. Comprehensive Experiments: Multi-dataset validation with detailed ablation and sensitivity analyses
  4. Theoretical Support: Clear mathematical derivation of the Dispersive Loss with sound design

Weaknesses

  1. Computational Efficiency: High inference latency limits practical application scenarios
  2. Generalization Capability: Limited performance in scenarios with sudden user preference changes
  3. Incomplete Comparisons: Lacks comparison with more recent LLM recommendation methods
  4. Insufficient Theoretical Analysis: Theoretical explanation of continuous token advantages needs deeper investigation

Impact

  1. Academic Contribution: Provides new technical pathways for LLM recommendation systems
  2. Practical Value: Shows good application prospects in conversational recommendation scenarios
  3. Reproducibility: Provides detailed implementation details and hyperparameter settings
  4. Inspirational Significance: Offers new perspectives on combining recommendation systems with generative AI

Applicable Scenarios

  1. Personalized Conversational Recommendation: Scenarios requiring explainability and interactivity
  2. Cold-Start Recommendation: Leveraging textual information for new users/items
  3. Cross-Domain Recommendation: Utilizing LLM generalization for domain transfer
  4. Research Prototype: Serves as foundational framework for exploring continuous token recommendation

References

This paper cites important works from recommendation systems, large language models, diffusion models, and related fields, including:

  • Classical Recommendation Algorithms: LightGCN, SASRec, etc.
  • LLM-Based Recommendation Systems: P5, TIGER, TokenRec, etc.
  • Diffusion Models: DDPM, Classifier-free Guidance, etc.
  • Continuous Tokenization: VAE-MAR, Next-Token Diffusion, etc.

Overall Assessment: This is an important work with significant innovation in the LLM recommendation systems domain. By introducing continuous tokenization and diffusion models, it effectively addresses limitations of existing methods. While there remains room for improvement in computational efficiency and applicability in certain scenarios, its technical innovations and experimental validation are sufficiently rigorous, providing valuable contributions to the field's development.