Diffusion Generative Recommendation with Continuous Tokens

Qu, Lin, Ding et al.

Recent advances in generative artificial intelligence, particularly large language models (LLMs), have opened new opportunities for enhancing recommender systems (RecSys). Most existing LLM-based RecSys approaches operate in a discrete space, using vector-quantized tokenizers to align with the inherent discrete nature of language models. However, these quantization methods often result in lossy tokenization and suboptimal learning, primarily due to inaccurate gradient propagation caused by the non-differentiable argmin operation in standard vector quantization. Inspired by the emerging trend of embracing continuous tokens in language models, we propose ContRec, a novel framework that seamlessly integrates continuous tokens into LLM-based RecSys. Specifically, ContRec consists of two key modules: a sigma-VAE Tokenizer, which encodes users/items with continuous tokens; and a Dispersive Diffusion module, which captures implicit user preference. The tokenizer is trained with a continuous Variational Auto-Encoder (VAE) objective, where three effective techniques are adopted to avoid representation collapse. By conditioning on the previously generated tokens of the LLM backbone during user modeling, the Dispersive Diffusion module performs a conditional diffusion process with a novel Dispersive Loss, enabling high-quality user preference generation through next-token diffusion. Finally, ContRec leverages both the textual reasoning output from the LLM and the latent representations produced by the diffusion model for Top-K item retrieval, thereby delivering comprehensive recommendation results. Extensive experiments on four datasets demonstrate that ContRec consistently outperforms both traditional and SOTA LLM-based recommender systems. Our results highlight the potential of continuous tokenization and generative modeling for advancing the next generation of recommender systems.

Basic Information

Paper ID: 2504.12007
Title: Diffusion Generative Recommendation with Continuous Tokens
Authors: Haohao Qu, Shanru Lin, Yujuan Ding, Yiqi Wang, Wenqi Fan
Classification: cs.IR cs.AI
Publication Date/Venue: arXiv Preprint (Revised October 10, 2025)
Paper Link: https://arxiv.org/abs/2504.12007

Abstract

This paper addresses the limitations of discrete tokenization in Large Language Model (LLM)-based recommendation systems by proposing ContRec, a framework that integrates continuous tokens into LLM-based recommendation. ContRec comprises two core modules: a σ-VAE tokenizer, which encodes users and items as continuous tokens, and a Dispersive Diffusion module, which captures implicit user preferences. For Top-K item retrieval, it combines the LLM's textual reasoning output with the latent representations generated by the diffusion model. Experiments on four datasets show that ContRec consistently outperforms both traditional and state-of-the-art LLM-based recommendation systems.

Research Background and Motivation

Problem Definition

Existing LLM-based recommendation systems face two critical challenges:

Lossy Tokenization: Vector quantization methods inevitably lose information during compression
Inaccurate Gradient Propagation: The non-differentiable argmin operation in standard vector quantization necessitates "straight-through" tricks, producing inaccurate gradients
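
Both problems can be illustrated in a few lines. The following is a minimal sketch, not code from the paper: the 2-D codebook values and the helper name `nearest_code` are hypothetical. It shows that two distinct inputs collapse onto the same code (lossy tokenization) and that the selection step is an argmin, which is piecewise constant in the input and therefore gives no useful gradient without a workaround such as the straight-through estimator.

```python
import math

def nearest_code(x, codebook):
    """Return (index, code) of the codebook vector closest to x in L2 distance."""
    dists = [math.dist(x, c) for c in codebook]
    # argmin is piecewise constant in x: its gradient is zero almost everywhere,
    # which is why VQ training falls back on straight-through style tricks.
    idx = dists.index(min(dists))
    return idx, codebook[idx]

# Hypothetical codebook with four entries (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

x_a = (0.8, 0.3)
x_b = (0.9, 0.2)

idx_a, q_a = nearest_code(x_a, codebook)
idx_b, q_b = nearest_code(x_b, codebook)

# Distance between each input and its quantized version: the information lost.
err_a = math.dist(x_a, q_a)
err_b = math.dist(x_b, q_b)

print(idx_a, idx_b, round(err_a, 4), round(err_b, 4))
```

Both inputs map to the same index, so any difference between them is invisible to a downstream model that only sees the discrete token.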

Research Significance

LLMs demonstrate strong generalization and in-context learning capabilities in recommendation systems
User and item sets often number in the millions, making traditional indexing methods inefficient
While quantization methods are practical, they limit both reconstruction quality and generation performance

Limitations of Existing Methods

Discrete Methods: Approaches like TIGER and UTGRec use VQ-VAE to construct discrete vocabularies, suffering from information compression loss
Continuous Projection Methods: Methods such as CoLLM and LLaRA employ continuous tokens only on the input side, while outputs still rely on discrete generators, creating a discrete-continuous discrepancy

Research Motivation

Inspired by the trend of embracing continuous tokens in language models, this work explores the potential of using continuous tokens and diffusion models in recommendation scenarios to achieve higher-quality user preference modeling.

Core Contributions

Proposes ContRec Framework: The first framework to seamlessly integrate continuous tokens into LLM recommendation systems, avoiding the limitations of quantization
Designs Two Key Modules:
- σ-VAE Tokenizer: A robust continuous tokenizer employing three techniques to prevent representation collapse
- Dispersive Diffusion Module: Generates implicit user preference representations through contrastive self-supervised learning
Introduces Dispersive Loss: A contrastive learning mechanism without explicit positive sample pairs
Experimental Validation: Achieves average improvements of 11.76% in HR@10 and 10.11% in NDCG@10 over TIGER across four datasets

Methodology Details

Task Definition

Given a user set U = {u₁, u₂, ..., uₙ} and item set V = {v₁, v₂, ..., vₘ}, the objective is to predict future user preferences by analyzing historical interactions, reformulating sequential recommendation as a language model paradigm:

Yᵢ = LLM(P(Tᵢ, {Tⱼ|vⱼ ∈ V(uᵢ)}))

Model Architecture

1. σ-VAE Tokenizer

Employs a VAE framework for non-quantized tokenization, incorporating three key techniques:

Masking Operation: Element-level masking strategy based on Bernoulli distribution

μₖ = Encₖ(Mask(x, ρ))

K-Path Encoder: Parallel encoding channels for implicit encoding

zₖ = μₖ + σₖ ⊙ ε, where ε ~ N(0,1), σₖ ~ N(0,Σ)

Gaussian Kernel: Prevents variance collapse

x̂ = Dec(Concat{zₖ}ᴷ)

Loss Function:

Lvae = ||x̂ - x||₂² + (β/K) ∑ₖ₌₁ᴷ ||μₖ||₂²
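
The tokenizer equations above can be traced end to end with a toy example. This is a sketch under stated assumptions, not the authors' implementation: the scaling "encoders" and the averaging "decoder" are hypothetical stand-ins for learned networks, and σₖ is assumed to be a single scalar per path drawn from a zero-mean Gaussian whose standard deviation (`sigma_scale`) is a fixed hyperparameter.

```python
import random

def mask(x, rho, rng):
    """Element-wise Bernoulli masking: zero each entry with probability rho."""
    return [0.0 if rng.random() < rho else v for v in x]

def reparameterize(mu, sigma_scale, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1) per element.

    Assumption: sigma is one scalar per path, drawn from N(0, sigma_scale^2),
    so the noise scale is sampled rather than predicted by the encoder.
    """
    sigma = rng.gauss(0.0, sigma_scale)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mu]

def vae_loss(x, x_hat, mus, beta):
    """Squared reconstruction error plus (beta / K) * sum_k ||mu_k||^2."""
    recon = sum((a - b) ** 2 for a, b in zip(x_hat, x))
    reg = sum(sum(m * m for m in mu) for mu in mus) / len(mus)
    return recon + beta * reg

# Hypothetical stand-ins for the learned networks: K = 3 scaling "encoders"
# and a "decoder" that averages the K chunks of the concatenated latent.
scales = (1.0, 0.5, 0.25)
encoders = [lambda v, s=s: [s * e for e in v] for s in scales]

def decode(z_concat, k_paths):
    d = len(z_concat) // k_paths
    return [sum(z_concat[k * d + i] for k in range(k_paths)) / k_paths
            for i in range(d)]

rng = random.Random(0)
x = [0.5, -1.0, 2.0, 0.0]
mus = [enc(mask(x, 0.2, rng)) for enc in encoders]   # mu_k = Enc_k(Mask(x, rho))
zs = [reparameterize(mu, 0.1, rng) for mu in mus]    # z_k = mu_k + sigma_k * eps
x_hat = decode([v for z in zs for v in z], len(zs))  # x_hat = Dec(Concat{z_k})
loss = vae_loss(x, x_hat, mus, beta=0.01)
print(loss)
```

Note that the regularizer only penalizes the norm of the means; the variance is never learned, which matches the loss as written above, where no KL term on σ appears.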

2. LLM User Modeling

Combines discrete semantic information with continuous collaborative knowledge:

Xᵢ := P(Tᵢ, {Tⱼ|vⱼ ∈ V(uᵢ)})

Special tokens ⟨z_start⟩ and ⟨z_end⟩ mark the beginning and end of continuous token sequences.

3. Dispersive Diffusion Module

Conditional Diffusion Process:

Ldiff = E_{yᵢ,cᵢ,t} ||ε - ε_θ(yᵢᵗ, cᵢ, t)||₂²
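
To make the noise-prediction objective concrete, here is a small sketch of one training-step computation. It assumes the standard DDPM forward process with a linear β schedule (1e-4 to 0.02 over 1000 steps); the summary does not state which schedule ContRec uses, and the zero-output predictor is only a placeholder for the conditional network ε_θ(yᵢᵗ, cᵢ, t).

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(y0, eps, a_bar):
    """Forward process: y_t = sqrt(a_bar) * y_0 + sqrt(1 - a_bar) * eps."""
    return [math.sqrt(a_bar) * y + math.sqrt(1.0 - a_bar) * e
            for y, e in zip(y0, eps)]

def diffusion_loss(eps, eps_pred):
    """Squared L2 error between the true and the predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred))

rng = random.Random(0)
a_bars = alpha_bars(1000)            # 1000 training steps, as in the setup
y0 = [0.3, -0.7, 1.2]                # clean target token y_i (toy values)
eps = [rng.gauss(0.0, 1.0) for _ in y0]
t = 500
y_t = add_noise(y0, eps, a_bars[t])  # noised token fed to the denoiser

# Placeholder for eps_theta(y_t, c_i, t): an untrained predictor that outputs
# zeros. A real model would also take the LLM condition c_i and the step t.
eps_pred = [0.0 for _ in y_t]
loss = diffusion_loss(eps, eps_pred)
print(loss)
```

In training, the expectation in Ldiff is approximated by averaging this quantity over sampled tokens, conditions, and time steps.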

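Dispersive Loss:

Ldisp = log E_{i,j}[exp(-D(hᵢ, hⱼ)/τ)]

Here hᵢ and hⱼ are the representations of samples i and j in a batch, D is a distance function, and τ is a temperature. This is a "contrastive loss without positive sample pairs": it contains only a repulsive term, so minimizing it encourages dispersed representations within a batch.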
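
A direct transcription of the formula shows the intended behavior. In this sketch, D is assumed to be the squared Euclidean distance and the expectation is taken over all ordered pairs in the batch, including i = j; both choices are assumptions, since the summary does not specify them.

```python
import math

def dispersive_loss(batch, tau=1.0):
    """log of the batch mean of exp(-D(h_i, h_j) / tau), with D = squared L2."""
    n = len(batch)
    total = 0.0
    for h_i in batch:
        for h_j in batch:
            d = sum((a - b) ** 2 for a, b in zip(h_i, h_j))
            total += math.exp(-d / tau)
    return math.log(total / (n * n))

# Toy batches of 2-D representations (illustrative values only).
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]

loss_clustered = dispersive_loss(clustered)
loss_spread = dispersive_loss(spread)
print(loss_clustered, loss_spread)
```

The spread-out batch yields the lower loss, and a fully collapsed batch (all points identical) gives the maximum value of 0, which is the behavior the loss is meant to penalize.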

Technical Innovations

Continuous Tokenization: Completely avoids quantization operations, preserving information integrity
Hybrid Retrieval Mechanism: Combines LLM text reasoning with implicit representations generated by diffusion
End-to-End Optimization: Unified optimization objective integrating three loss functions
Classifier-Free Guidance: Controls personalization intensity during inference
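
Classifier-free guidance can be sketched as a blend of two noise predictions. The form below follows the common convention ε̃ = (1 + ω)·ε_cond − ω·ε_uncond; whether ContRec uses this exact parameterization of ω is an assumption, and the function name `guided_noise` is hypothetical.

```python
def guided_noise(eps_cond, eps_uncond, omega):
    """Combine conditional and unconditional noise predictions.

    omega = 0 recovers the purely conditional prediction; larger values push
    the estimate further toward the condition (stronger personalization).
    """
    return [(1.0 + omega) * c - omega * u for c, u in zip(eps_cond, eps_uncond)]

# Toy predictions (illustrative values only).
eps_cond = [1.0, 0.0]
eps_uncond = [0.5, 0.5]

no_guidance = guided_noise(eps_cond, eps_uncond, omega=0.0)
with_guidance = guided_noise(eps_cond, eps_uncond, omega=2.0)
print(no_guidance, with_guidance)
```

At inference this combination would be applied at every denoising step, which is why ω acts as a knob on how strongly the generated preference follows the user-specific condition.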

Experimental Setup

Datasets

Four benchmark datasets are employed:

Dataset	Users	Items	Interactions	Avg Length	Density (%)
LastFM	1,091	3,685	52,670	48.3	1.31
ML1M	6,040	3,416	447,294	165.5	2.17
Beauty	22,363	12,101	278,641	8.9	0.07
Games	47,568	16,834	266,139	9.5	0.03

Evaluation Metrics

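HR@K (Hit Ratio): Fraction of test cases in which the ground-truth item appears among the top K recommendations
NDCG@K (Normalized Discounted Cumulative Gain): Position-aware ranking quality that gives more credit to ground-truth items ranked nearer the top
K is set to 10 and 20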
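
Both metrics are straightforward to compute when each test case has a single ground-truth item, which is the usual next-item setting in sequential recommendation. The sketch below assumes that setting (so the ideal DCG is 1); the function names and toy data are illustrative.

```python
import math

def hit_ratio_at_k(ranked_lists, targets, k):
    """Fraction of cases whose target appears in the top-k ranked items."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def ndcg_at_k(ranked_lists, targets, k):
    """Mean of 1 / log2(rank + 1) over cases, with rank counted from 1.

    With one relevant item per case the ideal DCG equals 1, so DCG is
    already normalized.
    """
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            position = top.index(t)              # 0-based position
            total += 1.0 / math.log2(position + 2)
    return total / len(targets)

# Three toy users: hit at rank 1, hit at rank 2, and a miss.
ranked_lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["a", "e", "z"]

hr = hit_ratio_at_k(ranked_lists, targets, k=2)
ndcg = ndcg_at_k(ranked_lists, targets, k=2)
print(hr, ndcg)
```

HR@K treats the two hits identically, while NDCG@K gives the rank-2 hit a credit of 1/log2(3), which is why the two metrics can rank models differently.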

Baseline Methods

Traditional Sequential Recommendation: GRU4Rec, SASRec, SSD4Rec, DreamRec
LLM-Based Recommendation Systems: P5, CoLLM, TIGER, TokenRec, LLaRA

Implementation Details

Base Model: Llama-3.2-1B-Instruct
Optimizer: AdamW (learning rate 1e-5/1e-4)
Batch Size: 24
Maximum Sequence Length: 20
Diffusion Steps: 1000 for training, 100 for inference

Experimental Results

Main Results

ContRec achieves the best performance across all datasets:

Dataset	Metric	Best Baseline	ContRec	Improvement (vs. TIGER)
Beauty	HR@10	0.0442	0.0473±0.0017	7.74%
Games	HR@10	0.1018	0.1041±0.0036	8.66%
LastFM	HR@10	0.0525	0.0539±0.0034	15.42%
ML1M	HR@10	0.1076	0.1099±0.0066	15.20%

Compared to TIGER (a typical discrete method), ContRec achieves average improvements of 11.76% in HR@10 and 10.11% in NDCG@10. The improvement column is measured against TIGER rather than against the best baseline; its four HR@10 entries average to 11.76%.
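
The figures in the table can be cross-checked with simple arithmetic. The sketch below uses only the numbers listed above: it recomputes the relative gain of ContRec over the best-baseline column and averages the listed improvement percentages. The gains over the best baseline come out smaller than the listed percentages, while the mean of the listed percentages reproduces the 11.76% quoted for TIGER.

```python
def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base

# (best baseline HR@10, ContRec HR@10, listed improvement in percent)
rows = {
    "Beauty": (0.0442, 0.0473, 7.74),
    "Games": (0.1018, 0.1041, 8.66),
    "LastFM": (0.0525, 0.0539, 15.42),
    "ML1M": (0.1076, 0.1099, 15.20),
}

gain_vs_best = {name: relative_gain(ours, best)
                for name, (best, ours, _) in rows.items()}
mean_listed = sum(listed for _, _, listed in rows.values()) / len(rows)

for name, gain in gain_vs_best.items():
    print(f"{name}: {gain:.2f}% over best baseline, {rows[name][2]:.2f}% listed")
print(f"mean of listed improvements: {mean_listed:.3f}%")
```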

Ablation Study

Analysis of key component contributions:

Component	Beauty HR@10	ML1M HR@10	Impact
Full Model	0.0473	0.1099	-
w/o Diffusion	0.0431	0.1007	Significant Drop
w/o Dispersive Loss	0.0448	0.1042	Notable Drop
w/o σ	0.0457	0.1051	Performance Decline
w/ VQ-VAE	0.0426	0.0974	Substantial Drop

Reconstruction Evaluation

On item embedding reconstruction tasks, continuous methods significantly outperform discrete methods:

Diffusion model achieves lowest reconstruction error
VAE outperforms various quantization methods (VQ-VAE, RQ-VAE, MQ-VAE)
Loss convergence is smoother

Hyperparameter Sensitivity

Masking Ratio ρ: Optimal at 0.2
Token Count K: Best performance with 3-4 tokens
Guidance Strength ω: Small values (ω=2) provide improvements
Weight Parameters: Optimal performance at γ₁=1, γ₂=0.5

Related Work

LLM-Based Recommendation Systems

Discrete Tokenization: P5 unifies multiple tasks as text generation; TIGER and TokenRec employ vector quantization
Continuous Projection: CoLLM and LLaRA directly project collaborative representations into the LLM input, but suffer from discrete-continuous discrepancies

Diffusion Models and Continuous Tokens

Image Generation: VAE-MAR and Next-Token Diffusion demonstrate the potential of continuous tokens
Multimodal Modeling: DEEM and others employ diffusion as "eyes" for LLMs
Protein Modeling: DPLM and similar works show success with continuous structure embeddings

Conclusions and Discussion

Main Conclusions

Advantages of Continuous Tokens Validated: Avoids quantization loss, achieving more precise representation learning
Diffusion Models Applicable to Recommendation: Demonstrates strong capability in user preference modeling
Hybrid Retrieval Mechanism Effective: Combines advantages of explicit reasoning and implicit representations
End-to-End Optimization Feasible: Unified framework enables coordinated optimization of all components

Limitations

Computational Overhead: LLM inference accounts for approximately 88.6% of total inference time
User Preference Shifts: Limited adaptability to sudden preference changes
Application Scenarios: Better suited to personalized conversational recommendation than to large-scale online systems
Data Dependency: Requires rich textual information about items

Future Directions

Efficiency Optimization: Explore more efficient continuous token generation methods
Dynamic Modeling: Enhance modeling of user preference evolution
Multimodal Extension: Integrate multimodal information such as images and videos
Theoretical Analysis: Deepen understanding of theoretical foundations of continuous tokens in recommendation

In-Depth Evaluation

Strengths

Strong Novelty: First systematic introduction of continuous tokens into LLM recommendation systems
Technical Rigor: Ingenious σ-VAE design effectively prevents representation collapse
Comprehensive Experiments: Multi-dataset validation with detailed ablation and sensitivity analyses
Theoretical Support: Clear mathematical formulation of the Dispersive Loss with sound design

Weaknesses

Computational Efficiency: High inference latency limits practical application scenarios
Generalization Capability: Limited performance in scenarios with sudden user preference changes
Incomplete Comparisons: Lacks comparison with more recent LLM recommendation methods
Insufficient Theoretical Analysis: Theoretical explanation of continuous token advantages needs deeper investigation

Impact

Academic Contribution: Provides new technical pathways for LLM recommendation systems
Practical Value: Shows good application prospects in conversational recommendation scenarios
Reproducibility: Provides detailed implementation details and hyperparameter settings
Inspirational Significance: Offers new perspectives on combining recommendation systems with generative AI

Applicable Scenarios

Personalized Conversational Recommendation: Scenarios requiring explainability and interactivity
Cold-Start Recommendation: Leveraging textual information for new users/items
Cross-Domain Recommendation: Utilizing LLM generalization for domain transfer
Research Prototype: Serves as foundational framework for exploring continuous token recommendation

References

This paper cites important works from recommendation systems, large language models, diffusion models, and related fields, including:

Classical Recommendation Algorithms: LightGCN, SASRec, etc.
LLM-Based Recommendation Systems: P5, TIGER, TokenRec, etc.
Diffusion Models: DDPM, Classifier-free Guidance, etc.
Continuous Tokenization: VAE-MAR, Next-Token Diffusion, etc.

Overall Assessment: This is an important work with significant innovation in the LLM recommendation systems domain. By introducing continuous tokenization and diffusion models, it effectively addresses limitations of existing methods. While there remains room for improvement in computational efficiency and applicability in certain scenarios, its technical innovations and experimental validation are sufficiently rigorous, providing valuable contributions to the field's development.