What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context

Ouyang, Wen, Zhang et al.

Sequential recommendation systems aspire to profile users by interpreting their interaction histories, echoing how humans make decisions by weighing experience, relative preference strength, and situational relevance. Yet, existing large language model (LLM)-based recommenders often fall short of mimicking the flexible, context-aware decision strategies humans exhibit, neglecting the structured, dynamic, and context-aware mechanisms fundamental to human behaviors. To bridge this gap, we propose RecPO, a preference optimization framework that models structured feedback and contextual delay to emulate human-like prioritization in sequential recommendation. RecPO exploits adaptive reward margins based on inferred preference hierarchies and temporal signals, enabling the model to favor immediately relevant items and to distinguish between varying degrees of preference and aversion. Extensive experiments across five real-world datasets demonstrate that RecPO not only yields performance gains over state-of-the-art baselines, but also mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.

Basic Information

Paper ID: 2506.02261
Title: What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context
Authors: Zhongyu Ouyang, Qianlong Wen, Chunhui Zhang, Yanfang Ye, Soroush Vosoughi
Institutions: Dartmouth College, University of Notre Dame
Classification: cs.IR, cs.LG
Publication Date: October 10, 2025 (arXiv v2)
Paper Link: https://arxiv.org/abs/2506.02261v2

Research Background and Motivation

Problem Definition

Existing LLM-based sequential recommendation systems suffer from the following key limitations:

Binary Preference Modeling: Existing methods such as DPO and its variants process all preferences through binary pairwise comparisons, neglecting the heterogeneity of preference intensity
Missing Temporal Context: Lack of temporal sensitivity modeling, failing to distinguish between immediate and delayed satisfaction
Overlooking Human Decision Mechanisms: Failure to emulate the complex mechanisms by which humans weigh experience, relative preference strength, and situational relevance in decision-making

Research Motivation

Human decision-making behavior exhibits hierarchical preferences (strong affinity vs. mild preference) and temporal sensitivity (immediate vs. delayed satisfaction), characteristics well-established in behavioral economics and cognitive science but largely overlooked in current LLM recommendation preference alignment. Through systematic empirical investigation, the authors discover that integrating comprehensive feedback (including negative interactions) and structured preference signals (such as ratings) significantly enhances performance.

Core Insights

Through proof-of-concept experiments, the authors identify two critical factors:

Preference Intensity: The hierarchical strength of user affinity or aversion
Temporal Context: The immediacy of satisfaction

Core Contributions

Theoretical Contribution: Systematically demonstrates that preference intensity and temporal context are critical factors for fine-grained preference modeling in LLM recommendation systems, challenging the existing binary preference paradigm
Methodological Contribution: Proposes the RecPO framework that integrates these factors through adaptive reward margins based on preference intensity and temporal context
Empirical Contribution: Experiments across five datasets show that RecPO not only improves accuracy but also exhibits behavioral characteristics aligned with human preferences: prioritizing timely satisfaction and maintaining preference consistency under changing contexts

Methodology Details

Task Definition

Given user $u$'s interaction history $H_u^t$ at time $t$ and a candidate item set $C = \{i^{(j)}\}_{j=1}^K$, where $H_u^t \cap C = \emptyset$ and $i_p^{t+1} \in C$, the model $\pi_\theta$ must predict the item $i_p^{t+1}$ that the user is most likely to prefer next.
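
To make the task setup concrete, the following minimal sketch represents one prediction instance and checks the two constraints stated above (history and candidates are disjoint, and the target is among the candidates). The class name `RecInstance`, its field names, and the toy item titles are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecInstance:
    """One next-item prediction instance (hypothetical structure)."""
    history: tuple      # H_u^t: items the user interacted with up to time t
    candidates: tuple   # C: the K candidate items shown to the model
    target: str         # i_p^{t+1}: the item the user actually prefers next

    def __post_init__(self):
        # Constraint 1: the history and the candidate set must be disjoint.
        overlap = set(self.history) & set(self.candidates)
        if overlap:
            raise ValueError(f"history and candidates overlap: {sorted(overlap)}")
        # Constraint 2: the ground-truth next item must be a candidate.
        if self.target not in self.candidates:
            raise ValueError("target must be one of the candidates")


# Toy instance with made-up item names.
instance = RecInstance(
    history=("item_a", "item_b"),
    candidates=("item_c", "item_d", "item_e"),
    target="item_c",
)
```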

Core Method: RecPO Framework

1. Adaptive Reward Margin

The core innovation of RecPO lies in defining an adaptive target reward margin $\gamma_r$ dynamically determined by structured preferences and relative recency:

$\gamma_r = \lambda \frac{\phi(s_p, \Delta t_p)}{\phi(s_d, \Delta t_d)}$

where:

$s_p, s_d$ are the structured preference scores for preferred and non-preferred items respectively
$\Delta t_p = t_p^+ - t$ represents the temporal delay of the preferred item's interaction; $\Delta t_d$ is defined analogously for the non-preferred item
$\phi(s, \Delta t) = s/(\Delta t)^{0.5}$ is the utility function
$\lambda$ controls the magnitude of the margin
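
As a worked illustration of the margin formula, the sketch below computes $\phi$ and $\gamma_r$ for two toy pairs. The function names, the 1 to 5 rating scale, and the delay values are assumptions chosen for illustration; $\lambda = 2$ follows the value listed under Implementation Details. The sketch assumes strictly positive delays and a nonzero score for the non-preferred item.

```python
import math


def utility(score: float, delay: float) -> float:
    """phi(s, dt) = s / sqrt(dt). Assumes delay > 0."""
    if delay <= 0:
        raise ValueError("delay must be positive")
    return score / math.sqrt(delay)


def adaptive_margin(s_p, dt_p, s_d, dt_d, lam=2.0):
    """gamma_r = lam * phi(s_p, dt_p) / phi(s_d, dt_d)."""
    return lam * utility(s_p, dt_p) / utility(s_d, dt_d)


# Toy values (assumed): ratings on a 1-5 scale, delays counted in steps.
# Strongly liked, immediate item vs. disliked, delayed item: phi_p = 5, phi_d = 0.5.
strong_gap = adaptive_margin(s_p=5, dt_p=1, s_d=1, dt_d=4)
# Two similarly rated, equally immediate items: phi_p = 4, phi_d = 3.
weak_gap = adaptive_margin(s_p=4, dt_p=1, s_d=3, dt_d=1)

print(strong_gap, weak_gap)
```

Under these toy inputs the first pair receives a much larger margin than the second, which is the intended effect: the required reward gap grows with the difference in preference intensity and immediacy.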

2. Preference Distribution Modeling

Based on the Bradley-Terry model, RecPO models preference probability as:

$P^*(y_p \succ y_d | x_u) = \sigma(r(x_u, y_p) - r(x_u, y_d) - \gamma_r)$
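
The effect of the margin on the modeled preference probability can be seen in a small numerical sketch. The function names and the reward values are illustrative assumptions; only the formula $\sigma(r_p - r_d - \gamma_r)$ comes from the text above.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_prob(r_p: float, r_d: float, margin: float) -> float:
    """P(y_p preferred over y_d | x_u) = sigmoid(r_p - r_d - margin)."""
    return sigmoid(r_p - r_d - margin)


# With a fixed reward gap of 3.0, a larger margin lowers the modeled probability,
# so the model has to widen the gap to reach the same confidence.
p_no_margin = preference_prob(r_p=4.0, r_d=1.0, margin=0.0)
p_with_margin = preference_prob(r_p=4.0, r_d=1.0, margin=3.0)
```

When the margin equals the reward gap, the probability is exactly 0.5, which shows that $\gamma_r$ acts as the minimum gap the model must exceed before the preferred item is favored.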

3. Objective Function

Adopting the Plackett-Luce model to generalize pairwise comparisons to a list-level ranking framework, the final objective function is:

$L(\pi_\theta; \pi_{ref}) = -E_{(x_u,y_p,T_d)\sim D}\left[\log \sigma\left(-\log \sum_{y_d \in T_d} \exp\left(\beta \log \frac{\pi_\theta(y_d|x_u)}{\pi_{ref}(y_d|x_u)} - \beta \log \frac{\pi_\theta(y_p|x_u)}{\pi_{ref}(y_p|x_u)} + \lambda \frac{\phi(s_p,\Delta t_p)}{\phi(s_d,\Delta t_d)}\right)\right)\right]$
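
The following is a minimal per-example sketch of a multi-negative loss of this form, written with the standard library only. It is not the authors' implementation. It assumes the implicit reward $r = \beta \log(\pi_\theta / \pi_{ref})$ and lets the margin enter with the sign implied by the pairwise form $\sigma(r_p - r_d - \gamma_r)$, so that with a single negative the loss reduces to $-\log \sigma(r_p - r_d - \gamma_r)$. The function name, argument names, and the default `beta=1.0` are placeholders, since the summary does not state a value for $\beta$.

```python
import math


def recpo_loss(logp_pos, ref_logp_pos, logp_negs, ref_logp_negs, margins, beta=1.0):
    """Per-example loss with several negatives and one margin per negative.

    logp_* are sequence log-probabilities under the policy, ref_logp_* under the
    frozen reference model; margins[k] is gamma_r for the k-th negative.
    """
    if not logp_negs or not (len(logp_negs) == len(ref_logp_negs) == len(margins)):
        raise ValueError("need at least one negative and matching list lengths")
    r_pos = beta * (logp_pos - ref_logp_pos)  # implicit reward of y_p
    # One term per negative: r_d - r_p + gamma_r
    terms = [
        beta * (lp - ref_lp) - r_pos + gamma
        for lp, ref_lp, gamma in zip(logp_negs, ref_logp_negs, margins)
    ]
    # log-sum-exp over negatives, shifted by the max for numerical stability
    m = max(terms)
    lse = m + math.log(sum(math.exp(t - m) for t in terms))
    # -log(sigmoid(-lse)) equals softplus(lse) = log(1 + exp(lse))
    return max(lse, 0.0) + math.log1p(math.exp(-abs(lse)))


# Toy log-probabilities (assumed values).
loss_one = recpo_loss(-1.0, -2.0, [-3.0], [-2.0], [0.5])
loss_two = recpo_loss(-1.0, -2.0, [-3.0, -2.5], [-2.0, -2.0], [0.5, 1.0])
```

Adding a second negative, or raising any margin, increases the loss for the same log-probabilities, which pushes the policy to separate the preferred item further from the negatives.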

Technical Innovations

Non-uniform Margin Design: Unlike prior work using uniform margins, RecPO dynamically adjusts margins based on preference intensity and temporal distance
Comprehensive Feedback Utilization: Preserves complete interaction sequences including negative feedback combined with explicit ratings
Human Cognition Alignment: Preference modeling mechanism designed based on cognitive science principles

Experimental Setup

Datasets

Five real-world sequential recommendation datasets are employed:

Explicit Feedback Datasets: MovieLens-1M, Amazon-Books, BeerAdvocate
Implicit Feedback Datasets: Steam, LastFM

Dataset	Sequences	Items	Interactions
MovieLens	6,040	3,952	994,169
Amazon-Books	5,103	38,203	62,290
Steam	3,171	4,251	82,072
BeerAdvocate	4,724	6,105	91,207
LastFM	982	107,296	307,829

Evaluation Metrics

Hit Ratio@1 (HR@1): Measures the proportion of test cases in which the model's single top recommendation matches the ground-truth next item
Valid Ratio: Assesses instruction-following capability, quantifying outputs conforming to format requirements
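
Both metrics can be sketched in a few lines. The function names and toy data are illustrative, and the validity check is an assumption: here an output counts as valid if it exactly matches one item in the candidate set, whereas the paper's actual format check may differ.

```python
def hit_ratio_at_1(predictions, targets):
    """Fraction of test cases whose top-1 prediction equals the ground-truth item."""
    hits = sum(pred == tgt for pred, tgt in zip(predictions, targets))
    return hits / len(targets)


def valid_ratio(predictions, candidate_sets):
    """Fraction of outputs that name an item from the given candidate set.

    Assumption: "valid" means an exact match with one candidate.
    """
    valid = sum(pred in cands for pred, cands in zip(predictions, candidate_sets))
    return valid / len(predictions)


# Toy data (made up for illustration).
preds = ["A", "B", "not an item", "D"]
targets = ["A", "C", "C", "D"]
cands = [{"A", "B"}, {"B", "C"}, {"C", "D"}, {"D", "E"}]

hr = hit_ratio_at_1(preds, targets)  # 2 hits out of 4 cases
vr = valid_ratio(preds, cands)       # 3 valid outputs out of 4 cases
```

The second prediction is valid but wrong and the third is neither valid nor correct, which is why the two metrics are reported separately.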

Baseline Methods

Traditional Methods: GRU4Rec, Caser, SASRec
LLM Methods: DPO, SimPO, S-DPO
Base Models: LLaMA3-8B, Qwen2.5-7B

Implementation Details

Learning rate: 1e-5, Optimizer: AdamW
Batch size: 128, Sequence length: Adjusted per dataset
Number of negative samples: 3, Margin parameter λ: 2
Hardware: 8×NVIDIA A100 (80GB)

Experimental Results

Main Results

RecPO achieves the best performance across all five datasets:

Model	MovieLens HR@1	Amazon-Books HR@1	BeerAdvocate HR@1	Steam HR@1	LastFM HR@1
SASRec	0.2671	0.1559	0.3800	0.4587	0.6659
S-DPO	0.2902	0.5065	0.4698	0.3588	0.5719
RecPO	0.3451	0.5802	0.5771	0.4672	0.6830
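
To quantify the gaps in the table above, the sketch below computes RecPO's relative HR@1 gain over the stronger of the two baselines listed in each column. The numbers are copied from the table; the helper name `relative_gain` is illustrative, and baselines not shown in this summary are not considered.

```python
# HR@1 values copied from the table above.
hr1 = {
    "MovieLens":    {"SASRec": 0.2671, "S-DPO": 0.2902, "RecPO": 0.3451},
    "Amazon-Books": {"SASRec": 0.1559, "S-DPO": 0.5065, "RecPO": 0.5802},
    "BeerAdvocate": {"SASRec": 0.3800, "S-DPO": 0.4698, "RecPO": 0.5771},
    "Steam":        {"SASRec": 0.4587, "S-DPO": 0.3588, "RecPO": 0.4672},
    "LastFM":       {"SASRec": 0.6659, "S-DPO": 0.5719, "RecPO": 0.6830},
}


def relative_gain(row, method="RecPO"):
    """Relative improvement of `method` over the best other entry in the row."""
    best_baseline = max(v for k, v in row.items() if k != method)
    return (row[method] - best_baseline) / best_baseline


gains = {name: relative_gain(row) for name, row in hr1.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:+.1%}")
```

On these figures the relative gains are roughly 15% to 23% on the three explicit-feedback datasets and about 2% to 3% on Steam and LastFM, where SASRec rather than S-DPO is the stronger listed baseline.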

Key Findings

Importance of Comprehensive Feedback: Retaining negative interactions outperforms using only positive feedback
Value of Structured Signals: Adding rating information significantly improves performance
Factor Complementarity: Optimal performance derives from the combination of comprehensive feedback and structured signals

Ablation Study

Ablation studies on margin functions show:

Dataset	Log Diff	Log Ratio	RecPO (Ratio)
MovieLens	0.3160	0.3247	0.3451
Amazon-Books	0.5370	0.5455	0.5802

RecPO's ratio-based margin achieves the best performance across all datasets, ahead of both the log-difference and log-ratio variants (two datasets are shown above).

Human Alignment Behavior Analysis

RecPO exhibits human-aligned behavior across four key dimensions:

Temporal Context Sensitivity: Better prioritizes temporally appropriate items when candidate sets contain future high-rated items
Preference Intensity Awareness: Avoids recommending tempting items that ultimately receive low ratings
Implicit Aversion Modeling: Identifies disliked items without explicit aversion labels
Cross-context Robustness: Maintains stable performance across varying interaction history lengths

Related Work

Sequential Recommendation

Early methods such as GRU4Rec employ recurrent neural networks, while SASRec introduces self-attention mechanisms. Recent approaches integrate graph structures, contrastive learning, and other techniques.

LLM-based Recommendation Systems

Methods like LLaRA and TALLRec integrate LLMs into recommendation systems, primarily focusing on semantic understanding rather than fine-grained factors in preference modeling.

LLM Alignment Techniques

Alignment methods, from RLHF to DPO and its variants (IPO, CPO, KTO, SimPO), primarily target general NLP tasks; S-DPO is described as the first to adapt alignment techniques to recommendation tasks.

Conclusions and Discussion

Main Conclusions

Preference intensity and temporal context are overlooked yet critical factors in LLM recommendation systems
RecPO effectively integrates these factors through adaptive reward margins, achieving both performance improvements and human behavior alignment
The method demonstrates consistent improvements across both explicit and implicit feedback datasets

Limitations

Simplified Preference Structure: Adopts a simplified sequential preference structure
Single Contextual Factor: Considers only satisfaction delay as a contextual factor
Evaluation Metric Limitations: Primarily relies on single metrics, failing to capture more comprehensive behavioral patterns

Future Directions

Complex Preference Hierarchy Modeling: Explore more sophisticated cognitively-grounded preference structures
Enriched Contextual Factors: Integrate additional contextual influences
Comprehensive Evaluation Framework: Develop more comprehensive behavior-oriented evaluation metrics

In-Depth Evaluation

Strengths

Precise Problem Identification: Clearly identifies core issues with existing methods (binary preference modeling)
Well-Designed Methodology: Adaptive margin mechanism grounded in cognitive science principles provides solid theoretical foundation
Comprehensive Experimental Design: Complete experimental framework including proof-of-concept, main experiments, ablation studies, and behavioral analysis
Convincing Results: Consistent improvements across multiple datasets, together with the human behavior alignment analysis, make the results persuasive

Weaknesses

Insufficient Theoretical Analysis: Lacks in-depth theoretical analysis of why this margin design is effective
Computational Complexity Undiscussed: Does not analyze computational overhead compared to baseline methods
Limited Hyperparameter Sensitivity Analysis: The sensitivity analysis for the critical parameter λ is relatively simple
Limited Generalization Capability: Primarily validated on specific types of recommendation tasks; generalization remains to be verified

Impact

Academic Contribution: Provides new research directions and theoretical frameworks for LLM recommendation system research
Practical Value: Offers directly applicable improvement methods; open-source code enhances reproducibility
Inspirational Significance: Emphasizes the importance of cognitive science principles in AI system design

Applicable Scenarios

Sequential Recommendation Systems: Particularly suitable for scenarios with clear temporal sequences and rating information
Personalization Applications: Appropriate for personalized services requiring fine-grained preference modeling
Multimodal Recommendation: Framework design exhibits extensibility, adaptable to multimodal recommendation tasks

References

This paper cites important works from recommendation systems, LLM alignment, and cognitive science, including:

Classical recommendation methods: GRU4Rec, SASRec, Caser
LLM alignment techniques: DPO, RLHF, SimPO
Cognitive science foundations: Astington & Jenkins (1995) research on human decision-making mechanisms

Overall Assessment: This is a high-quality research paper demonstrating excellence in theoretical contributions, methodological innovation, and experimental validation. The paper successfully identifies and addresses key challenges in LLM recommendation systems, proposing the RecPO framework with solid theoretical grounding and practical value. Despite some limitations, its contributions to recommendation systems and LLM alignment research are significant.