2025-11-22T10:40:16.215584

What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context

Ouyang, Wen, Zhang et al.

Basic Information

  • Paper ID: 2506.02261
  • Title: What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context
  • Authors: Zhongyu Ouyang, Qianlong Wen, Chunhui Zhang, Yanfang Ye, Soroush Vosoughi
  • Institutions: Dartmouth College, University of Notre Dame
  • Classification: cs.IR, cs.LG
  • Publication Date: October 10, 2025 (arXiv v2)
  • Paper Link: https://arxiv.org/abs/2506.02261v2

Abstract

Sequential recommendation systems aspire to profile users by interpreting their interaction histories, echoing how humans make decisions by weighing experience, relative preference strength, and situational relevance. Yet, existing large language model (LLM)-based recommenders often fall short of mimicking the flexible, context-aware decision strategies humans exhibit, neglecting the structured, dynamic, and context-aware mechanisms fundamental to human behaviors. To bridge this gap, we propose RecPO, a preference optimization framework that models structured feedback and contextual delay to emulate human-like prioritization in sequential recommendation. RecPO exploits adaptive reward margins based on inferred preference hierarchies and temporal signals, enabling the model to favor immediately relevant items and to distinguish between varying degrees of preference and aversion. Extensive experiments across five real-world datasets demonstrate that RecPO not only yields performance gains over state-of-the-art baselines, but also mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.

Research Background and Motivation

Problem Definition

Existing LLM-based sequential recommendation systems suffer from the following key limitations:

  1. Binary Preference Modeling: Existing methods such as DPO and its variants process all preferences through binary pairwise comparisons, neglecting the heterogeneity of preference intensity
  2. Missing Temporal Context: Lack of temporal sensitivity modeling, failing to distinguish between immediate and delayed satisfaction
  3. Overlooking Human Decision Mechanisms: Failure to emulate the complex mechanisms by which humans weigh experience, relative preference strength, and situational relevance in decision-making

Research Motivation

Human decision-making behavior exhibits hierarchical preferences (strong affinity vs. mild preference) and temporal sensitivity (immediate vs. delayed satisfaction), characteristics well-established in behavioral economics and cognitive science but largely overlooked in current LLM recommendation preference alignment. Through systematic empirical investigation, the authors discover that integrating comprehensive feedback (including negative interactions) and structured preference signals (such as ratings) significantly enhances performance.

Core Insights

Through proof-of-concept experiments, the authors identify two critical factors:

  • Preference Intensity: The hierarchical strength of user affinity or aversion
  • Temporal Context: The immediacy of satisfaction

Core Contributions

  1. Theoretical Contribution: Systematically demonstrates that preference intensity and temporal context are critical factors for fine-grained preference modeling in LLM recommendation systems, challenging the existing binary preference paradigm
  2. Methodological Contribution: Proposes the RecPO framework that integrates these factors through adaptive reward margins based on preference intensity and temporal context
  3. Empirical Contribution: Experiments across five datasets show that RecPO not only improves accuracy but also exhibits behavioral characteristics aligned with human preferences: prioritizing timely satisfaction and maintaining preference consistency under changing contexts

Methodology Details

Task Definition

Given user $u$'s interaction history $H_u^t$ at time $t$ and a candidate item set $C = \{i^{(j)}\}_{j=1}^K$, where $H_u^t \cap C = \emptyset$ and $i_p^{t+1} \in C$, the model $\pi_\theta$ must predict the item $i_p^{t+1}$ that the user is most likely to prefer.
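A small data container makes the task definition concrete. The sketch below is a minimal illustration that assumes items are identified by strings. The class name `RecInstance` and its field names are hypothetical and do not come from the paper's code. The class only encodes the two constraints stated above: the history and the candidate set are disjoint, and the ground-truth next item is one of the $K$ candidates.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RecInstance:
    """One next-item prediction instance (hypothetical structure, not the authors' code)."""

    history: Tuple[str, ...]     # H_u^t: items user u interacted with up to time t
    candidates: Tuple[str, ...]  # C = {i^(j)}, j = 1..K
    target: str                  # i_p^{t+1}: the item the user prefers next

    def __post_init__(self) -> None:
        # Constraint 1: H_u^t and C must be disjoint.
        overlap = set(self.history) & set(self.candidates)
        if overlap:
            raise ValueError(f"history and candidates overlap: {sorted(overlap)}")
        # Constraint 2: the ground-truth next item must be one of the K candidates.
        if self.target not in self.candidates:
            raise ValueError("target item must belong to the candidate set")

    @property
    def k(self) -> int:
        """Size K of the candidate set."""
        return len(self.candidates)


# Usage: a valid instance with K = 4 candidates (item names are made up).
example = RecInstance(
    history=("Heat", "Alien"),
    candidates=("Fargo", "Se7en", "Big", "Up"),
    target="Se7en",
)
```

In LLM-based recommenders, such an instance is typically rendered as a text prompt. The model then has to name exactly one of the candidates.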

Core Method: RecPO Framework

1. Adaptive Reward Margin

The core innovation of RecPO lies in defining an adaptive target reward margin $\gamma_r$, determined dynamically by structured preferences and relative recency:

$$\gamma_r = \lambda \frac{\phi(s_p, \Delta t_p)}{\phi(s_d, \Delta t_d)}$$

where:

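  • $s_p, s_d$ are the structured preference scores (e.g., ratings) for the preferred and dispreferred items, respectively
  • $\Delta t_p = t_p^+ - t$ is the temporal delay between the current time $t$ and the interaction time $t_p^+$ of the preferred item; $\Delta t_d$ is defined analogously for the dispreferred item
  • $\phi(s, \Delta t) = s / (\Delta t)^{0.5}$ is the utility function, which increases with the preference score and decreases with the delay
  • $\lambda$ controls the magnitude of the margin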
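The margin can be computed directly from the definitions above. This minimal sketch assumes that preference scores are positive (for example, 1 to 5 star ratings) and that delays are strictly positive, so the square root is defined. How scores are derived for the implicit-feedback datasets is not specified here. The function names are hypothetical. The default $\lambda = 2$ mirrors the value listed under Implementation Details.

```python
import math


def utility(score: float, delay: float) -> float:
    """phi(s, dt) = s / dt ** 0.5: rises with the score, decays with the delay."""
    if delay <= 0:
        raise ValueError("delay must be strictly positive")
    return score / math.sqrt(delay)


def adaptive_margin(s_p: float, dt_p: float, s_d: float, dt_d: float,
                    lam: float = 2.0) -> float:
    """gamma_r = lam * phi(s_p, dt_p) / phi(s_d, dt_d)."""
    phi_d = utility(s_d, dt_d)
    if phi_d == 0:
        raise ValueError("dispreferred utility must be non-zero")
    return lam * utility(s_p, dt_p) / phi_d


# A 5-star item consumed next (dt = 1) vs a 2-star item four steps away:
print(adaptive_margin(5, 1, 2, 4))  # phi_p = 5.0, phi_d = 1.0 -> 10.0
# The same 2-star item at delay 1 has higher utility, so the margin shrinks:
print(adaptive_margin(5, 1, 2, 1))  # phi_d = 2.0 -> 5.0
```

The margin is a ratio. It grows when the preferred item is highly rated and immediate, and it shrinks when the dispreferred item is close to it in rating or time. A uniform margin, as in SimPO, would assign the same value to both pairs.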

2. Preference Distribution Modeling

Based on the Bradley-Terry model, RecPO models preference probability as:

$$P^*(y_p \succ y_d \mid x_u) = \sigma\left(r(x_u, y_p) - r(x_u, y_d) - \gamma_r\right)$$
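The formula shows what the margin does: the preference probability exceeds 0.5 only once the reward gap $r(x_u, y_p) - r(x_u, y_d)$ is larger than $\gamma_r$. This minimal sketch takes the rewards as plain floats. The function name is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_prob(r_p: float, r_d: float, gamma_r: float) -> float:
    """Bradley-Terry probability with margin: sigma(r_p - r_d - gamma_r)."""
    return sigmoid(r_p - r_d - gamma_r)


print(preference_prob(3.0, 1.0, 0.0))  # gap 2, no margin -> ~0.881
print(preference_prob(3.0, 1.0, 2.0))  # gap 2, margin 2  -> 0.5
```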

3. Objective Function

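Adopting the Plackett-Luce model to generalize pairwise comparisons to a list-level ranking framework, the final objective function is:

$$L(\pi_\theta; \pi_{ref}) = -E_{(x_u, y_p, T_d) \sim D}\left[\log \sigma\left(-\log \sum_{y_d \in T_d} \exp\left(\beta \log \frac{\pi_\theta(y_d \mid x_u)}{\pi_{ref}(y_d \mid x_u)} - \beta \log \frac{\pi_\theta(y_p \mid x_u)}{\pi_{ref}(y_p \mid x_u)} + \lambda \frac{\phi(s_p, \Delta t_p)}{\phi(s_d, \Delta t_d)}\right)\right)\right]$$

Here $T_d$ is the set of dispreferred items for the user context $x_u$. The coefficient $\beta$ scales the log-ratio between the policy $\pi_\theta$ and the reference model $\pi_{ref}$. With a single dispreferred item, the objective reduces to $-\log \sigma\left(r(x_u, y_p) - r(x_u, y_d) - \gamma_r\right)$ with $r(x_u, y) = \beta \log \frac{\pi_\theta(y \mid x_u)}{\pi_{ref}(y \mid x_u)}$, which matches the preference model above.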
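The objective can be sketched in plain Python for a single training triple $(x_u, y_p, T_d)$. In practice, the log-probabilities would come from the policy LLM and the frozen reference LLM, and the loss would be averaged over a batch. This is an illustration, not the authors' implementation. The function names are hypothetical, and sequence log-probabilities are assumed to be precomputed. Each dispreferred item $y_d$ carries its own margin because $\phi(s_d, \Delta t_d)$ differs per item. The margin enters so that, with one dispreferred item, the loss equals $-\log \sigma\left(r(x_u, y_p) - r(x_u, y_d) - \gamma_r\right)$, matching the Bradley-Terry form.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigma(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def logsumexp(xs: Sequence[float]) -> float:
    """Stable log(sum(exp(x))) via the max-shift trick."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def recpo_loss(logp_pos: float, ref_logp_pos: float,
               logp_negs: Sequence[float], ref_logp_negs: Sequence[float],
               margins: Sequence[float], beta: float) -> float:
    """Per-instance loss; margins[k] is gamma_r for the k-th dispreferred item."""
    if not (len(logp_negs) == len(ref_logp_negs) == len(margins) > 0):
        raise ValueError("need one reference log-prob and one margin per negative")
    r_p = beta * (logp_pos - ref_logp_pos)          # implicit reward of y_p
    terms = [beta * (lp - rlp) - r_p + g            # r_d - r_p + gamma_r per y_d
             for lp, rlp, g in zip(logp_negs, ref_logp_negs, margins)]
    return -log_sigmoid(-logsumexp(terms))


# One negative: r_p = 1, r_d = -1, gamma_r = 2, so gap minus margin is 0 and loss = log 2.
print(recpo_loss(-1.0, -2.0, [-3.0], [-2.0], [2.0], beta=1.0))
```

For fixed rewards, adding dispreferred items or enlarging their margins increases the loss. This pressure pushes the policy to separate highly rated, timely items from the rest by a wider reward gap.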

Technical Innovations

  1. Non-uniform Margin Design: Unlike prior work using uniform margins, RecPO dynamically adjusts margins based on preference intensity and temporal distance
  2. Comprehensive Feedback Utilization: Preserves complete interaction sequences including negative feedback combined with explicit ratings
  3. Human Cognition Alignment: Preference modeling mechanism designed based on cognitive science principles

Experimental Setup

Datasets

Five real-world sequential recommendation datasets are employed:

  • Explicit Feedback Datasets: MovieLens-1M, Amazon-Books, BeerAdvocate
  • Implicit Feedback Datasets: Steam, LastFM

Dataset | Sequences | Items | Interactions
MovieLens | 6,040 | 3,952 | 994,169
Amazon-Books | 5,103 | 38,203 | 62,290
Steam | 3,171 | 4,251 | 82,072
BeerAdvocate | 4,724 | 6,105 | 91,207
LastFM | 982 | 107,296 | 307,829

Evaluation Metrics

  • Hit Ratio@1 (HR@1): The proportion of test instances in which the model's single recommended item is the ground-truth next item
  • Valid Ratio: Measures instruction-following capability as the proportion of outputs that conform to the required format
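Both metrics reduce to simple counts. This minimal sketch assumes each LLM output has already been parsed into a single item title, or `None` when parsing fails. It treats an output as valid when it names an item from that instance's candidate set. This validity rule and the function names are assumptions, not the paper's exact definition.

```python
from typing import Optional, Sequence


def hit_ratio_at_1(preds: Sequence[Optional[str]], targets: Sequence[str]) -> float:
    """Share of instances whose single predicted item is the ground-truth item."""
    if len(preds) != len(targets) or not targets:
        raise ValueError("need equally many, non-zero predictions and targets")
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def valid_ratio(preds: Sequence[Optional[str]],
                candidate_sets: Sequence[Sequence[str]]) -> float:
    """Share of outputs that name an item from the instance's candidate set."""
    if len(preds) != len(candidate_sets) or not preds:
        raise ValueError("need equally many, non-zero predictions and candidate sets")
    return sum(p is not None and p in c for p, c in zip(preds, candidate_sets)) / len(preds)


preds = ["A", "B", None, "D"]          # None = unparseable output
targets = ["A", "C", "C", "D"]
cands = [["A", "B"], ["B", "C"], ["C", "D"], ["D", "E"]]
print(hit_ratio_at_1(preds, targets))  # 2 hits out of 4 -> 0.5
print(valid_ratio(preds, cands))       # 3 valid out of 4 -> 0.75
```

Under this convention, an invalid output can never equal the target, so it counts as a miss in HR@1.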

Baseline Methods

  • Traditional Methods: GRU4Rec, Caser, SASRec
  • LLM Methods: DPO, SimPO, S-DPO
  • Base Models: LLaMA3-8B, Qwen2.5-7B

Implementation Details

  • Learning rate: 1e-5, Optimizer: AdamW
  • Batch size: 128, Sequence length: Adjusted per dataset
  • Number of negative samples: 3, Margin parameter λ: 2
  • Hardware: 8× NVIDIA A100 (80GB)

Experimental Results

Main Results

RecPO achieves the best HR@1 on all five datasets; representative baselines are shown below:

Model | MovieLens HR@1 | Amazon-Books HR@1 | BeerAdvocate HR@1 | Steam HR@1 | LastFM HR@1
SASRec | 0.2671 | 0.1559 | 0.3800 | 0.4587 | 0.6659
S-DPO | 0.2902 | 0.5065 | 0.4698 | 0.3588 | 0.5719
RecPO | 0.3451 | 0.5802 | 0.5771 | 0.4672 | 0.6830
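Relative terms make these gains easier to compare. The snippet below only re-derives relative HR@1 improvements from the three rows above and introduces no new results. The helper name is hypothetical.

```python
HR1 = {
    "SASRec": {"MovieLens": 0.2671, "Amazon-Books": 0.1559, "BeerAdvocate": 0.3800,
               "Steam": 0.4587, "LastFM": 0.6659},
    "S-DPO": {"MovieLens": 0.2902, "Amazon-Books": 0.5065, "BeerAdvocate": 0.4698,
              "Steam": 0.3588, "LastFM": 0.5719},
    "RecPO": {"MovieLens": 0.3451, "Amazon-Books": 0.5802, "BeerAdvocate": 0.5771,
              "Steam": 0.4672, "LastFM": 0.6830},
}


def relative_gains(model: str, baseline: str) -> dict:
    """(model - baseline) / baseline for every dataset, as a fraction."""
    return {ds: (HR1[model][ds] - HR1[baseline][ds]) / HR1[baseline][ds]
            for ds in HR1[model]}


for base in ("SASRec", "S-DPO"):
    for ds, g in relative_gains("RecPO", base).items():
        print(f"RecPO vs {base:<6} on {ds:<12}: {g:+.1%}")
```

Computed this way, RecPO's improvement over S-DPO ranges from roughly 15% (Amazon-Books) to 30% (Steam). Its improvement over SASRec on Steam and LastFM is under 3%. On these implicit-feedback datasets, the traditional sequential baseline therefore remains competitive.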

Key Findings

  1. Importance of Comprehensive Feedback: Retaining negative interactions outperforms using only positive feedback
  2. Value of Structured Signals: Adding rating information significantly improves performance
  3. Factor Complementarity: Optimal performance derives from the combination of comprehensive feedback and structured signals

Ablation Study

Ablation studies on the form of the margin function (HR@1) show:

Dataset | Log Diff | Log Ratio | RecPO (Ratio)
MovieLens | 0.3160 | 0.3247 | 0.3451
Amazon-Books | 0.5370 | 0.5455 | 0.5802

The plain ratio margin adopted by RecPO achieves the best performance across all datasets.

Human Alignment Behavior Analysis

RecPO exhibits human-aligned behavior across four key dimensions:

  1. Temporal Context Sensitivity: Better prioritizes temporally appropriate items when candidate sets contain future high-rated items
  2. Preference Intensity Awareness: Avoids recommending tempting items that ultimately receive low ratings
  3. Implicit Aversion Modeling: Identifies disliked items without explicit aversion labels
  4. Cross-context Robustness: Maintains stable performance across varying interaction history lengths

Related Work

Sequential Recommendation

Early methods such as GRU4Rec employ recurrent neural networks, while SASRec introduces self-attention mechanisms. Recent approaches integrate graph structures, contrastive learning, and other techniques.

LLM-based Recommendation Systems

Methods like LLaRA and TALLRec integrate LLMs into recommendation systems, primarily focusing on semantic understanding rather than fine-grained factors in preference modeling.

LLM Alignment Techniques

From RLHF to DPO and its variants (IPO, CPO, KTO, SimPO), these methods primarily target general NLP tasks, with S-DPO being the first to adapt alignment techniques to recommendation tasks.

Conclusions and Discussion

Main Conclusions

  1. Preference intensity and temporal context are overlooked yet critical factors in LLM recommendation systems
  2. RecPO effectively integrates these factors through adaptive reward margins, achieving both performance improvements and human behavior alignment
  3. The method demonstrates consistent improvements across both explicit and implicit feedback datasets

Limitations

  1. Simplified Preference Structure: Adopts a simplified sequential preference structure
  2. Single Contextual Factor: Considers only satisfaction delay as a contextual factor
  3. Evaluation Metric Limitations: Relies primarily on single-number metrics such as HR@1, which fail to capture broader behavioral patterns

Future Directions

  1. Complex Preference Hierarchy Modeling: Explore more sophisticated cognitively-grounded preference structures
  2. Enriched Contextual Factors: Integrate additional contextual influences
  3. Comprehensive Evaluation Framework: Develop more comprehensive behavior-oriented evaluation metrics

In-Depth Evaluation

Strengths

  1. Precise Problem Identification: Clearly identifies core issues with existing methods (binary preference modeling)
  2. Well-Designed Methodology: Adaptive margin mechanism grounded in cognitive science principles provides solid theoretical foundation
  3. Comprehensive Experimental Design: Complete experimental framework including proof-of-concept, main experiments, ablation studies, and behavioral analysis
  4. Convincing Results: Consistent improvements across multiple datasets, together with the human behavior alignment analysis, strengthen the paper's persuasiveness

Weaknesses

  1. Insufficient Theoretical Analysis: Lacks in-depth theoretical analysis of why this margin design is effective
  2. Computational Complexity Undiscussed: Does not analyze computational overhead compared to baseline methods
  3. Limited Hyperparameter Sensitivity Analysis: The sensitivity analysis for the critical parameter λ is relatively simple
  4. Limited Evidence of Generalization: Validated primarily on specific types of recommendation tasks; generalization to other settings remains to be verified

Impact

  1. Academic Contribution: Provides new research directions and theoretical frameworks for LLM recommendation system research
  2. Practical Value: Offers directly applicable improvement methods; open-source code enhances reproducibility
  3. Inspirational Significance: Emphasizes the importance of cognitive science principles in AI system design

Applicable Scenarios

  1. Sequential Recommendation Systems: Particularly suitable for scenarios with clear temporal sequences and rating information
  2. Personalization Applications: Appropriate for personalized services requiring fine-grained preference modeling
  3. Multimodal Recommendation: Framework design exhibits extensibility, adaptable to multimodal recommendation tasks

References

This paper cites important works from multiple domains including recommendation systems, LLM alignment, and cognitive science, including:

  • Classical recommendation methods: GRU4Rec, SASRec, Caser
  • LLM alignment techniques: DPO, RLHF, SimPO
  • Cognitive science foundations: Astington & Jenkins (1995) research on human decision-making mechanisms

Overall Assessment: This is a high-quality research paper demonstrating excellence in theoretical contributions, methodological innovation, and experimental validation. The paper successfully identifies and addresses key challenges in LLM recommendation systems, proposing the RecPO framework with solid theoretical grounding and practical value. Despite some limitations, its contributions to recommendation systems and LLM alignment research are significant.