Sequential recommendation systems aspire to profile users by interpreting their interaction histories, echoing how humans make decisions by weighing experience, relative preference strength, and situational relevance. Yet, existing large language model (LLM)-based recommenders often fall short of mimicking the flexible, context-aware decision strategies humans exhibit, neglecting the structured, dynamic, and context-aware mechanisms fundamental to human behaviors. To bridge this gap, we propose RecPO, a preference optimization framework that models structured feedback and contextual delay to emulate human-like prioritization in sequential recommendation. RecPO exploits adaptive reward margins based on inferred preference hierarchies and temporal signals, enabling the model to favor immediately relevant items and to distinguish between varying degrees of preference and aversion. Extensive experiments across five real-world datasets demonstrate that RecPO not only yields performance gains over state-of-the-art baselines, but also mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.
What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context
- Paper ID: 2506.02261
- Title: What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context
- Authors: Zhongyu Ouyang, Qianlong Wen, Chunhui Zhang, Yanfang Ye, Soroush Vosoughi
- Institutions: Dartmouth College, University of Notre Dame
- Classification: cs.IR, cs.LG
- Publication Date: October 10, 2025 (arXiv v2)
- Paper Link: https://arxiv.org/abs/2506.02261v2
Existing LLM-based sequential recommendation systems suffer from the following key limitations:
- Binary Preference Modeling: Existing methods such as DPO and its variants process all preferences through binary pairwise comparisons, neglecting the heterogeneity of preference intensity
- Missing Temporal Context: Lack of temporal sensitivity modeling, failing to distinguish between immediate and delayed satisfaction
- Overlooking Human Decision Mechanisms: Failure to emulate the complex mechanisms by which humans weigh experience, relative preference strength, and situational relevance in decision-making
Human decision-making behavior exhibits hierarchical preferences (strong affinity vs. mild preference) and temporal sensitivity (immediate vs. delayed satisfaction), characteristics well-established in behavioral economics and cognitive science but largely overlooked in current LLM recommendation preference alignment. Through systematic empirical investigation, the authors discover that integrating comprehensive feedback (including negative interactions) and structured preference signals (such as ratings) significantly enhances performance.
Through proof-of-concept experiments, the authors identify two critical factors:
- Preference Intensity: The hierarchical strength of user affinity or aversion
- Temporal Context: The immediacy of satisfaction
- Theoretical Contribution: Systematically demonstrates that preference intensity and temporal context are critical factors for fine-grained preference modeling in LLM recommendation systems, challenging the existing binary preference paradigm
- Methodological Contribution: Proposes the RecPO framework that integrates these factors through adaptive reward margins based on preference intensity and temporal context
- Empirical Contribution: Experiments across five datasets show that RecPO not only improves accuracy but also exhibits behavioral characteristics aligned with human preferences: prioritizing timely satisfaction and maintaining preference consistency under changing contexts
Given user $u$'s interaction history $H_u^t$ at time $t$ and a candidate item set $C = \{i^{(j)}\}_{j=1}^{K}$, where $H_u^t \cap C = \emptyset$ and $i_p^{t+1} \in C$, the model $\pi_\theta$ needs to predict the item $i_p^{t+1}$ the user is most likely to prefer.
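The task setup above can be sketched as a small helper that assembles one candidate set. This is only an illustration: the function name `build_candidate_set`, the item identifiers, and the choice to sample distractors uniformly at random are assumptions, not details taken from the paper.

```python
import random

def build_candidate_set(history, next_item, item_pool, k, seed=0):
    """Return K candidates: the ground-truth next item plus K-1 distractors.

    Distractors come only from items outside the history, so the
    candidate set and the history stay disjoint (H ∩ C = ∅).
    """
    rng = random.Random(seed)
    seen = set(history)
    if next_item in seen:
        raise ValueError("next item must not appear in the history")
    # Hypothetical sampling rule: uniform over unseen items.
    pool = sorted(i for i in set(item_pool) if i not in seen and i != next_item)
    if len(pool) < k - 1:
        raise ValueError("not enough unseen items to fill the candidate set")
    candidates = rng.sample(pool, k - 1) + [next_item]
    rng.shuffle(candidates)
    return candidates

history = ["item_3", "item_7", "item_1"]
item_pool = [f"item_{n}" for n in range(20)]
candidates = build_candidate_set(history, "item_9", item_pool, k=5)
```

The model would then be prompted with `history` and `candidates` and asked to name the single item it expects the user to prefer next.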
The core innovation of RecPO lies in defining an adaptive target reward margin $\gamma_r$ dynamically determined by structured preferences and relative recency:
$$\gamma_r = \lambda \, \frac{\phi(s_p, \Delta t_p)}{\phi(s_d, \Delta t_d)}$$
where:
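- $s_p, s_d$ are the structured preference scores for preferred and non-preferred items respectively
- $\Delta t_p = t_p^{+} - t$ represents the temporal delay of interaction, with $\Delta t_d$ defined analogously for the non-preferred item
- $\phi(s, \Delta t) = s / (\Delta t)^{0.5}$ is the utility function
- $\lambda$ controls the magnitude of the margin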
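The margin definition above can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are illustrative, scores and delays are assumed to be strictly positive, and the default `lam=2.0` mirrors the margin parameter reported in the implementation details.

```python
import math

def utility(score, delay):
    """phi(s, dt) = s / sqrt(dt): a score discounted by its interaction delay."""
    if score <= 0 or delay <= 0:
        # Assumption of this sketch: positive scores and delays only.
        raise ValueError("score and delay must be positive")
    return score / math.sqrt(delay)

def adaptive_margin(s_p, dt_p, s_d, dt_d, lam=2.0):
    """gamma_r = lam * phi(s_p, dt_p) / phi(s_d, dt_d)."""
    return lam * utility(s_p, dt_p) / utility(s_d, dt_d)

# Same ratings, but the preferred item is consumed sooner in the first case.
margin_near = adaptive_margin(s_p=5, dt_p=1, s_d=2, dt_d=1)
margin_far = adaptive_margin(s_p=5, dt_p=4, s_d=2, dt_d=1)
```

With these inputs the margin shrinks as the preferred item moves further into the future, which is the intended effect of the recency discount: strongly preferred, immediately relevant items demand a larger reward gap.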
Based on the Bradley-Terry model, RecPO models preference probability as:
$$P^{*}(y_p \succ y_d \mid x_u) = \sigma\big(r(x_u, y_p) - r(x_u, y_d) - \gamma_r\big)$$
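A short numerical sketch shows how the margin shifts this probability. The reward values are made up for illustration, and the function names are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_probability(r_p, r_d, gamma_r):
    """Bradley-Terry probability with a target margin: sigma(r_p - r_d - gamma_r)."""
    return sigmoid(r_p - r_d - gamma_r)

# Identical reward gap, evaluated with and without a margin.
p_no_margin = preference_probability(r_p=1.0, r_d=0.0, gamma_r=0.0)
p_with_margin = preference_probability(r_p=1.0, r_d=0.0, gamma_r=1.0)
```

When the reward gap only equals the margin, the modeled probability falls to 0.5, so the policy has to exceed the margin before the preferred item is confidently ranked first.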
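Adopting the Plackett-Luce model to generalize pairwise comparisons to a list-level ranking framework over a set $T_d$ of non-preferred items, and taking the implicit reward $r(x_u, y) = \beta \log \frac{\pi_\theta(y \mid x_u)}{\pi_{\mathrm{ref}}(y \mid x_u)}$, the final objective function is:
$$\mathcal{L}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x_u, y_p, T_d) \sim \mathcal{D}} \left[ \log \sigma \left( -\log \sum_{y_d \in T_d} \exp \left( \beta \log \frac{\pi_\theta(y_d \mid x_u)}{\pi_{\mathrm{ref}}(y_d \mid x_u)} - \beta \log \frac{\pi_\theta(y_p \mid x_u)}{\pi_{\mathrm{ref}}(y_p \mid x_u)} + \lambda \frac{\phi(s_p, \Delta t_p)}{\phi(s_d, \Delta t_d)} \right) \right) \right]$$
With a single non-preferred item, the argument of $\sigma$ reduces to $r(x_u, y_p) - r(x_u, y_d) - \gamma_r$, matching the pairwise form; this is why the margin enters the exponent with a positive sign.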
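The list-wise objective can be sketched for a single training example in plain Python. This is not the authors' implementation: the function names and input layout are hypothetical, sequence log-probabilities are assumed to be precomputed, and the margin is added inside the exponent so that the one-negative case reduces to the pairwise form $\sigma(r_p - r_d - \gamma_r)$.

```python
import math

def implicit_reward(logp_policy, logp_ref, beta):
    """beta * log(pi_theta / pi_ref), from sequence log-probabilities."""
    return beta * (logp_policy - logp_ref)

def recpo_example_loss(pos, negs, beta=1.0):
    """Loss for one (x_u, y_p, T_d) triple.

    pos  : (logp_policy, logp_ref) of the preferred item
    negs : list of (logp_policy, logp_ref, gamma_r), one per non-preferred item
    """
    r_p = implicit_reward(pos[0], pos[1], beta)
    exponents = [
        implicit_reward(lp, lr, beta) - r_p + gamma for lp, lr, gamma in negs
    ]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(exponents)
    lse = m + math.log(sum(math.exp(e - m) for e in exponents))
    # -log(sigmoid(-lse)) equals softplus(lse) = log(1 + exp(lse)).
    return max(lse, 0.0) + math.log1p(math.exp(-abs(lse)))

# Three negatives per positive, each with its own adaptive margin.
loss = recpo_example_loss(
    pos=(-2.0, -2.5),
    negs=[(-3.0, -2.8, 1.0), (-3.5, -3.0, 0.5), (-2.9, -3.1, 2.0)],
)
```

In a real training loop the log-probabilities would come from the policy and the frozen reference model, and the per-example losses would be averaged over a batch.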
- Non-uniform Margin Design: Unlike prior work using uniform margins, RecPO dynamically adjusts margins based on preference intensity and temporal distance
- Comprehensive Feedback Utilization: Preserves complete interaction sequences including negative feedback combined with explicit ratings
- Human Cognition Alignment: Preference modeling mechanism designed based on cognitive science principles
Five real-world sequential recommendation datasets are employed:
- Explicit Feedback Datasets: MovieLens-1M, Amazon-Books, BeerAdvocate
- Implicit Feedback Datasets: Steam, LastFM
| Dataset | Sequences | Items | Interactions |
|---|---|---|---|
| MovieLens | 6,040 | 3,952 | 994,169 |
| Amazon-Books | 5,103 | 38,203 | 62,290 |
| Steam | 3,171 | 4,251 | 82,072 |
| BeerAdvocate | 4,724 | 6,105 | 91,207 |
| LastFM | 982 | 107,296 | 307,829 |
- Hit Ratio@1: Measures the proportion of test cases in which the top-1 recommendation matches the ground-truth next item
- Valid Ratio: Assesses instruction-following capability, quantifying outputs conforming to format requirements
- Traditional Methods: GRU4Rec, Caser, SASRec
- LLM Methods: DPO, SimPO, S-DPO
- Base Models: LLaMA3-8B, Qwen2.5-7B
- Learning rate: 1e-5, Optimizer: AdamW
- Batch size: 128, Sequence length: Adjusted per dataset
- Number of negative samples: 3, Margin parameter λ: 2
- Hardware: 8× NVIDIA A100 (80GB)
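The two evaluation metrics listed above, Hit Ratio@1 and Valid Ratio, can be sketched as follows. The notion of validity used here (the output names an item from its candidate set) is an assumption of this sketch; the paper's exact format check may differ.

```python
def hit_ratio_at_1(predictions, targets):
    """Fraction of cases where the single predicted item is the true next item."""
    hits = sum(1 for p, t in zip(predictions, targets) if p == t)
    return hits / len(targets)

def valid_ratio(predictions, candidate_sets):
    """Fraction of outputs that name an item from the given candidate set."""
    valid = sum(1 for p, c in zip(predictions, candidate_sets) if p in c)
    return valid / len(predictions)

# Toy data: the third output names an item outside its candidate set.
predictions = ["A", "B", "Z"]
targets = ["A", "C", "D"]
candidate_sets = [{"A", "B"}, {"B", "C"}, {"D", "E"}]

hr1 = hit_ratio_at_1(predictions, targets)
vr = valid_ratio(predictions, candidate_sets)
```

An invalid output can never count as a hit, so a low Valid Ratio caps the achievable Hit Ratio@1.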
RecPO achieves the best performance across all five datasets:
| Model | MovieLens HR@1 | Amazon-Books HR@1 | BeerAdvocate HR@1 | Steam HR@1 | LastFM HR@1 |
|---|---|---|---|---|---|
| SASRec | 0.2671 | 0.1559 | 0.3800 | 0.4587 | 0.6659 |
| S-DPO | 0.2902 | 0.5065 | 0.4698 | 0.3588 | 0.5719 |
| RecPO | 0.3451 | 0.5802 | 0.5771 | 0.4672 | 0.6830 |
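To put the table in relative terms, the gain of RecPO over S-DPO can be computed directly from the HR@1 values above. The snippet only rearranges numbers already in the table; the helper name `relative_gain` is illustrative.

```python
# HR@1 values copied from the results table.
hr1 = {
    "MovieLens": {"S-DPO": 0.2902, "RecPO": 0.3451},
    "Amazon-Books": {"S-DPO": 0.5065, "RecPO": 0.5802},
    "BeerAdvocate": {"S-DPO": 0.4698, "RecPO": 0.5771},
    "Steam": {"S-DPO": 0.3588, "RecPO": 0.4672},
    "LastFM": {"S-DPO": 0.5719, "RecPO": 0.6830},
}

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction."""
    return (new - old) / old

gains = {name: relative_gain(v["RecPO"], v["S-DPO"]) for name, v in hr1.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.1%}")
```

On these numbers the relative improvement over S-DPO ranges from roughly 15% (Amazon-Books) to roughly 30% (Steam).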
- Importance of Comprehensive Feedback: Retaining negative interactions outperforms using only positive feedback
- Value of Structured Signals: Adding rating information significantly improves performance
- Factor Complementarity: Optimal performance derives from the combination of comprehensive feedback and structured signals
Ablation studies on margin functions show:
| Dataset | Log Diff | Log Ratio | RecPO (Ratio) |
|---|---|---|---|
| MovieLens | 0.3160 | 0.3247 | 0.3451 |
| Amazon-Books | 0.5370 | 0.5455 | 0.5802 |
Ratio-based margin functions achieve the best performance across all datasets.
RecPO exhibits human-aligned behavior across four key dimensions:
- Temporal Context Sensitivity: Better prioritizes temporally appropriate items when candidate sets contain future high-rated items
- Preference Intensity Awareness: Avoids recommending tempting items that ultimately receive low ratings
- Implicit Aversion Modeling: Identifies disliked items without explicit aversion labels
- Cross-context Robustness: Maintains stable performance across varying interaction history lengths
Early methods such as GRU4Rec employ recurrent neural networks, while SASRec introduces self-attention mechanisms. Recent approaches integrate graph structures, contrastive learning, and other techniques.
Methods like LLaRA and TALLRec integrate LLMs into recommendation systems, primarily focusing on semantic understanding rather than fine-grained factors in preference modeling.
From RLHF to DPO and its variants (IPO, CPO, KTO, SimPO), these methods primarily target general NLP tasks, with S-DPO being the first to adapt alignment techniques to recommendation tasks.
- Preference intensity and temporal context are overlooked yet critical factors in LLM recommendation systems
- RecPO effectively integrates these factors through adaptive reward margins, achieving both performance improvements and human behavior alignment
- The method demonstrates consistent improvements across both explicit and implicit feedback datasets
- Simplified Preference Structure: Adopts a simplified sequential preference structure
- Single Contextual Factor: Considers only satisfaction delay as a contextual factor
- Evaluation Metric Limitations: Primarily relies on single metrics, failing to capture more comprehensive behavioral patterns
- Complex Preference Hierarchy Modeling: Explore more sophisticated cognitively-grounded preference structures
- Enriched Contextual Factors: Integrate additional contextual influences
- Comprehensive Evaluation Framework: Develop more comprehensive behavior-oriented evaluation metrics
- Precise Problem Identification: Clearly identifies core issues with existing methods (binary preference modeling)
- Well-Designed Methodology: Adaptive margin mechanism grounded in cognitive science principles provides solid theoretical foundation
- Comprehensive Experimental Design: Complete experimental framework including proof-of-concept, main experiments, ablation studies, and behavioral analysis
- Convincing Results: Consistent improvements across multiple datasets, together with the human behavior alignment analysis, make the findings persuasive
- Insufficient Theoretical Analysis: Lacks in-depth theoretical analysis of why this margin design is effective
- Computational Complexity Undiscussed: Does not analyze computational overhead compared to baseline methods
- Limited Hyperparameter Sensitivity Analysis: The sensitivity analysis for the critical parameter λ is relatively simple
- Limited Generalization Capability: Primarily validated on specific types of recommendation tasks; generalization remains to be verified
- Academic Contribution: Provides new research directions and theoretical frameworks for LLM recommendation system research
- Practical Value: Offers directly applicable improvement methods; open-source code enhances reproducibility
- Inspirational Significance: Emphasizes the importance of cognitive science principles in AI system design
- Sequential Recommendation Systems: Particularly suitable for scenarios with clear temporal sequences and rating information
- Personalization Applications: Appropriate for personalized services requiring fine-grained preference modeling
- Multimodal Recommendation: Framework design exhibits extensibility, adaptable to multimodal recommendation tasks
This paper cites important works from multiple domains, among them recommendation systems, LLM alignment, and cognitive science:
- Classical recommendation methods: GRU4Rec, SASRec, Caser
- LLM alignment techniques: DPO, RLHF, SimPO
- Cognitive science foundations: Astington & Jenkins (1995) research on human decision-making mechanisms
Overall Assessment: This is a high-quality research paper demonstrating excellence in theoretical contributions, methodological innovation, and experimental validation. The paper successfully identifies and addresses key challenges in LLM recommendation systems, proposing the RecPO framework with solid theoretical grounding and practical value. Despite some limitations, its contributions to recommendation systems and LLM alignment research are significant.