Sequential recommendation aims to predict the next item based on user interests in historical interaction sequences. Historical interaction sequences often contain irrelevant noisy items, which significantly hinder the performance of recommendation systems. Existing research employs unsupervised methods that indirectly identify item-granularity irrelevant noise by predicting the ground-truth item. Since these methods lack explicit noise labels, they are prone to misidentifying items the user is interested in as noise. Additionally, while these methods focus on removing item-granularity noise driven by the ground-truth item, they overlook interest-granularity noise, limiting their ability to perform broader denoising based on user interests. To address these issues, we propose Multi-Granularity Sequence Denoising with Weakly Supervised Signal for Sequential Recommendation (MGSD-WSS). MGSD-WSS first introduces the Multiple Gaussian Kernel Perceptron module to map the original and enhanced sequences into a common representation space and utilizes weakly supervised signals to accurately identify noisy items in the historical interaction sequence. Subsequently, it employs the item-granularity denoising module with noise-weighted contrastive learning to obtain denoised item representations. Then, it extracts target interest representations from the ground-truth item and applies noise-weighted contrastive learning to obtain denoised interest representations. Finally, based on the denoised item and interest representations, MGSD-WSS predicts the next item. Extensive experiments on five datasets demonstrate that the proposed method significantly outperforms state-of-the-art sequential recommendation and denoising models. Our code is available at https://github.com/lalunex/MGSD-WSS.
- Paper ID: 2510.10564
- Title: Multi-Granularity Sequence Denoising with Weakly Supervised Signal for Sequential Recommendation
- Authors: Liang Li (Chongqing University of Technology), Zhou Yang (Fuzhou University), Xiaofei Zhu (Chongqing University of Technology)
- Category: cs.IR (Information Retrieval)
- Publication Date: October 12, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.10564
- Code Link: https://github.com/lalunex/MGSD-WSS
Sequential recommendation aims to predict the next item based on user interests derived from historical interaction sequences. These sequences typically contain irrelevant noisy items, which significantly impede recommendation performance. Existing research employs unsupervised methods that indirectly identify item-level irrelevant noise by predicting ground-truth items. Because these methods lack explicit noise labels, they are prone to misidentifying items of user interest as noise. Furthermore, they focus on removing item-level noise driven by ground-truth items while neglecting interest-level noise, which limits their capacity for broader, interest-based denoising. To address these issues, this paper proposes MGSD-WSS (Multi-Granularity Sequence Denoising with Weakly Supervised Signal), a sequential recommendation method that combines multi-granularity denoising with weakly supervised signals.
The core problem faced by sequential recommendation systems is the presence of noisy items in historical interaction sequences, such as accidental clicks and malicious false interactions, which significantly reduce recommendation system performance.
- Soft Denoising Methods: Adjust the weights of noisy items through attention mechanisms or filtering algorithms, but cannot completely eliminate noise effects
- Hard Denoising Methods: Generate noise detection signals to explicitly remove noisy items, but suffer from the following issues:
  - Use ground-truth items rather than true noise labels to guide the model's noise identification, limiting accuracy
  - Focus solely on item-level denoising while neglecting interest-level noise
- The absence of explicit noise labels causes existing unsupervised methods to easily misidentify items of user interest
- User interactions not only reflect preferences for specific items but also embody higher-level interests (e.g., "sports" interest encompasses football, running shoes, treadmills, etc.)
- Hierarchical denoising at multiple granularities is needed to more comprehensively remove noise
- First Introduction of Weakly Supervised Signals: Directly trains the model for noise identification through labeled weakly supervised signals, overcoming the inaccuracy of previous unsupervised methods
- Multi-Granularity Hierarchical Denoising: Proposes hierarchical denoising modules at item and interest granularities, combined with noise-weighted contrastive learning
- Innovative Architecture Design:
  - Multiple Gaussian Kernel Perceptron (MGP) module
  - Target-aware Sequence Encoding
  - Noise-weighted contrastive learning framework
- Significant Performance Improvements: Substantially outperforms state-of-the-art sequential recommendation and denoising models on five datasets
Given a user set $U=\{u_1,u_2,\ldots,u_{|U|}\}$ and an item set $V=\{v_1,v_2,\ldots,v_{|V|}\}$, each user $u\in U$ is associated with a temporally ordered historical interaction sequence $S=[s_1,s_2,\ldots,s_n]$. The objective is to use the interaction sequence $S$ to predict the item the user is most likely to interact with at step $n+1$, i.e., $p(s_{n+1}\mid s_{1:n})$.
MGSD-WSS comprises three core components:
Sequence Data Augmentation:
- Randomly select $t$ distinct items as noise to insert into the original sequence
- Construct the augmented sequence $\bar{S}=[\bar{s}_1,\bar{s}_2,\ldots,\bar{s}_{n+t}]$
- Obtain the supervision signal $\bar{Y}=[\bar{y}_1,\bar{y}_2,\ldots,\bar{y}_{n+t}]$ indicating noise positions
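The augmentation step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes noise items are sampled uniformly from items absent from the original sequence, that insertion positions are uniform at random, and that a label of 1 marks an inserted item. The function name `augment_with_noise` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def augment_with_noise(
    seq: Sequence[int],
    item_pool: Sequence[int],
    t: int,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """Insert t distinct noise items at random positions of `seq`.

    Returns the augmented sequence (length n + t) and one 0/1 label per
    position; 1 marks an inserted noise item (assumed convention).
    """
    seen = set(seq)
    # Assumption: noise is drawn from items the user has not interacted with.
    candidates = [v for v in item_pool if v not in seen]
    noise_items = rng.sample(candidates, t)          # t distinct items
    total = len(seq) + t
    noise_slots = set(rng.sample(range(total), t))   # output positions holding noise

    augmented: List[int] = []
    labels: List[int] = []
    orig_iter, noise_iter = iter(seq), iter(noise_items)
    for pos in range(total):
        if pos in noise_slots:
            augmented.append(next(noise_iter))
            labels.append(1)
        else:
            # Original items keep their relative (temporal) order.
            augmented.append(next(orig_iter))
            labels.append(0)
    return augmented, labels


aug, y_bar = augment_with_noise([3, 7, 9], range(1, 21), t=2, rng=random.Random(0))
```

The labels play the role of $\bar{Y}$: they are known by construction, which is what makes the signal usable for supervised noise discrimination.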
Multiple Gaussian Kernel Perceptron (MGP):
- Compute the cosine similarity between the target item and each item in the sequence:
  $$\bar{\alpha}_i=\cos(\bar{h}_{n+1},\bar{h}_i)$$
- Transform the relevance scores using $k$ Gaussian kernels:
  $$r_{ij}=\exp\left(-\frac{(\bar{\alpha}_i-\mu_j)^2}{2\sigma_j^2}\right),\qquad \hat{h}_i=\sum_{j=1}^{k} r_{ij}\,\bar{h}_i$$
- Obtain rich representations through a Transformer encoder:
  $$G=\mathrm{Transformer}(\hat{H}+P)$$
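The cosine-relevance and Gaussian-kernel steps of MGP can be sketched with NumPy. The kernel centres (evenly spaced over the cosine range $[-1, 1]$) and the shared width $\sigma_j = 0.1$ are assumptions, since the summary does not give them; $k = 10$ follows the reported hyperparameters. The Transformer encoder is omitted, and `gaussian_kernel_features` is a hypothetical name.

```python
import numpy as np


def gaussian_kernel_features(h_seq, h_target, mus, sigmas):
    """Apply the cosine-relevance and Gaussian-kernel steps of MGP.

    h_seq:    (L, d) item embeddings  h_bar_i
    h_target: (d,)   target embedding h_bar_{n+1}
    mus, sigmas: (k,) kernel centres mu_j and widths sigma_j
    Returns alpha (L,), kernel responses r (L, k) and h_hat (L, d).
    """
    # alpha_i = cos(h_bar_{n+1}, h_bar_i)
    denom = np.linalg.norm(h_seq, axis=1) * np.linalg.norm(h_target)
    alpha = (h_seq @ h_target) / np.maximum(denom, 1e-12)
    # r_ij = exp(-(alpha_i - mu_j)^2 / (2 sigma_j^2))
    r = np.exp(-((alpha[:, None] - mus[None, :]) ** 2) / (2.0 * sigmas[None, :] ** 2))
    # h_hat_i = sum_j r_ij * h_bar_i, following the summary's formula
    h_hat = r.sum(axis=1, keepdims=True) * h_seq
    return alpha, r, h_hat


k = 10                                  # number of kernels (reported hyperparameter)
mus = np.linspace(-1.0, 1.0, k)         # assumed: centres spread over the cosine range
sigmas = np.full(k, 0.1)                # assumed: one shared width
```

Each kernel responds most strongly to items whose relevance $\bar{\alpha}_i$ lies near its centre $\mu_j$, which is how the module separates different similarity regions.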
A shared item-level noise discriminator detects noisy items in the augmented sequence:
$$\beta_i=\mathrm{Softmax}\big(\mathrm{ReLU}(\bar{g}_i W_1+b_1)\,W_2\big)$$
An MSE loss minimizes the difference between the noise detection signals and the supervision signals:
$$\mathcal{L}_{\mathrm{MSE}}=\frac{1}{n}\sum_{i=1}^{n}\left(\beta_i^{0}-\bar{y}_i\right)^2$$
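A NumPy sketch of the discriminator and its MSE objective follows. It assumes that column 0 of the two-way softmax is the noise probability $\beta_i^0$ and that $\bar{y}_i = 1$ marks noise; the weights are random placeholders rather than trained parameters, and the loss averages over whatever positions are passed in (the summary writes the normalizer as $n$).

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def noise_discriminator(g_bar, W1, b1, W2):
    """beta_i = Softmax(ReLU(g_bar_i W1 + b1) W2) for every row of g_bar (L, d)."""
    hidden = np.maximum(g_bar @ W1 + b1, 0.0)   # ReLU
    return softmax(hidden @ W2, axis=-1)        # (L, 2): noise / clean distribution


def mse_noise_loss(beta, y_bar):
    """Mean of (beta_i^0 - y_bar_i)^2, assuming column 0 is the noise probability."""
    return float(np.mean((beta[:, 0] - y_bar) ** 2))


rng = np.random.default_rng(1)
d, hidden_dim = 4, 3
W1, b1 = rng.normal(size=(d, hidden_dim)), np.zeros(hidden_dim)
W2 = rng.normal(size=(hidden_dim, 2))
g_bar = rng.normal(size=(5, d))                  # stand-in for encoded augmented items
y_bar = np.array([0.0, 1.0, 0.0, 0.0, 1.0])      # 1 = inserted noise (assumed)
loss = mse_noise_loss(noise_discriminator(g_bar, W1, b1, W2), y_bar)
```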
Item-Level Denoising:
- Convert noise detection signals to binary hard values using Gumbel-softmax
- Filter noisy items to construct denoised representation matrix
- Apply noise-weighted contrastive learning:
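$$\mathcal{L}_{\mathrm{ITSCL}}=-\frac{1}{|G^{+}|}\sum_{g_i\in G^{+}}\log\frac{\omega(g_i)\cdot\exp\left(\mathrm{sim}(e_{se},g_i)/\tau\right)}{\sum_{g_j\in G}\omega(g_j)\cdot\exp\left(\mathrm{sim}(e_{se},g_j)/\tau\right)}$$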
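The two item-level operations can be sketched in NumPy. `gumbel_softmax_hard` implements only the forward pass of hard Gumbel-softmax (in a training framework a straight-through estimator would carry gradients), and `noise_weighted_contrastive` evaluates the loss above with cosine similarity as `sim`. The summary does not specify how the weights $\omega(\cdot)$ or the positive set $G^+$ are built, so both are taken as inputs; all function names and example values are hypothetical.

```python
import numpy as np


def gumbel_softmax_hard(logits, tau, rng):
    """Forward pass of hard Gumbel-softmax: one-hot of argmax((logits + g) / tau).

    Dividing by tau does not change the argmax; it matters only for the soft
    relaxation used to pass gradients (straight-through), omitted here.
    """
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                       # standard Gumbel noise
    idx = np.argmax((logits + g) / tau, axis=-1)
    return np.eye(logits.shape[-1])[idx]


def cosine_to_all(a, B):
    """Cosine similarity between vector a (d,) and each row of B (N, d)."""
    return (B @ a) / np.maximum(np.linalg.norm(B, axis=1) * np.linalg.norm(a), 1e-12)


def noise_weighted_contrastive(e_se, G, pos_mask, w, tau=0.5):
    """-1/|G+| * sum_{i in G+} log(w_i e^{s_i/tau} / sum_j w_j e^{s_j/tau}), s = cos(e_se, g)."""
    logits = np.log(w) + cosine_to_all(e_se, G) / tau   # log(w_j * exp(s_j / tau))
    log_denom = np.logaddexp.reduce(logits)             # stable log-sum-exp over all of G
    return float(-np.mean(logits[pos_mask] - log_denom))


rng = np.random.default_rng(0)
G = rng.normal(size=(6, 8))                    # candidate representations g_j
e_se = rng.normal(size=8)                      # anchor representation e_se
pos_mask = np.array([True, True, False, False, True, False])   # assumed G+ membership
w = np.array([1.0, 0.8, 0.3, 0.5, 0.9, 0.2])  # hypothetical weights omega(g_j) > 0
loss = noise_weighted_contrastive(e_se, G, pos_mask, w)
# Hard keep/drop decisions from discriminator logits (column 0 = noise, assumed).
keep = gumbel_softmax_hard(rng.normal(size=(6, 2)), tau=1.0, rng=rng)[:, 0] == 0
G_denoised = G[keep]
```

With all weights equal to 1, the loss reduces to an equal-weight multi-positive InfoNCE objective; the weights let likely-noisy samples contribute less to both the numerator and the normalizer.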
Interest-Level Denoising:
- Introduce a learnable interest representation matrix $Q=[q_1,q_2,\ldots,q_m]$
- Compute relevance scores between items and interests
- Assess interest reliability using target-aware interest attention
- Apply interest-level noise-weighted contrastive learning
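The summary does not give the interest-level equations, so the following is a hypothetical sketch of one common way to realize the steps above: scaled dot-product relevance between items and the interest matrix $Q$, relevance-weighted interest representations, and a softmax over interests conditioned on the target item as the reliability score. Every formula in it, and the name `interest_layer`, is an assumption rather than the paper's definition.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def interest_layer(H, Q, e_target):
    """Hypothetical interest extraction with target-aware reliability.

    H: (L, d) item representations; Q: (m, d) learnable interests q_1..q_m;
    e_target: (d,) representation of the ground-truth (target) item.
    """
    scale = np.sqrt(H.shape[1])
    # Item-interest relevance: each item's distribution over the m interests.
    A = softmax(H @ Q.T / scale, axis=1)             # (L, m)
    # Interest representations: relevance-weighted sums of items.
    Z = A.T @ H                                       # (m, d)
    # Target-aware interest attention: how well each interest matches the target.
    reliability = softmax(Z @ e_target / scale)      # (m,)
    return A, Z, reliability


rng = np.random.default_rng(0)
m, d = 5, 8                                          # m = 5 matches the reported best setting
A, Z, rel = interest_layer(rng.normal(size=(7, d)), rng.normal(size=(m, d)), rng.normal(size=d))
```

One plausible (again assumed) way to connect this to the interest-level contrastive loss is to use the reliability scores as the $\omega$ weights, so that interests weakly related to the target contribute less.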
- Weakly Supervised Signal Generation: Generate explicit noise labels through data augmentation strategies, providing accurate supervision signals
- Multi-Granularity Denoising: Perform denoising simultaneously at both item and interest granularities for more comprehensive sequence noise handling
- Noise-Weighted Contrastive Learning: Assign weights to samples according to their degree of noise, instead of weighting all samples equally as traditional contrastive learning does
- Multiple Gaussian Kernel Perceptron: Capture information from different similarity regions, enhancing sequence representations
Five public benchmark datasets are used:
| Dataset | Interactions | Users | Items | Avg. Length | Sparsity |
|---|---|---|---|---|---|
| ML-100k | 99,287 | 944 | 1,350 | 105.29 | 92.21% |
| Beauty | 198,502 | 22,364 | 12,102 | 8.88 | 99.93% |
| Sports | 296,337 | 35,599 | 18,358 | 8.32 | 99.95% |
| Yelp | 316,354 | 30,432 | 20,034 | 10.40 | 99.95% |
| ML-1M | 999,611 | 6,041 | 3,417 | 165.50 | 95.16% |
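The sparsity column is consistent with $1 - \text{interactions} / (|U| \cdot |V|)$ computed from the first three count columns, which indicates that the first count column holds interaction counts. The short check below recomputes it; the dictionary simply restates the table.

```python
def sparsity(interactions: int, users: int, items: int) -> float:
    """Fraction of empty cells in the users x items interaction matrix."""
    return 1.0 - interactions / (users * items)


# (interactions, users, items) restated from the table above
stats = {
    "ML-100k": (99_287, 944, 1_350),
    "Beauty": (198_502, 22_364, 12_102),
    "Sports": (296_337, 35_599, 18_358),
    "Yelp": (316_354, 30_432, 20_034),
    "ML-1M": (999_611, 6_041, 3_417),
}
for name, (n_inter, n_users, n_items) in stats.items():
    print(f"{name}: {100 * sparsity(n_inter, n_users, n_items):.2f}%")
```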
- Hit Ratio (HR@{5, 10, 20})
- Normalized Discounted Cumulative Gain (NDCG@{5, 10, 20})
- Mean Reciprocal Rank (MRR@20)
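Under the usual next-item evaluation, in which each test sequence has exactly one ground-truth item, all three metrics depend only on that item's rank. A minimal sketch, assuming full ranking over all candidate items and counting only strictly higher scores as ranked above the target (an optimistic tie policy chosen here, not stated in the summary):

```python
import math
from typing import Sequence


def rank_of(scores: Sequence[float], target: int) -> int:
    """1-based rank of the target item; only strictly higher scores rank above it."""
    t = scores[target]
    return 1 + sum(1 for s in scores if s > t)


def hit_ratio(rank: int, k: int) -> float:
    return 1.0 if rank <= k else 0.0


def ndcg(rank: int, k: int) -> float:
    # One relevant item, so IDCG = 1 and NDCG@k = 1 / log2(rank + 1) inside the top k.
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


def mrr(rank: int, k: int) -> float:
    return 1.0 / rank if rank <= k else 0.0


scores = [0.1, 0.9, 0.4, 0.7]       # model scores for items 0..3
r = rank_of(scores, target=2)        # ground-truth item 2 is ranked 3rd
print(hit_ratio(r, 5), ndcg(r, 5), mrr(r, 20))  # -> 1.0 0.5 0.3333333333333333
```

Dataset-level HR@K, NDCG@K, and MRR@K are the averages of these per-sequence values over all test sequences.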
Sequential Recommendation Baselines:
- GRU4Rec, NARM, STAMP, CASER, SASRec, BERT4Rec
Denoising Baselines:
- DSAN, FMLP-Rec, HSD+BERT4Rec, AC-BERT4Rec, MSDCCL+BERT4Rec
- Embedding dimension: 100
- Batch size: 256
- Learning rate: $10^{-3}$
- Number of Gaussian kernels: 10
- Temperature parameter: τ = 0.5
Comparison with Sequential Recommendation Baselines:
MGSD-WSS combined with mainstream sequential recommendation models achieves significant performance improvements on all datasets. On the ML-100k dataset, MGSD-WSS+BERT4Rec achieves improvements of 167.43%, 195.87%, and 235.67% over the original BERT4Rec in HR@20, NDCG@20, and MRR@20, respectively.
Comparison with Denoising Baselines:
On most metrics, MGSD-WSS+BERT4Rec outperforms other denoising baselines, particularly excelling on ML-100k and ML-1M datasets. On the ML-1M dataset, compared to the strongest baseline MSDCCL+BERT4Rec, improvements range from 30.80% to 60.94% across metrics.
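The reported percentages are relative improvements over the compared model. Assuming the conventional definition, for a metric $M$:

$$\Delta_M=\frac{M_{\text{MGSD-WSS}}-M_{\text{baseline}}}{M_{\text{baseline}}}\times 100\%$$

Under this definition, the 167.43% HR@20 gain on ML-100k means that MGSD-WSS+BERT4Rec reaches about $1+1.6743\approx 2.67$ times the HR@20 of the original BERT4Rec.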
Performance degradation analysis after removing each module:
- w/o AND (without Auxiliary Noise Discrimination): Largest performance drop, demonstrating the importance of weakly supervised signals
- w/o InSD (without Interest-Level Denoising): Significantly impacts performance on Beauty, Sports, and ML-1M datasets
- w/o ItSD (without Item-Level Denoising): Greatest impact on ML-100k and Yelp datasets
- w/o MGP (without Multiple Gaussian Kernel Perceptron): Causes performance degradation, validating module effectiveness
Compared to traditional contrastive learning, noise-weighted contrastive learning improves HR@20, NDCG@20, and MRR@20 by 12.59%, 10.63%, and 9.48%, respectively, on the ML-100k dataset, demonstrating the effectiveness of precise weight assignment.
Number of Noisy Items t:
- Moderate numbers of noisy items help the model learn to distinguish true preferences from noise
- Excessive noise dilutes information signals, leading to performance degradation
Number of User Interests m:
- Optimal performance is achieved at m=5
- Excessive interests may introduce irrelevant information, reducing performance
Sequential recommendation has developed from early Markov chain methods to deep learning approaches, including RNNs, LSTMs, CNNs, attention mechanisms, and graph neural networks. Recent research integrates external knowledge graphs, cross-domain information, and multimodal learning frameworks.
Denoising methods are divided into soft denoising (weight adjustment) and hard denoising (direct removal). Existing hard denoising methods rely primarily on ground-truth items for guidance, lack true noise labels, and focus only on item-level denoising.
Contrastive learning is used in recommendation systems to extract high-quality representations, but existing methods treat all samples equally and ignore differences in sample importance.
- Weakly supervised signals significantly improve noise identification accuracy
- Multi-granularity denoising is more effective than single item-level denoising
- Noise-weighted contrastive learning outperforms traditional contrastive learning
- The model maintains robustness across different sequence lengths
- Suboptimal performance on some metrics for short sequence datasets (Beauty, Sports, Yelp)
- Introduced noise may cause information pollution for short sequences
- Requires pre-specification of hyperparameters such as user interest quantity
- Investigate the impact of different Gaussian kernel configurations
- Explore adversarial or heuristic noise generation strategies
- Provide theoretical or data-driven justification for interest configuration
- Strong Novelty: First application of weakly supervised denoising in sequential recommendation, proposing a multi-granularity denoising framework
- Complete Methodology: Comprehensive solution from noise detection to multi-granularity denoising
- Comprehensive Experiments: Five datasets, multiple baselines, detailed ablation studies and parameter analysis
- Theoretically Sound: Noise-weighted contrastive learning has clear theoretical motivation
- Excellent Performance: Significantly outperforms existing methods on most metrics
- Limited Applicability: Unstable performance on short sequence datasets
- Computational Complexity: Multi-granularity denoising and contrastive learning increase computational overhead
- Hyperparameter Sensitivity: Requires careful tuning of noise quantity, interest quantity, and other parameters
- Noise Generation Strategy: Random noise insertion may not be sufficiently realistic
- Academic Value: Provides new research directions for sequential recommendation denoising
- Practical Value: Applicable to real recommendation systems for performance improvement
- Reproducibility: Provides detailed implementation settings and publicly available code
- Recommendation systems with long user interaction sequences
- Recommendation scenarios with significant noise (e.g., e-commerce, video platforms)
- Applications requiring fine-grained user interest modeling
The paper cites important works in sequential recommendation, denoising methods, and contrastive learning, including:
- Classical sequential recommendation methods: GRU4Rec, SASRec, BERT4Rec
- Denoising-related work: HSD, MSDCCL, etc.
- Contrastive learning methods: CL4SRec, ICL, etc.
This paper provides an innovative solution to the noise handling problem in sequential recommendation, with significant value in both theory and practice.