Multi-Granularity Sequence Denoising with Weakly Supervised Signal for Sequential Recommendation

Li, Yang, Zhu

Sequential recommendation aims to predict the next item based on user interests in historical interaction sequences. Historical interaction sequences often contain irrelevant noisy items, which significantly hinder the performance of recommendation systems. Existing research employs unsupervised methods that indirectly identify item-granularity irrelevant noise by predicting the ground truth item. Since these methods lack explicit noise labels, they are prone to misidentifying items of interest to users as noise. Additionally, while these methods focus on removing item-granularity noise driven by the ground truth item, they overlook interest-granularity noise, limiting their ability to perform broader denoising based on user interests. To address these issues, we propose Multi-Granularity Sequence Denoising with Weakly Supervised Signal for Sequential Recommendation (MGSD-WSS). MGSD-WSS first introduces the Multiple Gaussian Kernel Perceptron module to map the original and enhanced sequences into a common representation space and utilizes weakly supervised signals to accurately identify noisy items in the historical interaction sequence. Subsequently, it employs the item-granularity denoising module with noise-weighted contrastive learning to obtain denoised item representations. Then, it extracts target interest representations from the ground truth item and applies noise-weighted contrastive learning to obtain denoised interest representations. Finally, based on the denoised item and interest representations, MGSD-WSS predicts the next item. Extensive experiments on five datasets demonstrate that the proposed method significantly outperforms state-of-the-art sequential recommendation and denoising models. Our code is available at https://github.com/lalunex/MGSD-WSS.

Basic Information

Paper ID: 2510.10564
Title: Multi-Granularity Sequence Denoising with Weakly Supervised Signal for Sequential Recommendation
Authors: Liang Li (Chongqing University of Technology), Zhou Yang (Fuzhou University), Xiaofei Zhu (Chongqing University of Technology)
Category: cs.IR (Information Retrieval)
Publication Date: October 12, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.10564
Code Link: https://github.com/lalunex/MGSD-WSS

Abstract

Sequential recommendation aims to predict the next item based on user interests derived from historical interaction sequences. Historical interaction sequences typically contain irrelevant noisy items, which significantly impede recommendation system performance. Existing research employs unsupervised methods to indirectly identify item-level irrelevant noise by predicting ground-truth items. Due to the lack of explicit noise labels in these methods, they are prone to misidentifying items of user interest as noise. Furthermore, these methods focus on removing item-level noise driven by ground-truth items while neglecting interest-level noise, limiting the capability for broader denoising based on user interests. To address these issues, this paper proposes MGSD-WSS (Multi-Granularity Sequence Denoising with Weakly Supervised Signal), a sequential recommendation method that combines multi-granularity denoising with weakly supervised signals.

Research Background and Motivation

Problem Definition

The core problem in sequential recommendation is that historical interaction sequences contain noisy items, such as accidental clicks and malicious fake interactions, which significantly degrade recommendation performance.

Limitations of Existing Methods

Soft Denoising Methods: Adjust the weights of noisy items through attention mechanisms or filtering algorithms, but cannot completely eliminate noise effects
Hard Denoising Methods: Generate noise detection signals to explicitly remove noisy items, but suffer from the following issues:
- Use ground-truth items rather than true noise labels to guide model noise identification, limiting accuracy
- Focus solely on item-level denoising while neglecting interest-level noise

Research Motivation

The absence of explicit noise labels causes existing unsupervised methods to easily misidentify items of user interest as noise
User interactions not only reflect preferences for specific items but also embody higher-level interests (e.g., "sports" interest encompasses football, running shoes, treadmills, etc.)
Hierarchical denoising at multiple granularities is needed to more comprehensively remove noise

Core Contributions

First Introduction of Weakly Supervised Signals: Directly trains the model for noise identification through labeled weakly supervised signals, overcoming the inaccuracy of previous unsupervised methods
Multi-Granularity Hierarchical Denoising: Proposes hierarchical denoising modules at item and interest granularities, combined with noise-weighted contrastive learning
Innovative Architecture Design:
- Multiple Gaussian Kernel Perceptron (MGP) module
- Target-aware Sequence Encoding
- Noise-weighted contrastive learning framework
Significant Performance Improvements: Substantially outperforms state-of-the-art sequential recommendation and denoising models on five datasets

Method Details

Task Definition

Given a user set $\mathcal{U} = \{u_1, u_2, \ldots, u_{|\mathcal{U}|}\}$ and an item set $\mathcal{V} = \{v_1, v_2, \ldots, v_{|\mathcal{V}|}\}$, each user $u \in \mathcal{U}$ is associated with a temporally ordered historical interaction sequence $S = [s_1, s_2, \ldots, s_n]$. The objective is to use the interaction sequence $S$ to predict the item the user is most likely to interact with at step $(n+1)$, i.e., $p(s_{n+1}|s_{1:n})$.

Model Architecture

MGSD-WSS comprises three core components:

1. Target-aware Sequence Encoding

Sequence Data Augmentation:

Randomly select $t$ distinct items as noise to insert into the original sequence
Construct augmented sequence $\bar{S} = [\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_{n+t}]$
Obtain supervision signal $\bar{Y} = [\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_{n+t}]$ indicating noise positions
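
The augmentation steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `augment_with_noise`, the choice to draw noise only from items the user has not interacted with, uniform insertion positions, and the label convention (1 = inserted noise, 0 = original item) are all assumptions.

```python
import random


def augment_with_noise(sequence, num_items, t, rng):
    """Insert t distinct random items into `sequence`.

    Returns (augmented, labels), where labels[i] is 1 at positions holding an
    inserted noise item and 0 at positions holding an original item.
    """
    # Assumption: noise is sampled from item IDs 1..num_items that do not
    # already occur in the sequence.
    seen = set(sequence)
    candidates = [v for v in range(1, num_items + 1) if v not in seen]
    noise_items = rng.sample(candidates, t)  # t distinct items

    augmented = list(sequence)
    labels = [0] * len(sequence)
    for item in noise_items:
        pos = rng.randint(0, len(augmented))  # any slot, including both ends
        augmented.insert(pos, item)
        labels.insert(pos, 1)
    return augmented, labels


rng = random.Random(0)
augmented, labels = augment_with_noise([3, 7, 9, 4], num_items=20, t=2, rng=rng)
print(augmented, labels)
```

Because the labels come for free from the insertion step, they are "weak" supervision: they mark the synthetic noise only and say nothing about noise already present in the original sequence.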

Multiple Gaussian Kernel Perceptron (MGP):

Compute cosine similarity between target item and each item in the sequence: $\bar{\alpha}_i = \cos(\bar{h}_{n+1}, \bar{h}_i)$
Transform relevance scores using $k$ Gaussian kernels, $r_{ij} = \exp\left(-\frac{(\bar{\alpha}_i - \mu_j)^2}{2\sigma_j^2}\right)$, and reweight each item embedding by the summed kernel responses, $\hat{h}_i = \sum_{j=1}^k r_{ij} \bar{h}_i$
Obtain rich representations through Transformer encoder: $G = \text{Transformer}(\hat{H} + P)$
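
A small sketch of the Gaussian kernel reweighting may make the first two MGP steps concrete. The kernel means and widths are not given in this summary, so the evenly spaced means over $[-1, 1]$ and the shared width of 0.1 below are hypothetical; the Transformer encoder step is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def gaussian_kernel_reweight(item_embs, target_emb, mus, sigmas):
    """Return (weights, reweighted) with weights[i] = sum_j r_ij.

    r_ij = exp(-(alpha_i - mu_j)^2 / (2 sigma_j^2)), where alpha_i is the
    cosine similarity between the target item and item i.
    """
    weights, reweighted = [], []
    for h in item_embs:
        alpha = cosine(target_emb, h)
        r = [math.exp(-((alpha - mu) ** 2) / (2 * sigma ** 2))
             for mu, sigma in zip(mus, sigmas)]
        w = sum(r)
        weights.append(w)
        reweighted.append([w * x for x in h])  # h_hat_i = (sum_j r_ij) * h_i
    return weights, reweighted


# Hypothetical kernel layout: k = 10 means evenly spaced over [-1, 1].
k = 10
mus = [-1 + 2 * j / (k - 1) for j in range(k)]
sigmas = [0.1] * k

items = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
weights, h_hat = gaussian_kernel_reweight(items, [1.0, 0.0], mus, sigmas)
print([round(w, 3) for w in weights])
```

Each kernel responds most strongly to similarities near its mean $\mu_j$, so the bank of kernels acts as a soft histogram over similarity regions.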

2. Auxiliary Noise Discrimination

Use a shared item-level noise discriminator to detect noisy items in the augmented sequence: $\boldsymbol{\beta}_i = \text{Softmax}((\text{ReLU}(\bar{g}_i W_1 + b_1))W_2)$

Minimize the difference between noise detection signals and supervision signals through MSE loss: $MSE = \frac{1}{n}\sum_{i=1}^n (\beta_i^0 - \bar{y}_i)^2$
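
The discriminator and its loss can be written out for a single sequence as below. This is a sketch under assumptions: the weight shapes, the two-class output, and averaging over however many positions are passed in are illustrative choices; only the formulas themselves come from the text above.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def noise_discriminator(g, W1, b1, W2):
    """beta = Softmax(ReLU(g W1 + b1) W2) for one item representation g.

    W1 has shape (dim, hidden), b1 has length hidden, W2 has shape (hidden, 2).
    """
    hidden = [
        max(0.0, sum(g[d] * W1[d][h] for d in range(len(g))) + b1[h])
        for h in range(len(b1))
    ]
    logits = [
        sum(hidden[h] * W2[h][c] for h in range(len(hidden)))
        for c in range(len(W2[0]))
    ]
    return softmax(logits)


def mse_noise_loss(betas, labels):
    """Mean squared error between the first softmax component and the labels."""
    return sum((b[0] - y) ** 2 for b, y in zip(betas, labels)) / len(labels)


W1 = [[0.5, -0.3], [0.2, 0.8]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [-0.5, 0.5]]
betas = [noise_discriminator(g, W1, b1, W2) for g in ([1.0, 0.0], [0.0, 1.0])]
print(mse_noise_loss(betas, [1, 0]))
```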

3. Multi-granularity Sequence Denoising

Item-Level Denoising:

Convert noise detection signals to binary hard values using Gumbel-softmax
Filter noisy items to construct denoised representation matrix
Apply noise-weighted contrastive learning: $ITSCL = -\frac{1}{|G^+|}\sum_{g_i \in G^+} \log \frac{\omega(g_i) \cdot \exp(\text{sim}(e_{se}, g_i)/\tau)}{\sum_{g_j \in G} \omega(g_j) \cdot \exp(\text{sim}(e_{se}, g_j)/\tau)}$
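
The ITSCL formula above can be evaluated directly once the hard keep/drop decisions are available. In the sketch below, `sim` is taken to be cosine similarity, $G^+$ is taken to be the set of items kept after hard denoising, and the weights $\omega(g_i)$ are passed in as plain positive numbers; all three are assumptions, and the Gumbel-softmax step that produces the mask is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def noise_weighted_contrastive_loss(anchor, reps, weights, keep_mask, tau=0.5):
    """Weighted InfoNCE-style loss following the ITSCL formula.

    anchor    : sequence-level representation (e_se in the formula)
    reps      : all item representations G
    weights   : positive per-item weights omega(g_i) (hypothetical inputs)
    keep_mask : 1 for items kept after hard denoising (taken as G+), else 0
    """
    scores = [w * math.exp(cosine(anchor, g) / tau) for g, w in zip(reps, weights)]
    denom = sum(scores)
    positives = [i for i, keep in enumerate(keep_mask) if keep]
    if not positives:  # nothing was kept, so there is no positive term
        return 0.0
    return -sum(math.log(scores[i] / denom) for i in positives) / len(positives)


reps = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
loss = noise_weighted_contrastive_loss(
    anchor=[1.0, 0.0], reps=reps, weights=[1.0, 0.9, 0.1], keep_mask=[1, 1, 0]
)
print(loss)
```

With all weights equal the expression reduces to an ordinary InfoNCE-style loss; lowering the weight of a suspected noisy item shrinks its contribution to the denominator.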

Interest-Level Denoising:

Introduce learnable interest representation matrix $Q = [q_1, q_2, \ldots, q_m]$
Compute relevance scores between items and interests
Assess interest reliability using target-aware interest attention
Apply interest-level noise-weighted contrastive learning
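
The summary gives no formulas for the interest-level steps, so the following is only one plausible reading, not the paper's definition. It assumes dot-product relevance between items and the learnable interest vectors in $Q$, softmax assignment of each item over the $m$ interests, relevance-weighted pooling to form interest representations, and a softmax over target-interest dot products as the reliability score. All function names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def interest_representations(item_reps, Q):
    """Softly assign each item to the m interests and pool items per interest."""
    # relevance[i][k]: how strongly item i belongs to interest k
    relevance = [softmax([dot(g, q) for q in Q]) for g in item_reps]
    dim = len(item_reps[0])
    interests = []
    for k in range(len(Q)):
        total = sum(row[k] for row in relevance)  # > 0 since softmax is positive
        pooled = [
            sum(relevance[i][k] * item_reps[i][d] for i in range(len(item_reps))) / total
            for d in range(dim)
        ]
        interests.append(pooled)
    return relevance, interests


def interest_reliability(interests, target):
    """Target-aware attention over interests; higher means more reliable."""
    return softmax([dot(target, c) for c in interests])


items = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]  # m = 2 interest vectors, fixed here for illustration
relevance, interests = interest_representations(items, Q)
print(interest_reliability(interests, target=[1.0, 0.0]))
```

Under this reading, the reliability scores could serve as the per-interest weights in the interest-level noise-weighted contrastive loss, in the same way the item weights are used at the item level.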

Technical Innovations

Weakly Supervised Signal Generation: Generate explicit noise labels through data augmentation strategies, providing accurate supervision signals
Multi-Granularity Denoising: Perform denoising simultaneously at both item and interest granularities for more comprehensive sequence noise handling
Noise-Weighted Contrastive Learning: Assign weights to samples based on noise degree, superior to traditional equal-weight contrastive learning
Gaussian Kernel Perceptron: Capture information from different similarity regions, enhancing sequence representation

Experimental Setup

Datasets

Five public benchmark datasets are used:

Dataset	Sequences	Users	Items	Avg. Length	Sparsity
ML-100k	99,287	944	1,350	105.29	92.21%
Beauty	198,502	22,364	12,102	8.88	99.93%
Sports	296,337	35,599	18,358	8.32	99.95%
Yelp	316,354	30,432	20,034	10.40	99.95%
ML-1M	999,611	6,041	3,417	165.50	95.16%
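
As a quick consistency check on the table, sparsity can be recomputed from the other columns. The sketch assumes that the "Sequences" column counts individual user-item interactions; under that reading the reported Sparsity values are reproduced to two decimals.

```python
# (interactions, users, items, reported sparsity in %) copied from the table above.
DATASETS = {
    "ML-100k": (99_287, 944, 1_350, 92.21),
    "Beauty": (198_502, 22_364, 12_102, 99.93),
    "Sports": (296_337, 35_599, 18_358, 99.95),
    "Yelp": (316_354, 30_432, 20_034, 99.95),
    "ML-1M": (999_611, 6_041, 3_417, 95.16),
}


def sparsity_percent(interactions, users, items):
    """Share of the user-item matrix with no observed interaction, in percent."""
    return 100.0 * (1.0 - interactions / (users * items))


for name, (n_inter, n_users, n_items, reported) in DATASETS.items():
    computed = sparsity_percent(n_inter, n_users, n_items)
    print(f"{name}: computed {computed:.2f}%, reported {reported:.2f}%")
```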

Evaluation Metrics

Hit Ratio (HR@{5, 10, 20})
Normalized Discounted Cumulative Gain (NDCG@{5, 10, 20})
Mean Reciprocal Rank (MRR@20)
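
For next-item prediction there is exactly one relevant item per test sequence, so all three metrics reduce to functions of that item's 1-based rank. The sketch below uses these standard single-target definitions; whether the paper ranks against the full item set or a sampled subset is not stated in this summary.

```python
import math


def hit_ratio_at_k(ranks, k):
    """Fraction of test cases whose ground-truth item is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def ndcg_at_k(ranks, k):
    """With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)


def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting only hits inside the top k."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)


# 1-based ranks of the ground-truth item for three hypothetical test sequences.
ranks = [1, 3, 25]
print(hit_ratio_at_k(ranks, 20), ndcg_at_k(ranks, 20), mrr_at_k(ranks, 20))
```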

Baseline Methods

Sequential Recommendation Baselines:

GRU4Rec, NARM, STAMP, CASER, SASRec, BERT4Rec

Denoising Baselines:

DSAN, FMLP-Rec, HSD+BERT4Rec, AC-BERT4Rec, MSDCCL+BERT4Rec

Implementation Details

Embedding dimension: 100
Batch size: 256
Learning rate: $10^{-3}$
Number of Gaussian kernels: 10
Temperature parameter: τ = 0.5
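
For reproduction, the reported settings could be gathered in a single configuration object. The class and field names below are hypothetical and are not taken from the released code; only the values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MGSDWSSConfig:
    """Hyperparameters as reported in the summary (field names are illustrative)."""
    embedding_dim: int = 100
    batch_size: int = 256
    learning_rate: float = 1e-3
    num_gaussian_kernels: int = 10
    temperature: float = 0.5  # tau in the contrastive losses


config = MGSDWSSConfig()
print(config)
```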

Experimental Results

Main Results

Comparison with Sequential Recommendation Baselines: MGSD-WSS combined with mainstream sequential recommendation models achieves significant performance improvements on all datasets. On the ML-100k dataset, MGSD-WSS+BERT4Rec achieves improvements of 167.43%, 195.87%, and 235.67% over the original BERT4Rec in HR@20, NDCG@20, and MRR@20, respectively.

Comparison with Denoising Baselines: On most metrics, MGSD-WSS+BERT4Rec outperforms other denoising baselines, particularly excelling on ML-100k and ML-1M datasets. On the ML-1M dataset, compared to the strongest baseline MSDCCL+BERT4Rec, improvements range from 30.80% to 60.94% across metrics.

Ablation Study

Performance degradation analysis after removing each module:

w/o AND (without Auxiliary Noise Discrimination): Largest performance drop, demonstrating the importance of weakly supervised signals
w/o InSD (without Interest-Level Denoising): Significantly impacts performance on Beauty, Sports, and ML-1M datasets
w/o ItSD (without Item-Level Denoising): Greatest impact on ML-100k and Yelp datasets
w/o MGP (without Multiple Gaussian Kernel Perceptron): Causes performance degradation, validating module effectiveness

Noise-Weighted Contrastive Learning Analysis

Compared to traditional contrastive learning, noise-weighted contrastive learning improves HR@20, NDCG@20, and MRR@20 by 12.59%, 10.63%, and 9.48%, respectively, on the ML-100k dataset, demonstrating the effectiveness of precise weight assignment.

Parameter Sensitivity Analysis

Number of Noisy Items $t$:

Moderate numbers of noisy items help the model learn to distinguish true preferences from noise
Excessive noise dilutes information signals, leading to performance degradation

Number of User Interests $m$:

Optimal performance is achieved at $m=5$
Excessive interests may introduce irrelevant information, reducing performance

Related Work

Sequential Recommendation

The field has developed from early Markov chain methods to deep learning approaches, including RNNs, LSTMs, CNNs, attention mechanisms, and graph neural networks. Recent research integrates external knowledge graphs, cross-domain information, and multimodal learning frameworks.

Denoising Methods

Denoising methods fall into two categories: soft denoising (weight adjustment) and hard denoising (direct removal). Existing hard denoising methods rely primarily on ground-truth items for guidance, lack true noise labels, and address only item-level denoising.

Contrastive Learning

Contrastive learning is used in recommendation systems to extract high-quality representations, but existing methods treat all samples equally and ignore differences in sample importance.

Conclusions and Discussion

Main Conclusions

Weakly supervised signals significantly improve noise identification accuracy
Multi-granularity denoising is more effective than single item-level denoising
Noise-weighted contrastive learning outperforms traditional contrastive learning
The model maintains robustness across different sequence lengths

Limitations

Suboptimal performance on some metrics for short sequence datasets (Beauty, Sports, Yelp)
Introduced noise may cause information pollution for short sequences
Requires pre-specification of hyperparameters such as user interest quantity

Future Directions

Investigate the impact of different Gaussian kernel configurations
Explore adversarial or heuristic noise generation strategies
Provide theoretical or data-driven justification for interest configuration

In-Depth Evaluation

Strengths

Strong Novelty: First application of weakly supervised denoising in sequential recommendation, proposing a multi-granularity denoising framework
Complete Methodology: Comprehensive solution from noise detection to multi-granularity denoising
Comprehensive Experiments: Five datasets, multiple baselines, detailed ablation studies and parameter analysis
Theoretically Sound: Noise-weighted contrastive learning has clear theoretical motivation
Excellent Performance: Significantly outperforms existing methods on most metrics

Weaknesses

Limited Applicability: Unstable performance on short sequence datasets
Computational Complexity: Multi-granularity denoising and contrastive learning increase computational overhead
Hyperparameter Sensitivity: Requires careful tuning of noise quantity, interest quantity, and other parameters
Noise Generation Strategy: Random noise insertion may not be sufficiently realistic

Impact

Academic Value: Provides new research directions for sequential recommendation denoising
Practical Value: Applicable to real recommendation systems for performance improvement
Reproducibility: Provides detailed implementation details and code

Applicable Scenarios

Recommendation systems with long user interaction sequences
Recommendation scenarios with significant noise (e.g., e-commerce, video platforms)
Applications requiring fine-grained user interest modeling

References

The paper cites important works in sequential recommendation, denoising methods, and contrastive learning, including:

Classical sequential recommendation methods: GRU4Rec, SASRec, BERT4Rec
Denoising-related work: HSD, MSDCCL, etc.
Contrastive learning methods: CL4SRec, ICL, etc.

This paper provides an innovative solution to the noise handling problem in sequential recommendation, with significant value in both theory and practice.