Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning

Yuan, Chen, Zhang

Parameter-efficient fine-tuning (PEFT) large language models (LLMs) have shown impressive performance in various downstream tasks. However, in many real-world scenarios, the collected training data inevitably contains noisy labels. To learn from noisy labels, most solutions select samples with small losses for model training. However, the selected samples, in turn, impact the loss computation in the next iteration. An inaccurate initial selection can create a vicious cycle, leading to suboptimal performance. To break this cycle, we propose Delora, a novel framework that decouples the sample selection from model training. For sample selection, Delora establishes a noisy label detector by introducing clean and noisy LoRA. Benefiting from the memory effect, the clean LoRA is encouraged to memorize clean data, while the noisy LoRA is constrained to memorize mislabeled data, which serves as a learnable threshold for selecting clean and noisy samples. For model training, Delora can use carefully selected samples to fine-tune language models seamlessly. Experimental results on synthetic and real-world noisy datasets demonstrate the effectiveness of Delora in noisy label detection and text classification.

Basic Information

Paper ID: 2510.10208
Title: Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning
Authors: Bo Yuan, Yulin Chen, Yin Zhang (Zhejiang University)
Category: cs.CL (Computation and Language)
Publication Date: October 11, 2025
Paper Link: https://arxiv.org/abs/2510.10208v1

Abstract

Parameter-efficient fine-tuning (PEFT) of large language models performs well across a range of downstream tasks. However, training data in real-world scenarios inevitably contains noisy labels. Existing noisy label learning methods typically select small-loss samples for training, but the selected samples in turn affect the loss computed in the next iteration, so an inaccurate initial selection creates a vicious cycle. This paper proposes the Delora framework, which breaks this cycle by decoupling sample selection from model training. The framework builds a noisy label detector from a clean LoRA and a noisy LoRA. Using the memorization effect, the clean LoRA is encouraged to memorize clean data while the noisy LoRA is constrained to memorize mislabeled data, so that the noisy LoRA serves as a learnable threshold for separating clean from noisy samples. The selected samples are then used to fine-tune the language model. Experiments on synthetic and real-world noisy datasets demonstrate the effectiveness of Delora in noisy label detection and text classification.

Research Background and Motivation

Problem Definition

Core Problem: How to handle inevitable noisy labels in training data during parameter-efficient fine-tuning of large language models
Significance: Annotation errors inevitably exist in real-world data collection processes, severely impacting model performance and generalization ability
Limitations of Existing Methods:
- Traditional small-loss selection strategies suffer from "vicious cycle" problems: sample selection affects loss computation, which in turn affects sample selection
- Reliance on manually set thresholds limits practical applicability
- Performance instability in high-noise scenarios

Research Motivation

The authors observe that the fundamental problem with existing methods is the coupling between sample selection and model training: each depends on the output of the other. This leads to the paper's central question: can sample selection and model training be decoupled so that neither depends on the other? This question motivates the core framework design.

Core Contributions

Decoupled Framework: First decomposition of noisy label learning into independent sample selection and model training stages, effectively avoiding vicious cycles
Innovative Dual-LoRA Detector: Introduction of clean LoRA and noisy LoRA to separately memorize clean and noisy samples, constructing a learnable noisy label detector
Dynamic Constraint Mechanism: Design of dynamic regularization strategy based on memorization effects to control parameter update patterns of different LoRAs
Comprehensive Experimental Validation: Verification of method effectiveness on synthetic and real noisy datasets, achieving significant improvements in both noisy label detection and text classification tasks

Method Details

Task Definition

Given a training dataset $D=\{(x_i, y_i)\}_{i=1}^N$, where $y_i \in \{1, \ldots, K\}$ is the observed and potentially incorrect label of $x_i$, the goal is to learn a robust classifier that generalizes well despite the noisy labels.

Model Architecture

The Delora framework comprises two core stages:

Stage 1: Noisy Label Detector Training

Dual-LoRA Design:

Clean LoRA ($\Delta w_c$): Ideal parameters for memorizing clean samples
Noisy LoRA ($\Delta w_n$): Noise parameters for memorizing mislabeled samples

Learnable Threshold Mechanism: For the $i$-th training sample, the learnable threshold is the cross-entropy (CE) loss of the model $f$ under the noisy LoRA, where $w_0$ denotes the pretrained weights: $\phi_i = CE(f(x_i, w_0 + \Delta w_n), y_i)$

Sample selection criterion: $D_c = \{(x_i, y_i) | CE(f(x_i, w_0 + \Delta w_c), y_i) < \phi_i\}$
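The selection rule can be illustrated with a minimal sketch. It assumes the per-sample cross-entropy losses under the clean and noisy LoRA have already been computed; the function name and the example loss values are hypothetical and only serve to show the comparison.

```python
from typing import List, Sequence, Tuple


def split_by_learnable_threshold(
    clean_losses: Sequence[float],
    noisy_losses: Sequence[float],
) -> Tuple[List[int], List[int]]:
    """Return (clean_indices, noisy_indices).

    clean_losses[i] stands for CE(f(x_i, w_0 + dw_c), y_i) and
    noisy_losses[i] for the per-sample threshold phi_i.
    """
    if len(clean_losses) != len(noisy_losses):
        raise ValueError("both loss sequences must cover the same samples")
    clean_idx: List[int] = []
    noisy_idx: List[int] = []
    for i, (loss_c, phi) in enumerate(zip(clean_losses, noisy_losses)):
        # Strict inequality, as in the definition of D_c; ties count as noisy.
        if loss_c < phi:
            clean_idx.append(i)
        else:
            noisy_idx.append(i)
    return clean_idx, noisy_idx


# Hypothetical loss values for four samples.
clean_idx, noisy_idx = split_by_learnable_threshold(
    clean_losses=[0.10, 2.30, 0.45, 1.90],
    noisy_losses=[1.80, 0.20, 0.45, 2.50],
)
print(clean_idx, noisy_idx)  # [0, 3] [1, 2]
```

Because the threshold is a per-sample loss rather than a global constant, no cutoff value has to be chosen by hand.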

Dynamic Constraint Optimization: $L_{LoRA} = \tau_1(t)\Delta\sigma_c + \tau_2(t)\Delta\sigma_n$

Where:

$\tau_1(t) = t^{h_1}$ (increasing function constraining clean LoRA)
$\tau_2(t) = t^{-h_2}$ (decreasing function constraining noisy LoRA)
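$\Delta\sigma_c = \|\Delta w_c^{t} - \Delta w_c^{t-1}\|$ (magnitude of the clean LoRA parameter change between steps $t-1$ and $t$; $\Delta\sigma_n$ is defined analogously for the noisy LoRA)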
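As a rough illustration of how the two schedules weight the update magnitudes, the sketch below evaluates the constraint term for flattened parameter vectors. Several choices are assumptions not stated in this summary: the norm is taken to be the L2 norm, $t$ is assumed to start at 1 (so that $t^{-h_2}$ is defined), and the values of `h1` and `h2` are hypothetical.

```python
import math
from typing import Sequence


def tau_clean(t: int, h1: float) -> float:
    """tau_1(t) = t**h1; grows with t when h1 > 0."""
    return float(t) ** h1


def tau_noisy(t: int, h2: float) -> float:
    """tau_2(t) = t**(-h2); shrinks with t when h2 > 0."""
    return float(t) ** (-h2)


def l2_change(current: Sequence[float], previous: Sequence[float]) -> float:
    """L2 norm of the parameter change (the choice of norm is an assumption)."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(current, previous)))


def lora_constraint(
    t: int,
    h1: float,
    h2: float,
    clean_now: Sequence[float],
    clean_prev: Sequence[float],
    noisy_now: Sequence[float],
    noisy_prev: Sequence[float],
) -> float:
    """L_LoRA = tau_1(t) * delta_sigma_c + tau_2(t) * delta_sigma_n."""
    if t < 1:
        raise ValueError("t is assumed to start at 1")
    return (tau_clean(t, h1) * l2_change(clean_now, clean_prev)
            + tau_noisy(t, h2) * l2_change(noisy_now, noisy_prev))


# Hypothetical exponents: the clean-LoRA weight rises, the noisy-LoRA weight falls.
for step in (1, 2, 4, 8):
    print(step, tau_clean(step, h1=1.0), tau_noisy(step, h2=1.0))
```

Under these assumptions the penalty on clean LoRA updates grows as training proceeds, while the noisy LoRA is penalized most heavily at the start and is increasingly free to change later.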

Detector Optimization Objective: Binary classification using clean probability: $p_i^c = \frac{e^{CE(f(x_i,w_0+\Delta w_c),y_i)}}{e^{CE(f(x_i,w_0+\Delta w_c),y_i)} + e^{CE(f(x_i,w_0+\Delta w_n),y_i)}}$

Total optimization objective: $L = L_{ce} + L_{LoRA} + L_{Detector}$

Stage 2: Classifier Model Training

Clean Samples: Direct training using cross-entropy loss
Noisy Samples: Re-annotated using GPT-4o, trained with reverse cross-entropy loss for robust learning
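The two losses can be contrasted with a small sketch. This summary does not define the reverse cross-entropy, so the code assumes the form used in symmetric cross-entropy learning, where the roles of prediction and one-hot label are swapped and $\log 0$ is replaced by a negative constant; the constant `-4.0` and the example probabilities are illustrative choices only.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Standard CE: -log p(label | x)."""
    return -math.log(probs[label])


def reverse_cross_entropy(
    probs: Sequence[float], label: int, log_zero: float = -4.0
) -> float:
    """Assumed RCE: -sum_k p_k * log q_k, with q the one-hot label.

    log q_label = log 1 = 0, so only the other classes contribute,
    each with log 0 replaced by the constant `log_zero`.
    """
    return -sum(p * log_zero for k, p in enumerate(probs) if k != label)


probs = [0.7, 0.2, 0.1]  # hypothetical predicted class probabilities
print(round(cross_entropy(probs, label=0), 4))          # 0.3567
print(round(reverse_cross_entropy(probs, label=0), 4))  # 1.2
```

In this form the reverse loss is bounded (at most `-log_zero`), whereas cross-entropy grows without bound as the probability of the given label approaches zero, which is one reason it is commonly applied to labels that may still be wrong.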

Technical Innovations

Decoupled Design: Complete separation of sample selection and model training to avoid mutual interference
Memorization Effect Utilization: Clever exploitation of the deep network property of memorizing clean samples before noisy samples
Learnable Threshold: Data-driven threshold prediction using noisy LoRA, eliminating manual hyperparameter tuning
Parameter-Level Functional Separation: Functional separation at parameter level, architecture-agnostic

Experimental Setup

Datasets

Synthetic Noisy Datasets:

Trec, SST-2, SST-5, 20ng, AGNews
Noise Types: Symmetric (S), Asymmetric (A), Instance-dependent (I)
Noise Rates: 20%, 40%
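For readers unfamiliar with the synthetic settings, the sketch below shows one common way to inject symmetric and asymmetric label noise. The exact protocol used in the paper is not given in this summary, so the details are assumptions: symmetric noise flips a label uniformly to any other class, asymmetric noise flips along a fixed class-to-class map (the `flip_map` here is hypothetical), and instance-dependent noise is omitted because it requires a model of the inputs.

```python
import random
from typing import Dict, List, Sequence


def inject_symmetric_noise(
    labels: Sequence[int], num_classes: int, noise_rate: float, seed: int = 0
) -> List[int]:
    """With probability `noise_rate`, replace a label by a different class
    chosen uniformly at random."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i, y in enumerate(labels):
        if rng.random() < noise_rate:
            noisy[i] = rng.choice([k for k in range(num_classes) if k != y])
    return noisy


def inject_asymmetric_noise(
    labels: Sequence[int], flip_map: Dict[int, int], noise_rate: float, seed: int = 0
) -> List[int]:
    """With probability `noise_rate`, flip a label to its designated
    confusable class; classes absent from `flip_map` stay untouched."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i, y in enumerate(labels):
        if y in flip_map and rng.random() < noise_rate:
            noisy[i] = flip_map[y]
    return noisy


clean = [i % 6 for i in range(6000)]  # toy labels for a 6-class task
sym = inject_symmetric_noise(clean, num_classes=6, noise_rate=0.2)
asym = inject_asymmetric_noise(clean, flip_map={0: 1, 2: 3}, noise_rate=0.4)
print(sum(a != b for a, b in zip(clean, sym)) / len(clean))
print(sum(a != b for a, b in zip(clean, asym)) / len(clean))
```

Note that with this asymmetric scheme the overall corruption rate is lower than `noise_rate` whenever only some classes appear in `flip_map`.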

Real Noisy Datasets:

Hausa (noise rate 50.37%)
Yorùbá (noise rate 33.28%)
AlleNoise (noise rate 15.00%)

Evaluation Metrics

Noisy Detection Stage: Precision and Recall
Classification Stage: Test Accuracy
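Detection precision and recall can be computed as in the sketch below. It assumes that "noisy" is treated as the positive class and that ground-truth noise flags are available (as they are for synthetic noise); the summary does not state which class the paper treats as positive, and the example flags are made up.

```python
from typing import Sequence, Tuple


def detection_precision_recall(
    predicted_noisy: Sequence[bool], actually_noisy: Sequence[bool]
) -> Tuple[float, float]:
    """Precision and recall of noisy-label detection (positive class = noisy)."""
    pairs = list(zip(predicted_noisy, actually_noisy))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision, recall = detection_precision_recall(
    predicted_noisy=[True, True, False, False, True],
    actually_noisy=[True, False, False, True, True],
)
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```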

Baseline Methods

Base Model: Llama-3.1-8B-Instruct
Noisy Learning Methods: Co-Teaching, SelfMix, NoiseAL, CleaR, SENT, LAFT
Detection Methods: LLMs-detection, Small-loss strategy

Implementation Details

Backbone Model: Llama-3.1-8B-Instruct
LoRA Rank: r=32
Training Epochs: 8 for detector, 6 for classifier
Warm-up Epochs: 2
Learning Rate: 1e-4, 5e-4

Experimental Results

Main Results

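Noisy Label Detection Performance: On the Trec dataset, Delora outperforms the small-loss baseline, by a wide margin in precision under symmetric noise and by a narrower margin in recall under asymmetric noise: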

20% Symmetric Noise: Precision 99.47% vs 81.15% (Small-loss)
40% Asymmetric Noise: Recall 97.27% vs 96.20% (Small-loss)

Text Classification Performance (test accuracy, %):

Dataset	Noise Setting	Base	NoiseAL	Delora
Trec	20%S	95.20	97.30	98.46
Trec	40%A	87.40	95.95	97.40
SST-5	20%S	54.08	55.00	57.39

Real Noisy Dataset Results (test accuracy, %):

Dataset	Noise Rate	NoiseAL	Delora	Improvement (points)
Hausa	50.37%	52.34	60.12	+7.78
Yorùbá	33.28%	72.13	78.56	+6.43
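The improvement column is the difference between the two accuracy columns, that is, an absolute gain in percentage points. The short check below recomputes it from the table values and also reports the relative gain for comparison; the helper name is hypothetical.

```python
from typing import Tuple


def improvement(baseline: float, proposed: float) -> Tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = proposed - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 2)


print(improvement(52.34, 60.12))  # Hausa:  (7.78, 14.86)
print(improvement(72.13, 78.56))  # Yorùbá: (6.43, 8.91)
```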

Ablation Study

Ablation studies on Trec dataset demonstrate:

Removing Noisy Label Detector (NLD): Significant performance drop (98.46→95.20)
Removing Classifier Training (CT): Notable performance decrease
Removing optimization objectives ($L_{LoRA}$, $L_{Detector}$, $L_{ce}$): Each removal degrades performance
Removing noisy sample re-annotation: ~4% performance drop

Memorization Effect Analysis

Experiments verify memorization patterns of different LoRAs:

Clean LoRA: Enhanced memorization of clean samples, reduced memorization of noisy samples
Noisy LoRA: Opposite pattern, primarily absorbing negative effects of noisy samples
Base Model: Follows memorization effect of memorizing clean samples before noisy samples
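One simple way to observe such memorization patterns is to track, per epoch, how many clean and how many mislabeled training samples a model already fits. The sketch below is an assumed measurement procedure, not necessarily the one used in the paper: a sample counts as "memorized" when the prediction equals its observed (possibly wrong) label, and the example values are made up.

```python
from typing import Sequence, Tuple


def memorized_fraction(
    predictions: Sequence[int],
    observed_labels: Sequence[int],
    is_noisy: Sequence[bool],
) -> Tuple[float, float]:
    """Return (clean_fit, noisy_fit): the share of clean and of mislabeled
    training samples whose prediction matches the observed label."""
    clean_hits = clean_total = noisy_hits = noisy_total = 0
    for pred, label, noisy in zip(predictions, observed_labels, is_noisy):
        hit = int(pred == label)
        if noisy:
            noisy_hits += hit
            noisy_total += 1
        else:
            clean_hits += hit
            clean_total += 1
    clean_fit = clean_hits / clean_total if clean_total else 0.0
    noisy_fit = noisy_hits / noisy_total if noisy_total else 0.0
    return clean_fit, noisy_fit


# Early-epoch toy snapshot: clean samples are fit, mislabeled ones are not yet.
print(memorized_fraction(
    predictions=[0, 1, 1, 2, 0, 2],
    observed_labels=[0, 1, 2, 2, 1, 2],
    is_noisy=[False, False, True, False, True, False],
))  # (1.0, 0.0)
```

Plotting these two fractions over epochs for the clean LoRA, the noisy LoRA and the base model would show whether each one fits clean or mislabeled samples first.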

Efficiency Analysis

Compared to single LoRA baseline:

Parameter Increase: +13.6MB
Memory Increase: +3.2GB
Performance Improvement: +3.26%~+10%

Parameter and memory efficiency analysis shows that Delora achieves a better Pareto frontier in the three-dimensional accuracy-parameter-memory trade-off space.

Related Work

Noisy Label Learning

Sample Selection Methods: Co-Teaching, SelfMix based on small-loss mechanisms
Threshold Setting: Fixed vs. dynamic threshold strategies
Limitations: Dependence on training-time models, prone to vicious cycles

Parameter-Efficient Fine-Tuning

Main Methods: LoRA, Adapter, Prompt tuning
Noise Robustness: Methods like CleaR exploring PEFT performance in noisy environments
This Paper's Contribution: Leveraging limited capacity of PEFT to separately memorize clean and noisy samples

Conclusions and Discussion

Main Conclusions

Decoupling sample selection and model training effectively avoids vicious cycles in noisy label learning
Dual-LoRA design combined with memorization effects effectively distinguishes clean and noisy samples
The method demonstrates excellent performance across various noise settings and real datasets with good generalization ability

Limitations

Scale Limitations: Resource constraints prevent validation on larger language models (e.g., 70B-parameter Llama models)
Task Limitations: Experiments limited to text classification, not exploring other tasks like text generation
Computational Overhead: Dual-LoRA design introduces additional parameters and computational costs

Future Directions

Extension to larger-scale language models
Exploration of applications in text generation tasks
Further optimization of computational and parameter efficiency

In-Depth Evaluation

Strengths

Strong Novelty:
- Presented as the first framework to decouple sample selection from model training, addressing the vicious cycle at its source
- Dual-LoRA design cleverly leverages memorization effects for parameter-level functional separation
Solid Theoretical Foundation:
- Theoretical support based on deep network memorization effects
- Clear mathematical derivations and reasonable optimization objectives
Comprehensive Experiments:
- Coverage of multiple noise types and rates
- Inclusion of both synthetic and real noisy datasets
- Detailed ablation studies and analysis
High Practical Value:
- No manual threshold tuning required
- Adaptable to different classifier models
- Excellent performance in high-noise scenarios

Weaknesses

Computational Complexity:
- Two-stage training increases training time
- Dual-LoRA design increases parameter count and memory consumption
Hyperparameter Sensitivity:
- $h_1$ and $h_2$ in dynamic constraint functions require adjustment for different noise rates
- Lack of adaptive hyperparameter selection strategy
Insufficient Theoretical Analysis:
- Lack of convergence guarantees
- No theoretical bounds on noise detection accuracy
Limited Applicability:
- Primarily focused on text classification
- Effectiveness on other NLP tasks unverified

Impact

Academic Contribution:
- Provides new perspective for noisy label learning research
- Advances application of PEFT methods in robust learning
Practical Value:
- Directly applicable to real-world text classification tasks
- Provides effective tools for handling real-world noisy data
Reproducibility:
- Detailed implementation details and hyperparameter settings provided
- Clear algorithm description facilitates reproduction

Applicable Scenarios

Text Classification Tasks: Particularly suitable for large-scale text classification with poor annotation quality
Resource-Constrained Environments: PEFT characteristics make it suitable for applications with limited computational resources
High-Noise Environments: Strong reported results at high noise rates, including the 40% synthetic noise settings and the 50.37% noise rate of Hausa
Multilingual Applications: Potential applications in text classification for low-resource languages

References

This paper cites important literature in noisy label learning and parameter-efficient fine-tuning, including:

Han et al. (2018) - Co-Teaching method
Hu et al. (2022) - LoRA method
Kim et al. (2024) - CleaR method
Yuan et al. (2024) - NoiseAL method

Overall Assessment: This is a high-quality research paper that proposes an innovative solution in the noisy label learning domain. Through clever decoupled design and dual-LoRA mechanism, it effectively addresses core problems of existing methods. Experiments are comprehensive and results convincing. Despite some limitations, its novelty and practical value make it an important contribution to the field.