Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning

Yuan, Chen, Zhang
Parameter-efficient fine-tuning (PEFT) of large language models (LLMs) has shown impressive performance in various downstream tasks. However, in many real-world scenarios, the collected training data inevitably contains noisy labels. To learn from noisy labels, most solutions select samples with small losses for model training. However, the selected samples, in turn, impact the loss computation in the next iteration. An inaccurate initial selection can create a vicious cycle, leading to suboptimal performance. To break this cycle, we propose Delora, a novel framework that decouples sample selection from model training. For sample selection, Delora establishes a noisy label detector by introducing a clean LoRA and a noisy LoRA. Benefiting from the memorization effect, the clean LoRA is encouraged to memorize clean data, while the noisy LoRA is constrained to memorize mislabeled data and serves as a learnable threshold for separating clean and noisy samples. For model training, Delora can use the carefully selected samples to fine-tune language models seamlessly. Experimental results on synthetic and real-world noisy datasets demonstrate the effectiveness of Delora in noisy label detection and text classification.

Basic Information

  • Paper ID: 2510.10208
  • Title: Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning
  • Authors: Bo Yuan, Yulin Chen, Yin Zhang (Zhejiang University)
  • Category: cs.CL (Computational Linguistics)
  • Publication Date: October 11, 2025
  • Paper Link: https://arxiv.org/abs/2510.10208v1

Abstract

Parameter-efficient fine-tuning (PEFT) of large language models performs well across a wide range of downstream tasks. However, real-world training data inevitably contains noisy labels. Existing noisy label learning methods typically train on small-loss samples, but the selected samples in turn shape the next round of loss computation, so an inaccurate initial selection creates a vicious cycle. This paper proposes the Delora framework, which breaks this cycle by decoupling sample selection from model training. The framework builds a noisy label detector from a clean LoRA and a noisy LoRA: leveraging the memorization effect, the clean LoRA is encouraged to memorize clean data and the noisy LoRA to memorize mislabeled data, with the noisy LoRA's loss serving as a learnable, per-sample threshold for sample selection. Experimental results demonstrate the effectiveness of Delora in noisy label detection and text classification tasks.

Research Background and Motivation

Problem Definition

  1. Core Problem: How to handle inevitable noisy labels in training data during parameter-efficient fine-tuning of large language models
  2. Significance: Annotation errors inevitably exist in real-world data collection processes, severely impacting model performance and generalization ability
  3. Limitations of Existing Methods:
    • Traditional small-loss selection strategies suffer from a "vicious cycle": the selected samples determine the next loss computation, which in turn determines the next selection, so early selection errors compound
    • Reliance on manually set thresholds limits practical applicability
    • Performance instability in high-noise scenarios

Research Motivation

The authors observe that the fundamental problem with existing methods lies in the coupling between sample selection and model training. This observation raises the key question that drives the paper's framework design: can sample selection and model training be decoupled so that each proceeds independently?

Core Contributions

  1. Decoupled Framework: First decomposition of noisy label learning into independent sample selection and model training stages, effectively avoiding vicious cycles
  2. Innovative Dual-LoRA Detector: Introduction of clean LoRA and noisy LoRA to separately memorize clean and noisy samples, constructing a learnable noisy label detector
  3. Dynamic Constraint Mechanism: Design of a dynamic regularization strategy, based on the memorization effect, that controls how the parameters of each LoRA are updated
  4. Comprehensive Experimental Validation: Verification of method effectiveness on synthetic and real noisy datasets, achieving significant improvements in both noisy label detection and text classification tasks

Method Details

Task Definition

Given a training dataset $D=\{(x_i, y_i)\}_{i=1}^N$, where $y_i \in \{1, \ldots, K\}$ is the observed and potentially incorrect label, the goal is to learn a robust classifier that generalizes well despite the noisy labels.

Model Architecture

The Delora framework comprises two core stages:

Stage 1: Noisy Label Detector Training

Dual-LoRA Design:

  • Clean LoRA ($\Delta w_c$): Ideal parameters for memorizing clean samples
  • Noisy LoRA ($\Delta w_n$): Noise parameters for memorizing mislabeled samples
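To make the parameter-level separation concrete, here is a minimal plain-Python sketch of a single linear layer that shares frozen weights $w_0$ between two low-rank updates $\Delta w = BA$ and evaluates each branch on its own, mirroring $f(x, w_0 + \Delta w_c)$ and $f(x, w_0 + \Delta w_n)$. The class name `DualLoRALinear`, the `scaling` factor, and the toy matrices are illustrative assumptions, not the authors' implementation; a real version would wrap the projection layers of the backbone in a deep-learning framework.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def add_scaled(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Element-wise a + scale * b (returns a new matrix)."""
    return [[a_ij + scale * b_ij for a_ij, b_ij in zip(ra, rb)] for ra, rb in zip(a, b)]


class DualLoRALinear:
    """Hypothetical layer: frozen w0 plus two separate low-rank updates.

    Each update is delta_w = B @ A with A of shape (r, d_in) and B of shape (d_out, r).
    The "clean" and "noisy" branches are evaluated separately, never summed.
    """

    def __init__(self, w0: Matrix, adapters: Dict[str, Tuple[Matrix, Matrix]],
                 scaling: float = 1.0) -> None:
        self.w0 = w0                # frozen pretrained weights, shared by both branches
        self.adapters = adapters    # branch name -> (A, B)
        self.scaling = scaling      # standard LoRA commonly uses alpha / r here

    def forward(self, x: Vector, branch: str) -> Vector:
        a, b = self.adapters[branch]
        w = add_scaled(self.w0, matmul(b, a), self.scaling)  # w0 + delta_w
        return matvec(w, x)


# Toy example with d_in = d_out = 2 and rank r = 1.
layer = DualLoRALinear(
    w0=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "clean": ([[1.0, 1.0]], [[0.5], [0.0]]),  # non-zero update
        "noisy": ([[0.0, 0.0]], [[0.0], [0.0]]),  # zero update: behaves like w0
    },
)
print(layer.forward([1.0, 2.0], "clean"))  # [2.5, 2.0]
print(layer.forward([1.0, 2.0], "noisy"))  # [1.0, 2.0]
```

Keeping the two updates in separate branches, rather than adding both to $w_0$, is what lets each LoRA produce its own loss for the same sample.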

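Learnable Threshold Mechanism: For the $i$-th training sample, the learnable threshold is the cross-entropy loss of the model equipped with the noisy LoRA: $\phi_i = CE(f(x_i, w_0 + \Delta w_n), y_i)$, where $w_0$ denotes the frozen pretrained weights and $f$ the language model classifier.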

Sample selection criterion: $D_c = \{(x_i, y_i) \mid CE(f(x_i, w_0 + \Delta w_c), y_i) < \phi_i\}$
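The selection rule can be sketched as follows, assuming each sample's class probabilities under the clean-LoRA and noisy-LoRA forward passes are already available (the inputs and the helper names `cross_entropy` and `select_clean` are hypothetical). The threshold $\phi_i$ is simply the noisy-LoRA loss, so no global cutoff has to be chosen by hand.

```python
import math
from typing import List, Sequence, Tuple


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """CE for one sample: -log p(label), clipped to avoid log(0)."""
    return -math.log(max(probs[label], 1e-12))


def select_clean(clean_probs: List[Sequence[float]],
                 noisy_probs: List[Sequence[float]],
                 labels: List[int]) -> Tuple[List[int], List[int]]:
    """Return (clean indices, noisy indices) using the per-sample threshold phi_i."""
    clean_idx, noisy_idx = [], []
    for i, (pc, pn, y) in enumerate(zip(clean_probs, noisy_probs, labels)):
        phi_i = cross_entropy(pn, y)  # learnable threshold from the noisy LoRA
        if cross_entropy(pc, y) < phi_i:
            clean_idx.append(i)       # clean LoRA fits this label better
        else:
            noisy_idx.append(i)       # flagged as likely mislabeled
    return clean_idx, noisy_idx


clean_probs = [[0.9, 0.1], [0.8, 0.2]]  # outputs of f(x, w0 + dw_c)
noisy_probs = [[0.4, 0.6], [0.3, 0.7]]  # outputs of f(x, w0 + dw_n)
labels = [0, 1]
print(select_clean(clean_probs, noisy_probs, labels))  # ([0], [1])
```

Because the comparison is strict, a tie between the two losses sends the sample to the noisy side.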

Dynamic Constraint Optimization: $L_{LoRA} = \tau_1(t)\Delta\sigma_c + \tau_2(t)\Delta\sigma_n$

Where:

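  • $\tau_1(t) = t^{h_1}$ (increasing function constraining the clean LoRA)
  • $\tau_2(t) = t^{-h_2}$ (decreasing function constraining the noisy LoRA)
  • $\Delta\sigma_c = \|\Delta w_c^{t} - \Delta w_c^{t-1}\|$ (parameter change magnitude of the clean LoRA between successive training steps), with $\Delta\sigma_n$ defined analogously for the noisy LoRA

Because $\tau_1$ grows and $\tau_2$ shrinks as training proceeds, updating the clean LoRA becomes increasingly costly while the noisy LoRA is gradually released. Combined with the memorization effect (clean samples are fitted first, mislabeled samples later), this steers early, clean patterns into $\Delta w_c$ and late-memorized noisy labels into $\Delta w_n$.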
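A minimal sketch of the dynamic constraint, assuming LoRA parameters are flattened into plain lists, that $t$ starts at 1 (so $t^{-h_2}$ is defined), and that $\|\cdot\|$ is the L2 norm; the summary specifies neither the norm nor whether $t$ counts steps or epochs. It only evaluates the penalty's value; in training, the same expression would be computed on framework tensors so that it can be backpropagated.

```python
import math
from typing import Sequence


def l2_change(curr: Sequence[float], prev: Sequence[float]) -> float:
    """||dw^t - dw^{t-1}|| over flattened LoRA parameters (L2 norm assumed)."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(curr, prev)))


def lora_constraint(t: int, h1: float, h2: float,
                    clean_curr: Sequence[float], clean_prev: Sequence[float],
                    noisy_curr: Sequence[float], noisy_prev: Sequence[float]) -> float:
    """L_LoRA = tau1(t) * dsigma_c + tau2(t) * dsigma_n, tau1 = t^h1, tau2 = t^-h2."""
    if t < 1:
        raise ValueError("t must be >= 1 so that t ** (-h2) is defined")
    tau1 = t ** h1     # grows with t: clean-LoRA updates get more expensive
    tau2 = t ** (-h2)  # shrinks with t: noisy-LoRA updates get cheaper
    return (tau1 * l2_change(clean_curr, clean_prev)
            + tau2 * l2_change(noisy_curr, noisy_prev))


# t = 2, h1 = h2 = 1: clean change has norm 5, noisy change has norm 2.
print(lora_constraint(2, 1.0, 1.0,
                      [3.0, 4.0], [0.0, 0.0],   # clean LoRA: current, previous
                      [1.0, 3.0], [1.0, 1.0]))  # noisy LoRA: current, previous
# 2 * 5 + 0.5 * 2 = 11.0
```

Larger $h_1$ makes the clean LoRA's penalty rise faster over training; larger $h_2$ makes the noisy LoRA's penalty fall faster.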

Detector Optimization Objective: The detector is trained as a binary (clean vs. noisy) classifier using the clean probability $p_i^c = \frac{e^{-CE(f(x_i,w_0+\Delta w_c),y_i)}}{e^{-CE(f(x_i,w_0+\Delta w_c),y_i)} + e^{-CE(f(x_i,w_0+\Delta w_n),y_i)}}$, so that $p_i^c > 0.5$ exactly when the clean-LoRA loss falls below the threshold $\phi_i$, consistent with the selection criterion.
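The clean probability can be computed stably as a sigmoid of the loss gap, since $e^{-a}/(e^{-a}+e^{-b}) = 1/(1+e^{a-b})$. This sketch puts negated losses in the exponent, so that the probability exceeds 0.5 exactly when the clean-LoRA loss is below the noisy-LoRA threshold $\phi_i$. The function name is hypothetical, and how the detector's binary targets are obtained is not covered by the summary, so that part is left out.

```python
import math


def clean_probability(ce_clean: float, ce_noisy: float) -> float:
    """p^c = e^{-CE_c} / (e^{-CE_c} + e^{-CE_n}) = sigmoid(CE_n - CE_c).

    Both branches of the sigmoid only exponentiate non-positive numbers,
    so very large losses cannot overflow math.exp.
    """
    gap = ce_noisy - ce_clean
    if gap >= 0:
        return 1.0 / (1.0 + math.exp(-gap))
    z = math.exp(gap)
    return z / (1.0 + z)


print(clean_probability(1.0, 1.0))            # 0.5: equal losses, undecided
print(clean_probability(0.0, math.log(3.0)))  # ~0.75: clean LoRA fits better
print(clean_probability(1000.0, 0.0))         # ~0.0 without overflow
```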

Total optimization objective: $L = L_{ce} + L_{LoRA} + L_{Detector}$

Stage 2: Classifier Model Training

  • Clean Samples: Direct training using cross-entropy loss
  • Noisy Samples: Re-annotated using GPT-4o, trained with reverse cross-entropy loss for robust learning
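The Stage 2 routing can be sketched per sample as below. The reverse cross-entropy here follows the Symmetric Cross Entropy formulation (Wang et al., 2019), where the one-hot label's $\log 0$ is clipped to a constant $A = -4$; whether Delora uses this exact variant and constant is an assumption. The GPT-4o re-annotation is represented only by a given `relabel` argument, and all function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def cross_entropy(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Standard CE for one sample: -log p(label)."""
    return -math.log(max(probs[label], eps))


def reverse_cross_entropy(probs: Sequence[float], label: int,
                          log_zero: float = -4.0) -> float:
    """RCE = -sum_k p_k * log q_k, with q the one-hot label and log(0) clipped to log_zero.

    The k == label term vanishes (log 1 = 0), leaving -log_zero * (1 - p_label).
    """
    return -log_zero * (1.0 - probs[label])


def stage2_loss(probs: Sequence[float], is_clean: bool, label: int,
                relabel: Optional[int] = None) -> float:
    """CE on the observed label for clean samples; RCE on the re-annotated label otherwise."""
    if is_clean:
        return cross_entropy(probs, label)
    if relabel is None:
        raise ValueError("a noisy sample needs a re-annotated label")
    return reverse_cross_entropy(probs, relabel)


print(stage2_loss([0.5, 0.5], is_clean=True, label=0))                # ln 2, about 0.693
print(stage2_loss([0.25, 0.75], is_clean=False, label=0, relabel=1))  # 1.0
```

RCE is bounded by $-A$ per sample, so an LLM re-annotation that is still wrong cannot dominate the loss the way an unbounded CE term could.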

Technical Innovations

  1. Decoupled Design: Complete separation of sample selection and model training to avoid mutual interference
  2. Memorization Effect Utilization: Exploits the tendency of deep networks to memorize clean samples before noisy ones
  3. Learnable Threshold: Data-driven, per-sample threshold predicted by the noisy LoRA, eliminating a manually tuned selection threshold
  4. Parameter-Level Functional Separation: Clean and noisy memorization are separated into distinct LoRA modules at the parameter level, making the design architecture-agnostic

Experimental Setup

Datasets

Synthetic Noisy Datasets:

  • Trec, SST-2, SST-5, 20ng, AGNews
  • Noise Types: Symmetric (S), Asymmetric (A), Instance-dependent (I)
  • Noise Rates: 20%, 40%
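Synthetic noise of the first two types is commonly injected as below. This is a sketch under stated assumptions: symmetric noise flips a label, with probability equal to the noise rate, to a uniformly chosen different class, and asymmetric noise uses the hypothetical class-pair mapping $y \to (y+1) \bmod K$. The summary does not give the exact mappings used for each dataset, and instance-dependent noise (which needs a model of per-sample flip probabilities) is omitted.

```python
import random
from typing import List


def symmetric_noise(labels: List[int], num_classes: int, rate: float,
                    seed: int = 0) -> List[int]:
    """With probability `rate`, replace a label by a uniformly drawn *different* class."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < rate:
            y = rng.choice([k for k in range(num_classes) if k != y])
        noisy.append(y)
    return noisy


def asymmetric_noise(labels: List[int], num_classes: int, rate: float,
                     seed: int = 0) -> List[int]:
    """With probability `rate`, map y to a fixed partner class, here (y + 1) mod K (hypothetical)."""
    rng = random.Random(seed)
    return [(y + 1) % num_classes if rng.random() < rate else y for y in labels]


clean = [i % 6 for i in range(10_000)]          # toy labels with K = 6 classes
noisy = symmetric_noise(clean, num_classes=6, rate=0.4)
flip_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(f"observed flip rate: {flip_rate:.3f}")   # close to 0.40
```

Some papers let symmetric noise redraw the true class as well, which makes the effective flip rate slightly lower than the nominal rate; excluding it, as here, keeps the two equal in expectation.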

Real Noisy Datasets:

  • Hausa (noise rate 50.37%)
  • Yorùbá (noise rate 33.28%)
  • AlleNoise (noise rate 15.00%)

Evaluation Metrics

  • Noisy Label Detection Stage: Precision and Recall
  • Classification Stage: Test Accuracy
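Detection precision and recall can be computed as sketched below. Which class the paper treats as positive is not stated in the summary; this sketch assumes "mislabeled" is the positive class, so precision is the fraction of flagged samples that are truly noisy and recall is the fraction of truly noisy samples that get flagged. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def detection_precision_recall(pred_noisy: Sequence[bool],
                               true_noisy: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of the detector, treating 'mislabeled' as the positive class."""
    pairs = list(zip(pred_noisy, true_noisy))
    tp = sum(1 for p, t in pairs if p and t)      # noisy and flagged
    fp = sum(1 for p, t in pairs if p and not t)  # clean but flagged
    fn = sum(1 for p, t in pairs if t and not p)  # noisy but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


pred = [True, True, False, False, True]
true = [True, False, False, True, True]
print(detection_precision_recall(pred, true))  # both 2/3
```

Swapping the roles (clean as positive) would instead measure the purity and coverage of the selected clean subset $D_c$.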

Baseline Methods

  • Base Model: Llama-3.1-8B-Instruct
  • Noisy Learning Methods: Co-Teaching, SelfMix, NoiseAL, CleaR, SENT, LAFT
  • Detection Methods: LLMs-detection, Small-loss strategy

Implementation Details

  • Backbone Model: Llama-3.1-8B-Instruct
  • LoRA Rank: r=32
  • Training Epochs: 8 for detector, 6 for classifier
  • Warm-up Epochs: 2
  • Learning Rate: 1e-4, 5e-4

Experimental Results

Main Results

Noisy Label Detection Performance: Delora improves over the small-loss baseline on the Trec dataset, most markedly in precision:

  • 20% Symmetric Noise: Precision 99.47% vs 81.15% (Small-loss)
  • 40% Asymmetric Noise: Recall 97.27% vs 96.20% (Small-loss)

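Text Classification Performance (test accuracy, %):

| Dataset | Noise Setting | Base | NoiseAL | Delora |
|---|---|---|---|---|
| Trec | 20% S | 95.20 | 97.30 | 98.46 |
| Trec | 40% A | 87.40 | 95.95 | 97.40 |
| SST-5 | 20% S | 54.08 | 55.00 | 57.39 |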

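Real Noisy Dataset Results (test accuracy, %; improvement in percentage points):

| Dataset | Noise Rate | NoiseAL | Delora | Improvement |
|---|---|---|---|---|
| Hausa | 50.37% | 52.34 | 60.12 | +7.78 |
| Yorùbá | 33.28% | 72.13 | 78.56 | +6.43 |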

Ablation Study

Ablation studies on Trec dataset demonstrate:

  • Removing the Noisy Label Detector (NLD): Significant performance drop (98.46→95.20, the Delora and Base accuracies for Trec under 20% symmetric noise)
  • Removing Classifier Training (CT): Notable performance decrease
  • Removing any optimization objective ($L_{LoRA}$, $L_{Detector}$, or $L_{ce}$): Performance degrades in each case
  • Removing noisy sample re-annotation: ~4% performance drop

Memorization Effect Analysis

Experiments verify memorization patterns of different LoRAs:

  • Clean LoRA: Enhanced memorization of clean samples, reduced memorization of noisy samples
  • Noisy LoRA: Opposite pattern, primarily absorbing negative effects of noisy samples
  • Base Model: Exhibits the memorization effect, fitting clean samples before noisy samples

Efficiency Analysis

Compared with a single-LoRA baseline:

  • Parameter Increase: +13.6MB
  • Memory Increase: +3.2GB
  • Performance Improvement: +3.26% to +10%

Parameter and memory efficiency analysis shows that Delora achieves a better Pareto frontier in the three-dimensional accuracy-parameter-memory trade-off space.
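As a rough check on the parameter overhead, the sketch below counts the parameters of one extra rank-32 LoRA adapter on Llama-3.1-8B (hidden size 4096, 32 layers, 8 key/value heads of dimension 128). Which modules carry LoRA is not stated in the summary; applying it only to `q_proj` and `v_proj` is an assumption made here for illustration. Under that assumption the second adapter adds about 13.6 million parameters, roughly 26 MiB in bf16, so the reported +13.6 figure may be a parameter count in millions rather than megabytes; the summary alone cannot confirm this.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters of one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


HIDDEN = 4096     # Llama-3.1-8B hidden size
KV_DIM = 8 * 128  # 8 key/value heads x head dim 128 (grouped-query attention)
LAYERS = 32
RANK = 32         # LoRA rank reported in the implementation details

# Assumption: the adapter targets q_proj (4096 -> 4096) and v_proj (4096 -> 1024).
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
extra_adapter = LAYERS * per_layer
print(extra_adapter)                # 13631488 parameters
print(extra_adapter * 2 / 2 ** 20)  # 26.0 MiB at 2 bytes per parameter (bf16)
```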

Related Work

Noisy Label Learning

  • Sample Selection Methods: Co-Teaching and SelfMix, both based on the small-loss mechanism
  • Threshold Setting: Fixed vs. dynamic threshold strategies
  • Limitations: Selection depends on the model being trained, which makes these methods prone to vicious cycles
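For contrast with Delora's learnable threshold, a simplified single-model version of the small-loss rule is sketched below: keep a fixed fraction of the lowest-loss samples. The function name and the use of an assumed noise rate as the keep ratio are illustrative; Co-Teaching itself trains two networks that exchange their small-loss selections. The key point is that the losses, and therefore the selection, come from the model being trained, which is the coupling behind the vicious cycle.

```python
from typing import List, Sequence


def small_loss_select(losses: Sequence[float], noise_rate: float) -> List[int]:
    """Keep the (1 - noise_rate) fraction of samples with the smallest loss.

    The keep ratio is a manually set hyperparameter, and the losses come from
    the model currently being trained, so selection errors feed back into training.
    """
    keep = int(round((1.0 - noise_rate) * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:keep])


print(small_loss_select([0.1, 2.0, 0.3, 1.5, 0.2], noise_rate=0.4))  # [0, 2, 4]
```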

Parameter-Efficient Fine-Tuning

  • Main Methods: LoRA, Adapter, Prompt tuning
  • Noise Robustness: Methods like CleaR exploring PEFT performance in noisy environments
  • This Paper's Contribution: Leveraging the limited capacity of PEFT to separately memorize clean and noisy samples

Conclusions and Discussion

Main Conclusions

  1. Decoupling sample selection and model training effectively avoids vicious cycles in noisy label learning
  2. Dual-LoRA design combined with memorization effects effectively distinguishes clean and noisy samples
  3. The method demonstrates excellent performance across various noise settings and real datasets with good generalization ability

Limitations

  1. Scale Limitations: Resource constraints prevented validation on larger language models (e.g., 70B-parameter models)
  2. Task Limitations: Experiments limited to text classification, not exploring other tasks like text generation
  3. Computational Overhead: Dual-LoRA design introduces additional parameters and computational costs

Future Directions

  1. Extension to larger-scale language models
  2. Exploration of applications in text generation tasks
  3. Further optimization of computational and parameter efficiency

In-Depth Evaluation

Strengths

  1. Strong Novelty:
    • First framework to decouple sample selection from model training, removing the feedback loop that causes the vicious cycle
    • Dual-LoRA design cleverly leverages memorization effects for parameter-level functional separation
  2. Solid Theoretical Foundation:
    • Grounded in the empirically well-established memorization effect of deep networks
    • Clear mathematical derivations and reasonable optimization objectives
  3. Comprehensive Experiments:
    • Coverage of multiple noise types and rates
    • Inclusion of both synthetic and real noisy datasets
    • Detailed ablation studies and analysis
  4. High Practical Value:
    • No manual threshold tuning required
    • Adaptable to different classifier models
    • Excellent performance in high-noise scenarios

Weaknesses

  1. Computational Complexity:
    • Two-stage training increases training time
    • Dual-LoRA design increases parameter count and memory consumption
  2. Hyperparameter Sensitivity:
    • $h_1$ and $h_2$ in the dynamic constraint functions require adjustment for different noise rates
    • Lack of adaptive hyperparameter selection strategy
  3. Insufficient Theoretical Analysis:
    • Lack of convergence guarantees
    • No theoretical bounds on noise detection accuracy
  4. Limited Applicability:
    • Primarily focused on text classification
    • Effectiveness on other NLP tasks unverified

Impact

  1. Academic Contribution:
    • Provides new perspective for noisy label learning research
    • Advances application of PEFT methods in robust learning
  2. Practical Value:
    • Directly applicable to real-world text classification tasks
    • Provides effective tools for handling real-world noisy data
  3. Reproducibility:
    • Implementation details and hyperparameter settings are provided
    • Clear algorithm description facilitates reproduction

Applicable Scenarios

  1. Text Classification Tasks: Particularly suitable for large-scale text classification with poor annotation quality
  2. Resource-Constrained Environments: PEFT characteristics make it suitable for applications with limited computational resources
  3. High-Noise Environments: Particularly strong performance at high noise rates (40% and above, e.g., the 40% synthetic settings and Hausa at 50.37%)
  4. Multilingual Applications: Potential applications in text classification for low-resource languages

References

This paper cites important literature in noisy label learning and parameter-efficient fine-tuning, including:

  • Han et al. (2018) - Co-Teaching method
  • Hu et al. (2022) - LoRA method
  • Kim et al. (2024) - CleaR method
  • Yuan et al. (2024) - NoiseAL method

Overall Assessment: This is a high-quality research paper that proposes an innovative solution in the noisy label learning domain. Through clever decoupled design and dual-LoRA mechanism, it effectively addresses core problems of existing methods. Experiments are comprehensive and results convincing. Despite some limitations, its novelty and practical value make it an important contribution to the field.