Parameter-efficient fine-tuning (PEFT) large language models (LLMs) have shown impressive performance in various downstream tasks. However, in many real-world scenarios, the collected training data inevitably contains noisy labels. To learn from noisy labels, most solutions select samples with small losses for model training. However, the selected samples, in turn, impact the loss computation in the next iteration. An inaccurate initial selection can create a vicious cycle, leading to suboptimal performance. To break this cycle, we propose Delora, a novel framework that decouples the sample selection from model training. For sample selection, Delora establishes a noisy label detector by introducing clean and noisy LoRA. Benefiting from the memory effect, the clean LoRA is encouraged to memorize clean data, while the noisy LoRA is constrained to memorize mislabeled data, which serves as a learnable threshold for selecting clean and noisy samples. For model training, Delora can use carefully selected samples to fine-tune language models seamlessly. Experimental results on synthetic and real-world noisy datasets demonstrate the effectiveness of Delora in noisy label detection and text classification.
- Paper ID: 2510.10208
- Title: Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning
- Authors: Bo Yuan, Yulin Chen, Yin Zhang (Zhejiang University)
- Category: cs.CL (Computation and Language)
- Publication Date: October 11, 2025
- Paper Link: https://arxiv.org/abs/2510.10208v1
Parameter-efficient fine-tuning (PEFT) of large language models demonstrates excellent performance across various downstream tasks. However, training data in real-world scenarios inevitably contains noisy labels. Existing noisy label learning methods typically select small-loss samples for training, but this selection affects subsequent loss computation, and inaccurate initial selection creates a vicious cycle. This paper proposes the Delora framework, which breaks this cycle by decoupling sample selection from model training. The framework introduces clean LoRA and noisy LoRA to construct a noisy label detector, leveraging memorization effects to enable clean LoRA to memorize clean data and noisy LoRA to memorize mislabeled data, serving as learnable thresholds for sample selection. Experimental results demonstrate the effectiveness of Delora in noisy label detection and text classification tasks.
- Core Problem: How to handle inevitable noisy labels in training data during parameter-efficient fine-tuning of large language models
- Significance: Annotation errors inevitably exist in real-world data collection processes, severely impacting model performance and generalization ability
- Limitations of Existing Methods:
  - Traditional small-loss selection strategies suffer from a "vicious cycle": sample selection affects loss computation, which in turn affects sample selection
  - Reliance on manually set thresholds limits practical applicability
  - Performance instability in high-noise scenarios
The authors observe that the fundamental problem with existing methods lies in the coupling between sample selection and model training. This leads to the key question: can sample selection and model training be decoupled so that they run independently? This question motivates the core framework design of the paper.
- Decoupled Framework: First decomposition of noisy label learning into independent sample selection and model training stages, effectively avoiding vicious cycles
- Innovative Dual-LoRA Detector: Introduction of clean LoRA and noisy LoRA to separately memorize clean and noisy samples, constructing a learnable noisy label detector
- Dynamic Constraint Mechanism: Design of dynamic regularization strategy based on memorization effects to control parameter update patterns of different LoRAs
- Comprehensive Experimental Validation: Verification of method effectiveness on synthetic and real noisy datasets, achieving significant improvements in both noisy label detection and text classification tasks
Given a training dataset $D=\{(x_i,y_i)\}_{i=1}^{N}$, where $y_i\in\{1,\dots,K\}$ is the observed (potentially incorrect) label of input $x_i$, the goal is to learn a robust classifier that generalizes well in the presence of noisy labels.
The Delora framework comprises two core stages:
Dual-LoRA Design:
- Clean LoRA ($\Delta w_c$): Ideal parameters for memorizing clean samples
- Noisy LoRA ($\Delta w_n$): Noise parameters for memorizing mislabeled samples
Learnable Threshold Mechanism:
For the $i$-th training sample, the learnable threshold is defined as:
$$\phi_i = \mathrm{CE}\big(f(x_i, w_0+\Delta w_n),\, y_i\big)$$
Sample selection criterion:
$$D_c = \{(x_i,y_i) \mid \mathrm{CE}\big(f(x_i, w_0+\Delta w_c),\, y_i\big) < \phi_i\}$$
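The selection criterion above can be illustrated with a minimal Python sketch. It assumes the class probabilities produced under each adapter are already available as plain lists; the names `select_clean`, `clean_probs`, and `noisy_probs` are hypothetical and do not come from the paper.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy of one sample: -log of the probability assigned to `label`."""
    return -math.log(max(probs[label], 1e-12))

def select_clean(clean_probs, noisy_probs, labels):
    """Return indices i whose loss under the clean LoRA is strictly below phi_i,
    where phi_i is the loss of the same sample under the noisy LoRA."""
    clean_idx = []
    for i, y in enumerate(labels):
        loss_clean = cross_entropy(clean_probs[i], y)  # CE(f(x_i, w0 + dw_c), y_i)
        phi_i = cross_entropy(noisy_probs[i], y)       # CE(f(x_i, w0 + dw_n), y_i)
        if loss_clean < phi_i:
            clean_idx.append(i)
    return clean_idx

# Toy example: three samples, two classes (illustrative numbers only).
clean_probs = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
noisy_probs = [[0.6, 0.4], [0.7, 0.3], [0.8, 0.2]]
labels = [0, 0, 1]
selected = select_clean(clean_probs, noisy_probs, labels)
print(selected)
```

In this toy input the second sample is fit better by the noisy adapter than by the clean one, so it is excluded from the clean set. The threshold is per-sample, so no global loss cutoff has to be chosen.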
Dynamic Constraint Optimization:
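$$L_{LoRA} = \tau_1(t)\,\Delta\sigma_c + \tau_2(t)\,\Delta\sigma_n$$
Where:
- $\tau_1(t) = t^{h_1}$ (increasing function constraining clean LoRA)
- $\tau_2(t) = t^{-h_2}$ (decreasing function constraining noisy LoRA)
- $\Delta\sigma_c = \lVert \Delta w_c^{t} - \Delta w_c^{t-1} \rVert$ (parameter change magnitude of the clean LoRA between steps $t-1$ and $t$; $\Delta\sigma_n$ is defined analogously for the noisy LoRA)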
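A rough sketch of how this constraint term could be evaluated is given below. It assumes that $t \geq 1$ is a step or epoch counter, that $h_1, h_2 > 0$, that the norm is Euclidean, and that each LoRA update is available as a flat list of floats. None of these details are confirmed by the summary, and all function names are hypothetical.

```python
import math

def tau_clean(t, h1):
    """tau_1(t) = t^{h1}: grows with t, penalizing clean-LoRA updates more over time."""
    return t ** h1

def tau_noisy(t, h2):
    """tau_2(t) = t^{-h2}: shrinks with t, relaxing the penalty on noisy-LoRA updates."""
    return t ** (-h2)

def change_magnitude(curr, prev):
    """Euclidean norm of the difference between LoRA updates at steps t and t-1."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(curr, prev)))

def lora_constraint(t, h1, h2, clean_curr, clean_prev, noisy_curr, noisy_prev):
    """L_LoRA = tau_1(t) * delta_sigma_c + tau_2(t) * delta_sigma_n."""
    d_c = change_magnitude(clean_curr, clean_prev)
    d_n = change_magnitude(noisy_curr, noisy_prev)
    return tau_clean(t, h1) * d_c + tau_noisy(t, h2) * d_n

# Toy values: at t = 4 with h1 = h2 = 0.5, tau_1 = 2.0 and tau_2 = 0.5.
loss = lora_constraint(
    t=4, h1=0.5, h2=0.5,
    clean_curr=[3.0, 4.0], clean_prev=[0.0, 0.0],   # change magnitude 5.0
    noisy_curr=[1.0, 0.0], noisy_prev=[0.0, 0.0],   # change magnitude 1.0
)
print(loss)
```

Under these assumptions the clean LoRA is free to change early in training and is increasingly frozen later, while the noisy LoRA is held back early and released later, which matches the memorization-effect intuition that clean samples are fit first.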
Detector Optimization Objective:
Binary classification using clean probability:
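$$p_i^c = \frac{e^{\mathrm{CE}(f(x_i, w_0+\Delta w_c),\, y_i)}}{e^{\mathrm{CE}(f(x_i, w_0+\Delta w_c),\, y_i)} + e^{\mathrm{CE}(f(x_i, w_0+\Delta w_n),\, y_i)}}$$
Total optimization objective: $L = L_{ce} + L_{LoRA} + L_{Detector}$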
- Clean Samples: Direct training using cross-entropy loss
- Noisy Samples: Re-annotated using GPT-4o, trained with reverse cross-entropy loss for robust learning
- Decoupled Design: Complete separation of sample selection and model training to avoid mutual interference
- Memorization Effect Utilization: Clever exploitation of the deep network property of memorizing clean samples before noisy samples
- Learnable Threshold: Data-driven threshold prediction using noisy LoRA, eliminating manual hyperparameter tuning
- Parameter-Level Functional Separation: Functional separation at parameter level, architecture-agnostic
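For the classifier-training stage, the summary states that re-annotated noisy samples are trained with a reverse cross-entropy loss. The sketch below assumes the definition commonly used in the noisy-label literature, $\mathrm{RCE} = -\sum_k p(k \mid x)\,\log q(k \mid x)$ with $q$ the one-hot label distribution and $\log 0$ replaced by a negative constant. The constant `log_zero=-4.0` is an illustrative choice, and the paper's exact variant may differ.

```python
import math

def cross_entropy(probs, label):
    """Standard CE, used for samples selected as clean: -log p(label | x)."""
    return -math.log(max(probs[label], 1e-12))

def reverse_cross_entropy(probs, label, log_zero=-4.0):
    """Reverse CE for re-annotated samples (assumed definition, see lead-in).

    log q(label) = log 1 = 0, so only the non-label classes contribute,
    each weighted by the clamped value that stands in for log 0.
    """
    return -sum(p * log_zero for k, p in enumerate(probs) if k != label)

probs = [0.7, 0.2, 0.1]
ce = cross_entropy(probs, 0)
rce = reverse_cross_entropy(probs, 0)
print(ce, rce)
```

With a one-hot label this reduces to `-log_zero * (1 - p(label | x))`, which is bounded in the predicted probability. That bounded form is why reverse cross-entropy is usually considered more tolerant of residual label errors than standard cross-entropy.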
Synthetic Noisy Datasets:
- Trec, SST-2, SST-5, 20ng, AGNews
- Noise Types: Symmetric (S), Asymmetric (A), Instance-dependent (I)
- Noise Rates: 20%, 40%
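As an illustration of the synthetic setup, the following sketch injects symmetric label noise at a fixed rate. It assumes that "symmetric" means a corrupted label is drawn uniformly from the other classes and that exactly `round(noise_rate * N)` samples are corrupted. The paper's exact protocol may differ, and `inject_symmetric_noise` is a hypothetical helper.

```python
import random

def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip a fixed fraction of labels to a different class chosen uniformly at random."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        # Exclude the original class so every selected sample is truly mislabeled.
        candidates = [k for k in range(num_classes) if k != labels[i]]
        noisy[i] = rng.choice(candidates)
    return noisy

# 100 samples over 6 classes with a 20% symmetric noise rate.
labels = [i % 6 for i in range(100)]
noisy_labels = inject_symmetric_noise(labels, num_classes=6, noise_rate=0.2)
num_changed = sum(a != b for a, b in zip(labels, noisy_labels))
print(num_changed)
```

Asymmetric and instance-dependent noise would replace the uniform choice with a class-dependent or input-dependent flipping rule, which this sketch does not cover.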
Real Noisy Datasets:
- Hausa (noise rate 50.37%)
- Yorùbá (noise rate 33.28%)
- AlleNoise (noise rate 15.00%)
- Noisy Label Detection Stage: Precision and Recall
- Classification Stage: Test Accuracy
- Base Model: Llama-3.1-8B-Instruct
- Noisy Learning Methods: Co-Teaching, SelfMix, NoiseAL, CleaR, SENT, LAFT
- Detection Methods: LLMs-detection, Small-loss strategy
- Backbone Model: Llama-3.1-8B-Instruct
- LoRA Rank: r=32
- Training Epochs: 8 for detector, 6 for classifier
- Warm-up Epochs: 2
- Learning Rate: 1e-4, 5e-4
Noisy Label Detection Performance:
Delora compared with the small-loss baseline on the Trec dataset:
- 20% Symmetric Noise: Precision 99.47% vs 81.15% (Small-loss)
- 40% Asymmetric Noise: Recall 97.27% vs 96.20% (Small-loss)
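The detection metrics reported above can be computed as in the sketch below. It assumes that the positive class is "flagged as noisy" and that the set of truly mislabeled indices is known, as it is for synthetic noise. The summary does not say which class the paper treats as positive, and the function name is hypothetical.

```python
def detection_precision_recall(flagged_noisy, truly_noisy):
    """Precision and recall of a noisy-label detector over sample indices."""
    flagged, truth = set(flagged_noisy), set(truly_noisy)
    true_positives = len(flagged & truth)
    # Precision: share of flagged samples that are really mislabeled.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of mislabeled samples that the detector flagged.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Toy example: 4 samples flagged, 5 truly noisy, 3 in common.
precision, recall = detection_precision_recall([1, 4, 7, 9], [1, 4, 7, 8, 10])
print(precision, recall)
```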
Text Classification Performance:
| Dataset | Noise Setting | Base | NoiseAL | Delora |
|---|---|---|---|---|
| Trec | 20%S | 95.20 | 97.30 | 98.46 |
| Trec | 40%A | 87.40 | 95.95 | 97.40 |
| SST-5 | 20%S | 54.08 | 55.00 | 57.39 |
Real Noisy Dataset Results:
| Dataset | Noise Rate | NoiseAL | Delora | Improvement (pp) |
|---|---|---|---|---|
| Hausa | 50.37% | 52.34 | 60.12 | +7.78 |
| Yorùbá | 33.28% | 72.13 | 78.56 | +6.43 |
Ablation studies on Trec dataset demonstrate:
- Removing Noisy Label Detector (NLD): Significant performance drop (98.46→95.20)
- Removing Classifier Training (CT): Notable performance decrease
- Removing optimization objectives ($L_{LoRA}$, $L_{Detector}$, $L_{ce}$): All lead to performance degradation
- Removing noisy sample re-annotation: ~4% performance drop
Experiments verify memorization patterns of different LoRAs:
- Clean LoRA: Enhanced memorization of clean samples, reduced memorization of noisy samples
- Noisy LoRA: Opposite pattern, primarily absorbing negative effects of noisy samples
- Base Model: Follows memorization effect of memorizing clean samples before noisy samples
Compared to single LoRA baseline:
- Parameter Increase: +13.6MB
- Memory Increase: +3.2GB
- Performance Improvement: +3.26%~+10%
Parameter and memory efficiency analysis shows that Delora achieves a better Pareto frontier in the three-dimensional accuracy-parameter-memory trade-off space.
- Sample Selection Methods: Co-Teaching, SelfMix based on small-loss mechanisms
- Threshold Setting: Fixed vs. dynamic threshold strategies
- Limitations: Dependence on training-time models, prone to vicious cycles
- Main Methods: LoRA, Adapter, Prompt tuning
- Noise Robustness: Methods like CleaR exploring PEFT performance in noisy environments
- This Paper's Contribution: Leveraging limited capacity of PEFT to separately memorize clean and noisy samples
- Decoupling sample selection and model training effectively avoids vicious cycles in noisy label learning
- Dual-LoRA design combined with memorization effects effectively distinguishes clean and noisy samples
- The method demonstrates excellent performance across various noise settings and real datasets with good generalization ability
- Scale Limitations: Resource constraints prevent validation on larger language models (e.g., 70B-parameter Llama models)
- Task Limitations: Experiments limited to text classification, not exploring other tasks like text generation
- Computational Overhead: Dual-LoRA design introduces additional parameters and computational costs
- Extension to larger-scale language models
- Exploration of applications in text generation tasks
- Further optimization of computational and parameter efficiency
- Strong Novelty:
  - First framework decoupling sample selection and model training, fundamentally solving vicious cycle problems
  - Dual-LoRA design cleverly leverages memorization effects for parameter-level functional separation
- Solid Theoretical Foundation:
  - Theoretical support based on deep network memorization effects
  - Clear mathematical derivations and reasonable optimization objectives
- Comprehensive Experiments:
  - Coverage of multiple noise types and rates
  - Inclusion of both synthetic and real noisy datasets
  - Detailed ablation studies and analysis
- High Practical Value:
  - No manual threshold tuning required
  - Adaptable to different classifier models
  - Excellent performance in high-noise scenarios
- Computational Complexity:
  - Two-stage training increases training time
  - Dual-LoRA design increases parameter count and memory consumption
- Hyperparameter Sensitivity:
  - $h_1$ and $h_2$ in dynamic constraint functions require adjustment for different noise rates
  - Lack of adaptive hyperparameter selection strategy
- Insufficient Theoretical Analysis:
  - Lack of convergence guarantees
  - No theoretical bounds on noise detection accuracy
- Limited Applicability:
  - Primarily focused on text classification
  - Effectiveness on other NLP tasks unverified
- Academic Contribution:
  - Provides new perspective for noisy label learning research
  - Advances application of PEFT methods in robust learning
- Practical Value:
  - Directly applicable to real-world text classification tasks
  - Provides effective tools for handling real-world noisy data
- Reproducibility:
  - Detailed implementation details and hyperparameter settings provided
  - Clear algorithm description facilitates reproduction
- Text Classification Tasks: Particularly suitable for large-scale text classification with poor annotation quality
- Resource-Constrained Environments: PEFT characteristics make it suitable for applications with limited computational resources
- High-Noise Environments: Particularly outstanding performance in scenarios with high noise rates (>40%)
- Multilingual Applications: Potential applications in text classification for low-resource languages
This paper cites important literature in noisy label learning and parameter-efficient fine-tuning, including:
- Han et al. (2018) - Co-Teaching method
- Hu et al. (2022) - LoRA method
- Kim et al. (2024) - CleaR method
- Yuan et al. (2024) - NoiseAL method
Overall Assessment: This is a high-quality research paper that proposes an innovative solution in the noisy label learning domain. Through clever decoupled design and dual-LoRA mechanism, it effectively addresses core problems of existing methods. Experiments are comprehensive and results convincing. Despite some limitations, its novelty and practical value make it an important contribution to the field.