Label corruption, where training samples are mislabeled due to non-expert annotation or adversarial attacks, significantly degrades model performance. Acquiring large, perfectly labeled datasets is costly, and retraining models from scratch is computationally expensive. To address this, we introduce Scaled Activation Projection (SAP), a novel SVD (Singular Value Decomposition)-based corrective machine unlearning algorithm. SAP mitigates label noise by identifying a small subset of trusted samples using cross-entropy loss and projecting model weights onto a clean activation space estimated using SVD on these trusted samples. This process suppresses the noise introduced in activations by the mislabeled samples. In our experiments, we demonstrate SAP's effectiveness on synthetic noise in different settings and on real-world label noise. SAP applied to the CIFAR dataset with 25% synthetic corruption shows up to 6% generalization improvement. Additionally, SAP can improve generalization over noise-robust training approaches on the CIFAR dataset by ~3.2% on average. Further, we observe a generalization improvement of 2.31% for a Vision Transformer model trained on the naturally corrupted Clothing1M dataset.
- Paper ID: 2403.08618
- Title: SAP: Corrective Machine Unlearning with Scaled Activation Projection for Label Noise Robustness
- Authors: Sangamesh Kodge, Deepak Ravikumar, Gobinda Saha, Kaushik Roy (Purdue University)
- Classification: cs.LG cs.AI stat.ML
- Publication Date: January 2, 2025 (arXiv v2)
- Paper Link: https://arxiv.org/abs/2403.08618
- Code Link: https://github.com/sangamesh-kodge/SAP.git
Label corruption is a critical problem in deep learning, where incorrectly labeled training samples resulting from non-expert annotation or adversarial attacks significantly degrade model performance. Acquiring large-scale perfectly labeled datasets is costly, and retraining models from scratch incurs substantial computational overhead. To address this, we propose Scaled Activation Projection (SAP), a corrective machine unlearning algorithm based on Singular Value Decomposition (SVD). SAP mitigates label noise by identifying a small set of trustworthy samples using cross-entropy loss and projecting model weights into a clean activation space estimated from these trustworthy samples using SVD. Experiments demonstrate that SAP achieves up to 6% generalization improvement under 25% synthetic corruption on CIFAR datasets, provides an average improvement of approximately 3.2% over noise-robust training methods, and achieves 2.31% generalization improvement on Vision Transformer models on the naturally corrupted Clothing1M dataset.
- Label Noise Problem: Label errors are prevalent in large-scale datasets, originating from:
- Manual annotation errors
- Misclassifications by automated labeling systems (e.g., large language models)
- Malicious data poisoning attacks
- Limitations of Existing Solutions:
- Data Cleaning Methods: Require model retraining with high computational costs
- Noise-Robust Training: Improves robustness but cannot fully close the performance gap to a model trained on clean data
- Traditional Machine Unlearning: Requires explicitly distinguishing mislabeled samples from hard-to-learn ones, which is difficult in practice
- Research Motivation:
- Avoid high computational costs of retraining from scratch
- No need for explicit identification of mislabeled samples
- Achieve efficient noise mitigation through single-step weight updates
- Proposed SAP Algorithm: An SVD-based corrective machine unlearning algorithm that mitigates label noise effects through activation projection
- Automated Trustworthy Sample Selection: Automatically identifies trustworthy samples using cross-entropy loss, eliminating manual annotation
- Single-Step Weight Update: Achieves efficient model correction through one SVD computation and weight projection
- Comprehensive Experimental Validation: Verifies effectiveness in synthetic and real-world noise scenarios, supporting multiple model architectures
Given a training dataset $D_{\text{Tr}}$ containing label noise, the objective is to correct the parameters $\theta^*$ of a trained model so that its generalization performance on the test set approaches that of a model trained on clean data, without retraining.
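For a linear layer $a_{\text{out}} = a_{\text{in}} W^{T}$, SAP projects the input activations through an activation alignment matrix $W_p$:
$$\hat{a}_{\text{out}} = (a_{\text{in}} W_p)\, W^{T} = a_{\text{in}} \left(W W_p^{T}\right)^{T} = a_{\text{in}} \hat{W}^{T}$$
The weight update rule is: $\hat{W} = W W_p^{T}$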
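Because matrix multiplication is associative, projecting the activations and folding the projection into the weights give the same layer output, which is what allows a one-time weight update instead of a modified forward pass. A minimal NumPy check, assuming row-vector activations and a weight of shape (d_out, d_in) as in PyTorch's `nn.Linear`; the random symmetric matrix is only a stand-in for $W_p$, not SAP's actual projection:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, batch = 8, 4, 5
W = rng.standard_normal((d_out, d_in))     # layer weight, shape (d_out, d_in)
a_in = rng.standard_normal((batch, d_in))  # one row-vector activation per sample

# Stand-in for W_p: any (d_in, d_in) matrix satisfies the identity;
# a symmetric one mirrors the form U diag(lambda) U^T.
M = rng.standard_normal((d_in, d_in))
W_p = (M + M.T) / 2

# Route 1: project the input activations, then apply the original weights.
out_projected = (a_in @ W_p) @ W.T

# Route 2: fold the projection into the weights once, W_hat = W W_p^T.
W_hat = W @ W_p.T
out_folded = a_in @ W_hat.T

print(np.allclose(out_projected, out_folded))  # True
```

After the update, the layer is an ordinary linear layer with weight `W_hat`, so inference cost is unchanged.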
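SAP selects the $N_{\text{Trust}}$ samples with the lowest cross-entropy loss as the trustworthy set:
$$D_{\text{Trust}} = \arg\min_{S} \sum_{(x_i, y_i) \in S} \mathcal{L}(\theta^*, x_i, y_i)$$
where $S \in \{\, S_i \subseteq D_{\text{Tr}} \mid |S_i| = N_{\text{Trust}} \,\}$.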
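Since the loss is a sum over individual samples, the minimizing subset of size $N_{\text{Trust}}$ is simply the $N_{\text{Trust}}$ samples with the smallest individual losses, so a sort replaces the search over subsets. The NumPy sketch below assumes the trained model's logits on $D_{\text{Tr}}$ have already been computed; the function names and the toy data are hypothetical:

```python
import numpy as np

def cross_entropy_per_sample(logits, labels):
    """Per-sample cross-entropy from raw logits, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def select_trusted(logits, labels, n_trust):
    """Indices of the n_trust samples with the lowest cross-entropy loss.

    Minimizing the summed loss over all subsets of size n_trust is equivalent
    to taking the n_trust individually smallest losses, so a sort suffices.
    """
    losses = cross_entropy_per_sample(logits, labels)
    return np.argsort(losses, kind="stable")[:n_trust]

# Toy data: 6 samples, 3 classes; samples 1 and 4 carry labels the model disagrees with.
logits = np.array([
    [4.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 0.0, 5.0],
    [0.0, 0.0, 5.0],
    [0.5, 0.4, 0.3],
])
labels = np.array([0, 2, 1, 2, 0, 0])

trusted = select_trusted(logits, labels, n_trust=3)
print(sorted(trusted.tolist()))  # [0, 2, 3]
```

The low-confidence sample 5 is also excluded, which illustrates that low-loss selection can skip hard but correctly labeled samples as well as mislabeled ones.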
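The representation matrix of each layer stacks the input activations of the trusted samples:
- Linear Layers: $R_{\text{linear}} = \left[\left(a_i^{\text{in}}\right)_{i=1}^{N_{\text{Trust}}}\right]$
- Convolutional Layers: Convolution is converted to matrix multiplication via the unfold operation, $R_{\text{conv}} = \left[\left(\operatorname{unfold}(a_i^{\text{in}})^{T}\right)_{i=1}^{N_{\text{Trust}}}\right]$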
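For convolutional layers, unfolding turns each input into a matrix of flattened patches, so the convolution becomes a product with the flattened kernel. The sketch below is an illustration under stated assumptions (stride 1, no padding, a hand-written `unfold` rather than a library call). It checks this equivalence and stacks the patches of several samples as columns of a representation matrix; the column orientation is an assumption chosen so that the left singular vectors live in the $C k^2$-dimensional input space of the layer:

```python
import numpy as np

def unfold(x, k):
    """im2col for one (C, H, W) input with stride 1 and no padding.

    Returns a (C*k*k, L) matrix whose columns are flattened k x k patches,
    laid out channel-major like PyTorch's unfold.
    """
    _, height, width = x.shape
    cols = [
        x[:, i:i + k, j:j + k].reshape(-1)
        for i in range(height - k + 1)
        for j in range(width - k + 1)
    ]
    return np.stack(cols, axis=1)

def conv2d_direct(x, w):
    """Reference cross-correlation, i.e. what deep-learning conv layers compute."""
    out_ch, _, k, _ = w.shape
    _, height, width = x.shape
    out = np.zeros((out_ch, height - k + 1, width - k + 1))
    for o in range(out_ch):
        for i in range(height - k + 1):
            for j in range(width - k + 1):
                out[o, i, j] = np.sum(w[o] * x[:, i:i + k, j:j + k])
    return out

rng = np.random.default_rng(0)
C, size, O, k = 3, 5, 4, 3  # channels, square input side, output channels, kernel
weight = rng.standard_normal((O, C, k, k))
samples = [rng.standard_normal((C, size, size)) for _ in range(2)]

# Convolution as a matrix product: (O, C*k*k) @ (C*k*k, L) -> (O, L).
W_flat = weight.reshape(O, -1)
x0 = samples[0]
conv_as_matmul = W_flat @ unfold(x0, k)
print(np.allclose(conv_as_matmul, conv2d_direct(x0, weight).reshape(O, -1)))  # True

# Representation matrix: patches of every trusted sample as columns,
# giving shape (C*k*k, N_Trust * L).
R_conv = np.concatenate([unfold(x, k) for x in samples], axis=1)
print(R_conv.shape)  # (27, 18)
```

With this layout, the same projection used for linear layers applies to the flattened kernel `W_flat`, which can then be reshaped back to `(O, C, k, k)`.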
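SAP performs SVD on each layer's representation matrix: $R_l = U_l \Sigma_l V_l^{T}$
It then computes importance weights:
$$\lambda_i = \frac{\alpha\,\tilde{\sigma}_i}{(\alpha - 1)\,\tilde{\sigma}_i + 1}$$
where $\tilde{\sigma}_i = \sigma_i^{2} / \sum_{j=1}^{d} \sigma_j^{2}$ is the normalized singular value (the fraction of activation energy along the $i$-th singular direction), and $\alpha$ is the scaling coefficient.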
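Assuming the importance weight has the form $\lambda_i = \alpha\tilde{\sigma}_i / ((\alpha-1)\tilde{\sigma}_i + 1)$, a few lines of algebra clarify the role of $\alpha$. For $\alpha \ge 1$ and $\tilde{\sigma}_i \in [0, 1]$, each weight lies in $[0, 1]$ and increases monotonically with $\tilde{\sigma}_i$:

$$
\frac{\partial \lambda_i}{\partial \tilde{\sigma}_i} = \frac{\alpha}{\big((\alpha-1)\,\tilde{\sigma}_i + 1\big)^{2}} > 0,
\qquad
\begin{aligned}
\alpha = 1 &: \quad \lambda_i = \tilde{\sigma}_i,\\
\tilde{\sigma}_i = 1 &: \quad \lambda_i = 1,\\
(\alpha-1)\,\tilde{\sigma}_i \ll 1 &: \quad \lambda_i \approx \alpha\,\tilde{\sigma}_i,\\
\alpha \to \infty,\ \tilde{\sigma}_i > 0 &: \quad \lambda_i \to 1.
\end{aligned}
$$

Directions that carry a large share of the trusted activations' energy are therefore kept almost unchanged, low-energy directions are shrunk roughly in proportion to $\alpha\,\tilde{\sigma}_i$, and a larger $\alpha$ shrinks fewer directions.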
Finally, SAP constructs the projection matrix $W_p = U \Lambda U^{T}$, where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_d)$.
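Putting the steps together for one linear layer gives the NumPy sketch below. It assumes the representation matrix `R` has the trusted activations as columns (shape `(d_in, N_Trust)`) and that weights follow the PyTorch `(d_out, d_in)` convention. The function names `importance_weights`, `sap_projection`, and `sap_update_linear` are hypothetical and not taken from the authors' repository, and the toy data is synthetic:

```python
import numpy as np

def importance_weights(sigma, alpha):
    """lambda_i = alpha*s_i / ((alpha-1)*s_i + 1), with s_i = sigma_i^2 / sum_j sigma_j^2."""
    energy = sigma ** 2
    s = energy / energy.sum()
    return alpha * s / ((alpha - 1.0) * s + 1.0)

def sap_projection(R, alpha):
    """W_p = U diag(lambda) U^T from a (d, n) representation matrix R (columns = activations)."""
    U, sigma, _ = np.linalg.svd(R, full_matrices=True)  # U is (d, d)
    lam = np.zeros(U.shape[0])
    # If n < d there are fewer singular values than directions; the
    # remaining directions were never observed and get lambda = 0.
    lam[: sigma.size] = importance_weights(sigma, alpha)
    return U @ np.diag(lam) @ U.T

def sap_update_linear(W, R, alpha):
    """Single-step update W_hat = W W_p^T for a weight of shape (d_out, d_in)."""
    return W @ sap_projection(R, alpha).T

rng = np.random.default_rng(0)
d_in, d_out, n_trust = 16, 10, 100

# Toy "clean" activations that occupy only a 4-dimensional subspace of R^16.
basis, _ = np.linalg.qr(rng.standard_normal((d_in, 4)))
R = basis @ rng.standard_normal((4, n_trust))  # columns = trusted activations
W = rng.standard_normal((d_out, d_in))

alpha = 30000.0
W_p = sap_projection(R, alpha)
W_hat = sap_update_linear(W, R, alpha)

off_subspace = np.eye(d_in) - basis @ basis.T   # projector onto unseen directions
print(np.allclose(W_p, W_p.T))                    # True: W_p is symmetric
print(np.abs(W_hat @ off_subspace).max())         # near 0: unseen directions removed
print(np.abs(W_hat - W @ basis @ basis.T).max())  # small: clean subspace kept
```

In this toy setting the updated weight ignores input directions that the trusted activations never occupy while leaving the clean subspace nearly intact, which is the intuition behind suppressing activations introduced by mislabeled samples.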
- Automated Processing: No manual identification of erroneous samples required; trustworthy samples are automatically selected via loss function
- Efficient Updates: Single SVD computation and matrix multiplication complete weight updates, avoiding iterative optimization
- Activation Space Projection: Suppresses the influence of noisy activations by projecting onto the clean activation space
- Architecture Agnostic: Applicable to linear and convolutional layers, supporting diverse network architectures
- Synthetic Noise Datasets:
- CIFAR-10/CIFAR-100
- Three noise types: symmetric noise, asymmetric noise, hierarchical noise
- Noise intensities: 10% and 25%
- Real-World Noise Datasets:
- Clothing1M
- Evaluation Metrics:
- Test set accuracy
- Performance comparison with baseline methods
- Generalization improvement magnitude
- Retrain: Ideal model retrained on clean data
- Vanilla: Baseline model trained on noisy data
- Finetune: Fine-tuned on limited clean data
- SSD: Unlearning algorithm based on Selective Synaptic Dampening
- SCRUB: State-of-the-art machine unlearning algorithm
- Number of trustworthy samples: 1,000
- Scaling coefficient α search range: [2000, 300000]
- Model architectures: VGG11, ResNet18, ResNet50, ViT-B/16
- Optimizer: SGD, learning rate 0.01, weight decay 5×10⁻⁴
Results on the CIFAR-10 and CIFAR-100 datasets:
| Dataset | Noise Level | Vanilla | SAP | Improvement |
|---|---|---|---|---|
| CIFAR-10 | 25% | 76.68±0.48 | 82.27±0.15 | +5.59% |
| CIFAR-100 | 25% | 50.64±0.60 | 53.31±0.78 | +2.67% |
SAP outperforms other unlearning methods across all noise settings, with average improvements of 1.36% (CIFAR-10) and 0.39% (CIFAR-100).
SAP further improves the performance of existing noise-robust methods:
| Method | CIFAR-10 Baseline | SAP Enhanced | Improvement |
|---|---|---|---|
| MixUp | 83.12±0.44 | 86.45±0.52 | +3.33% |
| SAM | 83.29±0.28 | 87.29±0.08 | +4.0% |
| Average | 83.69 | 87.14 | +3.45% |
Results on real-world noise datasets:
| Dataset | Model | Vanilla | SAP | Improvement |
|---|---|---|---|---|
| Clothing1M | ResNet50 | 67.48±0.64 | 69.64±0.57 | +2.16% |
| Clothing1M | ViT-B/16 | 69.12±0.45 | 71.43±0.60 | +2.31% |
Increasing the number of trustworthy samples beyond 1,000 yields diminishing returns, so 1,000 samples were used to balance performance and computational efficiency.
α = 30000 gives the best performance across the synthetic noise settings; both larger and smaller values of α degrade performance.
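The sensitivity to α can be made concrete by evaluating the importance weight $\lambda = \alpha\tilde{\sigma}/((\alpha-1)\tilde{\sigma}+1)$ for a few normalized singular values at the two ends of the reported search range and at the reported best value. This is a standalone illustration with made-up energy fractions, not a reproduction of the paper's experiment:

```python
def importance_weight(s, alpha):
    """lambda = alpha*s / ((alpha-1)*s + 1) for a normalized singular value s in [0, 1]."""
    return alpha * s / ((alpha - 1) * s + 1)

s_values = [1e-2, 1e-4, 1e-6]  # energy fractions of three hypothetical directions
for alpha in (2000, 30000, 300000):
    print(alpha, [round(importance_weight(s, alpha), 3) for s in s_values])
# 2000 [0.953, 0.167, 0.002]
# 30000 [0.997, 0.75, 0.029]
# 300000 [1.0, 0.968, 0.231]
```

One plausible reading of the reported trade-off: a small α heavily shrinks moderately weak directions and may discard useful features, while a very large α retains even very weak directions, so less of the noise-induced activation subspace is suppressed.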
- Computational Efficiency: SAP requires only 16 hyperparameter search trials, while SCRUB requires 675
- Robustness: Demonstrates stable performance across different noise types and intensities
- Scalability: Successfully applied to large-scale datasets and Transformer models
- Decision Boundary Optimization: Visualization experiments show SAP smooths decision boundaries, reducing overfitting
- Data Cleaning Methods:
- Data filtering: Removing mislabeled samples
- Sample selection: Dynamically selecting training samples
- Label correction: Correcting incorrect labels
- Noise-Robust Training:
- Regularization techniques: Dropout, label smoothing
- Robust loss functions: Symmetric cross-entropy, MAE
- Data augmentation: MixUp, MentorMix
- Corrective Machine Unlearning:
- Traditional unlearning focuses on privacy protection
- Corrective unlearning focuses on improving generalization performance
Compared to existing methods, SAP offers:
- No need for explicit identification of erroneous samples
- A single update avoids the instability of iterative optimization
- Simple hyperparameter tuning and high computational efficiency
- Effectiveness Verification: SAP significantly improves model generalization performance in both synthetic and real-world noise scenarios
- Efficiency Advantages: Single-step weight updates and simple hyperparameter tuning provide significant computational advantages
- Broad Applicability: Supports multiple network architectures and dataset scales
- Practical Value: Can be combined with existing noise-robust methods for further performance improvement
- Trustworthy Sample Assumption: Relies on the assumption that low-loss samples are indeed correctly labeled
- Hyperparameter Sensitivity: The choice of scaling coefficient α significantly impacts performance
- Noise Type Constraints: Primarily targets label noise; limited effectiveness on other noise types
- Insufficient Theoretical Analysis: Lacks theoretical guarantees for method effectiveness
- Theoretical Analysis: Establish theoretical foundations for SAP effectiveness
- Adaptive Parameter Selection: Develop methods for automatically selecting optimal α
- Extended Applications: Explore applications to other noise types and tasks
- Integration with Other Techniques: Investigate combinations with data augmentation, adversarial training, etc.
- Method Innovation:
- First application of SVD to corrective machine unlearning
- Novel and effective activation projection concept
- Automated trustworthy sample selection eliminates manual intervention
- Experimental Comprehensiveness:
- Covers multiple noise types and datasets
- Comparison with multiple baseline methods
- Includes ablation studies and parameter sensitivity analysis
- Practical Value:
- High computational efficiency, easy to deploy
- Can be combined with existing methods
- Supports multiple network architectures
- Persuasiveness of Results:
- Consistent performance improvements
- Statistical significance verification
- Visualization analysis enhances understanding
- Weak Theoretical Foundation:
- Lacks theoretical analysis of method effectiveness
- Does not explain why SVD projection effectively suppresses noise
- Assumption Limitations:
- Assumption that low-loss samples are correctly labeled may not always hold
- Strong assumptions about noise distribution
- Parameter Adjustment:
- Lack of theoretical guidance for α selection
- Different datasets may require different α values
- Limited Comparisons:
- Insufficient comparison with latest noise-robust methods
- Lacks direct comparison with data cleaning methods
- Academic Contribution:
- Provides new research direction for machine unlearning field
- Activation projection concept may inspire other applications
- Practical Application:
- Provides practical tools for handling real-world label noise
- Can be integrated into existing training pipelines
- Reproducibility:
- Provides complete code implementation
- Detailed experimental setup description
- Scenarios with low-quality dataset labels
- Situations where data re-annotation is impossible
- Applications requiring rapid correction of trained models
- Environments with limited computational resources
The paper cites important works in related fields, including:
- Machine Unlearning: SCRUB, SSD, and other methods
- Label Noise Handling: MixUp, MentorMix, SAM, etc.
- Data Cleaning: Confident Learning, etc.
- Foundational Theory: SVD, activation analysis, etc.
Overall Assessment: The proposed SAP method has significant value in label noise handling. Through clever activation projection design, it achieves efficient model correction. While theoretical analysis is somewhat lacking, experimental validation is comprehensive and practical value is substantial, providing valuable contributions to the field.