Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

Tang, Joshi, Kundu
Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.

Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

Basic Information

Abstract

Machine Unlearning (MU) aims to efficiently remove training samples and their influence from trained models without retraining from scratch. While MU is itself employed to provide privacy protection and regulatory compliance, it may also expand a model's attack surface. Existing privacy inference attacks against MU assume the attacker can access the model both before and after unlearning, which limits their feasibility in real-world scenarios. This paper proposes a novel privacy attack, Apollo (A Posteriori Label-Only Membership Inference Attack), which infers whether a data sample has been unlearned by accessing only the label outputs of the post-unlearning model. Experiments demonstrate that, despite requiring less model access than existing methods, Apollo achieves relatively high precision in inferring the membership status of unlearned samples.

Research Background and Motivation

Problem Definition

Core Question: Does machine unlearning, as a privacy protection technique, itself leak privacy information? Specifically, can attackers infer which data has been unlearned by only accessing the post-unlearning model?

Significance

  1. Regulatory Compliance: Regulations such as GDPR and CCPA grant users the "right to be forgotten," requiring ML models to remove user data
  2. Privacy Paradox: Machine unlearning itself is a privacy protection mechanism, but the unlearning process may introduce new privacy risks
  3. Practical Threats: In MLaaS scenarios, users typically cannot access the original model, making existing attack methods inapplicable

Limitations of Existing Methods

Existing membership inference attacks (MIA) against MU suffer from:

  1. Requires Access to Original Model: Most attacks (e.g., Chen et al., Gao et al.) require simultaneous access to models before and after unlearning
  2. Requires Posterior Probabilities: Many methods rely on probability distributions from model outputs
  3. Unrealistic Threat Model: In real MLaaS scenarios, clients typically cannot obtain the original model

Research Motivation

This paper proposes the strictest threat model: attackers can only access label outputs of the post-unlearning model (label-only, a posteriori), which better reflects real-world scenarios. The core insight is that approximate unlearning algorithms produce two types of artifacts in decision space—UNDER-UNLEARNING and OVER-UNLEARNING—which can be exploited to infer membership status.

Core Contributions

  1. Proposes Apollo Attack: The first post-hoc (a posteriori) membership inference attack requiring only black-box, label-only access with the strictest threat model
  2. Formalizes Unlearning Artifacts: Identifies and formally defines UNDER-UNLEARNING and OVER-UNLEARNING phenomena, providing theoretical boundary proofs (Theorems III.3 and III.4)
  3. Comprehensive Experimental Validation: Validates across multiple datasets (CIFAR-10/100, ImageNet) and 6 unlearning algorithms, demonstrating high-precision inference even under strict threat models
  4. Reveals Privacy Threats: Directly contradicts privacy claims of existing unlearning methods, emphasizing the need for more cautious privacy-preserving unlearning approaches

Methodology Details

Task Definition

Input:

  • Post-unlearning model $\theta_u = \mathcal{A}[D, D_u, \mathcal{A}(D)]$ (label-only access)
  • Target sample $(x, y)$
  • Proxy dataset $D'$ sampled from the same distribution

Output: Binary decision $\hat{b} \in \{0,1\}$, determining whether $x \in D_u$ (unlearned) or $x \notin D$ (not in training)

Constraints:

  • Cannot access the original model $\theta$
  • Cannot access model posterior probabilities, only $\hat{y} = \arg\max f_{\theta_u}(x)$
  • Assumes the unlearning algorithm is approximate unlearning

Core Theoretical Foundation

Assumption 1: Over-Learning

Learning causes over-learning: for training samples $(x,y) \in D$, there exists $x' \approx x$ such that: $f_\theta(x') = y$ (when $x \in D$), $f_\theta(x') \neq y$ (when $x \notin D$)

Conjecture 1: UNDER-UNLEARNING

Approximate unlearning retains partial information. For unlearned samples $(x,y) \in D_u$, there exists $x' \approx x$ such that:

  • $f_\theta(x') = y$ (original model has learned)
  • $f_{\theta_r}(x') \neq y$ (exact unlearning/retraining does not retain)
  • $f_{\theta_u}(x') = y$ (approximate unlearning still retains, under-unlearning)

Intuitive Explanation: Decision boundary has not moved sufficiently; unlearning is incomplete (red region in Figure 2b)

Conjecture 2: OVER-UNLEARNING

Approximate unlearning causes performance degradation. For unlearned samples $(x,y) \in D_u$, there exists $x' \approx x$ such that:

  • $f_\theta(x') = y$ (original model has learned)
  • $f_{\theta_r}(x') = y$ (exact unlearning retains)
  • $f_{\theta_u}(x') \neq y$ (approximate unlearning changes, over-unlearning)

Intuitive Explanation: Decision boundary is over-adjusted, affecting retained set performance (green region in Figure 2c)
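The two conjectures can be read as a three-way comparison of predicted labels at a perturbed point $x'$. The sketch below is only an analysis aid under that reading: the function name and argument order are hypothetical, and it needs the labels from the original and retrained models, which the Apollo adversary does not have.

```python
def classify_artifact(pred_original, pred_retrained, pred_unlearned, y):
    """Label a perturbed point x' from the three models' predicted labels.

    pred_original  : label predicted by the original model theta
    pred_retrained : label predicted by the retrained model theta_r
    pred_unlearned : label predicted by the approximately unlearned model theta_u
    y              : true label of the unlearned sample x
    """
    # Both conjectures require the original model to predict y at x'.
    if pred_original != y:
        return "none"
    # Retraining forgets y at x', but the approximate method still predicts it.
    if pred_retrained != y and pred_unlearned == y:
        return "under-unlearning"
    # Retraining keeps y at x', but the approximate method no longer predicts it.
    if pred_retrained == y and pred_unlearned != y:
        return "over-unlearning"
    return "none"


print(classify_artifact(3, 1, 3, 3))  # under-unlearning
print(classify_artifact(3, 3, 0, 3))  # over-unlearning
```

Such a helper would only be usable in a controlled study (as in the 2D example), where all three models are available for comparison.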

Theoretical Bounds

Lemma III.1 (Lipschitz Property of Margin)

Define margin $m_\theta(x) := f_\theta(x)_y - \max_{j\neq y} f_\theta(x)_j$. Under standard Lipschitz conditions: $|m_\theta(x) - m_{\theta'}(x')| \leq L_x\|x-x'\| + L_\theta\|\theta-\theta'\|$
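As a small illustration of the margin definition, the following sketch computes it from a list of per-class scores. The function name and the example scores are hypothetical; the point is only that the margin is positive exactly when class $y$ is the strict top prediction.

```python
def margin(scores, y):
    """Margin of class y: its score minus the best score among other classes."""
    best_other = max(v for j, v in enumerate(scores) if j != y)
    return scores[y] - best_other


# Class 0 is the top prediction, so its margin is positive.
print(margin([2.0, 0.5, 1.0], 0))  # 1.0
# Class 1 is not the top prediction, so its margin is negative.
print(margin([2.0, 0.5, 1.0], 1))  # -1.5
```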

Theorem III.3 (UNDER-UNLEARNING Bound)

For $x'$ satisfying UNDER-UNLEARNING, perturbation radius $r = \|x-x'\|$ satisfies:

$$\underbrace{\left(\frac{m_\theta(x) - L_\theta\Delta_r}{L_x}\right)_+}_{=: L_{Un}} \leq r < \underbrace{\frac{m_\theta(x) - L_\theta\Delta_u}{L_x}}_{=: U_{Un}}$$

where $\Delta_u = \|\theta_u - \theta\|$, $\Delta_r = \|\theta_r - \theta\|$

Theorem III.4 (OVER-UNLEARNING Bound)

Similarly, the OVER-UNLEARNING bound is:

$$\underbrace{\left(\frac{m_\theta(x) - L_\theta\Delta_u}{L_x}\right)_+}_{=: L_{Ov}} \leq r < \underbrace{\frac{m_\theta(x) - L_\theta\Delta_r}{L_x}}_{=: U_{Ov}}$$

Significance: Provides theoretically feasible search space, guiding adversarial example generation
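To make the two bounds concrete, here is a minimal sketch that evaluates both intervals for given constants. The function name and all numeric values are made up for illustration; in practice the Lipschitz constants and parameter distances are not known to the adversary.

```python
def radius_bounds(m, L_x, L_theta, delta_u, delta_r):
    """Return ((L_Un, U_Un), (L_Ov, U_Ov)) from Theorems III.3 and III.4.

    m       : margin m_theta(x) of the original model at x
    L_x     : Lipschitz constant with respect to the input
    L_theta : Lipschitz constant with respect to the parameters
    delta_u : ||theta_u - theta||
    delta_r : ||theta_r - theta||
    """
    via_retrained = (m - L_theta * delta_r) / L_x
    via_unlearned = (m - L_theta * delta_u) / L_x
    # (.)_+ denotes max(., 0) and applies to the lower ends only.
    under = (max(via_retrained, 0.0), via_unlearned)
    over = (max(via_unlearned, 0.0), via_retrained)
    return under, over


# Hypothetical numbers: the unlearned model moved less than retraining would.
under, over = radius_bounds(m=2.0, L_x=1.0, L_theta=0.5, delta_u=1.0, delta_r=3.0)
print(under)  # (0.5, 1.5): a non-empty range of radii
print(over)   # (1.5, 0.5): lower end exceeds upper end, so the range is empty
```

With these numbers, $\Delta_u < \Delta_r$ leaves room for UNDER-UNLEARNING only; swapping the two distances mirrors the result for OVER-UNLEARNING.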

Model Architecture: Apollo Attack Pipeline

Online Attack

  1. Train Shadow Models: Train $m$ shadow models $\Theta^s = \{\theta^s_i\}$, each on dataset $D^s_i$
  2. Unlearn Shadow Models: For each $\theta^s_i$, unlearn target sample $x$, obtaining $\theta^{su}_i$
  3. Generate Adversarial Examples: Optimize $x'$ to satisfy sensitivity and specificity conditions

UNDER-UNLEARNING Loss Function:

$$\ell_{Un}(x'; x,y,\Theta) = \alpha \sum_{x \in D^s_i} \ell(x'; \theta^{su}_i) + \beta \sum_{x \notin D^s_i} \hat{\ell}(x'; \theta^s_i)$$

where:

  • First term (sensitivity): $x'$ should be predicted as class $y$ by the post-unlearning model
  • Second term (specificity): $x'$ should not be predicted as $y$ by models not trained on $x$
  • $\hat{\ell} = -\ell$ (negative cross-entropy)

OVER-UNLEARNING Loss Function:

$$\ell_{Ov}(x'; x,y,\Theta) = \alpha \sum_{x \in D^s_i} \hat{\ell}(x'; \theta^{su}_i) + \beta \sum_{x \notin D^s_i} \ell(x'; \theta^s_i)$$
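The structure of the two losses can be sketched from per-model class probabilities. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, and it assumes the attacker has already evaluated its own shadow models at $x'$ to get softmax outputs.

```python
import math


def cross_entropy(probs, y):
    """Cross-entropy of a single prediction against label y."""
    return -math.log(probs[y])


def under_unlearning_loss(unlearned_probs, out_probs, y, alpha=1.0, beta=4.0):
    """alpha * sum of l over unlearned shadows + beta * sum of (-l) over 'out' shadows.

    unlearned_probs : outputs at x' of shadow models that trained on x, then unlearned it
    out_probs       : outputs at x' of shadow models never trained on x
    """
    sensitivity = sum(cross_entropy(p, y) for p in unlearned_probs)
    specificity = sum(-cross_entropy(p, y) for p in out_probs)
    return alpha * sensitivity + beta * specificity


def over_unlearning_loss(unlearned_probs, out_probs, y, alpha=1.0, beta=4.0):
    """Same terms with the roles of l and -l swapped."""
    sensitivity = sum(-cross_entropy(p, y) for p in unlearned_probs)
    specificity = sum(cross_entropy(p, y) for p in out_probs)
    return alpha * sensitivity + beta * specificity


# Toy two-class outputs: unlearned shadows still favor y=0, 'out' shadows do not.
un = under_unlearning_loss([[0.9, 0.1]], [[0.2, 0.8]], y=0)
ov = over_unlearning_loss([[0.9, 0.1]], [[0.2, 0.8]], y=0)
print(un, ov)
```

With identical weights, the two losses as written are negatives of each other, so minimizing one pushes $x'$ in the opposite direction from minimizing the other.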

Offline Attack

To reduce computational cost, replace sensitivity condition with decision boundary distance:

$$\ell^{off}_{Un}(x'; x,y,\Theta) = \alpha \sum_i d(x', DB) + \beta \sum_i \hat{\ell}(x'; \theta^s_i)$$

Algorithm 1: Adversarial Example Generation

Input: Target model θ_u, target sample (x,y), shadow models Θ^s, step size ε, number of rounds T, confidence threshold τ
Output: Adversarial example x'

x' ← x
for t = 1 to T:
    For each shadow model i, compute gradient g_{t,i} ← ∇_{x'} ℓ(x'; x,y,Θ)
    x' ← SGD(x', average of g_{t,i} over i)
    Project x' onto annulus B_{tε}(x) \ B_{(t-1)ε}(x)  // locality constraint
    if average confidence < τ:
        break  // early stop
return x'

Key Design:

  • Progressively expand search radius (from $(t-1)\epsilon$ to $t\epsilon$)
  • Projection ensures locality (total perturbation $\leq T\cdot\epsilon$)
  • Early stopping mechanism improves efficiency
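The projection step in Algorithm 1 can be sketched as clamping the distance between $x'$ and $x$ to the current ring. The helper below is a hypothetical illustration using the Euclidean norm on plain lists; treating both ring boundaries as allowed, and the fallback direction when $x' = x$, are assumptions not specified in the summary.

```python
import math


def project_to_annulus(x_adv, x, r_inner, r_outer):
    """Move x_adv radially so that r_inner <= ||x_adv - x|| <= r_outer."""
    diff = [a - b for a, b in zip(x_adv, x)]
    norm = math.sqrt(sum(d * d for d in diff))
    # Clamp the current distance into the ring [r_inner, r_outer].
    target = min(max(norm, r_inner), r_outer)
    if norm == 0.0:
        # Direction is undefined at the center; pick the first axis (assumption).
        direction = [1.0] + [0.0] * (len(x) - 1)
    else:
        direction = [d / norm for d in diff]
    return [b + target * u for b, u in zip(x, direction)]


eps = 1.0
t = 2  # round t searches the ring between (t-1)*eps and t*eps
print(project_to_annulus([0.3, 0.4], [0.0, 0.0], (t - 1) * eps, t * eps))
```

Points already inside the ring are left where they are; points that are too close or too far are moved along the ray from $x$ through $x'$.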

Technical Innovations

  1. Paradigm Shift: From comparing models before/after unlearning → comparing unlearned model with ideal retrained model
  2. Theoretical Support: First to provide Lipschitz theoretical bounds for unlearning attacks
  3. Strong Practicality: Offline version avoids re-unlearning shadow models for each target sample
  4. Good Adaptability: Leverages both UNDER and OVER phenomena simultaneously, improving robustness

Experimental Setup

Datasets

| Dataset | Training Size | Test Size | Classes | Unlearning Ratio |
|---|---|---|---|---|
| CIFAR-10 | 20,000 | 10,000 | 10 | 10% |
| CIFAR-100 | 20,000 | 10,000 | 100 | 10% |
| ImageNet | 512,466 | 256,235 | 1,000 | 10% |

Data Partitioning Strategy:

  • Slice (a): Training set $D$
  • Slice (b): Shadow datasets (offline)
  • Slice (c): Test set $D_t$
  • Online attack: Shadow set sampled from (a)+(b); Offline attack: Only from (b)

Model Architecture

  • ResNet-18: Primary experimental model
  • VGG-16: Ablation studies
  • Swin Transformer: Transfer learning tests

Training Configuration:

  • Optimizer: AdamW
  • Learning rate: $1 \times 10^{-4}$
  • Batch size: 64
  • Epochs: 100 (target model), 50 (shadow models)
  • Accuracy requirement: ≥75% on $D_t$

Unlearning Algorithms

Test 6 representative algorithms + retraining baseline:

| Algorithm | Type | Core Idea |
|---|---|---|
| GA [45] | Baseline | Gradient ascent, focuses on $D_u$ |
| FT [18] | Baseline | Fine-tuning, focuses on $D_r$ |
| BT [54] | Knowledge Distillation | Guide unlearning using "bad teacher" |
| SCRUB [10] | Posterior Divergence | Maximize divergence between pre/post models |
| SalUn [55] | SOTA | Significance-based parameter selection |
| SFR-on [53] | SOTA | Retained set geometry preservation |
| RT | Exact Unlearning | Retrain from scratch (theoretically immune) |

Evaluation Metrics

Primary Metric: TPR @ low FPR (True Positive Rate at low False Positive Rate)

  • Rationale: High precision is more valuable for privacy attacks
  • Reporting: TPR @ lowest achievable FPR for each algorithm

Auxiliary Metrics: Precision, Recall, ROC curves
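Because Apollo outputs a binary decision per sample, the "TPR@FPR" entries in the result tables can be read as a single operating point computed from confusion counts. The sketch below assumes that reading; the function name is hypothetical and the counts are illustrative values chosen to produce an entry of the form 18.0@6.0% with 200 unlearned and 200 test samples.

```python
def operating_point(decisions, labels):
    """Return (TPR, FPR, precision) for binary decisions.

    labels    : 1 for unlearned samples, 0 for non-member test samples
    decisions : 1 if the attack says 'unlearned', else 0
    """
    tp = sum(1 for d, l in zip(decisions, labels) if d == 1 and l == 1)
    fp = sum(1 for d, l in zip(decisions, labels) if d == 1 and l == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    tpr = tp / positives
    fpr = fp / negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return tpr, fpr, precision


# Illustrative counts: 36 of 200 unlearned samples flagged, 12 of 200 test samples flagged.
labels = [1] * 200 + [0] * 200
decisions = [1] * 36 + [0] * 164 + [1] * 12 + [0] * 188
tpr, fpr, precision = operating_point(decisions, labels)
print(f"{100 * tpr:.1f}@{100 * fpr:.1f}%  precision={precision:.2f}")
```

A TPR close to the FPR means the attack is doing no better than chance at that operating point, which is how the RT column is interpreted.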

Baseline Methods

  1. U-MIA [10]: Naive method using SVM classifier (RBF kernel, C=3)
  2. U-LiRA [11]: Likelihood ratio-based attack using logit-transformed posterior probabilities

Note: Chen et al., Gao et al., Lu et al. excluded as they require access to original model

Implementation Details

Apollo Hyperparameters:

  • Number of shadow models: $m = 32$
  • Search step size: $\epsilon = 1.0$
  • Search rounds: $T = 50$
  • Loss weights: $\alpha = 1, \beta = 4$ (emphasize specificity)
  • Target samples: 200 (unlearned) + 200 (test)

Hardware: NVIDIA A100 (40GB), ~20 minutes training time per model

Experimental Results

Main Results

Table II: Performance on CIFAR-10

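| Method | GA | FT | BT | SCRUB | SalUn | SFR-on | RT |
|---|---|---|---|---|---|---|---|
| U-MIA | 16.5@6.0% | 11.5@9.5% | 95.0@2.5% | 9.0@4.0% | 15.5@4.5% | 3.0@2.5% | 5.5@4.5% |
| U-LiRA | 68.5@6.0% | 6.5@9.5% | 28.0@2.5% | 6.0@4.0% | 20.0@4.5% | 2.5@2.5% | 4.0@4.5% |
| Apollo | 18.0@6.0% | 6.5@9.5% | 4.0@2.5% | 21.5@4.0% | 4.5@4.5% | 10.0@2.5% | 5.0@4.5% |
| Apollo (Off) | 16.0@6.0% | 6.5@9.5% | 3.0@2.5% | 15.0@4.0% | 7.5@4.5% | 5.0@2.5% | 7.0@4.5% |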

Key Findings:

  1. GA Most Vulnerable: U-LiRA achieves 68.5% TPR, Apollo achieves 18%
  2. SCRUB Easily Attacked: Apollo outperforms U-LiRA (21.5% vs 6.0%)
  3. SFR-on Performance: Apollo achieves 10% TPR, U-LiRA only 2.5%
  4. RT Basically Safe: All attacks TPR ≤ 7%, close to random guessing

Table III: Performance on CIFAR-100

MethodGAFTBTSCRUBSalUnSFR-onRT
U-MIA7.5@0.5%0.5@1.0%48.5@13.5%17.0@5.0%8.5@1.5%2.0@1.5%1.0@1.0%
U-LiRA14.5@0.5%1.0@1.0%25.0@13.5%12.5@5.0%17.0@1.5%2.0@1.5%1.5@1.0%
Apollo15.5@0.5%2.0@1.0%50.0@13.5%41.5@5.0%5.0@1.5%0.5@1.5%1.5@1.0%
Apollo (Off)13.0@0.5%2.0@1.0%41.5@13.5%39.0@5.0%4.5@1.5%1.0@1.5%0.5@1.0%

Key Findings:

  1. Performance Improvement: Apollo performs better on CIFAR-100 (more classes, fewer samples per class)
  2. SCRUB Major Weakness: Apollo achieves 41.5%, far exceeding U-LiRA's 12.5%
  3. BT Consistently Vulnerable: Apollo achieves 50% TPR

Table IV: Performance on ImageNet

Trends similar to CIFAR-100, with Apollo excelling on GA and SCRUB

ROC Curve Analysis (Figure 4)

  • GA (4a): U-LiRA strongest, Apollo second, overall high AUC
  • FT (4b): All attacks ineffective, Apollo slightly better
  • BT (4c): U-MIA strongest (95% TPR), Apollo weaker
  • SCRUB (4d): Apollo clearly superior to U-LiRA
  • SalUn (4e): U-LiRA slightly better
  • SFR-on (4f): Apollo shows clear advantage in low FPR region
  • RT (4g): All attacks near random line

Ablation Studies

1. UNDER vs OVER Dynamics (Figure 5)

Heatmaps showing TPR under different search radii for both phenomena:

Success Cases (GA, SFR-on):

  • Clear boundary effects: low TPR in regions near axes
  • Validates Theorems III.3 and III.4
  • UNDER and OVER effective in different radius ranges

Failure Cases (BT, SalUn):

  • OVER-UNLEARNING nearly uniformly distributed
  • UNDER-UNLEARNING scarce
  • Hypothesis: Algorithm design violates local Lipschitz assumption

2. Hyperparameter Impact (Figure 6)

Loss Weight $\beta/\alpha$ (6a):

  • Higher $\beta/\alpha$ → better precision-recall tradeoff
  • Recommended $\beta/\alpha = 4$ (emphasize specificity)

Number of Shadow Models $m$ (6b):

  • $m \leq 16$: Increasing $m$ improves performance
  • $m = 32$: Performance decreases (overfitting to specific shadow models)
  • Consistent with Wen et al. [36] observations

3. Architecture Transferability (Table V)

| Target Model | Shadow Model | TPR@FPR |
|---|---|---|
| ResNet-18 | ResNet-18 | 18.0@6.0% |
| ResNet-18 | VGG-16 | 12.0@6.0% |
| ResNet-18 | Swin-T | 13.5@6.0% |
| VGG-16 | VGG-16 | 5.5@2.5% |
| Swin-T | Swin-T | 11.5@4.5% |

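Conclusion: Architecture mismatch reduces performance (with a ResNet-18 target, TPR at 6.0% FPR falls from 18.0% to 12.0% with VGG-16 shadow models and 13.5% with Swin-T shadow models), but the attack still transfers across architectures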

Case Study: 2D Example (Figure 3)

Experimental Setup:

  • Data: $\mathbb{R}^2 \times \{0,1,2,3\}$, 500 samples
  • Model: 12-layer small NN (Table VI)
  • Unlearning: 10% training set using GA

Observations (3a):

  • Red region: UNDER-UNLEARNING ($\theta_u$ predicts the same as $\theta$, but differently from $\theta_r$)
  • Green region: OVER-UNLEARNING ($\theta_u$ predicts differently from $\theta_r$, which agrees with $\theta$)
  • Both phenomena present simultaneously

Adversarial Example Trajectory (3c):

  • Starts from unlearned sample
  • Progressively moves to UNDER-UNLEARNING region
  • Validates Algorithm 1 effectiveness

Experimental Findings

  1. Massive Differences Between Algorithms:
    • GA, SCRUB, SFR-on vulnerable to attack
    • BT vulnerable to U-MIA; robust to Apollo on CIFAR-10 (4.0% TPR) but not on CIFAR-100 (50.0% TPR at 13.5% FPR)
    • SalUn relatively safe overall
  2. Dataset Complexity Impact:
    • Attacks more effective on CIFAR-100 and ImageNet (more classes, fewer samples)
    • Decision boundaries more sensitive
  3. Theory-Practice Consistency:
    • Successful attacks show clear boundary effects
    • Failed cases possibly violate Lipschitz assumption
  4. Offline Attack Feasibility:
    • Slightly lower performance than online version
    • Significantly reduces computational cost
  5. Threat Ubiquity:
    • Even under strictest threat model, most algorithms remain attackable
    • Retraining (RT) basically safe but not scalable

Machine Unlearning

Exact Unlearning:

  • Bourtoule et al. [2] SISA: Partition training, retrain only affected sub-models
  • Yan et al. [20]: Partition by class

Approximate Unlearning (focus of this paper):

  • Baselines: GA [45] (gradient ascent), FT [18] (fine-tuning)
  • Knowledge Distillation: BT [54]
  • Posterior Divergence: SCRUB [10]
  • Significance Methods: SalUn [55], SFR-on [53]

Membership Inference Attacks (MIA)

Classical MIA:

  • Shokri et al. [27]: Shadow model training attack classifier
  • Yeom et al. [28]: Exploit overfitting-induced member advantage
  • Carlini et al. [29]: Likelihood ratio-based LiRA attack

Label-Only Attacks:

  • Choquette-Choo et al. [32]: First label-only MIA
  • Peng et al. [33] OSLO: Measure confidence via adversarial perturbation
  • Wu et al. [34] YOQO: Reduce query count

MIA Against Machine Unlearning

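| Attack | Access $\theta$ | Access $\theta_u$ | Posterior |
|---|---|---|---|
| Chen et al. [7] | | | |
| Gao et al. [8] | | | |
| Lu et al. [9] | | | |
| U-MIA [10] | | | |
| U-LiRA [11] | | | |
| Apollo | | | |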

Paper's Advantage: Strictest threat model, no need for original model or posterior probabilities

Conclusions and Discussion

Main Conclusions

  1. Privacy Threat is Real: Even under strictest threat model (label-only access, no original model), attackers can infer unlearned samples with relatively high precision
  2. Solid Theoretical Foundation: UNDER-UNLEARNING and OVER-UNLEARNING have clear theoretical bounds (under Lipschitz conditions)
  3. Strong Practicality:
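    • Online version: TPR up to 50.0% at 13.5% FPR (BT on CIFAR-100) in the CIFAR results above; the 68.5% figure for GA on CIFAR-10 belongs to U-LiRA, where Apollo reaches 18.0%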
    • Offline version: Slightly lower performance, significantly reduced computational cost
  4. Significant Algorithm Differences: Vulnerability of different unlearning algorithms varies dramatically, requiring targeted defenses
  5. Challenges Existing Claims: Directly contradicts privacy protection claims of most unlearning methods

Limitations

Author-Acknowledged Limitations:

  1. FPR Adjustment Difficulty: Adjusting FPR via hyperparameters ($T, \epsilon, \tau$) less flexible than likelihood methods
  2. Computational Cost: Requires training multiple shadow models (offline version mitigates this)
  3. Theoretical Assumptions: Local Lipschitz conditions not always satisfied (e.g., BT, SalUn cases)

Potential Unmentioned Issues:

  1. Sample Selection Bias: Only 200 samples tested, may not represent overall distribution
  2. Fixed Unlearning Ratio: Only 10% unlearning tested, other ratios unknown
  3. Adversarial Defenses: No discussion of possible defenses (e.g., noise addition, differential privacy)
  4. LLM Applicability: Primarily for image classification, unlearning in large language models untested

Future Directions

  1. More Efficient Attacks: Reduce shadow model count and query count
  2. Defense Mechanisms: Design unlearning algorithms robust to Apollo
  3. Theory Refinement: Relax Lipschitz assumptions, extend to non-local cases
  4. Other Modalities: Extend to object detection, semantic segmentation, etc.
  5. Privacy-Preserving Unlearning: Combine differential privacy with unlearning methods

In-Depth Evaluation

Strengths

Methodological Innovation:

  1. Paradigm Shift: From "comparing before/after unlearning" to "comparing unlearned vs. ideal retrained," better aligns with unlearning definition
  2. Theoretical Depth: First to provide Lipschitz theoretical bounds, formalizes UNDER/OVER phenomena
  3. Strict Threat Model: Label-only + a posteriori is most challenging setting

Experimental Sufficiency:

  1. Diverse Datasets: CIFAR-10/100 (small-scale), ImageNet (large-scale)
  2. Broad Algorithm Coverage: 6 representative unlearning algorithms + retraining baseline
  3. Thorough Ablations: Hyperparameters, architecture transfer, UNDER/OVER dynamics
  4. Clear Visualizations: 2D examples intuitively demonstrate core ideas

Result Convincingness:

  1. Comprehensive Comparisons: Compared with U-MIA, U-LiRA, highlighting advantages
  2. Statistical Significance: 200 samples × multiple experiments, results reliable
  3. Theory-Practice Alignment: Experimental observations consistent with theoretical predictions (Figure 5)

Writing Quality:

  1. Clear Structure: Motivation → Theory → Method → Experiments, logically rigorous
  2. Rigorous Terminology: Formal definitions (Def. 1-3), complete theorem proofs
  3. High Reproducibility: Open-source code, detailed hyperparameters (Table VII)

Weaknesses

Methodological Limitations:

  1. Strong Lipschitz Assumption: Inapplicable to all models and unlearning algorithms (e.g., BT failure)
  2. Locality Constraint: Fixed search radius $T\cdot\epsilon$ may miss distant artifacts
  3. Binary Classification Simplification: Ignores $D_r$ membership, actually a three-class problem

Experimental Defects:

  1. Single Unlearning Ratio: Only 10% tested, 1% or 50% ratios unknown
  2. Small Sample Size: 200+200 samples may be insufficient for tail risk assessment
  3. Missing Defense Experiments: No testing of noise addition, differential privacy defenses
  4. Limited Architecture: Primarily ResNet-18, insufficient Transformer testing

Insufficient Analysis:

  1. Shallow Failure Explanation: "Violates Lipschitz" lacks deep analysis
  2. Algorithm Difference Unexplained: Why is BT vulnerable to U-MIA but robust to Apollo?
  3. Missing Practicality Discussion: Real MLaaS scenario feasibility (e.g., query limits)

Ethical Considerations:

  1. Double-Edged Nature: Attack method could be maliciously used
  2. Insufficient Defense Suggestions: Only emphasizes "need more caution," lacks specific solutions

Impact

Contribution to Field:

  1. Breaks Assumptions: Proves no need for original model to attack, pushes stricter privacy definitions
  2. Theoretical Tools: Lipschitz bounds applicable to analyzing other unlearning methods
  3. Evaluation Benchmark: Apollo serves as privacy audit tool for unlearning algorithms

Practical Value:

  1. Audit Tool: Helps assess privacy leakage risks of unlearning algorithms
  2. Design Guidance: UNDER/OVER phenomena suggest algorithm improvement directions
  3. Regulatory Reference: Provides technical implementation basis for GDPR-like regulations


Potential Impact:

  1. Short-term: Drives improvement of unlearning algorithms (e.g., further optimization of SalUn, SFR-on)
  2. Medium-term: May spark research surge in privacy-preserving unlearning (e.g., DP-Unlearning)
  3. Long-term: Influences technical standard-setting for privacy regulations

Applicable Scenarios

Suitable Applications:

  1. Privacy Audit: Assess privacy guarantees of unlearning services
  2. Algorithm Testing: Robustness testing for new unlearning methods
  3. Regulatory Compliance: Verify GDPR requirement satisfaction

Unsuitable Applications:

  1. LLM Unlearning: Label definition unclear for text generation
  2. Small-Sample Scenarios: Shadow model training requires large data
  3. Real-Time Systems: Adversarial example generation time-consuming (50 SGD steps)

Generalization Potential:

  • Other Tasks: Object detection, semantic segmentation (requires "label" redefinition)
  • Federated Learning: Distributed unlearning privacy audit
  • Model Compression: Pruning, distillation membership inference scenarios

Key References

  1. Cao & Yang (2015): First proposed machine unlearning concept
  2. Bourtoule et al. (2021): SISA exact unlearning algorithm
  3. Carlini et al. (2022): LiRA likelihood ratio attack
  4. Choquette-Choo et al. (2021): First label-only MIA
  5. Hayes et al. (2024): U-LiRA attack against unlearning
  6. Huang et al. (2024): SFR-on unified gradient unlearning framework
  7. Fan et al. (2024): SalUn significance-based unlearning

Summary

Apollo is a high-quality machine learning security paper that reveals privacy risks of machine unlearning through the strictest threat model (label-only, a posteriori). Its core contributions are:

  1. Theoretical Innovation: Formalizes UNDER/OVER-UNLEARNING, provides Lipschitz bounds
  2. Practical Method: Online/offline versions balance effectiveness and cost
  3. Solid Experiments: Multi-dataset, multi-algorithm, thorough ablations, credible conclusions

Despite limitations such as strong Lipschitz assumptions and small sample sizes, the paper directly challenges unlearning's effectiveness as a privacy tool, providing an important warning to the field. Recommended future work:

  • Explore attacks in non-Lipschitz scenarios
  • Design unlearning algorithms robust to Apollo
  • Extend to LLMs and other modalities

Recommendation Score: ⭐⭐⭐⭐☆ (4.5/5)

  • Innovation: 5/5
  • Rigor: 4/5
  • Practicality: 4/5
  • Readability: 5/5