
Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

Tang, Joshi, Kundu

Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.

Basic Information

Paper ID: 2506.09923
Title: Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning
Authors: Liou Tang, James Joshi (University of Pittsburgh), Ashish Kundu (Cisco Research)
Classification: cs.LG (Machine Learning)
Publication Date: October 27, 2025 (arXiv v2)
Paper Link: https://arxiv.org/abs/2506.09923v2
Code Link: https://github.com/LiouTang/Unlearn-Apollo-Attack

Abstract

Machine Unlearning (MU) aims to efficiently remove training samples and their influence from trained models without retraining from scratch. While MU itself is employed to provide privacy protection and regulatory compliance, it may also expand the attack surface of models. Existing privacy inference attacks against MU assume attackers can access the models both before and after unlearning, which limits their feasibility in real-world scenarios. This paper proposes a novel privacy attack, Apollo (A Posteriori Label-Only Membership Inference Attack), which infers whether data samples have been unlearned by accessing only the label outputs of the post-unlearning model. Experiments demonstrate that, despite requiring less model access than existing attacks, Apollo achieves relatively high precision in inferring the membership status of unlearned samples.

Research Background and Motivation

Problem Definition

Core Question: Does machine unlearning, as a privacy protection technique, itself leak privacy information? Specifically, can attackers infer which data has been unlearned by only accessing the post-unlearning model?

Significance

Regulatory Compliance: Regulations such as GDPR and CCPA grant users the "right to be forgotten," requiring ML models to remove user data
Privacy Paradox: Machine unlearning itself is a privacy protection mechanism, but the unlearning process may introduce new privacy risks
Practical Threats: In MLaaS scenarios, users typically cannot access the original model, making existing attack methods inapplicable

Limitations of Existing Methods

Existing membership inference attacks (MIA) against MU suffer from:

Requires Access to Original Model: Most attacks (e.g., Chen et al., Gao et al.) require simultaneous access to models before and after unlearning
Requires Posterior Probabilities: Many methods rely on probability distributions from model outputs
Unrealistic Threat Model: In real MLaaS scenarios, clients typically cannot obtain the original model

Research Motivation

This paper proposes the strictest threat model: attackers can only access label outputs of the post-unlearning model (label-only, a posteriori), which better reflects real-world scenarios. The core insight is that approximate unlearning algorithms produce two types of artifacts in decision space—UNDER-UNLEARNING and OVER-UNLEARNING—which can be exploited to infer membership status.

Core Contributions

Proposes Apollo Attack: The first post-hoc (a posteriori) membership inference attack requiring only black-box, label-only access with the strictest threat model
Formalizes Unlearning Artifacts: Identifies and formally defines UNDER-UNLEARNING and OVER-UNLEARNING phenomena, providing theoretical boundary proofs (Theorems III.3 and III.4)
Comprehensive Experimental Validation: Validates across multiple datasets (CIFAR-10/100, ImageNet) and 6 unlearning algorithms, demonstrating high-precision inference even under strict threat models
Reveals Privacy Threats: Directly contradicts privacy claims of existing unlearning methods, emphasizing the need for more cautious privacy-preserving unlearning approaches

Methodology Details

Task Definition

Input:

Post-unlearning model $\theta_u = \mathcal{A}[D, D_u, \mathcal{A}(D)]$ (label-only access)
Target sample $(x, y)$
Proxy dataset $D'$ sampled from the same distribution

Output: Binary decision $\hat{b} \in \{0,1\}$ , determining whether $x \in D_u$ (unlearned) or $x \notin D$ (not in training)

Constraints:

Cannot access the original model $\theta$
Cannot access model posterior probabilities, only $\hat{y} = \arg\max f_{\theta_u}(x)$
Assumes the unlearning algorithm is approximate unlearning
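
The label-only constraint can be pictured as a wrapper that hides everything except the arg-max class. The sketch below is only an illustration of that access model: `LabelOnlyOracle`, its query counter, and the toy linear scorer standing in for $f_{\theta_u}$ are hypothetical names, not part of the paper's code.

```python
import numpy as np

class LabelOnlyOracle:
    """Exposes only the predicted label of a scoring function (hypothetical helper)."""

    def __init__(self, score_fn):
        self._score_fn = score_fn  # per-class scores, never shown to the attacker
        self.num_queries = 0       # query counter, useful when queries are rate-limited

    def predict(self, x):
        self.num_queries += 1
        return int(np.argmax(self._score_fn(x)))

# Toy stand-in for f_{theta_u}: a fixed linear scorer over 3 classes (assumption).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
oracle = LabelOnlyOracle(lambda x: W @ x)
label = oracle.predict(np.array([2.0, 0.5]))
```

Everything the attacker learns about the target model has to come through `predict`; scores and parameters stay hidden.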

Core Theoretical Foundation

Assumption 1: Over-Learning

Learning causes over-learning: for training samples $(x,y) \in D$ , there exists $x' \approx x$ such that: $f_\theta(x') = y \text{ (when } x \in D), \quad f_\theta(x') \neq y \text{ (when } x \notin D)$

Conjecture 1: UNDER-UNLEARNING

Approximate unlearning retains partial information. For unlearned samples $(x,y) \in D_u$ , there exists $x' \approx x$ such that:

$f_\theta(x') = y$ (original model has learned)
$f_{\theta_r}(x') \neq y$ (exact unlearning/retraining does not retain)
$f_{\theta_u}(x') = y$ (approximate unlearning still retains, under-unlearning)

Intuitive Explanation: Decision boundary has not moved sufficiently; unlearning is incomplete (red region in Figure 2b)

Conjecture 2: OVER-UNLEARNING

Approximate unlearning causes performance degradation. For unlearned samples $(x,y) \in D_u$ , there exists $x' \approx x$ such that:

$f_\theta(x') = y$ (original model has learned)
$f_{\theta_r}(x') = y$ (exact unlearning retains)
$f_{\theta_u}(x') \neq y$ (approximate unlearning changes, over-unlearning)

Intuitive Explanation: Decision boundary is over-adjusted, affecting retained set performance (green region in Figure 2c)
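
The two conjectures differ only in which of the three models still predicts $y$ at the perturbed point $x'$. A minimal sketch of that case analysis follows; `artifact_type` is a hypothetical helper for analysis settings where all three models are available (such as a toy example), since the attacker in the paper's threat model sees neither $\theta$ nor $\theta_r$.

```python
def artifact_type(y, pred_orig, pred_retrain, pred_unlearn):
    """Classify a perturbed point x' near an unlearned sample (x, y).

    pred_orig    : label from the original model theta
    pred_retrain : label from the exactly retrained model theta_r
    pred_unlearn : label from the approximately unlearned model theta_u
    Returns "under", "over", or None if neither conjecture's conditions hold.
    """
    if pred_orig != y:
        return None  # both conjectures require the original model to predict y
    if pred_retrain != y and pred_unlearn == y:
        return "under"  # theta_u still remembers what retraining would forget
    if pred_retrain == y and pred_unlearn != y:
        return "over"   # theta_u forgot what retraining would keep
    return None
```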

Theoretical Bounds

Lemma III.1 (Lipschitz Property of Margin)

Define margin $m_\theta(x) := f_\theta(x)_y - \max_{j\neq y} f_\theta(x)_j$ . Under standard Lipschitz conditions: $|m_\theta(x) - m_{\theta'}(x')| \leq L_x\|x-x'\| + L_\theta\|\theta-\theta'\|$
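
As a small illustration of the margin definition, the sketch below computes $m_\theta(x)$ from a vector of per-class scores. The function name `margin` and the example scores are assumptions made for illustration.

```python
import numpy as np

def margin(scores, y):
    """m(x) = f(x)_y - max_{j != y} f(x)_j for a 1-D vector of class scores."""
    scores = np.asarray(scores, dtype=float)
    others = np.delete(scores, y)  # scores of every class except y
    return float(scores[y] - others.max())

m_correct = margin([2.0, 0.5, -1.0], 0)  # class 0 is the top class
m_wrong = margin([2.0, 0.5, -1.0], 1)    # class 1 is not the top class
```

A positive margin means the model predicts $y$; a negative margin means some other class scores higher.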

Theorem III.3 (UNDER-UNLEARNING Bound)

For $x'$ satisfying UNDER-UNLEARNING, perturbation radius $r = \|x-x'\|$ satisfies: $\underbrace{\left(\frac{m_\theta(x) - L_\theta\Delta_r}{L_x}\right)_+}_{=: L_{Un}} \leq r < \underbrace{\frac{m_\theta(x) - L_\theta\Delta_u}{L_x}}_{=: U_{Un}}$

where $\Delta_u = \|\theta_u - \theta\|$ , $\Delta_r = \|\theta_r - \theta\|$

Theorem III.4 (OVER-UNLEARNING Bound)

Similarly, the OVER-UNLEARNING bound is: $\underbrace{\left(\frac{m_\theta(x) - L_\theta\Delta_u}{L_x}\right)_+}_{=: L_{Ov}} \leq r < \underbrace{\frac{m_\theta(x) - L_\theta\Delta_r}{L_x}}_{=: U_{Ov}}$

Significance: Provides theoretically feasible search space, guiding adversarial example generation
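
To make the two intervals concrete, the following sketch evaluates the bounds of Theorems III.3 and III.4 exactly as written above. The function name and the numeric values are made up for illustration; in practice $L_x$, $L_\theta$, $\Delta_u$, and $\Delta_r$ are not known to the attacker.

```python
def radius_bounds(m, L_x, L_theta, delta_u, delta_r):
    """Return the (lower, upper) radius intervals for UNDER- and OVER-UNLEARNING."""
    lower_un = max((m - L_theta * delta_r) / L_x, 0.0)  # L_Un, clipped at 0 by (.)_+
    upper_un = (m - L_theta * delta_u) / L_x            # U_Un
    lower_ov = max((m - L_theta * delta_u) / L_x, 0.0)  # L_Ov, clipped at 0 by (.)_+
    upper_ov = (m - L_theta * delta_r) / L_x            # U_Ov
    return {"under": (lower_un, upper_un), "over": (lower_ov, upper_ov)}

# Made-up values: the unlearned model moved less than the retrained one.
bounds = radius_bounds(m=2.0, L_x=1.0, L_theta=0.5, delta_u=1.0, delta_r=3.0)
```

With these numbers the UNDER-UNLEARNING interval is $[0.5, 1.5)$ while the OVER-UNLEARNING interval is empty (its lower bound 1.5 exceeds its upper bound 0.5). This follows directly from the formulas: the UNDER interval is non-empty only when $\Delta_u < \Delta_r$, and the OVER interval only when $\Delta_u > \Delta_r$.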

Model Architecture: Apollo Attack Pipeline

Online Attack

Train Shadow Models: Train $m$ shadow models $\Theta^s = \{\theta^s_i\}$ , each on dataset $D^s_i$
Unlearn Shadow Models: For each $\theta^s_i$ , unlearn target sample $x$ , obtaining $\theta^{su}_i$
Generate Adversarial Examples: Optimize $x'$ to satisfy sensitivity and specificity conditions

UNDER-UNLEARNING Loss Function: $\ell_{Un}(x'; x,y,\Theta) = \alpha \sum_{i:\, x \in D^s_i} \ell(x'; \theta^{su}_i) + \beta \sum_{i:\, x \notin D^s_i} \hat{\ell}(x'; \theta^s_i)$

where:

First term (sensitivity): $x'$ should predict class $y$ on post-unlearning model
Second term (specificity): $x'$ should not predict $y$ on models not trained on $x$
$\hat{\ell} = -\ell$ (negative cross-entropy)

OVER-UNLEARNING Loss Function: $\ell_{Ov}(x'; x,y,\Theta) = \alpha \sum_{i:\, x \in D^s_i} \hat{\ell}(x'; \theta^{su}_i) + \beta \sum_{i:\, x \notin D^s_i} \ell(x'; \theta^s_i)$
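
The two losses can be sketched in a few lines once the shadow models' logits at $x'$ are available (the attacker owns the shadow models, so their logits are accessible). This is a minimal NumPy sketch under that assumption; the function names and argument layout are hypothetical, and the defaults $\alpha = 1$, $\beta = 4$ are the values reported under Implementation Details.

```python
import numpy as np

def cross_entropy(logits, y):
    """Cross-entropy of one logit vector against class y (log-sum-exp form)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # shift for numerical stability
    return float(np.log(np.exp(z).sum()) - z[y])

def under_loss(logits_in_unlearned, logits_out, y, alpha=1.0, beta=4.0):
    """l_Un: CE on unlearned shadow models trained on x, negative CE on models without x."""
    sensitivity = sum(cross_entropy(l, y) for l in logits_in_unlearned)
    specificity = sum(-cross_entropy(l, y) for l in logits_out)
    return alpha * sensitivity + beta * specificity

def over_loss(logits_in_unlearned, logits_out, y, alpha=1.0, beta=4.0):
    """l_Ov: the same two sums with the roles of CE and negative CE swapped."""
    sensitivity = sum(-cross_entropy(l, y) for l in logits_in_unlearned)
    specificity = sum(cross_entropy(l, y) for l in logits_out)
    return alpha * sensitivity + beta * specificity

# Toy logits at x' from two "in" (then unlearned) and two "out" shadow models.
logits_in = [[2.0, 0.0, -1.0], [1.5, 0.5, 0.0]]
logits_out = [[0.0, 1.0, 0.0], [-0.5, 2.0, 0.5]]
l_un = under_loss(logits_in, logits_out, y=0)
l_ov = over_loss(logits_in, logits_out, y=0)
```

For the same inputs and weights the two losses are exact negatives of each other, which matches the formulas: minimizing one pushes $x'$ in the opposite direction from minimizing the other.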

Offline Attack

To reduce computational cost, replace sensitivity condition with decision boundary distance: $\ell^{off}_{Un}(x'; x,y,\Theta) = \alpha \sum_i d(x', DB) + \beta \sum_i \hat{\ell}(x'; \theta^s_i)$

Algorithm 1: Adversarial Example Generation

Input: Target model θ_u (label-only access), target sample (x,y), shadow models Θ^s, step size ε, search rounds T, confidence threshold τ
Output: Adversarial example x'

x' ← x
for t = 1 to T:
    Compute gradient g_{t,i} ← ∇_{x'} ℓ(x'; x,y,Θ)
    x' ← SGD(x', average gradient)
    Project to annulus B_{tε}(x) \ B_{(t-1)ε}(x)  // locality constraint
    if average confidence < τ:
        early stop
return x'

Key Design:

Progressively expand search radius (from $(t-1)\epsilon$ to $t\epsilon$ )
Projection ensures locality (total perturbation $\leq T\cdot\epsilon$ )
Early stopping mechanism improves efficiency
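
The annulus projection and the expanding-radius loop can be sketched as follows. This is an illustrative reading of Algorithm 1, not the authors' implementation: `grad_fn` (averaged shadow-model gradient), `conf_fn` (average confidence), the learning rate `lr`, and the default `tau` are hypothetical placeholders, and a plain gradient step stands in for the SGD update.

```python
import numpy as np

def project_to_annulus(x_adv, x, r_in, r_out):
    """Project x_adv onto {z : r_in <= ||z - x|| <= r_out} by rescaling z - x."""
    d = x_adv - x
    n = np.linalg.norm(d)
    if n == 0.0:
        if r_in == 0.0:
            return x_adv.copy()  # the centre already lies in the set
        d = np.zeros_like(x, dtype=float)
        d.flat[0] = 1.0          # arbitrary direction when x_adv == x (assumption)
        n = 1.0
    target = min(max(n, r_in), r_out)  # clamp the distance into [r_in, r_out]
    return x + d * (target / n)

def search(x, grad_fn, conf_fn, eps=1.0, T=50, lr=0.1, tau=0.5):
    """Expanding-radius search: at round t, x' is kept between (t-1)*eps and t*eps from x."""
    x_adv = x.astype(float).copy()
    for t in range(1, T + 1):
        x_adv = x_adv - lr * grad_fn(x_adv)  # gradient step on the averaged loss
        x_adv = project_to_annulus(x_adv, x, (t - 1) * eps, t * eps)
        if conf_fn(x_adv) < tau:             # early stop, mirroring the pseudocode
            break
    return x_adv
```

Because round $t$ forces the distance into $[(t-1)\epsilon, t\epsilon]$, the final perturbation can never exceed $T\cdot\epsilon$, which is the locality guarantee noted in the design list.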

Technical Innovations

Paradigm Shift: From comparing models before/after unlearning → comparing unlearned model with ideal retrained model
Theoretical Support: First to provide Lipschitz theoretical bounds for unlearning attacks
Strong Practicality: Offline version avoids re-unlearning shadow models for each target sample
Good Adaptability: Leverages both UNDER and OVER phenomena simultaneously, improving robustness

Experimental Setup

Datasets

Dataset	Training Size	Test Size	Classes	Unlearning Ratio
CIFAR-10	20,000	10,000	10	10%
CIFAR-100	20,000	10,000	100	10%
ImageNet	512,466	256,235	1,000	10%

Data Partitioning Strategy:

Slice (a): Training set $D$
Slice (b): Shadow datasets (offline)
Slice (c): Test set $D_t$
Online attack: Shadow set sampled from (a)+(b); Offline attack: Only from (b)
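
One way to picture the slicing is with index arrays. The sketch below is an assumption-laden illustration: the slice sizes, the random permutation, and the helper names are hypothetical, and only the rule "online samples from (a)+(b), offline only from (b)" comes from the description above.

```python
import numpy as np

def make_slices(n, n_train, n_shadow, seed=0):
    """Split indices 0..n-1 into slice (a) training, (b) shadow pool, (c) test."""
    perm = np.random.default_rng(seed).permutation(n)
    a = perm[:n_train]
    b = perm[n_train:n_train + n_shadow]
    c = perm[n_train + n_shadow:]
    return a, b, c

def sample_shadow_set(a, b, size, online, rng):
    """Online: draw from (a)+(b). Offline: draw from (b) only."""
    pool = np.concatenate([a, b]) if online else b
    return rng.choice(pool, size=size, replace=False)

a, b, c = make_slices(100, 40, 40)
rng = np.random.default_rng(1)
offline_set = sample_shadow_set(a, b, 10, online=False, rng=rng)
online_set = sample_shadow_set(a, b, 10, online=True, rng=rng)
```

In the offline setting no shadow dataset can contain a target sample from $D$, which is why the offline attack cannot use the "trained on $x$, then unlearned" term directly.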

Model Architecture

ResNet-18: Primary experimental model
VGG-16: Ablation studies
Swin Transformer: Transfer learning tests

Training Configuration:

Optimizer: AdamW
Learning rate: $1 \times 10^{-4}$
Batch size: 64
Epochs: 100 (target model), 50 (shadow models)
Accuracy requirement: ≥75% on $D_t$

Unlearning Algorithms

Test 6 representative algorithms + retraining baseline:

Algorithm	Type	Core Idea
GA [45]	Baseline	Gradient ascent, focuses on $D_u$
FT [18]	Baseline	Fine-tuning, focuses on $D_r$
BT [54]	Knowledge Distillation	Guide unlearning using "bad teacher"
SCRUB [10]	Posterior Divergence	Maximize divergence between pre/post models
SalUn [55]	SOTA	Saliency-based parameter selection
SFR-on [53]	SOTA	Retained set geometry preservation
RT	Exact Unlearning	Retrain from scratch (theoretically immune)

Evaluation Metrics

Primary Metric: TPR @ low FPR (True Positive Rate at low False Positive Rate)

Rationale: High precision is more valuable for privacy attacks
Reporting: TPR @ lowest achievable FPR for each algorithm

Auxiliary Metrics: Precision, Recall, ROC curves
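
For a label-only attack that outputs a hard decision per target, the metrics above reduce to simple counts. The sketch below is a generic illustration with hypothetical names and toy data; it is not taken from the paper's evaluation code.

```python
def tpr_fpr_precision(decisions, is_unlearned):
    """Compute TPR, FPR, and precision from binary attack decisions.

    decisions[i]    : 1 if the attack claims sample i was unlearned, else 0
    is_unlearned[i] : 1 if sample i is truly in D_u, 0 if it was never trained on
    """
    pairs = list(zip(decisions, is_unlearned))
    tp = sum(1 for d, m in pairs if d and m)
    fp = sum(1 for d, m in pairs if d and not m)
    pos = sum(1 for m in is_unlearned if m)
    neg = len(is_unlearned) - pos
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return tp / pos, fp / neg, precision

# Toy example: 4 unlearned targets followed by 4 held-out targets.
tpr, fpr, prec = tpr_fpr_precision([1, 1, 0, 0, 1, 0, 0, 0],
                                   [1, 1, 1, 1, 0, 0, 0, 0])
```

Here the attack flags two of four unlearned samples and one of four held-out samples, giving TPR 0.5 at FPR 0.25.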

Baseline Methods

U-MIA [10]: Naive method using SVM classifier (RBF kernel, C=3)
U-LiRA [11]: Likelihood ratio-based attack using logit-transformed posterior probabilities

Note: Chen et al., Gao et al., Lu et al. excluded as they require access to original model

Implementation Details

Apollo Hyperparameters:

Number of shadow models: $m = 32$
Search step size: $\epsilon = 1.0$
Search rounds: $T = 50$
Loss weights: $\alpha = 1, \beta = 4$ (emphasize specificity)
Target samples: 200 (unlearned) + 200 (test)

Hardware: NVIDIA A100 (40GB), ~20 minutes training time per model

Experimental Results

Main Results

Table II: Performance on CIFAR-10

Method	GA	FT	BT	SCRUB	SalUn	SFR-on	RT
U-MIA	16.5@6.0%	11.5@9.5%	95.0@2.5%	9.0@4.0%	15.5@4.5%	3.0@2.5%	5.5@4.5%
U-LiRA	68.5@6.0%	6.5@9.5%	28.0@2.5%	6.0@4.0%	20.0@4.5%	2.5@2.5%	4.0@4.5%
Apollo	18.0@6.0%	6.5@9.5%	4.0@2.5%	21.5@4.0%	4.5@4.5%	10.0@2.5%	5.0@4.5%
Apollo (Off)	16.0@6.0%	6.5@9.5%	3.0@2.5%	15.0@4.0%	7.5@4.5%	5.0@2.5%	7.0@4.5%

Key Findings:

GA Most Vulnerable: U-LiRA achieves 68.5% TPR, Apollo achieves 18%
SCRUB Easily Attacked: Apollo outperforms U-LiRA (21.5% vs 6.0%)
SFR-on Performance: Apollo achieves 10% TPR, U-LiRA only 2.5%
RT Basically Safe: All attacks TPR ≤ 7%, close to random guessing
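
A TPR@FPR entry can be converted into a precision figure. The conversion below assumes the balanced target set of 200 unlearned and 200 test samples listed under Implementation Details is the evaluation set behind Table II; that link is an assumption of this note, not a statement from the paper.

$\text{Precision} = \frac{\text{TPR}\cdot 200}{\text{TPR}\cdot 200 + \text{FPR}\cdot 200} = \frac{\text{TPR}}{\text{TPR} + \text{FPR}}$

Apollo on SCRUB (21.5@4.0%): $\frac{0.215}{0.215 + 0.040} \approx 0.843$ (43 true positives against 8 false positives)
U-LiRA on SCRUB (6.0@4.0%): $\frac{0.060}{0.060 + 0.040} = 0.600$ (12 true positives against 8 false positives)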

Table III: Performance on CIFAR-100

Method	GA	FT	BT	SCRUB	SalUn	SFR-on	RT
U-MIA	7.5@0.5%	0.5@1.0%	48.5@13.5%	17.0@5.0%	8.5@1.5%	2.0@1.5%	1.0@1.0%
U-LiRA	14.5@0.5%	1.0@1.0%	25.0@13.5%	12.5@5.0%	17.0@1.5%	2.0@1.5%	1.5@1.0%
Apollo	15.5@0.5%	2.0@1.0%	50.0@13.5%	41.5@5.0%	5.0@1.5%	0.5@1.5%	1.5@1.0%
Apollo (Off)	13.0@0.5%	2.0@1.0%	41.5@13.5%	39.0@5.0%	4.5@1.5%	1.0@1.5%	0.5@1.0%

Key Findings:

Performance Improvement: Apollo performs better on CIFAR-100 (more classes, fewer samples per class)
SCRUB Major Weakness: Apollo achieves 41.5%, far exceeding U-LiRA's 12.5%
BT Vulnerable on CIFAR-100: Apollo achieves 50.0% TPR at 13.5% FPR, in contrast to only 4.0% TPR at 2.5% FPR on CIFAR-10

Table IV: Performance on ImageNet

Trends similar to CIFAR-100, with Apollo excelling on GA and SCRUB

ROC Curve Analysis (Figure 4)

GA (4a): U-LiRA strongest, Apollo second, overall high AUC
FT (4b): All attacks ineffective, Apollo slightly better
BT (4c): U-MIA strongest (95% TPR), Apollo weaker
SCRUB (4d): Apollo clearly superior to U-LiRA
SalUn (4e): U-LiRA slightly better
SFR-on (4f): Apollo shows clear advantage in low FPR region
RT (4g): All attacks near random line

Ablation Studies

1. UNDER vs OVER Dynamics (Figure 5)

Heatmaps showing TPR under different search radii for both phenomena:

Success Cases (GA, SFR-on):

Clear boundary effects: low TPR in regions near axes
Validates Theorems III.3 and III.4
UNDER and OVER effective in different radius ranges

Failure Cases (BT, SalUn):

OVER-UNLEARNING nearly uniformly distributed
UNDER-UNLEARNING scarce
Hypothesis: Algorithm design violates local Lipschitz assumption

2. Hyperparameter Impact (Figure 6)

Loss Weight $\beta/\alpha$ (6a):

Higher $\beta/\alpha$ → better precision-recall tradeoff
Recommended $\beta/\alpha = 4$ (emphasize specificity)

Number of Shadow Models $m$ (6b):

$m \leq 16$ : Increasing $m$ improves performance
$m = 32$ : Performance decreases (overfitting to specific shadow models)
Consistent with the observations of Wen et al. [36]

3. Architecture Transferability (Table V)

Target Model	Shadow Model	TPR@FPR
ResNet-18	ResNet-18	18.0@6.0%
ResNet-18	VGG-16	12.0@6.0%
ResNet-18	Swin-T	13.5@6.0%
VGG-16	VGG-16	5.5@2.5%
Swin-T	Swin-T	11.5@4.5%

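Conclusion: Architecture mismatch lowers TPR (from 18.0% to 12.0–13.5% at 6.0% FPR for a ResNet-18 target), but the attack still transfers, with TPR remaining at least twice the FPR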

Case Study: 2D Example (Figure 3)

Experimental Setup:

Data: $\mathbb{R}^2 \times \{0,1,2,3\}$ , 500 samples
Model: 12-layer small NN (Table VI)
Unlearning: 10% training set using GA

Observations (3a):

Red region: UNDER-UNLEARNING ( $\theta_u$ predicts the same as $\theta$ but differently from $\theta_r$ )
Green region: OVER-UNLEARNING ( $\theta_u$ predicts differently from both $\theta$ and $\theta_r$ , which agree with each other)
Both phenomena present simultaneously

Adversarial Example Trajectory (3c):

Starts from unlearned sample
Progressively moves to UNDER-UNLEARNING region
Validates Algorithm 1 effectiveness

Experimental Findings

Massive Differences Between Algorithms:
- GA, SCRUB, SFR-on vulnerable to attack
- BT vulnerable to U-MIA, robust to Apollo
- SalUn relatively safe overall
Dataset Complexity Impact:
- Attacks more effective on CIFAR-100 and ImageNet (more classes, fewer samples)
- Decision boundaries more sensitive
Theory-Practice Consistency:
- Successful attacks show clear boundary effects
- Failed cases possibly violate Lipschitz assumption
Offline Attack Feasibility:
- Slightly lower performance than online version
- Significantly reduces computational cost
Threat Ubiquity:
- Even under strictest threat model, most algorithms remain attackable
- Retraining (RT) basically safe but not scalable

Related Work

Machine Unlearning

Exact Unlearning:

Bourtoule et al. [2] SISA: Partition training, retrain only affected sub-models
Yan et al. [20]: Partition by class

Approximate Unlearning (focus of this paper):

Baselines: GA [45] (gradient ascent), FT [18] (fine-tuning)
Knowledge Distillation: BT [54]
Posterior Divergence: SCRUB [10]
Saliency Methods: SalUn [55], SFR-on [53]

Membership Inference Attacks (MIA)

Classical MIA:

Shokri et al. [27]: Shadow model training attack classifier
Yeom et al. [28]: Exploit overfitting-induced member advantage
Carlini et al. [29]: Likelihood ratio-based LiRA attack

Label-Only Attacks:

Choquette-Choo et al. [32]: First label-only MIA
Peng et al. [33] OSLO: Measure confidence via adversarial perturbation
Wu et al. [34] YOQO: Reduce query count

MIA Against Machine Unlearning

Attack	Access $\theta$	Access $\theta_u$	Posterior
Chen et al. [7]	✓	✓	✓
Gao et al. [8]	✓	✓	✓
Lu et al. [9]	✓	✓	✗
U-MIA [10]	✗	✓	✓
U-LiRA [11]	✗	✓	✓
Apollo	✗	✓	✗

Paper's Advantage: Strictest threat model, no need for original model or posterior probabilities

Conclusions and Discussion

Main Conclusions

Privacy Threat is Real: Even under the strictest threat model (label-only access, no original model), attackers can infer unlearned samples with relatively high precision for several unlearning algorithms
Solid Theoretical Foundation: UNDER-UNLEARNING and OVER-UNLEARNING have clear theoretical bounds (under Lipschitz conditions)
Strong Practicality:
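- Online version: TPR up to 50.0% at 13.5% FPR (BT on CIFAR-100) and 41.5% at 5.0% FPR (SCRUB on CIFAR-100); the 68.5% TPR for GA on CIFAR-10 belongs to the U-LiRA baseline, where Apollo reaches 18.0%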
- Offline version: Slightly lower performance, significantly reduced computational cost
Significant Algorithm Differences: Vulnerability of different unlearning algorithms varies dramatically, requiring targeted defenses
Challenges Existing Claims: Directly contradicts privacy protection claims of most unlearning methods

Limitations

Author-Acknowledged Limitations:

FPR Adjustment Difficulty: Adjusting FPR via hyperparameters ( $T, \epsilon, \tau$ ) less flexible than likelihood methods
Computational Cost: Requires training multiple shadow models (offline version mitigates this)
Theoretical Assumptions: Local Lipschitz conditions not always satisfied (e.g., BT, SalUn cases)

Potential Unmentioned Issues:

Sample Selection Bias: Only 200 samples tested, may not represent overall distribution
Fixed Unlearning Ratio: Only 10% unlearning tested, other ratios unknown
Adversarial Defenses: No discussion of possible defenses (e.g., noise addition, differential privacy)
LLM Applicability: Primarily for image classification, unlearning in large language models untested

Future Directions

More Efficient Attacks: Reduce shadow model count and query count
Defense Mechanisms: Design unlearning algorithms robust to Apollo
Theory Refinement: Relax Lipschitz assumptions, extend to non-local cases
Other Modalities: Extend to object detection, semantic segmentation, etc.
Privacy-Preserving Unlearning: Combine differential privacy with unlearning methods

In-Depth Evaluation

Strengths

Methodological Innovation:

Paradigm Shift: From "comparing before/after unlearning" to "comparing unlearned vs. ideal retrained," better aligns with unlearning definition
Theoretical Depth: First to provide Lipschitz theoretical bounds, formalizes UNDER/OVER phenomena
Strict Threat Model: Label-only + a posteriori is most challenging setting

Experimental Sufficiency:

Diverse Datasets: CIFAR-10/100 (small-scale), ImageNet (large-scale)
Broad Algorithm Coverage: 6 representative unlearning algorithms + retraining baseline
Thorough Ablations: Hyperparameters, architecture transfer, UNDER/OVER dynamics
Clear Visualizations: 2D examples intuitively demonstrate core ideas

Result Convincingness:

Comprehensive Comparisons: Compared with U-MIA, U-LiRA, highlighting advantages
Statistical Significance: 200 samples × multiple experiments, results reliable
Theory-Practice Alignment: Experimental observations consistent with theoretical predictions (Figure 5)

Writing Quality:

Clear Structure: Motivation → Theory → Method → Experiments, logically rigorous
Rigorous Terminology: Formal definitions (Def. 1-3), complete theorem proofs
High Reproducibility: Open-source code, detailed hyperparameters (Table VII)

Weaknesses

Methodological Limitations:

Strong Lipschitz Assumption: Does not hold for every model and unlearning algorithm (e.g., BT failure)
Locality Constraint: Fixed search radius $T\cdot\epsilon$ may miss distant artifacts
Binary Classification Simplification: Ignores $D_r$ membership, actually a three-class problem

Experimental Defects:

Single Unlearning Ratio: Only 10% tested, 1% or 50% ratios unknown
Small Sample Size: 200+200 samples may be insufficient for tail risk assessment
Missing Defense Experiments: No testing of noise addition, differential privacy defenses
Limited Architecture: Primarily ResNet-18, insufficient Transformer testing

Insufficient Analysis:

Shallow Failure Explanation: "Violates Lipschitz" lacks deep analysis
Algorithm Difference Unexplained: Why is BT vulnerable to U-MIA but robust to Apollo?
Missing Practicality Discussion: Real MLaaS scenario feasibility (e.g., query limits)

Ethical Considerations:

Double-Edged Nature: Attack method could be maliciously used
Insufficient Defense Suggestions: Only emphasizes "need more caution," lacks specific solutions

Impact

Contribution to Field:

Breaks Assumptions: Proves no need for original model to attack, pushes stricter privacy definitions
Theoretical Tools: Lipschitz bounds applicable to analyzing other unlearning methods
Evaluation Benchmark: Apollo serves as privacy audit tool for unlearning algorithms

Practical Value:

Audit Tool: Helps assess privacy leakage risks of unlearning algorithms
Design Guidance: UNDER/OVER phenomena suggest algorithm improvement directions
Regulatory Reference: Provides technical implementation basis for GDPR-like regulations

Reproducibility:

✅ Open-source code: https://github.com/LiouTang/Unlearn-Apollo-Attack
✅ Detailed hyperparameters: Table VII complete
✅ Public datasets: CIFAR, ImageNet available
⚠️ Resource requirement: Experiments used an NVIDIA A100 (40GB); training many shadow models may limit reproduction on smaller hardware

Potential Impact:

Short-term: Drives improvement of unlearning algorithms (e.g., further optimization of SalUn, SFR-on)
Medium-term: May spark research surge in privacy-preserving unlearning (e.g., DP-Unlearning)
Long-term: Influences technical standard-setting for privacy regulations

Applicable Scenarios

Suitable Applications:

Privacy Audit: Assess privacy guarantees of unlearning services
Algorithm Testing: Robustness testing for new unlearning methods
Regulatory Compliance: Verify GDPR requirement satisfaction

Unsuitable Applications:

LLM Unlearning: Label definition unclear for text generation
Small-Sample Scenarios: Shadow model training requires large data
Real-Time Systems: Adversarial example generation time-consuming (50 SGD steps)

Generalization Potential:

Other Tasks: Object detection, semantic segmentation (requires "label" redefinition)
Federated Learning: Distributed unlearning privacy audit
Model Compression: Pruning, distillation membership inference scenarios

Key References

Cao & Yang (2015): First proposed machine unlearning concept
Bourtoule et al. (2021): SISA exact unlearning algorithm
Carlini et al. (2022): LiRA likelihood ratio attack
Choquette-Choo et al. (2021): First label-only MIA
Hayes et al. (2024): U-LiRA attack against unlearning
Huang et al. (2024): SFR-on unified gradient unlearning framework
Fan et al. (2024): SalUn saliency-based unlearning

Summary

Apollo is a high-quality machine learning security paper that reveals privacy risks of machine unlearning through the strictest threat model (label-only, a posteriori). Its core contributions are:

Theoretical Innovation: Formalizes UNDER/OVER-UNLEARNING, provides Lipschitz bounds
Practical Method: Online/offline versions balance effectiveness and cost
Solid Experiments: Multi-dataset, multi-algorithm, thorough ablations, credible conclusions

Despite limitations such as strong Lipschitz assumptions and small sample sizes, the paper directly challenges unlearning's effectiveness as a privacy tool and provides an important warning to the field. Recommended future work:

Explore attacks in non-Lipschitz scenarios
Design unlearning algorithms robust to Apollo
Extend to LLMs and other modalities

Recommendation Score: ⭐⭐⭐⭐☆ (4.5/5)

Innovation: 5/5
Rigor: 4/5
Practicality: 4/5
Readability: 5/5