Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.
Paper ID: 2506.09923
Title: Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning
Authors: Liou Tang, James Joshi (University of Pittsburgh), Ashish Kundu (Cisco Research)
Classification: cs.LG (Machine Learning)
Publication Date: October 27, 2025 (arXiv v2)
Paper Link: https://arxiv.org/abs/2506.09923v2
Code Link: https://github.com/LiouTang/Unlearn-Apollo-Attack

Machine Unlearning (MU) aims to efficiently remove training samples and their influence from trained models without retraining from scratch. While MU is employed to provide privacy protection and regulatory compliance, it may also expand a model's attack surface. Existing privacy inference attacks against MU assume the attacker can access the model both before and after unlearning, which limits their feasibility in real-world scenarios. This paper proposes a novel privacy attack, Apollo (A Posteriori Label-Only Membership Inference Attack), which infers whether data samples have been unlearned by accessing only the label outputs of the post-unlearning model. Experiments demonstrate that, despite requiring less model access than existing methods, Apollo achieves relatively high precision in inferring the membership status of unlearned samples.
Core Question : Does machine unlearning, as a privacy protection technique, itself leak privacy information? Specifically, can attackers infer which data has been unlearned by only accessing the post-unlearning model?
- Regulatory Compliance: Regulations such as GDPR and CCPA grant users the "right to be forgotten," requiring ML models to remove user data
- Privacy Paradox: Machine unlearning is itself a privacy protection mechanism, but the unlearning process may introduce new privacy risks
- Practical Threats: In MLaaS scenarios, users typically cannot access the original model, making existing attack methods inapplicable

Existing membership inference attacks (MIA) against MU suffer from:
- Requires Access to Original Model: Most attacks (e.g., Chen et al., Gao et al.) require simultaneous access to the models before and after unlearning
- Requires Posterior Probabilities: Many methods rely on probability distributions from model outputs
- Unrealistic Threat Model: In real MLaaS scenarios, clients typically cannot obtain the original model

This paper proposes the strictest threat model: attackers can only access label outputs of the post-unlearning model (label-only, a posteriori), which better reflects real-world scenarios. The core insight is that approximate unlearning algorithms produce two types of artifacts in decision space, UNDER-UNLEARNING and OVER-UNLEARNING, which can be exploited to infer membership status.
- Proposes Apollo Attack: The first post-hoc (a posteriori) membership inference attack requiring only black-box, label-only access, under the strictest threat model
- Formalizes Unlearning Artifacts: Identifies and formally defines the UNDER-UNLEARNING and OVER-UNLEARNING phenomena, providing theoretical boundary proofs (Theorems III.3 and III.4)
- Comprehensive Experimental Validation: Validates across multiple datasets (CIFAR-10/100, ImageNet) and 6 unlearning algorithms, demonstrating high-precision inference even under a strict threat model
- Reveals Privacy Threats: Directly contradicts privacy claims of existing unlearning methods, emphasizing the need for more cautious privacy-preserving unlearning approaches

Input:
- Post-unlearning model $\theta_u = \mathcal{A}[D, D_u, \mathcal{A}(D)]$ (label-only access)
- Target sample $(x, y)$
- Proxy dataset $D'$ sampled from the same distribution

Output: Binary decision $\hat{b} \in \{0,1\}$, determining whether $x \in D_u$ (unlearned) or $x \notin D$ (not in training)
Constraints :
- Cannot access the original model $\theta$
- Cannot access model posterior probabilities, only $\hat{y} = \arg\max f_{\theta_u}(x)$
- Assumes the unlearning algorithm is approximate unlearning

Learning causes over-learning: for training samples $(x,y) \in D$, there exists $x' \approx x$ such that:

$$f_\theta(x') = y \text{ (when } x \in D), \quad f_\theta(x') \neq y \text{ (when } x \notin D)$$

Approximate unlearning retains partial information. For unlearned samples $(x,y) \in D_u$, there exists $x' \approx x$ such that:
- $f_\theta(x') = y$ (the original model has learned it)
- $f_{\theta_r}(x') \neq y$ (exact unlearning/retraining does not retain it)
- $f_{\theta_u}(x') = y$ (approximate unlearning still retains it: under-unlearning)

Intuitive Explanation: The decision boundary has not moved sufficiently; unlearning is incomplete (red region in Figure 2b)

Approximate unlearning causes performance degradation. For unlearned samples $(x,y) \in D_u$, there exists $x' \approx x$ such that:
- $f_\theta(x') = y$ (the original model has learned it)
- $f_{\theta_r}(x') = y$ (exact unlearning retains it)
- $f_{\theta_u}(x') \neq y$ (approximate unlearning changes it: over-unlearning)

Intuitive Explanation: The decision boundary is over-adjusted, affecting retained set performance (green region in Figure 2c)
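The two definitions above reduce to a three-way comparison of hard labels, which can be sketched as follows. This is only an illustration of the definitions, not part of the attack: a real attacker cannot query $\theta$ or $\theta_r$, and the function name and its arguments are hypothetical.

```python
def classify_artifact(y, pred_orig, pred_retrain, pred_unlearn):
    """Label a perturbed point x' from the hard labels of three models.

    y            -- label of the unlearned sample x
    pred_orig    -- label predicted for x' by the original model (theta)
    pred_retrain -- label predicted for x' by the retrained model (theta_r)
    pred_unlearn -- label predicted for x' by the unlearned model (theta_u)
    """
    if pred_orig != y:
        # Both definitions require the original model to predict y at x'.
        return "none"
    if pred_retrain != y and pred_unlearn == y:
        return "under-unlearning"  # theta_u still behaves like theta
    if pred_retrain == y and pred_unlearn != y:
        return "over-unlearning"   # theta_u forgot more than retraining would
    return "none"


print(classify_artifact(3, 3, 1, 3))  # under-unlearning
print(classify_artifact(3, 3, 3, 0))  # over-unlearning
```

In both artifact cases the unlearned model disagrees with the retrained one, which is why the attack can compare against an ideal retrained model instead of the original.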
Define the margin $m_\theta(x) := f_\theta(x)_y - \max_{j\neq y} f_\theta(x)_j$. Under standard Lipschitz conditions:

$$|m_\theta(x) - m_{\theta'}(x')| \leq L_x\|x-x'\| + L_\theta\|\theta-\theta'\|$$

For $x'$ satisfying UNDER-UNLEARNING, the perturbation radius $r = \|x-x'\|$ satisfies:

$$\underbrace{\left(\frac{m_\theta(x) - L_\theta\Delta_r}{L_x}\right)_+}_{=: L_{Un}} \leq r < \underbrace{\frac{m_\theta(x) - L_\theta\Delta_u}{L_x}}_{=: U_{Un}}$$

where $\Delta_u = \|\theta_u - \theta\|$, $\Delta_r = \|\theta_r - \theta\|$, and $(\cdot)_+ = \max(\cdot, 0)$.

Similarly, the OVER-UNLEARNING bound is:

$$\underbrace{\left(\frac{m_\theta(x) - L_\theta\Delta_u}{L_x}\right)_+}_{=: L_{Ov}} \leq r < \underbrace{\frac{m_\theta(x) - L_\theta\Delta_r}{L_x}}_{=: U_{Ov}}$$

Significance: The bounds give a theoretically feasible search region for the perturbation radius, which guides adversarial example generation.
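To make the bounds concrete, the sketch below evaluates the margin and both radius intervals exactly as written above. All numeric values are made up for illustration, and the Lipschitz constants and parameter distances are assumed to be known, which is not the case for a real attacker.

```python
import numpy as np


def margin(logits, y):
    """m_theta(x): logit of class y minus the largest other logit."""
    logits = np.asarray(logits, dtype=float)
    others = np.delete(logits, y)
    return float(logits[y] - others.max())


def radius_bounds(m, L_x, L_theta, delta_u, delta_r):
    """Return (lower, upper) radius intervals for both artifact types."""
    def pos(v):
        return max(v, 0.0)  # the (.)_+ operator

    via_retrain = (m - L_theta * delta_r) / L_x
    via_unlearn = (m - L_theta * delta_u) / L_x
    return {
        "under": (pos(via_retrain), via_unlearn),  # L_Un <= r < U_Un
        "over": (pos(via_unlearn), via_retrain),   # L_Ov <= r < U_Ov
    }


m = margin([2.0, 0.5, -1.0], y=0)  # 2.0 - 0.5 = 1.5
bounds = radius_bounds(m, L_x=1.0, L_theta=1.0, delta_u=0.5, delta_r=1.0)
print(bounds["under"])  # (0.5, 1.0)
print(bounds["over"])   # (1.0, 0.5): empty, since the lower bound exceeds the upper
```

With these formulas, the under-unlearning interval is non-empty only when $\Delta_u < \Delta_r$ (the unlearned model moved less than retraining would), and the over-unlearning interval only when $\Delta_u > \Delta_r$.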
1. Train Shadow Models: Train $m$ shadow models $\Theta^s = \{\theta^s_i\}$, each on dataset $D^s_i$
2. Unlearn Shadow Models: For each $\theta^s_i$, unlearn target sample $x$, obtaining $\theta^{su}_i$
3. Generate Adversarial Examples: Optimize $x'$ to satisfy sensitivity and specificity conditions

UNDER-UNLEARNING Loss Function:

$$\ell_{Un}(x'; x,y,\Theta) = \alpha \sum_{x \in D^s_i} \ell(x'; \theta^{su}_i) + \beta \sum_{x \notin D^s_i} \hat{\ell}(x'; \theta^s_i)$$

where the sums run over the shadow models $i$ whose dataset $D^s_i$ does or does not contain $x$, and:
- First term (sensitivity): $x'$ should be predicted as class $y$ by the post-unlearning model
- Second term (specificity): $x'$ should not be predicted as $y$ by models not trained on $x$
- $\hat{\ell} = -\ell$ (negative cross-entropy)

OVER-UNLEARNING Loss Function:

$$\ell_{Ov}(x'; x,y,\Theta) = \alpha \sum_{x \in D^s_i} \hat{\ell}(x'; \theta^{su}_i) + \beta \sum_{x \notin D^s_i} \ell(x'; \theta^s_i)$$
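The two losses can be sketched in NumPy as below, assuming we already have, for each shadow model, the probability it assigns to class $y$ at $x'$. The function names and inputs are hypothetical; the defaults $\alpha = 1$, $\beta = 4$ follow the hyperparameters reported later in this summary.

```python
import numpy as np


def cross_entropy(p_y):
    """Cross-entropy for the target class, given its predicted probability."""
    return -np.log(np.clip(p_y, 1e-12, 1.0))


def under_loss(p_in_unlearned, p_out, alpha=1.0, beta=4.0):
    """l_Un: low when unlearned shadow models (x in D_i^s) still predict y
    at x' while shadow models never trained on x do not."""
    sensitivity = cross_entropy(p_in_unlearned).sum()
    specificity = -cross_entropy(p_out).sum()  # hat-l = -l
    return float(alpha * sensitivity + beta * specificity)


def over_loss(p_in_unlearned, p_out, alpha=1.0, beta=4.0):
    """l_Ov: the same two terms with the roles of l and hat-l swapped."""
    sensitivity = -cross_entropy(p_in_unlearned).sum()
    specificity = cross_entropy(p_out).sum()
    return float(alpha * sensitivity + beta * specificity)


# x' that unlearned shadow models still classify as y, and others do not:
good = under_loss([0.9, 0.8], [0.1, 0.2])
# the opposite situation gives a higher under-unlearning loss:
bad = under_loss([0.1, 0.2], [0.9, 0.8])
print(good < bad)  # True
```

As written, the two losses are exact negatives of each other for the same weights, so minimizing one pushes $x'$ in the opposite direction from minimizing the other.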
To reduce computational cost, the offline version replaces the sensitivity condition with a decision boundary distance:

$$\ell^{off}_{Un}(x'; x,y,\Theta) = \alpha \sum_i d(x', DB) + \beta \sum_i \hat{\ell}(x'; \theta^s_i)$$
Input: Target model θ_u, target sample (x,y), shadow models Θ^s, step size ε
Output: Adversarial example x'
x' ← x
for t = 1 to T:
    Compute gradient g_{t,i} ← ∇_{x'} ℓ(x'; x,y,Θ)
    x' ← SGD(x', average gradient)
    Project to annulus B_{tε}(x) \ B_{(t-1)ε}(x)   // locality constraint
    if average confidence < τ:
        early stop
return x'
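A minimal NumPy sketch of the loop above follows. It is an illustration under stated assumptions, not the authors' implementation: `grad_fn` and `confidence_fn` are hypothetical callables standing in for the averaged shadow-model gradient and the average confidence, the learning rate `lr` and threshold `tau` are placeholder values, and the update is a plain gradient step.

```python
import numpy as np


def project_to_annulus(x_adv, x, r_inner, r_outer):
    """Move x_adv radially so that r_inner <= ||x_adv - x|| <= r_outer."""
    delta = x_adv - x
    norm = np.linalg.norm(delta)
    if norm == 0.0:
        if r_inner == 0.0:
            return x.copy()
        # No direction to scale: pick an arbitrary unit direction.
        delta = np.zeros_like(x, dtype=float)
        delta.flat[0] = 1.0
        norm = 1.0
    target = np.clip(norm, r_inner, r_outer)
    return x + delta * (target / norm)


def search(x, grad_fn, confidence_fn, eps=1.0, T=50, lr=0.1, tau=0.5):
    """Gradient search whose allowed radius grows by eps each round."""
    x_adv = x.astype(float)  # copy of the starting point
    for t in range(1, T + 1):
        g = grad_fn(x_adv)       # assumed: already averaged over shadow models
        x_adv = x_adv - lr * g   # plain gradient step
        x_adv = project_to_annulus(x_adv, x, (t - 1) * eps, t * eps)
        if confidence_fn(x_adv) < tau:
            break                # early stop
    return x_adv


# Toy run: constant gradient, confidence never drops below tau.
x0 = np.zeros(2)
out = search(x0, lambda z: np.ones_like(z), lambda z: 1.0, T=3)
print(np.linalg.norm(out - x0))
```

The projection is what enforces locality: after round $t$ the perturbation norm lies between $(t-1)\epsilon$ and $t\epsilon$, so the total perturbation never exceeds $T \cdot \epsilon$.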
Key Design :
- Progressively expand the search radius (from $(t-1)\epsilon$ to $t\epsilon$)
- Projection ensures locality (total perturbation $\leq T\cdot\epsilon$)
- Early stopping mechanism improves efficiency

- Paradigm Shift: From comparing models before/after unlearning → comparing the unlearned model with the ideal retrained model
- Theoretical Support: First to provide Lipschitz theoretical bounds for unlearning attacks
- Strong Practicality: Offline version avoids re-unlearning shadow models for each target sample
- Good Adaptability: Leverages both UNDER and OVER phenomena simultaneously, improving robustness

| Dataset | Training Size | Test Size | Classes | Unlearning Ratio |
|---|---|---|---|---|
| CIFAR-10 | 20,000 | 10,000 | 10 | 10% |
| CIFAR-100 | 20,000 | 10,000 | 100 | 10% |
| ImageNet | 512,466 | 256,235 | 1,000 | 10% |
Data Partitioning Strategy :
- Slice (a): Training set $D$
- Slice (b): Shadow datasets (offline)
- Slice (c): Test set $D_t$
- Online attack: Shadow set sampled from (a)+(b); Offline attack: Only from (b)

- ResNet-18: Primary experimental model
- VGG-16: Ablation studies
- Swin Transformer: Transfer learning tests

Training Configuration:
- Optimizer: AdamW
- Learning rate: $1 \times 10^{-4}$
- Batch size: 64
- Epochs: 100 (target model), 50 (shadow models)
- Accuracy requirement: ≥75% on $D_t$

Test 6 representative algorithms + retraining baseline:
| Algorithm | Type | Core Idea |
|---|---|---|
| GA [45] | Baseline | Gradient ascent, focuses on $D_u$ |
| FT [18] | Baseline | Fine-tuning, focuses on $D_r$ |
| BT [54] | Knowledge Distillation | Guide unlearning using a "bad teacher" |
| SCRUB [10] | Posterior Divergence | Maximize divergence between pre/post models |
| SalUn [55] | SOTA | Saliency-based parameter selection |
| SFR-on [53] | SOTA | Retained set geometry preservation |
| RT | Exact Unlearning | Retrain from scratch (theoretically immune) |
Primary Metric : TPR @ low FPR (True Positive Rate at low False Positive Rate)
- Rationale: High precision is more valuable for privacy attacks
- Reporting: TPR @ lowest achievable FPR for each algorithm

Auxiliary Metrics: Precision, Recall, ROC curves
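Because Apollo outputs binary decisions, each reported "TPR @ FPR" entry corresponds to a single operating point. The sketch below computes that point from decisions and ground truth; the function name is hypothetical, and the example counts are derived from one reported entry (18.0% TPR at 6.0% FPR) together with the 200 unlearned + 200 test targets used in the experiments.

```python
def tpr_fpr(decisions, is_unlearned):
    """Return (TPR, FPR, precision) for binary membership decisions.

    decisions    -- 1 if the attack says "unlearned", else 0
    is_unlearned -- 1 if the sample really was in D_u, else 0
    """
    tp = sum(1 for d, u in zip(decisions, is_unlearned) if d and u)
    fp = sum(1 for d, u in zip(decisions, is_unlearned) if d and not u)
    positives = sum(is_unlearned)
    negatives = len(is_unlearned) - positives
    tpr = tp / positives
    fpr = fp / negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return tpr, fpr, precision


# 200 unlearned targets with 36 flagged, 200 test targets with 12 flagged.
truth = [1] * 200 + [0] * 200
preds = [1] * 36 + [0] * 164 + [1] * 12 + [0] * 188
tpr, fpr, precision = tpr_fpr(preds, truth)
print(tpr, fpr, precision)  # 0.18 0.06 0.75
```

The example shows why a modest TPR can still matter: at this operating point three out of four samples the attack flags are truly unlearned.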
- U-MIA [10]: Naive method using an SVM classifier (RBF kernel, C=3)
- U-LiRA [11]: Likelihood ratio-based attack using logit-transformed posterior probabilities
- Note: Chen et al., Gao et al., and Lu et al. are excluded because they require access to the original model
Apollo Hyperparameters :
- Number of shadow models: $m = 32$
- Search step size: $\epsilon = 1.0$
- Search rounds: $T = 50$
- Loss weights: $\alpha = 1, \beta = 4$ (emphasize specificity)
- Target samples: 200 (unlearned) + 200 (test)

Hardware: NVIDIA A100 (40GB), ~20 minutes training time per model
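Results on CIFAR-10 (entries are TPR @ FPR):

| Method | GA | FT | BT | SCRUB | SalUn | SFR-on | RT |
|---|---|---|---|---|---|---|---|
| U-MIA | 16.5@6.0% | 11.5@9.5% | 95.0@2.5% | 9.0@4.0% | 15.5@4.5% | 3.0@2.5% | 5.5@4.5% |
| U-LiRA | 68.5@6.0% | 6.5@9.5% | 28.0@2.5% | 6.0@4.0% | 20.0@4.5% | 2.5@2.5% | 4.0@4.5% |
| Apollo | 18.0@6.0% | 6.5@9.5% | 4.0@2.5% | 21.5@4.0% | 4.5@4.5% | 10.0@2.5% | 5.0@4.5% |
| Apollo (Off) | 16.0@6.0% | 6.5@9.5% | 3.0@2.5% | 15.0@4.0% | 7.5@4.5% | 5.0@2.5% | 7.0@4.5% |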
Key Findings :
- GA Most Vulnerable: U-LiRA achieves 68.5% TPR, Apollo achieves 18%
- SCRUB Easily Attacked: Apollo outperforms U-LiRA (21.5% vs 6.0%)
- SFR-on Performance: Apollo achieves 10% TPR, U-LiRA only 2.5%
- RT Basically Safe: All attacks TPR ≤ 7%, close to random guessing

Results on CIFAR-100 (entries are TPR @ FPR):

| Method | GA | FT | BT | SCRUB | SalUn | SFR-on | RT |
|---|---|---|---|---|---|---|---|
| U-MIA | 7.5@0.5% | 0.5@1.0% | 48.5@13.5% | 17.0@5.0% | 8.5@1.5% | 2.0@1.5% | 1.0@1.0% |
| U-LiRA | 14.5@0.5% | 1.0@1.0% | 25.0@13.5% | 12.5@5.0% | 17.0@1.5% | 2.0@1.5% | 1.5@1.0% |
| Apollo | 15.5@0.5% | 2.0@1.0% | 50.0@13.5% | 41.5@5.0% | 5.0@1.5% | 0.5@1.5% | 1.5@1.0% |
| Apollo (Off) | 13.0@0.5% | 2.0@1.0% | 41.5@13.5% | 39.0@5.0% | 4.5@1.5% | 1.0@1.5% | 0.5@1.0% |
Key Findings :
- Performance Improvement: Apollo performs better on CIFAR-100 (more classes, fewer samples per class)
- SCRUB Major Weakness: Apollo achieves 41.5%, far exceeding U-LiRA's 12.5%
- BT Consistently Vulnerable: Apollo achieves 50% TPR

ImageNet: Trends are similar to CIFAR-100, with Apollo excelling on GA and SCRUB
GA (4a) : U-LiRA strongest, Apollo second, overall high AUC
FT (4b) : All attacks ineffective, Apollo slightly better
BT (4c) : U-MIA strongest (95% TPR), Apollo weaker
SCRUB (4d) : Apollo clearly superior to U-LiRA
SalUn (4e) : U-LiRA slightly better
SFR-on (4f) : Apollo shows clear advantage in low FPR region
RT (4g) : All attacks near random line
Heatmaps showing TPR under different search radii for both phenomena:
Success Cases (GA, SFR-on) :
- Clear boundary effects: low TPR in regions near the axes
- Validates Theorems III.3 and III.4
- UNDER and OVER are effective in different radius ranges

Failure Cases (BT, SalUn):
- OVER-UNLEARNING nearly uniformly distributed
- UNDER-UNLEARNING scarce
- Hypothesis: Algorithm design violates the local Lipschitz assumption

Loss Weight $\beta/\alpha$ (6a):
- Higher $\beta/\alpha$ → better precision-recall tradeoff
- Recommended $\beta/\alpha = 4$ (emphasize specificity)

Number of Shadow Models $m$ (6b):
- $m \leq 16$: Increasing $m$ improves performance
- $m = 32$: Performance decreases (overfitting to specific shadow models)
- Consistent with the observations of Wen et al. [36]

| Target Model | Shadow Model | TPR@FPR |
|---|---|---|
| ResNet-18 | ResNet-18 | 18.0@6.0% |
| ResNet-18 | VGG-16 | 12.0@6.0% |
| ResNet-18 | Swin-T | 13.5@6.0% |
| VGG-16 | VGG-16 | 5.5@2.5% |
| Swin-T | Swin-T | 11.5@4.5% |
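Conclusion: Architecture mismatch between target and shadow models reduces performance (for a ResNet-18 target, TPR drops from 18.0% to 12.0-13.5% at 6.0% FPR), but the attack remains usable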
Experimental Setup :
- Data: $\mathbb{R}^2 \times \{0,1,2,3\}$, 500 samples
- Model: 12-layer small NN (Table VI)
- Unlearning: 10% of the training set, using GA

Observations (3a):
- Red region: UNDER-UNLEARNING ($\theta_u$ predicts the same as $\theta$, but differently from $\theta_r$)
- Green region: OVER-UNLEARNING ($\theta_u$ predicts differently from both $\theta_r$ and $\theta$, which agree with each other)
- Both phenomena are present simultaneously

Adversarial Example Trajectory (3c):
- Starts from the unlearned sample
- Progressively moves to the UNDER-UNLEARNING region
- Validates the effectiveness of Algorithm 1

Massive Differences Between Algorithms:
- GA, SCRUB, SFR-on vulnerable to attack
- BT vulnerable to U-MIA; robust to Apollo on CIFAR-10 (4.0% TPR at 2.5% FPR) but not on CIFAR-100 (50.0% TPR at 13.5% FPR)
- SalUn relatively safe overall

Dataset Complexity Impact:
- Attacks more effective on CIFAR-100 and ImageNet (more classes, fewer samples)
- Decision boundaries more sensitive

Theory-Practice Consistency:
- Successful attacks show clear boundary effects
- Failed cases possibly violate the Lipschitz assumption

Offline Attack Feasibility:
- Slightly lower performance than the online version
- Significantly reduces computational cost

Threat Ubiquity:
- Even under the strictest threat model, most algorithms remain attackable
- Retraining (RT) is basically safe but not scalable

Exact Unlearning:
- Bourtoule et al. [2] SISA: Partition training, retrain only affected sub-models
- Yan et al. [20]: Partition by class

Approximate Unlearning (focus of this paper):
- Baselines: GA [45] (gradient ascent), FT [18] (fine-tuning)
- Knowledge Distillation: BT [54]
- Posterior Divergence: SCRUB [10]
- Saliency-Based Methods: SalUn [55], SFR-on [53]

Classical MIA:
- Shokri et al. [27]: Train an attack classifier using shadow models
- Yeom et al. [28]: Exploit overfitting-induced member advantage
- Carlini et al. [29]: Likelihood ratio-based LiRA attack

Label-Only Attacks:
- Choquette-Choo et al. [32]: First label-only MIA
- Peng et al. [33] OSLO: Measure confidence via adversarial perturbation
- Wu et al. [34] YOQO: Reduce query count

| Attack | Access to $\theta$ | Access to $\theta_u$ | Posterior |
|---|---|---|---|
| Chen et al. [7] | ✓ | ✓ | ✓ |
| Gao et al. [8] | ✓ | ✓ | ✓ |
| Lu et al. [9] | ✓ | ✓ | ✗ |
| U-MIA [10] | ✗ | ✓ | ✓ |
| U-LiRA [11] | ✗ | ✓ | ✓ |
| Apollo | ✗ | ✓ | ✗ |
Paper's Advantage : Strictest threat model, no need for original model or posterior probabilities
- Privacy Threat is Real: Even under the strictest threat model (label-only access, no original model), attackers can infer unlearned samples with relatively high precision
- Solid Theoretical Foundation: UNDER-UNLEARNING and OVER-UNLEARNING have clear theoretical bounds (under Lipschitz conditions)
- Strong Practicality:
  - Online version: TPR up to 50.0% at 13.5% FPR (BT on CIFAR-100) and 21.5% at 4.0% FPR (SCRUB on CIFAR-10); the 68.5% TPR for GA on CIFAR-10 belongs to U-LiRA, not Apollo
  - Offline version: Slightly lower performance, significantly reduced computational cost
- Significant Algorithm Differences: Vulnerability varies dramatically across unlearning algorithms, requiring targeted defenses
- Challenges Existing Claims: Directly contradicts the privacy protection claims of most unlearning methods

Author-Acknowledged Limitations:
- FPR Adjustment Difficulty: Adjusting FPR via hyperparameters ($T, \epsilon, \tau$) is less flexible than in likelihood methods
- Computational Cost: Requires training multiple shadow models (the offline version mitigates this)
- Theoretical Assumptions: Local Lipschitz conditions are not always satisfied (e.g., BT, SalUn cases)

Potential Unmentioned Issues:
- Sample Selection Bias: Only 200 samples tested, which may not represent the overall distribution
- Fixed Unlearning Ratio: Only 10% unlearning tested, other ratios unknown
- Adversarial Defenses: No discussion of possible defenses (e.g., noise addition, differential privacy)
- LLM Applicability: Primarily for image classification, unlearning in large language models untested

- More Efficient Attacks: Reduce shadow model count and query count
- Defense Mechanisms: Design unlearning algorithms robust to Apollo
- Theory Refinement: Relax Lipschitz assumptions, extend to non-local cases
- Other Modalities: Extend to object detection, semantic segmentation, etc.
- Privacy-Preserving Unlearning: Combine differential privacy with unlearning methods

Methodological Innovation:
- Paradigm Shift: From "comparing before/after unlearning" to "comparing unlearned vs. ideal retrained," which better aligns with the definition of unlearning
- Theoretical Depth: First to provide Lipschitz theoretical bounds; formalizes the UNDER/OVER phenomena
- Strict Threat Model: Label-only + a posteriori is the most challenging setting

Experimental Sufficiency:
- Diverse Datasets: CIFAR-10/100 (small-scale), ImageNet (large-scale)
- Broad Algorithm Coverage: 6 representative unlearning algorithms + retraining baseline
- Thorough Ablations: Hyperparameters, architecture transfer, UNDER/OVER dynamics
- Clear Visualizations: 2D examples intuitively demonstrate core ideas

Result Convincingness:
- Comprehensive Comparisons: Compared with U-MIA and U-LiRA, highlighting advantages
- Statistical Significance: 200 samples × multiple experiments, results reliable
- Theory-Practice Alignment: Experimental observations consistent with theoretical predictions (Figure 5)

Writing Quality:
- Clear Structure: Motivation → Theory → Method → Experiments, logically rigorous
- Rigorous Terminology: Formal definitions (Def. 1-3), complete theorem proofs
- High Reproducibility: Open-source code, detailed hyperparameters (Table VII)

Methodological Limitations:
- Strong Lipschitz Assumption: Does not apply to all models and unlearning algorithms (e.g., BT failure)
- Locality Constraint: Fixed search radius $T\cdot\epsilon$ may miss distant artifacts
- Binary Classification Simplification: Ignores $D_r$ membership; this is actually a three-class problem

Experimental Defects:
- Single Unlearning Ratio: Only 10% tested; behavior at 1% or 50% is unknown
- Small Sample Size: 200+200 samples may be insufficient for tail risk assessment
- Missing Defense Experiments: No testing of noise addition or differential privacy defenses
- Limited Architecture: Primarily ResNet-18, insufficient Transformer testing

Insufficient Analysis:
- Shallow Failure Explanation: "Violates Lipschitz" lacks deep analysis
- Algorithm Difference Unexplained: Why is BT vulnerable to U-MIA but robust to Apollo?
- Missing Practicality Discussion: Feasibility in real MLaaS scenarios (e.g., query limits)

Ethical Considerations:
- Double-Edged Nature: The attack method could be used maliciously
- Insufficient Defense Suggestions: Only emphasizes the "need for more caution," lacks specific solutions

Contribution to Field:
- Breaks Assumptions: Shows the original model is not needed to attack, pushing toward stricter privacy definitions
- Theoretical Tools: Lipschitz bounds applicable to analyzing other unlearning methods
- Evaluation Benchmark: Apollo serves as a privacy audit tool for unlearning algorithms

Practical Value:
- Audit Tool: Helps assess privacy leakage risks of unlearning algorithms
- Design Guidance: UNDER/OVER phenomena suggest directions for algorithm improvement
- Regulatory Reference: Provides a technical implementation basis for GDPR-like regulations

Reproducibility:
- ✅ Open-source code: https://github.com/LiouTang/Unlearn-Apollo-Attack
- ✅ Detailed hyperparameters: Table VII complete
- ✅ Public datasets: CIFAR, ImageNet available
- ⚠️ Resource requirement: A100 GPU needed, may limit reproduction

Potential Impact:
- Short-term: Drives improvement of unlearning algorithms (e.g., further optimization of SalUn, SFR-on)
- Medium-term: May spark a research surge in privacy-preserving unlearning (e.g., DP-Unlearning)
- Long-term: Influences technical standard-setting for privacy regulations

Suitable Applications:
- Privacy Audit: Assess privacy guarantees of unlearning services
- Algorithm Testing: Robustness testing for new unlearning methods
- Regulatory Compliance: Verify GDPR requirement satisfaction

Unsuitable Applications:
- LLM Unlearning: Label definition unclear for text generation
- Small-Sample Scenarios: Shadow model training requires large data
- Real-Time Systems: Adversarial example generation is time-consuming (50 SGD steps)

Generalization Potential:
- Other Tasks: Object detection, semantic segmentation (requires redefining "label")
- Federated Learning: Distributed unlearning privacy audit
- Model Compression: Pruning, distillation membership inference scenarios

- Cao & Yang (2015): First proposed the machine unlearning concept
- Bourtoule et al. (2021): SISA exact unlearning algorithm
- Carlini et al. (2022): LiRA likelihood ratio attack
- Choquette-Choo et al. (2021): First label-only MIA
- Hayes et al. (2024): U-LiRA attack against unlearning
- Huang et al. (2024): SFR-on unified gradient unlearning framework
- Fan et al. (2024): SalUn saliency-based unlearning

Apollo is a high-quality machine learning security paper that reveals privacy risks of machine unlearning under the strictest threat model (label-only, a posteriori). Its core contributions are:
- Theoretical Innovation: Formalizes UNDER/OVER-UNLEARNING, provides Lipschitz bounds
- Practical Method: Online/offline versions balance effectiveness and cost
- Solid Experiments: Multi-dataset, multi-algorithm, thorough ablations, credible conclusions

Despite limitations such as strong Lipschitz assumptions and small sample sizes, the paper directly challenges the effectiveness of unlearning as a privacy tool and provides an important warning to the field. Recommended future work:
- Explore attacks in non-Lipschitz scenarios
- Design unlearning algorithms robust to Apollo
- Extend to LLMs and other modalities

Recommendation Score: ⭐⭐⭐⭐☆ (4.5/5)
- Innovation: 5/5
- Rigor: 4/5
- Practicality: 4/5
- Readability: 5/5