
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

Li, Luo, Zhang et al.

Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.


Basic Information

Paper ID: 2410.13903
Title: CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Authors: Qinfeng Li, Tianyue Luo, Xuhong Zhang, Yangfan Xie, Zhiqiang Shen, Lijun Zhang, Yier Jin, Hao Peng, Xinkui Zhao, Xianwei Zhu, Jianwei Yin
Classification: cs.CR (Cryptography and Security), cs.AI (Artificial Intelligence), cs.DC (Distributed, Parallel, and Cluster Computing)
Publication Venue/Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Paper Link: https://arxiv.org/abs/2410.13903

Abstract

Proprietary large language models (LLMs) demonstrate strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs on edge devices without adequate protection poses serious security threats. Attackers can extract model weights and architecture, enabling unauthorized replication and misuse. Even if protective measures prevent complete extraction of the model weights, attackers can still mount advanced attacks (such as fine-tuning) to further exploit the model. Existing defenses typically incur significant computational and communication overhead, making them impractical for edge deployment. To protect LLMs deployed on edge devices, this paper proposes CoreGuard, a computation- and communication-efficient protection method. CoreGuard uses an efficient protection protocol to reduce computational overhead and a propagation protocol to minimize communication overhead. Extensive experiments demonstrate that CoreGuard achieves upper-bound security protection with negligible overhead.

Research Background and Motivation

Problem Definition

Core Problem: Proprietary LLMs deployed on edge devices face model stealing threats, where attackers can extract model architecture and weights through software analysis techniques, leading to unauthorized replication and misuse.
Problem Significance:
- Proprietary LLMs (such as ChatGPT, Claude) possess strong generalization capabilities with enormous development costs
- Clear trend toward edge deployment (e.g., Apple Intelligence integrating 3B-parameter LLMs into iOS devices)
- Domain-specific proprietary LLMs (such as BloombergGPT in finance, Med-PaLM 2 in healthcare) lack open-source alternatives
Limitations of Existing Methods:
- Passive Protection (e.g., watermarking): Provides only proof of ownership and cannot prevent misuse in unsupervised edge environments
- Model Encryption: Remains vulnerable at runtime
- Direct TEE Protection: Placing the entire model in a trusted execution environment (TEE) reduces efficiency by approximately 50×
- Partial Parameter TEE Execution (PPTE): Protects only a limited number of weights, leaving the model susceptible to reconstruction
- Parameter Shuffling Protection (PSP): Methods such as ShadowNet incur excessive data transfer overhead
Research Motivation: Need for solutions that ensure adequate security while maintaining acceptable computational and communication overhead.

Core Contributions

First systematic protection of foundational capabilities of edge-deployed LLMs: Systematically characterizes security challenges in this scenario and identifies requirements for protecting edge-deployed LLMs.
Proposes CoreGuard plug-and-play solution: Leverages lightweight authorization mechanisms to protect edge-deployed LLMs, employs propagation protocols to significantly reduce transmission overhead while maintaining low computational overhead.
Comprehensive experimental validation: Compared to existing solutions, CoreGuard provides higher security guarantees, lower overhead, and no accuracy loss.

Method Details

Task Definition

Input: Trained LLM model
Output: Locked model that functions normally only with proper authorization through trusted hardware (TEE) within the device
Constraints: Minimize computational and communication overhead while maintaining model accuracy

Model Architecture

CoreGuard operates in two stages:

1. Model Locking Stage (Pre-deployment)

Protection Protocol:

Perform row permutation on weight matrices of linear layers: $W'_q = \pi^T W_q, W'_k = \pi^T W_k, W'_v = \pi^T W_v, W'_m = \pi^T W_m$
These row permutations act as "locks": a locked linear layer computes correctly only when its input features have been column-permuted by the same $\pi$ (the authorization)
Permutation matrix $\pi \in \{0,1\}^{d \times d}$ satisfies $\pi\pi^T = I$
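The locking identity behind the protection protocol, $(x\pi)(\pi^T W) = xW$, can be checked with a minimal sketch. The matrix `W`, the input `x`, and the permutation order below are hypothetical values chosen for illustration, and the small list-based helpers are not from the paper.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def perm_matrix(order):
    """Return pi with pi[i][order[i]] = 1, so that pi * pi^T = I."""
    d = len(order)
    return [[1 if j == order[i] else 0 for j in range(d)] for i in range(d)]


# Hypothetical 4x3 weight and 1x4 input; integers keep the arithmetic exact.
W = [[1, 0, 2], [0, 3, 1], [4, 1, 0], [2, 2, 5]]
x = [[1, 2, 3, 4]]
pi = perm_matrix([2, 0, 3, 1])  # illustrative secret permutation

W_locked = matmul(transpose(pi), W)           # W' = pi^T W, the deployed weight
expected = matmul(x, W)                       # output of the unprotected layer
unauthorized = matmul(x, W_locked)            # plain input fed to the locked layer
authorized = matmul(matmul(x, pi), W_locked)  # (x pi)(pi^T W) = x W

print(expected, unauthorized, authorized)
```

With these values the authorized output matches the unprotected one, while the unauthorized output does not. In this toy case an attacker could try all 24 permutations; the summary does not state the hidden dimension $d$ used in practice, but the search space grows as $d!$.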

Propagation Protocol:

Perform column permutation on output processing layers: $W'_o = W_o\pi, W'_n = W_n\pi$
Because these layers emit column-permuted features, authorization is carried forward automatically by the network's own operations
TEE only needs to manage initial authorization, which propagates to all subsequent layers
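A minimal sketch of how such propagation could work, assuming a column-permuted output layer feeds a row-permuted input layer through an element-wise operation. ReLU stands in here for whatever operations sit between the two layers; the weights, the input `h`, and the permutation order are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def perm_matrix(order):
    """Return pi with pi[i][order[i]] = 1, so that pi * pi^T = I."""
    d = len(order)
    return [[1 if j == order[i] else 0 for j in range(d)] for i in range(d)]


def relu(m):
    """Element-wise ReLU; element-wise ops commute with column permutation."""
    return [[max(0, v) for v in row] for row in m]


# Hypothetical values; integers keep the comparison exact.
h = [[1, -2, 3]]
W_o = [[1, 2, 0], [0, 1, 1], [2, 0, -1]]  # output-processing layer
W_q = [[1, 0], [2, 1], [0, 3]]            # next input-processing layer
pi = perm_matrix([1, 2, 0])

W_o_locked = matmul(W_o, pi)              # W'_o = W_o pi (column permutation)
W_q_locked = matmul(transpose(pi), W_q)   # W'_q = pi^T W_q (row permutation)

reference = matmul(relu(matmul(h, W_o)), W_q)     # unprotected two-layer path
features = matmul(h, W_o_locked)                  # equals (h W_o) pi
propagated = matmul(relu(features), W_q_locked)   # no TEE call between layers

print(reference, propagated)
```

The locked path reproduces the unprotected result without any extra authorization step, because the permuted output of one layer is exactly the input the next locked layer expects.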

2. Inference Authorization Stage (Post-deployment)

Encryption Process: $m' = m\pi + p\pi$ where $p$ is one-time pad (OTP) noise and $m'$ is the encrypted permuted feature.

Output Linear Layer Processing: $n' = m'W'_n + b_n = (m\pi + p\pi)\pi^T W_n + b_n = n + pW_n$, where $n = mW_n + b_n$ is the unprotected output and $W'_n = \pi^T W_n$ at this layer.

Decryption and Authorization: $n'' = n' - pW_n = n$, followed by $z' = (\gamma_2 \odot \frac{n + y - \mu_{y+n}}{\sigma_{y+n}} + \beta_2)\pi = z\pi$

Technical Innovations

Single Authorization Propagation Mechanism: Through clever permutation design, achieves automatic propagation of authorization throughout the network, avoiding the need for TEE authorization at each layer.
OTP Encryption Combined with Position Obfuscation: Uses one-time pad encryption combined with permutation to hide encryption and decryption processes.
Optimal Communication Complexity: Requires only 5 rounds of TEE-GPU transfer per inference, achieving theoretical optimality.
Mathematical Security Guarantees: Provides security proofs that rest on the assumed hardness of the Learning With Errors (LWE) problem.

Experimental Setup

Datasets

GSM8k: Mathematical reasoning tasks
Spider: Code generation tasks (text-to-SQL)
PubMedQA: Medical question-answering tasks
SQuAD: Reading comprehension tasks

Models

Edge Deployment Models: Qwen2-0.5B-Instruct, Gemma2-2B-it
Large Models: ChatGLM3-6B-32k, LLaMA3-8B-Instruct

Evaluation Metrics

Security: Accuracy of model stealing attacks (lower is safer)
Efficiency: Floating-point operations (FLOPs), TEE-GPU transfer overhead
Accuracy: Task-specific accuracy

Comparison Methods

TPTE: NPLO
PPTE: DarkneTZ, SOTER, Serdab, DTE
PSP: ShadowNet, TransLinkGuard (TLG)
Bounds: No-shield (lower bound), Black-box (upper bound)

Implementation Details

Implemented using Hugging Face library
AdamW optimizer with linear learning rate scheduling
Experiments conducted on NVIDIA A800 GPU
Assumes attackers possess 100% of training dataset (stricter than 1% in prior work)

Experimental Results

Main Results

Security Evaluation:

Unauthorized inference accuracy: 0% in all cases
Model stealing attacks: Attack accuracy against CoreGuard is 1.17× in relative terms, approaching the Black-box bound of 1.00×
Significantly outperforms TPTE method NPLO (9.59×) and PPTE method DarkneTZ (8.43×)
Comparable performance to other PSP methods (TLG: 1.07×, ShadowNet: 1.09×)

Efficiency Comparison:

TEE Execution Overhead: CoreGuard < 1.17e-03%, PPTE methods 2.91%-21.52%
TEE-GPU Transfer Overhead: CoreGuard requires only 5 rounds of transfer, while ShadowNet requires 448 rounds (LLaMA3-8B)
Transfer Data Volume: CoreGuard approximately 20KB, ShadowNet approximately 1.3GB
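As a quick check on the size of these gaps, the reported figures can be turned into ratios. The sketch below assumes decimal units (1 KB = 10^3 bytes, 1 GB = 10^9 bytes) and treats the approximate volumes as exact.

```python
import math

# Figures reported above for LLaMA3-8B.
shadownet_rounds, coreguard_rounds = 448, 5
shadownet_bytes = 1.3e9  # about 1.3 GB, decimal units assumed
coreguard_bytes = 20e3   # about 20 KB, decimal units assumed

round_ratio = shadownet_rounds / coreguard_rounds
volume_ratio = shadownet_bytes / coreguard_bytes
orders_of_magnitude = math.log10(volume_ratio)

print(f"rounds: {round_ratio:.1f}x fewer")
print(f"volume: {volume_ratio:,.0f}x smaller (~10^{orders_of_magnitude:.1f})")
```

Under these assumptions CoreGuard needs roughly 90 times fewer transfer rounds and moves on the order of 65,000 times less data than ShadowNet.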

Ablation Studies

Security Under Different Attack Settings:

LoRA fine-tuning attacks: CoreGuard maintains security close to upper bound
Different data proportions (1%-100%): Maintains security close to Black-box protection across all settings
Task alignment: Maintains security regardless of whether attacker's target task aligns with deployed model's task

Authorization Position Impact:

Mid-layer authorization provides optimal security
First and last layer authorization shows lower security, as attackers only need to recover limited parameters

Accuracy Preservation

In most cases, protected model accuracy is identical to original model
Minor fluctuations of ±0.5% in individual cases, attributed to floating-point precision limitations
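One plausible mechanism for such fluctuations, offered here as an assumption rather than the paper's stated explanation, is that permuting weights changes the order in which the terms of a dot product are accumulated, and floating-point addition is not associative. A minimal sketch:

```python
def accumulate(values):
    """Add values left to right, as a naive dot-product loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


terms = [0.1, 0.2, 0.3]                      # hypothetical per-element products
permuted = [terms[i] for i in (1, 2, 0)]     # same terms in a permuted order

plain_sum = accumulate(terms)
permuted_sum = accumulate(permuted)

print(plain_sum, permuted_sum, plain_sum == permuted_sum)
```

The two sums differ in the last bit even though they are mathematically equal. Across many layers, differences of this kind could occasionally flip a borderline prediction.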

Main Research Directions

Model Protection Methods:
- Watermarking techniques: Passive protection providing only ownership proof
- Model encryption: Vulnerable during runtime
- TEE protection: Direct protection with excessive computational overhead
Parameter Shuffling Protection:
- ShadowNet: Channel shuffling protection for convolutional layers
- TransLinkGuard: Protection for Transformer models
Trusted Execution Environment Applications:
- CPU-based TEE: ARM TrustZone, Intel SGX
- GPU TEE: Still in early stages, primarily targeting data centers

Advantages Over Existing Work

Compared to existing work, CoreGuard achieves orders of magnitude efficiency improvements while maintaining the same security level, particularly in communication overhead.

Conclusions and Discussion

Main Conclusions

CoreGuard successfully addresses the security protection problem for edge-deployed LLMs
Achieves optimal communication complexity through propagation protocols
Provides upper-bound security guarantees while maintaining negligible computational and communication overhead
Preserves original model accuracy

Limitations

Side-Channel Attacks: Relies on TEE as security root, potentially vulnerable to side-channel attacks
GPU TEE Limitations: Currently depends primarily on CPU-based TEEs, because GPU TEEs remain immature
Practical Deployment: Paper focuses on core framework without deep device-specific implementation details
Architecture Compatibility: Primarily designed for mainstream Transformer architectures

Future Directions

Integrate side-channel attack countermeasures
Adapt to GPU TEE technology development
Extend to additional model architectures
Optimize for actual device deployment

In-Depth Evaluation

Strengths

Strong Innovation: First systematic solution to foundational capability protection for edge-deployed LLMs
Clever Technical Design: Propagation protocol design is elegant, achieving single authorization coverage across entire network
Solid Theoretical Foundation: Provides mathematical security guarantees based on LWE problem
Comprehensive Experiments: Full evaluation across multiple models, tasks, and attack scenarios
High Practical Value: Significant efficiency improvements make it viable for actual deployment

Weaknesses

Security Assumptions: Relies on TEE security, potentially vulnerable to side-channel attacks
Limited Scope: Primarily targets Transformer architecture, limited applicability to other architectures
Deployment Complexity: Actual deployment requires consideration of additional hardware and system-level factors
Long-term Security: Continued effectiveness of current protection measures needs verification as attack techniques evolve

Impact

Academic Contribution: Provides new research directions and solutions for edge AI security
Practical Value: Offers important guidance for commercial LLM edge deployment
Technology Advancement: May promote further development of TEE technology in AI protection domain

Applicable Scenarios

Edge device deployment of proprietary LLMs
AI applications sensitive to latency and privacy
Commercial AI services requiring intellectual property protection
Model protection in resource-constrained environments

References

The paper cites 52 related references covering important work in model protection, trusted execution environments, large language models, and other relevant domains, providing solid theoretical foundation and technical support for the research.

Overall Assessment: CoreGuard is a high-quality research work demonstrating excellence in technical innovation, experimental validation, and practical value. This work not only addresses an important practical problem but also provides valuable insights and methodologies for subsequent research in related fields.