Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.
- Paper ID: 2410.13903
- Title: CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
- Authors: Qinfeng Li, Tianyue Luo, Xuhong Zhang, Yangfan Xie, Zhiqiang Shen, Lijun Zhang, Yier Jin, Hao Peng, Xinkui Zhao, Xianwei Zhu, Jianwei Yin
- Classification: cs.CR (Cryptography and Security), cs.AI (Artificial Intelligence), cs.DC (Distributed Computing)
- Publication Venue/Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
- Paper Link: https://arxiv.org/abs/2410.13903
Proprietary large language models (LLMs) generalize well across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy. Deploying them there without adequate protection, however, poses serious security threats: attackers can extract the model's weights and architecture, enabling unauthorized replication and misuse. Even if protective measures prevent complete weight extraction, attackers can still mount advanced attacks (such as fine-tuning) to further exploit the model. Existing defenses typically incur significant computational and communication overhead, making them impractical for edge deployment. To protect edge-deployed LLMs, the paper proposes CoreGuard, a computation- and communication-efficient protection method. CoreGuard uses an efficient protection protocol to reduce computational overhead and a propagation protocol to minimize communication overhead. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.
- Core Problem: Proprietary LLMs deployed on edge devices face model stealing threats, where attackers can extract model architecture and weights through software analysis techniques, leading to unauthorized replication and misuse.
- Problem Significance:
- Proprietary LLMs (such as ChatGPT, Claude) possess strong generalization capabilities with enormous development costs
- Clear trend toward edge deployment (e.g., Apple Intelligence integrating 3B-parameter LLMs into iOS devices)
- Domain-specific proprietary LLMs (such as BloombergGPT in finance, Med-PaLM 2 in healthcare) lack open-source alternatives
- Limitations of Existing Methods:
- Passive Protection (e.g., watermarking): Only provides ownership proof, cannot prevent misuse in unsupervised edge environments
- Model Encryption: Remains vulnerable during runtime
- Direct TEE Protection: Placing the entire model in a trusted execution environment (TEE) reduces efficiency by approximately 50×
- Partial Parameter TEE Execution (PPTE): Protects only a limited number of weights, leaving the model susceptible to reconstruction
- Parameter Shuffling Protection (PSP): Methods like ShadowNet incur excessive data transfer overhead
- Research Motivation: Need for solutions that ensure adequate security while maintaining acceptable computational and communication overhead.
- First systematic protection of foundational capabilities of edge-deployed LLMs: Systematically characterizes security challenges in this scenario and identifies requirements for protecting edge-deployed LLMs.
- Proposes CoreGuard, a plug-and-play solution: Uses a lightweight authorization mechanism to protect edge-deployed LLMs and a propagation protocol to significantly reduce transmission overhead while keeping computational overhead low.
- Comprehensive experimental validation: Compared to existing solutions, CoreGuard provides higher security guarantees, lower overhead, and no accuracy loss.
Input: Trained LLM model
Output: Locked model that functions normally only with proper authorization through trusted hardware (TEE) within the device
Constraints: Minimize computational and communication overhead while maintaining model accuracy
CoreGuard operates in two stages:
Protection Protocol:
- Perform row permutation on weight matrices of linear layers: $W_q' = \pi^T W_q$, $W_k' = \pi^T W_k$, $W_v' = \pi^T W_v$, $W_m' = \pi^T W_m$
- These row permutations act as "locks," disabling linear layers such that normal computation only occurs with corresponding column permutation inputs (authorization)
- Permutation matrix $\pi \in \{0,1\}^{d \times d}$ satisfies $\pi\pi^T = I$
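The lock-and-key effect of the row permutation can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a row-vector convention (`y = x @ W`), a toy dimension `d = 8`, and randomly generated weights, all of which are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size (assumption for illustration)

# Permutation matrix pi in {0,1}^{d x d}: rows of the identity, reordered.
pi = np.eye(d)[rng.permutation(d)]

W = rng.standard_normal((d, d))   # original linear-layer weight
x = rng.standard_normal((1, d))   # one input feature row

W_locked = pi.T @ W               # row-permuted ("locked") weight

y_ref = x @ W                     # output of the unprotected layer
y_unauth = x @ W_locked           # plain input fed to the locked layer
y_auth = (x @ pi) @ W_locked      # column-permuted ("authorized") input

print("pi @ pi.T == I:       ", np.allclose(pi @ pi.T, np.eye(d)))
print("authorized matches:   ", np.allclose(y_auth, y_ref))
print("unauthorized matches: ", np.allclose(y_unauth, y_ref))
```

Since $(x\pi)(\pi^T W) = x(\pi\pi^T)W = xW$, the authorized path reproduces the original output, while the unauthorized path multiplies the input against rows in the wrong order.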
Propagation Protocol:
- Perform column permutation on output processing layers: $W_o' = W_o\pi$, $W_n' = W_n\pi$
- Because these layers emit column-permuted features, each subsequent locked layer receives an already-authorized input through ordinary network operations
- TEE only needs to manage initial authorization, which propagates to all subsequent layers
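How a column-permuted output layer hands authorization to the next locked layer can be sketched with two chained linear maps. This is an illustrative NumPy example under assumptions: a row-vector convention, toy dimensions, random weights, and no attention, residual, or normalization steps in between.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # toy hidden size (assumption for illustration)
pi = np.eye(d)[rng.permutation(d)]

W_o = rng.standard_normal((d, d))   # output projection of one block
W_q = rng.standard_normal((d, d))   # query projection of the next block
a = rng.standard_normal((1, d))     # feature entering W_o

W_o_locked = W_o @ pi               # column permutation: output comes out permuted
W_q_locked = pi.T @ W_q             # row permutation: expects a permuted input

h_perm = a @ W_o_locked             # equals (a @ W_o) @ pi
q = h_perm @ W_q_locked             # the permutations cancel: a @ W_o @ W_q

print(np.allclose(h_perm, (a @ W_o) @ pi))
print(np.allclose(q, a @ W_o @ W_q))
```

No trusted component is involved in this hand-off: the permuted output of one layer is exactly the authorized input the next layer needs.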
Encryption Process:
$m' = m\pi + p\pi$
where $p$ is one-time pad (OTP) noise and $m'$ is the encrypted permuted feature.
Output Linear Layer Processing:
$n' = m'W_n' + b_n = (m\pi + p\pi)\pi^T W_n + b_n = n + pW_n$
Decryption and Authorization:
$n'' = n' - pW_n = n$
$z' = \left(\gamma_2 \odot \frac{n + y - \mu_{y+n}}{\sigma_{y+n}} + \beta_2\right)\pi = z\pi$
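The encrypt, compute, decrypt, and authorize steps can be traced end to end in a few lines. This is a hedged NumPy sketch rather than the paper's code: the toy dimension, random values, the `layer_norm` helper with its `eps` term, and the reading of which step runs inside the TEE versus on the GPU are all assumptions made for illustration. The locked weight `pi.T @ W_n` follows the output-layer equation above.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8  # toy hidden size (assumption for illustration)
pi = np.eye(d)[rng.permutation(d)]

W_n = rng.standard_normal((d, d))
b_n = rng.standard_normal(d)
gamma2 = rng.standard_normal(d)
beta2 = rng.standard_normal(d)
m = rng.standard_normal((1, d))   # feature entering the output linear layer
y = rng.standard_normal((1, d))   # residual input


def layer_norm(x, gamma, beta, eps=1e-5):
    """Hypothetical helper: per-row layer normalization with affine parameters."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(x.var(axis=-1, keepdims=True) + eps)
    return gamma * (x - mu) / sigma + beta


# Encryption (assumed to run in the TEE): mask with OTP noise p, then permute.
p = rng.standard_normal((1, d))
m_enc = m @ pi + p @ pi

# Output linear layer (assumed to run on the GPU) with the locked weight.
W_n_locked = pi.T @ W_n
n_enc = m_enc @ W_n_locked + b_n          # equals n + p @ W_n

# Decryption and authorization (assumed to run in the TEE).
# p @ W_n is computed inline here purely for illustration.
n_dec = n_enc - p @ W_n                   # recovers n
z_auth = layer_norm(n_dec + y, gamma2, beta2) @ pi

n = m @ W_n + b_n                         # reference value, never seen in the clear on the GPU
z = layer_norm(n + y, gamma2, beta2)
print(np.allclose(n_enc, n + p @ W_n), np.allclose(n_dec, n), np.allclose(z_auth, z @ pi))
```

The GPU only ever handles `m_enc` and `n_enc`, both of which are masked by `p`; the permuted result `z_auth` is the authorized input for the next locked layer.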
- Single Authorization Propagation Mechanism: Through clever permutation design, achieves automatic propagation of authorization throughout the network, avoiding the need for TEE authorization at each layer.
- OTP Encryption Combined with Position Obfuscation: Uses one-time pad encryption combined with permutation to hide encryption and decryption processes.
- Optimal Communication Complexity: Requires only 5 rounds of TEE-GPU transfer per inference, achieving theoretical optimality.
- Mathematical Security Guarantees: Provides security proofs based on the assumed hardness of the Learning With Errors (LWE) problem.
- GSM8k: Mathematical reasoning tasks
- Spider: Code generation (text-to-SQL) tasks
- PubMedQA: Medical question-answering tasks
- SQuAD: Reading comprehension tasks
- Edge Deployment Models: Qwen2-0.5B-Instruct, Gemma2-2B-it
- Large Models: ChatGLM3-6B-32k, LLaMA3-8B-Instruct
- Security: Accuracy of model stealing attacks (lower is safer)
- Efficiency: Floating-point operations (FLOPs), TEE-GPU transfer overhead
- Accuracy: Task-specific accuracy
- TPTE: NPLO
- PPTE: DarkneTZ, SOTER, Serdab, DTE
- PSP: ShadowNet, TransLinkGuard (TLG)
- Bounds: No-shield (lower bound), Black-box (upper bound)
- Implemented using Hugging Face library
- AdamW optimizer with linear learning rate scheduling
- Experiments conducted on NVIDIA A800 GPU
- Assumes attackers possess 100% of the training dataset (a stricter setting than the 1% assumed in prior work)
Security Evaluation:
- Unauthorized inference accuracy: 0% in all cases
- Model stealing attacks: CoreGuard relative accuracy of 1.17× (approaching Black-box's 1.00×)
- Significantly outperforms TPTE method NPLO (9.59×) and PPTE method DarkneTZ (8.43×)
- Comparable performance to other PSP methods (TLG: 1.07×, ShadowNet: 1.09×)
Efficiency Comparison:
- TEE Execution Overhead: Below 1.17e-03% for CoreGuard, versus 2.91%-21.52% for PPTE methods
- TEE-GPU Transfer Overhead: CoreGuard requires only 5 rounds of transfer, while ShadowNet requires 448 rounds (LLaMA3-8B)
- Transfer Data Volume: CoreGuard approximately 20KB, ShadowNet approximately 1.3GB
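The size of the communication gap follows directly from the figures quoted above. A short worked calculation, assuming decimal units (1 KB = 10^3 bytes, 1 GB = 10^9 bytes) and treating the approximate volumes as exact:

```python
# Figures quoted in the efficiency comparison above.
coreguard_rounds, shadownet_rounds = 5, 448
coreguard_bytes = 20e3    # ~20 KB, decimal units assumed
shadownet_bytes = 1.3e9   # ~1.3 GB, decimal units assumed

round_ratio = shadownet_rounds / coreguard_rounds
volume_ratio = shadownet_bytes / coreguard_bytes

print(f"ShadowNet needs {round_ratio:.1f}x as many transfer rounds")
print(f"ShadowNet moves about {volume_ratio:,.0f}x as much data")
```

Under these assumptions the gap is roughly 90× in transfer rounds and about 65,000× in data volume.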
Security Under Different Attack Settings:
- LoRA fine-tuning attacks: CoreGuard maintains security close to upper bound
- Different data proportions (1%-100%): Maintains security close to Black-box protection across all settings
- Task alignment: Maintains security regardless of whether attacker's target task aligns with deployed model's task
Authorization Position Impact:
- Mid-layer authorization provides optimal security
- First and last layer authorization shows lower security, as attackers only need to recover limited parameters
- In most cases, protected model accuracy is identical to original model
- Minor fluctuations of ±0.5% in individual cases, attributed to floating-point precision limitations
- Model Protection Methods:
- Watermarking techniques: Passive protection providing only ownership proof
- Model encryption: Vulnerable during runtime
- TEE protection: Direct protection with excessive computational overhead
- Parameter Shuffling Protection:
- ShadowNet: Channel shuffling protection for convolutional layers
- TransLinkGuard: Protection for Transformer models
- Trusted Execution Environment Applications:
- CPU-based TEE: ARM TrustZone, Intel SGX
- GPU TEE: Still in early stages, primarily targeting data centers
Compared to existing work, CoreGuard achieves orders of magnitude efficiency improvements while maintaining the same security level, particularly in communication overhead.
- CoreGuard successfully addresses the security protection problem for edge-deployed LLMs
- Achieves optimal communication complexity through propagation protocols
- Provides upper-bound security guarantees while maintaining negligible computational and communication overhead
- Preserves original model accuracy
- Side-Channel Attacks: Relies on TEE as security root, potentially vulnerable to side-channel attacks
- GPU TEE Limitations: Currently depends primarily on CPU-based TEEs; GPU TEEs remain immature
- Practical Deployment: Paper focuses on core framework without deep device-specific implementation details
- Architecture Compatibility: Primarily designed for mainstream Transformer architectures
- Integrate side-channel attack countermeasures
- Adapt to GPU TEE technology development
- Extend to additional model architectures
- Optimize for actual device deployment
- Strong Innovation: First systematic solution to foundational capability protection for edge-deployed LLMs
- Clever Technical Design: Propagation protocol design is elegant, achieving single authorization coverage across entire network
- Solid Theoretical Foundation: Provides mathematical security guarantees based on LWE problem
- Comprehensive Experiments: Full evaluation across multiple models, tasks, and attack scenarios
- High Practical Value: Significant efficiency improvements make it viable for actual deployment
- Security Assumptions: Relies on TEE security, potentially vulnerable to side-channel attacks
- Limited Scope: Primarily targets Transformer architecture, limited applicability to other architectures
- Deployment Complexity: Actual deployment requires consideration of additional hardware and system-level factors
- Long-term Security: Continued effectiveness of current protection measures needs verification as attack techniques evolve
- Academic Contribution: Provides new research directions and solutions for edge AI security
- Practical Value: Offers important guidance for commercial LLM edge deployment
- Technology Advancement: May promote further development of TEE technology in AI protection domain
- Edge device deployment of proprietary LLMs
- AI applications sensitive to latency and privacy
- Commercial AI services requiring intellectual property protection
- Model protection in resource-constrained environments
The paper cites 52 related references covering important work in model protection, trusted execution environments, large language models, and other relevant domains, providing solid theoretical foundation and technical support for the research.
Overall Assessment: CoreGuard is a high-quality research work demonstrating excellence in technical innovation, experimental validation, and practical value. This work not only addresses an important practical problem but also provides valuable insights and methodologies for subsequent research in related fields.