2025-11-18T11:58:13.432393

CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

Li, Luo, Zhang et al.

Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.

academic

Basic Information

Paper ID: 2410.13903
Title: CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Authors: Qinfeng Li, Tianyue Luo, Xuhong Zhang, Yangfan Xie, Zhiqiang Shen, Lijun Zhang, Yier Jin, Hao Peng, Xinkui Zhao, Xianwei Zhu, Jianwei Yin
Categories: cs.CR (Cryptography and Security), cs.AI (Artificial Intelligence), cs.DC (Distributed Computing)
Publication date/venue: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Paper link: https://arxiv.org/abs/2410.13903

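Abstract

Proprietary large language models (LLMs) show strong generalization across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection creates serious security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of the model weights, attackers may still mount advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses typically incur significant computational and communication overhead, which makes them impractical for edge deployment. To protect edge-deployed LLMs, the paper proposes CoreGuard, a computation- and communication-efficient protection method. CoreGuard uses an efficient protection protocol to reduce computational overhead and a propagation protocol to minimize communication overhead. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.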

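Research Background and Motivation

Problem Definition

Core problem: Proprietary LLMs deployed at the edge face model-stealing threats. Attackers can use software analysis techniques to extract the model architecture and weights, leading to unauthorized copying and misuse.
Why the problem matters:
- Proprietary LLMs (e.g., ChatGPT, Claude) have strong generalization capabilities and are enormously expensive to develop
- The trend toward edge deployment is clear (e.g., Apple Intelligence integrates a 3B-parameter LLM into iOS devices)
- Domain-specific proprietary LLMs (e.g., BloombergGPT in finance, Med-PaLM 2 in medicine) lack open-source alternatives
Limitations of existing methods:
- Passive protection (e.g., watermarking): only provides proof of ownership and cannot prevent misuse in unsupervised edge environments
- Model encryption: the model remains vulnerable at runtime
- Direct TEE protection: placing the entire model inside a trusted execution environment (TEE) reduces efficiency by roughly 50×
- Partial-parameter TEE execution (PPTE): only a limited number of weights are protected, so they are easy to reconstruct
- Parameter-shuffling protection (PSP): schemes such as ShadowNet incur excessive data-transfer overhead
Research motivation: A solution is needed that provides sufficient security while keeping computational and communication overhead acceptable.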

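Core Contributions

First work on protecting the foundational capabilities of edge-deployed LLMs: it systematically characterizes the security challenges of this setting and identifies the requirements for protecting edge-deployed LLMs.
CoreGuard, a plug-and-play solution: it protects edge-deployed LLMs with a lightweight authorization mechanism and uses a propagation protocol to significantly reduce transfer overhead while keeping computational overhead low.
Comprehensive experimental validation: compared with existing solutions, CoreGuard provides stronger security guarantees and lower overhead with no loss of accuracy.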

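Apply a row permutation to the weight matrices of the linear layers: $W'_q = \pi^T W_q$, $W'_k = \pi^T W_k$, $W'_v = \pi^T W_v$, $W'_m = \pi^T W_m$
These row permutations act as a "lock" that disables the linear layers; only a correspondingly column-permuted input (the authorization) yields the correct computation
The permutation matrix $\pi \in \{0,1\}^{d \times d}$ satisfies $\pi\pi^T = I$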
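
The lock-and-authorization identity above can be checked on a toy example. The sketch below is only an illustration, not the paper's implementation. It assumes the row-vector convention $y = xW$, under which $(x\pi)(\pi^T W) = x(\pi\pi^T)W = xW$. The 3×3 matrices, the input `x`, and the helper names `transpose` and `matmul` are hypothetical and chosen only to keep the arithmetic exact.

```python
# Toy check of the permutation "lock", assuming the row-vector convention y = x W.
# All values and helper names are illustrative; none come from the paper.

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

# A fixed 3x3 permutation matrix pi (exactly one 1 per row and per column).
pi = [[0, 1, 0],
      [0, 0, 1],
      [1, 0, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

W = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]          # stands in for one weight matrix, e.g. W_q
x = [[1, 2, 3]]          # a single input row vector

W_locked = matmul(transpose(pi), W)            # W' = pi^T W: rows of W permuted
reference = matmul(x, W)                       # output of the unprotected layer
unauthorized = matmul(x, W_locked)             # plain input on locked weights
authorized = matmul(matmul(x, pi), W_locked)   # column-permuted input x pi

print("pi pi^T == I:", matmul(pi, transpose(pi)) == identity)
print("reference:   ", reference)
print("unauthorized:", unauthorized)
print("authorized:  ", authorized)
```

In this toy case the authorized output equals the reference output, while the unpermuted input produces a different result on the locked weights. Integer entries are used so that the equality checks are exact rather than subject to floating-point tolerance.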