
Distilling Large Language Models for Efficient Clinical Information Extraction

Vedula, Gupta, Swaminathan et al.

Large language models (LLMs) excel at clinical information extraction, but their computational demands limit practical deployment. Knowledge distillation, the process of transferring knowledge from larger to smaller models, offers a potential solution. We evaluate the performance of distilled BERT models, which are approximately 1,000 times smaller than modern LLMs, for clinical named entity recognition (NER) tasks. We leveraged state-of-the-art LLMs (Gemini and OpenAI models) and medical ontologies (RxNorm and SNOMED) as teacher labelers for medication, disease, and symptom extraction. We applied our approach to over 3,300 clinical notes spanning five publicly available datasets, comparing distilled BERT models against both their teacher labelers and BERT models fine-tuned on human labels. External validation was conducted using clinical notes from the MedAlign dataset. For disease extraction, F1 scores were 0.82 (teacher model), 0.89 (BioBERT trained on human labels), and 0.84 (BioBERT-distilled). For medication, F1 scores were 0.84 (teacher model), 0.91 (BioBERT-human), and 0.87 (BioBERT-distilled). For symptoms, F1 scores were 0.73 (teacher model) and 0.68 (BioBERT-distilled). Distilled BERT models had faster inference (12x, 4x, 8x faster than GPT-4o, o1-mini, and Gemini Flash respectively) and lower costs (85x, 101x, 2x cheaper than GPT-4o, o1-mini, and Gemini Flash respectively). On the external validation dataset, the distilled BERT model achieved F1 scores of 0.883 (medication), 0.726 (disease), and 0.699 (symptom). Distilled BERT models were up to 101x cheaper and 12x faster than state-of-the-art LLMs while achieving similar performance on NER tasks. Distillation offers a computationally efficient and scalable alternative to large LLMs for clinical information extraction.



Basic Information

Paper ID: 2501.00031
Title: Distilling Large Language Models for Efficient Clinical Information Extraction
Authors: Karthik S. Vedula, Annika Gupta, Akshay Swaminathan, Ivan Lopez, Suhana Bedi, Nigam H. Shah
Category: cs.CL (Computation and Language)
Published: January 3, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2501.00031

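Traditional methods: Rule-based approaches rely on string matching and medical ontologies. They are interpretable and computationally efficient, but they often fail to capture the varied ways clinical entities are written, including synonyms, abbreviations, nuanced descriptions, and misspellings.
Machine learning methods: BERT-style models perform well, but current clinical NER models tend to target a specific domain or entity type, which limits their broader applicability. Fine-tuning requires large amounts of annotated data, which is costly and time-consuming to produce.
Large language models: LLMs perform well on clinical NER tasks, but they require substantial computational resources and are expensive to run, and proprietary LLMs need HIPAA-compliant endpoints to process protected health information.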

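Research Motivation

Knowledge distillation is a promising way to address these challenges. It transfers knowledge from a large model to a small one, which addresses the limitations of domain-specific BERT models while avoiding the deployment problems of computationally expensive LLMs.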

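Core Contributions

Multi-teacher labeler system: Developed teacher labelers that combine state-of-the-art LLMs (Gemini and OpenAI models) with medical ontologies (RxNorm and SNOMED) for clinical NER tasks across multiple note types.
Efficient distilled models: Created and released BERT-based distilled models roughly 1/1000 the size of modern LLMs, trained on more than 2,000 clinical documents covering oncology progress notes, discharge summaries, radiology reports, and scientific abstracts.
Comprehensive evaluation and validation: Conducted a comprehensive evaluation on five public clinical datasets, including an analysis of model failure modes and an external validation analysis across health systems.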

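Method Details

Task Definition

This study focuses on three distinct NER tasks:

Medication extraction: identify medication names and medication classes in clinical notes
Disease extraction: identify diseases, syndromes, and pathological conditions
Symptom extraction: identify patient symptoms and clinical manifestations

Each task uses the inside-outside (IO) tagging format: words within an entity are labeled "Inside" and all other words are labeled "Outside".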
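
As an illustration of the IO format, the following minimal sketch converts entity spans into per-token labels. The function name `io_tags`, the whitespace tokenization, the example sentence, and the span indices are all assumptions made for this example and are not taken from the paper.

```python
def io_tags(tokens, entity_spans):
    """Return one IO label per token.

    entity_spans holds (start, end) token indices, with end exclusive.
    Tokens inside any span get "I" (Inside); all others get "O" (Outside).
    """
    tags = ["O"] * len(tokens)
    for start, end in entity_spans:
        for i in range(start, end):
            tags[i] = "I"
    return tags


# Hypothetical sentence, split on whitespace for simplicity.
tokens = "Patient started metformin for type 2 diabetes".split()

# Each task is labeled separately, so one sentence yields one tag
# sequence per task (spans below are hand-picked for the example).
medication_tags = io_tags(tokens, [(2, 3)])  # "metformin"
disease_tags = io_tags(tokens, [(4, 7)])     # "type 2 diabetes"

for token, med, dis in zip(tokens, medication_tags, disease_tags):
    print(f"{token:10s} medication={med} disease={dis}")
```

Because IO has no "Begin" label, two adjacent entities of the same type merge into a single run of "I" tokens; the format trades that distinction for a simpler two-class labeling problem.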

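Model Architecture

Teacher Labeling Pipeline

LLM labelers: four state-of-the-art LLMs were evaluated as teacher labelers
- GPT-4o (version 2024-08-06)
- GPT-4o-mini (version 2024-07-18)
- o1-mini (version 2024-09-12)
- Gemini 1.5 Flash (gemini-1.5-flash-002)
Ontology labeler: the BioPortal Annotator API was used to access biomedical ontologies
- RxNorm: for medication extraction
- SNOMED CT: for disease and symptom extraction
Optimal teacher combination: all 31 possible non-empty subsets of the 5 teacher labelers were evaluated, and the combination with the highest F1 score on the development set was selected.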
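
The subset search can be sketched as follows. This is a toy illustration rather than the authors' implementation: the teacher names and label sequences are made up, the rule for merging labels within a subset (a token is "I" if any teacher in the subset marks it "I") is an assumption, and F1 is computed at the token level for simplicity.

```python
from itertools import combinations


def token_f1(pred, gold):
    """Token-level F1 for the "I" class (an assumed scoring granularity)."""
    tp = sum(p == "I" and g == "I" for p, g in zip(pred, gold))
    fp = sum(p == "I" and g == "O" for p, g in zip(pred, gold))
    fn = sum(p == "O" and g == "I" for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def union_labels(label_seqs):
    """Assumed merge rule: "I" if any teacher in the subset says "I"."""
    return ["I" if "I" in column else "O" for column in zip(*label_seqs)]


def best_teacher_subset(teacher_labels, gold):
    """Score every non-empty subset of teachers; return (best, f1, count)."""
    names = sorted(teacher_labels)
    best_subset, best_f1, n_subsets = None, -1.0, 0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            n_subsets += 1
            merged = union_labels([teacher_labels[name] for name in subset])
            score = token_f1(merged, gold)
            if score > best_f1:
                best_subset, best_f1 = subset, score
    return best_subset, best_f1, n_subsets


# Toy development-set labels for one six-token note (hypothetical data).
gold = ["O", "I", "I", "O", "I", "O"]
teacher_labels = {
    "teacher_a": ["O", "I", "O", "O", "O", "O"],
    "teacher_b": ["O", "O", "I", "O", "O", "O"],
    "teacher_c": ["O", "I", "I", "I", "I", "O"],
    "teacher_d": ["I", "O", "O", "O", "I", "O"],
    "teacher_e": ["O", "O", "O", "O", "I", "O"],
}

best_subset, best_f1, n_subsets = best_teacher_subset(teacher_labels, gold)
print(n_subsets, best_subset, best_f1)
```

With five teachers the loop visits 2^5 - 1 = 31 subsets, which matches the count stated above and is small enough to search exhaustively.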