2025-11-19T20:19:14.203751

Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning

Liu, Wang, Liu et al.

Few-shot named entity recognition (NER) aims to identify new types of named entities from only a few labeled examples. Previous methods based on token-level or span-level metric learning suffer from a heavy computational burden and a large number of negative sample spans. In this paper, we propose Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning (MsFNER), which splits general NER into two stages: entity-span detection and entity classification. MsFNER involves three processes: training, finetuning, and inference. In the training process, we separately train the entity-span detection model and the entity classification model on the source domain using meta-learning, keeping the best of each, and we add a contrastive learning module to enhance entity representations for entity classification. During finetuning, we finetune both models on the support dataset of the target domain. In the inference process, for unlabeled data, we first detect the entity spans; the type of each span is then jointly determined by the entity classification model and KNN. We conduct experiments on the open FewNERD dataset, and the results demonstrate the advantage of MsFNER.



Basic Information

Paper ID: 2404.06970
Title: Hybrid Multi-stage Decoding for Few-shot NER with Entity-aware Contrastive Learning
Authors: Congying Liu, Gaosheng Wang, Peipei Liu, Xingyuan Wei, Hongsong Zhu
Category: cs.CL
Published: April 2024 (arXiv preprint)
Paper link: https://arxiv.org/abs/2404.06970

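Summary

Research Background and Motivation

Problem Definition

Few-shot named entity recognition (few-shot NER) aims to recognize new types of named entities quickly from a small number of labeled examples. The task matters for real-world applications that change over time, especially when a model must adapt quickly to new data or a new environment.

Limitations of Existing Methods

Token-level methods: approaches based on the distance between a token and the prototypes or support-set tokens are simple and intuitive, but they are computationally expensive, cannot preserve the semantic integrity of multi-token entities, and are easily disturbed by non-entity tokens.
Span-level methods: evaluating whole spans alleviates some of the problems of token-level methods, but enumerating all possible spans costs O(N²) for a sentence of N tokens and introduces noise from a large number of negative samples.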
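To make the quadratic growth concrete, the sketch below enumerates every contiguous span of a short sentence and counts how many are negatives. The sentence, the gold spans, and the helper name `enumerate_spans` are illustrative assumptions, not taken from the paper.

```python
def enumerate_spans(tokens, max_len=None):
    """Return every contiguous (start, end) span, with end inclusive."""
    n = len(tokens)
    spans = []
    for start in range(n):
        # Without a length cap, a span may run to the end of the sentence.
        last = n if max_len is None else min(n, start + max_len)
        for end in range(start, last):
            spans.append((start, end))
    return spans


tokens = "Barack Obama visited Paris last week".split()
gold = {(0, 1), (3, 3)}  # hypothetical gold entity spans

spans = enumerate_spans(tokens)
negatives = [s for s in spans if s not in gold]

# A sentence of n tokens has n * (n + 1) / 2 spans, so 6 tokens give 21.
print(len(spans), len(negatives))  # prints: 21 19
```

Even in this six-token sentence, 19 of the 21 candidate spans are negatives. Capping the span length (`max_len`) reduces the count but does not remove the imbalance.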

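Research Motivation

The authors aim to address two core questions:

How to make few-shot NER more efficient by strengthening the semantic difference between entities and non-entities, so that valid entity spans can be identified
How to improve entity-span classification by controlling and coordinating the semantic distances among entity types, so that representations of same-type entities lie closer together and those of different types lie farther apart

Core Contributions

The MsFNER framework: decomposes the conventional NER task into two stages, entity-span detection and entity classification, which lowers computational complexity and reduces the impact of negative samples
An entity-aware contrastive learning module: strengthens entity representation learning, improving the consistency of same-type entities and widening the distance between entities of different types
A hybrid inference mechanism: combines the entity classification model with KNN for joint prediction to improve classification accuracy
State-of-the-art performance: significantly outperforms existing methods on the FewNERD and FewAPTER datasets, and includes a comprehensive comparison with ChatGPT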

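Method Details

Task Definition

The few-shot NER task is defined as follows: the model is first trained on the source-domain dataset $D_{source} = (S_{source}, Q_{source})$ and then transferred to the target-domain dataset $D_{target} = (S_{target}, Q_{target})$ for inference. Here $S_{target}$ is the support set, which contains N entity types (N-way) with K labeled examples per type (K-shot); $Q_{target}$ is the query set, which covers the same entity types as the support set.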
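The N-way K-shot setup can be sketched as an episode sampler. This is a simplified illustration under stated assumptions: it samples isolated entity mentions per type, whereas benchmark episodes are usually sampled at the sentence level. The function `sample_episode`, its parameters, and the toy data are hypothetical.

```python
import random


def sample_episode(examples_by_type, n_way, k_shot, n_query, seed=0):
    """Draw one N-way K-shot episode: a support set and a query set
    that share the same entity types but contain different examples."""
    rng = random.Random(seed)
    types = rng.sample(sorted(examples_by_type), n_way)
    support, query = {}, {}
    for t in types:
        # Sample without replacement so support and query never overlap.
        picked = rng.sample(examples_by_type[t], k_shot + n_query)
        support[t] = picked[:k_shot]
        query[t] = picked[k_shot:]
    return support, query


# Hypothetical toy data: entity type -> labeled mentions.
toy = {
    "person": ["Obama", "Merkel", "Curie", "Turing"],
    "location": ["Paris", "Lagos", "Lima", "Oslo"],
    "organization": ["UN", "WHO", "NASA", "CERN"],
}

support, query = sample_episode(toy, n_way=2, k_shot=1, n_query=2)
print(support)
print(query)
```

The support set plays the role of $S_{target}$ (used for finetuning) and the query set the role of $Q_{target}$ (used for evaluation).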

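Model Architecture

MsFNER consists of three main processes:

1. Training Process

Entity Span Detection (ESD) module:

Entity-span detection is treated as a sequence labeling task using the BIOES tagging scheme
For an input sentence $x = (x_1, x_2, ..., x_n)$, a BERT encoder produces the contextual representations $h = (h_1, h_2, ..., h_n)$
A CRF layer performs entity-span detection, with the training loss:

$L_{ESD} = -\sum \log P(y|x)$

where $P(y|x) = \frac{\prod_{i=1}^{|x|} \phi_i(y_{i-1}, y_i, x)}{\sum_{y'} \prod_{i=1}^{|x|} \phi_i(y'_{i-1}, y'_i, x)}$

The module is trained with MAML meta-learning, which comprises inner-loop and outer-loop updates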
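The CRF loss above can be computed without enumerating all tag sequences by using the forward algorithm. The sketch below is a minimal pure-Python illustration, not the authors' implementation. It assumes the common linear-chain factorization $\log \phi_i(y_{i-1}, y_i, x) = \text{transition}[y_{i-1}][y_i] + \text{emission}[i][y_i]$ with a separate start score for the first tag; all scores are made-up toy values.

```python
import math

TAGS = ["B", "I", "O", "E", "S"]  # BIOES tag set, indexed 0..4


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, start, tags):
    """Log of the numerator: sum of log-potentials along one tag path."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for i in range(1, len(tags)):
        score += transitions[tags[i - 1]][tags[i]] + emissions[i][tags[i]]
    return score


def log_partition(emissions, transitions, start):
    """Log of the denominator, computed with the forward algorithm."""
    n_tags = len(start)
    alpha = [start[t] + emissions[0][t] for t in range(n_tags)]
    for i in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[p] + transitions[p][t] for p in range(n_tags)])
            + emissions[i][t]
            for t in range(n_tags)
        ]
    return logsumexp(alpha)


def crf_nll(emissions, transitions, start, tags):
    """Negative log-likelihood -log P(y|x) for one sentence."""
    return log_partition(emissions, transitions, start) - path_score(
        emissions, transitions, start, tags
    )


# Toy scores for a 3-token sentence (hypothetical values).
emissions = [[0.1 * (((i + 1) * (t + 2)) % 7) for t in range(5)] for i in range(3)]
transitions = [[0.05 * ((p * 3 + t) % 5) for t in range(5)] for p in range(5)]
start = [0.0] * 5

gold = [TAGS.index(t) for t in ["B", "E", "O"]]
nll = crf_nll(emissions, transitions, start, gold)
print(nll)
```

In a full model the emission scores would come from a projection of the BERT representations $h$, and $L_{ESD}$ would sum this quantity over the training sentences. The forward pass costs time linear in sentence length and quadratic in the number of tags, which is what keeps span detection cheaper than scoring every candidate span.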

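Entity Classification (EC) module:

For an entity $e_k = (x_f, ..., x_{f+l})$, max pooling over its token representations yields the entity representation: $\hat{e}_k = \max(h_f, ..., h_{f+l})$
Entity-aware contrastive learning is introduced, with the loss function: $L_{CL} = \sum_j -\frac{1}{|P(j)|} \sum_{p \in P(j)} \log \frac{\exp(\text{sim}(z_j, z_p)/\tau)}{\sum_{a \in A(j)} \exp(\text{sim}(z_j, z_a)/\tau)}$
Prototype representations are built for classification: $c_t(S) = \frac{1}{|S_t|} \sum_{e_m \in S_t} \hat{e}_m$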
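The three formulas of the EC module can be sketched together in plain Python. This is an illustration under assumptions the summary does not confirm: $\text{sim}$ is taken to be cosine similarity, $P(j)$ is the set of other entities sharing the label of $j$, $A(j)$ is every entity except $j$, and anchors without positives are skipped. All vectors and function names are hypothetical.

```python
import math


def max_pool(token_vectors):
    """Element-wise max over an entity's token vectors."""
    return [max(dim) for dim in zip(*token_vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(z, labels, tau=0.1):
    """Supervised contrastive loss summed over anchors j."""
    total = 0.0
    for j in range(len(z)):
        others = [a for a in range(len(z)) if a != j]           # A(j)
        positives = [p for p in others if labels[p] == labels[j]]  # P(j)
        if not positives:
            continue  # assumption: anchors with no positive contribute nothing
        log_denom = math.log(
            sum(math.exp(cosine(z[j], z[a]) / tau) for a in others)
        )
        log_ratios = [cosine(z[j], z[p]) / tau - log_denom for p in positives]
        total += -sum(log_ratios) / len(positives)
    return total


def prototypes(entity_vectors, labels):
    """Mean entity representation per type."""
    groups = {}
    for vec, label in zip(entity_vectors, labels):
        groups.setdefault(label, []).append(vec)
    return {
        t: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for t, vecs in groups.items()
    }


# Hypothetical 2-dimensional token vectors for four entities.
person_1 = max_pool([[0.9, 0.1], [0.7, 0.3]])
person_2 = max_pool([[0.8, 0.2]])
place_1 = max_pool([[0.1, 0.9], [0.2, 0.6]])
place_2 = max_pool([[0.3, 0.8]])

entities = [person_1, person_2, place_1, place_2]
labels = ["person", "person", "location", "location"]

loss_clustered = contrastive_loss(entities, labels)
loss_shuffled = contrastive_loss(
    entities, ["person", "location", "person", "location"]
)
protos = prototypes(entities, labels)
print(loss_clustered, loss_shuffled)
print(protos)
```

The loss is lower when same-type entities are already close (`loss_clustered`) than when the labels cut across the clusters (`loss_shuffled`), which is the behavior the contrastive module is meant to encourage before prototypes are averaged.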