
Survey in Characterization of Semantic Change

de Sá, Da Silveira, Pruski

Living languages continuously evolve to integrate the cultural change of human societies. This evolution manifests through neologisms (new words) or semantic changes of words (new meanings given to existing words). Understanding the meaning of words is vital for interpreting texts coming from different cultures (regionalism or slang), domains (e.g., technical terms), or periods. In computer science, these words are relevant to computational linguistics algorithms such as translation, information retrieval, and question answering. Semantic changes can potentially impact the quality of the outcomes of these algorithms. Therefore, it is important to understand and characterize these changes formally. The study of this impact is a recent problem that has attracted the attention of the computational linguistics community. Several approaches propose methods to detect semantic changes with good precision, but more effort is needed to characterize how the meaning of words changes and to reason about how to reduce the impact of semantic change. This survey provides an understandable overview of existing approaches to the characterization of semantic changes and also formally defines three classes of characterizations: whether the meaning of a word becomes more general or narrower (change in dimension), whether the word is used in a more pejorative or more positive/ameliorated sense (change in orientation), and whether there is a trend to use the word in, for instance, a metaphoric or metonymic context (change in relation). We summarize the main aspects of the selected publications in a table and discuss the needs and trends in research on semantic change characterization.



Basic Information

Paper ID: 2402.19088
Title: Survey in Characterization of Semantic Change
Authors: Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski (Luxembourg Institute of Science and Technology & University of Luxembourg)
Categories: cs.CL (Computation and Language), cs.AI
Published: Preprint, November 17, 2025 (arXiv v4)
Paper link: https://arxiv.org/abs/2402.19088

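Summary

Language evolves dynamically, reflecting social and cultural change through neologisms or through semantic changes to existing words. Understanding word meaning is essential for interpreting texts from different cultures, domains, or periods, and it directly affects the performance of NLP applications such as machine translation, information retrieval, and question answering. Although existing methods detect semantic change with good precision, how to characterize the type of change still lacks systematic study. This survey is the first to comprehensively review existing approaches to semantic change characterization, and it formally defines three classes of change: change in dimension (a meaning becomes broader or narrower), change in orientation (a meaning becomes more pejorative or more positive), and change in relation (a meaning shifts through figures such as metaphor or metonymy). The paper summarizes the main research results, analyzes current limitations, and points out directions for future research.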

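Research Background and Motivation

1. Core Problem

Lexical semantic change (LSC) is a core phenomenon in the evolution of natural language. Existing research focuses mainly on detecting whether a semantic change has occurred, while work on characterizing how the meaning changed is severely lacking. For example:

"gay": from "happy" to "homosexual" (narrowing in dimension + neutralization in orientation)
"heart": extended from "the organ" to metaphorical senses such as "courage" and "core" (change in relation)
"awful": from "awe-inspiring" to "terrible" (pejoration in orientation)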

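2. Importance

Linguistic value: understanding the regularities of language evolution and revealing how culture, society, and technology influence language
NLP applications:
- Understanding historical texts (e.g., digital humanities research)
- Knowledge graph maintenance (e.g., temporal consistency in Wikidata)
- Cross-period information retrieval (e.g., the semantic drift of "cloud" in technical literature)
- Sentiment analysis (e.g., the amelioration of "sick" in slang)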

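3. Limitations of Existing Methods

Lack of a unified formal framework: studies use different terminology and definitions, which makes them hard to compare
Inconsistent evaluation standards: standard datasets and evaluation metrics are missing
Detection over characterization: 90% of studies address "whether a change occurred", and only 10% address "how it changed"
Data scarcity: historical corpora are far smaller than what modern NLP requires (millions vs. trillions of tokens)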

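4. Research Motivation

This paper is the first systematic survey of semantic change characterization. It aims to:

Identify the limitations of existing representation and classification methods
Assess the strengths of the different methods
Provide formal definitions based on first-order logic
Conceptually demonstrate the LSC characterization task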

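Core Contributions

First characterization-oriented LSC survey: unlike existing surveys (Tahmasebi et al. 2018, Kutuzov et al. 2018), which focus on detection, this paper focuses on characterization
Three-pole taxonomy:
- Dimension: broadening/narrowing (change in the number of senses)
- Orientation: amelioration/pejoration (change in sentiment polarity)
- Relation: metaphorization/metonymization (change in figurative relation)
Formal framework: mathematical definitions based on set theory (Section 5), distinguishing identification from characterization
Systematic classification of methods: a two-dimensional matrix of representation method (frequency/topic/graph/embedding) × change pole (D/R/O) (Table 3)
Empirical demonstration: uses the SEMCOR and MASC datasets to validate the feasibility of the framework
Identification of research gaps: notes the scarcity of work on the relation pole (R) and on joint multi-pole characterization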
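The three-pole taxonomy listed above can be written down as a small data structure, which makes it easier to see which change types belong to which pole. This is only an illustrative sketch: the names `Pole`, `ChangeType`, `types_for`, and `EXAMPLES` are hypothetical and do not come from the paper, and the annotated examples simply repeat the ones given in the Core Problem section.

```python
from enum import Enum


class Pole(Enum):
    """The three poles, using the D/R/O abbreviations from the summary."""
    DIMENSION = "D"
    RELATION = "R"
    ORIENTATION = "O"


class ChangeType(Enum):
    """Change types named in the taxonomy, each tied to one pole."""
    BROADENING = ("broadening", Pole.DIMENSION)
    NARROWING = ("narrowing", Pole.DIMENSION)
    AMELIORATION = ("amelioration", Pole.ORIENTATION)
    PEJORATION = ("pejoration", Pole.ORIENTATION)
    METAPHORIZATION = ("metaphorization", Pole.RELATION)
    METONYMIZATION = ("metonymization", Pole.RELATION)

    def __init__(self, label, pole):
        self.label = label
        self.pole = pole


def types_for(pole):
    """Return the change types that belong to the given pole."""
    return [change for change in ChangeType if change.pole is pole]


# Illustrative annotations taken from the examples in this summary
# (hypothetical structure, not a dataset from the paper).
EXAMPLES = {
    "awful": [ChangeType.PEJORATION],
    "heart": [ChangeType.METAPHORIZATION],
}

if __name__ == "__main__":
    for pole in Pole:
        print(pole.value, [change.label for change in types_for(pole)])
```

Keeping the pole as an attribute of each change type means a word can carry several labels at once, which matches the multi-pole cases mentioned above.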

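Basic Definitions

Semantic universe: $S_T$ is the set of all possible word senses
Sense function: $S: V \times T \rightarrow \wp(S_T)$ maps a word $w \in V$ in corpus $t \in T$ to its set of senses $S(w, t) = \{s_1, s_2, \ldots, s_k\}$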

Semantic Change Criterion

A word $w$ changes between $t_1$ and $t_2$ if and only if:
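The condition itself does not appear in this copy. As a minimal sketch, assume the natural reading of the sense function: a word has changed when its sense sets differ, that is, $S(w, t_1) \neq S(w, t_2)$. The code below implements that assumed criterion and adds an equally assumed set-inclusion test for the dimension pole. The function names and the toy sense inventory are hypothetical and are not taken from the paper.

```python
# Hypothetical toy sense inventory keyed by (word, period); not data from the paper.
TOY_SENSES = {
    ("heart", "t1"): {"organ"},
    ("heart", "t2"): {"organ", "courage", "core"},
}


def senses(sense_map, word, period):
    """S(w, t): the set of senses recorded for `word` in `period` (empty if unseen)."""
    return frozenset(sense_map.get((word, period), ()))


def has_changed(sense_map, word, t1, t2):
    """Assumed identification criterion: the two sense sets differ."""
    return senses(sense_map, word, t1) != senses(sense_map, word, t2)


def dimension_change(sense_map, word, t1, t2):
    """Assumed dimension test: strict set inclusion between the two sense sets."""
    before = senses(sense_map, word, t1)
    after = senses(sense_map, word, t2)
    if before < after:  # every old sense is kept and new ones are added
        return "broadening"
    if after < before:  # some senses are lost and none are gained
        return "narrowing"
    return None  # unchanged, or a mix of gained and lost senses


if __name__ == "__main__":
    print(has_changed(TOY_SENSES, "heart", "t1", "t2"))       # True
    print(dimension_change(TOY_SENSES, "heart", "t1", "t2"))  # broadening
```

Set inclusion only speaks to the dimension pole. Deciding orientation or relation would need extra information about each sense, such as polarity or how a new sense relates to an old one, which this sketch does not model.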
