Survey in Characterization of Semantic Change

de Sá, Da Silveira, Pruski

Living languages continuously evolve to integrate the cultural change of human societies. This evolution manifests through neologisms (new words) or semantic changes of words (new meanings for existing words). Understanding the meaning of words is vital for interpreting texts coming from different cultures (regionalism or slang), domains (e.g., technical terms), or periods. In computer science, these words are relevant to computational linguistics algorithms such as translation, information retrieval, and question answering. Semantic changes can potentially impact the quality of the outcomes of these algorithms. Therefore, it is important to understand and characterize these changes formally. The study of this impact is a recent problem that has attracted the attention of the computational linguistics community. Several approaches propose methods to detect semantic changes with good precision, but more effort is needed to characterize how the meaning of words changes and to reason about how to reduce the impact of semantic change. This survey provides an understandable overview of existing approaches to the characterization of semantic changes and also formally defines three classes of characterizations: whether the meaning of a word becomes more general or narrower (change in dimension), whether the word is used in a more pejorative or more positive (ameliorated) sense (change in orientation), and whether there is a trend to use the word in, for instance, a metaphoric or metonymic context (change in relation). We summarize the main aspects of the selected publications in a table and discuss the needs and trends in research on semantic change characterization.

Basic Information

Paper ID: 2402.19088
Title: Survey in Characterization of Semantic Change
Authors: Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski (Luxembourg Institute of Science and Technology & University of Luxembourg)
Classification: cs.CL (Computational Linguistics), cs.AI
Publication Date: Preprint, November 17, 2025 (arXiv v4)
Paper Link: https://arxiv.org/abs/2402.19088

Abstract

Language evolves dynamically, reflecting sociocultural changes through neologisms or semantic shifts of existing words. Understanding word meanings is crucial for interpreting texts across different cultures, domains, or time periods, and directly impacts the performance of NLP applications such as machine translation, information retrieval, and question-answering systems. While existing methods have achieved good accuracy in detecting semantic change, there is a lack of systematic research on how to characterize the types of semantic changes. This survey provides the first comprehensive review of existing methods for characterizing lexical semantic change, formally defining three categories of change: dimensional change (broadening or narrowing of word meaning), orientational change (shift toward more negative or positive connotations), and relational change (transformation of word meaning through rhetorical devices such as metaphor or metonymy). The paper summarizes major research findings, analyzes current limitations, and identifies future research directions.

Research Background and Motivation

1. Core Problem

Lexical Semantic Change (LSC) is a core phenomenon in natural language evolution. Existing research primarily focuses on detecting whether a word's meaning has changed, while far less work characterizes how it has changed. For example:

"gay" shifted from "happy" to "homosexual" (dimensional narrowing + orientational neutralization)
"heart" expanded from "cardiac organ" to metaphorical meanings like "courage" and "core" (relational change)
"awful" shifted from "awe-inspiring" to "terrible" (orientational pejoration)

2. Significance

Linguistic Value: Understanding language evolution patterns and revealing the impact of culture, society, and technology on language
NLP Applications:
- Historical text understanding (e.g., digital humanities research)
- Knowledge graph maintenance (e.g., temporal consistency in Wikidata)
- Cross-temporal information retrieval (e.g., semantic drift of "cloud" in technical literature)
- Sentiment analysis (e.g., amelioration of "sick" in slang)

3. Limitations of Existing Methods

Lack of Unified Formal Framework: Different studies use different terminology and definitions, making comparison difficult
Inconsistent Evaluation Standards: Absence of standard datasets and evaluation metrics
Emphasis on Detection over Characterization: 90% of research focuses on "whether change occurs," while only 10% addresses "how it changes"
Data Scarcity: Historical corpora are orders of magnitude smaller than required for modern NLP (millions vs. trillions of tokens)

4. Research Motivation

This paper is the first systematic survey of semantic change characterization, aiming to:

Identify limitations of existing representation and classification methods
Evaluate the strengths of different approaches
Provide formal definitions based on first-order logic
Conceptually demonstrate the LSC characterization task

Core Contributions

First Characterization-Oriented LSC Survey: Distinguished from existing surveys (Tahmasebi et al. 2018, Kutuzov et al. 2018) that focus on detection, this work emphasizes characterization
Three-Pole Taxonomy:
- Dimension (D): broadening/narrowing (quantitative change in word senses)
- Orientation (O): amelioration/pejoration (change in sentiment tendency)
- Relation (R): metaphorization/metonymization (change in rhetorical relationships)
Formal Framework: Provides mathematical definitions based on set theory (Section 5), distinguishing between identification and characterization
Systematic Method Classification: Constructs a two-dimensional classification matrix (Table 3) organized by representation method (frequency/topic/graph/embedding) × change pole (D/R/O)
Empirical Demonstration: Validates framework feasibility using SEMCOR and MASC datasets
Research Gap Identification: Highlights the scarcity of research on the relational pole (R) and joint multi-pole characterization

Methodology Details

Task Definition

Lexical Semantic Change Detection (Identification)

Given a word $w$ with representations $R(w, t_1)$ and $R(w, t_2)$ in two corpora at times $t_1$ and $t_2$, determine whether change occurs: $f_C(R(w, t_1), R(w, t_2)) \rightarrow y$, where $y \in \{0,1\}$ (binary classification) or $y \in \mathbb{R}$ (continuous distance).
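
The detection function $f_C$ can be illustrated with a minimal sketch. It assumes that each representation $R(w, t)$ is a dense vector and that cosine distance is the comparison; the text above fixes neither choice. The function names, the `threshold` value, and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v): 0 for the same direction, 2 for opposite."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def f_c_continuous(r_t1: Sequence[float], r_t2: Sequence[float]) -> float:
    """Continuous variant: y is a real-valued distance."""
    return cosine_distance(r_t1, r_t2)


def f_c_binary(r_t1: Sequence[float], r_t2: Sequence[float],
               threshold: float = 0.5) -> int:
    """Binary variant: y is 1 if the distance exceeds an assumed threshold."""
    return int(f_c_continuous(r_t1, r_t2) > threshold)


# Made-up toy vectors standing in for R(w, t_1) and R(w, t_2).
r_w_t1 = [1.0, 0.0]
r_w_t2 = [0.0, 1.0]
distance = f_c_continuous(r_w_t1, r_w_t2)
changed = f_c_binary(r_w_t1, r_w_t2)
```

Both variants share one distance, so the binary decision is just a thresholded view of the continuous score. In practice the threshold would have to be tuned on annotated data.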

Semantic Universe: $S_T$ is the set of all possible word senses
Sense Function: $S: V \times T \rightarrow \wp(S_T)$, mapping word $w$ in corpus $t$ to a set of senses $S(w, t) = \{s_1, s_2, ..., s_k\}$
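
As a rough illustration of these definitions, the sense function can be modeled as a lookup from (word, period) pairs to sets of sense labels. The sketch assumes that $V$ is the vocabulary and $T$ the set of time periods, which the definitions above do not state. The inventory contents and the names `SENSES`, `sense_function`, and `sense_delta` are invented for the example.

```python
from typing import Dict, FrozenSet, Tuple

Sense = str
# Keys are (word, period) pairs; values are elements of the power set of senses.
SenseInventory = Dict[Tuple[str, str], FrozenSet[Sense]]

# Hypothetical toy inventory, not taken from the paper.
SENSES: SenseInventory = {
    ("mouse", "t1"): frozenset({"rodent"}),
    ("mouse", "t2"): frozenset({"rodent", "pointing device"}),
}


def sense_function(inventory: SenseInventory, word: str,
                   period: str) -> FrozenSet[Sense]:
    """S(w, t): the set of senses of `word` in the corpus for `period`."""
    return inventory.get((word, period), frozenset())


def sense_delta(inventory: SenseInventory, word: str, t1: str,
                t2: str) -> Tuple[FrozenSet[Sense], FrozenSet[Sense]]:
    """Return (senses gained, senses lost) between t1 and t2."""
    before = sense_function(inventory, word, t1)
    after = sense_function(inventory, word, t2)
    return after - before, before - after


gained, lost = sense_delta(SENSES, "mouse", "t1", "t2")
```

Representing $S(w, t)$ as a `frozenset` lets sense inventories be compared with ordinary set operations. A gained sense with nothing lost is one plausible reading of broadening as a quantitative change in word senses, though that mapping is an assumption here.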

Semantic Change Determination

Word $w$ undergoes change between $t_1, t_2$ if and only if:
