
Survey in Characterization of Semantic Change

de Sá, Da Silveira, Pruski
Live languages continuously evolve to integrate the cultural change of human societies. This evolution manifests through neologisms (new words) or semantic changes of words (new meaning to existing words). Understanding the meaning of words is vital for interpreting texts coming from different cultures (regionalism or slang), domains (e.g., technical terms), or periods. In computer science, these words are relevant to computational linguistics algorithms such as translation, information retrieval, question answering, etc. Semantic changes can potentially impact the quality of the outcomes of these algorithms. Therefore, it is important to understand and characterize these changes formally. The study of this impact is a recent problem that has attracted the attention of the computational linguistics community. Several approaches propose methods to detect semantic changes with good precision, but more effort is needed to characterize how the meaning of words changes and to reason about how to reduce the impact of semantic change. This survey provides an understandable overview of existing approaches to the characterization of semantic changes and also formally defines three classes of characterizations: whether the meaning of a word becomes more general or narrow (change in dimension), whether the word is used in a more pejorative or positive/ameliorated sense (change in orientation), and whether there is a trend to use the word in, for instance, a metaphoric or metonymic context (change in relation). We summarized the main aspects of the selected publications in a table and discussed the needs and trends in the research activities on semantic change characterization.

Basic Information

  • Paper ID: 2402.19088
  • Title: Survey in Characterization of Semantic Change
  • Authors: Jader Martins Camboim de Sá, Marcos Da Silveira, Cédric Pruski (Luxembourg Institute of Science and Technology & University of Luxembourg)
  • Classification: cs.CL (Computational Linguistics), cs.AI
  • Publication Date: Preprint, November 17, 2025 (arXiv v4)
  • Paper Link: https://arxiv.org/abs/2402.19088

Abstract

Language evolves dynamically, reflecting sociocultural changes through neologisms or semantic shifts of existing words. Understanding word meanings is crucial for interpreting texts across different cultures, domains, or time periods, and directly impacts the performance of NLP applications such as machine translation, information retrieval, and question-answering systems. While existing methods have achieved good accuracy in detecting semantic change, there is a lack of systematic research on how to characterize the types of semantic changes. This survey provides the first comprehensive review of existing methods for characterizing lexical semantic change, formally defining three categories of change: dimensional change (broadening or narrowing of word meaning), orientational change (shift toward more negative or positive connotations), and relational change (transformation of word meaning through rhetorical devices such as metaphor or metonymy). The paper summarizes major research findings, analyzes current limitations, and identifies future research directions.

Research Background and Motivation

1. Core Problem

Lexical Semantic Change (LSC) is a core phenomenon in natural language evolution. Existing research primarily focuses on detecting whether semantic change occurs, but there is a severe shortage of research on characterizing how it changes. For example:

  • "gay" shifted from "happy" to "homosexual" (dimensional narrowing + orientational neutralization)
  • "heart" expanded from "cardiac organ" to metaphorical meanings like "courage" and "core" (relational change)
  • "awful" shifted from "awe-inspiring" to "terrible" (orientational pejoration)

2. Significance

  • Linguistic Value: Understanding language evolution patterns and revealing the impact of culture, society, and technology on language
  • NLP Applications:
    • Historical text understanding (e.g., digital humanities research)
    • Knowledge graph maintenance (e.g., temporal consistency in Wikidata)
    • Cross-temporal information retrieval (e.g., semantic drift of "cloud" in technical literature)
    • Sentiment analysis (e.g., amelioration of "sick" in slang)

3. Limitations of Existing Methods

  • Lack of Unified Formal Framework: Different studies use different terminology and definitions, making comparison difficult
  • Inconsistent Evaluation Standards: Absence of standard datasets and evaluation metrics
  • Emphasis on Detection over Characterization: Roughly 90% of research focuses on "whether change occurs," while only about 10% addresses "how it changes"
  • Data Scarcity: Historical corpora are orders of magnitude smaller than required for modern NLP (millions vs. trillions of tokens)

4. Research Motivation

This paper is the first systematic survey of semantic change characterization, aiming to:

  1. Identify limitations of existing representation and classification methods
  2. Evaluate the strengths of different approaches
  3. Provide formal definitions based on first-order logic
  4. Demonstrate conceptually the LSC characterization task

Core Contributions

  1. First Characterization-Oriented LSC Survey: Distinguished from existing surveys (Tahmasebi et al. 2018, Kutuzov et al. 2018) that focus on detection, this work emphasizes characterization
  2. Three-Pole Taxonomy:
    • Dimension (D): broadening/narrowing (quantitative change in word senses)
    • Orientation (O): amelioration/pejoration (change in sentiment tendency)
    • Relation (R): metaphorization/metonymization (change in rhetorical relationships)
  3. Formal Framework: Provides mathematical definitions based on set theory (Section 5), distinguishing between identification and characterization
  4. Systematic Method Classification: Constructs a two-dimensional classification matrix (Table 3) organized by representation method (frequency/topic/graph/embedding) × change pole (D/R/O)
  5. Empirical Demonstration: Validates framework feasibility using SEMCOR and MASC datasets
  6. Research Gap Identification: Highlights the scarcity of research on the relational pole (R) and joint multi-pole characterization

Methodology Details

Task Definition

Lexical Semantic Change Detection (Identification)

Given a word $w$ with representations $R(w, t_1)$ and $R(w, t_2)$ in two corpora at times $t_1$ and $t_2$, determine whether change occurs:

$$f_C(R(w, t_1), R(w, t_2)) \rightarrow y$$

where $y \in \{0,1\}$ (binary classification) or $y \in \mathbb{R}$ (continuous distance).
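A minimal sketch of the detection function, assuming each representation $R(w, t)$ is a fixed-length numeric vector and that change is scored by cosine distance. Both choices are assumptions made for illustration; the survey does not prescribe a particular representation or distance, and the names `cosine_distance`, `f_c`, and `threshold` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def f_c(rep_t1, rep_t2, threshold=None):
    """Return a continuous change score, or a 0/1 label when a threshold is given."""
    distance = cosine_distance(rep_t1, rep_t2)
    if threshold is None:
        return distance                       # y is a real-valued distance
    return 1 if distance > threshold else 0   # y is a binary label


# Toy vectors, invented for illustration only.
stable = f_c([1.0, 0.0], [1.0, 0.0])
shifted = f_c([1.0, 0.0], [0.0, 1.0])
label = f_c([1.0, 0.0], [0.0, 1.0], threshold=0.5)
```

Leaving `threshold` unset returns the continuous score; supplying one yields the binary label. The threshold value itself is a free parameter here, which matches the later remark that change thresholds lack consensus.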

Lexical Semantic Change Characterization ★ Core Innovation

Building on detection, further classify the type of change:

$$f_x(R(w, t_1), R(w, t_2)) \rightarrow y, \quad x \in \{D, R, O\}$$

Formal Framework (Section 5 Core)

Basic Definitions

  • Semantic Universe: $S_T$ is the set of all possible word senses
  • Sense Function: $S: V \times T \rightarrow \wp(S_t)$, mapping word $w$ in corpus $t$ to a set of senses $S(w, t) = \{s_1, s_2, ..., s_k\}$
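One way to picture the sense function is as a lookup table built from sense-annotated tokens. The sketch below assumes annotations arrive as `(word, corpus, sense_id)` triples; that layout, the function name `build_sense_function`, and the sense labels are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_sense_function(annotated_tokens):
    """Return S as a dict mapping (word, corpus) to the frozenset of observed senses.

    annotated_tokens: iterable of (word, corpus, sense_id) triples.
    The triple layout is an assumption of this sketch.
    """
    senses = defaultdict(set)
    for word, corpus, sense_id in annotated_tokens:
        senses[(word, corpus)].add(sense_id)
    # Freeze the sets so that S(w, t) values can be compared and hashed safely.
    return {key: frozenset(value) for key, value in senses.items()}


# Hypothetical annotations for one word in two corpora.
tokens = [
    ("bank", "t1", "bank.n.01"),
    ("bank", "t1", "bank.n.02"),
    ("bank", "t1", "bank.n.01"),
    ("bank", "t2", "bank.n.02"),
]
S = build_sense_function(tokens)
```

Representing each $S(w, t)$ as a set keeps repeated occurrences of the same sense from inflating the sense count, which matters once cardinalities are compared across corpora.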

Semantic Change Determination

Word $w$ undergoes change between $t_1$ and $t_2$ if and only if:

$$\begin{cases} \text{True} & S(w, t_1) \neq S(w, t_2) \\ \text{False} & \text{otherwise} \end{cases}$$

#### Three-Pole Definitions

**1. Dimensional Change (Dimension)**

$$|S(w, t_1)| \neq |S(w, t_2)|$$

- Broadening: $|S(w, t_1)| < |S(w, t_2)|$ (increase in senses)
- Narrowing: $|S(w, t_1)| > |S(w, t_2)|$ (decrease in senses)

**Example**:
- "plane" has 5 senses in SEMCOR (plane, aircraft, planer, etc.) but only 2 in MASC → narrowing

**2. Orientational Change (Orientation)**

Define sentiment function $f: V \times T \rightarrow \{-1, 0, +1\}$, then:

$$f(w, t_1) \neq f(w, t_2)$$

- Amelioration: $f(w, t_1) < f(w, t_2)$ (shift toward positive)
- Pejoration: $f(w, t_1) > f(w, t_2)$ (shift toward negative)

**Implementation**: Weighted sum of SentiWordNet scores

$$f(w, t) = \frac{1}{N}\sum_{i=1}^{N} p(s_i) \cdot \text{positive}(s_i)$$

**Example**:
- "heart" has $f=0.15$ in SEMCOR and $f=0.97$ in MASC → amelioration

**3. Relational Change (Relation)**

Define relational similarity $l: S \times S \rightarrow \mathbb{R}$, total relational strength:

$$R(w, t) = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} l(s_i, s_j), \quad s_i, s_j \in S(w, t)$$

- Increase: $R(w, t_1) < R(w, t_2)$ (more metaphorical/metonymical uses)

**Example**:
- "heart" expands from literal sense "cardiac organ" to metaphorical senses "core," "courage" → strengthened relations

### Technical Innovations

1. **Set-Theoretic Formalization**: First rigorous mathematical formulation of LSC characterization, eliminating ambiguity
2. **Pole Symmetry**: Three poles naturally pair (broadening/narrowing share dimensional measurement), simplifying computational framework
3. **Operationalizability**: Definitions directly translate to algorithms (e.g., sense counting, sentiment scoring, relational graph analysis)
4. **Cambridge Perspective**: Adopts static comparison (contrasting two corpora) rather than McTaggart dynamic tracking, suitable for computational methods

## Experimental Setup

### Dataset Classification

#### Diachronic Corpora (Table 2)

| Corpus | Language | Time Span | Scale | Characteristics |
|--------|----------|-----------|-------|-----------------|
| **COHA** | English | 1810s-2000s | 400M words | Most commonly used, balanced genres |
| **Google N-Gram** | Multilingual | 1600-2009 | 300B words | Largest scale, but noisy |
| **DTA** | German | 1741-1900 | 1022 texts | High quality, manually selected |
| **CLMET** | English | 1710-1920 | 34M words | Primarily literary works |

#### Demonstration Datasets

- **SEMCOR** (1993): 200K words, WordNet sense annotations
- **MASC** (2013): 500K words, modern American English
- **Annotation Sources**:
  - Senses: WordNet
  - Relations: ChainNet (metaphor/metonymy links)
  - Orientation: SentiWordNet (positive/negative scores)

### Evaluation Dimensions

As a survey paper, this work does not provide unified evaluation metrics, but analyzes existing methods' evaluation approaches:

#### Dimensional Pole (D)

- **Metrics**: Sense quantity change, clustering density, topic count
- **Data Sources**: Dictionaries, sense-induced clustering, topic models

#### Orientational Pole (O)

- **Metrics**: Distance from seed words, VAD framework scores (Valence-Arousal-Dominance)
- **Challenges**: Seed word stability assumptions, irony/negation handling

#### Relational Pole (R)

- **Metrics**: Entropy increase (Schlechtweg 2017), relational graph edge count
- **Issues**: Difficulty distinguishing metaphor vs. new homonymy

### Method Classification (Table 3 Core)

| Method | D | R | O | Representation |
|--------|---|---|---|-----------------|
| Biemann 2006 | ✓ | - | - | Graph |
| Tang et al. 2013 | ✓ | ✓ | - | Frequency |
| Hamilton et al. 2016a | - | - | ✓ | Graph (SentiProp) |
| Inoue et al. 2022 | ✓ | - | - | Topics (InfiniteSCAN) |
| Giulianelli et al. 2020 | ✓ | - | - | Embeddings (BERT) |
| Fonteyn & Manjavacas 2021 | - | ✓ | ✓ | Embeddings |

**Key Findings**:
- **No method covers all three poles**: Characterization complexity is high
- **Dimensional pole most researched**: 18/23 methods
- **Relational pole most underdeveloped**: Only 3 methods
- **Embedding methods dominant**: Recent trend

## Experimental Results

### Framework Validation (Section 5.7)

#### Case 1: Multi-Pole Change of "heart"

**Data** (SEMCOR → MASC):

```
Sense distribution changes:
- heart.n.02 (organ, literal): 34.8% → 0%
- heart.n.03 (courage, metaphorical+): 12.1% → 90.1%
- heart.n.10 (poker suit, new): 0% → 2.8%
```

**Computational Results**:
1. **Dimension**: $|S|: 5 \rightarrow 3$, narrowing
2. **Orientation**: $f: 0.15 \rightarrow 0.97$, strong amelioration
3. **Relation**: Metaphorical uses dominant (90.1%), strengthened relations

**Interpretation**: Literal sense "heart" disappears, metaphorical sense "courage/core" becomes prototypical

#### Case 2: Narrowing of "plane"

**Data**:

```
SEMCOR: 5 senses (aircraft 48.8%, plane 37.2%, planer 4.7%, etc.)
MASC: 2 senses (aircraft 90.9%, plane 9.1%)
```

**Computational Results**:
1. **Dimension**: $5 \rightarrow 2$, significant narrowing
2. **Orientation**: Loss of positive sense (flat.s.01, +0.375) → slight pejoration
3. **Relation**: $R: 1 \rightarrow 0$ (metonymic relation between plane.n.03 and plane.n.02 disappears)

### Method Comparison Analysis (Table 4)

#### Frequency Methods

**Advantages**:
- Simple and interpretable
- Suitable for detecting neologisms
- Low data requirements

**Disadvantages**:
- Cannot distinguish senses (polysemy problem)
- Difficult to capture semantic similarity
- Sensitive to irony/negation

**Applicable Scenarios**: Seed word co-occurrence statistics for orientational pole

#### Topic Models

**Advantages**:
- Unsupervised discovery of new senses
- Visualization of topic evolution
- InfiniteSCAN dynamically adjusts topic count

**Disadvantages**:
- Requires manual topic interpretation
- Topic granularity difficult to control
- Research gaps in relational and orientational poles

**Representative Works**:
- SCAN (Frermann & Lapata 2016)
- InfiniteSCAN (Inoue et al. 2022): Automatic detection of sense quantity changes

#### Graph Methods

**Advantages**:
- Natural representation of word relations
- Visualization of sense evolution trees (Ehmüller et al. 2020)
- Suitable for sentiment propagation (SentiProp)

**Disadvantages**:
- Dependent on graph construction quality
- High computational complexity
- Severe underexploration of relational pole

**Representative Works**:
- Chinese Whispers clustering (Biemann 2006)
- Ego-network + PMI filtering (Ehmüller et al. 2020)

#### Embedding Methods

**Advantages**:
- Capture subtle semantic changes
- BERT and contextual embeddings improve performance
- Density embeddings (word2gauss) model polysemy

**Disadvantages**:
- **Meaning Conflation Deficiency**: Single vectors cannot distinguish fine-grained senses
- Instability for low-frequency words
- Contextual embeddings over-contextualize → false positives

**Representative Works**:
- Diachronic embeddings (Hamilton et al. 2016b)
- Gaussian embeddings (Moss 2020, Yüksel et al. 2021)
- XL-LEXEME (Cassotti et al. 2023): Cross-lingual WSD pretraining

### Important Findings

1. **Characterization is harder than detection**: SemEval-2020 shows contextual embeddings did not surpass static embeddings in LSC detection; characterization requires specialized design
2. **Data Bottleneck**: Historical corpora at million scale vs. modern LLMs requiring trillion scale → need few-shot learning
3. **Multilingual Scarcity**: 90% of research focuses only on English
4. **Relational Pole Gap**: Only 3 papers, no standard dataset
5. **Evaluation Difficulty**: Lack of gold standards, mostly qualitative analysis

## Related Work

### Comparison with Existing Surveys

| Survey | Year | Focus | Difference from This Work |
|--------|------|-------|---------------------------|
| **Tang 2018** | 2018 | Four-step framework (corpus→sense→modeling→validation) | Focuses on detection, characterization only briefly discussed |
| **Tahmasebi et al. 2018** | 2018 | Word-level/sense-level distinction, lexical replacement | Recommends deeper characterization research |
| **Kutuzov et al. 2018** | 2018 | Word representation models and data | Points out insufficient validation of classification schemes |
| **Montanelli & Periti 2023** | 2023 | Contextual embedding methods | Calls for research on "laws of semantic shift" |
| **This Work** | 2025 | **Characterization three-poles + formalization** | First systematic characterization survey |

### Theoretical Foundations

#### Linguistic Classification (Traugott 2017)

- **Broadening/Narrowing**: Range of word meaning changes
- **Amelioration/Pejoration**: Change in sentiment value
- **Metaphorization/Metonymization**: Change in rhetorical mechanisms

#### Computational Perspective Classification

- **Cambridge Perspective**: Static comparison of two corpora (adopted in this work)
- **McTaggart Perspective**: Dynamic tracking of evolution process (requires historical knowledge)

### Evolution of Sense Representation

1. **Early Period**: Frequency + co-occurrence matrices (Sagi et al. 2009)
2. **2010s**: Topic models (Lau et al. 2012), graph clustering (Biemann 2006)
3. **2016+**: Static embeddings (Hamilton et al. 2016b)
4. **2019+**: BERT and contextual embeddings (Giulianelli et al. 2020)
5. **Future**: LLM generative methods (Cassotti et al. 2024)

## Conclusions and Discussion

### Main Conclusions

1. **Characterization Research Severely Underdeveloped**: Detection vs. characterization paper ratio approximately 9:1
2. **Three-Pole Imbalance**: Dimensional pole (D) well-researched, relational pole (R) nearly absent
3. **Method Fragmentation**: Lack of unified framework and evaluation standards
4. **Formalization Necessity**: Set-theoretic definitions can eliminate ambiguity and promote method comparison
5. **Data Challenges**: Historical corpus scale limitations restrict deep learning applications

### Limitations

#### Methodological Limitations

1. **Simplifying Assumptions**: Sense objectivism ignores context-dependency
2. **Binary Classification Limitations**: Broadening/narrowing cannot describe changes in word meaning intension (connotation)
3. **Relational Pole Definition Ambiguity**: Difficult to distinguish metaphor vs. metonymy vs. new homonymy

#### Data Limitations

1. **Corpus Bias**:
   - Balanced corpora like COHA still have genre bias
   - Google N-Gram has high noise (OCR errors)
2. **Annotation Lag**: Dictionary adoption of new senses lags 5-10 years
3. **Multilingual Scarcity**: Non-English research <10%

#### Evaluation Limitations

1. **Lack of Gold Standards**: Most work relies on qualitative analysis
2. **Seed Word Stability**: Orientational pole assumes seed words don't change (they actually do)
3. **Threshold Subjectivity**: Binary classification change thresholds lack consensus

### Future Directions

#### Short-term (1-2 years)

1. **Relational Pole Breakthrough**:
   - Construct metaphor/metonymy annotated datasets
   - Leverage knowledge graphs (Wikidata) to model conceptual relations
2. **Multi-Pole Joint Modeling**: Single model characterizing D+R+O simultaneously
3. **Standard Evaluation**: Establish LSC characterization benchmarks

#### Medium-term (3-5 years)

1. **LLM Applications**:
   - Few-shot learning to address data scarcity
   - Generative methods to synthesize historical corpora (Cassotti et al. 2024)
2. **Cross-Lingual Research**:
   - Validate universal laws of semantic change
   - Leverage multilingual pretrained models
3. **Causal Analysis**: From "how it changes" to "why it changes" (sociocultural factors)

#### Long-term (5+ years)

1. **Laws of Semantic Change**:
   - Which word classes undergo broadening?
   - Relationship between frequency and change rate
2. **Application-Driven**:
   - Historical text machine translation
   - Dynamic knowledge graph maintenance
   - Cultural evolution modeling

## In-Depth Evaluation

### Strengths

#### Academic Contributions

1. **Fills Research Gap**: First systematic characterization survey, clearly distinguishing identification from characterization
2. **Theoretical Innovation**:
   - Three-pole taxonomy integrates linguistic and computational perspectives
   - Formal framework (Section 5) directly guides algorithm design
3. **Comprehensiveness**:
   - Time span: 2006-2024
   - Method coverage: 4 representation types × 3 change types = 12-dimensional analysis
   - In-depth analysis of 23 core papers

#### Methodological Advantages

1. **Literature Search**: Uses Research Rabbit tool for iterative expansion (11→151 papers)
2. **Empirical Validation**: SEMCOR/MASC cases demonstrate framework operationalizability
3. **Visualization**: Figure 1 classification tree, Figure 11 three-dimensional space provide intuitive presentation

#### Writing Quality

1. **Clear Structure**: Background→methods→formalization→discussion follows logical progression
2. **Unified Terminology**: Clearly defines LSC, D/R/O and other core concepts
3. **Information-Dense Tables**: Tables 2-4 compress substantial information

### Weaknesses

#### Theoretical Level

1. **Sense Objectivity Controversy**:
   - Assumes word senses can be discretely enumerated ($S(w,t)=\{s_1,...,s_k\}$)
   - Ignores Wittgenstein's "family resemblance" and usage theory
   - Response: Authors acknowledge "pragmatic stance" but insufficiently discuss prototype theory
2. **Relational Pole Definition Insufficient**:
   - Formula (6)'s $l(s_i, s_j)$ calculation not explicitly specified
   - Metaphor vs. metonymy distinction depends on external resources like ChainNet
3. **Orientational Pole Oversimplification**:
   - Considers only positive/negative polarity, ignoring multidimensionality of emotion (except VAD)
   - Circular reasoning problem in seed word selection

#### Experimental Level

1. **Insufficient Validation**:
   - Section 5.7 provides only 2 word case studies, lacking statistical significance
   - SEMCOR/MASC time span only 20 years, insufficient for demonstrating diachronic change
   - No comparison with human annotations for validation
2. **Missing Method Comparison**:
   - Table 3 only classifies, does not quantitatively compare accuracy
   - Lacks comparative experiments of different representation methods on same tasks
3. **Dataset Limitations**:
   - Depends on WordNet annotations, but coverage incomplete (slang, neologisms)
   - Noise in ChainNet/SentiWordNet not discussed

#### Coverage Range

1. **Insufficient LLM-Era Methods**:
   - Only briefly mentions GPT/BERT applications to LSC
   - Does not discuss prompt engineering, in-context learning and other new paradigms
2. **Multimodal Absence**: Image-text joint modeling could assist sense understanding
3. **Weak Cognitive Linguistics Perspective**: Does not incorporate computational models of conceptual metaphor theory (Lakoff & Johnson)

### Impact Assessment

#### Expected Contribution to Field

1. **Paradigm Shift**: Pushes LSC research from detection toward characterization
2. **Method Guidance**: Formal framework directly translates to algorithms (e.g., Algorithm 1 pseudocode)
3. **Dataset Needs**: Calls for three-pole annotated data, potentially catalyzing new benchmarks

#### Practical Value

1. **Historical NLP**: Improves historical text understanding (e.g., word sense disambiguation in Shakespeare)
2. **Knowledge Engineering**: Guides Wikidata and similar temporal knowledge graph maintenance
3. **Social Computing**: Tracks semantic evolution on social media (e.g., politicization of "woke")

#### Reproducibility

- **High**: Formal definitions clear, SEMCOR/MASC publicly available
- **Medium**: Some methods (e.g., ChainNet) difficult to access
- **Low**: No code repository; readers must implement independently

### Applicable Scenarios

#### Suitable Applications

1. **Digital Humanities**: Analyze semantic evolution of key terms in literary works
2. **Dictionary Compilation**: Automatically discover entries needing updates
3. **Sociolinguistics**: Study discourse shifts in social movements (e.g., "feminism")
4. **Low-Resource Languages**: Formal framework transferable to non-English languages

#### Unsuitable Scenarios

1. **Real-Time Systems**: Diachronic analysis requires substantial historical data, unsuitable for online applications
2. **Fine-Grained WSD**: Three-pole classification too coarse for subtle semantic distinctions
3. **Causal Inference**: Only describes "how it changes," cannot explain "why it changes"

## Key References (Selected)

### Theoretical Foundations

1. **Traugott (2017)**: Semantic change - authoritative linguistic classification source
2. **Koch (2016)**: Meaning change and semantic shifts - detailed rhetorical mechanisms
3. **Blank (2012)**: Prinzipien des lexikalischen Bedeutungswandels - German semantic change research

### Detection Methods

4. **Hamilton et al. (2016b)**: Diachronic word embeddings reveal statistical laws - static embedding milestone
5. **Giulianelli et al. (2020)**: Analysing lexical semantic change with contextualised word representations - BERT application
6. **Schlechtweg et al. (2020)**: SemEval-2020 Task 1 - standard evaluation task

### Characterization Methods

7. **Inoue et al. (2022)**: Infinite SCAN - topic model dynamically detecting sense quantity
8. **Fonteyn & Manjavacas (2021)**: Adjusting scope - multi-pole joint analysis case study
9. **Ehmüller et al. (2020)**: Sense tree discovery - graph method visualization

### Survey Comparisons

10. **Tahmasebi et al. (2018)**: Survey of computational approaches to LSC - most comprehensive detection survey
11. **Kutuzov et al. (2018)**: Diachronic word embeddings and semantic shifts - word representation model survey

---

## Summary

This paper is a **landmark survey** in semantic change research, systematizing the characterization problem for the first time and proposing a three-pole framework (D/R/O) with formal definitions that establish theoretical foundations for subsequent research. Its greatest value lies in:

1. **Clarifying Research Direction**: Identifying gaps in relational pole and multi-pole joint modeling
2. **Unifying Terminology**: Eliminating confusion between detection vs. characterization, broadening vs. generalization
3. **Operationalizability**: Set-theoretic definitions directly translate to algorithms

However, the paper has room for improvement in experimental validation, LLM-era method integration, and cognitive linguistics depth. Recommended future work:

- Construct large-scale three-pole annotated datasets (e.g., annotating D/R/O changes for 1000 words in COHA)
- Develop end-to-end characterization models (e.g., multitask learning predicting all three poles)
- Explore LLM zero-shot characterization capabilities (e.g., using GPT-4 to judge metaphorization)

For NLP researchers, this paper is **essential reading** for entering the LSC field; for application developers, its formal framework provides **theoretical guidance** for building historical text understanding systems.
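
As a companion to the three-pole definitions in the formal framework, the sketch below turns each definition into a small Python function: sense-set cardinality for dimension, the weighted positivity score for orientation, and the pairwise sum for relation. The function names, the toy sense labels, the probabilities and scores, and the `link` function are hypothetical stand-ins; they are not the SEMCOR/MASC values or the ChainNet/SentiWordNet resources used in the paper.

```python
from itertools import combinations


def dimension_change(senses_t1, senses_t2):
    """Compare the number of distinct senses at t1 and t2."""
    n1, n2 = len(set(senses_t1)), len(set(senses_t2))
    if n1 < n2:
        return "broadening"
    if n1 > n2:
        return "narrowing"
    return "none"


def orientation_score(sense_probs, positive):
    """f(w, t) = (1/N) * sum_i p(s_i) * positive(s_i), with N the number of senses."""
    if not sense_probs:
        raise ValueError("word has no senses in this corpus")
    total = sum(p * positive[s] for s, p in sense_probs.items())
    return total / len(sense_probs)


def orientation_change(f_t1, f_t2):
    """Label the direction of the shift between two orientation scores."""
    if f_t1 < f_t2:
        return "amelioration"
    if f_t1 > f_t2:
        return "pejoration"
    return "none"


def relation_strength(senses, link):
    """R(w, t) = sum of link(s_i, s_j) over all unordered pairs of senses."""
    return sum(link(a, b) for a, b in combinations(sorted(senses), 2))


# Hypothetical data: sense -> probability in each corpus, and sense -> positivity.
probs_t1 = {"w.s1": 0.5, "w.s2": 0.5}
probs_t2 = {"w.s1": 1.0}
positive = {"w.s1": 0.5, "w.s2": 0.0}
linked_pairs = {frozenset(("w.s1", "w.s2"))}  # assumed metaphor/metonymy link


def link(a, b):
    """Toy relational similarity: 1.0 if the pair is linked, else 0.0."""
    return 1.0 if frozenset((a, b)) in linked_pairs else 0.0


dimension = dimension_change(probs_t1, probs_t2)
f_t1 = orientation_score(probs_t1, positive)
f_t2 = orientation_score(probs_t2, positive)
orientation = orientation_change(f_t1, f_t2)
r_t1 = relation_strength(probs_t1, link)
r_t2 = relation_strength(probs_t2, link)
```

With this toy input the word loses one sense, its orientation score rises, and its only linked sense pair disappears, so all three poles report a change at once. Keeping the poles as separate functions mirrors the per-pole functions $f_x$ with $x \in \{D, R, O\}$.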