Language evolves dynamically, reflecting sociocultural changes through neologisms or semantic shifts of existing words. Understanding word meanings is crucial for interpreting texts across different cultures, domains, or time periods, and directly impacts the performance of NLP applications such as machine translation, information retrieval, and question-answering systems. While existing methods have achieved good accuracy in detecting semantic change, there is a lack of systematic research on how to characterize the types of semantic changes. This survey provides the first comprehensive review of existing methods for characterizing lexical semantic change, formally defining three categories of change: dimensional change (broadening or narrowing of word meaning), orientational change (shift toward more negative or positive connotations), and relational change (transformation of word meaning through rhetorical devices such as metaphor or metonymy). The paper summarizes major research findings, analyzes current limitations, and identifies future research directions.
Lexical Semantic Change (LSC) is a core phenomenon in natural language evolution. Existing research primarily focuses on detecting whether semantic change occurs, but there is a severe shortage of research on characterizing how it changes. For example:
This paper is the first systematic survey of semantic change characterization, aiming to:
Given a word $w$ with representations in two corpora at times $t_1$ and $t_2$, determine whether change occurs. The output is either a binary label (binary classification) or a real-valued score (continuous distance).
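As a rough illustration of the detection task, the sketch below compares two vector representations of a word (one per time period) and returns either a continuous distance or a thresholded binary decision. The use of cosine distance, the function names, and the 0.5 threshold are assumptions made for illustration; the survey does not prescribe them.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Continuous change score: 1 - cosine similarity of the two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def detect_change(u: Sequence[float], v: Sequence[float], threshold: float = 0.5) -> bool:
    """Binary decision: flag change when the distance exceeds the threshold.

    The threshold value is a hypothetical default, not a recommended setting.
    """
    return cosine_distance(u, v) > threshold
```

The two functions correspond to the two output types named above: `cosine_distance` gives the graded score and `detect_change` collapses it to a yes/no label.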
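Building on detection, further classify the type of change: dimensional (broadening or narrowing), orientational (amelioration or pejoration), or relational (metaphorization or metonymization).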
Word $w$ undergoes change between $t_1$ and $t_2$ if and only if:

$$\text{Change}(w, t_1, t_2) = \begin{cases}
\text{True} & S(w, t_1) \neq S(w, t_2) \\
\text{False} & \text{otherwise}
\end{cases}$$

#### Three-Pole Definitions

**1. Dimensional Change (Dimension)**

$$|S(w, t_1)| \neq |S(w, t_2)|$$

- Broadening: $|S(w, t_1)| < |S(w, t_2)|$ (increase in senses)
- Narrowing: $|S(w, t_1)| > |S(w, t_2)|$ (decrease in senses)

**Example**:
- "plane" has 5 senses in SEMCOR (plane, aircraft, planer, etc.) but only 2 in MASC → narrowing

**2. Orientational Change (Orientation)**

Define sentiment function $f: V \times T \rightarrow \{-1, 0, +1\}$, then:

$$f(w, t_1) \neq f(w, t_2)$$

- Amelioration: $f(w, t_1) < f(w, t_2)$ (shift toward positive)
- Pejoration: $f(w, t_1) > f(w, t_2)$ (shift toward negative)

**Implementation**: Weighted sum of SentiWordNet scores

$$f(w, t) = \frac{1}{N}\sum_{i=1}^{N} p(s_i) \cdot \text{positive}(s_i)$$

**Example**:
- "heart" has $f=0.15$ in SEMCOR and $f=0.97$ in MASC → amelioration

**3. Relational Change (Relation)**

Define relational similarity $l: S \times S \rightarrow \mathbb{R}$, total relational strength:

$$R(w, t) = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} l(s_i, s_j), \quad s_i, s_j \in S(w, t)$$

- Increase: $R(w, t_1) < R(w, t_2)$ (more metaphorical/metonymical uses)

**Example**:
- "heart" expands from literal sense "cardiac organ" to metaphorical senses "core," "courage" → strengthened relations

### Technical Innovations

1. **Set-Theoretic Formalization**: First rigorous mathematical formulation of LSC characterization, eliminating ambiguity
2. **Pole Symmetry**: Three poles naturally pair (broadening/narrowing share dimensional measurement), simplifying computational framework
3. **Operationalizability**: Definitions directly translate to algorithms (e.g., sense counting, sentiment scoring, relational graph analysis)
4. **Cambridge Perspective**: Adopts static comparison (contrasting two corpora) rather than McTaggart dynamic tracking, suitable for computational methods

## Experimental Setup

### Dataset Classification

#### Diachronic Corpora (Table 2)

| Corpus | Language | Time Span | Scale | Characteristics |
|--------|----------|-----------|-------|-----------------|
| **COHA** | English | 1810s-2000s | 400M words | Most commonly used, balanced genres |
| **Google N-Gram** | Multilingual | 1600-2009 | 300B words | Largest scale, but noisy |
| **DTA** | German | 1741-1900 | 1022 texts | High quality, manually selected |
| **CLMET** | English | 1710-1920 | 34M words | Primarily literary works |

#### Demonstration Datasets

- **SEMCOR** (1993): 200K words, WordNet sense annotations
- **MASC** (2013): 500K words, modern American English
- **Annotation Sources**:
  - Senses: WordNet
  - Relations: ChainNet (metaphor/metonymy links)
  - Orientation: SentiWordNet (positive/negative scores)

### Evaluation Dimensions

As a survey paper, this work does not provide unified evaluation metrics, but analyzes existing methods' evaluation approaches:

#### Dimensional Pole (D)

- **Metrics**: Sense quantity change, clustering density, topic count
- **Data Sources**: Dictionaries, sense-induced clustering, topic models

#### Orientational Pole (O)

- **Metrics**: Distance from seed words, VAD framework scores (Valence-Arousal-Dominance)
- **Challenges**: Seed word stability assumptions, irony/negation handling

#### Relational Pole (R)

- **Metrics**: Entropy increase (Schlechtweg 2017), relational graph edge count
- **Issues**: Difficulty distinguishing metaphor vs. new homonymy

### Method Classification (Table 3 Core)

| Method | D | R | O | Representation |
|--------|---|---|---|-----------------|
| Biemann 2006 | ✓ | - | - | Graph |
| Tang et al. 2013 | ✓ | ✓ | - | Frequency |
| Hamilton et al. 2016a | - | - | ✓ | Graph (SentiProp) |
| Inoue et al. 2022 | ✓ | - | - | Topics (InfiniteSCAN) |
| Giulianelli et al. 2020 | ✓ | - | - | Embeddings (BERT) |
| Fonteyn & Manjavacas 2021 | - | ✓ | ✓ | Embeddings |

**Key Findings**:
- **No method covers all three poles**: Characterization complexity is high
- **Dimensional pole most researched**: 18/23 methods
- **Relational pole most underdeveloped**: Only 3 methods
- **Embedding methods dominant**: Recent trend

## Experimental Results

### Framework Validation (Section 5.7)

#### Case 1: Multi-Pole Change of "heart"

**Data** (SEMCOR → MASC):

```
Sense distribution changes:
- heart.n.02 (organ, literal): 34.8% → 0%
- heart.n.03 (courage, metaphorical+): 12.1% → 90.1%
- heart.n.10 (poker suit, new): 0% → 2.8%
```

**Computational Results**:
1. **Dimension**: $|S|: 5 \rightarrow 3$, narrowing
2. **Orientation**: $f: 0.15 \rightarrow 0.97$, strong amelioration
3. **Relation**: Metaphorical uses dominant (90.1%), strengthened relations

**Interpretation**: Literal sense "heart" disappears, metaphorical sense "courage/core" becomes prototypical

#### Case 2: Narrowing of "plane"

**Data**:

```
SEMCOR: 5 senses (aircraft 48.8%, plane 37.2%, planer 4.7%, etc.)
MASC: 2 senses (aircraft 90.9%, plane 9.1%)
```

**Computational Results**:
1. **Dimension**: $5 \rightarrow 2$, significant narrowing
2. **Orientation**: Loss of positive sense (flat.s.01, +0.375) → slight pejoration
3. **Relation**: $R: 1 \rightarrow 0$ (metonymic relation between plane.n.03 and plane.n.02 disappears)

### Method Comparison Analysis (Table 4)

#### Frequency Methods

**Advantages**:
- Simple and interpretable
- Suitable for detecting neologisms
- Low data requirements

**Disadvantages**:
- Cannot distinguish senses (polysemy problem)
- Difficult to capture semantic similarity
- Sensitive to irony/negation

**Applicable Scenarios**: Seed word co-occurrence statistics for orientational pole

#### Topic Models

**Advantages**:
- Unsupervised discovery of new senses
- Visualization of topic evolution
- InfiniteSCAN dynamically adjusts topic count

**Disadvantages**:
- Requires manual topic interpretation
- Topic granularity difficult to control
- Research gaps in relational and orientational poles

**Representative Works**:
- SCAN (Frermann & Lapata 2016)
- InfiniteSCAN (Inoue et al. 2022): Automatic detection of sense quantity changes

#### Graph Methods

**Advantages**:
- Natural representation of word relations
- Visualization of sense evolution trees (Ehmüller et al. 2020)
- Suitable for sentiment propagation (SentiProp)

**Disadvantages**:
- Dependent on graph construction quality
- High computational complexity
- Severe underexploration of relational pole

**Representative Works**:
- Chinese Whispers clustering (Biemann 2006)
- Ego-network + PMI filtering (Ehmüller et al. 2020)

#### Embedding Methods

**Advantages**:
- Capture subtle semantic changes
- BERT and contextual embeddings improve performance
- Density embeddings (word2gauss) model polysemy

**Disadvantages**:
- **Meaning Conflation Deficiency**: Single vectors cannot distinguish fine-grained senses
- Instability for low-frequency words
- Contextual embeddings over-contextualize → false positives

**Representative Works**:
- Diachronic embeddings (Hamilton et al. 2016b)
- Gaussian embeddings (Moss 2020, Yüksel et al. 2021)
- XL-LEXEME (Cassotti et al. 2023): Cross-lingual WSD pretraining

### Important Findings

1. **Characterization is harder than detection**: SemEval-2020 shows contextual embeddings did not surpass static embeddings in LSC detection; characterization requires specialized design
2. **Data Bottleneck**: Historical corpora at million scale vs. modern LLMs requiring trillion scale → need few-shot learning
3. **Multilingual Scarcity**: 90% of research focuses only on English
4. **Relational Pole Gap**: Only 3 papers, no standard dataset
5. **Evaluation Difficulty**: Lack of gold standards, mostly qualitative analysis

## Related Work

### Comparison with Existing Surveys

| Survey | Year | Focus | Difference from This Work |
|--------|------|-------|---------------------------|
| **Tang 2018** | 2018 | Four-step framework (corpus→sense→modeling→validation) | Focuses on detection, characterization only briefly discussed |
| **Tahmasebi et al. 2018** | 2018 | Word-level/sense-level distinction, lexical replacement | Recommends deeper characterization research |
| **Kutuzov et al. 2018** | 2018 | Word representation models and data | Points out insufficient validation of classification schemes |
| **Montanelli & Periti 2023** | 2023 | Contextual embedding methods | Calls for research on "laws of semantic shift" |
| **This Work** | 2025 | **Characterization three-poles + formalization** | First systematic characterization survey |

### Theoretical Foundations

#### Linguistic Classification (Traugott 2017)

- **Broadening/Narrowing**: Range of word meaning changes
- **Amelioration/Pejoration**: Change in sentiment value
- **Metaphorization/Metonymization**: Change in rhetorical mechanisms

#### Computational Perspective Classification

- **Cambridge Perspective**: Static comparison of two corpora (adopted in this work)
- **McTaggart Perspective**: Dynamic tracking of evolution process (requires historical knowledge)

### Evolution of Sense Representation

1. **Early Period**: Frequency + co-occurrence matrices (Sagi et al. 2009)
2. **2010s**: Topic models (Lau et al. 2012), graph clustering (Biemann 2006)
3. **2016+**: Static embeddings (Hamilton et al. 2016b)
4. **2019+**: BERT and contextual embeddings (Giulianelli et al. 2020)
5. **Future**: LLM generative methods (Cassotti et al. 2024)

## Conclusions and Discussion

### Main Conclusions

1. **Characterization Research Severely Underdeveloped**: Detection vs. characterization paper ratio approximately 9:1
2. **Three-Pole Imbalance**: Dimensional pole (D) well-researched, relational pole (R) nearly absent
3. **Method Fragmentation**: Lack of unified framework and evaluation standards
4. **Formalization Necessity**: Set-theoretic definitions can eliminate ambiguity and promote method comparison
5. **Data Challenges**: Historical corpus scale limitations restrict deep learning applications

### Limitations

#### Methodological Limitations

1. **Simplifying Assumptions**: Sense objectivism ignores context-dependency
2. **Binary Classification Limitations**: Broadening/narrowing cannot describe changes in word meaning intension (connotation)
3. **Relational Pole Definition Ambiguity**: Difficult to distinguish metaphor vs. metonymy vs. new homonymy

#### Data Limitations

1. **Corpus Bias**:
   - Balanced corpora like COHA still have genre bias
   - Google N-Gram has high noise (OCR errors)
2. **Annotation Lag**: Dictionary adoption of new senses lags 5-10 years
3. **Multilingual Scarcity**: Non-English research <10%

#### Evaluation Limitations

1. **Lack of Gold Standards**: Most work relies on qualitative analysis
2. **Seed Word Stability**: Orientational pole assumes seed words don't change (they actually do)
3. **Threshold Subjectivity**: Binary classification change thresholds lack consensus

### Future Directions

#### Short-term (1-2 years)

1. **Relational Pole Breakthrough**:
   - Construct metaphor/metonymy annotated datasets
   - Leverage knowledge graphs (Wikidata) to model conceptual relations
2. **Multi-Pole Joint Modeling**: Single model characterizing D+R+O simultaneously
3. **Standard Evaluation**: Establish LSC characterization benchmarks

#### Medium-term (3-5 years)

1. **LLM Applications**:
   - Few-shot learning to address data scarcity
   - Generative methods to synthesize historical corpora (Cassotti et al. 2024)
2. **Cross-Lingual Research**:
   - Validate universal laws of semantic change
   - Leverage multilingual pretrained models
3. **Causal Analysis**: From "how it changes" to "why it changes" (sociocultural factors)

#### Long-term (5+ years)

1. **Laws of Semantic Change**:
   - Which word classes undergo broadening?
   - Relationship between frequency and change rate
2. **Application-Driven**:
   - Historical text machine translation
   - Dynamic knowledge graph maintenance
   - Cultural evolution modeling

## In-Depth Evaluation

### Strengths

#### Academic Contributions

1. **Fills Research Gap**: First systematic characterization survey, clearly distinguishing identification from characterization
2. **Theoretical Innovation**:
   - Three-pole taxonomy integrates linguistic and computational perspectives
   - Formal framework (Section 5) directly guides algorithm design
3. **Comprehensiveness**:
   - Time span: 2006-2024
   - Method coverage: 4 representation types × 3 change types = 12-dimensional analysis
   - In-depth analysis of 23 core papers

#### Methodological Advantages

1. **Literature Search**: Uses Research Rabbit tool for iterative expansion (11→151 papers)
2. **Empirical Validation**: SEMCOR/MASC cases demonstrate framework operationalizability
3. **Visualization**: Figure 1 classification tree, Figure 11 three-dimensional space provide intuitive presentation

#### Writing Quality

1. **Clear Structure**: Background→methods→formalization→discussion follows logical progression
2. **Unified Terminology**: Clearly defines LSC, D/R/O and other core concepts
3. **Information-Dense Tables**: Tables 2-4 compress substantial information

### Weaknesses

#### Theoretical Level

1. **Sense Objectivity Controversy**:
   - Assumes word senses can be discretely enumerated ($S(w,t)=\{s_1,...,s_k\}$)
   - Ignores Wittgenstein's "family resemblance" and usage theory
   - Response: Authors acknowledge "pragmatic stance" but insufficiently discuss prototype theory
2. **Relational Pole Definition Insufficient**:
   - Formula (6)'s $l(s_i, s_j)$ calculation not explicitly specified
   - Metaphor vs. metonymy distinction depends on external resources like ChainNet
3. **Orientational Pole Oversimplification**:
   - Considers only positive/negative polarity, ignoring multidimensionality of emotion (except VAD)
   - Circular reasoning problem in seed word selection

#### Experimental Level

1. **Insufficient Validation**:
   - Section 5.7 provides only 2 word case studies, lacking statistical significance
   - SEMCOR/MASC time span only 20 years, insufficient for demonstrating diachronic change
   - No comparison with human annotations for validation
2. **Missing Method Comparison**:
   - Table 3 only classifies, does not quantitatively compare accuracy
   - Lacks comparative experiments of different representation methods on same tasks
3. **Dataset Limitations**:
   - Depends on WordNet annotations, but coverage incomplete (slang, neologisms)
   - Noise in ChainNet/SentiWordNet not discussed

#### Coverage Range

1. **Insufficient LLM-Era Methods**:
   - Only briefly mentions GPT/BERT applications to LSC
   - Does not discuss prompt engineering, in-context learning and other new paradigms
2. **Multimodal Absence**: Image-text joint modeling could assist sense understanding
3. **Weak Cognitive Linguistics Perspective**: Does not incorporate computational models of conceptual metaphor theory (Lakoff & Johnson)

### Impact Assessment

#### Expected Contribution to Field

1. **Paradigm Shift**: Pushes LSC research from detection toward characterization
2. **Method Guidance**: Formal framework directly translates to algorithms (e.g., Algorithm 1 pseudocode)
3. **Dataset Needs**: Calls for three-pole annotated data, potentially catalyzing new benchmarks

#### Practical Value

1. **Historical NLP**: Improves historical text understanding (e.g., word sense disambiguation in Shakespeare)
2. **Knowledge Engineering**: Guides Wikidata and similar temporal knowledge graph maintenance
3. **Social Computing**: Tracks semantic evolution on social media (e.g., politicization of "woke")

#### Reproducibility

- **High**: Formal definitions clear, SEMCOR/MASC publicly available
- **Medium**: Some methods (e.g., ChainNet) difficult to access
- **Low**: No code repository; readers must implement independently

### Applicable Scenarios

#### Suitable Applications

1. **Digital Humanities**: Analyze semantic evolution of key terms in literary works
2. **Dictionary Compilation**: Automatically discover entries needing updates
3. **Sociolinguistics**: Study discourse shifts in social movements (e.g., "feminism")
4. **Low-Resource Languages**: Formal framework transferable to non-English languages

#### Unsuitable Scenarios

1. **Real-Time Systems**: Diachronic analysis requires substantial historical data, unsuitable for online applications
2. **Fine-Grained WSD**: Three-pole classification too coarse for subtle semantic distinctions
3. **Causal Inference**: Only describes "how it changes," cannot explain "why it changes"

## Key References (Selected)

### Theoretical Foundations

1. **Traugott (2017)**: Semantic change - authoritative linguistic classification source
2. **Koch (2016)**: Meaning change and semantic shifts - detailed rhetorical mechanisms
3. **Blank (2012)**: Prinzipien des lexikalischen Bedeutungswandels - German semantic change research

### Detection Methods

4. **Hamilton et al. (2016b)**: Diachronic word embeddings reveal statistical laws - static embedding milestone
5. **Giulianelli et al. (2020)**: Analysing lexical semantic change with contextualised word representations - BERT application
6. **Schlechtweg et al. (2020)**: SemEval-2020 Task 1 - standard evaluation task

### Characterization Methods

7. **Inoue et al. (2022)**: Infinite SCAN - topic model dynamically detecting sense quantity
8. **Fonteyn & Manjavacas (2021)**: Adjusting scope - multi-pole joint analysis case study
9. **Ehmüller et al. (2020)**: Sense tree discovery - graph method visualization

### Survey Comparisons

10. **Tahmasebi et al. (2018)**: Survey of computational approaches to LSC - most comprehensive detection survey
11. **Kutuzov et al. (2018)**: Diachronic word embeddings and semantic shifts - word representation model survey

---

## Summary

This paper is a **landmark survey** in semantic change research, systematizing the characterization problem for the first time and proposing a three-pole framework (D/R/O) with formal definitions that establish theoretical foundations for subsequent research. Its greatest value lies in:

1. **Clarifying Research Direction**: Identifying gaps in relational pole and multi-pole joint modeling
2. **Unifying Terminology**: Eliminating confusion between detection vs. characterization, broadening vs. generalization
3. **Operationalizability**: Set-theoretic definitions directly translate to algorithms

However, the paper has room for improvement in experimental validation, LLM-era method integration, and cognitive linguistics depth. Recommended future work:

- Construct large-scale three-pole annotated datasets (e.g., annotating D/R/O changes for 1000 words in COHA)
- Develop end-to-end characterization models (e.g., multitask learning predicting all three poles)
- Explore LLM zero-shot characterization capabilities (e.g., using GPT-4 to judge metaphorization)

For NLP researchers, this paper is **essential reading** for entering the LSC field; for application developers, its formal framework provides **theoretical guidance** for building historical text understanding systems.
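To make the operationalizability claim concrete, the three-pole definitions given earlier (sense-set size, probability-weighted positive score, summed pairwise relational strength) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the dictionary-based inputs, and the toy sense labels are hypothetical, and the link weights stand in for whatever resource (e.g., ChainNet) would supply $l(s_i, s_j)$. It is not the authors' implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def dimensional_change(senses_t1: Set[str], senses_t2: Set[str]) -> str:
    """Dimensional pole: compare |S(w, t1)| with |S(w, t2)|."""
    n1, n2 = len(senses_t1), len(senses_t2)
    if n1 < n2:
        return "broadening"
    if n1 > n2:
        return "narrowing"
    return "none"


def orientation_score(sense_prob: Dict[str, float], positive: Dict[str, float]) -> float:
    """Orientational pole: (1/N) * sum_i p(s_i) * positive(s_i), following the
    formula as written in this summary. Senses without a score count as 0."""
    n = len(sense_prob)
    if n == 0:
        return 0.0
    return sum(p * positive.get(s, 0.0) for s, p in sense_prob.items()) / n


def orientational_change(f_t1: float, f_t2: float) -> str:
    """Amelioration if the score rises, pejoration if it falls."""
    if f_t1 < f_t2:
        return "amelioration"
    if f_t1 > f_t2:
        return "pejoration"
    return "none"


def relational_strength(senses: Set[str], links: Dict[FrozenSet[str], float]) -> float:
    """Relational pole: R(w, t) as the sum of l(s_i, s_j) over unordered sense
    pairs. Pairs missing from `links` are assumed to contribute 0."""
    return sum(
        links.get(frozenset(pair), 0.0)
        for pair in combinations(sorted(senses), 2)
    )


if __name__ == "__main__":
    # Toy data: hypothetical labels and weights, not taken from SEMCOR or MASC.
    old_senses = {"s1", "s2", "s3"}
    new_senses = {"s1"}
    toy_links = {frozenset({"s1", "s2"}): 1.0}
    print(dimensional_change(old_senses, new_senses))
    print(relational_strength(old_senses, toy_links),
          relational_strength(new_senses, toy_links))
```

Keeping each pole as a separate function mirrors the survey's observation that no existing method covers all three poles at once; a joint model would need to replace these independent measures rather than simply call them in sequence.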