The Tonogenesis Continuum in Tibetan: A Computational Investigation

Liang, Zerong
Tonogenesis, the historical process by which segmental contrasts evolve into lexical tone, has traditionally been studied through comparative reconstruction and acoustic phonetics. We introduce a computational approach that quantifies the functional role of pitch at different stages of this sound change by measuring how pitch manipulation affects automatic speech recognition (ASR) performance. Analyzing the sensitivity to pitch flattening across a set of closely related Tibetan languages, we find evidence of a tonogenesis continuum: atonal Amdo dialects tolerate pitch removal best, fully tonal U-Tsang varieties degrade severely, and intermediate Kham dialects fall measurably between these extremes. These gradient effects demonstrate how ASR models implicitly learn the shifting functional load of pitch as languages transition from consonant-based to tone-based lexical contrasts. Our findings show that computational methods can capture fine-grained stages of sound change, and they suggest that traditional functional load metrics, based solely on minimal pairs, may overestimate pitch dependence in transitional systems where segmental and suprasegmental cues remain phonetically intertwined.

Basic Information

  • Paper ID: 2510.22485
  • Title: The Tonogenesis Continuum in Tibetan: A Computational Investigation
  • Authors: Siyu Liang, Zhaxi Zerong (University of Washington)
  • Classification: cs.CL (Computation and Language)
  • Publication Date: October 26, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.22485

Abstract

Tonogenesis is the historical process by which segmental contrasts evolve into lexical tones; it has traditionally been studied through comparative reconstruction and acoustic phonetics. This paper introduces a computational approach that quantifies the functional role of tone at different stages of this sound change by measuring how tonal manipulation affects automatic speech recognition (ASR) performance. Analyzing how sensitive a set of closely related Tibetan dialects is to tone flattening, the study finds evidence for a tonogenesis continuum: the atonal Amdo dialects tolerate tone removal best, the fully tonalized Lhasa dialects degrade severely, and the intermediate Kham dialects fall between these two extremes. These gradient effects show that ASR models implicitly learn the shifting functional load of tone as lexical contrasts move from consonant-based to tone-based distinctions.

Research Background and Motivation

Core Research Question

The central question is how to quantify a language's dependence on tone at different stages of tonogenesis. Traditional tonogenesis research has relied mainly on comparative reconstruction and acoustic phonetics, and it lacks computational tools for measuring precisely how much lexical distinctions depend on tone (the functional load of tone).

Significance of the Problem

  1. Theoretical Significance: Tonogenesis is an important research area in historical linguistics; understanding this process helps reveal universal principles of language evolution
  2. Practical Value: Provides important guidance for developing ASR systems for multi-dialectal languages such as Tibetan
  3. Methodological Contribution: Offers a novel computational approach to studying typological linguistic questions

Limitations of Existing Methods

  1. Traditional Functional Load Measurement: Methods based solely on minimal pair counting cannot adequately reflect the complex interactions between segmental and suprasegmental cues in transitional tonal systems
  2. Static Analysis: Existing methods struggle to capture fine-grained stage changes during tonogenesis
  3. Subjectivity: Relies on expert judgment, lacking objective quantitative standards

Research Motivation

Tibetan languages provide an ideal laboratory for studying the tonogenesis continuum: Amdo dialects remain atonal, Lhasa dialects are fully tonalized, and Kham dialects occupy an intermediate, transitional stage. Computational methods can quantify this continuous variation objectively.

Core Contributions

  1. Proposed a computational method based on tone flattening: Systematically removes f0 variation from each utterance to quantify a language's dependence on tone
  2. Verified the Tibetan tonogenesis continuum: Provides quantitative evidence supporting a gradient of tonalization across Amdo-Kham-Lhasa
  3. Revealed implicit learning capabilities of ASR models: Demonstrates that ASR systems automatically learn and reflect changes in tonal functional load
  4. Challenged traditional functional load theory: Shows that traditional measurement methods based on minimal pairs may overestimate tonal dependence in transitional systems

Methodology Details

Task Definition

  • Input: Speech data from different Tibetan dialects
  • Output: ASR performance differences for each dialect under original vs. tone-flattened conditions
  • Objective: Quantify each dialect's dependence on tones through the degree of performance degradation

Model Architecture

Data Processing Pipeline

  1. Data Source: TIBMD@MUC corpus containing 6 Tibetan dialects
  2. Transcription Conversion: Convert Tibetan-script transcriptions to Wylie romanization
  3. Preprocessing: Resample audio to 16 kHz; tokenize transcriptions at the character level
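Character-level tokenization for CTC fine-tuning usually starts by building a vocabulary from the training transcripts. The sketch below is a minimal illustration, not the authors' code. It assumes the convention of the Hugging Face wav2vec2/XLS-R fine-tuning recipe, which maps spaces to a `|` word delimiter and appends `[UNK]` and `[PAD]` tokens; in that recipe the padding token also serves as the CTC blank. The function names and the Wylie example strings are hypothetical.

```python
def build_char_vocab(transcripts):
    """Map every character in the transcripts to an integer ID.

    Assumed convention (Hugging Face CTC recipe style): spaces become the
    word delimiter "|", and "[UNK]" and "[PAD]" are appended at the end.
    """
    chars = sorted({ch for text in transcripts for ch in text})
    vocab = {("|" if ch == " " else ch): idx for idx, ch in enumerate(chars)}
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert a transcript to character IDs, mapping unseen characters to [UNK]."""
    unk_id = vocab["[UNK]"]
    return [vocab.get("|" if ch == " " else ch, unk_id) for ch in text]


# Hypothetical Wylie transcripts, for illustration only
transcripts = ["bkra shis bde legs", "thugs rje che"]
vocab = build_char_vocab(transcripts)
print(vocab)
print(encode("bde legs", vocab))
```

Working at the character level keeps the output vocabulary small. This matters for the low-resource dialects in this corpus.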

ASR Model

  • Base Model: XLS-R 300m (cross-lingual self-supervised speech representation model)
  • Fine-tuning Strategy: Separate model fine-tuning for each dialect
  • Training Configuration: CTC loss, AdamW optimizer, learning rate 3×10^-4
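CTC lets the model emit one label per audio frame, including a special blank label, so the frame sequence must be collapsed into a transcript at inference time. A minimal greedy (best-path) decoding sketch in plain Python follows. The paper's decoding strategy is not stated here, and the blank ID and toy vocabulary are assumptions for illustration.

```python
def ctc_greedy_decode(frame_scores, id_to_char, blank_id):
    """Greedy (best-path) CTC decoding.

    frame_scores: one list of scores (probabilities or logits) per frame.
    id_to_char: maps each non-blank label ID to its output character.
    blank_id: ID of the CTC blank label.
    """
    # 1. Take the highest-scoring label in every frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Merge consecutive repeats, then 3. drop blanks.
    chars = []
    prev = None
    for label in best:
        if label != prev and label != blank_id:
            chars.append(id_to_char[label])
        prev = label
    return "".join(chars)


# Toy example: ID 0 is the blank (an assumption), 1 = "k", 2 = "a"
id_to_char = {1: "k", 2: "a"}
frames = [
    [0.1, 0.8, 0.1],    # k
    [0.1, 0.7, 0.2],    # k (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # a
    [0.1, 0.1, 0.8],    # a (repeat, merged)
]
print(ctc_greedy_decode(frames, id_to_char, blank_id=0))  # → ka
```

Repeats are merged before blanks are removed. This ordering is what lets the decoder output genuinely doubled characters: a blank between two identical labels keeps them separate.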

Tone Flattening Technique

  • Method: PSOLA algorithm via Praat
  • Operation: Replace natural f0 contour of each utterance with its mean pitch
  • Preserved Features: Spectral envelope and temporal structure maintained
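The flattening target itself is simple: every voiced frame of the pitch track is set to the utterance's mean f0, and unvoiced frames stay unvoiced. The sketch below computes only that target contour. It assumes the mean is the arithmetic mean in Hz over voiced frames and that unvoiced frames are coded as 0. The PSOLA resynthesis that imposes the target on the audio is done in Praat and is not reproduced here; Praat's Manipulation workflow can also be scripted from Python through the Parselmouth bindings. The function name is hypothetical.

```python
from statistics import mean


def flatten_f0(f0_track):
    """Return a flat target contour for PSOLA resynthesis.

    f0_track: per-frame f0 values in Hz, with 0.0 marking unvoiced frames
    (an assumed convention). Voiced frames are replaced by the mean f0 of
    the utterance's voiced frames; unvoiced frames are left at 0.0.
    """
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        return list(f0_track)  # nothing to flatten
    target = mean(voiced)
    return [target if f > 0 else 0.0 for f in f0_track]


# Hypothetical rising-falling contour (Hz), with one unvoiced frame
contour = [180.0, 210.0, 0.0, 240.0, 170.0]
print(flatten_f0(contour))  # → [200.0, 200.0, 0.0, 200.0, 200.0]
```

Using a single per-utterance mean removes lexical pitch movement while keeping the speaker's overall register. A global constant, by contrast, would also erase speaker differences.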

Technical Innovations

  1. Tone Flattening Methodology: First systematic application of PSOLA tone flattening to tonogenesis research
  2. Cross-dialect Comparison Framework: Establishes unified evaluation framework for comparing languages with different degrees of tonalization
  3. ASR as Linguistic Tool: Uses ASR performance as a quantitative metric for typological linguistic features

Experimental Setup

Dataset

Dialect Group   Dialect    Duration (hours)   Speakers   Utterances
Amdo            Xiahe      4.12               2          3549
Amdo            Aba        8.16               2          6546
Kham            Chamdo     2.79               7          2558
Kham            Dege       2.31               3          1245
Lhasa           Lhasa      37.38              48         30349
Lhasa           Shigatse   15.15              4          10729

Evaluation Metrics

  • Character Error Rate (CER): Character-level recognition error rate
  • Word Error Rate (WER): Word-level recognition error rate
  • Performance Degradation (Δ): Error rate increment after tone flattening
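The three metrics can be sketched as follows. The sketch assumes corpus-level error rates (total edit operations divided by total reference units), counts characters including spaces, and takes words to be whitespace-separated tokens. In Wylie, spaces usually mark syllable boundaries, so such "words" may correspond to syllables. The paper's exact normalization is not specified here, and the function names are illustrative. Δ is the flattened-condition error rate minus the original-condition error rate, so a positive value means flattening hurt recognition.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each substitution,
    insertion and deletion costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate: total edits / total reference units.

    unit="char" gives CER, unit="word" gives WER.
    """
    split = list if unit == "char" else str.split
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return edits / total


def degradation(refs, hyps_original, hyps_flattened, unit):
    """Δ = error rate on flattened speech minus error rate on original speech."""
    return error_rate(refs, hyps_flattened, unit) - error_rate(refs, hyps_original, unit)


# Hypothetical reference and ASR outputs for one utterance
refs = ["bkra shis bde legs"]
orig_hyps = ["bkra shis bde legs"]
flat_hyps = ["bkra shis de legs"]
print(degradation(refs, orig_hyps, flat_hyps, "word"))  # → 0.25
print(degradation(refs, orig_hyps, flat_hyps, "char"))
```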

Comparison Conditions

  • Original Condition: Speech with complete tonal information preserved
  • Flattened Condition: Speech with f0 variation removed

Implementation Details

  • Batch Size: 4-8 (adjusted according to GPU memory)
  • Training Steps: 2000
  • Warmup Steps: 500
  • Gradient Accumulation: Maintains effective batch size of 16
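At the two ends of the reported batch range, reaching an effective batch of 16 takes 4 accumulation steps (16 / 4) or 2 (16 / 8). The sketch below derives these values. It also computes a learning-rate schedule that warms up linearly over 500 steps to 3×10^-4, treated here as the peak rate, across 2000 total steps. The linear decay to zero after warmup is an assumption: it is the Hugging Face Trainer's default scheduler, but the paper's decay shape is not stated here. The function names are illustrative.

```python
def accumulation_steps(per_device_batch, effective_batch=16):
    """Gradient-accumulation steps needed to reach the effective batch size."""
    if effective_batch % per_device_batch != 0:
        raise ValueError("effective batch size must be a multiple of the per-device batch")
    return effective_batch // per_device_batch


def learning_rate(step, peak=3e-4, warmup=500, total=2000):
    """Linear warmup to `peak` over `warmup` steps, then (assumed) linear
    decay to zero at `total` steps."""
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (total - step) / (total - warmup))


for batch in (4, 8):
    print(f"per-device batch {batch}: accumulate {accumulation_steps(batch)} steps")
for step in (0, 250, 500, 1250, 2000):
    print(f"step {step}: lr = {learning_rate(step):.2e}")
```

With only 2000 steps, a quarter of training is spent in warmup. Warmup of this kind is common when fine-tuning a large pretrained encoder on a few hours of data.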

Experimental Results

Main Results

Language    Tonal Status   Original CER   Flattened CER   ΔCER    Original WER   Flattened WER   ΔWER
Amdo Group
Xiahe       Toneless       0.114          0.139           0.025   0.320          0.378           0.058
Aba         Toneless       0.182          0.202           0.020   0.525          0.563           0.038
Lhasa Group
Lhasa       Tonalized      0.177          0.237           0.060   0.486          0.593           0.107
Shigatse    Tonalized      0.490          0.629           0.139   0.175          0.250           0.075
Kham Group
Chamdo      Tonalized      0.247          0.303           0.056   0.523          0.613           0.090
Dege        Tonalized      0.475          0.492           0.017   0.902          0.917           0.015

Key Findings

  1. Tonogenesis Continuum Verification:
    • Amdo dialects: Average ΔCER = 0.023, showing minimal tonal dependence
    • Lhasa dialects: Average ΔCER = 0.100, displaying strong tonal dependence
    • Kham dialects: Average ΔCER ≈ 0.037 (Chamdo 0.056, Dege 0.017), between the other two groups and consistent with their intermediate status
  2. Gradient Pattern: At the group level, performance degradation follows the linguistically described degree of tonalization (Amdo < Kham < Lhasa)
  3. Dege Anomaly: Dege (Kham) degrades less than either Amdo dialect (ΔCER 0.017), possibly reflecting training data limitations or residual segmental cues
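The group averages above can be reproduced from the ΔCER column of the main results table. This short check uses only those published values; the grouping of dialects is the only choice made.

```python
from statistics import mean

# ΔCER values copied from the main results table
delta_cer = {
    "Amdo": {"Xiahe": 0.025, "Aba": 0.020},
    "Kham": {"Chamdo": 0.056, "Dege": 0.017},
    "Lhasa": {"Lhasa": 0.060, "Shigatse": 0.139},
}

group_means = {group: mean(vals.values()) for group, vals in delta_cer.items()}
for group, m in sorted(group_means.items(), key=lambda kv: kv[1]):
    print(f"{group}: mean ΔCER = {m:.4f}")
```

Sorting by mean gives Amdo (0.0225) < Kham (0.0365) < Lhasa (0.0995). The outer values round to the reported 0.023 and 0.100. The per-dialect values show that Dege alone falls below both Amdo dialects, so the gradient holds for group means but not for every individual dialect.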

Experimental Findings

  1. ASR Implicit Learning: ASR models automatically learn and reflect tonal functional load variations across dialects
  2. Challenge to Traditional Theory: Pure minimal pair-based functional load measurement cannot adequately capture the complexity of transitional systems
  3. Continuity Evidence: Tonogenesis is indeed a continuous process rather than discrete stage transitions

Related Work

Tonogenesis Research

  • Classical Theory: Foundational work by Haudricourt (1954) and Hombert (1977)
  • Southeast Asian Studies: Tonogenesis processes in Vietnamese, Khmer, and related languages
  • Tibetan Studies: Sun (2015) on Tibetan tonal diversity

ASR and Tones

  • Tone Modeling: Two main approaches, integrating tonal features directly and using explicit tone annotation
  • Tone Flattening Research: Methodological foundation established by Liang and Levow (2025)
  • Cross-lingual ASR: Development of multilingual models such as XLS-R

Functional Load Theory

  • Traditional Methods: Static measurement based on minimal pair counting
  • Limitations: Cannot handle interactions between segmental and suprasegmental cues
  • New Directions: Computational methods open the possibility of dynamic assessment
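For contrast, the traditional minimal-pair approach can be illustrated by counting word pairs that are distinguished only by tone. The sketch below uses a made-up toy lexicon of (segments, tone) entries. It shows only the simplest counting variant; information-theoretic functional load measures also exist. The count treats every such pair as depending on pitch alone. That is the assumption the paper questions for transitional systems, where onset and phonation cues may still co-vary with tone. The function name is hypothetical.

```python
from collections import defaultdict
from math import comb


def tone_minimal_pairs(lexicon):
    """Count word pairs whose segments are identical but whose tones differ.

    lexicon: iterable of (segments, tone) tuples. Duplicate entries are
    ignored, so each distinct tone on a given segmental string counts once.
    """
    tones_by_segments = defaultdict(set)
    for segments, tone in lexicon:
        tones_by_segments[segments].add(tone)
    # n distinct tones on the same segments yield "n choose 2" minimal pairs
    return sum(comb(len(tones), 2) for tones in tones_by_segments.values())


# Hypothetical toy lexicon (not real Tibetan data)
toy_lexicon = [
    ("kha", "H"), ("kha", "L"),
    ("ga", "H"),
    ("ma", "H"), ("ma", "L"), ("ma", "HL"),
]
print(tone_minimal_pairs(toy_lexicon))  # → 4
```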

Conclusions and Discussion

Main Conclusions

  1. Continuum Verification: Tibetan dialects indeed exhibit a continuum pattern of tonogenesis
  2. Computational Method Validity: Tone flattening technique effectively quantifies tonal functional load
  3. ASR as Research Tool: ASR systems can serve as effective tools for typological linguistic research
  4. Theoretical Contribution: Challenges the static perspective of traditional functional load theory

Limitations

  1. Data Limitations:
    • Covers only 6 Tibetan dialects, which cannot represent the full dialectal diversity
    • Training and test data may contain overlapping speakers, affecting generalization assessment
    • Test sets are relatively small (approximately 30 minutes per dialect)
  2. Methodological Limitations:
    • Historical nature of Tibetan orthography introduces transcription inconsistencies
    • Tone flattening may not completely remove all tonal cues
    • Lacks fine-grained analysis of specific confusion patterns
  3. Theoretical Limitations:
    • Insufficient consideration of other prosodic features' effects
    • Limited understanding of mechanisms underlying segmental-suprasegmental interactions in transitional systems

Future Directions

  1. Extended Research:
    • Include more Tibetan dialects and other language families
    • Develop speaker-independent evaluation frameworks
    • Conduct larger-scale data collection
  2. Methodological Improvements:
    • Integrate voice quality features such as breathiness and aspiration
    • Develop more refined tone manipulation techniques
    • Establish multimodal methods for measuring tonal dependence
  3. Application Extensions:
    • Develop adaptive multi-dialect ASR systems
    • Explore real-time tonalization degree detection
    • Apply to language preservation and documentation work

In-Depth Evaluation

Strengths

  1. Methodological Innovation:
    • First to use ASR performance as a quantitative metric for tonal functional load
    • Systematic application of tone flattening technique has methodological value
    • Interdisciplinary fusion of computational linguistics and historical linguistics
  2. Experimental Sufficiency:
    • Covers key nodes of the tonogenesis continuum
    • Rigorous experimental design with clear control conditions
    • Results highly consistent with linguistic theory
  3. Result Convincingness:
    • Quantitative results support qualitative linguistic descriptions
    • Gradient pattern clearly demonstrates continuum characteristics
    • Statistical results are significant
  4. Writing Clarity:
    • Clear structure with rigorous logic
    • Accurate technical detail descriptions
    • Sufficient background introduction for interdisciplinary audience

Weaknesses

  1. Data Scale Limitations:
    • Insufficient training data for certain dialects may affect result reliability
    • Speaker overlap issues require stricter control
    • Lacks independent validation dataset
  2. Methodological Limitations:
    • Tone flattening may not completely isolate tonal cues
    • Fails to account for confounding effects of other prosodic features
    • ASR model architecture bias may influence results
  3. Analysis Depth:
    • Lacks analysis of specific confusion patterns
    • Insufficient exploration of Dege anomaly causes
    • Theoretical explanation of transition mechanisms not sufficiently deep

Impact

  1. Academic Contribution:
    • Provides new computational tools for tonogenesis research
    • Advances application of computational linguistics in linguistic typology
    • Offers new perspective for functional load theory development
  2. Practical Value:
    • Provides guidance for multi-dialect ASR system design
    • Facilitates language preservation and documentation work
    • Applicable to research on other tonal languages
  3. Reproducibility:
    • Detailed method descriptions with clear technical pathways
    • Uses open-source models and tools
    • Complete hyperparameter specifications

Applicable Scenarios

  1. Linguistic Typology Research: Quantify degree of language feature changes
  2. Multilingual ASR Development: Guide tone-sensitive system design
  3. Language Preservation Work: Rapidly assess dialect tonalization degree
  4. Historical Linguistics: Verify theoretical hypotheses of sound change

References

The paper cites a broad body of related literature, including:

  • Classical Tonogenesis Theory: Haudricourt (1954), Hombert (1977)
  • Tibetan Studies: Sun (2015), Gesang and Gesang (2002), DeLancey (2017)
  • ASR and Tones: Fu et al. (1998), Zhang and Kirby (2020)
  • Functional Load Theory: Surendran and Levow (2004)
  • Technical Foundation: Babu et al. (2021), XLS-R model

This research brings computational methods into traditional historical linguistics and provides new quantitative tools for understanding tonogenesis. Despite some data and methodological limitations, its novel approach and convincing experimental results lay important groundwork for future work in this field.