Simple stochastic processes behind Menzerath's Law
Milička
This paper revisits Menzerath's Law, also known as the Menzerath-Altmann Law, which models a relationship between the length of a linguistic construct and the average length of its constituents. Recent findings indicate that simple stochastic processes can display Menzerathian behaviour, though existing models fail to accurately reflect real-world data. If we adopt the basic principle that a word can change its length in both syllables and phonemes, where the correlation between these variables is not perfect and these changes are of a multiplicative nature, we get a bivariate log-normal distribution. The present paper shows that, from this very simple principle, we obtain the classic Altmann model of the Menzerath-Altmann Law. If we model the joint distribution separately and independently from the marginal distributions, we can obtain an even more accurate model by using a Gaussian copula. The models are confronted with empirical data, and alternative approaches are discussed.
This paper revisits Menzerath's Law (also known as the Menzerath-Altmann Law), which describes the relationship between the length of a linguistic construct and the average length of its constituents. Recent research has demonstrated that simple stochastic processes can exhibit Menzerathian behavior, yet existing models fail to reflect real-world data accurately. Adopting the basic principle that a word can change its length in both syllables and phonemes, where the correlation between the two is imperfect and the changes are multiplicative, yields a bivariate lognormal distribution. The paper shows that the classical Altmann model follows from this simple principle. Modeling the joint distribution separately from, and independently of, the marginal distributions by means of a Gaussian copula yields an even more accurate model.
Problem to be Addressed: Menzerath's Law is an important principle in linguistics describing the inverse relationship between the length of a linguistic construct (such as a word) and the average length of its constituents. Although the law has been extensively verified empirically, it lacks a satisfactory theoretical explanation and a foundation in stochastic processes.
Importance of the Problem: Menzerath's Law has attracted considerable attention in quantitative linguistics due to its universality and ability to integrate different segmentation levels into a unified framework. Understanding the stochastic processes underlying it is significant for theories of language evolution and quantitative linguistics.
Limitations of Existing Approaches:
Torre et al. (2021) demonstrated that simple stochastic processes can exhibit Menzerathian behavior, but their models do not fit real data.
The classical Altmann model (1980) lacks a derivation from a stochastic process and an interpretation of its parameters.
Existing models focus primarily on text generation and neglect the mechanisms that determine how word length varies in the course of language evolution.
Research Motivation: The author argues that Menzerath's Law should be understood from the perspective of language evolution rather than text generation, and proposes explaining the stochastic process foundations of the law through joint distribution modeling.
The paper investigates the joint distribution of the length of a linguistic construct (such as the number of syllables x in a word) and the length of its constituents (such as the mean number of phonemes y per syllable), and derives the form of Menzerath's Law from this distribution.
Basic Principle: The model assumes that changes in word length are multiplicative, meaning that longer words are more likely to undergo length changes than shorter words.
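The multiplicative principle can be illustrated with a small simulation. This is a minimal sketch, not the author's procedure: the starting lengths, the number of steps, the shock size `sigma`, the shock correlation `shock_rho`, and the use of continuous rather than integer lengths are all assumptions made here for illustration. It shows that repeated multiplicative shocks produce right-skewed lengths whose logarithms are roughly symmetric, which is the signature of a lognormal distribution.

```python
import math
import random
import statistics


def evolve_word(rng, n_steps, sigma, shock_rho):
    """Apply correlated multiplicative shocks to (syllables, phonemes)."""
    syllables, phonemes = 2.0, 5.0  # hypothetical starting lengths
    for _ in range(n_steps):
        e1 = rng.gauss(0.0, 1.0)
        # Second shock is correlated with the first, but not perfectly.
        e2 = shock_rho * e1 + math.sqrt(1.0 - shock_rho ** 2) * rng.gauss(0.0, 1.0)
        # Multiplicative change: the step scales with the current length.
        syllables *= math.exp(sigma * e1)
        phonemes *= math.exp(sigma * e2)
    return syllables, phonemes


def skewness(values):
    """Population skewness (third standardized moment)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(0)
words = [evolve_word(rng, n_steps=50, sigma=0.1, shock_rho=0.8) for _ in range(5000)]
syllables = [w[0] for w in words]
log_syllables = [math.log(v) for v in syllables]

raw_skew = skewness(syllables)      # clearly positive: long right tail
log_skew = skewness(log_syllables)  # close to zero: logs look symmetric
print(f"skewness of lengths: {raw_skew:.2f}, of log lengths: {log_skew:.2f}")
```

Because both lengths receive correlated shocks, the pair of log lengths is approximately bivariate normal, so the lengths themselves are approximately bivariate lognormal.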
Mathematical Derivation:
The derivation begins with a linear regression of the log-transformed variables:
log z = α + β log x
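where z = xy is the length of the construct measured in sub-constituents (for example, the number of phonemes in a word).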
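The step from this regression to a Menzerathian power law can be spelled out. The following is a sketch under the assumption that y denotes the mean constituent length, so that z = xy; the symbols a and b are introduced here for illustration only.

```latex
\begin{align}
\log z &= \alpha + \beta \log x \\
\log (x y) &= \alpha + \beta \log x \\
\log y &= \alpha + (\beta - 1) \log x \\
y &= e^{\alpha} x^{\beta - 1} = a x^{b},
  \qquad a = e^{\alpha}, \quad b = \beta - 1
\end{align}
```

Under this reading, the mean constituent length y decreases as x grows whenever β < 1, that is, whenever b is negative.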
Parameter interpretation:
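β = ρ_log x,log z × (s_log z / s_log x)
α = mean(log z) - β × mean(log x)
where ρ_log x,log z is the correlation between log x and log z, s_log x and s_log z are their standard deviations, and the means are taken over the log-transformed values.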
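The parameter formulas can be checked numerically. The sketch below draws a sample from a bivariate lognormal distribution and computes β and α from the correlation, standard deviations, and means of the logs. All parameter values (`mu_x`, `mu_z`, `s_x`, `s_z`, `rho`) and helper names are hypothetical, chosen only to illustrate the calculation, and the simulated lengths are continuous rather than integer counts.

```python
import math
import random
import statistics


def sample_bivariate_lognormal(n, mu_x, mu_z, s_x, s_z, rho, seed=0):
    """Return n (x, z) pairs whose logs are bivariate normal."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rho * g1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((math.exp(mu_x + s_x * g1), math.exp(mu_z + s_z * g2)))
    return pairs


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


pairs = sample_bivariate_lognormal(
    20000, mu_x=1.0, mu_z=2.0, s_x=0.4, s_z=0.45, rho=0.8
)
log_x = [math.log(x) for x, _ in pairs]
log_z = [math.log(z) for _, z in pairs]

# Slope and intercept of the regression of log z on log x.
rho_hat = pearson(log_x, log_z)
beta = rho_hat * statistics.stdev(log_z) / statistics.stdev(log_x)
alpha = statistics.fmean(log_z) - beta * statistics.fmean(log_x)

# Exponent of the implied power law for y = z / x.
b = beta - 1.0
print(f"beta = {beta:.3f}, alpha = {alpha:.3f}, exponent b = {b:.3f}")
```

With these illustrative values the slope is below one, so the implied exponent b = β - 1 is negative and the mean constituent length falls as the construct gets longer.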
The bivariate lognormal distribution represents a linguistically plausible stochastic principle that can model variation in construct length measured in both constituents and sub-constituents.
The Gaussian copula is an effective tool for modeling joint distributions and performs better when the goal is to model the joint distribution itself.
Joint distribution modeling should be prioritized over modeling the means alone, because it provides more information.
In practical applications, one should consider using robust estimates of the parameters of the marginal distributions and of the correlation coefficient.
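The copula approach and the advice on robust estimation can be sketched together. The code below samples from a Gaussian copula with freely chosen marginals, then recovers the copula parameter from Spearman's rank correlation and the scale of one marginal from the median absolute deviation. The choice of marginals (a shifted exponential and a lognormal), the specific robust estimators, and all names and parameter values are assumptions made for illustration; the source does not specify which estimators the paper uses. The sketch also does not enforce that a word has at least as many phonemes as syllables.

```python
import math
import random
import statistics
from statistics import NormalDist


def sample_gaussian_copula(n, rho, ppf_x, ppf_z, seed=0):
    """Draw (x, z) pairs whose dependence is a Gaussian copula with parameter rho."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    pairs = []
    for _ in range(n):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rho * g1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms, clamped away from 0 and 1 for numerical safety.
        u1 = min(max(phi(g1), 1e-12), 1.0 - 1e-12)
        u2 = min(max(phi(g2), 1e-12), 1.0 - 1e-12)
        # Apply each marginal's inverse CDF independently of the dependence.
        pairs.append((ppf_x(u1), ppf_z(u2)))
    return pairs


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


def copula_rho_from_spearman(x, z):
    """Convert Spearman's rank correlation to the Gaussian copula parameter."""
    rho_s = pearson(average_ranks(x), average_ranks(z))
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def median_and_mad_scale(values):
    """Median and MAD rescaled to match the standard deviation of a normal."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, 1.4826 * mad


# Hypothetical marginals: shifted exponential for x, lognormal for z.
ppf_x = lambda u: 1.0 - math.log1p(-u) / 0.7
ppf_z = lambda u: math.exp(NormalDist(2.0, 0.45).inv_cdf(u))

pairs = sample_gaussian_copula(5000, rho=0.8, ppf_x=ppf_x, ppf_z=ppf_z)
xs = [p[0] for p in pairs]
zs = [p[1] for p in pairs]

rho_hat = copula_rho_from_spearman(xs, zs)
mu_hat, sigma_hat = median_and_mad_scale([math.log(z) for z in zs])
print(f"rho = {rho_hat:.3f}, log-scale location = {mu_hat:.3f}, scale = {sigma_hat:.3f}")
```

Rank correlation depends only on the ordering of the data, so the dependence parameter can be estimated without committing to any particular form for the marginals, which is the separation the copula approach relies on.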
Altmann, G. (1980). Prolegomena to Menzerath's law.
Menzerath, P. (1954). Die Architektonik des deutschen Wortschatzes.
Torre, I. G., et al. (2021). Can Menzerath's law be a criterion of complexity in communication?
Milička, J. (2023). Menzerath's law: Is it just regression toward the mean?
This paper makes a theoretical contribution to research on Menzerath's Law: it derives the classical Altmann model from a simple multiplicative stochastic process and shows that modeling the joint distribution with a Gaussian copula can fit empirical data more accurately.