
Strong consistency of pseudo-likelihood parameter estimator for univariate Gaussian mixture models

Lember, Kangro, Kuljus

We consider a new method for estimating the parameters of univariate Gaussian mixture models. The method relies on a nonparametric density estimator $\hat{f}_n$ (typically a kernel estimator). For every set of Gaussian mixture components, $\hat{f}_n$ is used to find the best set of mixture weights. That set is obtained by minimizing the $L_2$ distance between $\hat{f}_n$ and the Gaussian mixture density with the given component parameters. The densities together with the obtained weights are then plugged in to the likelihood function, resulting in the so-called pseudo-likelihood function. The final parameter estimators are the parameter values that maximize the pseudo-likelihood function together with the corresponding weights. The advantages of the pseudo-likelihood over the full likelihood are: 1) its arguments are the means and variances only, mixture weights are also functions of the means and variances; 2) unlike the likelihood function, it is always bounded above. Thus, the maximizer of the pseudo-likelihood function -- referred to as the pseudo-likelihood estimator -- always exists. In this article, we prove that the pseudo-likelihood estimator is strongly consistent.

Basic Information

Paper ID: 2510.14482
Title: Strong consistency of pseudo-likelihood parameter estimator for univariate Gaussian mixture models
Authors: Jüri Lember, Raul Kangro, Kristi Kuljus (Institute of Mathematics and Statistics, University of Tartu, Estonia)
Classification: math.ST stat.TH
Publication Date: October 16, 2025
Paper Link: https://arxiv.org/abs/2510.14482

Abstract

This paper proposes a novel method for estimating the parameters of univariate Gaussian mixture models. The method is based on a nonparametric density estimator $\hat{f}_n$ (typically a kernel estimator). For each given set of Gaussian mixture component parameters, optimal mixing weights are found by minimizing the $L_2$ distance between $\hat{f}_n$ and the Gaussian mixture density. The component densities and the obtained weights are then substituted into the likelihood function, forming the so-called pseudo-likelihood function. The final parameter estimator consists of the parameter values that maximize the pseudo-likelihood function, together with the corresponding weights. Compared to the full likelihood, the pseudo-likelihood has two advantages: 1) its arguments are only the means and variances, since the mixing weights are themselves functions of the means and variances; 2) unlike the likelihood function, it is always bounded above. Therefore, the maximizer of the pseudo-likelihood function, called the pseudo-likelihood estimator, always exists. This paper proves the strong consistency of the pseudo-likelihood estimator.

Research Background and Motivation

Problem Background

Unboundedness of likelihood for Gaussian mixture models: The likelihood function of a Gaussian mixture model is unbounded, a well-known problem. If a component mean is placed at an observation and that component's variance approaches zero, the likelihood tends to infinity.
Limitations of existing solutions:
- Restricting the parameter space
- Using sieve methods
- Penalized maximum likelihood estimation
- Bayesian methods
- Profile likelihood, etc.
These methods typically require imposing restrictions or penalty terms on variances.
Research motivation:
- Provide a method that does not require any restrictions on parameters
- Maintain similarity with standard maximum likelihood estimation
- Ensure existence and consistency of the estimator
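
The unboundedness described above is easy to reproduce numerically. The following is a minimal sketch, not taken from the paper: the five observations, the fixed weights $(0.5, 0.5)$, the second component $N(0.5, 1)$ and the helper names `normal_pdf` and `mixture_loglik` are all illustrative assumptions. It places the first component mean on an observation and shrinks its standard deviation.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, weights, mus, sigmas):
    """Ordinary (unnormalized) log-likelihood of a Gaussian mixture."""
    return sum(
        math.log(sum(w * normal_pdf(y, m, s)
                     for w, m, s in zip(weights, mus, sigmas)))
        for y in data
    )

data = [-1.3, -0.4, 0.2, 0.9, 2.1]  # made-up observations (assumption)
logliks = []
for sigma1 in (1.0, 1e-2, 1e-4, 1e-8):
    # Component 1 sits exactly on data[0]; component 2 stays fixed at N(0.5, 1).
    ll = mixture_loglik(data, [0.5, 0.5], [data[0], 0.5], [sigma1, 1.0])
    logliks.append(ll)
    print(f"sigma_1 = {sigma1:g}: log-likelihood = {ll:.3f}")
```

In this example the log-likelihood keeps growing as `sigma1` shrinks, roughly like $-\ln \sigma_1$, because the term belonging to `data[0]` diverges while the second component keeps every other term finite.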

Why It Matters

Gaussian mixture models are widely applied in statistics and machine learning
The unbounded likelihood problem hinders the application of standard MLE
There is a need for theoretically reliable and practically feasible estimation methods

Core Contributions

Proposes the pseudo-likelihood method: A novel parameter estimation method that determines mixing weights through $L_2$ distance minimization and then constructs the pseudo-likelihood function.
Proves strong consistency: For i.i.d. samples, proves that the pseudo-likelihood estimator is strongly consistent: $\hat{\theta}_n \xrightarrow{\text{a.s.}} \theta^*$ and $v_n(\hat{\theta}_n) \xrightarrow{\text{a.s.}} w^*$.
No parameter restrictions: The method does not require imposing lower bounds on variances or other constraints.
Theoretical framework: Establishes a complete theoretical framework for handling unbounded means, vanishing or unbounded variances.

Methodology Details

Problem Definition

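Given i.i.d. observations $Y_1, \ldots, Y_n$ from a $k$-component univariate Gaussian mixture distribution, the goal is to estimate:

Component parameters: $\theta_i = (\mu_i, \sigma_i)$, $i = 1, \ldots, k$
Mixing weights: $w_i > 0$, $\sum_{i=1}^k w_i = 1$

The true density is: $f(\cdot) = \sum_{i=1}^k w_i^* g(\theta_i^*, \cdot)$, where $g(\theta_i, \cdot)$ denotes the density of the normal distribution with mean $\mu_i$ and standard deviation $\sigma_i$, and starred quantities are the true values.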

Model Architecture

Step One: Weight Estimation

For given parameters $\theta = (\theta_1, \ldots, \theta_k)$, determine the weights by minimizing the $L_2$ distance:

$v_n(\theta) := \arg \inf_{w \in S_k} \|\hat{f}_n(\cdot) - \sum_{i=1}^k w_i g(\theta_i, \cdot)\|$

where $\|\cdot\|$ is the $L_2$ norm, $S_k$ is the $(k-1)$-dimensional simplex, and $\hat{f}_n$ is a nonparametric density estimator.
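
The weight step can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: $\hat{f}_n$ is taken to be a Gaussian-kernel estimator with a fixed bandwidth `h`, and the function names, the simulated data and the fixed-step projected gradient solver are all hypothetical choices. With a Gaussian kernel the squared distance is a quadratic in $w$, namely $w^\top A w - 2 b^\top w$ plus a constant, where $A_{ij} = \int g(\theta_i, x) g(\theta_j, x)\,dx$ and $b_i = \int \hat{f}_n(x) g(\theta_i, x)\,dx$ both have closed forms.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x (var is the variance)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum_i w_i = 1}."""
    u = sorted(v, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0.0:
            tau = t  # keep the threshold from the last index that passes
    return [max(x - tau, 0.0) for x in v]

def l2_weights(data, h, mus, sigmas, steps=2000):
    """Weights minimizing the L2 distance between a Gaussian-kernel
    estimate (bandwidth h) and the mixture with the given components."""
    k, n = len(mus), len(data)
    var = [s * s for s in sigmas]
    # A[i][j] = <g_i, g_j>: product of two normal densities integrates
    # to a normal density evaluated at the difference of the means.
    A = [[normal_pdf(mus[i], mus[j], var[i] + var[j]) for j in range(k)]
         for i in range(k)]
    # b[i] = <f_hat, g_i>: each kernel bump contributes one such integral.
    b = [sum(normal_pdf(y, mus[i], h * h + var[i]) for y in data) / n
         for i in range(k)]
    # Step 1/L, using trace(A) as an upper bound on the top eigenvalue of A.
    step = 1.0 / (2.0 * sum(A[i][i] for i in range(k)))
    w = [1.0 / k] * k
    for _ in range(steps):
        grad = [2.0 * (sum(A[i][j] * w[j] for j in range(k)) - b[i])
                for i in range(k)]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(k)])
    return w

# Simulated data from 0.3 N(-2, 1) + 0.7 N(2, 1) (assumption for the demo).
random.seed(0)
data = [random.gauss(-2.0, 1.0) if random.random() < 0.3 else random.gauss(2.0, 1.0)
        for _ in range(2000)]
w_hat = l2_weights(data, h=0.3, mus=[-2.0, 2.0], sigmas=[1.0, 1.0])
print(w_hat)
```

When the components are set to their true values, the returned weights should land near the true mixing weights. The fixed-step solver is only for illustration: it slows down badly when some $\sigma_i$ is very small, and any quadratic programming routine could replace it.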

Step Two: Pseudo-likelihood Construction

Substitute the obtained weights into the likelihood function:

$L_n(\theta) := \prod_{t=1}^n \left( \sum_{i=1}^k v_{n,i}(\theta) g(\theta_i, Y_t) \right)$

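Log pseudo-likelihood function (normalized by $n$): $\ell_n(\theta) := \frac{1}{n} \sum_{t=1}^n \ln\left( v_n(\theta)g(\theta, Y_t) \right)$, where $v_n(\theta)g(\theta, y)$ is shorthand for $\sum_{i=1}^k v_{n,i}(\theta) g(\theta_i, y)$, so that $\ell_n(\theta) = \frac{1}{n} \ln L_n(\theta)$.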

Step Three: Parameter Estimation

The pseudo-likelihood estimator is any $\hat{\theta}_n$ satisfying

$\ell_n(\hat{\theta}_n) \geq \sup_{\theta \in \Theta_o} \ell_n(\theta) - \epsilon_n$

where $\epsilon_n \searrow 0$, that is, $\hat{\theta}_n$ maximizes $\ell_n$ up to a vanishing tolerance.
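
Steps two and three can be illustrated for $k = 2$, where the weight problem reduces to a one-dimensional quadratic that is solved by clipping its stationary point to $[0, 1]$. Everything in this sketch is an assumption made for illustration: the Gaussian kernel with fixed bandwidth `h`, the simulated data, the function names, and the coarse grid search that stands in for a proper optimizer. It compares the pseudo-likelihood with the ordinary likelihood when one component collapses onto an observation, then picks the best grid point.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x (var is the variance)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def l2_weights_k2(data, h, mus, sigmas):
    """L2-optimal weights for k = 2 with a Gaussian-kernel f_hat."""
    v1, v2 = sigmas[0] ** 2, sigmas[1] ** 2
    a11 = normal_pdf(0.0, 0.0, 2.0 * v1)        # <g_1, g_1>
    a22 = normal_pdf(0.0, 0.0, 2.0 * v2)        # <g_2, g_2>
    a12 = normal_pdf(mus[0], mus[1], v1 + v2)   # <g_1, g_2>
    n = len(data)
    b1 = sum(normal_pdf(y, mus[0], h * h + v1) for y in data) / n
    b2 = sum(normal_pdf(y, mus[1], h * h + v2) for y in data) / n
    denom = a11 - 2.0 * a12 + a22               # ||g_1 - g_2||^2
    if denom <= 0.0:                            # identical components
        return [0.5, 0.5]
    w1 = min(max((a22 - a12 + b1 - b2) / denom, 0.0), 1.0)
    return [w1, 1.0 - w1]

def mixture_loglik(data, w, mus, sigmas):
    """Sum over the data of the log mixture density."""
    return sum(math.log(w[0] * normal_pdf(y, mus[0], sigmas[0] ** 2)
                        + w[1] * normal_pdf(y, mus[1], sigmas[1] ** 2))
               for y in data)

def log_pseudo_likelihood(data, h, mus, sigmas):
    """ell_n(theta): normalized log-likelihood with weights v_n(theta)."""
    w = l2_weights_k2(data, h, mus, sigmas)
    return mixture_loglik(data, w, mus, sigmas) / len(data)

# Simulated data from 0.4 N(-2, 1) + 0.6 N(2, 1) (assumption for the demo).
random.seed(1)
data = [random.gauss(-2.0, 1.0) if random.random() < 0.4 else random.gauss(2.0, 1.0)
        for _ in range(400)]
h, n = 0.3, len(data)

# Collapse component 1 onto data[0]; compare unnormalized log values.
pseudo, plain = [], []
for s in (1e-2, 1e-6, 1e-12):
    mus, sigmas = [data[0], 0.0], [s, 2.0]
    pseudo.append(n * log_pseudo_likelihood(data, h, mus, sigmas))
    plain.append(mixture_loglik(data, [0.5, 0.5], mus, sigmas))
print("pseudo:", pseudo)
print("plain: ", plain)

# Crude stand-in for step three: maximize ell_n over a coarse grid.
grid_mu = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
grid_sigma = [0.5, 1.0, 2.0]
candidates = [(m1, m2, s1, s2) for m1 in grid_mu for m2 in grid_mu if m1 < m2
              for s1 in grid_sigma for s2 in grid_sigma]
best = max(candidates,
           key=lambda t: log_pseudo_likelihood(data, h, [t[0], t[1]], [t[2], t[3]]))
print("best grid point (mu1, mu2, sigma1, sigma2):", best)
```

In this example the ordinary log-likelihood with fixed weights keeps increasing as the first standard deviation shrinks, while the pseudo-likelihood values level off, because the $L_2$-optimal weight of the collapsing component shrinks along with its standard deviation. The grid search is only a coarse approximation of the $\epsilon_n$-maximizer defined above.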

Technical Innovations

Two-step estimation strategy:
- Step one uses $L_2$ distance to estimate weights
- Step two uses likelihood method to estimate component parameters
- This combination ensures boundedness of the objective function
Uniqueness of the fitted density: Although the weights $v_n(\theta)$ may not be unique, the density $v_n(\theta)g(\theta, \cdot)$ is unique (Lemma 2.1).
Treatment of parameter space: Handles parameter non-identifiability (e.g., permutation invariance) through the concept of equivalence classes.
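
The uniqueness of the fitted density despite non-unique weights can be seen from a standard convexity argument, sketched below. This is a generic Hilbert-space argument and not necessarily the proof the paper gives for Lemma 2.1; the symbols $C(\theta)$, $d$, $p$ and $q$ are introduced here for illustration only.

```latex
% Assumed notation: C(\theta) is the set of mixture densities reachable with
% component parameters \theta; d is the minimal L_2 distance from \hat{f}_n.
\[
  C(\theta) = \Bigl\{ \textstyle\sum_{i=1}^k w_i\, g(\theta_i,\cdot) : w \in S_k \Bigr\}
  \quad\text{is a convex subset of } L_2 .
\]
% If p, q \in C(\theta) both attain the distance d, the parallelogram law gives
\[
  \Bigl\| \hat{f}_n - \tfrac{p+q}{2} \Bigr\|^2
  = \tfrac12 \| \hat{f}_n - p \|^2 + \tfrac12 \| \hat{f}_n - q \|^2
    - \tfrac14 \| p - q \|^2
  = d^2 - \tfrac14 \| p - q \|^2 .
\]
% Since (p+q)/2 \in C(\theta), the left-hand side is at least d^2, so p = q.
% The weights themselves need not be unique: if \theta_1 = \theta_2, then
\[
  w_1\, g(\theta_1,\cdot) + w_2\, g(\theta_2,\cdot)
  = (w_1 + w_2)\, g(\theta_1,\cdot) + 0 \cdot g(\theta_2,\cdot) .
\]
```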

Technical Challenges

Unbounded parameters: Must handle cases where means tend to infinity and variances tend to zero or infinity.
Randomness of weights: Weights $v_n(\theta)$ depend on the random estimator $\hat{f}_n$, so the standard strong law of large numbers cannot be applied directly.
Uniform convergence: Must establish uniform convergence over the entire parameter space, not just pointwise convergence.

Comparison with Existing Methods

Variance-restricted MLE:
- Chen (2017): Assumes all component variances are equal
- Tanaka & Takemura (2006): Requires a lower bound of $\exp[-n^d]$ on the component standard deviations
- Tanaka (2009): Imposes penalties on variance ratios
Distance-based estimation:
- Estimates the entire mixture model, weights and component parameters alike, by distance minimization
- This paper uses the distance criterion only for the weights and the likelihood for the component parameters
Doubly smoothed likelihood:
- Seo & Lindsay (2010, 2013): Smooths both empirical measure and specified distribution
- High computational complexity, requires Monte Carlo estimation

Advantages of This Paper

Theoretical guarantees: Provides strong consistency proof
Computational feasibility: Can be solved using standard optimization tools
No parameter restrictions: Does not require variance constraints
Preserves likelihood properties: Stays as close as possible to standard MLE properties

Extensibility Discussion

Beyond the i.i.d. Case

The paper discusses applicability of the method in more general settings:

Hidden Markov models: The hidden states $X_1, X_2, \ldots$ form a stationary ergodic process and $Y_t \mid X_t = i \sim N(\theta_i)$
General latent variable models: As long as ergodicity conditions are satisfied
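
To make the hidden Markov setting above concrete: when the hidden chain is stationary, each $Y_t$ is marginally distributed as a Gaussian mixture whose weights are the stationary distribution of the chain. The sketch below simulates such a process. The two-state transition matrix, the emission parameters and the function name `simulate_hmm` are illustrative assumptions and do not come from the paper.

```python
import random

def simulate_hmm(n, P, mus, sigmas, rng):
    """Simulate hidden states X_t (Markov chain with transition matrix P)
    and emissions Y_t | X_t = i ~ N(mus[i], sigmas[i]^2)."""
    states, ys = [], []
    state = 0  # arbitrary start; an ergodic chain forgets it over time
    for _ in range(n):
        states.append(state)
        ys.append(rng.gauss(mus[state], sigmas[state]))
        # Draw the next state from row `state` of the transition matrix.
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return states, ys

P = [[0.9, 0.1],
     [0.2, 0.8]]                   # hypothetical transition matrix
stationary = [2.0 / 3.0, 1.0 / 3.0]  # solves pi P = pi for this P
rng = random.Random(0)
states, ys = simulate_hmm(20000, P, [-2.0, 2.0], [1.0, 1.0], rng)

freq0 = states.count(0) / len(states)
mean_y = sum(ys) / len(ys)
print(f"empirical share of state 0: {freq0:.3f} (stationary: {stationary[0]:.3f})")
print(f"sample mean of Y: {mean_y:.3f}")
```

The observations `ys` are dependent, yet their empirical distribution approaches the mixture $\tfrac{2}{3} N(-2, 1) + \tfrac{1}{3} N(2, 1)$, which is the kind of ergodic behavior the extension relies on.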

Practical Applications

Signal denoising (generalization of the DUDE method)
Emission parameter estimation for hidden Markov models
General latent variable models

Conclusions and Discussion

Main Conclusions

The pseudo-likelihood estimator converges almost surely to the true parameters under mild conditions
The method avoids the unboundedness problem of traditional MLE
No artificial parameter restrictions are needed

Limitations

Kernel estimator requirements: Requires $\hat{f}_n \xrightarrow{\text{a.s.}} f$ and that $\|\hat{f}_n\|_\infty$ remains bounded
Bandwidth selection: The bandwidth of the kernel estimator must decrease sufficiently slowly
Computational complexity: For general $k$, the weight optimization problem has no closed-form solution
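
For readers who want to experiment with the kernel estimator requirements listed above, here is a minimal Gaussian kernel density estimator. The bandwidth rule used, $h_n = 0.9 \min(\text{sd}, \text{IQR}/1.34)\, n^{-1/5}$, is the common rule of thumb and is an assumption of this sketch; the summary does not state which bandwidth rate the paper requires. Function names and the simulated data are hypothetical as well.

```python
import math
import random
import statistics

def rule_of_thumb_bandwidth(data):
    """Rule-of-thumb bandwidth, shrinking like n^(-1/5) (an assumption here)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)

def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate f_hat_n(x) with bandwidth h."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in data)

rng = random.Random(0)

def sample(n):
    # Draws from 0.4 N(-2, 1) + 0.6 N(2, 1) (assumption for the demo).
    return [rng.gauss(-2.0, 1.0) if rng.random() < 0.4 else rng.gauss(2.0, 1.0)
            for _ in range(n)]

small, large = sample(250), sample(4000)
h_small = rule_of_thumb_bandwidth(small)
h_large = rule_of_thumb_bandwidth(large)

# Evaluate the estimate on a grid and check it behaves like a density.
grid = [-10.0 + 0.05 * i for i in range(401)]
values = [gaussian_kde(x, small, h_small) for x in grid]
mass = 0.05 * sum(values)                              # Riemann sum, about 1
sup_bound = 1.0 / (h_small * math.sqrt(2.0 * math.pi))  # each kernel's peak
print(f"h(n=250) = {h_small:.3f}, h(n=4000) = {h_large:.3f}")
print(f"mass = {mass:.4f}, max on grid = {max(values):.4f}, bound = {sup_bound:.4f}")
```

The crude bound $\|\hat{f}_n\|_\infty \leq 1/(h\sqrt{2\pi})$ printed at the end holds for any Gaussian-kernel estimate and grows as the bandwidth shrinks, so it does not by itself deliver the boundedness that the analysis requires.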

Future Directions

Establishment of asymptotic normality
Generalization to multivariate cases
Consistency under more general dependence structures
Study of finite sample properties

In-Depth Evaluation

Strengths

Theoretical rigor: Provides complete strong consistency proof, addressing various technical challenges
Methodological innovation: Cleverly combines distance and likelihood methods to solve a classical problem
Practical value: Method is computationally feasible without parameter constraints
Clear presentation: Well-structured paper with clear proof strategy

Weaknesses

Strong assumptions: Requires strong convergence conditions for kernel estimators
Computational efficiency: Weight optimization problem may be computationally complex
Finite sample properties: Lacks analysis of finite sample behavior
Experimental validation: Paper is primarily theoretical, lacking numerical experiments

Impact

Academic contribution: Provides new theoretical framework for Gaussian mixture model estimation
Practical value: Solves important problems in practical applications
Methodological significance: Demonstrates effectiveness of combining different criterion functions

Applicable Scenarios

Gaussian mixture model parameter estimation, especially with many components
Application scenarios requiring avoidance of parameter constraints
Emission parameter estimation for hidden Markov models
Density estimation in signal processing and pattern recognition

References

The paper cites 21 important references covering:

Classical mixture model theory (Teicher, 1963)
MLE consistency theory (Chen, 2017; van der Vaart, 2000)
Kernel density estimation theory (Silverman, 1978)
Distance-based estimation methods (Cutler & Cordero-Brana, 1996)
Related pseudo-likelihood methods (Kangro et al., 2025)

These references provide a solid foundation for the theoretical development of this paper.
