Extrapolation Problem for Multidimensional Stationary Sequences with Missing Observations

Masyutka, Moklyachuk, Sidei

This paper focuses on the problem of the mean square optimal estimation of linear functionals which depend on the unknown values of a multidimensional stationary stochastic sequence. Estimates are based on observations of the sequence with an additive stationary noise sequence. The aim of the paper is to develop methods of finding the optimal estimates of the functionals in the case of missing observations. The problem is investigated in the case of spectral certainty where the spectral densities of the sequences are exactly known. Formulas for calculating the mean-square errors and the spectral characteristics of the optimal linear estimates of functionals are derived under the condition of spectral certainty. The minimax (robust) method of estimation is applied in the case of spectral uncertainty, where spectral densities of the sequences are not known exactly while sets of admissible spectral densities are given. Formulas that determine the least favorable spectral densities and the minimax spectral characteristics of the optimal estimates of functionals are proposed for some special sets of admissible densities.

Basic Information

Paper ID: 2511.07228
Title: Extrapolation Problem for Multidimensional Stationary Sequences with Missing Observations
Authors: Oleksandr Masyutka, Mikhail Moklyachuk, Maria Sidei
Institution: Taras Shevchenko National University of Kyiv
Classification: math.ST (Statistics Theory), stat.TH
Published Journal: Statistics, Optimization and Information Computing, Vol. 7, March 2019, pp 97-117
Paper Link: https://arxiv.org/abs/2511.07228

Abstract

This paper investigates the mean-square optimal extrapolation problem for multidimensional stationary random sequences with missing observations. Estimation is based on observations of sequences with additive stationary noise. The study is conducted under two scenarios: spectral certainty and spectral uncertainty. Under spectral certainty, formulas are derived for computing the mean-square error and spectral characteristics of optimal linear estimates. Under spectral uncertainty, minimax-robust methods are applied to determine formulas for the least favorable spectral density and minimax spectral characteristics.

Research Background and Motivation

Problem Definition

The core problem addressed in this paper is: How can one optimally estimate linear functionals of multidimensional stationary random sequences when missing observations are present? Specifically:

Observation Model: The observed sequence is $\xi(j) + \eta(j)$ , where $\xi(j)$ is the signal sequence and $\eta(j)$ is the noise sequence
Missing Pattern: Observation points are $j \in \mathbb{Z}^- \setminus S$ , where $S = \bigcup_{l=1}^{s}\{-M_l-N_l, \ldots, -M_l\}$ represents missing observation segments
Estimation Target: Linear functional $A\xi = \sum_{j=0}^{\infty} a(j)^\top \xi(j)$
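
To make the index sets concrete, the following minimal Python sketch enumerates the missing set $S$ and a finite window of the observed indices $\mathbb{Z}^- \setminus S$. The function names `missing_set` and `observed_indices`, the finite window, and the reading of $\mathbb{Z}^-$ as $\{\ldots, -2, -1\}$ are illustrative assumptions, not notation taken from the paper.

```python
from typing import List, Set, Tuple


def missing_set(segments: List[Tuple[int, int]]) -> Set[int]:
    """Build S as the union of {-M_l - N_l, ..., -M_l} over (M_l, N_l) pairs."""
    s: Set[int] = set()
    for m, n in segments:
        s.update(range(-m - n, -m + 1))
    return s


def observed_indices(segments: List[Tuple[int, int]], window: int) -> List[int]:
    """Return the observed times j in {-window, ..., -1} that are not in S."""
    s = missing_set(segments)
    return [j for j in range(-window, 0) if j not in s]


# S = {-3, -2} corresponds to a single segment with M_1 = 2, N_1 = 1.
S = missing_set([(2, 1)])
obs = observed_indices([(2, 1)], window=6)
```

With one segment $(M_1, N_1) = (2, 1)$ this reproduces the missing set $\{-3, -2\}$ used in the paper's numerical example.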

Research Significance

Theoretical Value: Extends classical Kolmogorov-Wiener prediction theory to the missing observation scenario
Practical Importance: In real applications, sensor failures and data transmission interruptions frequently cause missing observations
Robustness Requirements: In practice, spectral density is often unknown or imprecisely known, necessitating robust estimation methods

Limitations of Existing Methods

Complete Observation Assumption: Traditional methods (Wiener, Yaglom, Rozanov, etc.) assume complete observations
Spectral Certainty Assumption: Most methods require precisely known spectral density, which is difficult to satisfy in practice
Univariate Limitations: Theory and methods for multidimensional cases are relatively underdeveloped

Research Motivation

The innovation of this paper lies in:

Extending Hilbert space projection methods to the missing observation scenario
Developing minimax robust estimation theory under spectral uncertainty
Providing a complete theoretical framework and computational formulas for the multidimensional case

Core Contributions

Theoretical Framework: Establishes a complete theoretical system for the extrapolation problem of multidimensional stationary sequences with missing observations
Spectral Certainty Case:
- Derives explicit spectral characteristic formulas for optimal linear estimates (Formula 10)
- Provides exact computational formulas for mean-square error (Formula 11)
Spectral Uncertainty Case:
- Develops minimax robust estimation methods
- Proposes characterization equations for least favorable spectral density
- Provides specific solutions for multiple special admissible spectral density classes
Special Cases: Provides corollaries for noise-free observations, uncorrelated noise, and other special cases
Computational Methods: Establishes a computable framework through operator equations and Fourier coefficients

Detailed Methodology

Task Definition

Input:

Observation sequence: $\{\xi(j) + \eta(j), j \in \mathbb{Z}^- \setminus S\}$
Missing set: $S = \bigcup_{l=1}^{s}\{-M_l-N_l, \ldots, -M_l\}$
Functional coefficients: $\{a(j), j=0,1,\ldots\}$ satisfying $\sum_{j=0}^{\infty}\sum_{k=1}^{T}|a_k(j)| < \infty$

Output:

Optimal estimate: $\hat{A}\xi = \int_{-\pi}^{\pi} h(e^{i\lambda})^\top (Z_\xi(d\lambda) + Z_\eta(d\lambda))$
Mean-square error: $\Delta(h; F, G) = E|A\xi - \hat{A}\xi|^2$

Constraints:

Minimality condition: $\int_{-\pi}^{\pi} \text{Tr}\left[(F(\lambda) + G(\lambda))^{-1}\right] d\lambda < \infty$

Theoretical Foundation: Hilbert Space Projection Method

The core method is based on Kolmogorov's Hilbert space projection theory:

Hilbert Space Construction:
- $H = L_2(\Omega, \mathcal{F}, P)$ : The Hilbert space of zero-mean random variables with finite second moment, with inner product $(\xi, \eta) = E\xi\bar{\eta}$
- $H_s(\xi + \eta)$ : Closed linear subspace generated by observed values $\{\xi_k(j) + \eta_k(j): j \in \mathbb{Z}^- \setminus S, k=1,\ldots,T\}$
Optimal Estimate Characterization: The optimal estimate $\hat{A}\xi$ is the orthogonal projection of $A\xi$ onto $H_s(\xi+\eta)$, satisfying:
- $\hat{A}\xi \in H_s(\xi + \eta)$
- $A\xi - \hat{A}\xi \perp H_s(\xi + \eta)$
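
The two projection conditions can be illustrated in a finite-dimensional analogue. In the sketch below, which is an assumption-laden toy and not the paper's construction, the columns of `X` stand in for the observed variables spanning $H_s(\xi + \eta)$, `y` stands in for $A\xi$, and the Euclidean inner product over simulated samples replaces $E\xi\bar{\eta}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns of X play the role of the observed variables; y plays the role of
# the target functional. Rows are simulated realizations (illustrative data).
X = rng.standard_normal((200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.standard_normal(200)

# Least squares returns the coefficients of the orthogonal projection of y
# onto the column space of X.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
residual = y - y_hat

# Condition 1: y_hat lies in the span of the columns (by construction).
# Condition 2: the residual is orthogonal to every column of X.
orthogonality_gap = np.abs(X.T @ residual).max()

# Orthogonality implies the Pythagorean split |y|^2 = |y_hat|^2 + |residual|^2.
pythagoras_gap = abs(y @ y - (y_hat @ y_hat + residual @ residual))
```

The squared norm of `residual` is the finite-sample counterpart of the mean-square error $\Delta(h; F, G)$.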

Solution for Spectral Certainty Case

Spectral Representation

Using spectral decomposition: $\xi(j) = \int_{-\pi}^{\pi} e^{ij\lambda} Z_\xi(d\lambda), \quad A\xi = \int_{-\pi}^{\pi} A(e^{i\lambda})^\top Z_\xi(d\lambda)$

where $A(e^{i\lambda}) = \sum_{j=0}^{\infty} a(j)e^{ij\lambda}$

Spectral Characteristic Equation

Through orthogonality conditions, the spectral characteristic $h(e^{i\lambda})$ satisfies:

$(A(e^{i\lambda}))^\top(F(\lambda) + F_{\xi\eta}(\lambda)) - (h(e^{i\lambda}))^\top F_\zeta(\lambda) = (C(e^{i\lambda}))^\top$

where $F_\zeta(\lambda) = F(\lambda) + F_{\xi\eta}(\lambda) + F_{\eta\xi}(\lambda) + G(\lambda)$ , $C(e^{i\lambda}) = \sum_{j \in U} c(j)e^{ij\lambda}$ , $U = S \cup \{0,1,\ldots\}$

Operator Equation

Introducing Fourier coefficients: $B(k-j) = \frac{1}{2\pi}\int_{-\pi}^{\pi} (F_\zeta(\lambda))^{-1}e^{-i(k-j)\lambda}d\lambda$

$R(k-j) = \frac{1}{2\pi}\int_{-\pi}^{\pi} (F(\lambda) + F_{\xi\eta}(\lambda))(F_\zeta(\lambda))^{-1}e^{-i(k-j)\lambda}d\lambda$

Unknown coefficients $c(k), k \in U$ are determined by the operator equation: $Ra = Bc$

where operators $B, R$ are defined by corresponding block matrices accounting for the missing observation structure.
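
A truncated version of the operator equation can be sketched numerically. The code below assumes that $B$ is the block matrix whose $(k, j)$ block is $B(k-j)$ for $k, j \in U$, and that only the first `N` future indices are kept; both the indexing convention and the truncation are assumptions made for illustration. The coefficients are those of the paper's Example 2.1 with arbitrarily chosen $b_1 = 0.5$, $b_2 = 0.3$, and the vector `v` is a placeholder standing in for $Ra$.

```python
import numpy as np


def block_operator(coeff, index_set):
    """Assemble the matrix whose (k, j) block is coeff(k - j) for k, j in index_set."""
    idx = list(index_set)
    t = coeff(0).shape[0]
    mat = np.zeros((t * len(idx), t * len(idx)))
    for r, k in enumerate(idx):
        for c, j in enumerate(idx):
            mat[r * t:(r + 1) * t, c * t:(c + 1) * t] = coeff(k - j)
    return mat


def example_coeff(n, b1=0.5, b2=0.3):
    """Fourier coefficients B(n) of Example 2.1; they vanish for |n| > 1."""
    if n == 0:
        return np.array([[2 + b1**2 + b2**2, -1 - b2**2],
                         [-1 - b2**2, 1 + b2**2]])
    if abs(n) == 1:
        return np.array([[-b1 - b2, b2], [b2, -b2]])
    return np.zeros((2, 2))


# U = S ∪ {0, 1, ...} with S = {-3, -2}; keep only N future indices.
N = 5
U_trunc = [-3, -2] + list(range(N))
B = block_operator(example_coeff, U_trunc)

# Placeholder right-hand side standing in for Ra; solve the truncated B c = v.
v = np.ones(B.shape[0])
c = np.linalg.solve(B, v)
```

Under these assumptions the truncated matrix is symmetric positive definite, and because $B(n) = 0$ for $|n| > 1$ while the indices $-2$ and $0$ are two steps apart, the block for the missing positions decouples from the block for the future positions.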

Optimal Spectral Characteristic (Theorem 2.1)

$(h(e^{i\lambda}))^\top = (A(e^{i\lambda}))^\top(F(\lambda) + F_{\xi\eta}(\lambda))(F_\zeta(\lambda))^{-1} - \left(\sum_{k \in U}(B^{-1}Ra)(k)e^{ik\lambda}\right)^\top(F_\zeta(\lambda))^{-1}$

Mean-Square Error

$\Delta(h; F, G) = \langle Ra, B^{-1}Ra \rangle + \langle Qa, a \rangle$

where $Q$ is a linear operator defined by Fourier coefficients $Q(k-j)$ .

Spectral Uncertainty Case: Minimax Method

Basic Concepts

Least Favorable Spectral Density (Definition 3.1): $(F^0, G^0) \in \mathcal{D}$ is called least favorable if $\Delta(h(F^0, G^0); F^0, G^0) = \max_{(F,G) \in \mathcal{D}} \Delta(h(F,G); F, G)$

Minimax Spectral Characteristic (Definition 3.2): $h^0 \in H_{\mathcal{D}}$ is called minimax if $\min_{h \in H_{\mathcal{D}}} \max_{(F,G) \in \mathcal{D}} \Delta(h; F, G) = \max_{(F,G) \in \mathcal{D}} \Delta(h^0; F, G)$

Optimization Problem

The minimax problem is equivalent to constrained optimization: $\max_{(F,G) \in \mathcal{D}} (\langle Ra, B^{-1}Ra \rangle + \langle Qa, a \rangle)$

Transformed to unconstrained optimization: $\Delta_{\mathcal{D}}(F,G) = -\Delta(h(F^0, G^0); F, G) + \delta((F,G)|\mathcal{D}) \to \inf$

where $\delta$ is the indicator function.

Optimality Conditions

The least favorable spectral density is determined by subdifferential conditions: $0 \in \partial \Delta_{\mathcal{D}}(F^0, G^0)$

Using Lagrange multiplier methods and subdifferential forms, specific characterization equations can be derived.
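
The link between Definitions 3.1 and 3.2 is the standard saddle-point argument of minimax estimation, given here as a sketch rather than a quotation from the paper. Suppose $h^0 = h(F^0, G^0)$ belongs to $H_{\mathcal{D}}$. Then $h^0$ is minimax and $(F^0, G^0)$ is least favorable provided that:

```latex
\Delta(h; F^0, G^0) \;\ge\; \Delta(h^0; F^0, G^0) \;\ge\; \Delta(h^0; F, G)
\qquad \forall\, h \in H_{\mathcal{D}},\ \forall\, (F, G) \in \mathcal{D}.
```

The left inequality holds because $h^0$ is the optimal characteristic for $(F^0, G^0)$. The right inequality says that $(F^0, G^0)$ maximizes $\Delta(h^0; F, G)$ over $\mathcal{D}$, which is the problem written above with the indicator function $\delta$.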

Special Admissible Spectral Density Classes

The paper considers multiple special categories, for example:

Class $\mathcal{D}^1_0 \times \mathcal{D}^{UV}_1$

$\mathcal{D}^1_0 = \left\{F(\lambda) \left| \frac{1}{2\pi}\int_{-\pi}^{\pi} \text{Tr}F(\lambda)d\lambda = p\right.\right\}$

$\mathcal{D}^{UV}_1 = \left\{G(\lambda) \left| \text{Tr}V(\lambda) \leq \text{Tr}G(\lambda) \leq \text{Tr}U(\lambda), \frac{1}{2\pi}\int_{-\pi}^{\pi}\text{Tr}G(\lambda)d\lambda = q\right.\right\}$

Least Favorable Spectral Density Equation (Theorem 4.1): $(r^0_G(\lambda))^*(r^0_G(\lambda))^\top = \alpha^2(F^0(\lambda) + G^0(\lambda))^2$

$(r^0_F(\lambda))^*(r^0_F(\lambda))^\top = (\beta^2 + \gamma_1(\lambda) + \gamma_2(\lambda))(F^0(\lambda) + G^0(\lambda))^2$

where $\alpha^2, \beta^2$ are Lagrange multipliers, $\gamma_1(\lambda) \leq 0$ (equals 0 when $\text{Tr}G^0(\lambda) > \text{Tr}V(\lambda)$ ), $\gamma_2(\lambda) \geq 0$ (equals 0 when $\text{Tr}G^0(\lambda) < \text{Tr}U(\lambda)$ ).
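
Membership in a class such as $\mathcal{D}^1_0$ reduces to checking the normalized trace integral. The sketch below approximates $\frac{1}{2\pi}\int_{-\pi}^{\pi}\text{Tr}F(\lambda)d\lambda$ by a uniform Riemann sum; the helper name `mean_trace`, the grid size, and the use of the Example 2.1 density with $b_1 = 0.5$, $b_2 = 0.3$ are illustrative assumptions.

```python
import numpy as np


def mean_trace(density, n_grid=4096):
    """Approximate (1 / 2π) ∫ Tr F(λ) dλ over [-π, π) by a uniform Riemann sum."""
    lam = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    return float(np.mean([np.trace(density(x)).real for x in lam]))


def example_F(lam, b1=0.5, b2=0.3):
    """Spectral density matrix of Example 2.1 (parameter values are illustrative)."""
    f = 1.0 / abs(1 - b1 * np.exp(1j * lam)) ** 2
    g = 1.0 / abs(1 - b2 * np.exp(1j * lam)) ** 2
    return np.array([[f, f], [f, f + g]])


# The value of p for which example_F would belong to the trace-constrained class.
p = mean_trace(example_F)

# Closed form for comparison: (1 / 2π) ∫ |1 - b e^{iλ}|^{-2} dλ = 1 / (1 - b^2).
p_exact = 2.0 / (1 - 0.5**2) + 1.0 / (1 - 0.3**2)
```

The closed form follows from Parseval's identity applied to the geometric series $\sum_n b^n e^{in\lambda}$, so it serves as a check on the quadrature.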

Other Classes

The paper also considers:

$\mathcal{D}^2_0 \times \mathcal{D}^{UV}_2$ : Diagonal element constraints
$\mathcal{D}^3_0 \times \mathcal{D}^{UV}_3$ : Weighted trace constraints
$\mathcal{D}^4_0 \times \mathcal{D}^{UV}_4$ : Matrix inequality constraints
$\mathcal{D}_\epsilon \times \mathcal{D}^1_\delta$ : $\epsilon$ -contamination and $\delta$ -neighborhood models

Each category provides corresponding characterization equations.

Experimental Setup

Numerical Example (Example 2.1)

The paper provides a concrete two-dimensional sequence extrapolation example:

Problem Setup:

Functional: $A_1\xi = a(0)^\top\xi(0) + a(1)^\top\xi(1)$ , where $a(0) = a(1) = (1,1)^\top$
Sequence: $\xi_1(n) = \xi(n)$ , $\xi_2(n) = \xi(n) + \eta(n)$
Missing set: $S = \{-3, -2\}$
Spectral density: $f(\lambda) = \frac{1}{|1-b_1e^{i\lambda}|^2}, \quad g(\lambda) = \frac{1}{|1-b_2e^{i\lambda}|^2}$
Spectral density matrix: $F(\lambda) = \begin{pmatrix} f(\lambda) & f(\lambda) \\ f(\lambda) & f(\lambda) + g(\lambda) \end{pmatrix}$

Computational Steps

Inverse Spectral Density Matrix: $(F(\lambda))^{-1} = \begin{pmatrix} \frac{1}{f(\lambda)} + \frac{1}{g(\lambda)} & -\frac{1}{g(\lambda)} \\ -\frac{1}{g(\lambda)} & \frac{1}{g(\lambda)} \end{pmatrix} = B(-1)e^{-i\lambda} + B(0) + B(1)e^{i\lambda}$
Fourier Coefficients: $B(0) = \begin{pmatrix} 2+b_1^2+b_2^2 & -1-b_2^2 \\ -1-b_2^2 & 1+b_2^2 \end{pmatrix}, \quad B(1) = B(-1) = \begin{pmatrix} -b_1-b_2 & b_2 \\ b_2 & -b_2 \end{pmatrix}$
Operator Matrix: Construct block matrix $B$ accounting for missing positions $\{-3, -2\}$ and future positions $\{0, 1, 2, \ldots\}$
Spectral Factorization: Utilize factorization $(F(\lambda))^{-1} = \left(\sum_{j=0}^{\infty}\psi(j)e^{-ij\lambda}\right) \cdot \left(\sum_{j=0}^{\infty}\psi(j)e^{-ij\lambda}\right)^*$
where $\psi(0) = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}$ , $\psi(1) = \begin{pmatrix} -b_1 & -b_2 \\ 0 & b_2 \end{pmatrix}$ , and $\psi(j) = 0$ for $j \geq 2$
Inverse Operator Computation: $B^{-1}_{11}(i,j) = (\Theta^*\Theta)(i,j) = \sum_{l=0}^{\min(i,j)}(\theta(i-l))^*\theta(j-l)$
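
The first steps above can be checked numerically. The sketch below compares $(F(\lambda))^{-1}$ with the trigonometric polynomial $B(-1)e^{-i\lambda} + B(0) + B(1)e^{i\lambda}$ and with the product $\Psi\Psi^*$, where $\Psi(\lambda) = \psi(0) + \psi(1)e^{-i\lambda}$. The values $b_1 = 0.5$, $b_2 = 0.3$ and the grid of 50 frequencies are arbitrary choices made for illustration.

```python
import numpy as np

b1, b2 = 0.5, 0.3  # illustrative values with |b1|, |b2| < 1 (an assumption)

B0 = np.array([[2 + b1**2 + b2**2, -1 - b2**2], [-1 - b2**2, 1 + b2**2]])
B1 = np.array([[-b1 - b2, b2], [b2, -b2]])
psi0 = np.array([[1.0, 1.0], [0.0, -1.0]])
psi1 = np.array([[-b1, -b2], [0.0, b2]])


def F(lam):
    """Spectral density matrix F(λ) of the example."""
    f = 1.0 / abs(1 - b1 * np.exp(1j * lam)) ** 2
    g = 1.0 / abs(1 - b2 * np.exp(1j * lam)) ** 2
    return np.array([[f, f], [f, f + g]])


max_err_poly = 0.0
max_err_factor = 0.0
for lam in np.linspace(-np.pi, np.pi, 50):
    F_inv = np.linalg.inv(F(lam))
    # Trigonometric polynomial B(-1) e^{-iλ} + B(0) + B(1) e^{iλ}, with B(-1) = B(1).
    poly = B1 * np.exp(-1j * lam) + B0 + B1 * np.exp(1j * lam)
    # Factor Ψ(λ) = ψ(0) + ψ(1) e^{-iλ}; the factorization claims F^{-1} = Ψ Ψ^*.
    Psi = psi0 + psi1 * np.exp(-1j * lam)
    max_err_poly = max(max_err_poly, np.abs(F_inv - poly).max())
    max_err_factor = max(max_err_factor, np.abs(F_inv - Psi @ Psi.conj().T).max())
```

Both error values stay at round-off level, which is consistent with the stated inverse, Fourier coefficients, and factorization.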

Final Results

Spectral Characteristic: $(h_1(e^{i\lambda}))^\top = -(b_2 + b_2^2 - 2(b_1 + b_1^2), -b_2 - b_2^2)e^{-i\lambda}$

Mean-Square Error: $\Delta(h_1; F) = 10 + 8b_1 + 4b_1^2 + 2b_2 + b_2^2$

This example demonstrates:

How to handle block structure of missing observations
How to utilize spectral factorization to simplify computations
Explicit form of optimal spectral characteristics

Experimental Results

Theoretical Results Verification

The paper verifies the feasibility of the theoretical framework through Example 2.1:

Simplicity of Spectral Characteristics: The optimal spectral characteristic consists of a single harmonic, $e^{-i\lambda}$, so the estimate uses only the observation at $j = -1$; observations before the gap $\{-3, -2\}$ carry no weight
Computability of Error: The mean-square error expression is a simple polynomial in parameters $b_1, b_2$ , facilitating analysis and optimization
Parameter Effects:
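- For $b_1, b_2 > -1$ the error increases in both parameters, since the partial derivatives $8(1 + b_1)$ and $2(1 + b_2)$ of the error polynomial are positive (for positive values this means stronger autocorrelation of signal and noise)
- Error is more sensitive to $b_1$: at equal parameter values its partial derivative is four times that of $b_2$ (signal autocorrelation has the larger impact)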
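
The parameter effects can be read off the error polynomial directly. The sketch below evaluates the reported expression and its partial derivatives; the function names `mse` and `mse_gradient` and the evaluation point are illustrative assumptions.

```python
def mse(b1: float, b2: float) -> float:
    """Reported mean-square error of Example 2.1: 10 + 8 b1 + 4 b1^2 + 2 b2 + b2^2."""
    return 10 + 8 * b1 + 4 * b1**2 + 2 * b2 + b2**2


def mse_gradient(b1: float, b2: float) -> tuple:
    """Partial derivatives (d/db1, d/db2) of the polynomial above."""
    return (8 + 8 * b1, 2 + 2 * b2)


# At equal parameter values the b1-derivative is four times the b2-derivative.
g1, g2 = mse_gradient(0.5, 0.5)
ratio = g1 / g2
baseline = mse(0.0, 0.0)
```

At $b_1 = b_2 = 0$ the polynomial reduces to its constant term, 10.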

Method Advantages

Compared to existing methods:

Completeness: Provides a complete framework from problem modeling to concrete computation
Generality: Applicable to multidimensional sequences and to missing patterns made up of finitely many segments
Robustness: Minimax method handles spectral uncertainty
Computability: Implemented through operator equations and Fourier coefficients

Theoretical Guarantees

The paper provides multiple theorems guaranteeing:

Theorem 2.1: Existence and uniqueness of optimal solution under spectral certainty
Theorems 4.1, 5.1: Characterization of least favorable spectral density under different admissible classes
Corollaries 2.1-2.4, 4.1-4.2, 5.1-5.2: Simplified results for special cases

Classical Theoretical Foundation

Kolmogorov (1941): First proposed spectral methods for stationary sequence prediction
Wiener (1949): Developed continuous-time filtering theory
Yaglom (1955, 1987): Systematically studied related theory of stationary processes
Rozanov (1967): Multidimensional stationary process theory
Hannan (1970): Multivariate time series analysis

Missing Observation Problems

Bondon (2002, 2005): Prediction with incomplete past
Cheng & Pourahmadi (1996, 1998): Extremal problems and interpolation in $L^p(w)$ spaces
Kasahara, Pourahmadi & Inoue (2009): Dual methods for missing value prediction
Pelagatti (2015): Time series modeling with unobservable components

Robust Estimation Methods

Grenander (1957): First proposed minimax methods for stationary process extrapolation
Kassam & Poor (1985): Survey of robust techniques in signal processing
Franke (1984, 1985): Robust prediction and interpolation for time series
Franke & Poor (1984): Minimax robust filtering
Vastola & Poor (1983): Analysis of spectral uncertainty effects on Wiener filtering

Authors' Prior Work

Moklyachuk (2008, 2015): Robust estimation of stationary sequence functionals
Moklyachuk & Masyutka (2008-2012): Minimax prediction for multidimensional stationary processes
Moklyachuk & Sidei (2015-2017): Interpolation, extrapolation and filtering with missing observations
Luz & Moklyachuk (2015-2016): Estimation for stationary increment processes

Unique Contributions of This Paper

Compared to existing work:

Systematicity: First systematic study of extrapolation for multidimensional sequences with missing observations
Completeness: Addresses both spectral certainty and uncertainty cases
Generality: Considers multiple missing patterns and admissible spectral density classes
Operability: Provides explicit computational formulas and operator equations

Conclusions and Discussion

Main Conclusions

Theoretical Framework: Successfully establishes a complete theoretical system for extrapolation of multidimensional stationary sequences with missing observations
Spectral Certainty Results:
- Optimal spectral characteristic is determined by operator equation $Ra = Bc$ and formula (10)
- Mean-square error can be precisely computed via formula (11)
- Method applies to correlated and uncorrelated noise
Spectral Uncertainty Results:
- Least favorable spectral density is characterized by subdifferential condition $0 \in \partial\Delta_{\mathcal{D}}(F^0, G^0)$
- Explicit Lagrange equations provided for multiple special admissible classes
- Minimax estimate possesses saddle-point property
Computational Methods: Achieves computable framework through Fourier coefficients and operator matrices

Limitations

Computational Complexity:
- Requires solving infinite-dimensional operator equations (truncation needed in practice)
- Computation of inverse operator $B^{-1}$ may be difficult
- Matrix dimension increases with number of missing segments
Theoretical Assumptions:
- Requires minimality condition (1) or (12) to hold
- Assumes operator $B$ is invertible (see Salehi 1979)
- Functional coefficients must satisfy absolute summability condition (3)
Spectral Uncertainty:
- Only considers specific admissible spectral density classes
- Numerical solution of least favorable spectral density may be complex
- Does not discuss how to estimate admissible classes from data
Practical Applicability:
- Lacks large-scale numerical experiments
- Not combined with real data applications
- Lacks numerical comparison with other methods

Future Directions

Research directions suggested by the paper:

Algorithm Development:
- Efficient numerical algorithms for solving operator equations
- Approximation methods for large-scale problems
- Adaptive truncation dimension selection
Theoretical Extensions:
- Generalization to non-stationary sequences
- Periodically correlated sequences (partial work exists)
- Stationary increment sequences (partial work exists)
Application Research:
- Real problems in signal processing
- Financial time series analysis
- Sensor network data fusion
Statistical Inference:
- Spectral density estimation from data
- Methods for admissible class selection
- Confidence intervals and hypothesis testing

In-Depth Evaluation

Strengths

1. Theoretical Rigor

Solid Mathematical Foundation: Based on Hilbert space theory and convex optimization theory
Complete Proofs: Clear logical flow of theorems and corollaries with explicit conditions
Standardized Notation: Mathematical symbols used consistently and clearly

2. Method Innovation

Missing Observation Handling: Cleverly embeds missing structure into operator matrices
Minimax Framework: Systematically develops robust estimation under spectral uncertainty
Multidimensional Generalization: Successfully handles complexity of multidimensional cases

3. Result Completeness

Multiple Cases: Covers correlated and uncorrelated noise, as well as observations with and without noise
Multiple Spectral Classes: Considers 8 different admissible spectral density classes
Explicit Formulas: Provides computable explicit expressions

4. Literature Review

Clear Historical Context: From Kolmogorov to latest work
Comprehensive References: Includes 41 references
Accurate Positioning: Clearly states relationship to existing work

Weaknesses

1. Insufficient Experimental Verification

Only One Example: Example 2.1 is too simple (two-dimensional, simple missing pattern)
Lacks Numerical Comparison: No numerical comparison with other methods
No Real Data: Not validated on real datasets

2. Readability Issues

Heavy Notation: Abundant matrix and operator symbols, high reading threshold
Complex Structure: Block matrix structure description lacks intuitive presentation
Missing Visualizations: No figures or diagrams to aid understanding

3. Practical Considerations

Computational Cost: Algorithm complexity and computational efficiency not discussed
Parameter Selection: Lacks practical guidance for choosing admissible class parameters
Software Implementation: No code or software package provided

4. Theoretical Limitations

Invertibility Assumption: Invertibility condition for operator $B$ not sufficiently clear
Convergence Analysis: Truncation error analysis for infinite-dimensional problems missing
Stability: Numerical stability not discussed

Impact Assessment

Academic Value

Theoretical Contribution: ★★★★☆
- Fills gap in extrapolation theory with missing observations
- Provides systematic framework for subsequent research
Method Innovation: ★★★★☆
- Operator equation method for missing observations is innovative
- Systematic development of minimax framework has value

Practical Value

Application Potential: ★★★☆☆
- Theory is complete but practical applicability needs verification
- Requires more real application cases
Reproducibility: ★★☆☆☆
- Theoretical formulas complete but algorithm details insufficient
- Lacks code and numerical experiments

Scope of Impact

Time Series Analysis: Provides theoretical tools for missing data handling
Signal Processing: Applicable to sensor data fusion
Financial Engineering: Missing data handling in high-frequency trading
Statistics: Development of robust estimation theory

Applicable Scenarios

Ideal Application Scenarios

Sensor Networks: Data loss due to sensor failures
Communication Systems: Signal reconstruction from packet loss
Financial Time Series: Prediction with irregular trading times
Environmental Monitoring: Imputation of missing weather station data

Inapplicable Scenarios

Non-stationary Processes: Method assumes stationarity
Nonlinear Systems: Only considers linear functionals
High-dimensional Large-scale: Computational complexity may be prohibitive
Completely Unknown Spectrum: Requires some prior information

Reference Highlights

The paper cites classical and cutting-edge works in the field:

Foundational Works:
- Kolmogorov (1992): Random process prediction theory
- Wiener (1966): Filtering and prediction theory
- Yaglom (1987): Related theory
Methodology:
- Grenander (1957): Minimax methods
- Franke (1984, 1985): Robust prediction
- Pshenichnyj (1971): Convex optimization
Missing Observations:
- Bondon (2002, 2005)
- Pourahmadi et al. (2007, 2009)
Authors' Series Work: Demonstrates continuity and depth of research

Summary

This is a high-quality academic paper with rigorous theory and systematic methodology. Main strengths include:

Establishes complete theoretical framework for extrapolation with missing observations
Addresses both spectral certainty and uncertainty cases
Provides explicit solutions for multiple special cases

Main weaknesses are:

Weak experimental verification with only one simple example
Insufficient practical considerations, lacking algorithms and code
Readability issues with heavy notation

Recommendation Index: ★★★★☆ (for theory researchers) / ★★★☆☆ (for applied researchers)

The paper makes important theoretical contributions to time series analysis and robust estimation, but follow-up work on algorithm implementation and practical applications is needed to supplement and verify it.