Ensemble data assimilation to diagnose AI-based weather prediction model: A case with ClimaX version 0.3.1

Kotsuki, Shiraishi, Okazaki

Artificial intelligence (AI)-based weather prediction research is growing rapidly and has been shown to be competitive with advanced dynamical numerical weather prediction (NWP) models. However, research combining AI-based weather prediction models with data assimilation remains limited, partly because long-term sequential data assimilation cycles are required to evaluate data assimilation systems. This study proposes using ensemble data assimilation for diagnosing AI-based weather prediction models, and marks the first successful implementation of an ensemble Kalman filter with AI-based weather prediction models. Our experiments with the AI-based model ClimaX demonstrated that the ensemble data assimilation cycled stably for the AI-based weather prediction model using covariance inflation and localization techniques within the ensemble Kalman filter. While ClimaX showed some limitations in capturing flow-dependent error covariance compared to dynamical models, the AI-based ensemble forecasts provided reasonable and beneficial error covariance in sparsely observed regions. In addition, ensemble data assimilation revealed that error growth based on ensemble ClimaX predictions was weaker than that of dynamical NWP models, leading to higher inflation factors. A series of experiments demonstrated that ensemble data assimilation can be used to diagnose properties of AI weather prediction models such as physical consistency and accurate error growth representation.

Basic Information

Paper ID: 2407.17781
Title: Ensemble data assimilation to diagnose AI-based weather prediction model: A case with ClimaX version 0.3.1
Authors: Shunji Kotsuki, Kenta Shiraishi, Atsushi Okazaki (Chiba University)
Classification: cs.LG stat.AP
Publication Date: July 2024
Paper Link: https://arxiv.org/abs/2407.17781

Abstract

Artificial intelligence (AI) weather forecasting research has developed rapidly and demonstrated competitiveness with advanced dynamical numerical weather prediction (NWP) models. However, research combining AI weather prediction models with data assimilation remains limited, partly because evaluating data assimilation systems requires long sequential data assimilation cycles. This study proposes using ensemble data assimilation to diagnose AI weather prediction models and successfully implements the integration of ensemble Kalman filtering with AI weather prediction models for the first time. Experiments based on the AI model ClimaX demonstrate that ensemble data assimilation can operate stably through sequential cycles by employing covariance inflation and localization techniques within the ensemble Kalman filter. Although ClimaX exhibits limitations compared to dynamical models in capturing flow-dependent error covariance, AI ensemble forecasts provide reasonable and beneficial error covariance in sparsely observed regions. Furthermore, ensemble data assimilation reveals that error growth from ClimaX ensemble forecasts is weaker than that from dynamical NWP models, resulting in higher inflation factors. A series of experiments demonstrate that ensemble data assimilation can be used to diagnose properties of AI weather prediction models such as physical consistency and accurate error growth representation.

Research Background and Motivation

Problem Background

Intensifying extreme weather threats: Extreme weather events caused by climate change are becoming increasingly severe, with the World Economic Forum listing extreme weather as one of the most serious global threats
Rapid development of AI weather forecasting: Since Google DeepMind released GraphCast in December 2022, deep learning weather forecasting research has grown rapidly, including Huawei's Pangu-Weather, Microsoft's ClimaX and Stormer, and NVIDIA's FourCastNet
Lagging data assimilation research: Although AI weather prediction models can now compete with state-of-the-art NWP models, research combining AI models with data assimilation remains limited

Research Motivation

Technical challenges: The requirement for long sequential data assimilation experiments makes it difficult to evaluate data assimilation systems for AI models
Methodological gaps: While research combining variational data assimilation with AI models exists, no successful integration of an ensemble Kalman filter with an AI model had been reported before this study
Diagnostic needs: Effective methods are needed to diagnose properties of AI weather prediction models, such as physical consistency and error growth representation

Core Contributions

First successful implementation: First successful integration of the Local Ensemble Transform Kalman Filter (LETKF) with an AI weather prediction model (ClimaX)
Stable cyclic operation: Demonstrates that ensemble data assimilation for AI models can operate stably for one year through covariance inflation and localization techniques
Diagnostic framework establishment: Establishes a framework for diagnosing AI weather prediction model characteristics using ensemble data assimilation
Important findings: Reveals limitations of AI models compared to dynamical models in error growth and physical consistency
Technical improvements: Extended ClimaX to support forecasting of more variables to meet data assimilation requirements

Methodology Details

Task Definition

The core task of this research is to apply ensemble data assimilation techniques to AI weather prediction models to diagnose their characteristics and evaluate their performance in data assimilation systems. The input consists of atmospheric observations and AI model forecasts, while the output is the assimilated analysis field.

Model Architecture

ClimaX Model

Base architecture: Global atmospheric AI weather prediction model based on Vision Transformer (ViT)
Resolution settings: 64×32 grid points (5.625°×5.625°), 7 vertical levels (925, 850, 700, 600, 500, 250, 50 hPa)
Key components: Variable tokenization and variable aggregation
Extended improvements: Expanded from the default 5 forecast variables to the complete variable set shown in Table 1, supporting data assimilation requirements
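
As a rough illustration of the grid size this resolution implies, the following sketch builds the 64×32 horizontal grid. It assumes cell-centred latitudes and longitudes starting at 0°E, as in the WeatherBench 5.625° files; the variable names are illustrative and not taken from the ClimaX code.

```python
# Sketch of the 64 x 32 (5.625 degree) horizontal grid described above.
# Assumption: cell-centred coordinates starting at 0 degrees east,
# as in the WeatherBench 5.625 degree files; all names are illustrative.
N_LON, N_LAT, N_LEV = 64, 32, 7

d_lon = 360.0 / N_LON   # grid spacing in longitude (degrees)
d_lat = 180.0 / N_LAT   # grid spacing in latitude (degrees)

lons = [i * d_lon for i in range(N_LON)]
lats = [-90.0 + (j + 0.5) * d_lat for j in range(N_LAT)]

# Number of grid points per level and per 7-level variable
points_per_level = N_LON * N_LAT
points_per_3d_variable = points_per_level * N_LEV
```

At this resolution each 7-level variable contributes only a few thousand state elements per level, which is part of why a one-year cycling experiment is affordable.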

LETKF Data Assimilation System

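Ensemble state matrix update equation (following Hunt et al., 2007):

X^a = x̄^b · 1^T + δX^b [ P̃^a Y^T R^-1 (y^o - ȳ^b) · 1^T + √(m-1) (P̃^a)^(1/2) ]

Where the analysis error covariance matrix in ensemble space is:

P̃^a = [(m-1) I + Y^T R^-1 Y]^-1

Here m is the ensemble size, x̄^b and δX^b are the background ensemble mean and perturbations, ȳ^b is the ensemble mean of H(X^b), Y = H(X^b) - ȳ^b · 1^T holds the background perturbations in observation space, y^o is the observation vector, R is the observation error covariance matrix, and 1 is a vector of m ones.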
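
The ensemble transform update can be sketched in NumPy as below. This is a minimal global (non-localized) version written in the convention of Hunt et al. (2007), where the ensemble-space covariance carries an (m-1)I term; it assumes a linear observation operator given as a matrix, and the function name `etkf_update` is hypothetical. The LETKF used in the study applies such an update independently at each grid point with localized observations, which this sketch omits.

```python
import numpy as np

def etkf_update(Xb, yo, H, R):
    """Global ensemble transform Kalman filter update (illustrative sketch).

    Xb : (n, m) background ensemble, one member per column
    yo : (p,)   observation vector
    H  : (p, n) linear observation operator (assumption: linear)
    R  : (p, p) observation error covariance
    """
    n, m = Xb.shape
    xb_mean = Xb.mean(axis=1, keepdims=True)       # background mean
    dXb = Xb - xb_mean                             # background perturbations
    HXb = H @ Xb
    yb_mean = HXb.mean(axis=1, keepdims=True)      # mean in observation space
    Y = HXb - yb_mean                              # obs-space perturbations

    C = Y.T @ np.linalg.inv(R)                     # Y^T R^-1, shape (m, p)
    A = (m - 1) * np.eye(m) + C @ Y                # inverse of Pa_tilde
    evals, evecs = np.linalg.eigh(A)               # A is symmetric positive definite
    Pa_tilde = evecs @ np.diag(1.0 / evals) @ evecs.T
    # Symmetric square root: sqrt(m-1) * Pa_tilde^(1/2)
    Wa = evecs @ np.diag(np.sqrt((m - 1) / evals)) @ evecs.T

    w_mean = Pa_tilde @ C @ (yo[:, None] - yb_mean)  # mean-update weights, (m, 1)
    W = w_mean + Wa                                # adds w_mean to every column
    return xb_mean + dXb @ W                       # analysis ensemble, (n, m)
```

For a linear observation operator, the analysis mean and variance produced this way match the classical Kalman filter, which gives a simple way to check an implementation on a one-variable example.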

Localization function:

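l = {exp(-0.5 (dh²/Lh² + dv²/Lv²))  if dh ≤ 2√(10/3)Lh and dv ≤ 2√(10/3)Lv
     0                              else}

where dh and dv are the horizontal and vertical distances between the analysis grid point and the observation, and Lh and Lv are the horizontal and vertical localization scales.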
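
A localization weight of this kind can be sketched as follows. The code assumes the Gaussian form with the conventional factor of one half in the exponent and the 2√(10/3) cutoff; the function names, the haversine distance, and the Earth radius of 6371 km are illustrative choices rather than details taken from the paper's code.

```python
import math

EARTH_RADIUS_KM = 6371.0                   # assumption: mean Earth radius
CUTOFF = 2.0 * math.sqrt(10.0 / 3.0)       # weights vanish beyond CUTOFF * L

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def localization_weight(dh, dv, Lh, Lv):
    """Gaussian localization weight with a hard cutoff.

    dh, Lh : horizontal distance and scale (same units, e.g. km)
    dv, Lv : vertical distance and scale (same units, e.g. log pressure)
    """
    if dh > CUTOFF * Lh or dv > CUTOFF * Lv:
        return 0.0
    return math.exp(-0.5 * ((dh / Lh) ** 2 + (dv / Lv) ** 2))
```

In LETKF implementations the weight is commonly applied to the inverse observation error covariance, so that distant observations behave as if they had larger errors. With Lh = 600 km the cutoff falls at roughly 2190 km.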

Technical Innovations

System integration: First successful integration of LETKF with AI weather prediction models, developed based on the SPEEDY-LETKF system
Model extension: Extended ClimaX to support the complete variable set required for data assimilation
Diagnostic methods: Utilized optimal localization scales, inflation factors, and other metrics to diagnose AI model characteristics
Observation network design: Adopted an observation network similar to radiosonde observations, with 7-level observations of temperature, wind fields, etc. at observation stations

Experimental Setup

Dataset

Training data: WeatherBench dataset 2006-2015 for training, 2016 for validation
Experimental data: 2017 data for data assimilation experiments (not used in training)
Initial conditions: Selected initial conditions for 20 ensemble members from 2006 WeatherBench data

Evaluation Metrics

RMSE: Global mean root mean square error
MAE difference: Mean absolute error difference between analysis field and first guess field
Inflation factor: Adaptive covariance inflation factor based on observation space statistics
Anomaly correlation coefficient: Used to monitor ClimaX forecast skill during training
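
The first two metrics can be sketched as below. The cosine-latitude area weighting in the RMSE and the sign convention of the MAE difference (negative values mean the analysis is closer to the reference than the first guess) are assumptions made for this illustration; the summary does not state how the paper defines them, and the function names are hypothetical.

```python
import numpy as np

def lat_weights(lats_deg):
    """Cosine-latitude weights normalized to mean 1 (assumed weighting)."""
    w = np.cos(np.deg2rad(lats_deg))
    return w / w.mean()

def global_rmse(field, truth, lats_deg):
    """Area-weighted global RMSE for fields of shape (n_lat, n_lon)."""
    w = lat_weights(lats_deg)[:, None]
    return float(np.sqrt(np.mean(w * (field - truth) ** 2)))

def mae_difference(analysis, first_guess, truth):
    """Time-mean |analysis - truth| minus |first guess - truth|.

    Inputs have shape (n_time, n_lat, n_lon); the result is a map of shape
    (n_lat, n_lon). Negative values indicate that assimilation improved
    the state at that grid point.
    """
    return np.mean(np.abs(analysis - truth) - np.abs(first_guess - truth), axis=0)
```

A map of the MAE difference is what makes spatial patterns visible, such as improvement at observed grid points and degradation nearby.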

Comparison Methods

Sensitivity experiments with different horizontal localization scales (Lh = 400, 500, 600, 700, 800 km)
Comparison of inflation factors with dynamical NWP model (SPEEDY)

Implementation Details

Ensemble size: 20 members
Data assimilation interval: 6 hours
Vertical localization scale: Lv = 1.0 (log Pa)
Observation errors: Standard deviation of 1.0 K for temperature, 1.0 m/s for wind components, 0.1 g/kg for specific humidity, and 1.0 hPa for surface pressure
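
These settings can be collected into a small configuration, together with a sketch of how simulated observations might be generated. The sketch assumes observations are produced by adding independent Gaussian noise to reference fields at station locations; the dictionary keys, function names, and station indexing are hypothetical, and the error values are the ones listed above in each variable's own units.

```python
import numpy as np

ENSEMBLE_SIZE = 20
CYCLE_HOURS = 6
CYCLES_PER_YEAR = 365 * 24 // CYCLE_HOURS   # 2017 is not a leap year

# Observation error standard deviations as listed above (illustrative keys)
OBS_ERROR_STD = {
    "temperature": 1.0,
    "zonal_wind": 1.0,
    "meridional_wind": 1.0,
    "specific_humidity": 0.1,
    "surface_pressure": 1.0,
}

def simulate_observations(truth, station_index, std, rng):
    """Sample a reference field at stations and add Gaussian noise.

    truth         : array holding the reference field
    station_index : flat indices of the grid points that host a station
    std           : observation error standard deviation
    rng           : numpy.random.Generator
    """
    values = truth.ravel()[station_index]
    return values + rng.normal(0.0, std, size=values.shape)

def obs_error_covariance(std, n_obs):
    """Diagonal R, assuming uncorrelated observation errors."""
    return np.eye(n_obs) * std ** 2
```

A usage example would pass `np.random.default_rng(seed)` as `rng` so that the observation noise is reproducible across sensitivity experiments with different localization scales.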

Experimental Results

Main Results

Stability Analysis

Successful cycles: Experiments with Lh = 500, 600, 700 km maintained stability throughout 2017
Filter divergence: Lh = 800 km exhibited filter divergence after September 2017
Suboptimal performance: Lh = 400 km continuously reduced RMSE but showed suboptimal performance

Optimal Localization Scale

Optimal setting: Lh = 600 km achieved the lowest analysis RMSE for most variables
Significant improvement: Temperature and surface pressure showed significant analysis error reduction
Wind field limitations: Zonal and meridional winds showed no obvious improvement, with slight degradation

Spatial Pattern Analysis

Observation point improvement: Temperature and zonal wind generally improved at grid points with observations
Surrounding degradation: Slight degradation appeared in regions surrounding observation stations (e.g., Arctic Ocean, U.S. and Japanese coasts)
Southern Hemisphere advantage: Geopotential height and surface pressure showed improvement in the sparsely observed Southern Hemisphere

Important Findings

Inflation Factor Characteristics

High inflation requirement: ClimaX requires higher inflation factors than dynamical models (Figure 6 shows global average approximately 1.4-1.6)
Weak error growth: Indicates that error growth in AI models is weaker than in dynamical NWP models
Poor chaotic characteristics: Consistent with findings by Selz and Craig (2023), AI models cannot accurately reproduce the butterfly effect
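
The link between weak error growth and large inflation factors can be illustrated with a simple innovation-based estimator. This is one common observation-space estimate, shown here as an assumption and not necessarily the adaptive scheme used in the paper: the inflation factor is the ratio of the innovation variance not explained by observation error to the ensemble variance in observation space. Function names are hypothetical.

```python
import numpy as np

def estimate_inflation(innovation, Y, obs_var):
    """Innovation-based multiplicative inflation estimate (illustrative).

    innovation : (p,)   observation minus background mean in obs space
    Y          : (p, m) background perturbations in observation space
    obs_var    : (p,)   observation error variances
    """
    m = Y.shape[1]
    hpht = np.sum(Y ** 2, axis=1) / (m - 1)      # ensemble variance per observation
    return float((np.sum(innovation ** 2) - np.sum(obs_var)) / np.sum(hpht))

def inflate(Xb, rho):
    """Scale ensemble perturbations so the variance grows by the factor rho."""
    mean = Xb.mean(axis=1, keepdims=True)
    return mean + np.sqrt(rho) * (Xb - mean)
```

When the forecast model lets ensemble spread grow too slowly, the ensemble variance in the denominator stays small relative to the innovations and the estimate rises above 1, which matches the higher inflation factors reported for ClimaX.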

Physical Consistency Limitations

Restriction to short-term forecasts: ClimaX cannot sustain long-term free integration; forecasts extended beyond 6 hours gradually deviate from the real atmosphere
Non-physical field generation: Long-term forecasts produce meteorologically unrealistic weather fields (e.g., extremely low temperatures over the Pacific)
Attractor problem: AI models cannot return to meteorologically reasonable attractor trajectories

Related Work

AI Weather Forecasting Development

GraphCast: Pioneering work by Google DeepMind
Commercial models: Pangu-Weather (Huawei), ClimaX/Stormer (Microsoft), FourCastNet (NVIDIA)
ViT architecture: Most AI weather prediction models adopt Vision Transformer architecture

Data Assimilation Methods

Variational methods: Mathematical similarity with AI models, with existing 4DVar integration research
Ensemble methods: First successful implementation of EnKF with AI models in this study
Deep learning DA: Recent efforts to use neural networks to solve the data assimilation inverse problem

Conclusions and Discussion

Main Conclusions

Technical feasibility: Ensemble data assimilation can be stably combined with AI weather prediction models and operate in sequential cycles
Diagnostic value: Ensemble data assimilation is an effective tool for diagnosing AI model characteristics
Limitation identification: AI models have deficiencies in capturing flow-dependent error covariance and error growth representation
Sparse region advantage: AI ensemble forecasts provide reasonable error covariance in sparsely observed regions

Limitations

Smaller optimal localization scale: 600 km is significantly smaller than the 900 km for dynamical models, indicating insufficient flow-dependent error covariance capture capability
Cannot perform OSSE: Observing System Simulation Experiments cannot be performed due to unstable long-term forecasts
Missing physical constraints: AI models lack constraints from physical laws, easily producing unrealistic weather fields
Insufficient error growth: Ensemble spread is inadequate, requiring higher inflation factors

Future Directions

Physical constraint integration: Incorporate physical constraints such as hydrostatic balance and geostrophic balance into AI model training
Error growth improvement: Develop stochastic parameterization schemes or multi-model ensemble methods
Large ensemble extension: Leverage AI model computational advantages to extend to large ensemble EnKF or localized particle filters
Real observation application: Advance toward data assimilation with real observational data

In-Depth Evaluation

Strengths

Pioneering contribution: First successful integration of EnKF with AI weather prediction models, with significant academic value
Systematic research: Systematically evaluated method effectiveness through multiple localization scale experiments
In-depth diagnosis: Utilized data assimilation techniques to deeply analyze AI model characteristics, providing new evaluation perspectives
Practical value: Provides direction for improvements to AI weather prediction models
Open-source code: Provides complete code and data, ensuring reproducibility

Weaknesses

Resolution limitation: Experiments conducted only at low resolution (5.625°), limiting practical applicability
Simulated observations: Uses simulated rather than real observational data, creating a gap with practical applications
Single model: Only tested one AI model (ClimaX), with limited generalizability of conclusions
Insufficient theoretical analysis: Theoretical explanations for AI model limitations are relatively superficial

Impact

Academic impact: Opens new directions for combining AI weather forecasting with data assimilation
Practical value: Provides important reference for developing operational AI weather forecasting systems
Methodological contribution: Establishes a framework for diagnosing AI models using data assimilation
Strong reproducibility: Complete open-source code facilitates subsequent research

Applicable Scenarios

AI model evaluation: Suitable for diagnosing characteristics of various AI weather prediction models
Data assimilation research: Provides foundation for developing data assimilation systems for AI models
Hybrid systems: Can be used for designing AI-physics model hybrid forecasting systems
Educational research: Serves as an important case study for AI meteorology education

References

Lam, R., et al. (2023): Learning skillful medium-range global weather forecasting. Science, 382(6677), 1416-1421.
Bi, K., et al. (2023): Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619(7970), 533-538.
Hunt, B. R., et al. (2007): Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230(1-2), 112-126.
Nguyen, T., et al. (2023): ClimaX: A foundation model for weather and climate. arXiv preprint arXiv:2301.10343.

This paper has pioneering significance in combining AI weather forecasting with data assimilation. Although it has some technical limitations, it establishes an important foundation for the development of this field and possesses considerable academic value and practical potential.