Ensemble data assimilation to diagnose AI-based weather prediction model: A case with ClimaX version 0.3.1

Kotsuki, Shiraishi, Okazaki
Artificial intelligence (AI)-based weather prediction research is growing rapidly and has been shown to be competitive with advanced dynamical numerical weather prediction (NWP) models. However, research combining AI-based weather prediction models with data assimilation remains limited, partly because long-term sequential data assimilation cycles are required to evaluate data assimilation systems. This study proposes using ensemble data assimilation to diagnose AI-based weather prediction models and marks the first successful implementation of an ensemble Kalman filter with an AI-based weather prediction model. Our experiments with the AI-based model ClimaX demonstrate that ensemble data assimilation cycles stably for the AI-based weather prediction model when covariance inflation and localization techniques are used within the ensemble Kalman filter. While ClimaX shows some limitations in capturing flow-dependent error covariance compared to dynamical models, the AI-based ensemble forecasts provide reasonable and beneficial error covariance in sparsely observed regions. In addition, ensemble data assimilation reveals that error growth based on ensemble ClimaX predictions is weaker than that of dynamical NWP models, leading to higher inflation factors. A series of experiments demonstrates that ensemble data assimilation can be used to diagnose properties of AI weather prediction models such as physical consistency and accurate error growth representation.

Basic Information

  • Paper ID: 2407.17781
  • Title: Ensemble data assimilation to diagnose AI-based weather prediction model: A case with ClimaX version 0.3.1
  • Authors: Shunji Kotsuki, Kenta Shiraishi, Atsushi Okazaki (Chiba University)
  • Classification: cs.LG stat.AP
  • Publication Date: July 2024
  • Paper Link: https://arxiv.org/abs/2407.17781

Abstract

Artificial intelligence (AI) weather forecasting research has developed rapidly and has proven competitive with advanced dynamical numerical weather prediction (NWP) models. However, research combining AI weather prediction models with data assimilation remains limited, partly because evaluating data assimilation systems requires long sequential data assimilation cycles. This study proposes using ensemble data assimilation to diagnose AI weather prediction models and presents the first successful implementation of an ensemble Kalman filter with an AI weather prediction model. Experiments with the AI model ClimaX demonstrate that ensemble data assimilation cycles stably when covariance inflation and localization techniques are employed within the ensemble Kalman filter. Although ClimaX exhibits limitations compared to dynamical models in capturing flow-dependent error covariance, AI ensemble forecasts provide reasonable and beneficial error covariance in sparsely observed regions. Furthermore, ensemble data assimilation reveals that error growth in ClimaX ensemble forecasts is weaker than in dynamical NWP models, resulting in higher inflation factors. A series of experiments demonstrates that ensemble data assimilation can be used to diagnose properties of AI weather prediction models such as physical consistency and accurate error growth representation.

Research Background and Motivation

Problem Background

  1. Intensifying extreme weather threats: Extreme weather events caused by climate change are becoming increasingly severe, with the World Economic Forum listing extreme weather as one of the most serious global threats
  2. Rapid development of AI weather forecasting: Since Google DeepMind released GraphCast in December 2022, deep learning weather forecasting research has grown rapidly, including Huawei's Pangu-Weather, Microsoft's ClimaX and Stormer, and NVIDIA's FourCastNet
  3. Lagging data assimilation research: Although AI weather prediction models can now compete with state-of-the-art NWP models, research combining AI models with data assimilation remains limited

Research Motivation

  1. Technical challenges: The requirement for long sequential data assimilation experiments makes it difficult to evaluate data assimilation systems for AI models
  2. Methodological gaps: While research on variational data assimilation combined with AI models exists, there are no successful cases of ensemble Kalman filtering integrated with AI models
  3. Diagnostic needs: Effective methods are needed to diagnose properties of AI weather prediction models, such as physical consistency and error growth representation

Core Contributions

  1. First successful implementation: Integrates the Local Ensemble Transform Kalman Filter (LETKF) with an AI weather prediction model (ClimaX) for the first time
  2. Stable cyclic operation: Demonstrates that ensemble data assimilation for AI models can operate stably for one year through covariance inflation and localization techniques
  3. Diagnostic framework establishment: Establishes a framework for diagnosing AI weather prediction model characteristics using ensemble data assimilation
  4. Important findings: Reveals limitations of AI models compared to dynamical models in error growth and physical consistency
  5. Technical improvements: Extends ClimaX to forecast more variables, as required for data assimilation

Methodology Details

Task Definition

The core task of this research is to apply ensemble data assimilation techniques to AI weather prediction models to diagnose their characteristics and evaluate their performance in data assimilation systems. The input consists of atmospheric observations and AI model forecasts, while the output is the assimilated analysis field.
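
The input and output described above can be pictured as a sequential cycle in which each analysis becomes the starting point of the next forecast. The following is a minimal sketch under stated assumptions, not the authors' implementation: `run_da_cycles`, `forecast_step`, and `analysis_step` are hypothetical names, with `forecast_step` standing in for an AI-model forecast of each member and `analysis_step` for an ensemble Kalman filter update.

```python
import numpy as np


def run_da_cycles(ensemble, observations, forecast_step, analysis_step):
    """Alternate forecast and analysis steps over a stream of observations.

    ensemble      : (n, m) array, n state variables and m members
    observations  : iterable of observation vectors, one per cycle
    forecast_step : callable mapping an (n, m) ensemble to the next first guess
    analysis_step : callable mapping (first_guess, y_obs) to an analysis ensemble
    Returns the list of analysis ensembles, one per cycle.
    """
    analyses = []
    for y_obs in observations:
        first_guess = forecast_step(ensemble)          # model forecast for each member
        ensemble = analysis_step(first_guess, y_obs)   # assimilate the observations
        analyses.append(np.array(ensemble, copy=True))
    return analyses
```

Passing the model and the filter as callables keeps the loop independent of any particular forecast model, which is what allows the same cycling logic to be reused when a dynamical model is swapped for an AI model.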

Model Architecture

ClimaX Model

  • Base architecture: Global atmospheric AI weather prediction model based on Vision Transformer (ViT)
  • Resolution settings: 64×32 grid points (5.625°×5.625°), 7 vertical levels (925, 850, 700, 600, 500, 250, 50 hPa)
  • Key components: Variable tokenization and variable aggregation
  • Extended improvements: Expanded from the default 5 forecast variables to the complete variable set listed in Table 1 of the paper, supporting data assimilation requirements
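
As a quick check on the resolution settings above, the grid dimensions follow directly from the 5.625° spacing. This is an illustrative sketch: the placement of longitudes from 0° and of latitudes at cell centres is an assumption (a WeatherBench-style layout), and the variable names are hypothetical.

```python
import numpy as np

RES_DEG = 5.625   # grid spacing in degrees
N_LEVELS = 7      # number of pressure levels listed above

# Longitudes start at 0 and latitudes are cell centres (assumed layout).
lon = np.arange(0.0, 360.0, RES_DEG)
lat = np.arange(-90.0 + RES_DEG / 2.0, 90.0, RES_DEG)

n_lon, n_lat = lon.size, lat.size
points_per_level = n_lon * n_lat                      # horizontal grid points
points_per_3d_variable = points_per_level * N_LEVELS  # one upper-air variable
```

With these settings each horizontal field has 64 × 32 = 2048 grid points, so the state handled by the data assimilation system is small compared with operational NWP grids.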

LETKF Data Assimilation System

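Ensemble state matrix update equation:

$$
\mathbf{X}^a = \bar{\mathbf{x}}^b \mathbf{1}^T + \delta\mathbf{X}^b \left[ \tilde{\mathbf{P}}^a \mathbf{Y}^T \mathbf{R}^{-1} \left( \mathbf{y}^o - \overline{H(\mathbf{x}^b)} \right) \mathbf{1}^T + \sqrt{m-1}\, \left( \tilde{\mathbf{P}}^a \right)^{1/2} \right]
$$

Where the covariance matrix in ensemble space is:

$$
\tilde{\mathbf{P}}^a = \left[ (m-1)\,\mathbf{I} + \mathbf{Y}^T \mathbf{R}^{-1} \mathbf{Y} \right]^{-1}
$$

Here $m$ is the ensemble size, $\bar{\mathbf{x}}^b$ and $\delta\mathbf{X}^b$ are the background ensemble mean and perturbations, $\mathbf{Y}$ holds the background perturbations in observation space, $\mathbf{R}$ is the observation error covariance, $\mathbf{y}^o$ is the observation vector, $H$ is the observation operator, and $\mathbf{1}$ is a vector of ones.

Localization function:

$$
l = \begin{cases}
\exp\left( -\dfrac{d_h^2}{L_h^2} - \dfrac{d_v^2}{L_v^2} \right) & \text{if } d_h \le 2\sqrt{10/3}\, L_h \text{ and } d_v \le 2\sqrt{10/3}\, L_v \\
0 & \text{otherwise}
\end{cases}
$$

where $d_h$ and $d_v$ are the horizontal and vertical distances between the analysis grid point and the observation, and $L_h$ and $L_v$ are the horizontal and vertical localization scales.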
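
The update and localization above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' SPEEDY-LETKF-based code: it follows the standard ensemble-transform form of Hunt et al. (2007), in which the identity term of P̃^a is scaled by m − 1, it assumes a diagonal R, it applies localization as weights on R^-1, and it omits covariance inflation. The function names `localization_weight` and `etkf_update` are hypothetical, and the localization exponent follows the function exactly as written in this summary.

```python
import numpy as np


def localization_weight(dh, dv, Lh, Lv):
    """Localization weight as written above; zero beyond the cutoff distance."""
    cutoff = 2.0 * np.sqrt(10.0 / 3.0)
    if dh > cutoff * Lh or dv > cutoff * Lv:
        return 0.0
    return float(np.exp(-(dh / Lh) ** 2 - (dv / Lv) ** 2))


def etkf_update(Xb, Yb, yo, r_diag, loc=None):
    """One ensemble-transform analysis for a single local region.

    Xb     : (n, m) background ensemble (n state variables, m members)
    Yb     : (p, m) background ensemble mapped to observation space
    yo     : (p,)   observations
    r_diag : (p,)   observation error variances (diagonal R assumed)
    loc    : (p,)   optional localization weights multiplying R^-1
    """
    m = Xb.shape[1]
    xb_mean = Xb.mean(axis=1, keepdims=True)
    dXb = Xb - xb_mean                       # background perturbations
    yb_mean = Yb.mean(axis=1)
    Y = Yb - yb_mean[:, None]                # perturbations in observation space

    rinv = 1.0 / np.asarray(r_diag, dtype=float)
    if loc is not None:
        rinv = rinv * loc                    # down-weight distant observations
    C = Y.T * rinv                           # (m, p), equals Y^T R^-1
    A = (m - 1) * np.eye(m) + C @ Y          # inverse of P~a

    # Symmetric eigendecomposition gives both P~a and its square root.
    evals, evecs = np.linalg.eigh(A)
    Pa = (evecs / evals) @ evecs.T
    Pa_sqrt = (evecs / np.sqrt(evals)) @ evecs.T

    w_mean = Pa @ C @ (yo - yb_mean)         # weights for the mean update
    W = w_mean[:, None] + np.sqrt(m - 1) * Pa_sqrt
    return xb_mean + dXb @ W                 # analysis ensemble, shape (n, m)
```

In an LETKF this update would be repeated independently for every grid point, each time using only the nearby observations and their localization weights, which is why the transform is computed in the small m × m ensemble space.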

Technical Innovations

  1. System integration: First successful integration of LETKF with AI weather prediction models, developed based on the SPEEDY-LETKF system
  2. Model extension: Extended ClimaX to support the complete variable set required for data assimilation
  3. Diagnostic methods: Utilized optimal localization scales, inflation factors, and other metrics to diagnose AI model characteristics
  4. Observation network design: Adopted an observation network similar to radiosonde observations, with 7-level observations of temperature, wind fields, etc. at observation stations

Experimental Setup

Dataset

  • Training data: WeatherBench dataset 2006-2015 for training, 2016 for validation
  • Experimental data: 2017 data for data assimilation experiments (not used in training)
  • Initial conditions: Selected initial conditions for 20 ensemble members from 2006 WeatherBench data
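
The split above keeps the assimilation year separate from the years used to train and validate the model. A minimal sketch of that bookkeeping follows. The summary does not say how the 20 initial states were chosen, so the random draw, the 6-hourly sampling of 2006, and the name `pick_initial_members` are all assumptions made for illustration.

```python
import numpy as np

TRAIN_YEARS = list(range(2006, 2016))   # 2006-2015
VALID_YEARS = [2016]
DA_YEARS = [2017]                       # held out from training


def pick_initial_members(n_times, n_members=20, seed=0):
    """Draw distinct time indices to initialise the ensemble (random draw assumed)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_times, size=n_members, replace=False))


# A non-leap year sampled every 6 hours has 365 * 4 = 1460 snapshots.
member_idx = pick_initial_members(n_times=365 * 4)
```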

Evaluation Metrics

  • RMSE: Global mean root mean square error
  • MAE difference: Mean absolute error difference between analysis field and first guess field
  • Inflation factor: Adaptive covariance inflation factor based on observation space statistics
  • Anomaly correlation coefficient: Used to monitor ClimaX forecast skill during training
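
The field-based metrics above can be written compactly. The sketch below is illustrative only: the cosine-latitude area weighting is an assumption (the summary does not state how the global mean is taken), and the function names are hypothetical.

```python
import numpy as np


def area_weights(lat_deg):
    """Cosine-latitude weights normalised to mean 1 (weighting is an assumption)."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()


def global_rmse(field, truth, lat_deg):
    """Area-weighted global RMSE for (n_lat, n_lon) fields."""
    w = area_weights(lat_deg)[:, None]
    return float(np.sqrt(np.mean(w * (field - truth) ** 2)))


def mae_difference(analysis, first_guess, truth):
    """Time-mean |analysis - truth| minus |first guess - truth| at each grid point.

    Inputs have shape (n_time, n_lat, n_lon); negative values mean the
    analysis is closer to the truth than the first guess.
    """
    return (np.abs(analysis - truth).mean(axis=0)
            - np.abs(first_guess - truth).mean(axis=0))


def anomaly_correlation(forecast, truth, climatology, lat_deg):
    """Area-weighted anomaly correlation coefficient for (n_lat, n_lon) fields."""
    w = area_weights(lat_deg)[:, None]
    fa = forecast - climatology
    ta = truth - climatology
    num = np.sum(w * fa * ta)
    den = np.sqrt(np.sum(w * fa ** 2) * np.sum(w * ta ** 2))
    return float(num / den)
```

The MAE difference is kept as a map rather than a single number because its sign pattern shows where assimilation helps (for example at observed grid points) and where it degrades the first guess.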

Comparison Methods

  • Sensitivity experiments with different horizontal localization scales (Lh = 400, 500, 600, 700, 800 km)
  • Comparison of inflation factors with dynamical NWP model (SPEEDY)

Implementation Details

  • Ensemble size: 20 members
  • Data assimilation interval: 6 hours
  • Vertical localization scale: Lv = 1.0 (log Pa)
  • Observation errors: Standard deviation of 1.0 for temperature and wind fields, 0.1 for specific humidity, 1.0 for surface pressure
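
The settings above map naturally onto a small configuration object. The sketch below also shows one plausible way to generate simulated observations from a reference state. The class `DAConfig`, the dictionary keys, and `simulate_observations` are hypothetical names, and adding Gaussian noise with the listed standard deviations is an assumption about how the simulated observations are produced.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass(frozen=True)
class DAConfig:
    n_members: int = 20
    cycle_hours: int = 6
    lv_log_p: float = 1.0   # vertical localization scale
    # Observation error standard deviations; the keys are hypothetical labels.
    obs_err_std: dict = field(default_factory=lambda: {
        "temperature": 1.0,
        "u_wind": 1.0,
        "v_wind": 1.0,
        "specific_humidity": 0.1,
        "surface_pressure": 1.0,
    })

    def cycles_per_year(self, days=365):
        """Number of assimilation cycles in a year of the given length."""
        return days * 24 // self.cycle_hours


def simulate_observations(truth, station_idx, obs_std, rng):
    """Sample a reference state at station indices and add Gaussian noise."""
    truth = np.asarray(truth, dtype=float)
    noise = rng.normal(0.0, obs_std, size=len(station_idx))
    return truth[station_idx] + noise
```

With a 6-hour interval, a one-year experiment amounts to 1460 consecutive analyses, which is the kind of long sequential cycling needed to judge whether the filter stays stable.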

Experimental Results

Main Results

Stability Analysis

  • Successful cycles: Experiments with Lh = 500, 600, 700 km maintained stability throughout 2017
  • Filter divergence: Lh = 800 km exhibited filter divergence after September 2017
  • Suboptimal performance: Lh = 400 km continuously reduced RMSE but showed suboptimal performance

Optimal Localization Scale

  • Optimal setting: Lh = 600 km achieved the lowest analysis RMSE for most variables
  • Significant improvement: Temperature and surface pressure showed significant analysis error reduction
  • Wind field limitations: Zonal and meridional winds showed no clear improvement and were slightly degraded

Spatial Pattern Analysis

  • Observation point improvement: Temperature and zonal wind generally improved at grid points with observations
  • Surrounding degradation: Slight degradation appeared in regions surrounding observation stations (e.g., Arctic Ocean, U.S. and Japanese coasts)
  • Southern Hemisphere advantage: Geopotential height and surface pressure showed improvement in the sparsely observed Southern Hemisphere

Important Findings

Inflation Factor Characteristics

  • High inflation requirement: ClimaX requires higher inflation factors than dynamical models (Figure 6 shows global average approximately 1.4-1.6)
  • Weak error growth: Indicates that error growth in AI models is weaker than in dynamical NWP models
  • Poor chaotic characteristics: Consistent with findings by Selz and Craig (2023), AI models cannot accurately reproduce the butterfly effect
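
The link between weak error growth and large inflation factors can be made concrete with an innovation-based estimate. The summary does not give the estimator used, so the form below is a commonly used one and is offered only as an assumption: it chooses the factor that makes the inflated background variance plus the observation error variance match the observed innovation variance. The name `estimate_inflation` is hypothetical, and practical adaptive schemes usually smooth such estimates in time and space.

```python
import numpy as np


def estimate_inflation(innovation, Y_pert, r_diag):
    """Innovation-based multiplicative inflation estimate.

    innovation : (p,)   observations minus the ensemble-mean first guess
    Y_pert     : (p, m) ensemble perturbations in observation space
    r_diag     : (p,)   observation error variances
    Solves  mean(d^2 / R) = rho * mean(HPH^T / R) + 1  for rho.
    """
    innovation = np.asarray(innovation, dtype=float)
    r_diag = np.asarray(r_diag, dtype=float)
    m = Y_pert.shape[1]
    hpht_diag = np.sum(Y_pert ** 2, axis=1) / (m - 1)   # ensemble variance
    numerator = np.sum(innovation ** 2 / r_diag) - innovation.size
    denominator = np.sum(hpht_diag / r_diag)
    return float(numerator / denominator)
```

When the ensemble spread grows too slowly between analyses, the ensemble variance in the denominator is small relative to the actual first-guess error, so the estimate rises above 1. This is the sense in which a persistently high inflation factor diagnoses weak error growth in the forecast model.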

Physical Consistency Limitations

  • Long-integration instability: ClimaX cannot perform long-term free integration; when forecasts are extended beyond the 6-hour lead time, the state gradually deviates from the real atmosphere
  • Non-physical field generation: Long-term forecasts produce meteorologically unrealistic weather fields (e.g., extremely low temperatures over the Pacific)
  • Attractor problem: AI models cannot return to meteorologically reasonable attractor trajectories

Related Work

AI Weather Forecasting Development

  • GraphCast: Pioneering work by Google DeepMind
  • Commercial models: Pangu-Weather (Huawei), ClimaX/Stormer (Microsoft), FourCastNet (NVIDIA)
  • ViT architecture: Most AI weather prediction models adopt Vision Transformer architecture

Data Assimilation Methods

  • Variational methods: Mathematical similarity with AI models, with existing 4DVar integration research
  • Ensemble methods: First successful implementation of EnKF with AI models in this study
  • Deep learning DA: Recent efforts to use neural networks to solve the data assimilation inverse problem

Conclusions and Discussion

Main Conclusions

  1. Technical feasibility: Ensemble data assimilation can be stably combined with AI weather prediction models and operate in sequential cycles
  2. Diagnostic value: Ensemble data assimilation is an effective tool for diagnosing AI model characteristics
  3. Limitation identification: AI models have deficiencies in capturing flow-dependent error covariance and error growth representation
  4. Sparse region advantage: AI ensemble forecasts provide reasonable error covariance in sparsely observed regions

Limitations

  1. Smaller optimal localization scale: The optimal horizontal scale of 600 km is significantly smaller than the 900 km for dynamical models, indicating that ClimaX is less able to capture flow-dependent error covariance
  2. Cannot perform OSSE: Observing System Simulation Experiments cannot be performed due to unstable long-term forecasts
  3. Missing physical constraints: AI models lack constraints from physical laws, easily producing unrealistic weather fields
  4. Insufficient error growth: Ensemble spread is inadequate, requiring higher inflation factors

Future Directions

  1. Physical constraint integration: Incorporate physical constraints such as hydrostatic balance and geostrophic balance into AI model training
  2. Error growth improvement: Develop stochastic parameterization schemes or multi-model ensemble methods
  3. Large ensemble extension: Leverage AI model computational advantages to extend to large ensemble EnKF or localized particle filters
  4. Real observation application: Advance toward data assimilation with real observational data

In-Depth Evaluation

Strengths

  1. Pioneering contribution: First successful integration of EnKF with AI weather prediction models, with significant academic value
  2. Systematic research: Systematically evaluated method effectiveness through multiple localization scale experiments
  3. In-depth diagnosis: Utilized data assimilation techniques to deeply analyze AI model characteristics, providing new evaluation perspectives
  4. Practical value: Provides direction for improvements to AI weather prediction models
  5. Open-source code: Provides complete code and data, ensuring reproducibility

Weaknesses

  1. Resolution limitation: Experiments conducted only at low resolution (5.625°), limiting practical applicability
  2. Simulated observations: Uses simulated rather than real observational data, creating a gap with practical applications
  3. Single model: Only tested one AI model (ClimaX), with limited generalizability of conclusions
  4. Insufficient theoretical analysis: Theoretical explanations for AI model limitations are relatively superficial

Impact

  1. Academic impact: Opens new directions for combining AI weather forecasting with data assimilation
  2. Practical value: Provides important reference for developing operational AI weather forecasting systems
  3. Methodological contribution: Establishes a framework for diagnosing AI models using data assimilation
  4. Strong reproducibility: Complete open-source code facilitates subsequent research

Applicable Scenarios

  1. AI model evaluation: Suitable for diagnosing characteristics of various AI weather prediction models
  2. Data assimilation research: Provides foundation for developing data assimilation systems for AI models
  3. Hybrid systems: Can be used for designing AI-physics model hybrid forecasting systems
  4. Educational research: Serves as an important case study for AI meteorology education

References

  1. Lam, R., et al. (2023): Learning skillful medium-range global weather forecasting. Science, 382(6677), 1416-1421.
  2. Bi, K., et al. (2023): Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619(7970), 533-538.
  3. Hunt, B. R., et al. (2007): Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230(1-2), 112-126.
  4. Nguyen, T., et al. (2023): ClimaX: A foundation model for weather and climate. arXiv preprint arXiv:2301.10343.

This paper has pioneering significance in combining AI weather forecasting with data assimilation. Although it has some technical limitations, it establishes an important foundation for the development of this field and possesses considerable academic value and practical potential.