
Mitigating Model Drift in Developing Economies Using Synthetic Data and Outliers

Varshavskiy, Boboeva, Khalilbekov et al.

Machine learning models in finance are highly susceptible to model drift, where predictive performance declines as data distributions shift. This issue is especially acute in developing economies such as those in Central Asia and the Caucasus (including Tajikistan, Uzbekistan, Kazakhstan, and Azerbaijan), where frequent and unpredictable macroeconomic shocks destabilize financial data. To the best of our knowledge, this is among the first studies to examine drift mitigation methods on financial datasets from these regions. We investigate the use of synthetic outliers, a largely unexplored approach, to improve model stability against unforeseen shocks. To evaluate effectiveness, we introduce a two-level framework that measures both the extent of performance degradation and the severity of shocks. Our experiments on macroeconomic tabular datasets show that adding a small proportion of synthetic outliers generally improves stability compared to baseline models, though the optimal amount varies by dataset and model.


Basic Information

Paper ID: 2510.09294
Title: Mitigating Model Drift in Developing Economies Using Synthetic Data and Outliers
Authors: Ilyas Varshavskiy, Bonu Boboeva, Shuhrat Khalilbekov, Azizjon Azimi, Sergey Shulgin, Akhlitdin Nizamitdinov, Haitz Sáez de Ocáriz Borde
Classification: cs.LG (Machine Learning)
Conference: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Generative AI in Finance
Paper Link: https://arxiv.org/abs/2510.09294

Abstract

Machine learning models in finance are highly susceptible to model drift, wherein predictive performance deteriorates when data distributions shift. This problem is particularly acute in developing economies, especially in Central Asia and the Caucasus region (including Tajikistan, Uzbekistan, Kazakhstan, and Azerbaijan), where frequent and unpredictable macroeconomic shocks undermine financial data stability. To the authors' knowledge, this is among the first studies investigating drift mitigation methods on financial datasets from these regions. The paper investigates the use of synthetic outliers—a largely unexplored approach—to enhance model stability against unexpected shocks. To assess effectiveness, the authors introduce a dual-layer framework that measures both performance degradation and shock severity. Experiments on macroeconomic tabular datasets demonstrate that adding modest quantities of synthetic outliers typically improves stability compared to baseline models, although the optimal quantity varies across datasets and models.

Research Background and Motivation

Problem Definition

This research addresses model drift in financial machine learning, that is, the loss of predictive performance under distributional change. The problem is most severe in developing economies, where frequent macroeconomic shocks cause sharp declines in model performance.

Problem Significance

Severe Economic Impact: In developing economies, model failures can be extremely costly, particularly in critical financial applications such as credit risk assessment
Frequent and Unpredictable Shocks: Central Asia and the Caucasus frequently face external shocks such as trade conflicts and armed conflicts, which cause abrupt changes in data distributions
Research Gap: Existing model drift research primarily focuses on mature financial markets, with insufficient attention to developing economies

Limitations of Existing Methods

Passive Response: Traditional approaches such as metric monitoring and model retraining act only after drift has occurred
Data Dependency: Retraining must wait until enough real post-shock data has been collected
Insufficient Regional Specificity: Existing solutions are not tailored to the specific circumstances of developing economies

Research Motivation

The authors propose a proactive strategy that introduces synthetic outliers during the training phase to enable models to adapt to extreme scenarios in advance, thereby enhancing model robustness when facing unknown shocks.

Core Contributions

Proposes a Novel Stability Assessment Framework: Includes two metrics—Stabilization Score (SS) and Stabilization Uplift (SU)—that quantify model stability performance under shocks
Innovatively Employs Synthetic Outliers: Utilizes zGAN-generated synthetic outliers to enhance model robustness against sudden shocks
Fills Regional Research Gap: Among the first systematic studies of drift mitigation methods on financial datasets from Central Asia and the Caucasus
Provides Open-Source Implementation: Releases complete code, metrics, and experiments, including synthetic data

Methodology Details

Task Definition

Input: Financial tabular data (credit risk prediction task)
Output: Binary classification results (default/normal)
Objective: Maintain stability of model predictive performance when facing data distribution changes caused by external shocks

Core Methodological Framework

1. Shock Definition and Distribution Change Measurement

Shocks are defined as sudden events in the data generation process that cause immediate significant changes in feature distributions. Distribution shift (DS) is calculated as:

$DS = \frac{1}{|C|+|N|}\left(\sum_{c \in C} d_{TV}(P_{baseline}(c), P_{shocked}(c)) + \sum_{n \in N} d_{KS}(P_{baseline}(n), P_{shocked}(n))\right) \geq \tau$

where C and N are the sets of categorical and numerical features, $d_{TV}$ is the total variation distance, $d_{KS}$ is the Kolmogorov-Smirnov statistic, and $\tau$ is the threshold at or above which a distribution change counts as a shock.
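The DS formula above can be sketched in pure Python. This is a minimal illustration, not the authors' released code: it assumes each dataset is held as a dict mapping feature names to lists of values, and the function names (`tv_distance`, `ks_statistic`, `distribution_shift`) are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def tv_distance(baseline, shocked):
    """Total variation distance between two empirical categorical distributions."""
    p, q = Counter(baseline), Counter(shocked)
    n_p, n_q = len(baseline), len(shocked)
    categories = set(p) | set(q)
    # Counter returns 0 for categories that appear in only one sample.
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)


def ks_statistic(baseline, shocked):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(baseline), sorted(shocked)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def distribution_shift(baseline, shocked, categorical, numerical):
    """Average per-feature distance over categorical (TV) and numerical (KS) features."""
    total = sum(tv_distance(baseline[c], shocked[c]) for c in categorical)
    total += sum(ks_statistic(baseline[n], shocked[n]) for n in numerical)
    return total / (len(categorical) + len(numerical))


# Toy data, invented purely for illustration.
baseline = {"sector": ["a", "a", "b", "b"], "rate": [1, 2, 3, 4]}
shocked = {"sector": ["a", "b", "b", "b"], "rate": [3, 4, 5, 6]}
ds = distribution_shift(baseline, shocked, ["sector"], ["rate"])
```

Under the shock definition, a change would be flagged when `ds` is at least the threshold $\tau$; no value of $\tau$ is given here, so the sketch leaves that comparison to the caller.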

2. Stabilization Score (SS)

Quantifies the model's ability to maintain predictive performance under drift:

$SS = 1 - \frac{|\hat{A}_{base} - \hat{A}_{shock}|}{1 + \log(1 + DS + \varepsilon)} \in [0.5, 1]$

where $\hat{A}_{base}$ and $\hat{A}_{shock}$ are the model's performance (AUC in the experiments) on baseline and shocked data respectively, and $\varepsilon$ is a small constant.
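As a worked illustration of the SS formula, the sketch below evaluates it directly. The function name `stabilization_score` and the default `eps=1e-8` are assumptions, and the AUC values in the example are hypothetical; only the DS value of 0.2250 is taken from the A1 dataset description.

```python
import math


def stabilization_score(auc_base, auc_shock, ds, eps=1e-8):
    """SS = 1 - |A_base - A_shock| / (1 + log(1 + DS + eps))."""
    degradation = abs(auc_base - auc_shock)
    return 1.0 - degradation / (1.0 + math.log(1.0 + ds + eps))


# Hypothetical AUC values; DS = 0.2250 as reported for dataset A1.
ss_example = stabilization_score(auc_base=0.80, auc_shock=0.70, ds=0.2250)

# Same AUC drop under a much milder shift gives a lower score.
ss_mild_shift = stabilization_score(auc_base=0.80, auc_shock=0.70, ds=0.0050)
```

Because DS sits in the denominator, the same drop in AUC is penalized less when the shift is more severe, which is how the score combines performance degradation with shock severity.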

3. Stabilization Uplift (SU)

Compares the relative advantage of two models under drift:

$SU = w \cdot (w'_B \cdot SS_B - w'_A \cdot SS_A)$

where the weights $w$, $w'_A$, and $w'_B$ are computed via sigmoid functions and account for each model's own stability and for the relative superiority of one model over the other.

4. Synthetic Outlier Generation

Employs the zGAN generator, which comprises:

Standard GAN Components: Generator-discriminator architecture
Outlier Conditional Covariance Generator (covGEN): Samples macroeconomic outliers from extreme value theory-compatible multivariate distributions
Conditional VAE: Provides covariance matrices
Hash Similarity Filter: Prevents excessive similarity to real records

Technical Innovations

Proactive Stabilization Strategy: Rather than responding after drift occurs, exposes models to extreme scenarios during training
Dual-Layer Evaluation Framework: SS measures individual model stability; SU compares relative advantages between models
Region-Specific Design: Tailored to macroeconomic shock characteristics of developing economies
Non-Monotonic Optimization: Finds that stability does not increase monotonically with the outlier proportion; the optimum typically lies in the 5-10% range

Experimental Setup

Datasets

Experiments use private credit risk datasets from five developing economies, plus one open dataset:

A1 (Tajikistan): Trade conflict shock, DS=0.2250
A4 (Uzbekistan): No explicit shock, DS=0.0050
A5 (Kazakhstan): Armed conflict shock, DS=0.1212
A6 (Jordan): No explicit shock, DS=0.0026
A9 (Azerbaijan): Armed conflict shock, DS=0.1802
Open Dataset (Lending Club): Trade conflict shock, DS=0.1193

All tasks involve binary default prediction with class imbalance (minority class share of approximately 2-12%).

Evaluation Metrics

AUC_base: Performance before shock
AUC_shock: Performance after shock
SS: Stabilization Score
SU: Stabilization Uplift

Comparison Methods

Tests eight machine learning models:

CatBoost, TabPFN, FT-Transformer, HGBoosting
NGBoost, XGBoost, LightGBM, TabNet

Implementation Details

Data Split: 80/20 train-test split
Synthetic Data Ratio: 50/50 real/synthetic mixture
Outlier Proportion: 0%, 1%, 3%, 5%, 7%, 10%, 50%, 100%
Monte Carlo Evaluation: 51 random splits
Global Hyperparameters: (k1, k2, k3) = (100, 1000, 1000)
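The mixing step implied by the settings above might look like the following sketch. It rests on an assumption that is not confirmed here: that the outlier proportion is measured relative to the synthetic half of the training set, so that 100% means the synthetic half consists entirely of outliers. The function name `build_training_set` and its parameters are hypothetical.

```python
import random


def build_training_set(real, synthetic, outliers, outlier_share, seed=0):
    """Mix real rows with an equal number of synthetic rows, part of them outliers.

    Assumption: outlier_share is the fraction of the synthetic half that is
    drawn from the outlier pool rather than from ordinary synthetic rows.
    """
    if not 0.0 <= outlier_share <= 1.0:
        raise ValueError("outlier_share must lie in [0, 1]")
    rng = random.Random(seed)
    n_synth = len(real)                     # 50/50 real/synthetic mixture
    n_out = round(outlier_share * n_synth)  # rows taken from the outlier pool
    if n_out > len(outliers) or n_synth - n_out > len(synthetic):
        raise ValueError("not enough synthetic or outlier rows to draw from")
    mixed = (
        list(real)
        + rng.sample(list(synthetic), n_synth - n_out)
        + rng.sample(list(outliers), n_out)
    )
    rng.shuffle(mixed)
    return mixed


# Placeholder rows tagged by origin, for illustration only.
real = [("real", i) for i in range(100)]
synthetic = [("synth", i) for i in range(100)]
outliers = [("out", i) for i in range(100)]

# One training set per outlier proportion in the sweep.
sweep = {
    share: build_training_set(real, synthetic, outliers, share)
    for share in (0.0, 0.01, 0.03, 0.05, 0.07, 0.10, 0.50, 1.0)
}
```

In a full evaluation, each of these training sets would be fitted and scored over the repeated random splits; that loop is omitted here.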

Experimental Results

Main Results

According to the best results reported in Table 1:

A1 (Tajikistan): TabNet without outliers achieves SU=0.8441
A4 (Uzbekistan): TabPFN with 50% outliers achieves SU=0.7449
A9 (Azerbaijan): TabPFN with 5% outliers achieves SU=0.9981
Open Dataset: FT-Transformer with 100% outliers achieves SU=0.8884

Key Findings

Flexible Architectures Benefit Most: TabPFN and FT-Transformer typically achieve highest SU values under shocks
Non-Monotonic Outlier Proportion: Moderate injection levels (5-10%) frequently maximize SU; both smaller and larger proportions diminish gains
Gains Correlate with Shock Intensity: Maximum improvements on high-DS datasets (A1, A9); limited improvements when DS is minimal (A4, A6)

Statistical Analysis

Across all model-dataset pairs:

53% of Cases: Adding non-zero outlier proportions improves stability (135/256)
83% of Best Configurations: Training with outliers outperforms training without (10/12)
Significant Model Differences: HGBoosting, NGBoost, XGBoost, LightGBM benefit in 50% of cases; FT-Transformer in 75% of cases; CatBoost, TabPFN, TabNet in 100% of cases

Case Analysis

Analysis of the "tjs/usd" exchange rate feature from the Tajikistan dataset reveals:

Synthetic outliers form reasonable extreme values in distribution tails
5-10% outlier proportions provide sufficient extreme value exposure while maintaining authenticity
UMAP projections show that the synthetic data closely resembles the real data, with outliers appropriately distributed in boundary regions

Drift Detection and Adaptation Methods

Temporal Drift: Relationships gradually evolve over time
Conditional Drift: New data originates from underrepresented feature space regions
Contextual Drift: Sudden changes in input-output relationships due to external shocks

Traditional methods include the ADWIN algorithm, incremental learning, and sliding windows, all of which are primarily passive response strategies.

Synthetic Data Research

Related work includes the TabOOD framework for generating out-of-distribution tabular samples and the use of synthetic data for drift detection in business processes, though targeted use of synthetic outliers for drift mitigation remains sparse.

Conclusions and Discussion

Main Conclusions

Synthetic Outliers Are Effective: In most cases, they enhance model stability against sudden shocks
Optimal Proportion Exists: Typically in the 5-10% range, balancing extreme value exposure and data quality
Architecture Sensitivity: Flexible neural network architectures better leverage outlier information than traditional tree models
Regional Applicability: Method demonstrates effectiveness across multiple developing economy datasets

Limitations

Lack of Universal Rules: No universal method found for selecting optimal outlier percentages
Dataset Constraints: Primarily validated on credit risk tasks; applicability to other financial tasks remains unknown
Shock Type Limitations: Primarily addresses macroeconomic shocks; effectiveness on other drift types unclear
Computational Overhead: Training additional generative models increases computational costs

Future Directions

Adaptive Outlier Proportions: Develop heuristic methods to automatically determine optimal outlier proportions
Multiple Shock Types: Extend to more types of distribution change scenarios
Real-Time Adaptation: Combine with online learning for dynamic adjustment
Theoretical Analysis: Provide deeper theoretical guarantees and analysis

In-Depth Evaluation

Strengths

Outstanding Problem Importance: Focuses on the overlooked yet important application scenario of developing economies
Strong Method Innovation: Proactive outlier injection strategy demonstrates novelty and practical value
Comprehensive Evaluation Framework: SS and SU metrics are well-designed, comprehensively assessing model stability
Rigorous Experimental Design: 51 Monte Carlo repetitions, multiple datasets, and comparative experiments across multiple models
Open-Source Contribution: Provides complete code and synthetic data, enhancing reproducibility

Shortcomings

Private Dataset Nature: Core datasets cannot be publicly released, limiting result verifiability
Weak Theoretical Foundation: Lacks in-depth theoretical analysis of why outliers enhance stability
Hyperparameter Sensitivity: SU metric's k1, k2, k3 parameter selection lacks sufficient theoretical guidance
Unclear Applicability Scope: Primarily validated on tabular data; applicability to other data types remains unknown
Computational Efficiency: No analysis of method computational overhead and scalability

Impact

Academic Contribution: Provides new perspectives and methods for model drift research
Practical Value: Directly applicable to financial institutions in developing economies
Methodological Inspiration: Proactive stabilization strategy may inspire further related research
Dataset Value: Despite being private, provides important empirical foundation for regional research

Applicable Scenarios

Financial Institutions in Developing Economies: Particularly suitable for financial environments facing frequent external shocks
Credit Risk Management: Enhances model robustness in critical tasks such as default prediction
Macroeconomically Unstable Regions: Any market facing political and economic uncertainty
Proactive Risk Management: Scenarios requiring advance prevention rather than passive response

References

The paper cites 31 related references, primarily including:

Foundational Drift Research: Hinder et al. (2024), Halstead et al. (2022) and other survey work on concept drift
Drift Detection Methods: ADWIN algorithm (Bifet & Gavaldà, 2007), online learning methods, etc.
Synthetic Data Generation: GAN-related work (Goodfellow et al., 2014), TabOOD framework (Puranik et al., 2024)
Machine Learning Models: Original papers for mainstream models including CatBoost, XGBoost, LightGBM
Statistical Methods: Extreme value theory (de Haan & Ferreira, 2006), Kolmogorov-Smirnov tests, etc.

Overall Assessment: This is a high-quality paper proposing an innovative solution for an important yet overlooked application domain (financial stability in developing economies). The method is novel, the experiments are comprehensive, and the practical applications are significant, though theoretical depth and generalizability could be improved.