Epistemic Errors of Imperfect Multitask Learners When Distributions Shift

Sloman, Caprio, Kaski

Uncertainty-aware machine learners, such as Bayesian neural networks, output a quantification of uncertainty instead of a point prediction. In this work, we provide uncertainty-aware learners with a principled framework to characterize, and identify ways to eliminate, errors that arise from reducible (epistemic) uncertainty. We introduce a principled definition of epistemic error, and provide a decompositional epistemic error bound which operates in the very general setting of imperfect multitask learning under distribution shift. In this setting, the training (source) data may arise from multiple tasks, the test (target) data may differ systematically from the source data tasks, and/or the learner may not arrive at an accurate characterization of the source data. Our bound separately attributes epistemic errors to each of multiple aspects of the learning procedure and environment. As corollaries of the general result, we provide epistemic error bounds specialized to the settings of Bayesian transfer learning and distribution shift within $\varepsilon$-neighborhoods. We additionally leverage the terms in our bound to provide a novel definition of negative transfer.

Basic Information

Paper ID: 2505.23496
Title: Epistemic Errors of Imperfect Multitask Learners When Distributions Shift
Authors: Sabina J. Sloman, Michele Caprio, Samuel Kaski
Classification: cs.LG stat.ML
Publication Date: October 13, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2505.23496

Abstract

This paper provides a principled framework for uncertainty-aware machine learning models (such as Bayesian neural networks) to characterize and reduce errors caused by reducible (epistemic) uncertainty. The paper introduces a principled definition of epistemic error and provides decomposable epistemic error bounds in the very general setting of imperfect multitask learning under distribution shift. In this setting, training (source) data may originate from multiple tasks, test (target) data may exhibit systematic differences from source tasks, and/or the learner may fail to accurately characterize source data. The bound attributes epistemic error to multiple aspects of both the learning process and the environment.

Research Background and Motivation

Problem Definition

The core problem this research addresses is: How can we provide a theoretical framework for uncertainty-aware learners to characterize and reduce epistemic error? Specifically:

Limitations of Traditional Learning Theory: Existing statistical learning theory primarily focuses on generalization error, but for learners that quantify output uncertainty, prediction error is an irrelevant, incomplete, or uninformative performance measure.
Confusion of Uncertainty Types: Traditional approaches conflate reducible epistemic uncertainty with irreducible aleatoric uncertainty, failing to effectively guide model improvement.
Lack of Theoretical Support for Complex Learning Scenarios: Complex real-world scenarios involving multitask learning, distribution shift, and imperfect learning lack theoretical guidance.

Research Significance

Practical Application Value: Accurate uncertainty quantification is critical in high-risk domains such as healthcare
Theoretical Advancement: Fills gaps in uncertainty-aware learning theory
Practical Guidance: Provides theoretical basis for model selection and optimization

Limitations of Existing Methods

Traditional frameworks such as PAC learning theory cannot distinguish epistemic error from aleatoric error
Lack of unified theoretical framework for multitask learning and distribution shift scenarios
Existing bounds typically assume perfect learning or absence of distribution shift

Core Contributions

Introduction of Epistemic Error Bounds: Proposes epistemic error bounds as a new theoretical tool specifically designed for uncertainty-aware learners
Decomposable Epistemic Error Bounds: Provides bounds that decompose epistemic error into three components in the general setting of imperfect multitask learning with distribution shift
Corollaries for Special Cases: Provides specialized epistemic error bounds for Bayesian transfer learning and distribution shift within ε-neighborhoods
New Definition of Negative Transfer: Provides new theoretical characterization of negative transfer phenomena based on terms in the bounds

Methodology Details

Task Definition

Epistemic error is defined as the degree to which the learner's understanding of the data-generating process (DGP) is incorrect, formalized as: $e := d_{TV}(\hat{P}, Q^t)$

where $\hat{P}$ is the learner's predictive distribution, $Q^t$ is the target task distribution, and $d_{TV}$ is the total variation distance.
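
As an illustration of the definition above, the following minimal sketch computes $d_{TV}$ for distributions over a finite outcome set, where it reduces to half the $L_1$ distance. The function name `tv_distance` and the example distributions are hypothetical and not taken from the paper.

```python
def tv_distance(p, q):
    """Total variation distance between two discrete distributions.

    p and q are dicts mapping outcomes to probabilities; outcomes missing
    from one dict are treated as having probability zero.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Hypothetical predictive distribution P_hat and target task Q^t.
p_hat = {"a": 0.5, "b": 0.3, "c": 0.2}
q_target = {"a": 0.4, "b": 0.4, "c": 0.2}

# Epistemic error e := d_TV(P_hat, Q^t) for this toy example.
epistemic_error = tv_distance(p_hat, q_target)
print(epistemic_error)
```

The error is zero exactly when the predictive distribution matches the target task, and it can never exceed 1.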

Core Theoretical Framework

Multitask Learning Setting

Task Distribution: Tasks themselves are sampled from a second-order task distribution $\mathcal{Q} \in \Delta(\Delta_X)$
Source Tasks: Training data comes from $n$ source tasks, each task $Q \sim \mathcal{Q}^S$
Target Task: Test task $Q^t \sim \mathcal{Q}^T$
Distribution Shift: Occurs when $\mathcal{Q}^S \neq \mathcal{Q}^T$
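
The setting above can be sketched by sampling tasks from two different second-order distributions. This is only an illustration: the choice of Dirichlet distributions, the concentration parameters, and the helper `sample_task` are assumptions made here, not part of the paper.

```python
import random


def sample_task(concentration, rng):
    """Draw one task, i.e. a distribution over len(concentration) outcomes,
    from a Dirichlet distribution (normalized Gamma draws)."""
    gammas = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)

# Hypothetical second-order distributions Q^S and Q^T over binary-outcome tasks.
source_concentration = [8.0, 2.0]
target_concentration = [4.0, 6.0]

# n = 5 source tasks drawn from Q^S, and one target task drawn from Q^T.
source_tasks = [sample_task(source_concentration, rng) for _ in range(5)]
target_task = sample_task(target_concentration, rng)

# Distribution shift in the sense above: the two second-order distributions differ.
has_shift = source_concentration != target_concentration
```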

Key Definitions

Centroid of Task Distribution (Definition 1): $\bar{Q}(x) := \int_{\Delta_X} Q(x) q(Q) dQ = \mathbb{E}_{Q \sim \mathcal{Q}}[Q(x)]$
Variability of Task Distribution (Definition 2): $V[\mathcal{Q}] := \sup_{x \in X} \int_{\Delta_X} [Q(x) - \bar{Q}(x)]^2 q(Q) dQ$
Approximation Bias (Definition 7): $B := d_{TV}(P^*, \bar{Q}^S)$ where $P^* = \arg\min_{P \in \pi} d_{TV}(P, \bar{Q}^S)$
Convergence Shortfall (Definition 8): $C := d_{TV}(\hat{P}, P^*)$
Degree of Distribution Shift (Definition 9): $D := d_{TV}(\bar{Q}^S, \bar{Q}^T)$
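
To make these definitions concrete, here is a minimal sketch for the simplest case: a task distribution supported on finitely many tasks over a finite outcome set, and a finite model class $\pi$. All names and numbers are hypothetical, and the integrals in Definitions 1 and 2 become weighted sums under this assumption.

```python
def tv_distance(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def centroid(tasks, weights):
    """Q_bar(x) = sum_i w_i * Q_i(x) for a finitely supported task distribution."""
    support = set().union(*tasks)
    return {x: sum(w * q.get(x, 0.0) for q, w in zip(tasks, weights)) for x in support}


def variability(tasks, weights):
    """V = max_x sum_i w_i * (Q_i(x) - Q_bar(x))^2."""
    q_bar = centroid(tasks, weights)
    return max(
        sum(w * (q.get(x, 0.0) - q_bar[x]) ** 2 for q, w in zip(tasks, weights))
        for x in q_bar
    )


# Hypothetical source and target task distributions, two equally likely tasks each.
source_tasks = [{0: 0.8, 1: 0.2}, {0: 0.6, 1: 0.4}]
target_tasks = [{0: 0.5, 1: 0.5}, {0: 0.3, 1: 0.7}]
weights = [0.5, 0.5]

# Hypothetical finite model class pi and the learner's actual predictor P_hat.
model_class = [{0: 0.9, 1: 0.1}, {0: 0.75, 1: 0.25}, {0: 0.5, 1: 0.5}]
p_hat = model_class[0]

q_bar_s = centroid(source_tasks, weights)
q_bar_t = centroid(target_tasks, weights)

# P* is the member of the model class closest to the source centroid.
p_star = min(model_class, key=lambda p: tv_distance(p, q_bar_s))

B = tv_distance(p_star, q_bar_s)   # approximation bias
C = tv_distance(p_hat, p_star)     # convergence shortfall
D = tv_distance(q_bar_s, q_bar_t)  # degree of distribution shift
V_target = variability(target_tasks, weights)
```

In this toy example the best available model is the second member of the class, so the learner, which sits at the first member, has a nonzero convergence shortfall in addition to the approximation bias.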

Main Theoretical Results

Theorem 1 (Main Result)

Given model class $\pi$, predictor $\hat{P} \in \pi$, source task distribution $\mathcal{Q}^S$, and second-order bounded target task distribution $\mathcal{Q}^T$:

$\Pr(e \geq \alpha + B + C + D) \leq \frac{V[\mathcal{Q}^T]}{\alpha^2}$

This bound decomposes epistemic error into:

B: Model Limitations (approximation bias)
C: Data Scarcity (convergence shortfall)
D: Distribution Shift
$V[\mathcal{Q}^T]$: Target task variability
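
One way to read the bound in Theorem 1: fixing a failure probability $\delta$ and setting $\alpha = \sqrt{V[\mathcal{Q}^T]/\delta}$ makes the right-hand side equal to $\delta$, so $e < \alpha + B + C + D$ holds with probability at least $1 - \delta$. The sketch below performs that rearrangement; the function name `error_threshold` and the input values are hypothetical.

```python
import math


def error_threshold(b, c, d, target_variability, delta):
    """Threshold t such that Pr(e >= t) <= delta under the bound in Theorem 1.

    Obtained by solving V / alpha^2 = delta for alpha and adding B + C + D.
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    alpha = math.sqrt(target_variability / delta)
    return alpha + b + c + d


# Hypothetical component values.
threshold = error_threshold(b=0.05, c=0.15, d=0.3, target_variability=0.01, delta=0.25)
print(threshold)
```

Because total variation distance never exceeds 1, a threshold above 1 carries no information; the bound is only informative when the four terms are jointly small.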

Proof Strategy

Uses the triangle inequality to construct a path in metric space: $d_{TV}(\hat{P}, Q^t) \leq d_{TV}(\hat{P}, P^*) + d_{TV}(P^*, \bar{Q}^S) + d_{TV}(\bar{Q}^S, \bar{Q}^T) + d_{TV}(\bar{Q}^T, Q^t)$

Chebyshev's inequality is then used to control the impact of task variability, which yields the probabilistic form of the bound.
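
The way the two steps fit together can be written out as follows. This is a sketch of the argument's structure only. The shorthand $R$ is introduced here for convenience, and the final inequality is stated as what the Chebyshev step must deliver under the second-order boundedness assumption; the paper's proof supplies the details.

```latex
\begin{align*}
R &:= d_{TV}(\bar{Q}^T, Q^t), \\
e = d_{TV}(\hat{P}, Q^t) &\leq C + B + D + R
  && \text{(triangle inequality)}, \\
\{ e \geq \alpha + B + C + D \} &\subseteq \{ R \geq \alpha \}
  && \text{(since } B, C, D \text{ do not depend on } Q^t \text{)}, \\
\Pr(e \geq \alpha + B + C + D) &\leq \Pr(R \geq \alpha)
  \leq \frac{V[\mathcal{Q}^T]}{\alpha^2}
  && \text{(Chebyshev step)}.
\end{align*}
```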

Technical Innovations

Unified Framework: First to handle multitask learning, imperfect learning, and distribution shift within a single framework
Decomposable Analysis: Decomposes complex epistemic error into interpretable components
Practical Guidance: Each component corresponds to concrete improvement strategies
Theoretical Rigor: Based on rigorous metric space analysis and probability theory

Analysis of Special Cases

Bayesian Transfer Learning (Corollary 1)

For Bayesian learners, the convergence shortfall term can be expressed as posterior convergence: $C^{\Theta} := d_{TV}(P^{\Theta}_1, P^{\Theta}_*)$

This directly connects posterior convergence to epistemic error.

Total Variation Neighborhood (Corollary 2)

Under ε-neighborhood constraints: $\Pr(e \geq \alpha + B + C + D) \leq \frac{\beta}{\alpha^2}(V[\mathcal{Q}^S] + \text{vol}(\mathcal{Q}^T))$

where $\beta = (1-b_T)/b_S$ and $\text{vol}(\mathcal{Q}^T) = (\text{diam}(\mathcal{Q}^S) + \varepsilon)^2$.

Experimental Validation

Experimental Setup

Model: Bayesian linear regression
Data Generation: $x \sim N(\beta_1^S \xi_1 + \beta_2^S \xi_2, \sigma^S)$
Prior: Normal-Inverse-Gamma model
Distance Approximation: Uses Pinsker's inequality, which upper-bounds total variation distance by $\sqrt{\mathrm{KL}/2}$, to approximate total variation distance
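
The role of Pinsker's inequality can be illustrated on univariate Gaussians, for which the KL divergence has a closed form. This is a minimal sketch and not the paper's experimental code: the function names and parameter values are hypothetical, and the exact total variation formula used for comparison applies only when both Gaussians share the same standard deviation.

```python
import math


def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ), sigmas are standard deviations."""
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
        - 0.5
    )


def pinsker_tv_bound(kl):
    """Pinsker's inequality: d_TV <= sqrt(KL / 2), capped at the trivial bound 1."""
    return min(1.0, math.sqrt(max(kl, 0.0) / 2.0))


def tv_gaussian_equal_sigma(mu1, mu2, sigma):
    """Exact total variation distance between two Gaussians with a common sigma."""
    return math.erf(abs(mu1 - mu2) / (2.0 * math.sqrt(2.0) * sigma))


# Hypothetical predictive and target distributions.
kl = kl_gaussian(0.0, 1.0, 1.0, 1.0)
tv_upper = pinsker_tv_bound(kl)
tv_exact = tv_gaussian_equal_sigma(0.0, 1.0, 1.0)
print(kl, tv_upper, tv_exact)
```

The Pinsker value is an upper bound rather than the distance itself, so quantities computed this way overstate the total variation distance to some degree.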

Main Experimental Results

Posterior Convergence Effect (Figure 1a): Epistemic error decreases as posterior probability of source data-generating parameters increases
Neighborhood Size Effect (Figure 1b): Epistemic error increases with ε-neighborhood size
Negative Transfer Phenomenon (Figure 3): Bound tightness is highly correlated with negative transfer phenomena

Experimental Findings

Theoretical predictions align closely with experimental observations
Bounds become looser in negative transfer cases, consistent with theoretical analysis
Relative importance of components varies across scenarios

Related Work

Statistical Learning Theory

Multitask Domain Generalization: Baxter (2000), Maurer et al., but without considering distribution shift
Domain Adaptation Theory: Redko et al. (2019), but assumes learner knows distribution shift
Credal Learning Theory: Caprio et al. (2024), but limited to specific learners

Uncertainty Quantification

Bayesian Deep Learning: Papamarkou et al. (2024)
Conformal Prediction: Angelopoulos and Bates (2023)
Credal Learning: Caprio et al. (2024)

Advantages of This Work

More General Setting: Simultaneously handles multitask learning, imperfect learning, and distribution shift
Learner-Agnostic: Does not depend on specific learning algorithms
Decomposable Analysis: Provides actionable improvement guidance

Conclusions and Discussion

Main Conclusions

Provides the first decomposable epistemic error bound for uncertainty-aware learners
Works in very general settings, covering diverse practical scenarios
Provides theoretical guidance framework for model selection and optimization

Limitations

Computational Complexity: Total variation distance is typically difficult to compute exactly
Assumption Constraints: Requires technical assumptions such as second-order bounded distributions
Conformal Prediction: Framework cannot fully characterize conformal prediction settings
Experimental Validation: Validation only on low-dimensional synthetic data

Future Directions

Extension to time-dependent tasks and data
Complete characterization of conformal prediction settings
Experimental validation on high-dimensional and real data
Development of more computationally tractable bound variants

In-Depth Evaluation

Strengths

Strong Theoretical Innovation: First systematic theoretical framework for uncertainty-aware learning
High Practical Value: Decomposable analysis directly guides practical improvements
Mathematical Rigor: Complete proofs with solid theoretical foundations
Clear Presentation: Well-structured with clear concept definitions

Weaknesses

Computational Feasibility: Practical computation of theoretical results poses challenges
Experimental Limitations: Limited experimental scale and complexity
Strict Assumptions: Some technical assumptions may be difficult to satisfy in practice
Incomplete Coverage: Incomplete support for certain uncertainty quantification methods (e.g., conformal prediction)

Impact

Theoretical Contribution: Establishes foundation for uncertainty-aware learning theory
Practical Guidance: Provides basis for model selection in high-risk applications
Research Inspiration: Opens new research directions

Applicable Scenarios

Medical Diagnosis: Clinical predictions requiring accurate uncertainty quantification
Financial Risk: Risk modeling in multi-market environments
Autonomous Driving: Safety decision-making under environmental changes
Scientific Discovery: Cross-domain knowledge transfer

References

This paper cites important works from statistical learning theory, Bayesian inference, and uncertainty quantification, including:

Shalev-Shwartz & Ben-David (2014): Foundations of statistical learning theory
Papamarkou et al. (2024): Bayesian deep learning
Angelopoulos & Bates (2023): Conformal prediction
Redko et al. (2019): Domain adaptation theory

Overall, the paper contributes a decomposable epistemic error bound for uncertainty-aware learners and an analytical framework built on it. Its main open issues are the computational tractability of the bound and the limited scale of the experimental validation.