Epistemic Errors of Imperfect Multitask Learners When Distributions Shift

Sloman, Caprio, Kaski
Uncertainty-aware machine learners, such as Bayesian neural networks, output a quantification of uncertainty instead of a point prediction. In this work, we provide uncertainty-aware learners with a principled framework to characterize, and identify ways to eliminate, errors that arise from reducible (epistemic) uncertainty. We introduce a principled definition of epistemic error, and provide a decompositional epistemic error bound which operates in the very general setting of imperfect multitask learning under distribution shift. In this setting, the training (source) data may arise from multiple tasks, the test (target) data may differ systematically from the source data tasks, and/or the learner may not arrive at an accurate characterization of the source data. Our bound separately attributes epistemic errors to each of multiple aspects of the learning procedure and environment. As corollaries of the general result, we provide epistemic error bounds specialized to the settings of Bayesian transfer learning and distribution shift within $ε$-neighborhoods. We additionally leverage the terms in our bound to provide a novel definition of negative transfer.

Basic Information

  • Paper ID: 2505.23496
  • Title: Epistemic Errors of Imperfect Multitask Learners When Distributions Shift
  • Authors: Sabina J. Sloman, Michele Caprio, Samuel Kaski
  • Classification: cs.LG stat.ML
  • Publication Date: October 13, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2505.23496

Abstract

This paper provides a principled framework for uncertainty-aware machine learning models (such as Bayesian neural networks) to characterize and reduce errors caused by reducible (epistemic) uncertainty. The paper introduces a principled definition of epistemic error and provides decomposable epistemic error bounds in the very general setting of imperfect multitask learning under distribution shift. In this setting, training (source) data may originate from multiple tasks, test (target) data may exhibit systematic differences from source tasks, and/or the learner may fail to accurately characterize source data. The bound attributes epistemic error to multiple aspects of both the learning process and the environment.

Research Background and Motivation

Problem Definition

The core problem this research addresses is: How can we provide a theoretical framework for uncertainty-aware learners to characterize and reduce epistemic error? Specifically:

  1. Limitations of Traditional Learning Theory: Existing statistical learning theory primarily focuses on generalization error, but for learners that quantify output uncertainty, prediction error is an irrelevant, incomplete, or uninformative performance measure.
  2. Confusion of Uncertainty Types: Traditional approaches conflate reducible epistemic uncertainty with irreducible aleatoric uncertainty, failing to effectively guide model improvement.
  3. Lack of Theoretical Support for Complex Learning Scenarios: Complex real-world scenarios involving multitask learning, distribution shift, and imperfect learning lack theoretical guidance.

Research Significance

  1. Practical Application Value: Accurate uncertainty quantification is critical in high-risk domains such as healthcare
  2. Theoretical Advancement: Fills gaps in uncertainty-aware learning theory
  3. Practical Guidance: Provides theoretical basis for model selection and optimization

Limitations of Existing Methods

  • Traditional frameworks such as PAC learning theory cannot distinguish epistemic error from aleatoric error
  • Lack of unified theoretical framework for multitask learning and distribution shift scenarios
  • Existing bounds typically assume perfect learning or absence of distribution shift

Core Contributions

  1. Introduction of Epistemic Error Bounds: Proposes epistemic error bounds as a new theoretical tool specifically designed for uncertainty-aware learners
  2. Decomposable Epistemic Error Bounds: Provides bounds that decompose epistemic error into three components (approximation bias, convergence shortfall, and degree of distribution shift) in the general setting of imperfect multitask learning with distribution shift
  3. Corollaries for Special Cases: Provides specialized epistemic error bounds for Bayesian transfer learning and distribution shift within ε-neighborhoods
  4. New Definition of Negative Transfer: Provides new theoretical characterization of negative transfer phenomena based on terms in the bounds

Methodology Details

Task Definition

Epistemic error is defined as the degree to which the learner's understanding of the data-generating process (DGP) is incorrect, formalized as: $e := d_{TV}(\hat{P}, Q^t)$

where $\hat{P}$ is the learner's predictive distribution, $Q^t$ is the target task distribution, and $d_{TV}$ is the total variation distance.
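For intuition, the definition can be evaluated directly when both distributions live on a small finite support. The sketch below is illustrative only: the function name `tv_distance` and the two example distributions are assumptions, not taken from the paper, and it uses the standard identity that total variation distance between discrete distributions equals half the L1 distance.

```python
def tv_distance(p, q):
    """Total variation distance between two discrete distributions.

    Both arguments are sequences of probabilities over the same finite
    support, listed in the same order.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical example: learner's predictive distribution vs. target task.
p_hat = [0.5, 0.3, 0.2]
q_target = [0.4, 0.4, 0.2]

# Epistemic error e := d_TV(P_hat, Q^t) for this toy example.
e = tv_distance(p_hat, q_target)
print(f"epistemic error: {e:.3f}")
```

For continuous distributions the sum becomes an integral, which is one reason the exact distance is usually hard to compute.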

Core Theoretical Framework

Multitask Learning Setting

  • Task Distribution: Tasks themselves are sampled from a second-order task distribution $\mathcal{Q} \in \Delta(\Delta_X)$
  • Source Tasks: Training data comes from $n$ source tasks, each task $Q \sim \mathcal{Q}^S$
  • Target Task: Test task $Q^t \sim \mathcal{Q}^T$
  • Distribution Shift: Occurs when $\mathcal{Q}^S \neq \mathcal{Q}^T$

Key Definitions

  1. Centroid of Task Distribution (Definition 1): $\bar{Q}(x) := \int_{\Delta_X} Q(x) q(Q) dQ = \mathbb{E}_{Q \sim \mathcal{Q}}[Q(x)]$
  2. Variability of Task Distribution (Definition 2): $V[\mathcal{Q}] := \sup_{x \in X} \int_{\Delta_X} [Q(x) - \bar{Q}(x)]^2 q(Q) dQ$
  3. Approximation Bias (Definition 7): $B := d_{TV}(P^*, \bar{Q}^S)$ where $P^* = \arg\min_{P \in \pi} d_{TV}(P, \bar{Q}^S)$
  4. Convergence Shortfall (Definition 8): $C := d_{TV}(\hat{P}, P^*)$
  5. Degree of Distribution Shift (Definition 9): $D := d_{TV}(\bar{Q}^S, \bar{Q}^T)$
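The definitions above can be made concrete in a toy setting. This is a minimal sketch under strong simplifying assumptions that are not from the paper: each task distribution is a finite, weighted set of discrete tasks (so integrals become weighted sums), and the model class is a finite list of candidate distributions. All names and numbers are hypothetical.

```python
def tv_distance(p, q):
    """Total variation distance between discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def centroid(tasks, weights):
    """Centroid: the weight-averaged probability of each outcome x."""
    return [
        sum(w * task[x] for task, w in zip(tasks, weights))
        for x in range(len(tasks[0]))
    ]


def variability(tasks, weights):
    """Variability: the largest (over x) weighted squared deviation
    of Q(x) from the centroid."""
    q_bar = centroid(tasks, weights)
    return max(
        sum(w * (task[x] - q_bar[x]) ** 2 for task, w in zip(tasks, weights))
        for x in range(len(q_bar))
    )


# Hypothetical source and target task distributions over a binary outcome.
source_tasks, source_weights = [[0.6, 0.4], [0.4, 0.6]], [0.5, 0.5]
target_tasks, target_weights = [[0.3, 0.7], [0.1, 0.9]], [0.5, 0.5]

# Hypothetical finite model class; the learner ends up at the second member.
model_class = [[0.5, 0.5], [0.75, 0.25]]
p_hat = model_class[1]

q_bar_s = centroid(source_tasks, source_weights)
q_bar_t = centroid(target_tasks, target_weights)

# P*: the member of the model class closest to the source centroid.
p_star = min(model_class, key=lambda p: tv_distance(p, q_bar_s))

B = tv_distance(p_star, q_bar_s)  # approximation bias
C = tv_distance(p_hat, p_star)    # convergence shortfall
D = tv_distance(q_bar_s, q_bar_t) # degree of distribution shift
V_T = variability(target_tasks, target_weights)

print(f"B={B:.3f} C={C:.3f} D={D:.3f} V_T={V_T:.4f}")
```

In this example the model class contains the source centroid, so the approximation bias vanishes and the remaining terms come from the learner's shortfall and the shift between centroids.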

Main Theoretical Results

Theorem 1 (Main Result)

Given model class $\pi$, predictor $\hat{P} \in \pi$, source task distribution $\mathcal{Q}^S$, and second-order bounded target task distribution $\mathcal{Q}^T$:

$$\Pr(e \geq \alpha + B + C + D) \leq \frac{V[\mathcal{Q}^T]}{\alpha^2}$$

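The deterministic part of the bound attributes epistemic error to three sources, while a fourth quantity controls how likely the error is to exceed them:

  • B: Model Limitations (approximation bias)
  • C: Data Scarcity (convergence shortfall)
  • D: Distribution Shift
  • $V[\mathcal{Q}^T]$: Target task variability, which sets the probability that the error exceeds $\alpha + B + C + D$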
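To see how the terms interact, the bound can be evaluated numerically once B, C, D, and the target variability are known or estimated. The helper names below are hypothetical, and the inputs are made-up numbers. Clipping the bound at 1 is an addition for readability, since a probability never exceeds 1.

```python
import math


def theorem1_tail_bound(alpha, variability_target):
    """Upper bound V / alpha^2 on Pr(e >= alpha + B + C + D), clipped at 1."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return min(1.0, variability_target / alpha ** 2)


def error_threshold(delta, B, C, D, variability_target):
    """Threshold alpha + B + C + D that e exceeds with probability at most
    delta, obtained by solving V / alpha^2 = delta for alpha."""
    if not 0 < delta <= 1:
        raise ValueError("delta must lie in (0, 1]")
    alpha = math.sqrt(variability_target / delta)
    return alpha + B + C + D


# Hypothetical values for the three components and the target variability.
B, C, D, V_T = 0.0, 0.25, 0.3, 0.01

print(theorem1_tail_bound(alpha=0.2, variability_target=V_T))
print(error_threshold(delta=0.25, B=B, C=C, D=D, variability_target=V_T))
```

Read this way, a smaller tolerated failure probability `delta` forces a larger slack `alpha`, while B, C, and D shift the threshold regardless of `delta`.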

Proof Strategy

The proof uses the triangle inequality to construct a path in metric space from $\hat{P}$ to $Q^t$:

$$d_{TV}(\hat{P}, Q^t) \leq d_{TV}(\hat{P}, P^*) + d_{TV}(P^*, \bar{Q}^S) + d_{TV}(\bar{Q}^S, \bar{Q}^T) + d_{TV}(\bar{Q}^T, Q^t)$$

The first three terms are $C$, $B$, and $D$. Chebyshev's inequality then controls the last term, the distance between the target task and its centroid, through the target task variability.
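The step from the triangle inequality to a probabilistic statement can be written out as follows. This is a sketch that is consistent with the theorem as stated here, not a reproduction of the paper's proof. In particular, how the final probability is bounded by $V[\mathcal{Q}^T]/\alpha^2$ depends on details of the second-order boundedness assumption that this summary does not give.

```latex
\begin{align*}
e = d_{TV}(\hat{P}, Q^t)
  &\leq C + B + D + d_{TV}(\bar{Q}^T, Q^t)
  && \text{(triangle inequality)} \\
\{\, e \geq \alpha + B + C + D \,\}
  &\subseteq \{\, d_{TV}(\bar{Q}^T, Q^t) \geq \alpha \,\}
  && \text{(subtract } B + C + D \text{)} \\
\Pr(e \geq \alpha + B + C + D)
  &\leq \Pr\!\left( d_{TV}(\bar{Q}^T, Q^t) \geq \alpha \right)
  && \text{(monotonicity of probability)}
\end{align*}
```

Only the last distance is random, because $Q^t$ is drawn from $\mathcal{Q}^T$ while $B$, $C$, and $D$ are fixed once the learner and the task distributions are fixed.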

Technical Innovations

  1. Unified Framework: First to handle multitask learning, imperfect learning, and distribution shift within a single framework
  2. Decomposable Analysis: Decomposes complex epistemic error into interpretable components
  3. Practical Guidance: Each component corresponds to concrete improvement strategies
  4. Theoretical Rigor: Based on rigorous metric space analysis and probability theory

Analysis of Special Cases

Bayesian Transfer Learning (Corollary 1)

For Bayesian learners, the convergence shortfall term can be expressed as posterior convergence: $C^{\Theta} := d_{TV}(P^{\Theta}_1, P^{\Theta}_*)$

This directly connects posterior convergence to epistemic error.

Total Variation Neighborhood (Corollary 2)

Under ε-neighborhood constraints: $\Pr(e \geq \alpha + B + C + D) \leq \frac{\beta}{\alpha^2}(V[\mathcal{Q}^S] + \text{vol}(\mathcal{Q}^T))$

where $\beta = (1-b_T)/b_S$ and $\text{vol}(\mathcal{Q}^T) = (\text{diam}(\mathcal{Q}^S) + \varepsilon)^2$.
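The right-hand side of this corollary is straightforward to evaluate once its ingredients are given. In the sketch below, the function name and all numeric inputs are hypothetical. The constants `b_s` and `b_t` are treated as given numbers, because this summary does not define them. Clipping at 1 is an addition.

```python
def corollary2_tail_bound(alpha, b_s, b_t, variability_source,
                          diam_source, epsilon):
    """Evaluate (beta / alpha^2) * (V[Q^S] + vol(Q^T)), clipped at 1.

    beta = (1 - b_t) / b_s and vol(Q^T) = (diam(Q^S) + epsilon)^2,
    following the formulas stated in the text.
    """
    if alpha <= 0 or b_s <= 0:
        raise ValueError("alpha and b_s must be positive")
    beta = (1.0 - b_t) / b_s
    vol_target = (diam_source + epsilon) ** 2
    return min(1.0, beta / alpha ** 2 * (variability_source + vol_target))


# Hypothetical inputs: only epsilon changes between the two calls.
common = dict(alpha=0.8, b_s=0.5, b_t=0.5,
              variability_source=0.01, diam_source=0.2)
bound_small = corollary2_tail_bound(epsilon=0.1, **common)
bound_large = corollary2_tail_bound(epsilon=0.3, **common)

print(bound_small, bound_large)
```

With everything else held fixed, widening the neighborhood enlarges `vol_target` and therefore loosens the bound.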

Experimental Validation

Experimental Setup

  • Model: Bayesian linear regression
  • Data Generation: $x \sim N(\beta_1^S \xi_1 + \beta_2^S \xi_2, \sigma^S)$
  • Prior: Normal-Inverse-Gamma model
  • Distance Approximation: Uses Pinsker's inequality to approximate total variation distance
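Pinsker's inequality bounds total variation distance by $\sqrt{\mathrm{KL}/2}$, which is convenient whenever the KL divergence has a closed form. The sketch below applies it to two univariate Gaussians. This is an illustrative assumption: the paper's experiments use a Normal-Inverse-Gamma model, whose predictive distributions may not be Gaussian. The function names are hypothetical, and the `sigma` arguments are treated as standard deviations.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2))."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def pinsker_tv_bound(kl):
    """Pinsker's inequality: d_TV <= sqrt(KL / 2), clipped to [0, 1]."""
    return min(1.0, math.sqrt(max(kl, 0.0) / 2.0))


# Hypothetical predictive and target distributions with unit variance.
kl = kl_gaussian(mu_p=0.0, sigma_p=1.0, mu_q=1.0, sigma_q=1.0)
tv_upper = pinsker_tv_bound(kl)
print(f"KL={kl:.3f}, TV upper bound={tv_upper:.3f}")
```

Because Pinsker's inequality gives an upper bound rather than the exact distance, quantities computed this way overestimate the true total variation distance to some degree.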

Main Experimental Results

  1. Posterior Convergence Effect (Figure 1a): Epistemic error decreases as posterior probability of source data-generating parameters increases
  2. Neighborhood Size Effect (Figure 1b): Epistemic error increases with ε-neighborhood size
  3. Negative Transfer Phenomenon (Figure 3): Bound tightness is highly correlated with negative transfer phenomena

Experimental Findings

  • Theoretical predictions align closely with experimental observations
  • Bounds become looser in negative transfer cases, consistent with theoretical analysis
  • Relative importance of components varies across scenarios

Related Work

Statistical Learning Theory

  • Multitask Domain Generalization: Baxter (2000), Maurer et al., but without considering distribution shift
  • Domain Adaptation Theory: Redko et al. (2019), but assumes learner knows distribution shift
  • Credal Learning Theory: Caprio et al. (2024), but limited to specific learners

Uncertainty Quantification

  • Bayesian Deep Learning: Papamarkou et al. (2024)
  • Conformal Prediction: Angelopoulos and Bates (2023)
  • Credal Learning: Caprio et al. (2024)

Advantages of This Work

  1. More General Setting: Simultaneously handles multitask learning, imperfect learning, and distribution shift
  2. Learner-Agnostic: Does not depend on specific learning algorithms
  3. Decomposable Analysis: Provides actionable improvement guidance

Conclusions and Discussion

Main Conclusions

  1. Provides the first decomposable epistemic error bound for uncertainty-aware learners
  2. Works in very general settings, covering diverse practical scenarios
  3. Provides theoretical guidance framework for model selection and optimization

Limitations

  1. Computational Complexity: Total variation distance is typically difficult to compute exactly
  2. Assumption Constraints: Requires technical assumptions such as second-order bounded distributions
  3. Conformal Prediction: Framework cannot fully characterize conformal prediction settings
  4. Experimental Validation: Validation only on low-dimensional synthetic data

Future Directions

  1. Extension to time-dependent tasks and data
  2. Complete characterization of conformal prediction settings
  3. Experimental validation on high-dimensional and real data
  4. Development of more computationally tractable bound variants

In-Depth Evaluation

Strengths

  1. Strong Theoretical Innovation: First systematic theoretical framework for uncertainty-aware learning
  2. High Practical Value: Decomposable analysis directly guides practical improvements
  3. Mathematical Rigor: Complete proofs with solid theoretical foundations
  4. Clear Presentation: Well-structured with clear concept definitions

Weaknesses

  1. Computational Feasibility: Practical computation of theoretical results poses challenges
  2. Experimental Limitations: Limited experimental scale and complexity
  3. Strict Assumptions: Some technical assumptions may be difficult to satisfy in practice
  4. Incomplete Coverage: Incomplete support for certain uncertainty quantification methods (e.g., conformal prediction)

Impact

  1. Theoretical Contribution: Establishes foundation for uncertainty-aware learning theory
  2. Practical Guidance: Provides basis for model selection in high-risk applications
  3. Research Inspiration: Opens new research directions

Applicable Scenarios

  1. Medical Diagnosis: Clinical predictions requiring accurate uncertainty quantification
  2. Financial Risk: Risk modeling in multi-market environments
  3. Autonomous Driving: Safety decision-making under environmental changes
  4. Scientific Discovery: Cross-domain knowledge transfer

References

This paper cites important works from statistical learning theory, Bayesian inference, and uncertainty quantification, including:

  • Shalev-Shwartz & Ben-David (2014): Foundations of statistical learning theory
  • Papamarkou et al. (2024): Bayesian deep learning
  • Angelopoulos & Bates (2023): Conformal prediction
  • Redko et al. (2019): Domain adaptation theory

Overall, the paper gives uncertainty-aware learners a decomposable epistemic error bound that covers multitask learning, imperfect learning, and distribution shift, along with a bound-based definition of negative transfer. Its main open issues are the difficulty of computing total variation distance and experimental validation limited to low-dimensional synthetic data.