Predicting outcomes in external domains is challenging due to hidden confounders that potentially influence both predictors and outcomes. Well-established methods frequently rely on stringent assumptions, explicit knowledge about the distribution shift across domains, or bias-inducing regularization schemes to enhance generalization. While recent developments in point prediction under hidden confounding attempt to mitigate these shortcomings, they generally do not provide principled uncertainty quantification. We introduce a Bayesian framework that yields well-calibrated predictive distributions across external domains, supports valid model inference, and achieves posterior contraction rates that improve as the number of observed datasets increases. Simulations and a medical application highlight the remarkable empirical coverage of our approach, nearly unchanged when transitioning from low- to moderate-dimensional settings.
The core question addressed by this research is how to make reliable probabilistic predictions, with calibrated uncertainty quantification, in external domains that exhibit distribution shift in the presence of hidden confounders.
Ubiquity of distributional shift: Machine learning applications frequently face mismatches between training and test distributions, violating the standard i.i.d. assumption.
Impact of hidden confounding: Unobserved confounders simultaneously affect the predictors X and the outcome Y, so associations learned by traditional methods may not transfer to new domains.
Demand for uncertainty quantification: Existing methods focus primarily on point prediction and lack principled uncertainty quantification.
Building on prior work on Generative Invariance (GI), the authors aim to construct a unified Bayesian framework that addresses two long-standing problems at once: causal discovery and calibrated prediction.
First Bayesian framework: Proposes a complete Bayesian framework for probabilistic prediction under hidden confounding that performs causal discovery and prediction simultaneously.
Theoretical guarantees: Establishes posterior consistency, posterior contraction rates, and a Bernstein-von Mises theorem, characterizing the method's asymptotic behavior.
Hypothesis testing: Provides the first computationally tractable hypothesis test of whether a variable is a parent of the target response in linear structural equation models.
Calibrated predictions: Produces well-calibrated predictions in domains under distribution shift, with empirical coverage close to the nominal level.
Identifiability spectrum: Is the first to explicitly articulate weak identifiability as the empirical manifestation of an asymptotic phenomenon.
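Calibration in the sense used above means that a central predictive interval at nominal level 1 − α should contain the realized outcome about 1 − α of the time. A minimal sketch of how empirical coverage is measured, assuming Gaussian predictive distributions and synthetic data; the helper names are hypothetical and this is not the paper's evaluation code:

```python
import random
from statistics import NormalDist


def central_interval(mean, sd, level):
    """Central `level` interval of a N(mean, sd^2) predictive distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


def empirical_coverage(ys, intervals):
    """Fraction of outcomes that fall inside their (lower, upper) predictive interval."""
    hits = sum(lo <= y <= hi for y, (lo, hi) in zip(ys, intervals))
    return hits / len(ys)


# Synthetic outcomes drawn from N(0, 1); all data here are illustrative.
rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

# Calibrated predictive N(0, 1): coverage should sit near the nominal 0.90.
cov = empirical_coverage(ys, [central_interval(0.0, 1.0, 0.90)] * len(ys))
# Overconfident predictive N(0, 0.5^2): intervals are too narrow, so it under-covers.
cov_narrow = empirical_coverage(ys, [central_interval(0.0, 0.5, 0.90)] * len(ys))
print(f"nominal 0.90 | calibrated {cov:.3f} | overconfident {cov_narrow:.3f}")
```

The overconfident predictive distribution should cover only about 59% of outcomes in expectation, which is the kind of gap between nominal and empirical coverage that a well-calibrated method avoids.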
The method treats the environment means μ_e as random quantities drawn from a common prior distribution rather than as fixed parameters, which induces beneficial shrinkage.
When identifiability conditions are nearly violated, this controlled shrinkage lets the Bayesian approach avoid the numerical instability of frequentist methods.
Figure 2 illustrates weak identifiability: as μ → 0, the posterior shrinks toward the prior mean, avoiding the matrix singularity encountered by frequentist methods.
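The shrinkage effect of a common prior is easiest to see in the simplest conjugate case. A minimal sketch, assuming a normal-normal model μ_e ~ N(m, τ²) with observations x_{e,i} ~ N(μ_e, σ²) and m, τ², σ² treated as known; the paper's model is richer, so this only isolates the pooling mechanism, and the function and environment names are hypothetical:

```python
def posterior_env_mean(xbar, n, sigma2, m, tau2):
    """Posterior mean and variance of mu_e under mu_e ~ N(m, tau2), x_ei ~ N(mu_e, sigma2).

    Conjugate normal-normal update: precisions add, and the posterior mean is a
    precision-weighted average of the sample mean xbar and the prior mean m.
    """
    data_prec = n / sigma2
    prior_prec = 1.0 / tau2
    post_var = 1.0 / (data_prec + prior_prec)
    post_mean = post_var * (data_prec * xbar + prior_prec * m)
    return post_mean, post_var


# Hypothetical environments: (sample mean, sample size).
envs = {"A": (2.0, 100), "B": (2.0, 5), "C": (-1.0, 2)}
results = {
    name: posterior_env_mean(xbar, n, sigma2=1.0, m=0.0, tau2=0.5)
    for name, (xbar, n) in envs.items()
}
for name, (mean, var) in results.items():
    print(name, round(mean, 3), round(var, 4))
```

Environment C (n = 2) is pulled halfway toward the prior mean 0 (posterior mean -0.5), while the data-rich environment A barely moves (about 1.961 versus a sample mean of 2); environments with less data borrow more strength from the common prior.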
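To make the singularity point concrete, here is a toy linear-Gaussian illustration, not the paper's model: one coefficient is identified only through a signal that scales with μ. The least-squares normal equations XᵀXβ = Xᵀy lose rank at μ = 0, whereas a Gaussian prior N(m₀, τ²I) adds 1/τ² to the diagonal, so the posterior mean stays defined and tends to the prior mean as μ → 0. The design, deterministic noise, prior, and function names are all assumptions chosen for illustration:

```python
def solve2(a, b):
    """Solve the 2x2 system a @ x = b; raise if a is (numerically) singular."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ZeroDivisionError("singular system")
    return [(a[1][1] * b[0] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def fit(mu, prior_mean=(0.0, 0.0), tau2=1.0, sigma2=1.0, bayes=True):
    """Least squares (bayes=False) or Gaussian-prior posterior mean (bayes=True).

    Toy design: rows (1, mu * z_i) with z_i alternating +1/-1, so the signal
    identifying the second coefficient (true value 2) scales with mu.
    """
    z = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]
    noise = [0.1 * (i % 3 - 1) for i in range(20)]  # deterministic stand-in for noise
    X = [(1.0, mu * zi) for zi in z]
    y = [1.0 + 2.0 * x2 + e for (_, x2), e in zip(X, noise)]
    # Data precision X'X / sigma2 and score X'y / sigma2.
    A = [[sum(r[j] * r[k] for r in X) / sigma2 for k in range(2)] for j in range(2)]
    b = [sum(r[j] * yi for r, yi in zip(X, y)) / sigma2 for j in range(2)]
    if bayes:
        # Prior N(prior_mean, tau2 * I): adds 1/tau2 to the diagonal and pulls toward prior_mean.
        A = [[A[j][k] + (1.0 / tau2 if j == k else 0.0) for k in range(2)] for j in range(2)]
        b = [b[j] + prior_mean[j] / tau2 for j in range(2)]
    return solve2(A, b)


for mu in (1.0, 1e-2, 1e-3, 0.0):
    try:
        ols = f"{fit(mu, bayes=False)[1]:.3f}"
    except ZeroDivisionError:
        ols = "undefined (singular)"
    print(f"mu={mu:g}  least-squares slope={ols}  posterior-mean slope={fit(mu)[1]:.3f}")
```

As μ shrinks, the least-squares slope drifts away from its true value of 2 (to 1.5 at μ = 10⁻² and -3 at μ = 10⁻³) and is undefined at μ = 0, while the posterior-mean slope moves smoothly to the prior mean 0.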
Rothenhäusler, D., et al. (2021). Anchor regression: Heterogeneous data meet causality. Journal of the Royal Statistical Society Series B, 83(2), 215-246.
Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society Series B, 78(5), 947-1012.
Tibshirani, R. J., et al. (2019). Conformal prediction under covariate shift. Advances in Neural Information Processing Systems, 32.
Meixide, C. G., & Insua, D. R. (2025). Unsupervised domain adaptation under hidden confounding. arXiv preprint.