
Accounting for Missing Data in Public Health Research Using a Synthesis of Statistical and Mathematical Models

Zivich, Shook-Sa, Cole et al.

Introduction: Accounting for missing data by imputing or weighting conditional on covariates relies on the variable with missingness being observed at least some of the time for all unique covariate values. This requirement is referred to as positivity, and positivity violations can result in bias. Here, we review a novel approach to addressing positivity violations in the context of systolic blood pressure.

Methods: To illustrate the proposed approach, we estimate the mean systolic blood pressure among children and adolescents aged 2-17 years in the United States using data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES). As blood pressure was not measured for those aged 2-7, there exists a positivity violation by design. Using a recently proposed synthesis of statistical and mathematical models, we integrate external information with NHANES to address our motivating question.

Results: With the synthesis model, the estimated mean systolic blood pressure was 100.5 (95% confidence interval: 99.9, 101.0), which is notably lower than the estimate from either a complete-case analysis or extrapolation from a statistical model. The synthesis results were supported by a diagnostic comparing the performance of the mathematical model in the positive region.

Discussion: Positivity violations pose a threat to quantitative medical research, and standard approaches to addressing nonpositivity rely on restrictive untestable assumptions. Using a synthesis model, like the one detailed here, offers a viable alternative.


Accounting for Missing Data in Public Health Research: A Method Using a Synthesis of Statistical and Mathematical Models

Basic Information

Paper ID: 2503.02789
Title: Accounting for Missing Data in Public Health Research Using a Synthesis of Statistical and Mathematical Models
Authors: Paul N Zivich, Bonnie E Shook-Sa, Stephen R Cole, Eric T Lofgren, Jessie K Edwards
Categories: stat.AP (Applications), stat.ME (Methodology)
Publication date: October 16, 2025
Paper link: https://arxiv.org/abs/2503.02789

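Abstract

This study addresses positivity violations in missing-data handling for public health research by applying a recently proposed synthesis of statistical and mathematical models. As an illustration, it estimates the mean systolic blood pressure among children and adolescents aged 2-17 years in the United States using data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES). Because NHANES did not measure blood pressure for children aged 2-7, there is a positivity violation by design. After integrating external information with the NHANES data, the synthesis model estimated a mean systolic blood pressure of 100.5 mmHg (95% CI: 99.9, 101.0), notably lower than the results from either a complete-case analysis or extrapolation from a statistical model.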
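The core idea of combining two sources can be sketched as a weighted average: a statistical model supplies the mean where the outcome is observed (the positive region), and a mathematical model built from external information supplies the mean where it is never observed. The sketch below is a simplified illustration under that assumption, not the authors' estimator. The function name `synthesis_mean`, its parameters, and all input numbers are hypothetical.

```python
def synthesis_mean(stat_mean, math_mean, share_positive):
    """Combine two region-specific means into one overall mean.

    stat_mean: mean from a statistical model fit where the outcome is observed
    math_mean: mean supplied by a mathematical model using external information
    share_positive: proportion of the target population in the positive region
    """
    if not 0.0 <= share_positive <= 1.0:
        raise ValueError("share_positive must lie in [0, 1]")
    # Weight each region's mean by that region's share of the population.
    return share_positive * stat_mean + (1.0 - share_positive) * math_mean


# Toy inputs invented for illustration only; these are not NHANES estimates.
overall = synthesis_mean(stat_mean=104.0, math_mean=92.0, share_positive=0.625)
print(round(overall, 1))  # 99.5
```

In this toy setup the overall mean falls below the positive-region mean because the unobserved region is assigned a lower value. A complete-case analysis would report only the positive-region mean.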

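Research Background and Motivation

Core Problem

Importance of the positivity assumption: When handling missing data, imputing or weighting conditional on covariates relies on the positivity assumption. That is, the variable with missingness must be observed at least some of the time for every unique covariate value.
Prevalence of positivity violations: When the outcome is never observed for certain covariate combinations, positivity is violated, and bias can result.
Limitations of existing methods: Conventional approaches to nonpositivity either modify the research question or rely on restrictive, untestable modeling assumptions.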
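A deterministic positivity violation of this kind can be detected by checking, for each covariate value, whether the outcome is ever observed. The following is a minimal sketch assuming records arrive as (covariate, outcome) pairs with `None` marking a missing outcome. The helper `nonpositive_strata` and the toy records are hypothetical and are not taken from the paper or from NHANES.

```python
from collections import defaultdict


def nonpositive_strata(records):
    """Return covariate values for which the outcome is never observed.

    records: iterable of (covariate_value, outcome) pairs, where outcome
    is None when it is missing.
    """
    observed = defaultdict(int)
    for covariate, outcome in records:
        # Adding a bool keeps strata with zero observed outcomes in the dict.
        observed[covariate] += outcome is not None
    return sorted(c for c, n in observed.items() if n == 0)


# Toy records invented for illustration: (age in years, systolic blood pressure).
toy = [(5, None), (6, None), (8, 101.0), (8, None), (12, 108.0)]
print(nonpositive_strata(toy))  # [5, 6]
```

Age 8 is not flagged even though one of its records is missing, because the outcome is observed at least once in that stratum. Ages 5 and 6 are flagged because the outcome is never observed there.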