Accounting for Missing Data in Public Health Research Using a Synthesis of Statistical and Mathematical Models

Zivich, Shook-Sa, Cole et al.

Introduction: Accounting for missing data by imputing or weighting conditional on covariates relies on the variable with missingness being observed at least some of the time for all unique covariate values. This requirement is referred to as positivity and positivity violations can result in bias. Here, we review a novel approach to addressing positivity violations in the context of systolic blood pressure. Methods: To illustrate the proposed approach, we estimate the mean systolic blood pressure among children and adolescents aged 2-17 years old in the United States using data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES). As blood pressure was not measured for those aged 2-7, there exists a positivity violation by design. Using a recently proposed synthesis of statistical and mathematical models, we integrate external information with NHANES to address our motivating question. Results: With the synthesis model, the estimated mean systolic blood pressure was 100.5 (95% confidence interval: 99.9, 101.0), which is notably lower than either a complete-case analysis or extrapolation from a statistical model. The synthesis results were supported by a diagnostic comparing the performance of the mathematical model in the positive region. Discussion: Positivity violations pose a threat to quantitative medical research, and standard approaches to addressing nonpositivity rely on restrictive untestable assumptions. Using a synthesis model, like the one detailed here, offers a viable alternative.

Accounting for Missing Data in Public Health Research Using a Synthesis of Statistical and Mathematical Models

Basic Information

Paper ID: 2503.02789
Title: Accounting for Missing Data in Public Health Research Using a Synthesis of Statistical and Mathematical Models
Authors: Paul N Zivich, Bonnie E Shook-Sa, Stephen R Cole, Eric T Lofgren, Jessie K Edwards
Categories: stat.AP (Applications), stat.ME (Methodology)
Published: October 16, 2025
Paper link: https://arxiv.org/abs/2503.02789

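Abstract

This study applies a recently proposed synthesis of statistical and mathematical models to address positivity violations when handling missing data in public health research. As an illustration, it estimates the mean systolic blood pressure among children and adolescents aged 2-17 years in the United States using data from the 2017-2018 National Health and Nutrition Examination Survey (NHANES). Because NHANES did not measure blood pressure in children aged 2-7, a positivity violation arises by design. After integrating external information with the NHANES data, the synthesis model estimates the mean systolic blood pressure at 100.5 mmHg (95% CI: 99.9, 101.0), notably lower than the estimates from either a complete-case analysis or extrapolation from a statistical model.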

Research Background and Motivation

Core Problem Identification

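Importance of the positivity assumption: Imputing or weighting conditional on covariates to handle missing data relies on positivity; that is, the variable with missingness must be observed at least some of the time for every unique covariate value.
Prevalence of positivity violations: When the target variable is never observed for some combination of covariate values, positivity is violated, which can result in bias.
Limitations of existing methods: Traditional approaches to nonpositivity either revise the research question or rely on restrictive, untestable modeling assumptions.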
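
The positivity requirement described above can be checked directly in a dataset: for each unique covariate value, count how often the variable with missingness was actually observed, and flag any value where that count is zero. The following is a minimal sketch, not the authors' code. The function name `positivity_violations`, the field names `age` and `sbp`, and the toy records are all hypothetical and are not NHANES data.

```python
from collections import defaultdict


def positivity_violations(records, covariate, outcome):
    """Return the covariate values for which the outcome is never observed."""
    observed = defaultdict(int)
    for row in records:
        # Adding a bool registers the covariate value even when the outcome
        # is missing (adds 0), and counts it when observed (adds 1).
        observed[row[covariate]] += row[outcome] is not None
    return sorted(value for value, count in observed.items() if count == 0)


# Hypothetical toy records (illustrative values only, NOT NHANES data):
# systolic blood pressure (sbp) is missing by design below age 8, and
# missing for one of the two 8-year-olds.
records = [
    {"age": 5, "sbp": None},
    {"age": 7, "sbp": None},
    {"age": 8, "sbp": 101.0},
    {"age": 8, "sbp": None},
    {"age": 12, "sbp": 106.0},
]

print(positivity_violations(records, "age", "sbp"))  # prints [5, 7]
```

In this toy example, age 8 does not violate positivity because blood pressure is observed for at least one 8-year-old, so imputation or weighting conditional on age remains possible there. Ages 5 and 7 have no observed values at all, which mirrors the by-design gap for ages 2-7 that motivates the synthesis approach.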