
"Within-trial" prognostic score adjustment is targeted maximum likelihood estimation

Højbjerre-Frandsen, Schuler
Adjustment for "super" or "prognostic" composite covariates has become more popular in randomized trials recently. These prognostic covariates are often constructed from historical data by fitting a predictive model of the outcome on the raw covariates. A natural question that we have been asked by applied researchers is whether this can be done without the historical data: can the prognostic covariate be constructed or derived from the trial data itself, possibly using different folds of the data, before adjusting for it? Here we clarify that such "within-trial" prognostic adjustment is nothing more than a form of targeted maximum likelihood estimation (TMLE), a well-studied procedure for optimal inference. We demonstrate the equivalence with a simulation study and discuss the pros and cons of within-trial prognostic adjustment (standard efficient estimation) relative to standard TMLE and standard prognostic adjustment with historical data.

"Within-trial" Prognostic Score Adjustment is Targeted Maximum Likelihood Estimation

Basic Information

  • Paper ID: 2507.23446
  • Title: "Within-trial" Prognostic Score Adjustment is Targeted Maximum Likelihood Estimation
  • Authors: Emilie Højbjerre-Frandsen, Alejandro Schuler
  • Classification: stat.ME (Statistics - Methodology)
  • Publication Date: November 6, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2507.23446v2

Abstract

In recent years, adjustment for "super" or "prognostic" composite covariates in randomized trials has become increasingly popular. These prognostic covariates are typically constructed from historical data by fitting predictive models of outcomes on baseline covariates. A natural question frequently asked by applied researchers is: can this be accomplished without historical data—can prognostic covariates be constructed or derived from the trial data itself, possibly using cross-validation techniques, and then adjusted for? This paper clarifies that such "within-trial" prognostic adjustment is merely a form of targeted maximum likelihood estimation (TMLE), a well-studied optimal inference procedure. The authors demonstrate equivalence through simulation studies and discuss the advantages and disadvantages of within-trial prognostic adjustment relative to standard TMLE and standard prognostic adjustment using historical data.

Research Background and Motivation

Problem Background

  1. Rise of Prognostic Covariate Adjustment: In randomized clinical trials (RCTs), covariate adjustment using "super covariates" or "prognostic covariates" has become a popular way to improve statistical efficiency. The idea traces back to Tukey (1993): summarize the baseline covariates in a single prognostic covariate built from historical data, which improves efficiency while limiting the risk of overfitting.
  2. Historical Data Dependency Issue: Traditional prognostic score adjustment methods (such as PROCOVA™) rely on historical data from previous clinical trials or registry studies. However, in practical applications, researchers frequently face situations where historical data is unavailable or unreliable.
  3. Need for Within-trial Adjustment: Applied researchers naturally ask: can prognostic covariates be constructed without using historical data? Can prognostic covariates be derived directly from trial data itself (possibly using cross-validation techniques) and then adjusted for?

Research Motivation

The core motivation of this research is to clarify the nature of "within-trial" prognostic score adjustment and reveal its relationship to existing statistical methods, avoiding "reinventing the wheel."

Core Contributions

  1. Theoretical Equivalence Proof: First explicitly demonstrates that within-trial prognostic score adjustment is essentially a form of targeted maximum likelihood estimation (TMLE).
  2. Methodological Clarification: Clarifies that within-trial prognostic adjustment is not a new method but rather an implementation of TMLE under a specific submodel, and should therefore be called TMLE directly rather than renamed.
  3. Comparative Analysis: Systematically compares the advantages and disadvantages of within-trial prognostic adjustment, standard TMLE, and standard prognostic adjustment methods based on historical data.
  4. Empirical Verification: Validates theoretical equivalence through simulation studies and demonstrates the performance of different methods across various scenarios.

Methodological Details

Task Definition

Estimate the average treatment effect (ATE) in a two-arm randomized trial:

  • Input: Observed data $O_i = (W_i, A_i, Y_i)$ for $n$ participants
  • Output: The causal average treatment effect $\Psi^* = E[Y(1) - Y(0)]$
  • Constraints: Simple randomization with a known treatment assignment probability

Where:

  • $Y$: Continuous primary endpoint variable
  • $W$: $p$-dimensional vector of baseline covariates
  • $A$: Treatment indicator (1 for new treatment, 0 for control)

Core Method Architecture

1. ANCOVA Estimator (Plug-in Method)

Using the G-computation formulation:

  1. Estimate the conditional mean function $\mu(a,w) = E[Y \mid A=a, W=w]$ by maximum likelihood
  2. Average the counterfactual predictions: $\hat{\Psi}_a = \frac{1}{n}\sum_{i=1}^n \hat{\mu}(a, W_i)$
  3. Obtain the ATE estimate: $\hat{\Psi} = \hat{\Psi}_1 - \hat{\Psi}_0$
  4. Estimate the asymptotic variance from the estimated influence function
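The four steps above can be sketched in Python. This is a minimal sketch, not the authors' code, and it rests on several assumptions:

- The working model is a linear regression fitted separately in each arm, which is equivalent to an ANCOVA with treatment-by-covariate interactions.
- The assignment probability `pi` is known.
- The standard error comes from the estimated efficient influence function of the ATE.

The function name `gcomp_ate` is hypothetical.

```python
import numpy as np


def gcomp_ate(y, a, w, pi=0.5):
    """G-computation (plug-in) ATE with an influence-function standard error.

    Working model: linear regression of Y on W fitted separately in each arm by
    least squares (the Gaussian MLE), i.e. ANCOVA with treatment-by-covariate
    interactions. `pi` is the known probability of assignment to A = 1.
    """
    y, a, w = np.asarray(y, float), np.asarray(a, int), np.asarray(w, float)
    x = np.column_stack([np.ones(len(y)), w])  # intercept + raw covariates
    mu = {}
    for arm in (0, 1):
        coef, *_ = np.linalg.lstsq(x[a == arm], y[a == arm], rcond=None)
        mu[arm] = x @ coef  # step 1: mu_hat(arm, W_i) for every participant
    psi = np.mean(mu[1]) - np.mean(mu[0])  # steps 2-3: Psi_hat_1 - Psi_hat_0
    # Step 4: estimated efficient influence function of the ATE
    ic = (a / pi) * (y - mu[1]) - ((1 - a) / (1 - pi)) * (y - mu[0]) + mu[1] - mu[0] - psi
    se = np.sqrt(np.var(ic, ddof=1) / len(y))
    return psi, se
```

Fitting each arm separately with an intercept makes the residuals sum to zero within each arm. That is why the influence-function variance used here is appropriate for this plug-in estimator.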

2. Prognostic Score Adjustment

Define the prognostic score as $\rho_D(W,A) := E[Y \mid W, A, D]$,

where $D$ denotes the data source ($D = 1$ for the new trial, $D = 0$ for historical data).

Standard prognostic adjustment procedure:

  1. Train a prognostic model $\hat{\rho}_0(W,A)$ on the historical data
  2. Include its predictions for trial participants as an additional covariate in the ANCOVA analysis
  3. The resulting estimator achieves efficiency under a homogeneous treatment effect assumption
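A minimal sketch of the standard procedure follows. It makes these illustrative assumptions:

- The historical data come from control participants only, so the fitted score depends on $W$ alone.
- A linear least-squares fit stands in for the prognostic model.
- The ANCOVA adjusts for the score only. Raw covariates are left out here because a linear prognostic score would be perfectly collinear with them.
- The standard error is the conventional homoskedastic OLS one.

The function names are hypothetical.

```python
import numpy as np


def fit_linear(x, y):
    """Least squares with an intercept; returns a prediction function."""
    design = lambda m: np.column_stack([np.ones(len(m)), m])
    coef, *_ = np.linalg.lstsq(design(x), y, rcond=None)
    return lambda m: design(m) @ coef


def prognostic_ancova(w_hist, y_hist, y, a, w):
    """ANCOVA adjusting for a prognostic score trained on historical data."""
    rho_hat = fit_linear(w_hist, y_hist)  # step 1: trained on historical data only
    score = rho_hat(w)  # prognostic score for each trial participant
    # Step 2: regress the trial outcome on an intercept, treatment and the score
    x = np.column_stack([np.ones(len(y)), a, score])
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    sigma2 = resid @ resid / (len(y) - x.shape[1])
    cov = sigma2 * np.linalg.inv(x.T @ x)  # conventional OLS covariance
    return coef[1], np.sqrt(cov[1, 1])  # coefficient on A estimates the ATE
```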

3. TMLE Method

TMLE removes the bias that a flexible (machine-learning) outcome model would otherwise carry into a plug-in estimate, through the following steps:

  1. Initial Estimation: Obtain an initial conditional mean estimate $\hat{\mu}$ using machine learning
  2. Targeted Submodel: Fit by maximum likelihood within the parametric family $\{p_\epsilon(Y \mid A,W) = N(\hat{\mu}(A,W) + \epsilon A_{\pm}, 1) : \epsilon \in \mathbb{R}\}$, where $A_{\pm} = 2A - 1$
  3. Update Step: Take the MLE $\epsilon^*$ and update the prediction function to $\hat{\mu}^*(a,w) = \hat{\mu}(a,w) + \epsilon^* a_{\pm}$
  4. Debiasing Condition: The MLE solves the score equation $\frac{1}{n}\sum_{i=1}^n A_{\pm,i}(Y_i - \hat{\mu}^*(A_i,W_i)) = 0$. With assignment probability $\pi = 1/2$, the weight $A/\pi - (1-A)/(1-\pi)$ equals $2A_{\pm}$. The plug-in estimate $\hat{\Psi}^* = \frac{1}{n}\sum_{i=1}^n [\hat{\mu}^*(1,W_i) - \hat{\mu}^*(0,W_i)]$ therefore solves the efficient influence function estimating equation
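The TMLE update along the $A_{\pm}$ submodel has a closed form. Because $A_{\pm}^2 = 1$, the Gaussian MLE of $\epsilon$ is simply the mean of $A_{\pm}(Y - \hat{\mu}(A,W))$.

The sketch below assumes 1:1 randomization (assignment probability 1/2) and takes the initial predictions as given. `tmle_update` is a hypothetical name. The standard error uses the estimated efficient influence function with weight $2A_{\pm}$.

```python
import numpy as np


def tmle_update(y, a, mu1, mu0):
    """One TMLE step for the ATE with assignment probability 1/2.

    mu1, mu0: initial predictions mu_hat(1, W_i) and mu_hat(0, W_i), ideally cross-fitted.
    Returns (psi_star, se, eps_star).
    """
    y, a = np.asarray(y, float), np.asarray(a, int)
    mu1, mu0 = np.asarray(mu1, float), np.asarray(mu0, float)
    a_pm = 2 * a - 1  # A_pm in {-1, +1}
    mu_a = np.where(a == 1, mu1, mu0)  # mu_hat(A_i, W_i)
    # Gaussian MLE of eps in N(mu_hat + eps * A_pm, 1): least squares with offset
    # mu_hat; because A_pm ** 2 == 1 the solution is a simple mean.
    eps = np.mean(a_pm * (y - mu_a))
    mu1_star, mu0_star = mu1 + eps, mu0 - eps  # mu*(a, w) = mu_hat(a, w) + eps * a_pm
    psi = np.mean(mu1_star - mu0_star)
    mu_a_star = np.where(a == 1, mu1_star, mu0_star)
    # Estimated efficient influence function; A/pi - (1-A)/(1-pi) equals 2 * A_pm here
    ic = 2 * a_pm * (y - mu_a_star) + mu1_star - mu0_star - psi
    se = np.sqrt(np.var(ic, ddof=1) / len(y))
    return psi, se, eps
```

The score equation $\frac{1}{n}\sum_i A_{\pm,i}(Y_i - \hat{\mu}^*(A_i,W_i)) = 0$ holds by construction. A poor initial fit therefore still gives a consistent estimate under randomization, only with a larger standard error.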

Key Theoretical Result: Equivalence Proof

Theorem: Within-trial prognostic score adjustment is equivalent to TMLE using a specific submodel.

Proof Sketch:

  1. Within-trial prognostic adjustment fits the regression model $Y = \beta_1 A_{\pm} + \beta_2 \hat{\mu}(A,W) + X\beta_3 + \varepsilon$, $\varepsilon \sim N(0,1)$. Here $\hat{\mu}(A,W)$ is the prognostic score built from the trial data and $X$ collects the additional linear covariate terms
  2. This is a valid targeting submodel for TMLE, satisfying:
    • Condition 1: It recovers the initial regression when $\beta = (0, 1, 0)$
    • Condition 2: Its score with respect to $\beta_1$, evaluated at that point, is the debiasing direction $A_{\pm}(Y - \hat{\mu}(A,W))$
  3. Therefore, the ANCOVA step in within-trial prognostic adjustment exactly corresponds to the TMLE update step
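The equivalence can be checked numerically. This is an illustration under stated assumptions, not the paper's simulation code:

- Arm-specific least squares on $(1, W, W^2)$ with 5-fold cross-fitting stands in for the machine-learning fit of $\hat{\mu}$.
- The within-trial estimator regresses $Y$ on $A_{\pm}$, $\hat{\mu}(A,W)$ and $X = (1, W)$.
- Both estimators read the ATE off by plug-in through their fitted models. Applying plug-in to the within-trial regression is an assumption of this sketch.

The TMLE step differs only in fixing $\beta_2 = 1$ and omitting $X$. All function names are hypothetical.

```python
import numpy as np


def crossfit_mu(y, a, w, k=5, seed=0):
    """Cross-fitted mu_hat(1, W_i) and mu_hat(0, W_i): arm-specific least squares on
    (1, W, W**2), a stand-in for a machine-learning fit."""
    n = len(y)
    feats = np.column_stack([np.ones(n), w, w ** 2])
    folds = np.random.default_rng(seed).permutation(n) % k
    mu1, mu0 = np.empty(n), np.empty(n)
    for f in range(k):
        train, test = folds != f, folds == f
        for arm, out in ((1, mu1), (0, mu0)):
            idx = train & (a == arm)
            coef, *_ = np.linalg.lstsq(feats[idx], y[idx], rcond=None)
            out[test] = feats[test] @ coef  # predict the held-out fold only
    return mu1, mu0


def within_trial(y, a, w, mu1, mu0):
    """ANCOVA Y ~ b1*A_pm + b2*mu_hat(A, W) + X b3 with X = (1, W); ATE by plug-in."""
    n = len(y)
    x = np.column_stack([np.ones(n), w])
    design = lambda s, m: np.column_stack([s, m, x])
    a_pm = 2 * a - 1
    coef, *_ = np.linalg.lstsq(design(a_pm, np.where(a == 1, mu1, mu0)), y, rcond=None)
    ones = np.ones(n)
    # X b3 cancels, leaving 2*b1 + b2 * mean(mu_hat(1, W) - mu_hat(0, W))
    return np.mean(design(ones, mu1) @ coef - design(-ones, mu0) @ coef)


def tmle(y, a, mu1, mu0):
    """TMLE step: the same submodel with b2 fixed at 1 and no X terms."""
    a_pm = 2 * a - 1
    eps = np.mean(a_pm * (y - np.where(a == 1, mu1, mu0)))
    return np.mean(mu1 - mu0) + 2 * eps
```

When $\hat{\mu}$ fits well, the estimated $\beta_2$ is close to 1 and $\beta_3$ is close to 0, so the two estimates nearly coincide. This mirrors the small differences that the simulation results attribute to the extra linear covariate terms.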

Experimental Setup

Data Generation Process

Simulation data generated based on structural causal models:

Covariate Generation:

  • $W_1, W_2 \sim \text{Unif}(-2, 1)$
  • $W_3 \sim N(0, 3)$
  • $W_4 \sim \text{Exp}(0.8)$
  • $W_5 \sim \Gamma(5, 10)$
  • $W_6, W_7 \sim \text{Unif}(1, 2)$

Outcome Generation:

  • Homogeneous effect scenario: $m_1(W) = \text{ATE} + m_0(W)$
  • Heterogeneous effect scenario: $m_1(W)$ includes complex nonlinear interaction terms

where ATE = 0.84 and $m_0(W)$ comprises complex combinations of sine functions and indicator functions.
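A sketch of the structure of this data-generating process follows. The distribution parameters are ambiguous as written, so it assumes:

- $N(0,3)$ has variance 3.
- $\text{Exp}(0.8)$ has rate 0.8.
- $\Gamma(5,10)$ has shape 5 and rate 10.

The paper's exact $m_0$ is not reproduced here. `m0` below is a HYPOTHETICAL placeholder built from sine and indicator terms, the outcome noise is assumed standard normal, and only the homogeneous-effect scenario is shown.

```python
import numpy as np

ATE = 0.84


def draw_covariates(n, rng):
    """Covariates as listed; parametrizations are assumptions (see comments)."""
    return np.column_stack([
        rng.uniform(-2, 1, n),         # W1 ~ Unif(-2, 1)
        rng.uniform(-2, 1, n),         # W2 ~ Unif(-2, 1)
        rng.normal(0, np.sqrt(3), n),  # W3 ~ N(0, 3), 3 read as a variance (assumption)
        rng.exponential(1 / 0.8, n),   # W4 ~ Exp(0.8), 0.8 read as a rate (assumption)
        rng.gamma(5, 1 / 10, n),       # W5 ~ Gamma(5, 10), shape 5 and rate 10 (assumption)
        rng.uniform(1, 2, n),          # W6 ~ Unif(1, 2)
        rng.uniform(1, 2, n),          # W7 ~ Unif(1, 2)
    ])


def m0(w):
    """HYPOTHETICAL placeholder with sine and indicator terms; not the paper's m0."""
    return np.sin(w[:, 0]) + 0.5 * (w[:, 2] > 0) + np.sin(w[:, 4] * w[:, 5])


def simulate_trial(n, rng, pi=0.5):
    w = draw_covariates(n, rng)
    a = rng.binomial(1, pi, n)
    base = m0(w)
    m1 = ATE + base  # homogeneous-effect scenario: m1(W) = ATE + m0(W)
    y = np.where(a == 1, m1, base) + rng.normal(0, 1, n)  # noise law assumed
    return w, a, y
```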

Experimental Design

  • Sample Size: n = 200 in the main experiment; sensitivity analyses vary n from 50 to 400
  • Simulation Runs: N=250 replications
  • Machine Learning Method: Discrete Super Learner
  • Evaluation Metrics: Standard error estimation, empirical power, coverage rate
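A discrete Super Learner selects, from a library of candidate learners, the single learner with the lowest cross-validated risk. The sketch below assumes squared-error loss and K-fold cross-validation. It uses a small hypothetical library (mean, linear, and quadratic least squares); the paper's actual library is not specified here.

```python
import numpy as np


def ols_learner(features):
    """Build a learner: least squares of y on an intercept and features(W)."""
    def fit(w, y):
        design = lambda m: np.column_stack([np.ones(len(m)), features(m)])
        coef, *_ = np.linalg.lstsq(design(w), y, rcond=None)
        return lambda m: design(m) @ coef
    return fit


def mean_learner(w, y):
    """Intercept-only learner: predicts the training mean."""
    m_y = y.mean()
    return lambda m: np.full(len(m), m_y)


LIBRARY = {  # hypothetical candidate library
    "mean": mean_learner,
    "linear": ols_learner(lambda m: m),
    "quadratic": ols_learner(lambda m: np.column_stack([m, m ** 2])),
}


def discrete_super_learner(w, y, library=LIBRARY, k=5, seed=0):
    """Pick the learner with the lowest K-fold CV mean squared error; refit on all data."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    cv_mse = {}
    for name, learner in library.items():
        sq_err = np.empty(len(y))
        for f in range(k):
            tr, te = folds != f, folds == f
            sq_err[te] = (y[te] - learner(w[tr], y[tr])(w[te])) ** 2
        cv_mse[name] = sq_err.mean()
    best = min(cv_mse, key=cv_mse.get)
    return best, library[best](w, y), cv_mse
```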

Comparison Methods

  1. Within-trial prognostic score adjustment
  2. Standard TMLE
  3. Unadjusted estimator (as baseline)
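The unadjusted baseline is the difference in arm means. This minimal sketch assumes the usual unpooled (Neyman) variance estimate; the paper may use a different variance formula.

```python
import numpy as np


def unadjusted(y, a):
    """Difference in means with an unpooled (Neyman) standard error."""
    y1, y0 = y[a == 1], y[a == 0]
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return est, se
```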

Experimental Results

Main Results

1. Theoretical Equivalence Verification

Simulation results confirm theoretical predictions:

  • Within-trial prognostic adjustment and TMLE produce nearly identical standard error estimates
  • Point estimates and confidence intervals from the two methods are nearly identical
  • Minor differences arise because the within-trial method includes linear covariate terms in its update submodel

2. Performance Comparison

Standard Error Performance:

  • Homogeneous scenario: Standard error estimates are nearly identical between methods (approximately 0.21-0.22)
  • Heterogeneous scenario: The two methods again perform comparably
  • Empirical standard errors closely match the estimated standard errors

Power and Coverage Rate:

  • The power curves of the two methods overlap as sample size increases
  • Coverage of 95% confidence intervals stays near the nominal level
  • Performance is stable from small (n=50) to large (n=400) samples

3. Numerical Results

From simulation figures:

  • Average standard error estimates (solid points) highly consistent with empirical standard errors (asterisks)
  • Power monotonically increases with sample size, conforming to theoretical expectations
  • Coverage rates fluctuate within 94%-96% range, approaching 95% nominal level
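The evaluation metrics can be computed from the Monte Carlo replications roughly as follows. This sketch assumes Wald-type 95% intervals (estimate ± 1.96 × SE) and measures power with a two-sided test of zero effect. `summarize` is a hypothetical helper.

```python
import numpy as np

Z_975 = 1.959963984540054  # 0.975 quantile of the standard normal


def summarize(estimates, ses, truth):
    """Empirical SE, mean estimated SE, 95% CI coverage and power across replications."""
    est, se = np.asarray(estimates, float), np.asarray(ses, float)
    lower, upper = est - Z_975 * se, est + Z_975 * se
    return {
        "empirical_se": est.std(ddof=1),   # asterisks in the figures
        "mean_estimated_se": se.mean(),    # solid points in the figures
        "coverage": np.mean((lower <= truth) & (truth <= upper)),
        "power": np.mean(np.abs(est / se) > Z_975),  # two-sided test of H0: ATE = 0
    }
```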

Experimental Findings

  1. Substantial Equivalence: Within-trial prognostic adjustment and TMLE perform nearly identically in practical applications, validating theoretical equivalence.
  2. Evidence of Redundancy: Including additional linear covariate terms in the update submodel has negligible impact on results, as prognostic scores already capture these linear trends.
  3. Robustness: Both methods demonstrate good robustness across different data generation scenarios and sample sizes.

Related Work

Development of Prognostic Score Adjustment

  • Historical Origins: Tukey (1993) first proposed related ideas
  • Modern Development: Schuler et al. (2022) formalized the PROCOVA™ method
  • Efficiency Theory: Achieves semiparametric efficiency bound under homogeneous treatment effect assumption

TMLE Method Framework

  • Foundational Theory: van der Laan and Rubin (2006) established theoretical framework for TMLE
  • Cross-fitting Extensions: Multiple studies developed TMLE variants based on cross-validation
  • Efficiency Properties: Achieves local semiparametric efficiency under weak conditions
  • Double Machine Learning: Debiasing method asymptotically equivalent to TMLE
  • Augmented IPW: Alternative doubly robust estimator
  • G-computation: Traditional plug-in estimation method
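For comparison, the augmented IPW estimator adds an inverse-probability-weighted residual correction to the G-computation plug-in. This minimal sketch assumes a known assignment probability $\pi$; the function name is hypothetical. Here `mu1` and `mu0` are outcome-model predictions under treatment and control.

```python
import numpy as np


def aipw_ate(y, a, mu1, mu0, pi=0.5):
    """AIPW estimate of the ATE with a known assignment probability `pi`."""
    y, a = np.asarray(y, float), np.asarray(a, float)
    # Weighted residual correction; with pi = 1/2 the weight is 2 * (2A - 1)
    aug = a / pi * (y - mu1) - (1 - a) / (1 - pi) * (y - mu0)
    return np.mean(mu1 - mu0 + aug)
```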

Conclusions and Discussion

Main Conclusions

  1. Methodological Clarification: Within-trial prognostic score adjustment is essentially TMLE and should not be renamed as a new method.
  2. Practical Recommendations: Researchers should directly use existing TMLE software packages rather than reimplementing within-trial prognostic adjustment.
  3. Theoretical Unification: This equivalence provides deeper theoretical understanding of prognostic adjustment methods.

Limitations

  1. Cross-fitting Requirement: Practical applications require cross-fitting to avoid overfitting, increasing implementation complexity.
  2. Pre-specification Difficulty: With TMLE, only the library of candidate models can be pre-specified, not the fitted model itself. With historical data, the prognostic model can be fixed before the trial begins.
  3. Regulatory Considerations: The ability to pre-specify parameters may be viewed as an advantage when collaborating with regulatory agencies.
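Cross-fitting means that every observation's prediction comes from a model trained on the other folds. This generic sketch assumes `fit(x, y)` returns a prediction function, a hypothetical interface chosen for illustration.

```python
import numpy as np


def crossfit_predict(fit, x, y, k=5, seed=0):
    """Out-of-fold predictions: each observation is predicted by a model that never saw it."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    preds = np.empty(len(y))
    for f in range(k):
        held_out = folds == f
        model = fit(x[~held_out], y[~held_out])  # train on the other K - 1 folds
        preds[held_out] = model(x[held_out])
    return preds
```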

Future Directions

  1. Hybrid Methods: Combining prognostic scores constructed from historical data with TMLE, as proposed by Liao et al. (2025).
  2. Small Sample Optimization: Historical data becomes more valuable in trials with smaller sample sizes.
  3. Distribution Shift Handling: Methods that remain robust when historical data and the current trial differ in distribution.

In-Depth Evaluation

Strengths

  1. Theoretical Contribution: First explicitly establishes theoretical connection between two seemingly different methods, with important methodological value.
  2. Practical Value: Avoids duplicate development and guides researchers to use mature TMLE tools.
  3. Rigorous Proof: Theoretical equivalence proven through algebraic derivation with solid theoretical foundation.
  4. Comprehensive Verification: Simulation studies cover multiple scenarios with sufficient empirical support.
  5. Clear Writing: Paper structure is clear with transparent technical details, easy to understand.

Weaknesses

  1. Limited Innovation: Primarily reveals equivalence of existing methods, lacking substantial methodological innovation.
  2. Application Scope: Analysis limited to 1:1 randomized trial settings; generalization to more complex designs unclear.
  3. Implementation Differences Overlooked: Although the methods are theoretically equivalent, implementation details may matter in certain cases. One example is the extra linear covariate terms in the within-trial update submodel.
  4. Incomplete Comparison: Lacks systematic comparison with other advanced covariate adjustment methods.

Impact

  1. Academic Value: Provides important theoretical clarification for statistical methodology, helping avoid conceptual confusion.
  2. Practical Guidance: Provides clear method selection guidance for clinical trial statisticians.
  3. Educational Significance: Facilitates understanding of relationships between different estimation methods in statistical education.

Applicable Scenarios

  1. Method Selection: When historical data is unavailable, researchers can directly use TMLE rather than developing new within-trial methods.
  2. Theoretical Research: Provides theoretical foundation for further research on covariate adjustment methods.
  3. Regulatory Applications: In regulatory settings that require pre-specified analysis plans, the advantages and disadvantages of each method must be weighed.

References

This paper cites extensive relevant literature, including:

  • Schuler et al. (2022): Original paper on PROCOVA method
  • van der Laan and Rubin (2006): Foundational work on TMLE
  • Tukey (1993): Early source of prognostic adjustment ideas
  • Recent literature on cross-fitting and doubly robust estimation

Overall Assessment: This is a high-quality methodological paper that, while relatively limited in innovation, possesses important value in theoretical clarification and practical guidance. The paper rigorously proves an important equivalence result, facilitating correct understanding and application of related methods in the statistical community.