Budget-constrained Active Learning to Effectively De-censor Survival Data

Parsaee, Jiang, Friggstad et al.
Standard supervised learners attempt to learn a model from a labeled dataset. Given a small set of labeled instances, and a pool of unlabeled instances, a budgeted learner can use its given budget to pay to acquire the labels of some unlabeled instances, which it can then use to produce a model. Here, we explore budgeted learning in the context of survival datasets, which include (right) censored instances, where we know only a lower bound on an instance's time-to-event. Here, that learner can pay to (partially) label a censored instance -- e.g., to acquire the actual time for an instance [perhaps go from (3 yr, censored) to (7.2 yr, uncensored)], or other variants [e.g., learn about one more year, so go from (3 yr, censored) to either (4 yr, censored) or perhaps (3.2 yr, uncensored)]. This serves as a model of real-world data collection, where follow-up with censored patients does not always lead to uncensoring, and how much information is given to the learner model during data collection is a function of the budget and the nature of the data itself. We provide both experimental and theoretical results for how to apply state-of-the-art budgeted learning algorithms to survival data and the respective limitations that exist in doing so. Our approach provides bounds and time complexity asymptotically equivalent to the standard active learning method BatchBALD. Moreover, empirical analysis on several survival tasks shows that our model performs better than other potential approaches on several benchmarks.

Basic Information

  • Paper ID: 2510.12144
  • Title: Budget-constrained Active Learning to Effectively De-censor Survival Data
  • Authors: Ali Parsaee, Bei Jiang, Zachary Friggstad, Russell Greiner (University of Alberta)
  • Classification: cs.LG cs.AI
  • Publication Date: October 15, 2025
  • Paper Link: https://arxiv.org/abs/2510.12144

Abstract

This paper studies budget-constrained active learning on survival datasets. Survival data contains right-censored instances, for which only a lower bound on the event time is known. The learner can spend budget to (partially) de-censor an instance: for example, going from (3 years, censored) to the actual time (7.2 years, uncensored), or, when probing only one more year, from (3 years, censored) to either (4 years, censored) or (3.2 years, uncensored). This models real-world data collection, where following up on a censored patient does not always result in de-censoring. The information the learner gains during data collection depends on both the budget and the characteristics of the data.
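
To make the probing mechanism concrete, the sketch below simulates a single probe, assuming that a probe of depth k follows a censored instance for k more time units past its censoring time. The function name probe and its arguments are hypothetical and not taken from the paper; true_time stands for the hidden event time that only the oracle (the follow-up process) knows.

```python
from typing import Tuple


def probe(censor_time: float, true_time: float, depth: float) -> Tuple[float, bool]:
    """Simulate following up a right-censored instance for `depth` more time units.

    Returns (observed_time, uncensored). `true_time` is known only to the
    simulator (the oracle), never to the learner.
    """
    if true_time < censor_time:
        raise ValueError("a censored instance must have true_time >= censor_time")
    horizon = censor_time + depth
    if true_time <= horizon:
        # The event falls inside the probed window, so the label is revealed.
        return true_time, True
    # Still event-free at the end of the window: only the lower bound moves up.
    return horizon, False


print(probe(3.0, 7.2, 1.0))   # (4.0, False)
print(probe(3.0, 3.2, 1.0))   # (3.2, True)
print(probe(3.0, 7.2, 10.0))  # (7.2, True)
```

The three calls reproduce the cases from the abstract: a one-year probe that leaves the instance censored, a one-year probe that uncovers the event, and a deep probe that fully de-censors.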

Research Background and Motivation

Problem Definition

  1. Core Problem: How to effectively select censored instances for de-censoring under budget constraints to maximize survival prediction model performance
  2. Practical Significance:
    • High costs of patient follow-up in medical research
    • Additional testing costs in industrial reliability testing
    • Computational costs in algorithm runtime prediction

Limitations of Existing Methods

  1. Traditional Active Learning: Primarily targets classification and regression tasks, neglecting the special characteristics of censored data
  2. Active Learning in Survival Analysis: Limited research with insufficient budget constraint considerations
  3. BatchBALD Limitations:
    • Assumes oracle provides complete label information
    • Does not account for different costs of individual instances
    • Inapplicable to partial de-censoring scenarios

Research Motivation

Real-world data collection is costly, particularly in medical research, industrial testing, and similar domains. Traditional active learning methods ignore both per-instance costs and the partial information carried by censored labels, which motivates an approach designed for this setting.

Core Contributions

  1. Formal Definition: Gives the first formal definition of the learning problem of de-censoring instances under a budget constraint
  2. Algorithm Innovation: Proposes the BBsurv algorithm, which adapts BatchBALD to survival data and varying instance costs
  3. Theoretical Guarantees: Proves that the algorithm attains a (1-1/e) approximation guarantee in polynomial time
  4. Comprehensive Evaluation: Conducts extensive experiments on three real survival datasets, demonstrating method robustness
  5. Benchmark Establishment: Provides eight comparison algorithms, establishing evaluation benchmarks for this task

Methodology Details

Task Definition

Input:

  • Probe depth k ∈ ℝ+ (years explored per probe)
  • Budget B ∈ ℝ+
  • Training dataset D = {(x_i, t_i, δ_i, c_i)}_{i=1}^{L}, where:
    • x_i: covariates
    • t_i: observed time (event time if uncensored, censoring time otherwise)
    • δ_i: censoring indicator (1 for uncensored, 0 for censored)
    • c_i: probe cost

Output: Select an instance set F such that ∑_{j∈F} c_j ≤ B, maximizing model performance
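
As a rough illustration of the task's inputs and its budget constraint, the sketch below stores each instance as a small record and checks whether a candidate selection F stays within budget B. The class and function names are hypothetical, and restricting probes to censored instances is an assumption drawn from the problem description rather than a detail confirmed by the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Instance:
    x: tuple      # covariates
    t: float      # observed time (event time, or censoring time if censored)
    delta: int    # 1 = uncensored, 0 = censored
    cost: float   # cost of probing this instance


def probe_candidates(data: List[Instance]) -> List[int]:
    """Indices of censored instances (assumed to be the only ones worth probing)."""
    return [i for i, inst in enumerate(data) if inst.delta == 0]


def is_feasible(data: List[Instance], selected: Iterable[int], budget: float) -> bool:
    """Budget constraint: the summed probe cost of the selection is at most the budget."""
    return sum(data[j].cost for j in selected) <= budget


data = [
    Instance((0.1,), 3.0, 0, 2.0),
    Instance((0.2,), 5.0, 1, 1.0),
    Instance((0.3,), 1.5, 0, 4.0),
]
print(probe_candidates(data))            # [0, 2]
print(is_feasible(data, [0, 2], 6.0))    # True
print(is_feasible(data, [0, 2], 5.0))    # False
```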

Model Architecture

1. Bayesian Survival Model

Uses Bayesian Multi-Task Logistic Regression (MTLR) model:

  • Discretizes continuous time into n time intervals {b_i}_{i=1}^{n}
  • Outputs multinomial distribution {p(y = b_i | x, ω, D)}_{i=1}^{n}
  • Generates individual survival distribution (ISD)
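
A minimal sketch of how a multinomial over time intervals yields a discrete survival curve, assuming the survival probability at the end of an interval is one minus the cumulative probability mass up to and including that interval. The helper names are hypothetical, and the MTLR model itself is not implemented here; the input is simply a probability vector such as the one it would output.

```python
import numpy as np


def survival_curve(interval_probs):
    """Survival probability at the end of each interval: 1 - cumulative event mass."""
    p = np.asarray(interval_probs, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("interval_probs must be a probability vector")
    # Clip guards against tiny negative values caused by floating-point rounding.
    return np.clip(1.0 - np.cumsum(p), 0.0, 1.0)


def median_interval(interval_probs):
    """Index of the first interval by whose end survival has dropped to 0.5 or below."""
    curve = survival_curve(interval_probs)
    return int(np.argmax(curve <= 0.5))


probs = [0.1, 0.2, 0.3, 0.4]
print(survival_curve(probs))   # approximately [0.9, 0.7, 0.4, 0.0]
print(median_interval(probs))  # 2
```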

2. BBsurv Algorithm Core

Probability Adjustment Mechanism:

p_cens(y = b_i | ω) = p(y = b_i | ω) / ∑_{r=i}^{n} p(y = b_r | ω)
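
The sketch below shows one reading of this adjustment: probability mass on intervals before the censoring time is set to zero and the remaining mass is renormalized. It assumes the denominator's sum starts at the interval that contains the censoring time; that interpretation, the function name, and the censor_bin argument are assumptions, not details confirmed by the paper.

```python
import numpy as np


def censor_adjust(interval_probs, censor_bin):
    """Renormalize interval probabilities for an instance censored in `censor_bin`.

    Assumption: the event cannot have happened before the censoring interval,
    so those intervals get zero mass and the rest is rescaled to sum to one.
    """
    p = np.asarray(interval_probs, dtype=float).copy()
    p[:censor_bin] = 0.0
    total = p.sum()
    if total <= 0:
        raise ValueError("no probability mass at or after the censoring interval")
    return p / total


adjusted = censor_adjust([0.1, 0.2, 0.3, 0.4], censor_bin=2)
print(adjusted)  # approximately [0, 0, 0.4286, 0.5714]
```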

Knowable Interval Processing:

  • Identifies "knowable" intervals within probe depth k
  • Merges intervals beyond probe range into single "unknowable" class b_uk
  • Generates final probability distribution p_final
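
The merging step can be sketched as follows. The rule used here, that an interval is knowable when its upper edge lies within the censoring time plus the probe depth, is an assumption made for illustration; the paper's exact boundary rule is not given in this summary, and the function and argument names are hypothetical.

```python
import numpy as np


def merge_unknowable(adjusted_probs, bin_upper_edges, censor_time, depth):
    """Collapse intervals a probe cannot resolve into one extra 'unknowable' class.

    Returns the knowable interval probabilities followed by a single entry
    holding the total mass of all intervals beyond the probe horizon.
    """
    p = np.asarray(adjusted_probs, dtype=float)
    edges = np.asarray(bin_upper_edges, dtype=float)
    knowable = edges <= censor_time + depth   # assumed knowability rule
    # Intervals before the censoring time stay in the vector with zero mass,
    # which leaves any entropy computed on the result unchanged.
    return np.append(p[knowable], p[~knowable].sum())


p_final = merge_unknowable(
    adjusted_probs=[0.0, 0.0, 0.25, 0.25, 0.5],
    bin_upper_edges=[1.0, 2.0, 3.0, 4.0, 5.0],
    censor_time=2.0,
    depth=1.0,
)
print(p_final)  # [0.   0.   0.25 0.75]
```

With a very large depth every interval is knowable under this rule and the extra class receives zero mass, so the distribution is effectively the unmerged one.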

3. Acquisition Function

Based on BatchBALD's mutual information computation:

I(y_{1:b}; ω | x_{1:b}, D) = H(y_{1:b} | x_{1:b}, D) - E_{p(ω | D, x_{1:b})}[H(y_{1:b} | x_{1:b}, ω, D)]
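
A minimal sketch of this quantity estimated from posterior samples, assuming the labels in a batch are conditionally independent given ω and that the batch is small enough to enumerate every joint label configuration. The function names and the array layout are hypothetical; practical BatchBALD implementations avoid full enumeration for larger batches.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    safe = np.where(p > 0, p, 1.0)
    return -np.sum(p * np.log(safe), axis=axis)


def joint_mutual_information(probs):
    """Estimate I(y_{1:b}; w) from samples.

    probs has shape (S, b, C): S posterior samples of w, b batch points,
    C classes, with probs[s, j, c] = p(y_j = c | x_j, w_s).
    """
    probs = np.asarray(probs, dtype=float)
    n_samples, batch, _ = probs.shape
    # Conditional term: given w the labels are independent, so entropies add.
    conditional = entropy(probs, axis=-1).sum(axis=1).mean()
    # Joint term: build p(y_1..y_b | w_s) as a flattened outer product.
    joint = probs[:, 0, :]
    for j in range(1, batch):
        joint = (joint[:, :, None] * probs[:, j, None, :]).reshape(n_samples, -1)
    marginal = joint.mean(axis=0)   # average over posterior samples
    return entropy(marginal) - conditional


# Two posterior samples that disagree completely about one point.
single = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
# The same point included twice in the batch.
duplicate = np.array([[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]])
print(joint_mutual_information(single))     # log(2), about 0.693
print(joint_mutual_information(duplicate))  # also log(2): the duplicate adds nothing
```

The duplicate case shows why a joint score is used for batches: scoring the two copies separately and adding the results would count the same information twice.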

Technical Innovations

  1. Probe Depth Modeling: Models partial de-censoring through a probe depth k, the amount of additional follow-up time a single probe buys
  2. Probability Redistribution: Assigns zero probability to intervals before the censoring time and renormalizes the remaining mass
  3. Budget Optimization: Reduces problem to weighted maximum coverage, solved via greedy algorithm
  4. Unified Framework: Simultaneously handles uniform and non-uniform cost settings
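
The budget optimization step can be illustrated with a generic cost-aware greedy loop, which repeatedly adds the affordable candidate with the largest gain per unit cost. This is a sketch under stated assumptions, not the paper's BBsurv procedure: the function name, the score callback, and the toy coverage objective are hypothetical, and this plain ratio rule does not by itself carry the paper's approximation guarantee.

```python
def greedy_budgeted(candidates, costs, budget, score):
    """Greedy selection by marginal gain per unit cost under a total budget.

    `score(selection)` returns the value of a list of candidates; costs are
    assumed to be strictly positive.
    """
    selected, spent = [], 0.0
    remaining = set(candidates)
    current = score(selected)
    while remaining:
        best, best_ratio, best_value = None, 0.0, current
        for j in sorted(remaining):          # sorted for deterministic ties
            if spent + costs[j] > budget:
                continue                     # cannot afford this candidate
            value = score(selected + [j])
            ratio = (value - current) / costs[j]
            if ratio > best_ratio:
                best, best_ratio, best_value = j, ratio, value
        if best is None:
            break                            # nothing affordable improves the score
        selected.append(best)
        spent += costs[best]
        current = best_value
        remaining.remove(best)
    return selected


# Toy coverage objective: each candidate covers a set of items.
covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2, 3, 4, 5}}
costs = {0: 1.0, 1: 1.0, 2: 1.0, 3: 4.0}


def coverage(selection):
    return len(set().union(*(covers[j] for j in selection)))


print(greedy_budgeted(covers, costs, budget=3.0, score=coverage))  # [0, 1, 2]
```

With uniform costs the ratio rule reduces to picking the largest marginal gain, so the same loop covers both cost settings.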

Experimental Setup

Datasets

  1. MIMIC-IV: 38,520 patients, 93 features, 67% censoring rate
  2. NACD: 2,402 patients, 53 features, 36% censoring rate
  3. SUPPORT: 9,105 patients, 42 features, 32% censoring rate

Evaluation Metrics

  • Primary Metric: MAE-PO (Mean Absolute Error with Pseudo Observations)
  • Auxiliary Metrics: C-index, Integrated Brier Score, MAE on uncensored data
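
Of the metrics listed, the C-index is the simplest to sketch. The version below is Harrell's pairwise form, which is an assumption, since the summary does not say which variant the paper uses. A pair is comparable when the instance with the shorter time is uncensored, and the function name and arguments are hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk.

    times:  observed times
    events: 1 if the event was observed, 0 if censored
    risk:   predicted risk scores (higher means an earlier expected event)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored instance cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risk = [0.9, 0.3, 0.05, 0.1]
print(concordance_index(times, events, risk))  # 0.75
```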

Comparison Methods

  1. BatchBALD: Original BatchBALD algorithm
  2. C-BALD: Censoring-aware BALD variant
  3. IDEAL: Inverse-Distance based Exploration for Active Learning
  4. Entropy Sampling: Entropy-based sampling
  5. Variance Sampling: Variance-based sampling
  6. Closest to Half (CtH): Sampling near 0.5 probability
  7. Mean Closest to Middle (MCtM): Mean-middle point sampling
  8. Clusters to form Batches (CfB): Clustering-based batch formation
  9. Random: Random sampling

Implementation Details

  • 10 time intervals (quantile-based partitioning)
  • Bayesian MTLR model with Spike-and-Slab prior
  • 5000 training iterations
  • Artificial censoring ensures non-informative censoring assumption
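
Quantile-based partitioning can be sketched as below, assuming the interval edges are placed at evenly spaced quantiles of the observed times so that each interval holds roughly the same number of instances. The function names are hypothetical, and the paper may compute its edges from a different subset of times.

```python
import numpy as np


def quantile_bin_edges(times, n_bins=10):
    """Interior edges that split `times` into n_bins roughly equal-count intervals."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(np.asarray(times, dtype=float), quantiles)


def assign_bin(times, edges):
    """Interval index (0 .. n_bins - 1) for each time."""
    return np.searchsorted(edges, np.asarray(times, dtype=float), side="right")


times = np.arange(1, 101)            # toy data: times 1, 2, ..., 100
edges = quantile_bin_edges(times, n_bins=10)
bins = assign_bin(times, edges)
print(len(edges))                    # 9 interior edges
print(np.bincount(bins))             # 10 instances in each of the 10 intervals
```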

Experimental Results

Main Results

Table 1 shows MAE-PO results at budget=10:

  • BBsurv significantly outperforms other methods in most settings
  • Performance converges between BBsurv and BatchBALD as probe depth increases
  • Most notable improvements on MIMIC dataset compared to BatchBALD

Key Findings:

  1. Probe Depth Impact: BBsurv advantage maximized at k=5, approaches BatchBALD at k=100
  2. Dataset Differences: Significant improvements on MIMIC and NACD, smaller differences on SUPPORT
  3. Statistical Significance: Achieves p<0.05 significance in most cases

Budget Sensitivity Analysis

Figure 2 shows cross-budget performance:

  • Uniform Cost Setting: BBsurv is consistently the best-performing method across budget levels
  • Non-uniform Cost Setting: BBsurv advantage more pronounced, especially at high budgets
  • Cost Handling Advantage: Submodularity of mutual information enables BBsurv to better handle budget constraints

Ablation Studies

Probe Depth Impact:

  • k=5: BBsurv significantly outperforms baselines
  • k=10: Moderate improvements
  • k=100: Performance approaches BatchBALD

Cost Setting Comparison:

  • Uniform costs: Most methods perform similarly
  • Non-uniform costs: BBsurv and BatchBALD significantly outperform other methods

Experimental Findings

  1. Diversity in Selection: PCA visualization shows BBsurv selects more diverse instances
  2. Unexpected CfB Performance: Clustering method performs well in certain settings
  3. Cost Sensitivity: Information-theoretic methods show greater advantage in non-uniform cost settings

Related Work

Active Learning Field

  1. Batch Active Learning: BatchBALD as SOTA method, but neglects budget and censored data
  2. Uncertainty Sampling: Selects instances with highest model uncertainty
  3. Diversity Methods: Focuses on sample diversity for improved generalization

Active Learning in Survival Analysis

  1. Vinzamuri et al.: Based on Cox proportional hazards model, but without budget constraints
  2. Hüttel et al.: C-BALD method for censored regression
  3. Dedja et al.: Incremental label updates with random probe depth determination

Budgeted Learning

  1. Lizotte et al.: Budgeted learning of naive Bayes classifiers
  2. Maximum Coverage Problem: NP-hard combinatorial optimization problem
  3. Greedy Algorithm: Polynomial-time algorithm with (1-1/e) approximation ratio
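
For reference, the (1-1/e) ratio of the greedy algorithm follows from a standard textbook argument for monotone submodular objectives under a cardinality constraint, sketched below. It covers only the uniform-cost case with a batch of b elements and is not necessarily the proof used in the paper; the symbols f, S_i, and S^* are introduced here for illustration.

```latex
% Setting: f is monotone submodular with f(\emptyset) = 0, S_i is the greedy
% set after i picks, and S^* is an optimal set of b elements.
\begin{align*}
f(S^*) - f(S_i) &\le b \,\bigl(f(S_{i+1}) - f(S_i)\bigr)
  && \text{(the greedy pick has at least the average marginal gain over } S^*\text{)} \\
f(S^*) - f(S_{i+1}) &\le \Bigl(1 - \tfrac{1}{b}\Bigr)\bigl(f(S^*) - f(S_i)\bigr)
  && \text{(rearranging the first line)} \\
f(S^*) - f(S_b) &\le \Bigl(1 - \tfrac{1}{b}\Bigr)^{b} f(S^*) \le \tfrac{1}{e}\, f(S^*)
  && \text{(induction from } S_0 = \emptyset\text{)} \\
f(S_b) &\ge \Bigl(1 - \tfrac{1}{e}\Bigr) f(S^*)
\end{align*}
```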

Conclusions and Discussion

Main Conclusions

  1. Method Effectiveness: BBsurv outperforms existing methods in most settings
  2. Theoretical Guarantees: Algorithm complexity is comparable to BatchBALD while providing a (1-1/e) approximation ratio
  3. Practical Value: Applicable to medical research, industrial testing, and similar real-world scenarios
  4. Robustness: Stable performance across different datasets, budgets, and probe depths

Limitations

  1. Non-informative Censoring Assumption: May not hold in practical applications
  2. Fixed Probe Depth: Does not consider dynamic probe depth adjustment
  3. Discretization Approximation: Time discretization may lose information
  4. Computational Complexity: Greedy algorithm may be slow on large-scale data

Future Directions

  1. Semi-supervised Extension: Combining unlabeled data to improve performance
  2. Informative Censoring: Relaxing non-informative censoring assumption
  3. Dynamic Probing: Adjusting probe depth based on instance characteristics
  4. Improved Approximation: Exploring more efficient maximum coverage approximation schemes

In-depth Evaluation

Strengths

  1. Problem Novelty: First systematic study of de-censoring survival data under budget constraints
  2. Method Rigor:
    • Complete theoretical analysis with complexity and approximation guarantees
    • Clever algorithm design effectively handling partial information acquisition
  3. Experimental Sufficiency:
    • Three real datasets with multiple evaluation metrics
    • Comprehensive baseline comparisons and ablation studies
    • Statistical significance verification
  4. High Practical Value: Addresses real needs in medical, industrial, and related domains

Weaknesses

  1. Assumption Limitations: Non-informative censoring assumption may not hold in practice
  2. Method Constraints:
    • Discretization may lose continuous time information
    • Fixed probe depth lacks flexibility
  3. Experimental Scope:
    • Relatively limited dataset scale
    • Lacks comparison with more SOTA survival analysis methods
  4. Theoretical Analysis: Missing convergence and generalization error analysis

Impact

  1. Academic Contribution:
    • Opens new research direction, expected to inspire follow-up work
    • Theoretical framework extensible to other incomplete information learning problems
  2. Practical Value:
    • Direct application to clinical trial design
    • Applicable to industrial quality control and reliability testing
  3. Method Generality: Framework adaptable to other active learning algorithms

Applicable Scenarios

  1. Medical Research: Patient follow-up, clinical trial design
  2. Industrial Applications: Product lifetime testing, failure prediction
  3. Algorithm Analysis: Runtime prediction, performance evaluation
  4. Financial Domain: Credit risk assessment, default prediction

References

The paper cites 41 related references, primarily including:

  • Original BatchBALD paper (Kirsch et al., 2019)
  • Classical survival analysis textbooks (Kleinbaum & Klein, 2012)
  • Maximum coverage problem research (Khuller et al., 1999)
  • Bayesian survival models (Qi et al., 2023)
  • Related active learning work (Vinzamuri et al., 2014; Hüttel et al., 2024)

Overall Assessment: This is a high-quality machine learning paper that innovatively addresses active learning for survival data under budget constraints. The method design is clever, theoretical analysis rigorous, and experimental validation comprehensive. While certain assumption limitations exist, it provides effective solutions to important practical applications with significant academic value and practical significance.