
Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling

Hu, Mussmann

Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian Decision Theory (BDT) offers a universal principle to guide decision-making. In this work, we derive BDT for (Bayesian) active learning in the myopic framework, where we imagine we only have one more point to label. This derivation leads to effective algorithms such as Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and other algorithms that appear in the literature. Furthermore, we show that BAIT (active learning based on V-optimal experimental design) can be derived from BDT and asymptotic approximations. A key challenge of such methods is the difficult scaling to large batch sizes, leading to either computational challenges (BatchBALD) or dramatic performance drops (top-$B$ selection). Here, using a particular formulation of the decision process, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally for several datasets that ParBaLS EPIG gives superior performance for a fixed budget and Bayesian Logistic Regression on Neural Embeddings. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.

Basic Information

Paper ID: 2510.09877
Title: Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling
Authors: Kangping Hu, Stephen Mussmann (Georgia Institute of Technology)
Classification: cs.LG cs.AI stat.ML
Publication Date: October 10, 2025 (Preprint)
Paper Link: https://arxiv.org/abs/2510.09877v1

Abstract

Over the past decades, numerous active learning acquisition functions have been proposed, yet practitioners often struggle to choose among them. Bayesian Decision Theory (BDT) provides a universal principle to guide decision-making. This paper derives BDT for (Bayesian) active learning under a myopic framework, which assumes that only one more data point will be labeled. This derivation yields effective algorithms such as Expected Error Reduction (EER) and Expected Predictive Information Gain (EPIG). Furthermore, the authors show that BAIT can be derived from BDT and asymptotic approximations. A key challenge for these methods is scaling to large batch sizes, which leads either to computational challenges (BatchBALD) or to sharp performance degradation (top-B selection). Using a particular formulation of the decision process, the paper derives Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. Experiments on several datasets show that ParBaLS EPIG gives superior performance for a fixed budget with Bayesian logistic regression on neural embeddings.

Research Background and Motivation

Problem Definition

Active learning aims to select the most informative examples from a large unlabeled dataset for annotation, maximizing model performance under a limited annotation budget. Existing methods include heuristic and probabilistic approaches, but there is no explicit principle for choosing among them.

Problem Significance

Practical Demand: In modern machine learning, data is typically annotated in batches rather than individually
Method Selection Difficulty: Existing algorithms lack interpretability, making it difficult for practitioners to determine which algorithms are effective and when
Scalability Challenges: Existing methods face computational or performance issues at large batch sizes

Limitations of Existing Methods

Top-B Selection: Ignores dependencies between batch labels, potentially selecting redundant samples
Heuristic Diversity: Requires dataset-specific hyperparameter tuning, which is impractical in active learning
Greedy Batch Acquisition: Methods like BatchBALD have a computational cost that grows exponentially with batch size

Research Motivation

The paper aims to provide a unified theoretical framework, based on Bayesian Decision Theory, that explains how existing algorithms work, and to propose new methods that handle batch selection effectively.

Core Contributions

Theoretical Unification: Unifies multiple algorithms (EER, EPIG, BAIT, etc.) as derivations from Myopic Bayesian Decision Theory (MBDT)
Novel Method: Introduces Partial Batch Label Sampling (ParBaLS) to address batch active learning challenges
Theoretical Analysis: Proves that the Monte Carlo approximation error of ParBaLS is O(1/√m) in the number of samples m, independent of batch size
Experimental Validation: Evaluates ParBaLS EPIG across 10 different settings, where it ranks among the top performers in 9

Methodology Details

Task Definition

Given input domain X, output domain Y, and unlabeled pool dataset D⊂X, the goal is to iteratively select T batches S⊂D, each of size |S|=B for annotation, minimizing test loss after training on the labeled set.

Myopic Bayesian Decision Theory (MBDT)

Single-Point Selection Derivation

Under the myopic framework, assuming selection of a single additional data point x̂, the next labeled point is:

argmin_{x̂∈D} E_{ŷ~Y_{x̂}|L} [min_{P∈Δ^{|V|}_Y} E_{y⃗~Y_V|Y_{x̂}=ŷ,L} [∑_{j=1}^{|V|} ℓ(y_j, P_j)]]

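For negative log-likelihood loss, the optimal prediction P_j is the posterior predictive distribution of y_j, so the inner expected loss becomes the conditional entropy H(Y_x | Y_{x̂}, L) summed over x∈V. Because H(Y_x | L) does not depend on x̂, minimizing this conditional entropy is equivalent to maximizing mutual information: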

argmax_{x̂∈D} ∑_{x∈V} I(Y_x; Y_{x̂}|L)

This objective is equivalent to EPIG, and to EER with log-likelihood loss.
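
As an illustration of the mutual-information objective above, the following is a minimal sketch for binary Bayesian logistic regression, assuming the posterior is represented by k parameter samples. The names epig_scores, theta, X_pool and X_target are hypothetical and not taken from the paper's code; X_target plays the role of V, and the joint predictive is approximated by averaging over posterior samples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def epig_scores(theta, X_pool, X_target, eps=1e-12):
    """Return sum over x in V of I(Y_x; Y_xhat | L) for every pool point xhat.

    theta: (k, d) posterior parameter samples (assumed given).
    X_pool: (n, d) candidate points; X_target: (v, d) points standing in for V.
    """
    k = theta.shape[0]
    p1_pool = sigmoid(theta @ X_pool.T)    # (k, n): P(y=1 | xhat, theta_j)
    p1_tgt = sigmoid(theta @ X_target.T)   # (k, v): P(y=1 | x, theta_j)
    P_pool = np.stack([1 - p1_pool, p1_pool], axis=-1)  # (k, n, 2)
    P_tgt = np.stack([1 - p1_tgt, p1_tgt], axis=-1)     # (k, v, 2)
    # Joint predictive p(yhat=a, y=b), averaged over posterior samples.
    joint = np.einsum("jna,jvb->nvab", P_pool, P_tgt) / k
    p_hat = joint.sum(axis=3, keepdims=True)  # marginal of Y_xhat
    p_tgt = joint.sum(axis=2, keepdims=True)  # marginal of Y_x
    mi = (joint * (np.log(joint + eps) - np.log(p_hat * p_tgt + eps))).sum(axis=(2, 3))
    return mi.sum(axis=1)  # sum over target points

# Toy usage with synthetic posterior samples (illustrative only).
rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 3))
X_pool = rng.normal(size=(6, 3))
X_pool[0] = 0.0  # logit is 0 under every sample, so this label carries no information
X_target = rng.normal(size=(20, 3))
scores = epig_scores(theta, X_pool, X_target)
best = int(np.argmax(scores))
```

In this toy setup the all-zero pool point has a score of essentially zero, because its label is independent of every target label under the sampled posterior.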

Batch Selection Challenges

Existing batch strategies fall into three categories:

Top-B: Selects B highest-scoring points, ignoring dependencies
Heuristic Diversity: Adds randomness or diversity, requiring hyperparameter tuning
Greedy Batch Acquisition: Optimizes entire batch, high computational complexity
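
The redundancy problem of top-B selection can be seen in a small toy example. The sketch below uses made-up acquisition scores and a pool that contains exact duplicates; the helper top_b and all values are hypothetical and only illustrate why ignoring dependencies between batch points can waste budget.

```python
import numpy as np

def top_b(scores, B):
    """Indices of the B highest-scoring pool points, best first."""
    return np.argsort(-scores, kind="stable")[:B]

# Toy pool: point 0 is informative, and points 1 and 2 are exact copies of it.
X_pool = np.array([[2.0, 0.0], [2.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
scores = np.array([0.9, 0.9, 0.9, 0.6, 0.3])  # hypothetical acquisition scores

batch = top_b(scores, B=3)
n_distinct = len(np.unique(X_pool[batch], axis=0))  # distinct inputs in the batch
```

Here the whole batch of three is spent on a single distinct input, since each copy receives the same score when points are scored independently.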

ParBaLS Method

Core Idea

ParBaLS maintains a partially constructed batch S whose labels are not yet observed. Taking the expectation over those unobserved labels, the next point to add is:

argmax_{x̂∈D} E_{y_S~Y_S|L} [∑_{x∈V} I(Y_x; Y_{x̂}|Y_S = y_S, L)]

Monte Carlo Estimation

The expectation over y_S is a sum over exponentially many label configurations of S, so it is approximated with m Monte Carlo samples y_S^{(i)} ~ Y_S|L:

argmax_{x̂∈D} (1/m) ∑_{i=1}^m ∑_{x∈V} I(Y_x; Y_{x̂}|Y_S = y_S^{(i)}, L)
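
The 1/√m behavior of a Monte Carlo average can be checked numerically on a toy quantity. The sketch below is not the ParBaLS estimator: it assumes a Bernoulli(p) random variable as a stand-in for a bounded per-configuration score, and the function mc_rmse and its parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_rmse(m, n_trials=2000, p=0.3):
    """RMSE of an m-sample Monte Carlo mean of a Bernoulli(p) stand-in quantity."""
    samples = rng.random((n_trials, m)) < p
    estimates = samples.mean(axis=1)
    return np.sqrt(np.mean((estimates - p) ** 2))

rmse_25 = mc_rmse(25)
rmse_100 = mc_rmse(100)
ratio = rmse_25 / rmse_100  # theory for an i.i.d. average: sqrt(100 / 25) = 2
```

Quadrupling m should roughly halve the error, which is the trade-off to keep in mind when choosing the number of pseudo-label samples.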

Algorithm Flow

ParBaLS builds batches incrementally:

Initialize empty batch S=∅
Train Bayesian model M_L
Sample m pseudo-label configurations y^{(i)}~Y_D|L
For each batch position:
- Compute EPIG score for each candidate point
- Select highest-scoring point for batch
- Update the m parallel models, each with its own pseudo-label for the selected point
Return complete batch
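
The steps above can be sketched for binary Bayesian logistic regression as follows. This is a simplified illustration and not the authors' implementation: instead of retraining m parallel models, it assumes each model can be approximated by importance-reweighting a shared set of k posterior samples with the pseudo-label likelihood. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def epig(P_pool, P_tgt, w, eps=1e-12):
    """EPIG score of each pool point under posterior-sample weights w."""
    joint = np.einsum("j,jna,jvb->nvab", w, P_pool, P_tgt)
    indep = joint.sum(axis=3, keepdims=True) * joint.sum(axis=2, keepdims=True)
    return (joint * (np.log(joint + eps) - np.log(indep + eps))).sum(axis=(1, 2, 3))

def parbals_epig(theta, X_pool, X_target, B, m, seed=0):
    """Build a batch of B pool indices using m pseudo-label configurations."""
    rng = np.random.default_rng(seed)
    k, n = theta.shape[0], X_pool.shape[0]
    p1_pool = sigmoid(theta @ X_pool.T)   # (k, n)
    p1_tgt = sigmoid(theta @ X_target.T)  # (k, v)
    P_pool = np.stack([1 - p1_pool, p1_pool], axis=-1)  # (k, n, 2)
    P_tgt = np.stack([1 - p1_tgt, p1_tgt], axis=-1)     # (k, v, 2)
    # Sample m joint pseudo-labelings of the pool: pick one posterior sample
    # per configuration, then draw every pool label from that sample.
    idx = rng.integers(k, size=m)
    Y_pseudo = (rng.random((m, n)) < p1_pool[idx]).astype(int)  # (m, n)
    # Assumption: each "parallel model" is a reweighting of the k samples.
    W = np.full((m, k), 1.0 / k)
    batch = []
    for _ in range(B):
        # Monte Carlo average of EPIG over the m pseudo-label configurations.
        scores = np.mean([epig(P_pool, P_tgt, W[i]) for i in range(m)], axis=0)
        if batch:
            scores[batch] = -np.inf  # never pick the same point twice
        x_new = int(np.argmax(scores))
        batch.append(x_new)
        # Condition each configuration on its own pseudo-label for x_new.
        lik = P_pool[:, x_new, :][:, Y_pseudo[:, x_new]].T  # (m, k)
        W = W * lik
        W /= W.sum(axis=1, keepdims=True)
    return batch

# Toy usage with synthetic posterior samples (illustrative only).
rng = np.random.default_rng(1)
theta = rng.normal(size=(100, 3))
X_pool = rng.normal(size=(30, 3))
X_target = rng.normal(size=(15, 3))
batch = parbals_epig(theta, X_pool, X_target, B=5, m=8)
```

The pseudo-labels only influence which points enter the batch; nothing in this sketch adds them to the labeled set.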

BAIT Derivation

Through an informal asymptotic approximation, BAIT can also be derived from MBDT principles. The batch S is chosen to minimize:

Tr([∇²ℓ_{L∪S}(ŵ_L)]^{-1}∇²ℓ_D(ŵ_L))
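
To make the trace objective concrete, here is a minimal sketch for binary logistic regression, where the Hessian of the negative log-likelihood does not depend on the labels. The small ridge term is an added assumption to keep the matrix invertible, w_hat is a made-up point estimate standing in for ŵ_L, and the function names are hypothetical.

```python
import numpy as np

def hessian(X, w):
    """Hessian of the logistic negative log-likelihood summed over rows of X."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (X * (p * (1 - p))[:, None]).T @ X

def bait_objective(X_labeled, X_batch, X_pool, w, ridge=1e-3):
    """Tr([H_{L u S}(w)]^{-1} H_D(w)), with a ridge term added as an assumption."""
    H_sel = hessian(np.vstack([X_labeled, X_batch]), w) + ridge * np.eye(len(w))
    H_pool = hessian(X_pool, w)
    return np.trace(np.linalg.solve(H_sel, H_pool))

# Toy usage: score every single-point batch and pick the minimizer.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(50, 3))
w_hat = np.array([0.5, -1.0, 0.25])  # hypothetical estimate fit on L
X_labeled = X_pool[:5]
obj_empty = bait_objective(X_labeled, X_pool[:0], X_pool, w_hat)
objs = {i: bait_objective(X_labeled, X_pool[[i]], X_pool, w_hat) for i in range(5, 50)}
best = min(objs, key=objs.get)
```

Adding any point to S can only lower the objective, so the comparison between candidates is what matters, not the absolute value.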

Experimental Setup

Datasets

Experiments cover 5 dataset categories:

Tabular Data: Airline Passenger Satisfaction, Credit Card Fraud
Standard Image Data: CIFAR-10, CIFAR-100
Real-World Image Data: iWildCam, fMoW (from WILDS benchmark)
One-vs-Many Image Data: Multi-class converted to binary imbalanced scenarios
Subgroup Shift Image Data: Three-class settings, tested only on first two classes

Model Setup

Image Data: Fixed embedding models (CLIP-ViT-B/32 for WILDS, DINOv2-ViT-S/14 for CIFAR)
Tabular Data: Direct application of Bayesian logistic regression
Bayesian Setup: k=400 posterior parameter samples, NUTS sampler

Evaluation Metrics

Test accuracy as primary evaluation metric

Comparison Methods

Bayesian Methods: EPIG, BALD (with top-B or Gumbel noise)
Baseline Methods: Random, Confidence, BatchBALD
Proposed Methods: ParBaLS-MAP EPIG, ParBaLS EPIG

Experimental Parameters

T=10 iterations with a budget of B=10 labels per iteration
Initial random sample of 500 labeled points
For some settings: B=20 with 100 initial samples, to better discriminate between methods
5 runs with different seeds per setting

Experimental Results

Main Results

According to the results in Table 1, ParBaLS EPIG is among the top performers in 9 out of 10 settings and has the highest mean in 4:

Algorithm	Settings with Highest Mean	Settings Among Top Performers
ParBaLS EPIG	4	9
ParBaLS-MAP EPIG	2	7
SoftRankEPIG	0	4
EPIG	0	4
Confidence	3	5

Specific Performance

Tabular Datasets (most prominent):

Airline Passenger Satisfaction: ParBaLS EPIG achieves 89.42±0.41%
Credit Card Fraud: ParBaLS EPIG achieves 93.55±0.23%

Subgroup Shift Settings (most challenging):

fMoW: ParBaLS EPIG achieves 31.37±6.60%, significantly outperforming other methods
iWildCam: ParBaLS EPIG achieves 84.72±1.98%

Learning Curve Analysis

Figure 2 shows that on tabular datasets, ParBaLS methods maintain advantages throughout the learning process, with particularly pronounced improvements in low-budget settings.

Ablation Studies

ParBaLS vs ParBaLS-MAP: Full ParBaLS typically outperforms MAP-only version
Batch Size Impact: ParBaLS advantages more pronounced at larger batch sizes (B=20)
Single-Point vs Batch: Appendix experiments show single-point selection (B=1) performs better, but batch selection is more efficient in practice

Related Work

Active Learning Method Classification

Heuristic Methods: Based on uncertainty (Confidence, Margin, Entropy), diversity (CORESET), or both (BADGE, GALAXY)
Probabilistic Methods: BALD, BatchBALD, BAIT based on information theory or Bayesian principles

Expected Error Reduction (EER)

EER directly focuses on performance metrics like zero-one loss and log-likelihood, providing better interpretability. Related work includes variants combining heuristic methods and adaptive approaches for low-budget scenarios.

Pseudo-Labels in Active Learning

Unlike in semi-supervised learning, pseudo-labels in active learning serve two main purposes:

Training Enhancement: Training with both real and pseudo-labels
Batch Construction: ParBaLS uses pseudo-labels only temporarily while constructing a batch, so they never contaminate the final labeled data

Conclusions and Discussion

Main Conclusions

Theoretical Unification: MBDT provides unified theoretical foundation for multiple active learning algorithms
Batch Solution: ParBaLS effectively addresses scalability issues in batch active learning
Experimental Validation: ParBaLS EPIG performs excellently across settings, particularly suitable for high-uncertainty scenarios

Limitations

Computational Complexity: ParBaLS time complexity is O(TBm), with m parallel models increasing computational burden
Method Applicability: Primarily validated on Bayesian logistic regression; extension to deep networks requires further research
Theoretical Analysis: BAIT derivation relies on informal asymptotic approximations; theoretical rigor needs strengthening

Future Directions

Computational Efficiency: Discover computationally efficient approximations, extend to larger datasets and models
Deep Learning Integration: Research extension of ParBaLS to complete deep neural network training
Theory Refinement: Provide more rigorous theoretical analysis and convergence guarantees

In-Depth Evaluation

Strengths

Theoretical Contribution: Provides unified theoretical framework for active learning algorithms, enhancing interpretability
Practical Value: ParBaLS addresses batch selection problems in real applications
Comprehensive Experiments: Covers multiple data types and challenging settings with convincing results
Method Innovation: Novel application of pseudo-labels in batch construction

Weaknesses

Computational Overhead: Maintenance of m parallel models increases computational cost
Theoretical Rigor: Some derivations (e.g., BAIT) rely on informal approximations
Experimental Limitations: Primarily validated on relatively simple models (logistic regression)
Hyperparameter Sensitivity: Insufficient analysis of m selection trade-offs between performance and computation

Impact

Theoretical Impact: Provides new theoretical perspective for active learning, potentially inspiring future research
Practical Value: ParBaLS method has direct application value, especially in batch annotation scenarios
Reproducibility: Open-source code provided for easy reproduction and extension

Applicable Scenarios

High-Uncertainty Tasks: Tabular data and subgroup shift scenarios with irreducible uncertainty
Batch Annotation Requirements: Real applications requiring batch rather than individual annotation
Bayesian Settings: Models and tasks capable of Bayesian inference

References

This paper cites important literature in active learning, including:

Classical uncertainty sampling methods (Lewis, 1995)
Bayesian active learning methods (Houlsby et al., 2011; Gal et al., 2017)
Batch active learning methods (Kirsch et al., 2019, 2023)
Expected error reduction methods (Roy and McCallum, 2001; Mussmann et al., 2022)

Overall Assessment: This is an important paper with significant theoretical and practical value in active learning. By unifying existing algorithms through MBDT and proposing ParBaLS to address batch selection problems, it provides new research directions for the field. While improvements in computational efficiency and theoretical rigor are possible, its contributions are substantial.