Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling
Hu, Mussmann
Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian Decision Theory (BDT) offers a universal principle to guide decision-making. In this work, we derive BDT for (Bayesian) active learning in the myopic framework, where we imagine we only have one more point to label. This derivation leads to effective algorithms such as Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and other algorithms that appear in the literature. Furthermore, we show that BAIT (active learning based on V-optimal experimental design) can be derived from BDT and asymptotic approximations. A key challenge of such methods is the difficult scaling to large batch sizes, leading to either computational challenges (BatchBALD) or dramatic performance drops (top-$B$ selection). Here, using a particular formulation of the decision process, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally for several datasets that ParBaLS EPIG gives superior performance for a fixed budget and Bayesian Logistic Regression on Neural Embeddings. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.
Over the past decades, numerous active learning acquisition functions have been proposed, leaving practitioners without clear guidance on which to use. Bayesian Decision Theory (BDT) offers a universal principle to guide decision-making. The paper derives BDT for (Bayesian) active learning in a myopic framework, which imagines that only one more point remains to be labeled. The derivation recovers effective algorithms such as Expected Error Reduction (EER) and Expected Predictive Information Gain (EPIG), along with other algorithms from the literature. The authors further show that BAIT (active learning based on V-optimal experimental design) can be derived from BDT with asymptotic approximations. A key challenge for these methods is scaling to large batch sizes, which leads either to computational difficulties (BatchBALD) or to sharp performance drops (top-B selection). Using a particular formulation of the decision process, the paper derives Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. Experiments on several datasets show that ParBaLS EPIG gives superior performance for a fixed budget with Bayesian logistic regression on neural embeddings.
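To make the EPIG acquisition function mentioned above concrete, here is a minimal sketch of how such a score is commonly estimated from posterior samples. It assumes EPIG is taken as the mutual information between a candidate's label and the label at a target input, averaged over target inputs. The function name `epig_scores`, the array layout, and the random demo data are illustrative assumptions, not the authors' implementation, and the sketch does not include ParBaLS.

```python
import numpy as np

def epig_scores(cand_probs, target_probs, eps=1e-12):
    """
    cand_probs:   (K, N, C) class probabilities from K posterior samples on N candidates.
    target_probs: (K, M, C) class probabilities from the same samples on M target inputs.
    Returns an (N,) array: mutual information between candidate and target labels,
    averaged over the M target inputs.
    """
    K = cand_probs.shape[0]
    # Joint predictive p(y, y* | x, x*): average over samples of per-sample products.
    joint = np.einsum("knc,kmd->nmcd", cand_probs, target_probs) / K   # (N, M, C, C)
    p_y = cand_probs.mean(axis=0)         # (N, C) marginal predictive on candidates
    p_ystar = target_probs.mean(axis=0)   # (M, C) marginal predictive on targets
    indep = p_y[:, None, :, None] * p_ystar[None, :, None, :]          # (N, M, C, C)
    # Mutual information = KL(joint || product of marginals), summed over both label axes.
    mi = np.sum(joint * (np.log(joint + eps) - np.log(indep + eps)), axis=(2, 3))
    return mi.mean(axis=1)

# Hypothetical demo data: random predictive distributions standing in for a real posterior.
rng = np.random.default_rng(0)
K, N, M, C = 50, 5, 20, 3
cand_probs = rng.dirichlet(np.ones(C), size=(K, N))
target_probs = rng.dirichlet(np.ones(C), size=(K, M))
cand_probs[:, 0, :] = cand_probs[0, 0, :]   # all samples agree on candidate 0
epig = epig_scores(cand_probs, target_probs)
```

In this sketch, a candidate on which every posterior sample agrees carries no information about the target labels, so its score is numerically zero.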
Active learning aims to select the most informative examples from a large unlabeled pool for annotation, maximizing model performance under a limited annotation budget. Existing methods include heuristic and probabilistic approaches, but there is no explicit principle for choosing among them.
Practical Demand: In modern machine learning, data is typically annotated in batches rather than individually
Method Selection Difficulty: Existing algorithms lack interpretability, making it difficult for practitioners to determine which algorithms are effective and when
Scalability Challenges: Existing methods face computational or performance issues at large batch sizes
The paper's goal is to provide a unified theoretical framework, grounded in Bayesian Decision Theory, that explains how existing algorithms work, and to propose new methods that handle batch selection effectively.
Given input domain X, output domain Y, and unlabeled pool dataset D⊂X, the goal is to iteratively select T batches S⊂D, each of size |S|=B for annotation, minimizing test loss after training on the labeled set.
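The problem setup above can be sketched as a generic pool-based loop. This is a minimal illustration that assumes plain top-B selection by an acquisition score. The names `batch_active_learning`, `fit_centroids`, and `margin_score`, the toy centroid learner, and the synthetic pool are all hypothetical stand-ins, not the paper's method or models.

```python
import numpy as np

def batch_active_learning(pool_x, oracle, fit, score, T, B, rng):
    """Run T rounds, each labeling the B highest-scoring unlabeled pool points."""
    n = len(pool_x)
    labeled_idx = []   # pool indices annotated so far, in selection order
    labels = {}        # pool index -> label returned by the oracle
    model = None
    for _ in range(T):
        unlabeled = np.array([i for i in range(n) if i not in labels])
        if model is None:
            # No model yet: fall back to a uniformly random first batch.
            batch = rng.choice(unlabeled, size=B, replace=False)
        else:
            scores = score(model, pool_x[unlabeled])
            batch = unlabeled[np.argsort(-scores)[:B]]   # top-B selection
        for i in batch:
            labels[int(i)] = oracle(int(i))
            labeled_idx.append(int(i))
        y = np.array([labels[i] for i in labeled_idx])
        model = fit(pool_x[labeled_idx], y)
    return labeled_idx, model

def fit_centroids(x, y):
    # Toy stand-in for a real learner: one centroid per observed class.
    return {int(c): x[y == c].mean(axis=0) for c in np.unique(y)}

def margin_score(model, x):
    # Higher score = smaller gap between the two nearest centroids (more uncertain).
    d = np.stack([np.linalg.norm(x - mu, axis=1) for mu in model.values()], axis=1)
    if d.shape[1] < 2:
        return np.zeros(len(x))
    d.sort(axis=1)
    return -(d[:, 1] - d[:, 0])

# Hypothetical synthetic pool: the label is the sign of the first feature.
rng = np.random.default_rng(0)
pool_x = rng.normal(size=(200, 2))
true_y = (pool_x[:, 0] > 0).astype(int)
selected, model = batch_active_learning(
    pool_x, lambda i: int(true_y[i]), fit_centroids, margin_score, T=4, B=10, rng=rng
)
```

After T = 4 rounds of B = 10, the loop has spent the full budget of 40 distinct labels. Evaluating test loss after each round is omitted here for brevity.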
Figure 2 shows that on tabular datasets, ParBaLS methods maintain advantages throughout the learning process, with particularly pronounced improvements in low-budget settings.
EER directly focuses on performance metrics like zero-one loss and log-likelihood, providing better interpretability. Related work includes variants combining heuristic methods and adaptive approaches for low-budget scenarios.
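As an illustration of how EER targets a performance metric directly, the following minimal sketch scores each candidate by the expected drop in zero-one risk after observing its label. It assumes the posterior is represented by K samples and that conditioning on a hypothetical label is approximated by reweighting those samples by their likelihood. That reweighting, the function name `expected_error_reduction`, and the random demo data are assumptions made for illustration, not the authors' estimator.

```python
import numpy as np

def expected_error_reduction(cand_probs, eval_probs):
    """
    cand_probs: (K, N, C) class probabilities of K posterior samples on N candidates.
    eval_probs: (K, M, C) class probabilities of the same samples on M evaluation points.
    Returns an (N,) array: expected drop in Bayes zero-one risk from labeling each candidate.
    """
    K, N, C = cand_probs.shape
    # Current Bayes risk under the posterior-mean predictive.
    current_pred = eval_probs.mean(axis=0)                    # (M, C)
    current_risk = np.mean(1.0 - current_pred.max(axis=1))
    marginal = cand_probs.mean(axis=0)                        # (N, C): p(y | x)
    scores = np.empty(N)
    for i in range(N):
        risk_after = 0.0
        for c in range(C):
            if marginal[i, c] == 0.0:
                continue
            # Bayes rule on samples: reweight each sample by its likelihood of label c.
            w = cand_probs[:, i, c] / cand_probs[:, i, c].sum()       # (K,)
            updated_pred = np.einsum("k,kmc->mc", w, eval_probs)      # (M, C)
            risk_after += marginal[i, c] * np.mean(1.0 - updated_pred.max(axis=1))
        scores[i] = current_risk - risk_after
    return scores

# Hypothetical demo data: random predictive distributions standing in for a real posterior.
rng = np.random.default_rng(0)
K, N, M, C = 50, 5, 20, 3
cand_probs = rng.dirichlet(np.ones(C), size=(K, N))
eval_probs = rng.dirichlet(np.ones(C), size=(K, M))
cand_probs[:, 0, :] = cand_probs[0, 0, :]   # all samples agree on candidate 0
eer = expected_error_reduction(cand_probs, eval_probs)
best = int(np.argmax(eer))
```

Under this sample-reweighting approximation the score is never negative, and a candidate on which all samples agree yields no reduction. Replacing the zero-one risk with an expected log loss would give a log-likelihood variant of the same sketch.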
Bayesian active learning methods (Houlsby et al., 2011; Gal et al., 2017)
Batch active learning methods (Kirsch et al., 2019, 2023)
Expected error reduction methods (Roy and McCallum, 2001; Mussmann et al., 2022)
Overall Assessment: This paper offers both theoretical and practical value for active learning. It unifies existing algorithms under Myopic Bayesian Decision Theory (MBDT) and proposes ParBaLS to address batch selection, opening new research directions for the field. There is room to improve computational efficiency and theoretical rigor, but the contributions are substantial.