Title: Active Learning of General Halfspaces: Label Queries vs Membership Queries
Authors: Ilias Diakonikolas (University of Wisconsin-Madison), Daniel M. Kane (University of California, San Diego), Mingchen Ma (University of Wisconsin-Madison)
This paper investigates the problem of learning general (non-homogeneous) halfspaces over Gaussian distributions in Rd, considering two query access models. In the classical pool-based active learning model, algorithms can perform adaptive label queries on pre-sampled points. The authors establish strong information-theoretic lower bounds that preclude non-trivial improvements over passive settings. Specifically, any active learner requires Ω~(d/(log(m)ϵ)) label complexity, where m is the number of unlabeled samples. To surpass the passive learning label complexity of O~(d/ϵ), active learners require 2poly(d) unlabeled samples. On the positive side, the authors demonstrate that this lower bound can be circumvented through membership query access, even in the agnostic model. Specifically, they provide a computationally efficient learner with query complexity O~(min{1/p,1/ϵ}+d⋅polylog(1/ϵ)), achieving error guarantee O(opt)+ϵ.
This paper studies the problem of learning general halfspaces under Gaussian distributions. Halfspaces (or linear threshold functions, LTFs) are functions of the form h(x)=sign(w⋅x+t), where w∈Sd−1 is a weight vector and t is a threshold. When t=0, these are called homogeneous halfspaces.
Theoretical Gap: For homogeneous halfspaces, active learning is known to achieve O(dlog(1/ϵ)) label complexity, but whether similar improvements exist for general halfspaces remains an open question.
Practical Importance: Halfspace learning is a classical problem in machine learning with significant impact from the perceptron algorithm to SVMs and AdaBoost.
Query Model Comparison: The capability differences between active learning (label queries) and membership queries require deeper understanding.
Strong Information-Theoretic Lower Bounds: Proves that any active learning algorithm requires Ω~(d/(log(m)ϵ)) label complexity, where m is the number of unlabeled samples
Membership Query Upper Bounds: Provides a computationally efficient algorithm with query complexity O~(min{1/p,1/ϵ}+d⋅polylog(1/ϵ))
Model Separation: Establishes a strong separation between active learning and membership query models
Complexity Characterization: Completely characterizes the complexity of learning general halfspaces under Gaussian marginal distributions
Input: Access to labeled function y(x):Rd→{±1} with target distribution N(0,I)Output: Halfspace h^(x)=sign(w^⋅x+t^)Objective: Minimize error rate err(h^)=Prx∼N(0,I)(h^(x)=y(x))
If a small number of queries can learn a halfspace with error rate p/2, then by randomly partitioning the sample set, one can use the first part to learn the halfspace and the second part to find d negative samples with O(d) expected queries.
Lemma 2.1: If there exists an active learning algorithm that uses r label queries to learn a halfspace with bias p to error rate p/2, then there exists an algorithm using r+O(d) queries to find d negative samples from 2m samples.
Lemma 2.2: For matrix A∈Rk×d, if ∥AAT−dI∥2≤O(d/(t∗)2), then the probability that a random halfspace labels all k samples as negative is at most O(plog(1/p))k.
Theorem 1.1: For any active learning algorithm A, there exists a halfspace h∗ with bias p such that if A performs fewer than Ω~(d/(plog(m))) label queries on m i.i.d. Gaussian samples, then with probability at least 2/3 it outputs a hypothesis with error rate exceeding p/2.
Corollary: To surpass passive learning's O~(d/ϵ) complexity, 2poly(d) unlabeled samples are required.
Theorem 1.2: There exists an algorithm using O~(min{1/p,1/ϵ}+d⋅polylog(1/ϵ)) membership queries with runtime poly(d,M), outputting a hypothesis with error rate at most O(opt)+ϵ.
This paper represents an important contribution to active learning theory, revealing through rigorous mathematical analysis the fundamental differences between different query models, establishing a solid foundation for theoretical development in this field.