Active Learning of General Halfspaces: Label Queries vs Membership Queries

Diakonikolas, Kane, Ma

We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces under the Gaussian distribution on $\mathbb{R}^d$ in the presence of some form of query access. In the classical pool-based active learning model, where the algorithm is allowed to make adaptive label queries to previously sampled points, we establish a strong information-theoretic lower bound ruling out non-trivial improvements over the passive setting. Specifically, we show that any active learner requires label complexity of $\tilde{\Omega}(d/(\log(m)\epsilon))$, where $m$ is the number of unlabeled examples. Specifically, to beat the passive label complexity of $\tilde{O}(d/\epsilon)$, an active learner requires a pool of $2^{\text{poly}(d)}$ unlabeled samples. On the positive side, we show that this lower bound can be circumvented with membership query access, even in the agnostic model. Specifically, we give a computationally efficient learner with query complexity of $\tilde{O}(\min\{1/p, 1/\epsilon\} + d \cdot \text{polylog}(1/\epsilon))$ achieving error guarantee of $O(\text{opt})+\epsilon$. Here $p \in [0, 1/2]$ is the bias and $\text{opt}$ is the 0-1 loss of the optimal halfspace. As a corollary, we obtain a strong separation between the active and membership query models. Taken together, our results characterize the complexity of learning general halfspaces under Gaussian marginals in these models.

Basic Information

Paper ID: 2501.00508
Title: Active Learning of General Halfspaces: Label Queries vs Membership Queries
Authors: Ilias Diakonikolas (University of Wisconsin-Madison), Daniel M. Kane (University of California, San Diego), Mingchen Ma (University of Wisconsin-Madison)
Category: cs.LG (Machine Learning)
Submitted: December 31, 2024
Paper link: https://arxiv.org/abs/2501.00508

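Summary of the Abstract

This paper studies the problem of learning general (not necessarily homogeneous) halfspaces under the Gaussian distribution on $\mathbb{R}^d$, under two forms of query access. In the classical pool-based active learning model, where the algorithm may make adaptive label queries on previously sampled points, the authors establish a strong information-theoretic lower bound that rules out non-trivial improvements over the passive setting. Specifically, any active learner requires label complexity $\tilde{\Omega}(d/(\log(m)\epsilon))$, where $m$ is the number of unlabeled examples. To beat the passive label complexity of $\tilde{O}(d/\epsilon)$, an active learner therefore needs a pool of $2^{\text{poly}(d)}$ unlabeled examples. On the positive side, the authors show that this lower bound can be circumvented with membership query access, even in the agnostic model. Specifically, they give a computationally efficient learner with query complexity $\tilde{O}(\min\{1/p, 1/\epsilon\} + d \cdot \text{polylog}(1/\epsilon))$ that achieves an error guarantee of $O(\text{opt})+\epsilon$, where $p \in [0, 1/2]$ is the bias and $\text{opt}$ is the 0-1 loss of the optimal halfspace.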

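Background and Motivation

Problem Definition

This paper studies the problem of learning general halfspaces under the Gaussian distribution. A halfspace (or linear threshold function, LTF) is a function of the form $h(x) = \text{sign}(w \cdot x + t)$, where $w \in S^{d-1}$ is the weight vector and $t$ is the threshold. When $t=0$, the halfspace is called homogeneous.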
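As a concrete reading of this definition, here is a minimal Python sketch of a halfspace together with its bias under $\mathcal{N}(0,I)$. It assumes the bias $p$ means the probability mass of the smaller class, which for unit-norm $w$ equals $\Phi(-|t|)$ because $w \cdot x$ is a standard Gaussian. The names `make_halfspace` and `gaussian_bias` are illustrative and do not come from the paper.

```python
import math

def make_halfspace(w, t):
    """Return h(x) = sign(w . x + t); w is assumed to have unit norm."""
    def h(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + t
        return 1 if score >= 0 else -1  # sign(0) := +1 is an arbitrary convention
    return h

def gaussian_bias(t):
    """Mass of the smaller class under N(0, I) for unit-norm w: Phi(-|t|)."""
    return 0.5 * math.erfc(abs(t) / math.sqrt(2.0))

h = make_halfspace([1.0, 0.0], 1.0)
print(h([0.0, 0.0]), h([-2.0, 0.0]))           # one point on each side of the boundary
print(gaussian_bias(0.0), gaussian_bias(1.0))  # homogeneous case vs. threshold 1
```

The homogeneous case $t=0$ gives a perfectly balanced halfspace, and the bias shrinks rapidly as $|t|$ grows, which is what makes the general case harder.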

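Motivation

Theoretical gap: For homogeneous halfspaces, active learning is known to achieve label complexity $O(d\log(1/\epsilon))$, but whether a similar improvement is possible for general halfspaces remained an open question.
Practical importance: Halfspace learning is a classical problem in machine learning, with influence ranging from the Perceptron algorithm to SVMs and AdaBoost.
Comparison of query models: The difference in power between active learning (label queries) and membership queries needs to be understood precisely.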

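Limitations of Existing Approaches

For a general halfspace with bias $p$, at least $1/p$ labeled examples are needed (in expectation, under random sampling) before the first point of the minority class is seen.
The existing information-theoretic lower bound is $\Omega(\min\{1/p, 1/\epsilon\} + d\log(1/\epsilon))$.
A rigorous characterization of the difference between the active learning and membership query models has been lacking.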
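The $1/p$ barrier in the first item can be checked numerically. The sketch below is an illustration, not part of the paper: it draws the one-dimensional projection $g = w \cdot x \sim \mathcal{N}(0,1)$, labels it with $\text{sign}(g + t)$, and counts draws until the first minority label. The waiting time is geometric with success probability $p = \Phi(-|t|)$, so its mean should be close to $1/p$. The function names and the choice $t = 1.5$ are assumptions made for the example.

```python
import math
import random

def draws_until_minority(t, rng):
    """Count draws g ~ N(0,1) until sign(g + t) equals the minority label."""
    minority = -1 if t >= 0 else 1
    draws = 0
    while True:
        draws += 1
        g = rng.gauss(0.0, 1.0)
        label = 1 if g + t >= 0 else -1
        if label == minority:
            return draws

def mean_wait(t, trials, seed=0):
    """Average waiting time over independent trials (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(draws_until_minority(t, rng) for _ in range(trials)) / trials

t = 1.5
p = 0.5 * math.erfc(abs(t) / math.sqrt(2.0))  # bias of the halfspace
print(1.0 / p, mean_wait(t, 20000))           # theoretical vs. simulated mean
```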

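Core Contributions

Strong information-theoretic lower bound: Any active learning algorithm requires label complexity $\tilde{\Omega}(d/(\log(m)\epsilon))$, where $m$ is the number of unlabeled examples.
Membership query upper bound: A computationally efficient algorithm with query complexity $\tilde{O}(\min\{1/p, 1/\epsilon\} + d \cdot \text{polylog}(1/\epsilon))$.
Model separation: A strong separation between the active learning and membership query models.
Complexity characterization: A characterization of the complexity of learning general halfspaces under Gaussian marginals in both models.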
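To see why the lower bound translates into an exponentially large pool, here is a short worked calculation. It ignores the polylogarithmic factors hidden by the tilde notation and introduces a hypothetical constant $c \in (0, 1]$ measuring how much an active learner hopes to save in $d$; logarithms are base 2, which only changes constants.

$$
\frac{d}{\log(m)\,\epsilon} \;\le\; \frac{d^{1-c}}{\epsilon}
\quad\Longleftrightarrow\quad
\log m \;\ge\; d^{c}
\quad\Longleftrightarrow\quad
m \;\ge\; 2^{d^{c}} .
$$

Under these simplifications, a label complexity of $d^{1-c}/\epsilon$ is compatible with the lower bound only when the pool has $2^{\text{poly}(d)}$ unlabeled examples.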

Method Details

Task Definition

Input: query access to a labeling function $y(x): \mathbb{R}^d \to \{\pm 1\}$, with target distribution $\mathcal{N}(0,I)$
Output: a halfspace $\hat{h}(x) = \text{sign}(\hat{w} \cdot x + \hat{t})$
Goal: minimize the error $\text{err}(\hat{h}) = \Pr_{x \sim \mathcal{N}(0,I)}(\hat{h}(x) \neq y(x))$
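The error $\text{err}(\hat{h})$ is a probability under $\mathcal{N}(0,I)$, so it can be approximated by Monte Carlo sampling. The following is a minimal sketch, with hypothetical names `estimate_error`, `y`, and `h_hat`. As a sanity check it uses two homogeneous halfspaces at angle $\theta$, whose disagreement probability under a Gaussian is known to be $\theta/\pi$.

```python
import math
import random

def sign(v):
    return 1 if v >= 0 else -1

def estimate_error(h_hat, y, d, n, seed=0):
    """Monte Carlo estimate of Pr_{x ~ N(0, I_d)}[h_hat(x) != y(x)]."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if h_hat(x) != y(x):
            mistakes += 1
    return mistakes / n

theta = math.pi / 4
y = lambda x: sign(x[0])                                                 # labeling function
h_hat = lambda x: sign(math.cos(theta) * x[0] + math.sin(theta) * x[1])  # hypothesis
err = estimate_error(h_hat, y, d=3, n=20000)
print(err, theta / math.pi)  # estimate vs. the exact value theta / pi
```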

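Lower Bound Proof Strategy

Core Idea

If a halfspace with error $p/2$ can be learned with few queries, then one can randomly split the sample set into two parts, use the first part to learn the halfspace, and use the second part to find $d$ negative examples with $O(d)$ queries in expectation. The lower bound then reduces to showing that finding $d$ negative examples in the pool is hard.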

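Key Lemmas

Lemma 2.1: If there is an active learning algorithm that learns a halfspace with bias $p$ to error $p/2$ using $r$ label queries, then there is an algorithm that finds $d$ negative examples among $2m$ samples using $r+O(d)$ queries.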
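The second half of this reduction can be sketched in a few lines. The sketch assumes the negative class is the minority class with mass $p$. If $\hat{h}$ has error at most $p/2$, then $\Pr[\hat{h} = -1, y = -1] \geq p/2$ and $\Pr[\hat{h} = -1] \leq 3p/2$, so by this crude bound at least a third of the points that $\hat{h}$ labels negative are truly negative, and querying only those points finds $d$ negatives with $O(d)$ expected queries. The function `find_negatives`, the thresholds 1.5 and 1.4, and the pool size are illustrative choices, not the paper's construction.

```python
import random

def find_negatives(pool, h_hat, query, d):
    """Query labels only where h_hat predicts -1; stop after d true negatives."""
    found, queries = [], 0
    for x in pool:
        if len(found) == d:
            break
        if h_hat(x) == -1:
            queries += 1           # one label query spent on this point
            if query(x) == -1:
                found.append(x)
    return found, queries

rng = random.Random(0)
dim = 5
pool = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5000)]
truth = lambda x: 1 if x[0] + 1.5 >= 0 else -1  # target halfspace (label oracle)
h_hat = lambda x: 1 if x[0] + 1.4 >= 0 else -1  # learned halfspace, error below p/2
found, queries = find_negatives(pool, h_hat, truth, dim)
print(len(found), queries)
```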

Lemma 2.2: For a matrix $A \in \mathbb{R}^{k \times d}$, if $\|AA^T - dI\|_2 \leq O(d/(t^*)^2)$, then the probability that a random halfspace labels all $k$ samples as negative is at most $O(p\log(1/p))^k$.

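Upper Bound Algorithm Design

Overall Framework (Algorithm 1)

Bias estimation: estimate the bias $p$ using $\tilde{O}(\min\{1/p, 1/\epsilon\})$ queries
Threshold grid: build a grid of thresholds $\{t_0, t_1, \ldots, t_\psi\}$ with spacing $1/(2\log(1/\epsilon))$
Initialization and refinement: run the initialization and refinement algorithms for each grid point
Candidate selection: select the best of the candidate hypotheses using a tournament procedure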
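The threshold-grid step can be illustrated with a short sketch. The spacing $1/(2\log(1/\epsilon))$ is taken from the description above; the range of the grid is not stated there, so the sketch assumes nonnegative thresholds up to $\sqrt{2\log(1/\epsilon)}$, the point beyond which the Gaussian tail bound $\Phi(-t) \leq e^{-t^2/2}$ makes the bias at most $\epsilon$. Both that range and the name `threshold_grid` are assumptions, not the paper's specification.

```python
import math

def threshold_grid(eps, t_max=None):
    """Evenly spaced thresholds with spacing 1 / (2 log(1/eps)); requires 0 < eps < 1."""
    step = 1.0 / (2.0 * math.log(1.0 / eps))
    if t_max is None:
        # Assumed upper end: beyond this the bias is at most eps.
        t_max = math.sqrt(2.0 * math.log(1.0 / eps))
    count = int(t_max / step) + 1
    return [i * step for i in range(count)]

grid = threshold_grid(0.01)
print(len(grid), grid[0], grid[-1])
```

The grid has only about $\log^{3/2}(1/\epsilon)$ points under this assumption, so running initialization and refinement once per grid point costs only polylogarithmic overhead.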

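Refinement Algorithm (Algorithm 3)

The refinement step uses projected gradient descent:

Gradient construction: $G_i := \text{proj}_{w_i^{\perp}} \, z \, y(A_i^{1/2}z - \tilde{t}w_i)$
Update rule: $w_{i+1} = \text{proj}_{S^{d-1}}(w_i + \mu_i\hat{g}_i)$
Localization: find the correct $\tilde{t}$ by binary search

Key Lemma 3.1: If the gradient estimates satisfy certain conditions, then $\sin(\theta_{i+1}/2) \leq (1-1/C_2)\sigma_i$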
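One iteration of the update above can be sketched as follows. This is a simplified illustration rather than the paper's Algorithm 3: it fixes $A_i = I$, takes $\tilde{t}$ as given instead of finding it by binary search, and uses a plain sample average for $\hat{g}_i$. The membership oracle `query`, the step size `mu`, and all function names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def proj_orth(v, w):
    """Project v onto the subspace orthogonal to the unit vector w."""
    c = dot(v, w)
    return [a - c * b for a, b in zip(v, w)]

def gradient_estimate(query, w, t_tilde, n, rng):
    """Average z * y(z - t_tilde * w) over n membership queries, projected onto w-perp."""
    d = len(w)
    g = [0.0] * d
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [zi - t_tilde * wi for zi, wi in zip(z, w)]  # query point with A_i = I
        label = query(x)
        for j in range(d):
            g[j] += label * z[j] / n
    return proj_orth(g, w)

def pgd_step(w, g_hat, mu):
    """w_next = proj_{S^{d-1}}(w + mu * g_hat)."""
    return normalize([a + mu * b for a, b in zip(w, g_hat)])

rng = random.Random(0)
w_star, t_star = [1.0, 0.0, 0.0], 1.0
query = lambda x: 1 if dot(w_star, x) + t_star >= 0 else -1  # membership oracle
w = normalize([1.0, 1.0, 0.0])                               # current guess, 45 degrees off
g_hat = gradient_estimate(query, w, t_tilde=1.0, n=4000, rng=rng)
w_next = pgd_step(w, g_hat, mu=0.5)
print(dot(w, w_star), dot(w_next, w_star))  # alignment before and after the step
```

With $A_i = I$ the expected value of $z \, y(z - \tilde{t}w_i)$ is a positive multiple of $w^*$, so its projection onto $w_i^{\perp}$ points in the direction that rotates $w_i$ toward $w^*$.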

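Initialization Algorithm (Algorithm 2)

The initialization step uses label smoothing:

Smoothed label: $\tilde{y}(x) := y(\sqrt{1-\rho^2}x + \rho z)$, where $z \sim \mathcal{N}(0,I)$
Chow parameter estimation: estimate $\mathbb{E}[z\tilde{y}(x_0)]$ to obtain the direction of $w^*$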
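The two steps above combine naturally into a single estimator, sketched below under stated assumptions. For a noiseless halfspace, $\mathbb{E}[z\tilde{y}(x_0)]$ is a positive multiple of $w^*$, so normalizing an empirical average recovers the direction. The summary does not say how $x_0$ or $\rho$ are chosen; the values used here, the noiseless oracle, and the name `chow_direction` are all hypothetical.

```python
import math
import random

def chow_direction(query, x0, rho, n, seed=0):
    """Estimate E[z * y(sqrt(1 - rho^2) * x0 + rho * z)] and return its direction."""
    rng = random.Random(seed)
    d = len(x0)
    scale = math.sqrt(1.0 - rho * rho)
    acc = [0.0] * d
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [scale * a + rho * b for a, b in zip(x0, z)]  # smoothed query point
        label = query(x)                                  # one membership query
        for j in range(d):
            acc[j] += label * z[j] / n
    norm = math.sqrt(sum(a * a for a in acc))
    return [a / norm for a in acc]

w_star = [0.0, 1.0, 0.0]
oracle = lambda x: 1 if sum(a * b for a, b in zip(w_star, x)) + 0.5 >= 0 else -1
direction = chow_direction(oracle, x0=[0.0, -0.5, 0.3], rho=0.5, n=4000)
print(direction)  # should be close to w_star
```

The estimate is most informative when the smoothed query points straddle the decision boundary, which is why $x_0$ in this example is placed near it.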

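Experimental Setup

Theoretical Analysis Framework

This is primarily a theoretical work: it establishes complexity bounds through mathematical proofs rather than empirical experiments.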

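Analysis Tools

Information-theoretic methods: Yao's minimax principle
Geometric analysis: concentration phenomena on the high-dimensional sphere
Probabilistic tools: tail bounds and concentration inequalities for the Gaussian distribution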

Query Complexity

Initialization: $\tilde{O}(1/p + d\log(1/\epsilon))$ queries
Refinement: $\tilde{O}(d \cdot \text{polylog}(1/\epsilon))$ queries
Total complexity: $\tilde{O}(\min\{1/p, 1/\epsilon\} + d \cdot \text{polylog}(1/\epsilon))$