
The Initial Screening Order Problem

Alvarez, Mastropietro, Ruggieri

We investigate the role of the initial screening order (ISO) in candidate screening. The ISO refers to the order in which the screener searches the candidate pool when selecting $k$ candidates. Today, it is common for the ISO to be the product of an information access system, such as an online platform or a database query. The ISO has been largely overlooked in the literature, despite its impact on the optimality and fairness of the selected $k$ candidates, especially under a human screener. We define two problem formulations describing the search behavior of the screener given an ISO: the best-$k$, where it selects the top $k$ candidates; and the good-$k$, where it selects the first good-enough $k$ candidates. To study the impact of the ISO, we introduce a human-like screener and compare it to its algorithmic counterpart, where the human-like screener is conceived to be inconsistent over time. Our analysis, in particular, shows that the ISO, under a human-like screener solving for the good-$k$ problem, hinders individual fairness despite meeting group fairness, and hampers the optimality of the selected $k$ candidates. This is due to position bias, where a candidate's evaluation is affected by its position within the ISO. We report extensive simulated experiments exploring the parameters of the best-$k$ and good-$k$ problems for both screeners. Our simulation framework is flexible enough to account for multiple candidate screening tasks, being an alternative to running real-world procedures.

Basic Information

Paper ID: 2307.15398
Title: The Initial Screening Order Problem
Authors: Jose M. Alvarez (KU Leuven), Antonio Mastropietro (University of Pisa), Salvatore Ruggieri (University of Pisa)
Categories: cs.LG cs.CY
Published: July 2023 (arXiv preprint, updated January 2025)
Paper link: https://arxiv.org/abs/2307.15398

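Summary

The paper studies the role of the initial screening order (ISO) in candidate screening. The ISO is the order in which a screener searches the candidate pool when selecting $k$ candidates. Today the ISO is usually produced by an information access system, such as an online platform or a database query. Although the ISO affects both the optimality and the fairness of the selected $k$ candidates, especially when the screener is human, the literature has largely overlooked it. The authors define two problem formulations that describe the screener's search behavior under a given ISO: the best-$k$ problem (select the top $k$ candidates) and the good-$k$ problem (select the first $k$ candidates that are good enough). To study the impact of the ISO, they introduce a human-like screener, designed to be inconsistent over time, and compare it with an algorithmic screener. The analysis shows that when a human-like screener solves the good-$k$ problem, the ISO hinders individual fairness (even though group fairness is met) and hampers the optimality of the selected $k$ candidates. The cause is position bias: a candidate's evaluation is affected by its position within the ISO.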

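Practical needs: Drawing on a collaboration with G, a European Fortune Global 500 company, the authors identify five key patterns in practice:
- G1: Screeners choose different ISOs
- G2: Both full search and partial search occur
- G3: Screeners focus on candidates who meet the minimum basic requirements
- G4: Fairness goals take the form of diversity representation quotas
- G5: Each candidate is evaluated for about one minute
Theoretical gap: Existing work focuses mainly on how the ISO is created (as a fair set selection or ranking problem) and rarely studies how screeners use the ISO, in particular how human screeners behave.
Fairness concern: Position bias can cause similar candidates to be treated differently because they sit at different positions in the ISO, which violates the principle of individual fairness.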

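Core Contributions

First formalization of the ISO problem: The ISO is treated as a key parameter of the set selection problem, and two problem formulations, best-$k$ and good-$k$, are defined for the two search behaviors.
A human-like screener model: The authors propose a human-like screener that accounts for fatigue and compare it with an algorithmic screener, both theoretically and experimentally.
A flexible simulation tool: The authors develop a simulation framework for studying the ISO problem, which can guide practitioners without running real screening procedures.
The fairness impact of position bias: The authors show that, under a human-like screener, the ISO leads to violations of individual fairness while group fairness constraints are still met.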

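Best-k Problem

The screener looks for the $k$ highest-scoring candidates:

$$\arg\max_{S^k \in [C]^k} U^k_{\mathrm{add}}(S^k, \theta) \quad \text{s.t.} \quad f(S^k) \ge q$$

where the utility function is defined as:

$$U^k_{\mathrm{add}}(S^k, \theta) = \sum_{c \in S^k} s(X_c)$$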
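The additive objective above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes that $f(S^k) \ge q$ means "at least a fraction $q$ of the $k$ selected candidates belong to the protected group", and the names `best_k`, `scores`, and `protected` are hypothetical.

```python
import math

def best_k(scores, protected, k, q):
    """Pick k candidate indices maximizing the sum of scores (U_add).

    Assumption (not from the source): the constraint f(S^k) >= q is read as
    'at least ceil(q * k) of the selected candidates are protected'.
    """
    need = math.ceil(q * k)
    # Rank all candidates by score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # First reserve the best protected candidates needed for the quota.
    quota_picks = [i for i in ranked if protected[i]][:need]
    if len(quota_picks) < need:
        raise ValueError("quota cannot be met with this candidate pool")
    # Then fill the remaining seats with the best remaining candidates.
    taken = set(quota_picks)
    fill = [i for i in ranked if i not in taken][: k - need]
    return sorted(quota_picks + fill, key=lambda i: scores[i], reverse=True)

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
protected = [False, False, False, True, False, True]
picked = best_k(scores, protected, k=4, q=0.5)
utility = sum(scores[i] for i in picked)  # value of U_add for the selection
```

Under this reading of the quota, reserving the best protected candidates first and then filling the remaining seats by score maximizes the additive utility.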

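Good-k Problem

The screener looks for the first $k$ candidates that meet the minimum requirement $\psi$, which allows a partial search:

$$\arg\max_{S^k \in [C]^k} U^k_{\psi}(S^k, \theta) \quad \text{s.t.} \quad f(S^k) \ge q$$

where the utility function is defined as:

$$U^k_{\psi}(S^k, \theta) = \begin{cases} k - \sum_{c \in S^k} p(c, S^k, \theta) & \text{if } \forall c \in S^k,\ s(X_c) \ge \psi \\ 0 & \text{otherwise} \end{cases}$$

The penalty function $p(c, S^k, \theta)$ measures the "wasted effort" of selecting candidate $c$.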

$$s_h(X_c) + \varepsilon$$

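where $\varepsilon$ is a random variable that depends on the accumulated fatigue. Two modeling choices are considered:

$\varepsilon_1 \sim \mathcal{N}(0, v(\Phi(t-1)))$: the variance increases with fatigue
$\varepsilon_2 \sim \mathcal{N}(\mu(\Phi(t-1)), v(\Phi(t-1)))$: the mean decreases with fatigue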
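The two noise models can be sketched in Python as follows. The source does not give the functional forms of $v$, $\mu$, or $\Phi$, so the linear forms and rates below are purely illustrative assumptions, as are the names `variance`, `mean_shift`, and `noisy_score`. Here `phi` stands for the accumulated fatigue $\Phi(t-1)$, for example the number of candidates already evaluated.

```python
import math
import random

def variance(phi, rate=0.001):
    """Hypothetical v(phi): the variance grows linearly with fatigue."""
    return rate * phi

def mean_shift(phi, rate=0.002):
    """Hypothetical mu(phi): the mean decreases linearly with fatigue."""
    return -rate * phi

def noisy_score(score, phi, rng, biased=False):
    """Return the score plus fatigue-dependent Gaussian noise.

    biased=False corresponds to the first model (zero mean, growing variance);
    biased=True corresponds to the second model (decreasing mean as well).
    """
    mu = mean_shift(phi) if biased else 0.0
    sigma = math.sqrt(variance(phi))
    return score + rng.gauss(mu, sigma)

rng = random.Random(0)
fresh = noisy_score(0.7, phi=0, rng=rng)  # no fatigue, so no noise
tired = [noisy_score(0.7, phi=100, rng=rng, biased=True) for _ in range(20000)]
tired_mean = sum(tired) / len(tired)      # close to 0.7 - 0.2 under these rates
```

With zero fatigue both models reduce to the noise-free score, which is the algorithmic screener's behavior in this sketch.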

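Search Algorithms

ExaminationSearch (Algorithm 1): solves the best-$k$ problem, searching in descending order of score
CascadeSearch (Algorithm 2): solves the good-$k$ problem, searching in ISO order
Human-like versions (Algorithms 3-4): the same procedures with fatigue effects added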
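The contrast between the two search behaviors can be sketched in Python. This is a simplified illustration, not the paper's Algorithms 1-2: it ignores the quota constraint and fatigue, and the function names and return values are assumptions made for the example.

```python
def examination_search(iso, scores, k):
    """Full search: evaluate every candidate in the ISO, keep the k best.

    Returns the selected candidates and the number of candidates evaluated.
    """
    ranked = sorted(iso, key=lambda c: scores[c], reverse=True)
    return ranked[:k], len(iso)

def cascade_search(iso, scores, k, psi):
    """Partial search: walk the ISO in order and keep the first k candidates
    whose score is at least psi, stopping as soon as k are found."""
    selected = []
    seen = 0
    for c in iso:
        seen += 1
        if scores[c] >= psi:
            selected.append(c)
            if len(selected) == k:
                break
    return selected, seen

scores = {"a": 0.9, "b": 0.4, "c": 0.6, "d": 0.8, "e": 0.7}
iso = ["b", "c", "e", "a", "d"]  # the order in which the screener sees candidates
best, n_best = examination_search(iso, scores, k=2)
good, n_good = cascade_search(iso, scores, k=2, psi=0.5)
```

In this toy run the cascade stops after three evaluations and selects `c` and `e`, while the full search evaluates all five candidates and selects `a` and `d`. The difference shows how the ISO alone can change the outcome under good-$k$.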

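Symmetric distribution: μ=0.5, σ=0.02 (top candidates are very unlikely)
Asymmetric distribution: μ=0.8, σ=0.05 (top candidates are more likely)
Increasing distribution: μ=1, σ=0.05 (top candidates are most likely)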

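ISO Settings

$\theta \perp\!\!\!\perp s$: the ISO is independent of individual scores (random or alphabetical order)
$\theta \not\perp\!\!\!\perp s$: the ISO is correlated with the scores, with correlation coefficient $\rho \in \{-1, -0.8, -0.5\}$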
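One way to generate an ISO whose positions are correlated with the scores is sketched below in Python. The source does not say how the correlated ISOs are constructed, so this latent-key approach is an assumption. The realized correlation between position and score only approximates the target $\rho$, except at $\rho = -1$, where the ISO is exactly the descending-score order. The name `make_iso` is hypothetical.

```python
import math
import random
import statistics

def make_iso(scores, rho, rng):
    """Return candidate indices ordered by a latent key correlated with score.

    Assumption: key = rho * z + sqrt(1 - rho^2) * noise, where z is the
    standardized score. Sorting keys in ascending order means a negative rho
    places high scores early. Scores must not all be equal.
    """
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    keys = []
    for s in scores:
        z = (s - mean) / sd
        keys.append(rho * z + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
    return sorted(range(len(scores)), key=lambda i: keys[i])

scores = [0.2, 0.9, 0.5, 0.7]
perfect = make_iso(scores, rho=-1.0, rng=random.Random(0))   # best candidate first
partial = make_iso(scores, rho=-0.5, rng=random.Random(0))   # noisier ordering
shuffled = make_iso(scores, rho=0.0, rng=random.Random(0))   # independent of scores
```

Setting `rho=0.0` gives an ordering that ignores the scores, which corresponds to the independent setting.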

Experimental Parameters

Candidate pool size: n = 120, 400, 30
Number selected: k = 6, 20
Quota: q = 0.5
Protected group proportion: pr = 0.2
Minimum requirement: ψ ∈ 0.3, 0.8

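Evaluation Metrics

Ratio to Baseline (RtB): the ratio of a solution's utility to that of the baseline solution
Jaccard similarity (JdS): the proportion of overlapping candidates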
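Both metrics are simple to compute. The Python sketch below assumes additive utilities and treats the baseline as any reference selection, since the source does not specify which solution serves as the baseline. The function names are hypothetical.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of selected candidates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention used here: two empty selections are identical
    return len(a & b) / len(a | b)

def ratio_to_baseline(utility, baseline_utility):
    """Utility of a solution divided by the utility of the baseline solution."""
    return utility / baseline_utility

scores = {"a": 0.9, "b": 0.4, "c": 0.6, "d": 0.8, "e": 0.7}
baseline = ["a", "d"]  # e.g. a best-k selection
solution = ["c", "e"]  # e.g. a good-k selection under some ISO
jds = jaccard_similarity(solution, baseline)
rtb = ratio_to_baseline(
    sum(scores[c] for c in solution),
    sum(scores[c] for c in baseline),
)
```

A JdS of 1 means the two selections contain the same candidates. A selection can reach a high RtB with a low JdS when different candidates have similar scores.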

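Effect of the score distribution:
- Under the symmetric distribution, good-k gradually approaches best-k as ψ increases
- Under the asymmetric and increasing distributions, good-k struggles to match the performance of best-k even for large ψ
Effect of ISO correlation:
- When ρ = -1 (perfect negative correlation), good-k and best-k perform identically
- When ρ = -0.5, good-k already approximates best-k well
Scale effects:
- A larger k/n ratio makes good-k a better approximation of best-k
- The influence of the ISO weakens as k/n increases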