Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
Xu, Liu, Mattei et al.
We propose a multi-agent multi-armed bandit (MA-MAB) framework aimed at ensuring fair outcomes across agents while maximizing overall system performance. A key challenge in this setting is decision-making under limited information about arm rewards. To address this, we introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness. Extensive experiments on synthetic and real-world datasets show that our approach outperforms baseline methods, achieving better fairness and efficiency.
academic
Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
This paper proposes a multi-agent multi-armed bandit (MA-MAB) framework designed to ensure fairness among agents while maximizing overall system performance. To address decision-making challenges with limited arm reward information, a novel probing mechanism is introduced that strategically collects information about selected arms before allocation. In the offline setting (known reward distributions), a greedy probing algorithm with constant-factor approximation guarantees is designed by leveraging submodularity. In the online setting, an algorithm is developed that achieves sublinear regret while maintaining fairness. Extensive experiments on synthetic and real-world datasets demonstrate that the approach outperforms baseline methods in both fairness and efficiency.
Traditional multi-armed bandit problems typically optimize utilitarian welfare (i.e., sum of total rewards), but this leads to severe fairness issues. For example, in ride-sharing scenarios, when a central dispatch system allocates drivers to different geographic regions, optimizing total revenue may result in some drivers receiving no orders at all, causing "starvation" phenomena.
This paper bridges the gap between fair multi-agent approaches and active exploration strategies by introducing a probing mechanism and Nash Social Welfare (NSW) objectives, enabling information acquired through active exploration to directly translate into fair outcomes for all agents.
Novel Framework: Extends the MA-MAB framework by introducing a probing mechanism to test selected arms before allocation, ensuring fairness through NSW optimization
Offline Algorithm: Develops a simple yet effective greedy probing strategy with provable performance guarantees for known reward patterns
Online Algorithm: Proposes an algorithm balancing exploration and fairness, proving that probing and fair allocation do not compromise asymptotic performance (sublinear regret)
Experimental Validation: Demonstrates superiority over baselines in both fairness and efficiency on synthetic and real data
Input: Distributions {F_{m,a}}, cost function α(·), budget I, parameter ζ≥1
Output: Probing set S_pr
1. Initialize S_0 = ∅
2. For i = 1 to I:
- Select arm with maximum marginal gain:
a = argmax_{a∉S_{i-1}} [f_upper(S_{i-1}∪{a}) - f_upper(S_{i-1})]
- S_i = S_{i-1} ∪ {a}
3. Sort candidate sets by (1-α(|S_j|))f_upper(S_j) in descending order
4. Iterate through sorted sets:
- If (1-α(|S_j|))f_upper(S_j) < h(∅), return ∅
- If (1-α(|S_j|))f_upper(S_j) > ζR(S_j), continue
- Otherwise return S_j
Theorem 1 (Approximation Guarantee): For any ζ≥1, the returned S_pr satisfies
R(Spr)≥2e−1e−1⋅ζ1⋅R(S∗)
This is derived from the (1-1/e) approximation factor for submodular maximization.
Construct empirical CDF F̂_{j,a,t} and mean estimate μ̂_{j,a,t}
Main Loop Phase (t > MA):
Probing Set Selection: Call Algorithm 1 based on current estimates F̂_{j,a,t} to select S_t
Probing Update: Sample arms in S_t, update statistics and confidence upper bounds
Uj,a,t=min(μ^j,a,t+wj,a,t,1)
where w_{j,a,t} is confidence width based on Freedman's inequality
Lemma 7 (Concentration Bound): With probability at least 1-δ/2, for all t>A, a∈A, j∈M:
∣μj,a−μ^j,a,t∣≤Nj,a,t2(μ^j,a,t−μ^j,a,t2)ln(δ2MAT)+3Nj,a,tln(δ2MAT)=wj,a,t
Probing and Fairness Integration: First to combine active probing mechanisms with NSW fairness objectives, unlike prior work optimizing only total rewards
Submodular Approximation Technique: Transforms non-submodular problems into submodular optimization via piecewise linear upper bounds, maintaining tractability
UCB-NSW Fusion: Online algorithm cleverly combines UCB confidence bounds with NSW optimization, balancing exploration-exploitation and fairness
Hierarchical Regret Analysis: Partitions rounds into "large γ" and "small γ" categories, handling high and low uncertainty cases separately
Jones, Nguyen & Nguyen (2023): "An efficient algorithm for fair multi-agent multi-armed bandit with low regret" - Foundational work directly extended by this paper
Cole & Gkatzelis (2015): "Approximating the Nash social welfare with indivisible items" - Theoretical foundation for NSW optimization
Zuo, Zhang & Joe-Wong (2020): "Observe Before Play: Multi-Armed Bandit with Pre-Observations" - Pioneer work on probing mechanisms
Bhaskara et al. (2020): "Adaptive probing policies for shortest path routing" - Submodular probing application
Caragiannis et al. (2019): "The unreasonable fairness of maximum Nash welfare" - Theoretical fairness of NSW
This paper makes important contributions to the multi-agent multi-armed bandit field by systematically combining active probing mechanisms with fairness objectives for the first time. Theoretically rigorous (constant-factor approximation and sublinear regret), experimentally comprehensive (synthetic and real data), and methodologically innovative (submodular approximation and UCB-NSW fusion). Primary limitations lie in computational complexity and experimental scale, but these do not diminish its academic value and practical potential. This work opens new research directions in fair machine learning and online decision-making, warranting further exploration and application.