2025-11-25T01:19:18.327955

Distributed Thompson sampling under constrained communication

Zerefa, Ren, Ma et al.

In Bayesian optimization, a black-box function is maximized via the use of a surrogate model. We apply distributed Thompson sampling, using a Gaussian process as a surrogate model, to approach the multi-agent Bayesian optimization problem. In our distributed Thompson sampling implementation, each agent receives sampled points from neighbors, where the communication network is encoded in a graph; each agent utilizes their own Gaussian process to model the objective function. We demonstrate theoretical bounds on Bayesian average regret and Bayesian simple regret, where the bound depends on the structure of the communication graph. Unlike in batch Bayesian optimization, this bound is applicable in cases where the communication graph amongst agents is constrained. When compared to sequential single-agent Thompson sampling, our bound guarantees faster convergence with respect to time as long as the communication graph is connected. We confirm the efficacy of our algorithm with numerical simulations on traditional optimization test functions, demonstrating the significance of graph connectivity on improving regret convergence.

academic

Distributed Thompson sampling under constrained communication

Basic Information

Paper ID: 2410.15543
Title: Distributed Thompson sampling under constrained communication
Authors: Saba Zerefa, Zhaolin Ren, Haitong Ma, Na Li (Harvard School of Engineering and Applied Sciences)
Classification: cs.LG cs.SY eess.SY math.OC stat.ML
Publication Date: January 1, 2025 (arXiv v3)
Paper Link: https://arxiv.org/abs/2410.15543

Abstract

This paper investigates multi-agent Bayesian optimization under communication constraints. The authors propose a distributed Thompson sampling algorithm using Gaussian processes as surrogate models. In this implementation, each agent receives sampling points from neighbors, with the communication network encoded as a graph structure; each agent utilizes its own Gaussian process to model the objective function. The paper establishes theoretical bounds on Bayesian average regret and Bayesian simple regret, which depend on the structure of the communication graph. Unlike batch Bayesian optimization, these bounds apply to scenarios with restricted communication graphs between agents. Compared to sequential single-agent Thompson sampling, the algorithm guarantees faster convergence rates whenever the communication graph is connected.

Research Background and Motivation

Problem Definition

The core problem addressed in this paper is black-box function optimization in multi-agent systems with communication constraints. Specifically:

Black-box Stochastic Optimization Challenge: Finding the maximum of a function when the objective function is unknown and can only be accessed through noisy evaluations
Multi-agent Collaboration Requirements: Multiple agents can sample the objective function in parallel, but their mutual communication may be restricted
Communication Constraint Realism: In practical applications (e.g., multi-robot source seeking, sensor networks), agents may not have access to information from all other agents

Research Significance

This problem has broad applications in several important domains:

Hyperparameter tuning in machine learning
Simulation-based optimization
Experimental design
Multi-robot systems
Sensor network optimization

Limitations of Existing Methods

Centralized Approaches Inapplicable: Require a central coordinator to manage all agent data, impractical in distributed scenarios
Batch Bayesian Optimization Assumptions Too Strong: Assumes all agents have access to identical information, inconsistent with communication-constrained reality
Existing Theoretical Guarantees Demanding: Prior literature providing theoretical guarantees for distributed Bayesian optimization requires fully connected communication graphs

Research Motivation

The authors' starting point is to develop a distributed Bayesian optimization algorithm that works under arbitrary communication graph structures and provides corresponding theoretical guarantees.

Core Contributions

Proposed Distributed Thompson Sampling Algorithm: Designed a novel algorithm for multi-agent Bayesian optimization under communication constraints
Established Theoretical Bounds:
- Bayesian average regret bound: $\tilde{O}\left(\sqrt{\frac{\theta(G)}{\sqrt{Mt}}}\right)$
- Bayesian simple regret bound: $\tilde{O}\left(\sqrt{\frac{1}{t|V_{max}|}}\right)$
Graph Structure Dependency Analysis: Bounds depend on the clique cover number $\theta(G)$ and maximum clique size $|V_{max}|$ of the communication graph
Convergence Guarantees: Proved faster convergence compared to sequential single-agent methods when the communication graph is connected
Numerical Verification: Validated algorithm effectiveness on standard optimization test functions

Methodology Details

Task Definition

For a compact set $X \subset \mathbb{R}^d$ , consider an unknown continuous function $f: X \rightarrow \mathbb{R}$ , with the goal of finding its maximum. There are $M$ agents, each capable of querying $f$ and receiving noisy observations $y = f(x) + \epsilon$ , where $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$ .

The communication network is described by a graph $G = (V,E)$ , where $|V| = M$ , and an edge $(i,j) \in E$ indicates that agents $i$ and $j$ can communicate. The data accessible to agent $i$ at time $t$ is $D_{t,i} = \{(x_{\tau,j}, y_{\tau,j})\}_{j \in \mathcal{N}(i) \cup \{i\}, \tau < t}$ .

Model Architecture

Gaussian Process Modeling

Each agent $i$ uses an independent Gaussian process $GP_{t,i}$ to model the objective function: $f | \mathcal{F}_{D_{t,i}} \sim GP_{t,i}(\mu_{D_{t,i}}(x), k_{D_{t,i}}(x,x'))$

where:

$\mu_{D_t}(x) = k_t(x)^T(K_{D_t} + \sigma_n^2 I)^{-1}y_{D_t}$
$k_{D_t}(x,x') = k(x,x') - k_{D_t}(x)^T(K_{D_t} + \sigma_n^2 I)^{-1}k_{D_t}(x')$

Distributed Thompson Sampling Algorithm

Algorithm 1: Distributed Thompson Sampling

1. Set GP prior on f
2. Initialization: For i=1,...,M, set initial data D_{1,i} and GP_{0,i}
3. For t=1,...,T:
   For i=1,...,M:
   a) Update posterior GP_{t,i} based on D_{t,i}
   b) Sample function realization from GP_{t,i}: f̂_{t,i} ~ GP_{t,i}
   c) Select query point: x_{t,i} = argmax_x f̂_{t,i}(x)
   d) Observe y_{t,i}
   e) Broadcast (x_{t,i}, y_{t,i}) to neighbors
   f) Collect evaluations C_{t,i} from neighbors
   g) Update data history: D_{t+1,i} = D_{t,i} ∪ C_{t,i} ∪ {(x_{t,i}, y_{t,i})}

Technical Innovations

Coordinator-Free Design: Each agent independently maintains its own GP model, avoiding bottlenecks of centralized approaches
Communication Graph Structure Exploitation: Theoretical analysis cleverly decomposes the communication graph into disjoint cliques and analyzes the performance of each clique separately
Information-Theoretic Analysis Framework: Utilizes information-theoretic concepts such as maximum information gain (MIG) to bound algorithm performance

Experimental Setup

Test Functions

Two standard optimization test functions are used:

Rosenbrock Function: $f(x,y) = (1-x)^2 + 100(y-x^2)^2$ $f (x, y) = (1 - x)^{2} + 100 (y - x^{2})^{2}$
- Characteristic: Contains a large valley with the global minimum located within it
Ackley Function: $f(x,y) = -20\exp(-0.2\sqrt{\frac{x^2+y^2}{2}}) - \exp(\frac{1}{2}(\cos(2\pi x) + \cos(2\pi y))) + 20 + e$ $f (x, y) = - 20 exp (- 0.2 \frac{x ^{2} + y ^{2}}{2}) - exp (\frac{1}{2} (cos (2 π x) + cos (2 π y))) + 20 + e$
- Characteristic: Has many local maxima and one global minimum

Communication Networks

Erdős-Rényi random graphs are used with 20 agents and connection probabilities of 0.2, 0.4, and 0.6 respectively.

Evaluation Metrics

Instantaneous Average Regret: $R^A(t) = \frac{1}{M}\sum_{i=1}^M (f^* - f(x_{t,i}))$
Instantaneous Simple Regret: $R^S(t) = f^* - \max_{i,\tau} f(x_{t,i})$
Cumulative Regret: Time accumulation of the above metrics

Implementation Details

BOTorch package for implementation
Gaussian processes use Matérn kernel ( $\nu = 5/2$ )
50 time steps per run
Argmax computed via grid search

Experimental Results

Main Results

Experimental results strongly support theoretical predictions:

Connectivity Impact: On both Rosenbrock and Ackley functions, graphs with higher connection probability (0.6 > 0.4 > 0.2) achieve better regret convergence performance
Consistent Performance: This trend is verified across both instantaneous simple regret and average regret metrics
Algorithm Effectiveness: Distributed Thompson sampling successfully locates extrema of both test functions

Theoretical Verification

Numerical results validate core predictions of theoretical analysis:

Higher connectivity communication graphs yield better performance
Graph structure significantly impacts algorithm convergence speed

Theoretical Analysis

Main Theorems

Theorem 3.1 (Bayesian Average Regret Bound): Let $\{G_k\}_{k \in \{1,...,n\}}$ be a set of $n$ disjoint cliques of communication graph $G$ . Then the Bayesian average regret after $t$ steps satisfies: $R_{AB}(t) \leq \frac{1}{M}\sum_{k=1}^n |V_k|\left(\frac{C_1}{t|V_k|} + \sqrt{\frac{C_2\xi_{|V_k|}\beta_t\Psi_{t|V_k|}}{t|V_k|}}\right)$

Corollary 3.2: Choosing $n$ as the clique cover number $\theta(G)$ of graph $G$ , we obtain: $R_{AB}(t) = \tilde{O}\left(\sqrt{\frac{\theta(G)}{\sqrt{Mt}}}\right)$

Theorem 3.3 (Bayesian Simple Regret Bound): Let $G_s = (V_s, E_s)$ be a clique of $G$ . Then: $R_{SB}(t) \leq \frac{C_1}{t|V_s|} + \sqrt{\frac{C_2\xi_{|V_s|}\beta_t\Psi_{t|V_s|}}{t|V_s|}}$

Corollary 3.4: Choosing $G_{max}$ as the maximum clique, we obtain: $R_{SB}(t) = \tilde{O}\left(\sqrt{\frac{1}{t|V_{max}|}}\right)$

Convergence Analysis

Compared to sequential single-agent Thompson sampling with $\tilde{O}(\sqrt{1/t})$ regret:

Average regret improvement factor: $\sqrt{\theta(G)/M}$
Simple regret improvement factor: $\sqrt{1/|V_{max}|}$

Bayesian Optimization Field

Single-Agent Methods: GP-UCB, Expected Improvement, Thompson Sampling
Batch Methods: Parallel knowledge gradient, batch Thompson sampling
Multi-Agent Methods: Primarily focused on centralized or batch methods under complete connectivity assumptions

Positioning of This Work

This paper is the first to provide theoretical guarantees for distributed Bayesian optimization under communication constraints (non-fully connected graphs), filling an important gap in the field.

Conclusions and Discussion

Main Conclusions

Algorithm Effectiveness: The proposed distributed Thompson sampling algorithm effectively solves multi-agent Bayesian optimization under communication constraints
Theoretical Guarantees: Established regret bounds dependent on communication graph structure, proving convergence advantages under connected graphs
Graph Structure Importance: Communication graph connectivity significantly impacts algorithm performance

Limitations

Synchronization Assumption: The algorithm assumes a global synchronous clock, which may be unrealistic in practical applications
Computational Complexity: Efficiency issues of argmax computation in high-dimensional spaces remain incompletely resolved
Kernel Function Selection: Theoretical analysis depends on specific kernel function assumptions

Future Directions

Asynchronous Versions: Develop algorithm variants not requiring global synchronization
Efficient Optimization: Research efficient computation methods for argmax in high-dimensional Thompson sampling
Tighter Bounds: Seek tighter regret bounds
Practical Applications: Validate the algorithm in real multi-robot or sensor network systems

In-Depth Evaluation

Strengths

Significant Theoretical Contribution: First to provide theoretical guarantees for distributed Bayesian optimization under communication constraints
Practical Problem Setting: Addresses the important real-world issue of communication constraints
Rigorous Analysis: Clear theoretical proof structure with deep analysis using information-theoretic tools
Sufficient Experimental Support: Numerical experiments well validate theoretical predictions

Weaknesses

Limited Experimental Scale: Validation only on 2D test functions and relatively small-scale networks
Insufficient Practical Considerations: Synchronization assumptions and argmax computation efficiency issues limit practical applicability
Lack of Comparative Experiments: Missing direct comparisons with other distributed optimization methods

Impact

High Theoretical Value: Makes important contributions to distributed Bayesian optimization theory
Broad Application Prospects: Potential applications in multi-robot systems, IoT, and other domains
Strong Extensibility: Provides solid theoretical foundation for subsequent research

Applicable Scenarios

Multi-robot collaborative optimization tasks
Distributed sensor network parameter tuning
Collaborative learning in edge computing environments
Parallel optimization problems with limited communication bandwidth

Overall Assessment: This is a high-quality paper with significant theoretical contributions to the field of distributed Bayesian optimization. The authors cleverly combine graph theory, information theory, and Bayesian optimization to provide theoretical guarantees for communication-constrained scenarios commonly encountered in practice. While there is room for improvement in practical applicability, its theoretical value and guidance for future research are substantial.