NUBO: A Transparent Python Package for Bayesian Optimization

Diessner, Wilson, Whalley

NUBO, short for Newcastle University Bayesian Optimization, is a Bayesian optimization framework for optimizing expensive-to-evaluate black-box functions, such as physical experiments and computer simulators. Bayesian optimization is a cost-efficient optimization strategy that uses surrogate modeling via Gaussian processes to represent an objective function and acquisition functions to guide the selection of candidate points to approximate the global optimum of the objective function. NUBO focuses on transparency and user experience to make Bayesian optimization accessible to researchers from all disciplines. Clean and understandable code, precise references, and thorough documentation ensure transparency, while a modular and flexible design, easy-to-write syntax, and careful selection of Bayesian optimization algorithms ensure a good user experience. NUBO allows users to tailor Bayesian optimization to their problem by writing a custom optimization loop using the provided building blocks. It supports sequential single-point, parallel multi-point, and asynchronous optimization of bounded, constrained, and mixed (discrete and continuous) parameter input spaces. Only algorithms and methods extensively tested and validated to perform well are included in NUBO. This ensures that the package remains compact and does not overwhelm the user with an unnecessarily large number of options. The package is written in Python but does not require expert knowledge of Python to optimize simulators and experiments. NUBO is distributed as open-source software under the BSD 3-Clause license.

Basic Information

Paper ID: 2305.06709
Title: NUBO: A Transparent Python Package for Bayesian Optimization
Authors: Mike Diessner, Kevin Wilson, Richard D. Whalley (Newcastle University)
Categories: cs.LG (Machine Learning), cs.MS (Mathematical Software), stat.ML (Statistics - Machine Learning)
Publication Date: arXiv v2, June 3, 2024
Paper Link: https://arxiv.org/abs/2305.06709
Open Source: www.nubopy.com
License: BSD 3-Clause

Abstract

NUBO (Newcastle University Bayesian Optimization) is a Bayesian optimization framework designed for optimizing expensive black-box functions such as physical experiments and computer simulations. The framework employs Gaussian processes for surrogate modeling and uses acquisition functions to guide candidate point selection, approximating the global optimum with few function evaluations. NUBO pursues transparency through clear code, precise citations, and comprehensive documentation, and a good user experience through modular design, intuitive syntax, and carefully selected algorithms. The framework supports sequential single-point, parallel multi-point, and asynchronous optimization over bounded, constrained, and mixed (discrete-continuous) parameter spaces. It includes only thoroughly tested and validated algorithms, which keeps the package compact and avoids choice overload.

Research Background and Motivation

1. Core Problem to Address

Many scientific and engineering fields need to optimize expensive black-box functions, which share the following properties:

The function has no known or analytically tractable mathematical expression
Each function evaluation is costly (material costs, computational costs, time costs)
Derivative information is unavailable
Large numbers of function evaluations are infeasible

Typical application scenarios include:

Parameter optimization in computational fluid dynamics
Molecular design and drug discovery in chemical engineering
Hyperparameter tuning of machine learning models
Neural architecture search

2. Problem Importance

Traditional optimization algorithms (such as Adam, L-BFGS-B, differential evolution) rely on at least one of the following:

Derivative information (typically unavailable)
Large numbers of function evaluations (infeasible for expensive functions)

Bayesian optimization provides a sample-efficient alternative, but existing implementations have limitations.

3. Limitations of Existing Methods

The paper's comparison (Table 1) identifies the following issues with existing Python packages:

Package	Lines of Code	Parallel Optimization	Asynchronous Optimization	Main Issues
BoTorch	38,419	✓	✓	Codebase too large (29× NUBO), difficult to understand
bayes_opt	1,241	✗	✗	No parallel/asynchronous support
SMAC3	11,217	✗	✗	Limited functionality
pyGPGO	2,029	✗	✗	Limited functionality
GPyOpt	4,605	✓	✗	Maintenance discontinued
Spearmint	3,662	✗	✗	Non-modular design, poor flexibility

Key Issues:

Complexity vs. Transparency: BoTorch is powerful but complex (160 files), difficult for non-experts to understand
Functional Limitations: Most packages lack parallel/asynchronous optimization support
Choice Overload: Offering numerous options makes decision-making difficult for non-experts

4. Research Motivation

To provide interdisciplinary researchers (non-statisticians/computer scientists) with:

Transparency: Concise code (only 1,322 lines, 20 files)
Ease of Use: Modular design, intuitive syntax
Efficiency: Support for parallel/asynchronous/constrained/mixed optimization
Reliability: Only includes validated algorithms

Core Contributions

Lightweight Implementation: Complete Bayesian optimization framework in 1,322 lines of code, only 3.4% of BoTorch's size while providing comparable functionality
Comprehensive Optimization Strategy Support:
- Sequential single-point optimization
- Parallel multi-point optimization
- Asynchronous optimization
- Constrained optimization
- Mixed discrete-continuous parameter optimization
Transparency Design Philosophy:
- Clear code structure
- Precise academic citations
- Comprehensive documentation (paper + website)
User-Friendly Modular Architecture:
- Flexible building block design
- Intuitive Python syntax
- Carefully selected efficient algorithms
Performance Validation: Benchmark testing shows performance comparable to or better than mainstream packages (BoTorch, SMAC3, etc.) on the two test functions, suggesting that simplicity does not have to compromise performance
Open Source Ecosystem: Built on the PyTorch ecosystem (PyTorch, GPyTorch), ensuring good extensibility and GPU acceleration support

Methodology Details

Task Definition

Bayesian optimization aims to solve a d-dimensional maximization problem:

$x^* = \arg\max_{x \in X} f(x)$

where:

Input Space $X = [a,b]^d$ : typically a bounded continuous hyperrectangular space
Objective Function $f(x)$ : expensive, derivative-free black-box function
Observations $y_i = f(x_i) + \epsilon$ : noisy with $\epsilon \sim \mathcal{N}(0, \sigma^2)$
Training Data $D_n = \{(x_i, y_i)\}_{i=1}^n$

Extended Tasks (supported by NUBO):

Constrained Optimization: $\text{subject to } g_i(x) = 0, \quad h_j(x) \geq 0$
Mixed Parameters: Some dimensions are discrete values

Model Architecture

Overall Algorithm Flow (Algorithm 1)

Input: Evaluation budget N, number of initial points n₀, surrogate model M, acquisition function α
1. Sample n₀ initial points X₀ via space-filling design, obtain observations y₀
2. Set training data D_n = {X₀, y₀} and iteration counter n = 1
3. While n ≤ N - n₀:
   a. Train surrogate model M (Gaussian Process) with D_n
   b. Maximize acquisition function α to find candidate point x*_n
   c. Evaluate x*_n to obtain y*_n, add to D_n
   d. n = n + 1
4. Return point x* corresponding to highest observed value
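
The budget accounting in Algorithm 1 can be sketched in plain Python. This is a minimal illustration, not NUBO's code: `bayes_opt_loop`, `random_proposal`, and `toy_objective` are hypothetical names, the initial design uses plain uniform sampling instead of a space-filling design, and the `propose` callable stands in for steps 3a and 3b (fitting the surrogate and maximizing the acquisition function).

```python
import random


def bayes_opt_loop(objective, bounds, budget, n_init, propose, seed=0):
    """Skeleton of Algorithm 1; `propose` stands in for steps 3a-3b."""
    rng = random.Random(seed)
    # Steps 1-2: initial design (uniform sampling here, an assumption
    # made for brevity; the algorithm calls for a space-filling design).
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    y = [objective(x) for x in X]
    # Step 3: one new evaluation per iteration until the budget is spent.
    for _ in range(budget - n_init):
        x_new = propose(X, y, bounds, rng)  # fit surrogate, maximize acquisition
        X.append(x_new)
        y.append(objective(x_new))
    # Step 4: return the input with the highest observed output.
    best = max(range(len(y)), key=y.__getitem__)
    return X[best], y[best], X, y


def random_proposal(X, y, bounds, rng):
    """Placeholder proposal: ignores the data and samples uniformly."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]


def toy_objective(x):
    """Hypothetical cheap objective with its maximum of 0 at (0.3, 0.7)."""
    return -(x[0] - 0.3) ** 2 - (x[1] - 0.7) ** 2


x_best, y_best, X, y = bayes_opt_loop(
    toy_objective, [(0.0, 1.0), (0.0, 1.0)],
    budget=30, n_init=10, propose=random_proposal,
)
```

With `budget=30` and `n_init=10`, the loop body runs 20 times, so the objective is evaluated exactly 30 times in total. Replacing `random_proposal` with a function that fits a Gaussian process and maximizes an acquisition function turns the skeleton into Bayesian optimization.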

Core Components

1. Surrogate Model: Gaussian Process (GP)

Prior Distribution: $f(X_n) \sim \mathcal{N}(m(X_n), K(X_n, X_n))$

NUBO's Configuration Choices:

Mean Function: Constant mean $\mu_{\text{constant}}(x) = c$
Covariance Kernel: Matérn 5/2 ARD kernel $\Sigma_{\text{Matérn}}(x, x') = \sigma_f^2 \left(1 + \frac{\sqrt{5}r}{l} + \frac{5r^2}{3l^2}\right) \exp\left(-\frac{\sqrt{5}r}{l}\right)$ where $r = |x - x'|$

Automatic Relevance Determination (ARD):

Each input dimension has independent length scale $l_d$
Large length scale → dimension unimportant
Small length scale → dimension important
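
The effect of per-dimension length scales can be illustrated with a small standard-library sketch of the Matérn 5/2 kernel. The function name `matern52_ard` and the example length scales are illustrative assumptions. With ARD, the distance $r/l$ in the formula above becomes the Euclidean distance after dividing each coordinate difference by its own $l_d$.

```python
import math


def matern52_ard(x1, x2, lengthscales, outputscale=1.0):
    """Matern 5/2 kernel with one length scale per input dimension."""
    # Scaled distance: each coordinate difference is divided by its own l_d.
    r = math.sqrt(sum(((a - b) / l) ** 2
                      for a, b, l in zip(x1, x2, lengthscales)))
    s = math.sqrt(5.0) * r
    return outputscale * (1.0 + s + s * s / 3.0) * math.exp(-s)


lengthscales = [0.2, 5.0]  # dimension 0 important, dimension 1 unimportant
origin = [0.0, 0.0]
# Moving 0.5 along the short-length-scale dimension decorrelates the outputs...
k_important = matern52_ard(origin, [0.5, 0.0], lengthscales)
# ...while the same move along the long-length-scale dimension barely matters.
k_unimportant = matern52_ard(origin, [0.0, 0.5], lengthscales)
```

Here `k_unimportant` stays close to the output scale while `k_important` drops towards zero, which is the sense in which a large length scale marks a dimension as unimportant.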

Posterior Distribution: $f(X^*) | D_n, X^* \sim \mathcal{N}(\mu_n(X^*), \sigma_n^2(X^*))$

$\mu_n(X^*) = K(X^*, X_n)[K(X_n, X_n) + \sigma_y^2 I]^{-1}(y - m(X_n)) + m(X^*)$

$\sigma_n^2(X^*) = K(X^*, X^*) - K(X^*, X_n)[K(X_n, X_n) + \sigma_y^2 I]^{-1}K(X_n, X^*)$
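
The two posterior equations can be checked numerically with a small sketch for one-dimensional inputs. It uses only the standard library and solves the linear systems through a hand-written Cholesky factorization; the helper names, the fixed length scale of 0.3, and the toy data are assumptions for illustration and are unrelated to NUBO's GPyTorch-based implementation.

```python
import math


def matern52(a, b, lengthscale=0.3, outputscale=1.0):
    s = math.sqrt(5.0) * abs(a - b) / lengthscale
    return outputscale * (1.0 + s + s * s / 3.0) * math.exp(-s)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(x_star, X, y, noise=1e-6, mean_const=0.0):
    """Posterior mean and variance at one test point x_star."""
    n = len(X)
    # K(X_n, X_n) + sigma_y^2 I
    K = [[matern52(X[i], X[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    k_star = [matern52(x_star, xi) for xi in X]        # K(X*, X_n)
    alpha = chol_solve(L, [yi - mean_const for yi in y])
    v = chol_solve(L, k_star)
    mu = mean_const + sum(k * a for k, a in zip(k_star, alpha))
    var = matern52(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mu, max(var, 0.0)


X_train, y_train = [0.1, 0.5, 0.9], [0.2, 1.0, 0.4]
mu_at_data, var_at_data = gp_posterior(0.5, X_train, y_train)
mu_far, var_far = gp_posterior(10.0, X_train, y_train)
```

At a training input the posterior mean reproduces the observation and the variance collapses to roughly the noise level; far from all data the prediction reverts to the prior mean and prior variance.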

Hyperparameter Estimation: Via maximizing log marginal likelihood (MLE): $\log P(y_n | X_n) = -\frac{1}{2}(y_n - m(X_n))^\top[K + \sigma_y^2 I]^{-1}(y_n - m(X_n)) - \frac{1}{2}\log|K + \sigma_y^2 I| - \frac{n}{2}\log 2\pi$

2. Acquisition Functions

Analytical Acquisition Functions (for sequential single-point optimization)

Expected Improvement (EI): $\alpha_{\text{EI}}(X^*) = (\mu_n(X^*) - y_{\text{best}})\Phi(z) + \sigma_n(X^*)\phi(z)$ where $z = \frac{\mu_n(X^*) - y_{\text{best}}}{\sigma_n(X^*)}$

Upper Confidence Bound (UCB): $\alpha_{\text{UCB}}(X^*) = \mu_n(X^*) + \sqrt{\beta}\sigma_n(X^*)$

Optimizer: L-BFGS-B (bounded) or SLSQP (constrained)
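
Both analytical acquisition functions depend only on the posterior mean and standard deviation at a candidate point, so they can be written in a few lines. The sketch below uses the standard library only; the function names are illustrative and the zero-variance guard in `expected_improvement` is an added assumption to avoid division by zero.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, y_best):
    """Analytical EI for maximization."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(mu - y_best, 0.0)
    z = (mu - y_best) / sigma
    return (mu - y_best) * normal_cdf(z) + sigma * normal_pdf(z)


def upper_confidence_bound(mu, sigma, beta=4.0):
    """Analytical UCB: mean plus sqrt(beta) standard deviations."""
    return mu + math.sqrt(beta) * sigma


# A candidate whose mean equals the incumbent still has positive EI,
# purely because of its predictive uncertainty.
ei_value = expected_improvement(mu=1.0, sigma=0.5, y_best=1.0)
ucb_value = upper_confidence_bound(mu=1.0, sigma=0.5, beta=4.0)
```

In both functions a larger $\sigma_n$ raises the score, which is how they reward exploration; $\beta$ controls that trade-off explicitly for UCB.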

Monte Carlo Acquisition Functions (for parallel/asynchronous optimization)

Approximated via reparameterization trick: $\alpha_{\text{EI}}^{\text{MC}}(X^*) = \max(\text{ReLU}(\mu_n(X^*) + Lz - y_{\text{best}}))$

$\alpha_{\text{UCB}}^{\text{MC}}(X^*) = \max\left(\mu_n(X^*) + \sqrt{\frac{\beta\pi}{2}}|Lz|\right)$

where:

$L$ : lower triangular matrix from Cholesky decomposition $LL^\top = K$
$z \sim \mathcal{N}(0, I)$ : standard normal samples
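
A minimal sketch of the Monte Carlo estimators follows, assuming the usual construction in which the inner maximum is taken over the points of the batch and the result is averaged over the base samples $z$ (the averaging step is not written out in the formulas above). The function name, the sample count, and the example values are illustrative. For a single point, $E|z| = \sqrt{2/\pi}$, so the factor $\sqrt{\beta\pi/2}$ makes the Monte Carlo UCB agree in expectation with the analytical $\mu_n + \sqrt{\beta}\sigma_n$.

```python
import math
import random


def mc_acquisitions(mu, L, y_best, beta=4.0, num_samples=20000, seed=0):
    """Monte Carlo EI and UCB for a batch of q points.

    mu: posterior means (length q).
    L:  lower-triangular Cholesky factor of the q x q posterior covariance.
    """
    rng = random.Random(seed)
    q = len(mu)
    scale = math.sqrt(beta * math.pi / 2.0)
    ei_total = 0.0
    ucb_total = 0.0
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Reparameterization: mu + L z is a draw from the joint posterior.
        Lz = [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(q)]
        ei_total += max(max(m + d - y_best, 0.0) for m, d in zip(mu, Lz))
        ucb_total += max(m + scale * abs(d) for m, d in zip(mu, Lz))
    return ei_total / num_samples, ucb_total / num_samples


# Single point with mean 1.0 and standard deviation 0.5: the estimates
# should be close to the analytical EI and UCB for the same inputs.
ei_single, ucb_single = mc_acquisitions([1.0], [[0.5]], y_best=1.0)

# Two correlated points evaluated jointly as one batch.
ei_batch, ucb_batch = mc_acquisitions(
    [1.0, 0.8], [[0.5, 0.0], [0.1, 0.3]], y_best=1.0
)
```

Keeping `seed` fixed corresponds to optimizing with fixed base samples, which makes the estimate a deterministic function of the candidate points; drawing fresh samples on every call gives the stochastic variant.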

Batch Optimization Strategies:

Joint Optimization: Optimize all batch points simultaneously
Greedy Sequential: Optimize points one-by-one with previous points fixed (empirically better)

Optimizer: Adam (stochastic) or L-BFGS-B/SLSQP (fixed base samples)

Technical Innovation Points

1. Balancing Transparency and Simplicity

Code Comparison: NUBO (1,322 lines) vs BoTorch (38,419 lines)
File Comparison: 20 vs 160 files
Design Philosophy: Avoid over-abstraction, maintain traceable functions and objects

2. Modular Design

Users can build custom optimization loops in 4 steps:

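# Imports as used in the NUBO documentation
import torch
from gpytorch.likelihoods import GaussianLikelihood
from nubo.acquisition import UpperConfidenceBound
from nubo.models import GaussianProcess, fit_gp
from nubo.optimisation import single

# 1. Define input space (row 0: lower bounds, row 1: upper bounds)
bounds = torch.tensor([[0., 0., ...], [1., 1., ...]])

# 2. Train Gaussian Process on the observed data x_train, y_train
likelihood = GaussianLikelihood()
gp = GaussianProcess(x_train, y_train, likelihood)
fit_gp(x_train, y_train, gp, likelihood)

# 3. Define acquisition function
acq = UpperConfidenceBound(gp=gp, beta=4)

# 4. Optimize acquisition function to obtain the next candidate point
x_new, _ = single(func=acq, method="L-BFGS-B", bounds=bounds)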

3. Practical Solution for Mixed Optimization

Strategy: Enumerate all discrete combinations, optimize continuous parameters for each
Implementation: Specify discrete dimensions and possible values via dictionary
Limitation: Computationally expensive when there are many discrete dimensions or many values per dimension, because the number of combinations grows multiplicatively (the paper acknowledges this)
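
The enumeration strategy can be sketched as follows. This is an illustration under stated assumptions rather than NUBO's implementation: `optimize_mixed` and `toy_acquisition` are hypothetical names, the dictionary layout mirrors the description above only loosely, and random search stands in for the gradient-based optimizer used on the continuous dimensions.

```python
import itertools
import random


def optimize_mixed(acq, cont_bounds, discrete_values, num_candidates=200, seed=0):
    """Enumerate every discrete combination and optimize the rest.

    cont_bounds:     {dimension index: (low, high)} for continuous inputs
    discrete_values: {dimension index: [allowed values]} for discrete inputs
    """
    rng = random.Random(seed)
    disc_dims = sorted(discrete_values)
    dim = len(cont_bounds) + len(disc_dims)
    best_x, best_val, n_combos = None, float("-inf"), 0
    for combo in itertools.product(*(discrete_values[i] for i in disc_dims)):
        n_combos += 1
        fixed = dict(zip(disc_dims, combo))
        # Random search over the continuous dimensions (a stand-in for
        # an optimizer such as L-BFGS-B) with the discrete values fixed.
        for _ in range(num_candidates):
            x = [fixed[i] if i in fixed else rng.uniform(*cont_bounds[i])
                 for i in range(dim)]
            val = acq(x)
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val, n_combos


def toy_acquisition(x):
    """Hypothetical acquisition surface peaking at (0.4, 0.6)."""
    return -(x[0] - 0.4) ** 2 - (x[1] - 0.6) ** 2


best_x, best_val, n_combos = optimize_mixed(
    toy_acquisition,
    cont_bounds={1: (0.0, 1.0)},
    discrete_values={0: [i / 10 for i in range(11)]},
)
```

One discrete dimension with 11 values requires 11 inner optimizations; a second such dimension would require 121, which illustrates why the approach becomes expensive as discrete dimensions are added.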

4. Asynchronous Optimization Support

Scenario: Continue optimization when evaluation time is uncertain
Implementation: Pass pending evaluation points as fixed points in x_pending
Advantage: Fully utilize computational resources

5. Decision Flow Chart (Figure 3)

Provides clear algorithm selection guidance:

Asynchronous? → Parallel? → Constrained?
Each branch recommends specific acquisition function and optimizer combinations

Experimental Setup

Datasets

Two standard benchmark functions (from the Virtual Library of Simulation Experiments [24]):

2D Levy Function:
- Dimension: 2
- Characteristics: Multimodal, multiple local optima
- Global optimum: 0.00
6D Hartmann Function:
- Dimension: 6
- Characteristics: Multiple local minima, one global minimum
- Global optimum: -3.32237 (3.32237 after negation)
- Input space: $[0, 1]^6$

Both functions are negated to convert to maximization problems.
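
The negation step is shown below for the Levy function, written from its standard definition using only the standard library. The names `levy`, `negated`, and `neg_levy` are illustrative; NUBO ships its own test functions, which are not used here.

```python
import math


def levy(x):
    """Standard Levy function; global minimum 0 at x = (1, ..., 1)."""
    w = [1.0 + (xi - 1.0) / 4.0 for xi in x]
    head = math.sin(math.pi * w[0]) ** 2
    body = sum((wi - 1.0) ** 2 * (1.0 + 10.0 * math.sin(math.pi * wi + 1.0) ** 2)
               for wi in w[:-1])
    tail = (w[-1] - 1.0) ** 2 * (1.0 + math.sin(2.0 * math.pi * w[-1]) ** 2)
    return head + body + tail


def negated(f):
    """Turn a minimization objective into a maximization objective."""
    return lambda x: -f(x)


neg_levy = negated(levy)
value_at_optimum = neg_levy([1.0, 1.0])   # global maximum of the negated function
value_elsewhere = neg_levy([-5.0, 3.0])   # any other point scores lower
```

After negation, the best achievable value is 0 for Levy and 3.32237 for the 6D Hartmann function, which is why the results tables report values at or below those numbers.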

Evaluation Metrics

Best Observed Value: Best output observed up to the current iteration (mean ± standard error)
Convergence Speed: Number of evaluations needed to reach global optimum
Time per Iteration: Algorithm computational overhead

Comparison Methods

Five Python packages are compared (four established packages plus NUBO):

BoTorch (v0.8.4): Most comprehensive functionality
bayes_opt (v1.4.3): Lightweight
SMAC3 (v2.0.0): Medium complexity
pyGPGO (v0.5.0): Lightweight
NUBO (v1.0.3): This work

Unified Configuration:

Surrogate model: Gaussian Process
Acquisition function: Upper Confidence Bound (UCB)
Runs: 10 repeated experiments
Hardware: Apple Mac mini (M2, 16GB)

Implementation Details

Sequential Optimization

Initial points: Generated via Latin hypercube sampling
Levy: 30 evaluations
Hartmann: 60 evaluations
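
Latin hypercube sampling for the initial design can be sketched with the standard library. The function name `latin_hypercube`, the point count, and the seed are illustrative assumptions; the number of initial points used in the benchmarks is not stated above.

```python
import random


def latin_hypercube(n_points, bounds, seed=0):
    """Draw n_points so that each dimension has one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        # One uniform draw inside each of n_points equal-width strata...
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        # ...then shuffle so strata are paired randomly across dimensions.
        rng.shuffle(strata)
        columns.append([lo + u * (hi - lo) for u in strata])
    return [list(point) for point in zip(*columns)]


initial_points = latin_hypercube(10, [(0.0, 1.0), (0.0, 1.0)])
```

Unlike plain uniform sampling, every tenth of each axis contains exactly one of the ten points, so the initial design covers the range of every input.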

Parallel Optimization

Batch size: 4
Levy: 30 evaluations (7.5 batches)
Hartmann: 100 evaluations (25 batches)

Experimental Results

Main Results

Table 2: Final Performance Comparison

Package	2D Levy (Sequential)	6D Hartmann (Sequential)	2D Levy (Parallel)	6D Hartmann (Parallel)
NUBO	-0.04 (±0.06)	3.28 (±0.06)	-0.04 (±0.04)	3.27 (±0.06)
BoTorch	-0.21 (±0.20)	3.27 (±0.07)	-0.27 (±0.21)	3.26 (±0.06)
SMAC3	-0.71 (±0.58)	2.70 (±0.38)	-	-
bayes_opt	-0.64 (±0.74)	3.20 (±0.13)	-	-
pyGPGO	-0.28 (±0.31)	2.64 (±1.05)	-	-

Key Findings:

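NUBO closest to true optima in all four settings (Levy: 0.00, Hartmann: 3.32)
Lowest or tied-lowest spread: ±0.06 or less in every column (tied with BoTorch on parallel Hartmann)
Best among lightweight packages: Outperforms bayes_opt and pyGPGO
Competitive with complex packages: Comparable to BoTorch on Hartmann, ahead of BoTorch on Levy and ahead of SMAC3 on both functions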

Table 3: Computational Efficiency Comparison

Package	2D Levy (Sequential)	6D Hartmann (Sequential)	2D Levy (Parallel)	6D Hartmann (Parallel)
NUBO	0.60s	1.88s	0.07s	2.20s
BoTorch	0.09s	0.22s	0.00s	0.19s
SMAC3	0.08s	0.25s	-	-
bayes_opt	0.14s	0.24s	-	-
pyGPGO	0.23s	0.65s	-	-

Analysis:

NUBO has higher per-iteration time (up to 2.20s)
But negligible for expensive black-box functions:
- Physical experiments: may require hours/days
- Complex simulations: may require minutes/hours
- Additional 2 seconds negligible relative to evaluation cost
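
The size of the overhead can be made concrete with the Table 3 figures. The short calculation below computes NUBO's slowdown relative to BoTorch for the three columns where a ratio is defined (BoTorch's parallel Levy entry is reported as 0.00s), and the share of a one-hour evaluation taken up by NUBO's slowest iteration; the one-hour evaluation time is an assumed example.

```python
# Per-iteration times in seconds, copied from Table 3.
nubo = {"levy_seq": 0.60, "hartmann_seq": 1.88, "hartmann_par": 2.20}
botorch = {"levy_seq": 0.09, "hartmann_seq": 0.22, "hartmann_par": 0.19}

# Slowdown factor of NUBO relative to BoTorch per setting.
ratios = {name: nubo[name] / botorch[name] for name in nubo}

# Overhead relative to an assumed one-hour experiment or simulation.
evaluation_seconds = 3600.0
overhead_share = max(nubo.values()) / evaluation_seconds
```

The slowdown ranges from roughly 7× to 12×, yet the slowest iteration adds well under 0.1% to a one-hour evaluation.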

Convergence Curve Analysis (Figure 1)

A) 2D Levy Sequential Optimization:

NUBO converges rapidly to global optimum
bayes_opt and pyGPGO show larger fluctuations
SMAC3 performs poorly

B) 6D Hartmann Sequential Optimization:

All methods converge
NUBO and BoTorch closest to true optimum
SMAC3 and pyGPGO have large variance

C) 2D Levy Parallel Optimization:

NUBO and BoTorch perform similarly
Parallel strategy effective

D) 6D Hartmann Parallel Optimization:

NUBO slightly slower than BoTorch to reach high values
But final value better and more stable

Case Study Results (Section 4)

Task: 6D Hartmann function with first dimension as discrete parameter (11 values)

Setup:

Initial points: 30 (5×dimension)
Optimization iterations: 10
Batch size: 4
Acquisition function: MC-UCB (β=4, 128 samples)

Results:

Found optimal solution at 53rd evaluation
Input: (0.4, 0.9136, 1.0, 0.5669, 0.0, 0.0802)
Output: 3.2133 (true optimum 3.32237)
Relative error: about 3.3%

Comparison with Random and LHS Sampling (Figure 5):

NUBO significantly outperforms random and Latin hypercube sampling
Closest to true optimum after 70 evaluations

Experimental Findings

Simplicity without sacrificing performance: 1,322 lines of code achieves comparable performance to 38,419-line BoTorch
Stability advantage: Smallest standard error, suitable for practical applications
Effective parallel strategy: Greedy sequential strategy performs well on real problems
Mixed optimization feasible: Successfully handles discrete-continuous mixed parameter spaces
Acceptable computational overhead: Additional seconds negligible for expensive black-box functions

Python Implementation Comparison

Feature	NUBO	BoTorch	GPyOpt	Others
Modularity	✓	✓	✓	Partial
Parallel Optimization	✓	✓	✓	✗
Asynchronous Optimization	✓	✓	✗	✗
Code Complexity	Low	High	Medium	Low-Medium
Maintenance Status	Active	Active	Archived	Active

Other Language Implementations

R Language:
- rBayesianOptimization: Basic functionality
- ParBayesianOptimization: Parallel support

Main Research Directions

Hyperparameter Optimization: ML model tuning (Spearmint, SMAC3)
Neural Architecture Search: BANANAS, etc.
Scientific Applications: Fluid dynamics, chemical engineering, materials design

NUBO's Positioning

Target Users: Interdisciplinary researchers (non-ML experts)
Design Philosophy: Transparency > Feature richness
Application Scenarios: Physical experiments and simulation optimization

Conclusions and Discussion

Main Conclusions

NUBO successfully balances transparency and performance:
- Complete functionality in minimal code (1,322 lines)
- Performance comparable to or superior to complex packages (BoTorch)
Comprehensive Feature Support:
- Sequential/parallel/asynchronous optimization
- Constrained and mixed parameter spaces
- Easily customizable modular design
Suitable for Interdisciplinary Applications:
- Clear documentation and code
- Intuitive API design
- Carefully selected reliable algorithms
Good Open Source Ecosystem:
- Built on PyTorch ecosystem
- BSD 3-Clause license
- Active maintenance

Limitations

Limitations Honestly Acknowledged by Paper:

Computational Efficiency:
- Roughly 7× to 12× slower per iteration than BoTorch (Table 3)
- But negligible for expensive black-box functions
Mixed Optimization Scalability:
- Enumeration strategy infeasible with many discrete dimensions/values
- No more efficient alternative provided
Feature Coverage:
- No multi-fidelity optimization support
- No multi-objective optimization support
- No high-dimensional specialized methods
Limited Algorithm Selection:
- Only two acquisition functions (EI, UCB)
- Missing other popular methods (e.g., knowledge gradient, entropy search)

Potential Issues:

Limited Benchmark Testing:
- Only 2 synthetic functions tested
- Lack of real-world application comparisons
- No high-dimensional testing (>10D)
Hyperparameter Sensitivity:
- Lack of automated guidance for β parameter selection
- Insufficient analysis of Monte Carlo sample number effects
Insufficient Scalability Verification:
- No large-scale parallel testing (batch > 4)
- GPU acceleration capabilities not demonstrated

Future Directions

Expansion plans explicitly stated in paper:

Multi-Fidelity Optimization: Leverage simulations of different fidelities to accelerate optimization
Multi-Objective Optimization: Simultaneously optimize multiple conflicting objectives
High-Dimensional Optimization: Develop specialized methods for high-dimensional spaces (e.g., random embeddings)

In-Depth Evaluation

Strengths

1. Methodological Innovation (Moderate)

No algorithmic innovation: No new Bayesian optimization algorithms proposed
Engineering innovation: Excellent balance between simplicity and functionality
Design innovation: Modular architecture reduces usage barriers

2. Experimental Sufficiency (Good)

✓ Strengths:

Side-by-side comparison with four established packages
10 repeated runs per configuration, giving an estimate of run-to-run variability
Both sequential and parallel scenarios
Detailed case studies

✗ Weaknesses:

Only 2 benchmark functions with low dimensions
Lack of real application comparisons
No extreme scenario testing (high-dimensional, large batch)

3. Result Convincingness (Strong)

Quantitative Evidence: Achieves optimal or near-optimal in all tests
Stability: Smallest standard error
Code Comparison: Objective quantification of simplicity advantage
Honest Reporting: Acknowledges computational efficiency disadvantage

4. Writing Clarity (Excellent)

Clear structure: background → methods → experiments → case studies
Detailed formulas: Complete mathematical derivations
Rich code examples: Code snippets for each feature
Effective visualizations: Flow charts, convergence curves, comparison plots

5. Reproducibility (Excellent)

Open source code and documentation
Detailed experimental setup
Provided reproduction materials
Clear version information

Weaknesses

1. Method Limitations

Inefficient enumeration strategy: Mixed optimization infeasible with many discrete dimensions
Conservative algorithm selection: Only EI and UCB, missing modern methods (qKG, MES)
Lack of adaptive strategies: Manual tuning required for hyperparameters like β

2. Experimental Design Flaws

Thin benchmark testing:
- Only 2 synthetic functions
- Maximum dimension only 6D
- No noise robustness testing
Incomplete comparisons:
- No R package comparison
- No GPU acceleration testing
- No memory consumption evaluation
Limited case studies:
- Still synthetic functions
- No real scientific applications demonstrated

3. Insufficient Theoretical Analysis

No convergence guarantee analysis
No sample complexity analysis
No theoretical discussion of greedy strategy properties

4. Performance Issues

Computational efficiency: roughly 7× to 12× slower per iteration than BoTorch (Table 3)
Unknown scalability:
- Large batch performance?
- High-dimensional performance?
- Numerical stability in long runs?

Impact

1. Contribution to Field (Moderate)

Engineering contribution significant: Lowers barriers for interdisciplinary users
Algorithm contribution limited: No new methods proposed
Educational value high: Clear implementation serves as learning material

2. Practical Value (High)

Applicable Scenarios:

✓ Physical experiment optimization (expensive evaluation)
✓ Engineering simulation (medium-scale parameters)
✓ Teaching and prototyping
✓ Research requiring algorithm understanding

Inapplicable Scenarios:

✗ Large-scale hyperparameter search (efficiency critical)
✗ High-dimensional optimization (>20D)
✗ Competitive research requiring state-of-the-art algorithms

3. Reproducibility (Excellent)

Complete open source code
Comprehensive documentation
Simple pip installation
Active maintenance

4. Potential User Base

Primary users: Experimental scientists, engineers
Secondary users: ML researchers (prototyping)
Educational users: Students and instructors

Applicable Scenarios

Recommended NUBO Usage:

Extremely Expensive Evaluation:
- Physical experiments (hours/days level)
- High-precision simulation (minutes/hours level)
- Algorithm overhead of about 2 seconds per iteration is negligible
Need Algorithm Understanding:
- Research projects requiring algorithm modification
- Teaching and learning purposes
- Need for debugging and result explanation
Medium-Scale Problems:
- Parameter dimension ≤ 10
- Parallel batch ≤ 10
- Discrete parameters: at most 3 dimensions
Specific Feature Requirements:
- Constrained optimization
- Mixed parameter spaces
- Asynchronous evaluation

Recommended Alternative Tools:

Use BoTorch:
- Need state-of-the-art algorithms
- High-dimensional problems (>20D)
- Large-scale parallelization
- GPU acceleration critical
Use SMAC3:
- Hyperparameter optimization
- Need mature industrial-grade tools
Use bayes_opt:
- Simple sequential optimization
- Minimal dependencies required

Selected References

Bayesian Optimization Foundations

[1] Frazier (2018): A tutorial on Bayesian optimization
[9] Jones et al. (1998): Efficient global optimization - Original EI paper
[10] Snoek et al. (2012): Practical Bayesian optimization - Modern BO foundational work
[11] Shahriari et al. (2015): Taking the human out of the loop - Survey paper

Gaussian Processes

[28] Gramacy (2020): Surrogates - Practical GP textbook
[30] Rasmussen & Williams (2006): Gaussian Processes for Machine Learning - Classic textbook

Acquisition Functions

[27] Wilson et al. (2018): Maximizing acquisition functions - Batch optimization strategies
[32] Srinivas et al. (2010): GP optimization in the bandit setting - UCB theoretical foundations

Software Packages

[22] BoTorch (Balandat et al., 2020): Main competitor
[21] SMAC3 (Lindauer et al., 2022): Hyperparameter optimization
[35] GPyTorch (Gardner et al., 2018): NUBO's GP backend

Overall Assessment

Dimension	Score	Explanation
Innovation	3/5	Strong engineering innovation, weak algorithmic innovation
Technical Quality	4/5	Reliable implementation, efficiency needs improvement
Experimental Sufficiency	3.5/5	Comprehensive comparison, limited benchmarks
Writing Quality	5/5	Clear, detailed, reproducible
Practical Value	4/5	Highly useful in specific scenarios
Impact Potential	3.5/5	Fills niche market, not groundbreaking

Overall Evaluation: This is an excellent tools paper that achieves its core goal: providing transparent, easy-to-use Bayesian optimization for interdisciplinary researchers. While algorithmic innovation is limited, it makes significant contributions in engineering design and user experience. It is particularly suitable for scientific and engineering applications that require an understanding of the algorithm and involve expensive black-box functions. The high standards in code quality and documentation merit emulation by other open source projects.