2025-11-16T22:04:13.069952

An Introduction to Zero-Order Optimization Techniques for Robotics

Jordana, Zhang, Amigo et al.

Zero-order optimization techniques are becoming increasingly popular in robotics due to their ability to handle non-differentiable functions and escape local minima. These advantages make them particularly useful for trajectory optimization and policy optimization. In this work, we propose a mathematical tutorial on random search. It offers a simple and unifying perspective for understanding a wide range of algorithms commonly used in robotics. Leveraging this viewpoint, we classify many trajectory optimization methods under a common framework and derive novel competitive RL algorithms.

academic

An Introduction to Zero-Order Optimization Techniques for Robotics

Basic Information

Paper ID: 2506.22087
Title: An Introduction to Zero-Order Optimization Techniques for Robotics
Authors: Armand Jordana, Jianghan Zhang, Joseph Amigo, Ludovic Righetti (New York University)
Classification: cs.RO (Robotics)
Publication Date: arXiv preprint, latest version October 10, 2025
Paper Link: https://arxiv.org/abs/2506.22087

Abstract

Zero-order optimization techniques are gaining increasing popularity in robotics because they can handle non-differentiable functions and escape local minima. These advantages make them particularly useful in trajectory optimization and policy optimization. This paper presents a mathematical tutorial on stochastic search, providing a simple unified perspective for understanding widely-used algorithms in robotics. Leveraging this perspective, the authors classify many trajectory optimization methods under a unified framework and derive novel and competitive reinforcement learning algorithms.

Research Background and Motivation

Core Problem

The core problem addressed in this paper is how to achieve a unified understanding of zero-order optimization algorithms widely used in robotics, including various methods in trajectory optimization (TO) and reinforcement learning (RL).

Problem Significance

Practical Necessity: Robot systems frequently encounter non-differentiable objective functions, particularly in contact-rich problems (e.g., locomotion, manipulation)
Computational Advances: Development of parallel computing and GPU hardware has made sampling-intensive zero-order methods feasible on complex robotic systems
Theoretical Fragmentation: While existing algorithms have strong theoretical foundations, they lack unified understanding in the robotics community

Limitations of Existing Approaches

Algorithm Isolation: Algorithms such as MPPI, CMA-ES, and REINFORCE appear unrelated, lacking a unified framework
Scattered Theory: These algorithms are distributed across optimization, statistics, machine learning, and control domains
Limited Application Guidance: Lack of guidance for designing new algorithms from a unified perspective

Research Motivation

By establishing a unified perspective through stochastic search and Gaussian smoothing, connecting zero-order methods in both trajectory optimization and policy optimization, the work aims to deepen theoretical understanding while guiding new algorithm design.

Core Contributions

Unified Theoretical Framework: Provides a unified perspective for understanding zero-order algorithms in TO and RL based on stochastic search
Algorithm Reinterpretation: Unifies classical algorithms (MPPI, CMA, REINFORCE) under a Gaussian smoothing framework
Novel Algorithm Derivation: Derives new competitive RL algorithms (e.g., RS-DDPG, LSE-DDPG) based on the unified framework
Theoretical Insights: Explains the theoretical mechanism by which stochastic algorithms escape local minima
Experimental Validation: Verifies the framework's effectiveness and new algorithms' competitiveness on multiple robotic tasks

Methodology Details

Problem Formulation

This paper focuses on solving the following general optimization problem: $\min_{x \in \mathbb{R}^n} f(x)$

This formulation encompasses a broad range of problems in robotics:

Trajectory Optimization: Optimization in trajectory space (finite-dimensional)
Policy Optimization: Optimization in policy parameter space (infinite-dimensional functions)

Core Theoretical Framework

1. Stochastic Search Fundamentals

Pure Random Search (Algorithm 1):

Input: x₀ ∈ Rⁿ
while stopping condition not met:
    Randomly sample x̃ in Rⁿ
    if f(x̃) < f(x):
        x ← x̃
Output: x

Greedy Local Search (Algorithm 2):

Input: x₀ ∈ Rⁿ, Σ
while stopping condition not met:
    Sample d ~ N(0,Σ)
    if f(x+d) < f(x):
        x ← x+d

2. Gaussian Smoothing Gradient Approximation

Core Idea: Rather than directly approximating the gradient of the original function f, study a smoothed surrogate function: $f_μ(x) = \mathbb{E}[f(x + μϵ)]$ where $ϵ \sim \mathcal{N}(0,Σ)$

Key Derivation: The gradient of the surrogate function can be estimated through function evaluations: $\nabla f_μ(x) = \mathbb{E}\left[\frac{f(x+μϵ) - f(x)}{μ}Σ^{-1}ϵ\right]$

This provides a gradient estimate: $g = \frac{f(x+μϵ) - f(x)}{μ}Σ^{-1}ϵ$

3. Log-Sum-Exp Transformation

Theoretical Foundation of MPPI: Consider the continuous log-sum-exp transformation function: $f_{μ,λ}(x) = -λ \log\left(\mathbb{E}\left[\exp\left(-\frac{1}{λ}f(x+μϵ)\right)\right]\right)$

Its gradient is: $\nabla f_{μ,λ}(x) = \frac{-λ\mathbb{E}[\exp(-\frac{1}{λ}f(x+μϵ))Σ^{-1}ϵ]}{μ\mathbb{E}[\exp(-\frac{1}{λ}f(x+μϵ))]}$

This directly corresponds to the MPPI update rule: $x \leftarrow \sum_{k=1}^K w_k x_k$ where the weights are: $w_k = \frac{\exp(-\frac{1}{λ}(f(x_k) - ρ))}{\sum_j \exp(-\frac{1}{λ}(f(x_j) - ρ))}$

Technical Innovations

1. Establishment of Unified Perspective

Unifies seemingly different algorithms (MPPI, CMA, REINFORCE) under a Gaussian smoothing framework
Reveals log-sum-exp transformation as a generalization of Gaussian smoothing

2. Natural Gradient Interpretation

Proves that MPPI executes natural gradient steps: $x \leftarrow x - αF^{-1}g$ where F is the Fisher information matrix, equal to the inverse of the covariance matrix for Gaussian distributions

3. CMA Derivation

Rederives CMA from the perspective of optimizing Gaussian distribution parameters: $\min_{θ=(x,Σ)} \mathbb{E}_{z\sim\mathcal{N}(x,Σ)}[f(z)]$

Using natural gradient yields the update rules:

Σ ← (1-α∑wₖ)Σ + α∑wₖ(xₖ-x)(xₖ-x)ᵀ
x ← (1-α∑wₖ)x + α∑wₖxₖ

4. Theoretical Explanation of Global Convergence

Explains how stochasticity helps escape local minima through Langevin dynamics: $x_{k+1} = x_k - α_k g_k + γ_k ϵ_k$

Experimental Setup

Trajectory Optimization Experiments

Benchmarks: Four benchmark problems based on Hydra

Cartpole: Classic inverted pendulum control
DoubleCartPole: Double inverted pendulum system
PushT: Pushing task
Humanoid: Humanoid robot control

Comparison Algorithms:

Predictive Sampling
Randomized Smoothing
MPPI
MPPI-CMA (proposed)

Experimental Configuration:

2048 samples per iteration
MPPI temperature parameter λ = 0.1
Average over 6 random seeds
Control bounds enforced through penalty terms in cost function

Reinforcement Learning Experiments

Environments: 7 MuJoCo continuous control environments

Comparison Algorithms:

DDPG vs RS-DDPG vs LSE-DDPG
TD3 vs RS-TD3 vs LSE-TD3

Experimental Configuration:

Implementation based on CleanRL
10 samples per update
Sampling noise standard deviation 0.1
Average over 5 runs

Evaluation Metrics

TO: Cost reduction curves during optimization
RL: Normalized scores and episode rewards

Experimental Results

Trajectory Optimization Results

MPPI-CMA Performs Best: Consistently outperforms MPPI on all test problems
Predictive Sampling Unexpectedly Effective: Despite simplicity, shows good performance
Randomized Smoothing Sensitive: Highly sensitive to step size selection with high variance
Value of Covariance Adaptation: Demonstrates importance of adaptive covariance matrices

Reinforcement Learning Results

Significant DDPG Improvements: RS-DDPG and LSE-DDPG significantly outperform original DDPG
Limited TD3 Improvements: TD3 is already a strong algorithm with limited improvement space
Universal Smoothing Benefits: Demonstrates universal value of Q-function gradient smoothing

Key Findings

Log-Sum-Exp Advantages: Better handles multimodal functions compared to standard Gaussian smoothing
Temperature Parameter Importance: Appropriate temperature parameter λ is critical for performance
Parallelization Friendly: All methods parallelize well

Trajectory Optimization Domain

Classical Methods: Gradient descent, Newton's method and other deterministic methods prone to local minima
Sampling Methods: Predictive Sampling, MPPI and other zero-order methods
Theoretical Connections: 13 first demonstrates similarity between MPPI and CMA-ES, 14 understands MPPI as approximate gradient method

Reinforcement Learning Domain

Parameter Space Search: 16,17 explore stochastic search in policy parameter space
Policy Gradient Connections: 18,19 establish connections between policy gradients and stochastic search
Evolution Strategies: 20,21 provide comprehensive surveys connecting RL and ES techniques

Positioning of This Work

This paper provides the first broad perspective connecting gradient-free methods in both TO and RL, filling the gap of a unified theoretical framework.

Conclusions and Discussion

Main Conclusions

Unified Framework Effective: Stochastic search perspective successfully unifies multiple zero-order algorithms in TO and RL
Theory Guides Practice: Unified understanding enables design of new competitive algorithms
Value of Stochasticity: Theoretically explains the mechanism by which stochastic algorithms escape local minima
Practical Validation: Verifies framework and new algorithms' effectiveness on multiple robotic tasks

Limitations

Asymptotic Convergence: Global convergence guarantees are only asymptotic, with limited practical significance
Curse of Dimensionality: Sampling methods still suffer from the curse of dimensionality
Hyperparameter Sensitivity: Temperature parameter, step size, etc. require careful tuning
Constraint Handling: Current framework primarily addresses unconstrained optimization

Future Directions

Constrained Optimization: Extend to constrained zero-order optimization
Global Solution Search: Develop more effective global solution search methods
Adaptive Parameters: Automatically tune temperature, step size, and other hyperparameters
Theory Refinement: Provide stronger theoretical guarantees for stochastic smoothing

In-Depth Evaluation

Strengths

Outstanding Theoretical Contribution: Provides the first unified theoretical framework for zero-order optimization in robotics
Mathematical Rigor: Derivations are rigorous and theoretical analysis is thorough
Practical Guidance Value: Theoretical insights directly guide new algorithm design
Comprehensive Experiments: Covers multiple benchmarks in both TO and RL domains
Writing Clarity: Complex theory is expressed clearly and is easy to understand

Weaknesses

Limited Novelty: Primarily reinterprets existing algorithms; original algorithm contributions are relatively limited
Experimental Scale: RL experiments only tested on MuJoCo environments, lacking more complex robotic tasks
Theory Gap: Global convergence theory for stochastic smoothing is not as complete as SPSA
Practical Limitations: Some theoretical results (e.g., asymptotic convergence) have limited practical value

Impact

Academic Value: Provides important theoretical unification for the robotics optimization domain
Educational Significance: As a tutorial paper, has good educational value for students and researchers
Method Inspiration: Unified framework may inspire more new algorithm designs
Cross-Domain Connection: Promotes communication and collaboration between TO and RL communities

Applicable Scenarios

Non-smooth Optimization: Robotic control problems involving contact and collision
High-Dimensional Optimization: Neural network policy parameter optimization
Parallel Computing: Scenarios with abundant parallel computing resources
Exploratory Research: Complex optimization problems requiring escape from local minima

References

The paper cites 51 related references, primarily including:

Optimization Theory: 1 Conn et al. on derivative-free optimization, 12 Nesterov on stochastic smoothing
Robotics Applications: 2,3 recent sampling MPC applications, 4,5 RL successes in robotics
Classical Algorithms: 8 CMA-ES, 10 MPPI, 11 REINFORCE
Theoretical Foundations: 22 Spall's SPSA, 27 MCMC methods

Through a unified perspective of stochastic search, this paper successfully connects seemingly different optimization methods in robotics, providing not only important theoretical insights but also guiding new algorithm design. While somewhat limited in algorithmic originality, its theoretical unification value and educational significance make it an important contribution to the field.