An Introduction to Zero-Order Optimization Techniques for Robotics
Jordana, Zhang, Amigo et al.
Zero-order optimization techniques are becoming increasingly popular in robotics due to their ability to handle non-differentiable functions and escape local minima. These advantages make them particularly useful for trajectory optimization and policy optimization. In this work, we propose a mathematical tutorial on random search. It offers a simple and unifying perspective for understanding a wide range of algorithms commonly used in robotics. Leveraging this viewpoint, we classify many trajectory optimization methods under a common framework and derive novel competitive RL algorithms.
Zero-order optimization techniques are increasingly popular in robotics because they can handle non-differentiable functions and escape local minima. These advantages make them particularly useful in trajectory optimization and policy optimization. This paper presents a mathematical tutorial on random search, providing a simple, unified perspective for understanding algorithms widely used in robotics. Leveraging this perspective, the authors classify many trajectory optimization methods under a common framework and derive novel, competitive reinforcement learning algorithms.
The core problem addressed in this paper is how to achieve a unified understanding of zero-order optimization algorithms widely used in robotics, including various methods in trajectory optimization (TO) and reinforcement learning (RL).
Practical Necessity: Robotic systems frequently encounter non-differentiable objective functions, particularly in contact-rich problems (e.g., locomotion, manipulation).
Computational Advances: Advances in parallel computing and GPU hardware have made sampling-intensive zero-order methods feasible on complex robotic systems.
Theoretical Fragmentation: While existing algorithms have strong theoretical foundations, the robotics community lacks a unified understanding of them.
By establishing a unified perspective based on random search and Gaussian smoothing that connects zero-order methods in trajectory optimization and policy optimization, the work aims to deepen theoretical understanding and guide the design of new algorithms.
Core Idea: Rather than directly approximating the gradient of the original function f, study a smoothed surrogate function:
$$f_\mu(x) = \mathbb{E}\left[f(x + \mu\epsilon)\right]$$
where $\epsilon \sim \mathcal{N}(0, \Sigma)$.
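To make the surrogate concrete, the following minimal sketch approximates the expectation above by Monte Carlo sampling. The function name `smoothed_value`, the sample count, and the test function are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def smoothed_value(f, x, mu, cov, num_samples=10_000, seed=0):
    """Monte Carlo estimate of f_mu(x) = E[f(x + mu * eps)], eps ~ N(0, cov).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Draw num_samples perturbations, one per row.
    eps = rng.multivariate_normal(np.zeros(x.size), cov, size=num_samples)
    # Average the function values at the perturbed points.
    return float(np.mean([f(x + mu * e) for e in eps]))

# A non-differentiable objective: the L1 norm has a kink at the origin.
f = lambda z: np.abs(z).sum()
x = np.zeros(2)

# The surrogate is strictly positive at the kink even though f(0) = 0,
# which illustrates how smoothing rounds off the non-smooth point.
print(smoothed_value(f, x, mu=0.1, cov=np.eye(2)))
```

For this particular choice the expectation has a closed form, 2μ√(2/π) ≈ 0.16 at μ = 0.1, so the printed value should land close to that up to sampling noise.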
Key Derivation: The gradient of the surrogate function can be estimated through function evaluations:
$$\nabla f_\mu(x) = \mathbb{E}\left[\frac{f(x + \mu\epsilon) - f(x)}{\mu}\,\Sigma^{-1}\epsilon\right]$$
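One way to see where this identity comes from is sketched below. It assumes that f is integrable against the Gaussian density and that differentiation and integration can be exchanged. The density notation p(y; x) is introduced here for illustration only.

$$
\begin{aligned}
f_\mu(x) &= \int f(y)\, p(y; x)\, dy, \qquad p(y; x) = \mathcal{N}(y;\, x,\, \mu^2\Sigma), \\
\nabla_x\, p(y; x) &= p(y; x)\, (\mu^2\Sigma)^{-1}(y - x), \\
\nabla f_\mu(x) &= \int f(y)\, (\mu^2\Sigma)^{-1}(y - x)\, p(y; x)\, dy
 = \frac{1}{\mu}\, \mathbb{E}\left[f(x + \mu\epsilon)\, \Sigma^{-1}\epsilon\right], \\
\nabla f_\mu(x) &= \mathbb{E}\left[\frac{f(x + \mu\epsilon) - f(x)}{\mu}\, \Sigma^{-1}\epsilon\right]
 \quad \text{since } \mathbb{E}\left[\Sigma^{-1}\epsilon\right] = 0 .
\end{aligned}
$$

The third line uses the substitution y = x + μϵ. Subtracting the baseline f(x) in the last line leaves the expectation unchanged, and such baselines are typically used to reduce the variance of the sampled estimate.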
This provides a gradient estimate:
$$g = \frac{f(x + \mu\epsilon) - f(x)}{\mu}\,\Sigma^{-1}\epsilon$$
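The estimator above can be sketched in a few lines of NumPy by averaging g over several sampled perturbations. The function name `zero_order_gradient`, the batch size, and the quadratic test objective are assumptions made for illustration. The paper's own implementation may differ.

```python
import numpy as np

def zero_order_gradient(f, x, mu, cov, num_samples=1000, rng=None):
    """Average of g = (f(x + mu*eps) - f(x)) / mu * cov^{-1} eps over samples.

    Hypothetical helper for illustration; uses only function evaluations.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.multivariate_normal(np.zeros(x.size), cov, size=num_samples)
    # Solve cov @ v = eps_i for every sample instead of forming cov^{-1}.
    cov_inv_eps = np.linalg.solve(cov, eps.T).T
    fx = f(x)
    # Finite-difference-like scalar weight for each perturbation.
    diffs = np.array([f(x + mu * e) - fx for e in eps]) / mu
    return (diffs[:, None] * cov_inv_eps).mean(axis=0)

# Sanity check on a smooth quadratic, where the smoothed gradient equals x.
f = lambda z: 0.5 * float(z @ z)
x = np.array([1.0, -2.0])
g_hat = zero_order_gradient(f, x, mu=0.01, cov=np.eye(2),
                            num_samples=20_000, rng=np.random.default_rng(0))
print(g_hat)  # should be close to x, up to sampling noise
```

A quadratic is used only because its smoothed gradient is known exactly. The same call works unchanged on non-differentiable objectives, since f is only ever evaluated, never differentiated.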
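The paper shows that Model Predictive Path Integral control (MPPI) performs natural gradient steps:
$$x \leftarrow x - \alpha F^{-1} g$$
where $\alpha$ is the step size and $F$ is the Fisher information matrix, which for a Gaussian distribution (taken with respect to its mean) equals the inverse of the covariance matrix.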
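The preconditioned update can be sketched as follows. This is not MPPI itself and does not reproduce the paper's proof. It only illustrates the step x ← x − αF⁻¹g under the assumption F = Σ⁻¹, with any scalar factor such as μ² absorbed into the step size α. Under that assumption Σ cancels Σ⁻¹, so the update reduces to a weighted average of the sampled perturbations. The function name and all parameter values are hypothetical.

```python
import numpy as np

def natural_gradient_step(f, x, mu, cov, alpha, num_samples, rng):
    """One step x <- x - alpha * F^{-1} g, assuming F = cov^{-1}.

    Hypothetical sketch: with F^{-1} = cov, the product F^{-1} g becomes
    (f(x + mu*eps) - f(x)) / mu * eps, so no matrix inverse is needed.
    """
    eps = rng.multivariate_normal(np.zeros(x.size), cov, size=num_samples)
    fx = f(x)
    diffs = np.array([f(x + mu * e) - fx for e in eps]) / mu
    # Weighted average of the perturbations themselves.
    nat_grad = (diffs[:, None] * eps).mean(axis=0)
    return x - alpha * nat_grad

rng = np.random.default_rng(0)
f = lambda z: 0.5 * float(z @ z)   # simple quadratic with minimum at 0
cov = np.diag([1.0, 2.0])          # anisotropic sampling covariance
x = np.array([1.0, -2.0])
for _ in range(50):
    x = natural_gradient_step(f, x, mu=0.1, cov=cov, alpha=0.2,
                              num_samples=256, rng=rng)
print(x)  # should end up near the minimizer at the origin
```

Because the sampling covariance acts as the preconditioner here, directions that are explored with larger variance also receive larger steps, which is one reason the choice of Σ matters in sampling-based controllers.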
This paper provides the first broad perspective connecting gradient-free methods in both TO and RL, filling the gap of a unified theoretical framework.
Through a unified perspective based on random search, this paper connects seemingly different optimization methods in robotics, providing important theoretical insights and guiding the design of new algorithms. While somewhat limited in algorithmic originality, its theoretical unification and educational value make it an important contribution to the field.