On the random generation of Butcher trees

Huang, Privault

The main goal of this paper is to provide an algorithm for the random sampling of Butcher trees and the probabilistic numerical solution of ordinary differential equations (ODEs). This approach complements and simplifies a recent approach to the probabilistic representation of ODE solutions, by removing the need to generate random branching times. The random sampling of trees is compared to the finite order truncation of Butcher series in numerical experiments.

Basic Information

Paper ID: 2404.05969
Title: On the random generation of Butcher trees
Authors: Qiao Huang (Southeast University), Nicolas Privault (Nanyang Technological University)
Classification: math.CA (Classical Analysis and ODEs), math.PR (Probability)
Publication Date: November 11, 2025 (arXiv v2)
Paper Link: https://arxiv.org/abs/2404.05969

Abstract

The primary objective of this paper is to provide an algorithm for random sampling of Butcher trees for probabilistic numerical solution of ordinary differential equations (ODEs). This method complements and simplifies recently proposed probabilistic representations of ODE solutions by eliminating the need to generate random branching times. The paper compares random tree sampling with finite-order truncation methods of Butcher series through numerical experiments.

Research Background and Motivation

Problem Background

Core Problem: Numerical solution of ordinary differential equations is a fundamental problem in scientific computing. Traditional methods use Butcher series (based on rooted tree enumeration and Taylor expansion) to represent ODE solutions, but generating high-order trees is computationally expensive.
Significance:
- Butcher series provide theoretical foundation for Runge-Kutta methods
- Widely applied in geometric numerical integration
- More efficient numerical methods are needed for complex nonlinear ODEs
Limitations of Existing Methods:
- Computational Complexity: Truncating Butcher series requires enumerating all n-th order trees, with computational cost growing exponentially with order
- Fixed Order Limitation: Traditional methods truncate at fixed order, making it difficult to adaptively adjust precision
- Complexity of Prior Probabilistic Methods: The method in reference 21 requires generating random branching time sequences
Research Motivation:
- Use Monte Carlo methods to estimate Butcher series through random tree generation
- Provide an alternative approach where accuracy improves with iteration count
- Inspired by applications of Feynman-Kac formula in nonlinear PDEs
- Simplify prior probabilistic representations by removing random branching time generation

Core Contributions

Direct Random Tree Generation Algorithm: Proposes a method for random generation of Butcher trees based on uniform attachment. It requires no random branching times and is simpler and more direct than the method of reference 21
Probabilistic Representation Theorem: Establishes a probabilistic representation formula for ODE solutions (Theorem 1): $x(t) = \mathbb{E}\left[\frac{(t-t_0)^{|T|}F(T)(x_0)}{(|T| \vee 1)p_{|T|}}\right]$ where T is a randomly generated Butcher tree
Extension to Semilinear ODEs: Extends the method to semilinear ODEs $\dot{x}(t) = Ax(t) + g(x(t))$ , combining Poisson-distributed tree sizes and continuous-time Markov chains (Theorem 2)
Numerical Implementation and Comparison: Provides complete Mathematica code implementation and verifies method effectiveness through numerical experiments, comparing performance of different probability distributions
Theoretical Analysis:
- Proves combinatorial properties of labeled trees (Lemma 3)
- Derives optimal probability distributions (variance minimization)
- Establishes convergence conditions and moment bounds

Detailed Methodology

Task Definition

Input:

Initial value ODE problem: $\dot{x}(t) = f(x(t))$ , $x(t_0) = x_0 \in \mathbb{R}^d$
Target time point $t > t_0$
Smooth function $f: \mathbb{R}^d \to \mathbb{R}^d$

Output:

Approximate value of solution $x(t)$ at time $t$

Constraints:

Bounded derivatives of $f$ : $|\nabla^m f(x_0)| \leq C$ for all $m \geq 0$
Time interval restriction: $t \in [t_0, t_0 + 1/C)$

Fundamental Theory of Butcher Trees

Tree Definition and Representation

A rooted tree $\tau = (V, E, \bullet)$ consists of a vertex set $V$, an edge set $E$, and a root node $\bullet$ . Trees are defined recursively using the $B^+$ operation:

$[\tau_1, \ldots, \tau_m]$ denotes the tree obtained by creating a new root and connecting it to the roots of $\tau_1, \ldots, \tau_m$

Key Functions:

Elementary Differential $F: \mathcal{T} \to C^\infty(\mathbb{R}^d, \mathbb{R}^d)$ :
- $F(\emptyset) = \text{Id}$ , $F(\bullet) = f$
- $F(\tau) = \nabla^m f(F(\tau_1), \ldots, F(\tau_m))$ for $\tau = [\tau_1, \ldots, \tau_m]$
Symmetry $\sigma(\tau)$ :
- $\sigma([\tau_1^{k_1}, \ldots, \tau_n^{k_n}]) = \prod_{i=1}^n k_i! \sigma(\tau_i)^{k_i}$
Tree Factorial $\tau!$ :
- $\tau! = |\tau| \prod_{i=1}^m \tau_i!$ for $\tau = [\tau_1, \ldots, \tau_m]$

Butcher Series Representation

Classical Butcher series expansion of ODE solution: $x(t) = \sum_{\tau \in \mathcal{T}} \frac{(t-t_0)^{|\tau|}}{\tau! \sigma(\tau)} F(\tau)(x_0)$

The coefficient $\alpha(\tau) = \frac{|\tau|!}{\tau! \sigma(\tau)}$ represents the number of labelings of tree $\tau$ .
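
The three quantities above can be computed directly from the recursive definitions. The following is a minimal Python sketch, not the paper's Mathematica code; the nested-tuple encoding of trees and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import factorial

# Assumed encoding: a rooted tree is the tuple of its child subtrees,
# so a single vertex is () and [tau_1, ..., tau_m] is (tau_1, ..., tau_m).

def canon(tree):
    """Sort children recursively so that equal shapes compare equal."""
    return tuple(sorted(canon(child) for child in tree))

def order(tree):
    """Number of vertices |tau|."""
    return 1 + sum(order(child) for child in tree)

def tree_factorial(tree):
    """tau! = |tau| * prod_i tau_i!"""
    result = order(tree)
    for child in tree:
        result *= tree_factorial(child)
    return result

def symmetry(tree):
    """sigma(tau) = prod_i k_i! * sigma(tau_i)^k_i over distinct child shapes."""
    result = 1
    for shape, k in Counter(canon(child) for child in tree).items():
        result *= factorial(k) * symmetry(shape) ** k
    return result

def alpha(tree):
    """alpha(tau) = |tau|! / (tau! * sigma(tau))."""
    return factorial(order(tree)) // (tree_factorial(tree) * symmetry(tree))

if __name__ == "__main__":
    leaf = ()
    examples = {
        "[., .]": (leaf, leaf),
        "[[.]]": ((leaf,),),
        "[[.], .]": ((leaf,), leaf),
    }
    for name, tree in examples.items():
        print(name, order(tree), tree_factorial(tree), symmetry(tree), alpha(tree))
```

For the tree $[[\bullet], \bullet]$ this gives $\tau! = 8$, $\sigma(\tau) = 1$ and $\alpha(\tau) = 3$, that is, three increasing labelings of the same shape.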

Labeled Trees and Combinatorial Structure

Labeled Tree Definition: A labeled tree is a tree $\tau = (V, E, 1)$ whose vertices are labeled by $\{1, \ldots, n\}$ such that every parent label is smaller than the labels of its children. The set of labeled trees of order $n$ is denoted $\mathcal{T}_n^\sharp$ .

Key Lemma (Lemma 3): Any labeled tree $\tau \in \mathcal{T}_n^\sharp$ can be uniquely represented as: $\tau = \bullet *_{l_1} \bullet *_{l_2} \cdots *_{l_{n-1}} \bullet$ where $(l_1, \ldots, l_{n-1}) \in \triangle_{n-1} := \{(l_1, \ldots, l_{n-1}): 1 \leq l_i \leq i\}$

Grafting Product: $\tau_1 *_l \tau_2$ denotes attaching the root of $\tau_2$ to the vertex labeled $l$ in $\tau_1$ .

Corollary 1: The number of n-th order labeled trees is $|\mathcal{T}_n^\sharp| = (n-1)!$
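
Lemma 3 and Corollary 1 can be checked by brute force for small orders. The sketch below (Python, with hypothetical function names) enumerates the sequences in $\triangle_{n-1}$, reads each one as a labeled tree, and sums the one-dimensional elementary differentials. It assumes $d = 1$, where $F(\tau)(x_0)$ reduces to the product over vertices of $f^{(m)}(x_0)$ with $m$ the number of children, and uses $f(x) = x^2$ as an illustrative test function that is not taken from the paper.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def attachment_sequences(n):
    """All (l_1, ..., l_{n-1}) with 1 <= l_i <= i, as in Lemma 3."""
    return product(*(range(1, i + 1) for i in range(1, n)))

def child_counts(seq):
    """Number of children of each vertex 1..n in the tree coded by seq.

    Vertex i + 1 is grafted onto vertex l_i, so labels increase from
    parent to child by construction.
    """
    counts = [0] * (len(seq) + 1)
    for parent in seq:
        counts[parent - 1] += 1
    return counts

def elementary_differential_1d(seq, derivs):
    """F(tau)(x0) for d = 1, where derivs[m] = f^(m)(x0) and
    derivatives beyond the end of the list are taken to be zero."""
    value = Fraction(1)
    for m in child_counts(seq):
        value *= derivs[m] if m < len(derivs) else 0
    return value

def labeled_tree_sum(n, derivs):
    """Sum of F(tau)(x0) over all labeled trees of order n."""
    return sum(elementary_differential_1d(s, derivs)
               for s in attachment_sequences(n))

if __name__ == "__main__":
    x0 = Fraction(1, 2)
    derivs = [x0 ** 2, 2 * x0, Fraction(2)]   # f, f', f'' for f(x) = x^2
    for n in range(1, 7):
        count = sum(1 for _ in attachment_sequences(n))
        print(n, count, labeled_tree_sum(n, derivs) / factorial(n))
```

For $\dot{x} = x^2$ the exact solution is $x_0/(1 - x_0 t) = \sum_{n \geq 0} x_0^{n+1} t^n$, so the last printed column should equal $x_0^{n+1}$, and the count column should equal $(n-1)!$.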

Random Tree Generation Algorithm (Algorithm 6)

Steps:

Choose Tree Size: Sample tree order $n$ from probability distribution $(p_n)_{n \geq 0}$ , i.e., $P(|T| = n) = p_n$
Initialization: Start from root node $\bullet$ (label 1)
Iterative Attachment: For $l = 1, \ldots, n-1$ :
- Uniformly randomly select a vertex from current tree
- Attach new vertex (label $l+1$ ) to selected vertex
- Repeat until reaching order $n$

Conditional Distribution: Given $|T| = n$ , random tree $T$ is uniformly distributed on $\mathcal{T}_n^\sharp$ : $q_n^\sharp(\tau) := P(T = \tau | |T| = n) = \frac{1}{(n-1)!}$

Conditional distribution after forgetting labels: $q_n(\tau) = P(\iota(T) = \tau | |T| = n) = \frac{\alpha(\tau)}{(n-1)!}$
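
The attachment step and the resulting shape distribution can be illustrated as follows. This is a minimal Python sketch with hypothetical names, standing in for the paper's Mathematica implementation; it samples labeled trees of a fixed order by uniform attachment and tabulates the frequencies of the unlabeled shapes, which should approach $\alpha(\tau)/(n-1)!$.

```python
import random
from collections import Counter

def sample_parents(n, rng):
    """Uniform attachment: vertex k + 1 picks its parent uniformly in {1, ..., k}.

    Returns the list (l_1, ..., l_{n-1}) of parent labels.
    """
    return [rng.randint(1, k) for k in range(1, n)]

def shape(parents):
    """Unlabeled shape of the labeled tree, as nested sorted tuples."""
    n = len(parents) + 1
    children = {v: [] for v in range(1, n + 1)}
    for child, parent in enumerate(parents, start=2):
        children[parent].append(child)

    def build(v):
        return tuple(sorted(build(c) for c in children[v]))

    return build(1)

def shape_frequencies(n, samples, seed=0):
    """Empirical distribution of shapes among `samples` random trees of order n."""
    rng = random.Random(seed)
    counts = Counter(shape(sample_parents(n, rng)) for _ in range(samples))
    return {s: c / samples for s, c in counts.items()}

if __name__ == "__main__":
    for s, freq in sorted(shape_frequencies(4, 20000).items()):
        print(s, round(freq, 3))
```

At order 4 the shape $[[\bullet], \bullet]$, encoded here as `((), ((),))`, has $\alpha = 3$ and should appear with frequency close to $3/3! = 1/2$, while each other shape should appear with frequency close to $1/6$.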

Probabilistic Representation Theorem

Theorem 1 (Main Result): Assume $|\nabla^m f(x_0)| \leq C$ for all $m \geq 0$ . Then for $t \in [t_0, t_0 + 1/C)$ : $x(t) = \mathbb{E}\left[\frac{(t-t_0)^{|T|}F(T)(x_0)}{(|T| \vee 1)p_{|T|}}\right]$

Proof Sketch:

Utilize uniform distribution property of labeled trees
Apply law of total expectation: $\mathbb{E}\left[\frac{(t-t_0)^{|T|}F(T)(x_0)}{(|T| \vee 1)p_{|T|}}\right] = \sum_{n=0}^\infty p_n \sum_{\tau \in \mathcal{T}_n^\sharp} q_n^\sharp(\tau) \frac{(t-t_0)^{n}F(\tau)(x_0)}{(n \vee 1)p_n}$
From $q_n^\sharp(\tau) = 1/(n-1)!$ and $\alpha(\tau) = |\tau|!/(\tau! \sigma(\tau))$ , obtain Butcher series
Integrability guaranteed by moment bounds: $\mathbb{E}\left[\left|\frac{(t-t_0)^{|T|}F(T)(x_0)}{(|T| \vee 1)p_{|T|}}\right|^q\right] \leq \frac{|x_0|^q}{p_0^{q-1}} + \sum_{n=1}^\infty \frac{(C(t-t_0))^{nq}}{n^q p_n^{q-1}}$
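
A one-dimensional version of the estimator in Theorem 1 can be sketched as below. This is an illustrative Python sketch, not the authors' `MCsample` code. It assumes $d = 1$, so that $F(T)(x_0)$ is the product over vertices of $f^{(m)}(x_0)$ with $m$ the number of children, and it assumes the geometric parametrization $p_n = (1 - r) r^n$ for $n \geq 0$, which may differ from the one used in the paper. The `derivative(m, x)` callback is a hypothetical interface returning $f^{(m)}(x)$.

```python
import math
import random

def sample_child_counts(n, rng):
    """Child counts of a uniform-attachment tree with n >= 1 vertices."""
    counts = [0] * n
    for k in range(1, n):
        counts[rng.randrange(k)] += 1   # parent chosen among the k existing vertices
    return counts

def sample_size_geometric(r, rng):
    """Return n >= 0 with P(n) = (1 - r) * r**n (assumed parametrization)."""
    n = 0
    while rng.random() < r:
        n += 1
    return n

def mc_ode_1d(derivative, x0, t, t0=0.0, r=0.5, samples=20000, seed=0):
    """Average of (t - t0)^|T| F(T)(x0) / ((|T| v 1) p_|T|) over random trees T."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        n = sample_size_geometric(r, rng)
        p_n = (1.0 - r) * r ** n
        if n == 0:
            weight = x0                  # F(empty tree) = Id, so the sample is x0 / p_0
        else:
            weight = (t - t0) ** n / n
            for m in sample_child_counts(n, rng):
                weight *= derivative(m, x0)
        total += weight / p_n
    return total / samples

if __name__ == "__main__":
    x0, t = 1.0, 0.2
    estimate = mc_ode_1d(lambda m, x: math.exp(x), x0, t)
    exact = -math.log(math.exp(-x0) - t)     # solution of x' = exp(x), x(0) = x0
    print(estimate, exact)
```

With $f = \exp$ every derivative at $x_0$ equals $e^{x_0}$, so $C = e^{x_0}$ and the condition $t < t_0 + 1/C$ holds for $x_0 = 1$, $t = 0.2$. The estimate fluctuates with the seed, with a standard error of order $1/\sqrt{N}$.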

Extension to Semilinear ODEs (Theorem 2)

For semilinear ODE: $\dot{x}(t) = Ax(t) + g(x(t))$ , where $A$ is a linear operator:

Method:

Use Poisson distribution for tree size: $p_n = e^{-(t-t_0)}(t-t_0)^n/n!$
Introduce independent continuous-time Markov chain $(X_t)_{t \geq t_0}$ with generator $A$
Utilize exponential Butcher series representation

Probabilistic Representation: $x_i(t) = e^{t-t_0} \mathbb{E}\left[((|T_t|-1) \vee 0)! (F_g(T_t)(x_0))_{X_{t-T_{|T_t|}}} \mathbf{1}_{\{T_{|T_t|} \leq t\}} \mid X_{t_0} = i\right]$

where $T_t$ is a Poisson-sized random tree and $F_g$ is the elementary differential of $g$ .

Technical Innovations

Elimination of Branching Times: Unlike reference 21, there is no need to generate random time sequences $(T_i)_{i \geq 1}$ ; trees are constructed directly via uniform attachment
Combinatorial Equivalence: Utilize bijection between labeled trees and sequences $(l_1, \ldots, l_{n-1}) \in \triangle_{n-1}$ (Lemma 3) to establish concise probabilistic construction
Flexible Distribution Choice: Framework allows arbitrary probability distribution $(p_n)_{n \geq 0}$ , can be chosen based on variance optimization
Semilinear Structure Exploitation: Handle linear part via Markov chain and nonlinear part via random trees, achieving structural decomposition
Theoretical Guarantees: Provide explicit convergence conditions and moment bounds, ensuring feasibility of Monte Carlo estimation

Experimental Setup

Test Equations

Example 1 (Equation 27): Exponential ODE $\dot{x}(t) = e^{x(t)}, \quad x(0) = x_0$ Analytical solution: $x(t) = -\log(e^{-x_0} - t)$ , domain $t \in [0, e^{-x_0})$

Example 2 (Equation 28): Semilinear ODE $\dot{x}(t) = tx(t) + x^2(t), \quad x(0) = 1/2$ Analytical solution: $x(t) = \frac{e^{t^2/2}}{2 - \int_0^t e^{s^2/2}ds}$
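
Both closed-form solutions can be sanity-checked numerically. The sketch below (Python, hypothetical helper names) evaluates each formula and measures the ODE residual $\dot{x}(t) - \text{rhs}(t, x(t))$ with a central difference; the Simpson rule and the step sizes are illustrative choices, not taken from the paper.

```python
import math

def exact_exponential(t, x0):
    """x(t) = -log(exp(-x0) - t), valid for t < exp(-x0)."""
    return -math.log(math.exp(-x0) - t)

def simpson(g, a, b, intervals=200):
    """Composite Simpson rule with an even number of intervals."""
    h = (b - a) / intervals
    s = g(a) + g(b)
    for i in range(1, intervals):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def exact_semilinear(t):
    """x(t) = exp(t^2/2) / (2 - int_0^t exp(s^2/2) ds), with x(0) = 1/2."""
    integral = simpson(lambda s: math.exp(s * s / 2), 0.0, t)
    return math.exp(t * t / 2) / (2.0 - integral)

def residual(x, rhs, t, h=1e-5):
    """Central-difference estimate of x'(t) minus the right-hand side."""
    dx = (x(t + h) - x(t - h)) / (2 * h)
    return dx - rhs(t, x(t))

if __name__ == "__main__":
    r1 = residual(lambda s: exact_exponential(s, 1.0),
                  lambda s, x: math.exp(x), 0.2)
    r2 = residual(exact_semilinear, lambda s, x: s * x + x * x, 0.5)
    print(r1, r2, exact_semilinear(0.0))
```

Both residuals should be close to zero up to discretization error, and `exact_semilinear(0.0)` should return the initial value $1/2$.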

Experimental Parameters

Truncated Butcher Series:

Order range: $n = 1, \ldots, 8$
Implemented via command B[f,t,x0,t0,n]

Monte Carlo Method:

Geometric Distribution:
- Parameters $p = 0.5$ or $p = 0.75$
- Sample sizes: 70,000 (Equation 27), 10,000 (Equation 28)
Poisson Distribution:
- Parameter $\lambda = t - t_0$
- Sample size: 100,000
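Optimal Distribution: $p_0 = c_0 x_0$ , $p_n = c_0(Ct)^n/n$ ( $n \geq 1$ ), where $c_0$ normalizes the probabilities to sum to one (written for $t_0 = 0$ and $x_0 > 0$ , as in the test equations)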

Evaluation Metrics

Computational Time: Compare time required by different methods to achieve similar accuracy
Numerical Error: Absolute error relative to analytical solution
Variance Analysis: Compare second moment bounds of different probability distributions: $\frac{x_0^2}{p_0} + \sum_{n=1}^\infty \frac{(C(t-t_0))^{2n}}{n^2 p_n}$
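
The second moment bound can be evaluated numerically for each candidate distribution. The following Python sketch uses hypothetical function names, truncates the series at a finite number of terms, and again assumes the geometric parametrization $p_n = (1-r) r^n$. Writing $a = C(t - t_0)$, the choice $p_n \propto a^n/n$ (with $p_0 \propto |x_0|$) minimizes the bound by the Cauchy-Schwarz inequality, and the minimum equals $(|x_0| - \log(1 - a))^2$.

```python
import math

def second_moment_bound(p, x0, a, terms=60):
    """x0^2 / p(0) + sum_{n=1}^{terms} a^(2n) / (n^2 p(n)), with a = C (t - t0).

    Keep terms <= 100 so that the factorial in poisson() stays within float range.
    """
    total = x0 * x0 / p(0)
    for n in range(1, terms + 1):
        total += a ** (2 * n) / (n * n * p(n))
    return total

def geometric(r):
    """p_n = (1 - r) r^n, n >= 0 (assumed parametrization)."""
    return lambda n: (1.0 - r) * r ** n

def poisson(lam):
    """p_n = exp(-lam) lam^n / n!"""
    return lambda n: math.exp(-lam) * lam ** n / math.factorial(n)

def optimal(x0, a):
    """p_0 = c0 |x0|, p_n = c0 a^n / n, with c0 chosen so the p_n sum to one."""
    c0 = 1.0 / (abs(x0) - math.log(1.0 - a))
    return lambda n: c0 * abs(x0) if n == 0 else c0 * a ** n / n

if __name__ == "__main__":
    x0, t = 1.0, 0.2
    a = math.e * t                         # C = exp(x0) for f = exp and x0 = 1
    print("optimal  ", second_moment_bound(optimal(x0, a), x0, a))
    print("geometric", second_moment_bound(geometric(0.5), x0, a))
    for terms in (20, 40):
        print("poisson, terms =", terms,
              second_moment_bound(poisson(t), x0, a, terms))
```

The Poisson partial sums keep growing as more terms are added, since $1/p_n$ grows like $n!$, whereas the geometric and optimal bounds settle after a few dozen terms.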

Implementation Details

Mathematica Code:

One-dimensional ODE: MCsample[f_, t_, x0_, dist_]
Multi-dimensional ODE: Complete implementation in Section 7
Open source: https://github.com/nprivaul/mc-odes/blob/main/mc-odes.nb

Tree Generation Process:

Store trees using graph structures
Vertex labels store derivative information
Random selection: RandomVariate[DiscreteUniformDistribution[{1, l}]]

Experimental Results

Computational Time Comparison (Table 1)

Order $n$	1	2	3	4	5	6	7	8	MC (Geometric)
Equation 27 (d=1)	0s	0s	0.1s	0.1s	0.4s	0.5s	3s	21s	22s (70K)
Equation 28 (d=2)	0s	0s	0s	0.2s	1s	13s	222s	>1h	164s (10K)

Observations:

Butcher series computation time grows exponentially with order
Monte Carlo time is governed mainly by the sample count rather than by a truncation order, so it does not show this growth
For Equation 28, 8th order truncation exceeds 1 hour, while MC method takes 164 seconds
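
Part of the growth in truncation cost comes from the number of trees to enumerate at each order. As a rough illustration (a Python sketch with a hypothetical function name, independent of the paper's code), the number of unlabeled rooted trees per order can be computed with the classical recurrence $a_{n+1} = \frac{1}{n} \sum_{k=1}^{n} \big( \sum_{d \mid k} d\, a_d \big) a_{n-k+1}$, $a_1 = 1$. The timings in Table 1 also include the cost of evaluating elementary differentials, which this count does not capture.

```python
def rooted_tree_counts(max_order):
    """Number of unlabeled rooted trees with 1, ..., max_order vertices."""
    a = [0, 1]                               # a[k] = number of rooted trees of order k
    for n in range(1, max_order):
        total = 0
        for k in range(1, n + 1):
            # s = sum of d * a[d] over the divisors d of k
            s = sum(d * a[d] for d in range(1, k + 1) if k % d == 0)
            total += s * a[n - k + 1]
        a.append(total // n)                 # the division is exact
    return a[1:]

if __name__ == "__main__":
    for order, count in enumerate(rooted_tree_counts(12), start=1):
        print(order, count)
```

At order 8 there are 115 unlabeled rooted trees, compared with $7! = 5040$ labeled trees, and the count roughly triples with each further order.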

Main Numerical Results (Figure 2)

Equation 27 ( $x_0 = 1$ , $t \in [0, 0.35]$ ):

B-8 series: Highly consistent with exact solution
B-6 series: Deviation appears for $t > 0.25$
MC method (geometric distribution, 70K samples): Good agreement with exact solution, small variance

Equation 28 ( $x_0 = 1/2$ , $t \in [0, 1]$ ):

B-7 series: High accuracy
B-5 series: Significant deviation for $t > 0.6$
MC method (geometric distribution, 10K samples): Tracks exact solution, slightly larger variance

Key Findings:

MC method achieves accuracy comparable to high-order truncation within similar computational time
MC method avoids combinatorial explosion from tree enumeration
Sample size can be flexibly adjusted based on accuracy requirements

Probability Distribution Comparison (Figures 3-4)

Second Moment Bound Analysis (Figure 3):

Optimal Distribution $p_n = c_0(Ct)^n/n$ : Provides minimum variance bound for all $C$ values
Geometric Distribution ( $p=0.5$ ): Variance bound approximately 2-3 times optimal distribution
Geometric Distribution ( $p=0.75$ ): Even higher variance bound

Numerical Performance (Figure 4):

Poisson Distribution (100K samples):
- Significant fluctuations, large variance
- Error increases for $t > 0.2$
- Theoretically unbounded variance: the second moment series diverges because $1/p_n$ grows like $n!$ under Poisson weights
Geometric Distribution (70K samples):
- Stable tracking of exact solution
- Bounded and small variance
- Excellent performance on $t \in [0, 0.35]$

Conclusion: Geometric distribution performs best in practice, balancing variance and computational efficiency

Tree Generation Examples (Figure 1)

Demonstrates systematic generation process for 3rd and 4th order trees:

3rd order trees: 2 different structures
4th order trees: 4 different structures
Each vertex annotated with corresponding derivative order

Related Work

Butcher Series Theory

Classical Literature:
- Butcher (1963, 2016, 2021) [1,2,3]: Established B-series algebraic analysis framework
- Hairer et al. (2006) [11]: Standard reference for geometric numerical integration
- Deuflhard & Bornemann (2002) [10]: ODE methods in scientific computing
Computational Implementation:
- Ketcheson & Ranocha (2022) [16]: Complete B-series implementation in Julia

Probabilistic Methods

Branching Processes:
- Skorokhod (1964) [22]: Branching diffusion processes
- Vatutin (1993) [23,24]: Branching processes and random tree theory
- Ikeda et al. (1968-1969) [15]: Branching Markov processes
Probabilistic Representation of PDEs:
- McKean (1975) [19]: Brownian motion in KPP equation
- Le Jan & Sznitman (1997) [17]: Random cascades and Navier-Stokes equation
- Dalang et al. (2008) [6]: Feynman-Kac type formula for wave equation
- Henry-Labordère et al. (2019) [13]: Branching diffusion representation of semilinear PDEs
Probabilistic Methods for ODEs:
- Penent & Privault (2022) [21]: Predecessor work simplified in this paper, requiring random branching times
- Nguwi et al. (2023) [20]: Fully nonlinear Feynman-Kac formula for arbitrary order derivatives

Exponential Integrators

Exponential Butcher Series:
- Hochbruck & Ostermann (2010) [14]: Survey of exponential integrators
- Luan & Ostermann (2013) [18]: Exponential B-series for stiff cases

Positioning of This Work

vs. [21]: Removes random branching times, simpler and more direct algorithm
vs. [20]: Focuses on ODEs rather than PDEs, more efficient implementation
vs. [6,13]: Extends to nonlinear ODEs, uses general trees rather than linear chains
vs. Classical Methods: Provides Monte Carlo alternative, avoids combinatorial explosion

Conclusions and Discussion

Main Conclusions

Theoretical Contributions:
- Establishes new probabilistic representation of ODE solutions (Theorem 1) based on random Butcher trees
- Proves equivalence between labeled trees and uniform attachment process (Lemma 3)
- Extends to semilinear ODEs, combining Poisson processes and Markov chains (Theorem 2)
Algorithm Advantages:
- No need to generate random branching times, simpler implementation
- Avoids explicit enumeration of high-order trees, alleviates combinatorial explosion
- Accuracy can be flexibly improved by increasing sample size
Numerical Verification:
- On test equations, MC method achieves accuracy comparable to high-order Butcher series
- Computational time significantly better than series truncation for high orders
- Geometric distribution performs best in practice

Limitations

Convergence Speed:
- Monte Carlo method convergence rate is $O(1/\sqrt{N})$ , slower than deterministic high-order methods
- For low-dimensional smooth problems, Runge-Kutta methods remain more efficient
- Paper explicitly states: "Monte Carlo estimators cannot compete with classical Runge-Kutta schemes"
Applicability Range Restrictions:
- Requires bounded derivative condition: $|\nabla^m f(x_0)| \leq C$
- Time interval limited: $t \in [t_0, t_0 + 1/C)$
- For stiff problems or long-time integration, conditions may be too restrictive
Variance Issues:
- Poisson distribution theoretically has unbounded variance
- Careful selection of probability distribution needed to control variance
- Optimal distribution $p_n = c_0(Ct)^n/n$ depends on unknown constant $C$
High-Dimensional Challenges:
- Multi-dimensional ODE code implementation more complex (see Section 7)
- Dimension-dependent complexity in tree labeling and derivative computation
- Numerical experiments limited to 1-2 dimensions
Experimental Limitations:
- Only two simple equations tested
- Lacks direct comparison with other probabilistic methods (e.g., reference 20)
- Adaptive sampling strategies not explored

Future Directions

Method Improvements:
- Develop adaptive importance sampling strategies
- Study variance reduction techniques (e.g., control variates)
- Explore parallel implementations
Theoretical Extensions:
- Relax bounded derivative conditions
- Extend to stochastic differential equations (SDEs)
- Study adaptive time-stepping strategies
Application Areas:
- Extend to partial differential equations (PDEs)
- Apply to high-dimensional problems (avoid curse of dimensionality)
- Combine with deep learning methods

In-Depth Evaluation

Strengths

Theoretical Innovation (★★★★☆):
- Core Innovation: Establishes direct connection between uniform distribution of labeled trees and Butcher series coefficients; Lemma 3's bijection simplifies probabilistic construction
- Mathematical Rigor: Provides complete convergence proofs and moment bound estimates
- Structural Insight: Decomposition for semilinear ODEs (linear part→Markov chain, nonlinear part→random trees) demonstrates deep structural understanding
Algorithm Simplicity (★★★★★):
- Simple Implementation: Significantly simplified algorithm flow compared to references 20,21
- Readable Code: Clear Mathematica implementation, easy to understand and reproduce
- Open Source: GitHub code repository provided, promotes research reproducibility
Mathematical Elegance (★★★★★):
- Introduction of grafting product unifies tree operations
- Probabilistic representation formula (18) is concise and beautiful
- Deep fusion of combinatorics and probability
Experimental Design (★★★☆☆):
- Compares multiple probability distributions (Poisson, geometric, optimal)
- Provides quantitative analysis of computational time and accuracy
- Variance analysis supported by theory

Weaknesses

Limited Practical Utility (★★☆☆☆):
- Efficiency Issue: Paper acknowledges "cannot compete with classical Runge-Kutta schemes"
- Narrow Applicability: Only advantageous in special cases where tree enumeration is unavoidable
- Parameter Dependence: Optimal distribution requires knowing constant $C$ , difficult to determine in practice
Insufficient Experiments (★★☆☆☆):
- Few Test Cases: Only two simple equations, lacking complex system tests
- Dimension Limitation: Only 1-2 dimensions tested, high-dimensional performance unknown
- Missing Comparisons: No direct comparison with other probabilistic methods (e.g., reference 20)
- Shallow Error Analysis: Lacks detailed convergence rate analysis
Theoretical Limitations (★★★☆☆):
- Short Time Interval: $t < t_0 + 1/C$ restricts long-time integration
- High Smoothness Requirement: Requires all-order derivatives bounded
- Coarse Variance Bound: Moment bound (20) may not be tight
Writing Issues (★★★☆☆):
- Lacks clear guidance on "when to use this method"
- Insufficient comparison of advantages/disadvantages with existing methods
- Some technical details (e.g., multi-dimensional implementation) relegated to appendix, affecting readability

Impact Assessment

Theoretical Contribution (★★★★☆):
- Provides new probabilistic perspective on Butcher series
- Connects combinatorics, probability theory, and numerical analysis
- May inspire probabilistic reformulation of other numerical methods
Practical Value (★★☆☆☆):
- Currently mainly theoretical exploration
- Limited practical application scenarios
- May be useful in specific problems (e.g., uncertainty quantification)
Reproducibility (★★★★★):
- Complete open source code
- Clear algorithm description
- Numerical results verifiable
Academic Impact:
- Citation Potential: Moderate. Novel method but limited application scope
- Follow-up Research: May inspire work on variance reduction, adaptive sampling
- Interdisciplinary Value: Connects probability, combinatorics, numerical analysis with some interdisciplinary significance

Applicable Scenarios

Recommended Use:

High-Order Tree Enumeration Difficult: When very high-order Butcher series needed and tree enumeration infeasible
Uncertainty Quantification: Need to simultaneously estimate solution and its uncertainty
Educational Demonstration: As probabilistic interpretation tool for Butcher series
Theoretical Research: Exploring probabilistic foundations of numerical methods

Not Recommended For:

Routine ODE Solving: Classical Runge-Kutta methods more efficient
Real-Time Computation: Monte Carlo variance causes unstable results
Stiff Problems: Time-step restriction $t < t_0 + 1/C$ too severe
High Accuracy Requirements: Convergence rate $O(1/\sqrt{N})$ too slow

Comprehensive Scoring

Innovation: 8/10 (Novel probabilistic perspective, simplifies prior methods)
Rigor: 9/10 (Complete mathematical proofs, solid theoretical foundation)
Practicality: 4/10 (Limited practical value at current stage)
Experimentation: 5/10 (Reasonable experimental design but limited scope)
Impact: 6/10 (Significant theoretical contribution, limited practical application)

Overall: This is a theoretically elegant and mathematically rigorous paper that provides a novel probabilistic interpretation of Butcher series. The algorithm's simplicity and theoretical completeness are its main strengths. However, practical value is limited by inherent defects of Monte Carlo methods (slow convergence, large variance) and strict applicability conditions. The paper is better suited as theoretical exploration and methodological contribution rather than a replacement for practical solvers. If effective variance reduction techniques and adaptive strategies can be developed in the future, the method's practical utility could improve significantly.

Selected References

Butcher, J.C. (2021). B-Series: Algebraic Analysis of Numerical Methods. Springer. (Authoritative monograph on Butcher series.)
Hairer, E., Lubich, C., & Wanner, G. (2006). Geometric Numerical Integration. Springer. (Classic textbook on geometric numerical integration.)
Penent, G., & Privault, N. (2022). Numerical evaluation of ODE solutions by Monte Carlo enumeration of Butcher series. BIT Numerical Mathematics, 62:1921-1944. (Predecessor work simplified in this paper.)
Henry-Labordère, P., et al. (2019). Branching diffusion representation of semilinear PDEs and Monte Carlo approximation. Ann. Inst. H. Poincaré Probab. Statist., 55(1):184-210. (Branching diffusion representation for PDEs.)
Ketcheson, D.I., & Ranocha, H. (2022). Computing with B-series. ACM Transactions on Mathematical Software. (Julia implementation of B-series.)