Jet Functors and Weil Algebras in Automatic Differentiation: A Geometric Analysis

Sangha
We present a geometric formulation of automatic differentiation (AD) using jet bundles and Weil algebras. Reverse-mode AD emerges as cotangent-pullback, while Taylor-mode corresponds to evaluation in a Weil algebra. From these principles, we derive concise statements on correctness, stability, and complexity: a functorial identity for reverse-mode, algebraic exactness of higher-order derivatives, and explicit bounds on truncation error. We further show that tensorized Weil algebras permit one-pass computation of all mixed derivatives with cost linear in the algebra dimension, avoiding the combinatorial blow-up of nested JVP/VJP schedules. This framework interprets AD theory through the lens of differential geometry and offers a foundation for developing structure-preserving differentiation methods in deep learning and scientific computing. Code and examples are available at https://git.nilu.no/geometric-ad/jet-weil-ad.

Basic Information

  • Paper ID: 2510.14342
  • Title: Jet Functors and Weil Algebras in Automatic Differentiation: A Geometric Analysis
  • Author: Amandip Sangha (The Climate and Environmental Research Institute NILU, Norway)
  • Classification: cs.LG math.DG stat.ML
  • Publication Date: October 16, 2025
  • Paper Link: https://arxiv.org/abs/2510.14342

Abstract

This paper proposes a geometric formulation of automatic differentiation (AD) based on jet bundles and Weil algebras. Reverse-mode AD is characterized as cotangent-pullback, while Taylor-mode AD corresponds to evaluation in Weil algebras. Based on these principles, the author derives concise statements regarding correctness, stability, and complexity: functor identities for reverse-mode, algebraic exactness for higher-order derivatives, and explicit bounds on truncation errors. The author further demonstrates that tensorized Weil algebras enable computing all mixed derivatives in a single pass at a cost linear in the algebra dimension, avoiding the combinatorial explosion of nested JVP/VJP scheduling. This framework interprets AD theory through the lens of differential geometry, providing a foundation for developing structure-preserving differentiation methods in deep learning and scientific computing.

Research Background and Motivation

Core Problems

Automatic Differentiation (AD) is a fundamental technique in modern machine learning and scientific computing, yet existing AD theory lacks a unified geometric theoretical framework, leading to:

  1. Theoretical Fragmentation: The theoretical foundations of reverse-mode AD (backpropagation) and higher-order AD are scattered across different mathematical frameworks
  2. Complexity Explosion: Computing higher-order mixed derivatives faces combinatorial complexity explosion
  3. Lack of Invariance: Existing methods lack coordinate-independent geometric interpretations, affecting stability analysis

Research Significance

This research is significant for:

  • Theoretical Unification: Providing a unified differential geometric foundation for AD
  • Computational Efficiency: Addressing efficiency issues in higher-order derivative computation
  • Application Prospects: Providing theoretical support for geometry-aware methods in deep learning

Limitations of Existing Methods

  1. Traditional AD Methods: Rely on coordinate representations, lacking geometric invariance
  2. Higher-Order Derivative Computation: Nested JVP/VJP schedules suffer from combinatorial growth in cost as the derivative order and the number of directions increase
  3. Stability Analysis: Lacking systematic error propagation theory

Core Contributions

  1. Established geometric theory of backpropagation: Proved that reverse-mode AD is equivalent to cotangent-pullback operations, providing coordinate-independent formulation
  2. Proposed Weil algebra framework: Formulated Taylor-mode AD as exact evaluation in Weil algebras, guaranteeing algebraic exactness
  3. Developed tensorized Weil algebra method: Enabling single-pass computation of all mixed derivatives with complexity linear in algebra dimension
  4. Provided complete theoretical analysis: Including correctness proofs, stability bounds, and complexity analysis

Methodology Details

Problem Definition

Given a smooth map $f: M \to N$ (where $M, N$ are smooth manifolds) and a scalar function $\ell: N \to \mathbb{R}$, the objectives are:

  1. Computing the gradient of the composite function $\ell \circ f$
  2. Computing higher-order derivatives of $f$
  3. Implementing the above computations in a geometrically invariant manner

Core Theoretical Framework

1. Geometric Formulation of Reverse-Mode AD

Theorem 1 (Backpropagation as Cotangent-Pullback): For smooth maps $f: M \to N$ and $\ell: N \to \mathbb{R}$: $\nabla_x(\ell \circ f) = (df_x)^*(d\ell_{f(x)})$

Equivalently, at the jet level: $(j^1 f)^*(j^1 \ell) = j^1(\ell \circ f)$

This theorem reformulates backpropagation as a pullback operation on cotangent spaces, with the following geometric significance:

  • Coordinate Independence: Results do not depend on specific coordinate system choice
  • Functoriality: Satisfies $(d(g \circ f)_x)^* = (df_x)^* \circ (dg_{f(x)})^*$
  • Naturality: Compatible with smooth reparameterization
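The functoriality identity can be checked numerically in coordinates, where the pullback $(df_x)^*$ acts on a covector as multiplication by the transposed Jacobian. The following is a minimal sketch under that assumption; the helper names (`transpose`, `matvec`, `matmul`, `pullback`) and the Jacobian entries are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical helpers; plain nested lists stand in for Jacobian matrices.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def pullback(jac, covector):
    """Apply (df_x)^* in coordinates: transposed Jacobian times covector."""
    return matvec(transpose(jac), covector)

# Illustrative Jacobians at a fixed point:
# df is 2x3 (f: R^3 -> R^2), dg is 2x2 (g: R^2 -> R^2).
df = [[1.0, 2.0, 0.0],
      [0.0, -1.0, 3.0]]
dg = [[2.0, 1.0],
      [0.5, 4.0]]
dl = [1.0, -2.0]  # covector d(ell) at g(f(x))

# Backpropagation order: pull back through g first, then through f.
stepwise = pullback(df, pullback(dg, dl))

# Pull back once through the Jacobian of g∘f, which is dg @ df by the chain rule.
direct = pullback(matmul(dg, df), dl)

print(stepwise, direct)
```

Both routes give the same covector on the input space, which is the coordinate form of $(d(g \circ f)_x)^* = (df_x)^* \circ (dg_{f(x)})^*$.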

2. Taylor-Mode in Weil Algebras

Theorem 2 (Exactness of Weil-Mode Evaluation): Let $W$ be a Weil algebra whose maximal ideal $m$ satisfies $m^{k+1} = 0$. Then the lifting map $T_W f: T_W U \to T_W \mathbb{R}^m$ exactly computes all derivatives of $f$ at $x$ up to order $k$ as coefficients of the truncated Taylor expansion.

Construction of Weil algebras:

  • Form: $W = \mathbb{R}[\varepsilon]/(\varepsilon^{k+1})$ or tensor product form
  • Nilpotency $\varepsilon^{k+1} = 0$ automatically implements truncation
  • Algebraic operations directly correspond to derivative propagation rules
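To make the construction concrete, here is a minimal sketch of arithmetic in $\mathbb{R}[\varepsilon]/(\varepsilon^{k+1})$, assuming a program built only from addition and multiplication. The class `Jet` and its methods are hypothetical names introduced for illustration; the paper's own implementation may differ.

```python
import math

class Jet:
    """Element of R[eps]/(eps^(k+1)), stored as coefficients c[0..k]."""

    def __init__(self, coeffs):
        self.c = list(coeffs)

    @classmethod
    def variable(cls, x, k):
        # The W-point x + eps, padded or cut to exactly k + 1 coefficients.
        return cls(([float(x), 1.0] + [0.0] * k)[: k + 1])

    def _lift(self, other):
        # Embed a plain number as a constant element of the algebra.
        if isinstance(other, Jet):
            return other
        return Jet([float(other)] + [0.0] * (len(self.c) - 1))

    def __add__(self, other):
        other = self._lift(other)
        return Jet([a + b for a, b in zip(self.c, other.c)])

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        n = len(self.c)
        out = [0.0] * n
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c):
                if i + j < n:  # eps^(k+1) = 0 discards higher-order terms
                    out[i + j] += a * b
        return Jet(out)

    __rmul__ = __mul__

    def derivatives(self):
        # Coefficient i of the truncated Taylor expansion is f^(i)(x) / i!.
        return [math.factorial(i) * ci for i, ci in enumerate(self.c)]

def f(x):
    return x * x * x + 2 * x

result = f(Jet.variable(2.0, 3)).derivatives()
print(result)  # [12.0, 14.0, 12.0, 6.0]
```

The output lists $f(2)$, $f'(2)$, $f''(2)$, $f'''(2)$ for $f(x) = x^3 + 2x$. No finite-difference step is involved: the truncation happens in the multiplication rule itself.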

3. Tensorized Weil Algebras

Theorem 3 (Complexity of Tensorized Weil Algebras): Consider the tensorized Weil algebra: $W \cong \bigotimes_{j=1}^p \mathbb{R}[\varepsilon_j]/(\varepsilon_j^{\rho_j+1}), \quad \dim W = \prod_{j=1}^p (\rho_j + 1)$

A single evaluation of $f$ at the $W$-point $x_W := x + \sum_{j=1}^p \varepsilon_j v^{(j)}$ yields all mixed directional derivatives, with time complexity $O(\dim W \cdot Q)$, where $Q$ is the number of scalar operations in the original program.
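A small sketch may help show how one pass through a program produces every mixed derivative. It assumes a program built from addition and multiplication only, and stores an element of the tensorized algebra as a dictionary from multi-indices to coefficients. The class `TensorJet` and its methods are hypothetical names, not the paper's API.

```python
import math

class TensorJet:
    """Element of the tensor product of R[eps_j]/(eps_j^(rho_j+1)), j = 1..p."""

    def __init__(self, orders, coeffs):
        self.orders = tuple(orders)  # (rho_1, ..., rho_p)
        self.c = dict(coeffs)        # multi-index -> coefficient

    @classmethod
    def w_point(cls, orders, value, directions):
        """One input coordinate: value + sum_j eps_j * directions[j]."""
        p = len(orders)
        coeffs = {(0,) * p: float(value)}
        for j, v in enumerate(directions):
            if orders[j] >= 1 and v != 0:
                idx = tuple(1 if i == j else 0 for i in range(p))
                coeffs[idx] = float(v)
        return cls(orders, coeffs)

    def __add__(self, other):
        out = dict(self.c)
        for idx, b in other.c.items():
            out[idx] = out.get(idx, 0.0) + b
        return TensorJet(self.orders, out)

    def __mul__(self, other):
        out = {}
        for ia, a in self.c.items():
            for ib, b in other.c.items():
                idx = tuple(x + y for x, y in zip(ia, ib))
                # eps_j^(rho_j+1) = 0: drop terms exceeding any per-direction order.
                if all(n <= r for n, r in zip(idx, self.orders)):
                    out[idx] = out.get(idx, 0.0) + a * b
        return TensorJet(self.orders, out)

    def mixed_derivative(self, idx):
        # Coefficient of eps^idx equals the mixed derivative divided by idx!.
        scale = math.prod(math.factorial(n) for n in idx)
        return scale * self.c.get(tuple(idx), 0.0)

def f(x, y):
    return x * x * y

orders = (2, 1)  # second order along direction 1, first order along direction 2
x = TensorJet.w_point(orders, 3.0, [1.0, 0.0])  # direction 1 is the x-axis
y = TensorJet.w_point(orders, 5.0, [0.0, 1.0])  # direction 2 is the y-axis
out = f(x, y)

print(out.mixed_derivative((1, 1)))  # d^2 f / dx dy at (3, 5)
print(len(out.c))                    # number of stored coefficients
```

Here $\dim W = (2+1)(1+1) = 6$, and the single evaluation of `f` fills all six coefficients, from which each mixed derivative is read off by rescaling.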

Technical Innovations

  1. Geometric Unification: First unification of all AD modes under a differential geometric framework
  2. Algebraic Exactness: Nilpotency makes the truncation algebraically exact, so derivative coefficients carry no discretization error (floating-point rounding still applies)
  3. Linear Complexity: Tensorized method avoids combinatorial explosion of traditional nested methods
  4. No Reverse Tape: Weil-mode only requires storing coefficient arrays, eliminating computational graph storage

Experimental Setup

Theoretical Verification

The author primarily validates the method's effectiveness through theoretical analysis, including:

  1. Correctness Verification: Through functorial properties
  2. Stability Analysis: Providing explicit error bounds
  3. Complexity Analysis: Theoretical comparison with traditional methods

Stability Analysis

Lemma 1 (Backward Stability of Reverse Sweep): For a straight-line program with primitives $\{\phi_i\}_{i=1}^L$, if each adjoint $\phi_i^*$ satisfies: $\|\phi_i^*(v)\| \leq L_i\|v\|, \quad \|\hat{\phi}_i^*(v) - \phi_i^*(v)\| \leq \delta_i\|\phi_i^*(v)\|$

then the computed pullback satisfies: $\|\hat{f}^*(\bar{y})\| \leq \left(\prod_{i=1}^L (1+\delta_i)L_i\right)\|\bar{y}\|$
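One way to see where the product bound comes from is the following sketch, which assumes $f = \phi_L \circ \cdots \circ \phi_1$ and that the computed pullback is the composition of the perturbed adjoints in reverse order. This is an illustrative derivation under those assumptions and may differ from the paper's proof.

```latex
\begin{align*}
\|\hat{\phi}_i^*(v)\|
  &\le \|\phi_i^*(v)\| + \|\hat{\phi}_i^*(v) - \phi_i^*(v)\| \\
  &\le (1+\delta_i)\,\|\phi_i^*(v)\|
   \le (1+\delta_i)\,L_i\,\|v\|, \\
\|\hat{f}^*(\bar{y})\|
  &= \bigl\|\hat{\phi}_1^* \circ \cdots \circ \hat{\phi}_L^*(\bar{y})\bigr\|
   \le \Bigl(\prod_{i=1}^{L} (1+\delta_i)\,L_i\Bigr)\,\|\bar{y}\|.
\end{align*}
```

The first two lines are the triangle inequality combined with the two hypotheses; the last line applies that one-step estimate $L$ times, once per primitive.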

Complexity Comparison

| Method | Time Complexity | Space Complexity | Tape Required |
|---|---|---|---|
| Nested JVP/VJP | $O(\binom{p+k}{k} \cdot Q)$ | $O(L)$ (tape) | Yes |
| Tensorized Weil | $O(\prod_{j=1}^p(\rho_j+1) \cdot Q)$ | $O(\dim W)$ | No |
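The two time-complexity factors can be evaluated directly for a given configuration. The sketch below only plugs numbers into the formulas from the comparison; the function names and the chosen values of $p$, $k$, and $\rho_j$ are assumptions made for illustration, and the outcome depends on how the per-direction orders $\rho_j$ relate to the total order $k$.

```python
from math import comb, prod

def nested_factor(p, k):
    """Multiplier of Q in the nested JVP/VJP bound: binom(p + k, k)."""
    return comb(p + k, k)

def tensorized_factor(rhos):
    """Multiplier of Q in the tensorized bound: dim W = prod(rho_j + 1)."""
    return prod(r + 1 for r in rhos)

# Illustrative case: one first-order nilpotent per direction, total order k = p.
p = 3
rhos = (1, 1, 1)
k = sum(rhos)

nested = nested_factor(p, k)
tensorized = tensorized_factor(rhos)
print(nested, tensorized)  # 20 8
```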

Experimental Results

Theoretical Results Verification

Coefficient Growth Envelope

Corollary 1: Assume $f \in C^{k+1}(B_r(x), \mathbb{R}^m)$ and its derivatives satisfy $\|D^\ell f(z)\| \leq M_\ell$. Then Taylor coefficients satisfy: $\|f_\alpha(x)\| \leq \frac{M_{|\alpha|}}{\alpha!}$

Truncation Stability

For step size $\rho < r$, the remainder satisfies the standard Cauchy estimate: $\|R_{k+1}(z)\| \leq \frac{M_{k+1}}{(k+1)!}\rho^{k+1}$
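As a sanity check of the remainder estimate, the sketch below compares the actual truncation error with the bound for the scalar function $\exp$ expanded at $x = 0$. The choice of function, the values of $r$, $\rho$, $k$, and the helper names are all assumptions made for illustration. On the interval $[-r, r]$ every derivative of $\exp$ is bounded by $e^r$, which serves as $M_{k+1}$.

```python
import math

def taylor_exp(z, k):
    """Degree-k Taylor polynomial of exp at 0, evaluated at z."""
    return sum(z ** j / math.factorial(j) for j in range(k + 1))

def remainder_bound(m_next, k, rho):
    """M_{k+1} / (k+1)! * rho^(k+1)."""
    return m_next / math.factorial(k + 1) * rho ** (k + 1)

r, rho, k = 1.0, 0.5, 4
m_next = math.exp(r)  # bound on the (k+1)-th derivative of exp over [-r, r]

actual = abs(math.exp(rho) - taylor_exp(rho, k))
bound = remainder_bound(m_next, k, rho)
print(actual <= bound)  # True
```

The bound shrinks like $\rho^{k+1}/(k+1)!$, so either reducing the step size or raising the truncation order tightens it.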

Practical Performance Analysis

While the paper primarily focuses on theoretical analysis, it provides key performance insights:

  1. Memory Efficiency: Weil-mode avoids reverse tape storage
  2. Parallelization-Friendly: Coefficient operations naturally support vectorization
  3. Numerical Stability: Truncation errors can be explicitly controlled

Main Research Directions

  1. Categorical Perspective on AD: Elliott (2018), Fong et al. (2019) proposed functor formulations of AD
  2. Geometric AD Theory: Betancourt (2018) explored jet geometry applications in AD
  3. Higher-Order AD Algorithms: Giles (2008), Fike and Alonso (2012) analyzed numerical stability

Advantages of This Work

  1. Theoretical Completeness: First to provide a complete geometric theoretical framework for AD
  2. Practicality: Tensorized Weil algebra method has practical application value
  3. Unification: Unifies reverse, forward, and higher-order AD under a single framework

Conclusions and Discussion

Main Conclusions

  1. Geometric Unification: All AD modes can be uniformly understood within a differential geometric framework
  2. Computational Advantages: Tensorized Weil algebras provide efficient methods for higher-order derivative computation
  3. Theoretical Completeness: Provides complete theoretical analysis of correctness, stability, and complexity

Limitations

  1. Implementation Complexity: Practical implementation of Weil algebras requires carefully designed data structures
  2. Scope of Applicability: Primarily applicable to scenarios requiring dense mixed derivatives
  3. Numerical Precision: Higher-order computations may face numerical precision challenges

Future Directions

  1. Intrinsic AD on Manifolds: Extension to general Riemannian manifolds
  2. PDE-Constrained Optimization: Application to variational and PDE-constrained problems
  3. Higher-Order Tensor Compression: Developing compression techniques for coefficient arrays
  4. Systematic Primitive Lifting: Systematically lifting linear algebra and special functions to Weil algebras

In-Depth Evaluation

Strengths

  1. Strong Theoretical Innovation: First complete geometric theoretical framework for AD
  2. Mathematical Rigor: All theorems have complete mathematical proofs
  3. High Practical Value: Tensorized Weil algebra method addresses real computational problems
  4. Clear Exposition: Complex mathematical concepts are explained relatively clearly

Weaknesses

  1. Lack of Experimental Validation: Primarily theoretical work, lacking actual algorithm implementation and performance testing
  2. Limited Application Scenarios: Primarily applicable to specific scenarios requiring higher-order derivatives
  3. Insufficient Implementation Details: Limited guidance for practical system implementation

Impact

  1. Academic Value: Provides new mathematical foundations for AD theory
  2. Application Potential: Important application prospects in scientific computing and geometric deep learning
  3. Inspirational Value: Provides new perspectives for related research areas

Applicable Scenarios

  1. Scientific Computing: Physical simulations requiring high-precision higher-order derivatives
  2. Optimization Algorithms: Efficient implementation of second-order optimization methods
  3. Geometric Deep Learning: Neural network training on manifolds
  4. Meta-Learning: Adaptive algorithms requiring higher-order gradients

References

The paper cites 18 references, including:

  • Elliott (2018): Functional formulation of AD
  • Fong et al. (2019): Categorical perspective on backpropagation
  • Betancourt (2018): Geometric theory of higher-order AD
  • Baydin et al. (2018): AD survey
  • Kolář et al. (1993): Natural operations in differential geometry

Overall Assessment: This is a high-quality theoretical paper that provides a novel geometric framework for automatic differentiation. Although it lacks experimental validation, its theoretical contributions are significant and give further work in related fields a mathematical foundation to build on. The primary value of the work lies in theoretical unification and methodological innovation, which makes it important for advancing AD theory.