
Jet Functors and Weil Algebras in Automatic Differentiation: A Geometric Analysis

Sangha

We present a geometric formulation of automatic differentiation (AD) using jet bundles and Weil algebras. Reverse-mode AD emerges as cotangent-pullback, while Taylor-mode corresponds to evaluation in a Weil algebra. From these principles, we derive concise statements on correctness, stability, and complexity: a functorial identity for reverse-mode, algebraic exactness of higher-order derivatives, and explicit bounds on truncation error. We further show that tensorized Weil algebras permit one-pass computation of all mixed derivatives with cost linear in the algebra dimension, avoiding the combinatorial blow-up of nested JVP/VJP schedules. This framework interprets AD theory through the lens of differential geometry and offers a foundation for developing structure-preserving differentiation methods in deep learning and scientific computing. Code and examples are available at https://git.nilu.no/geometric-ad/jet-weil-ad.



Basic Information

Paper ID: 2510.14342
Title: Jet Functors and Weil Algebras in Automatic Differentiation: A Geometric Analysis
Author: Amandip Sangha (The Climate and Environmental Research Institute NILU, Norway)
Classification: cs.LG math.DG stat.ML
Publication Date: October 16, 2025
Paper Link: https://arxiv.org/abs/2510.14342

Abstract

This paper proposes a geometric formulation of automatic differentiation (AD) based on jet bundles and Weil algebras. Reverse-mode AD is characterized as cotangent-pullback, while Taylor-mode AD corresponds to evaluation in Weil algebras. Based on these principles, the author derives concise statements regarding correctness, stability, and complexity: functor identities for reverse-mode, algebraic exactness for higher-order derivatives, and explicit bounds on truncation errors. The author further demonstrates that tensorized Weil algebras enable computing all mixed derivatives in a single pass at a cost linear in the algebra dimension, avoiding the combinatorial explosion of nested JVP/VJP scheduling. This framework interprets AD theory through the lens of differential geometry, providing a foundation for developing structure-preserving differentiation methods in deep learning and scientific computing.

Research Background and Motivation

Core Problems

Automatic differentiation (AD) is a fundamental technique in modern machine learning and scientific computing, yet existing AD theory lacks a unified geometric framework. This leads to:

Theoretical Fragmentation: The theoretical foundations of reverse-mode AD (backpropagation) and higher-order AD are scattered across different mathematical frameworks
Complexity Explosion: Computing higher-order mixed derivatives with nested schedules incurs combinatorial growth in cost
Lack of Invariance: Existing methods lack coordinate-independent geometric interpretations, which complicates stability analysis

Research Significance

This research is significant for:

Theoretical Unification: Providing a unified differential geometric foundation for AD
Computational Efficiency: Addressing efficiency issues in higher-order derivative computation
Application Prospects: Providing theoretical support for geometry-aware methods in deep learning

Limitations of Existing Methods

Traditional AD Methods: Rely on coordinate representations, lacking geometric invariance
Higher-Order Derivative Computation: Nested JVP/VJP methods suffer from combinatorial growth in cost as the derivative order and the number of directions increase
Stability Analysis: Lacking systematic error propagation theory

Core Contributions

Geometric theory of backpropagation: Proves that reverse-mode AD is equivalent to a cotangent-pullback operation, giving a coordinate-independent formulation
Weil algebra framework: Formulates Taylor-mode AD as exact evaluation in Weil algebras, guaranteeing algebraic exactness
Tensorized Weil algebra method: Enables single-pass computation of all mixed derivatives with complexity linear in the algebra dimension
Complete theoretical analysis: Includes correctness proofs, stability bounds, and complexity analysis

Methodology Details

Problem Definition

Given a smooth map $f: M \to N$ (where $M, N$ are smooth manifolds) and a scalar function $\ell: N \to \mathbb{R}$, the objectives are:

Computing the gradient of the composite function $\ell \circ f$
Computing higher-order derivatives of $f$
Implementing the above computations in a geometrically invariant manner

Core Theoretical Framework

1. Geometric Formulation of Reverse-Mode AD

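Theorem 1 (Backpropagation as Cotangent-Pullback): For smooth maps $f: M \to N$ and $\ell: N \to \mathbb{R}$: $\nabla_x(\ell \circ f) = (df_x)^*(d\ell_{f(x)})$, where $(df_x)^*: T^*_{f(x)}N \to T^*_xM$ is the dual of the differential and $\nabla_x(\ell \circ f)$ denotes the covector $d(\ell \circ f)_x$ (it becomes a gradient vector only after a metric is chosen).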

Equivalently, at the jet level: $(j^1f)^*(j^1\ell) = j^1(\ell \circ f)$

This theorem reformulates backpropagation as a pullback operation on cotangent spaces, with the following geometric significance:

Coordinate Independence: Results do not depend on specific coordinate system choice
Functoriality: Satisfies $(d(g \circ f)_x)^* = (df_x)^* \circ (dg_{f(x)})^*$
Naturality: Compatible with smooth reparameterization
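
The pullback identity and its functoriality can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the maps `f`, `g`, and `loss` and their hand-written transpose-Jacobian functions are hypothetical and do not come from the paper. It pulls the covector $d\ell$ back through $g$ and then through $f$, and compares the result with a central finite-difference gradient in standard coordinates on $\mathbb{R}^2$.

```python
import math

# Hypothetical example maps (not from the paper): f, g : R^2 -> R^2, loss : R^2 -> R.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

def f_pullback(x, cot):
    # (df_x)^* applied to a covector: transpose of J_f = [[x1, x0], [cos(x0), 0]].
    return [x[1] * cot[0] + math.cos(x[0]) * cot[1], x[0] * cot[0]]

def g(y):
    return [y[0] + y[1] ** 2, y[0] * y[1]]

def g_pullback(y, cot):
    # (dg_y)^* applied to a covector: transpose of J_g = [[1, 2*y1], [y1, y0]].
    return [cot[0] + y[1] * cot[1], 2.0 * y[1] * cot[0] + y[0] * cot[1]]

def loss(z):
    return z[0] ** 2 + 3.0 * z[1]

def dloss(z):
    # Differential of the loss, i.e. the covector that seeds the reverse sweep.
    return [2.0 * z[0], 3.0]

def reverse_gradient(x):
    # Forward pass stores the intermediate points; the reverse pass composes
    # the pullbacks in the opposite order: (df_x)^* after (dg_{f(x)})^*.
    y = f(x)
    z = g(y)
    cot = dloss(z)
    cot = g_pullback(y, cot)
    return f_pullback(x, cot)

def finite_difference_gradient(x, h=1e-6):
    # Central differences of loss(g(f(x))), used only as an independent check.
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((loss(g(f(xp))) - loss(g(f(xm)))) / (2.0 * h))
    return grad

x0 = [0.7, -1.3]
print(reverse_gradient(x0))
print(finite_difference_gradient(x0))
```

The two printed vectors should agree up to finite-difference error, which is the coordinate expression of $(d(g \circ f)_x)^* = (df_x)^* \circ (dg_{f(x)})^*$ applied to $d\ell$.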

2. Taylor-Mode in Weil Algebras

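Theorem 2 (Exactness of Weil-Mode Evaluation): Let $W$ be a Weil algebra whose maximal ideal $\mathfrak{m}$ satisfies $\mathfrak{m}^{k+1} = 0$, and let $f: U \to \mathbb{R}^m$ be smooth on an open subset $U$ of Euclidean space. Then the lifted map $T_W f: T_W U \to T_W \mathbb{R}^m$ computes exactly all derivatives of $f$ at $x \in U$ up to order $k$, as the coefficients of the truncated Taylor expansion.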

Construction of Weil algebras:

Form: $W = \mathbb{R}[\varepsilon]/(\varepsilon^{k+1})$ or tensor product form
Nilpotency $\varepsilon^{k+1} = 0$ automatically implements truncation
Algebraic operations directly correspond to derivative propagation rules
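
As a concrete illustration of the one-variable case $W = \mathbb{R}[\varepsilon]/(\varepsilon^{k+1})$, the following minimal sketch stores an element of $W$ as its coefficient list and lets the nilpotency rule drop every product term of degree above $k$. The class `Jet` and the helpers `variable`, `jet_exp`, and `derivatives` are hypothetical names chosen for this example, not an API from the paper or its repository; the sketch assumes $k \geq 1$ and supports only `+`, `*`, and `exp`.

```python
import math

class Jet:
    """Element of R[eps]/(eps^(k+1)), stored as coefficients c[0..k]."""

    def __init__(self, coeffs):
        self.c = list(coeffs)

    @property
    def k(self):
        return len(self.c) - 1

    def __add__(self, other):
        other = lift(other, self.k)
        return Jet(a + b for a, b in zip(self.c, other.c))

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other, self.k)
        n = len(self.c)
        out = [0.0] * n
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c):
                if i + j < n:  # eps^(k+1) = 0: higher-degree terms vanish
                    out[i + j] += a * b
        return Jet(out)

    __rmul__ = __mul__

def lift(value, k):
    # Embed a real number as a constant element of the algebra.
    if isinstance(value, Jet):
        return value
    return Jet([float(value)] + [0.0] * k)

def variable(x, k):
    # The W-point x + eps (assumes k >= 1).
    return Jet([float(x), 1.0] + [0.0] * (k - 1))

def jet_exp(a):
    # exp(a0 + n) = exp(a0) * sum_{j<=k} n^j / j!, a finite sum because n is nilpotent.
    nil = Jet([0.0] + a.c[1:])
    term = lift(1.0, a.k)
    total = lift(1.0, a.k)
    for j in range(1, a.k + 1):
        term = term * nil * (1.0 / j)
        total = total + term
    return total * math.exp(a.c[0])

def derivatives(jet):
    # The n-th Taylor coefficient is f^(n)(x) / n!, so multiply back by n!.
    return [math.factorial(n) * c for n, c in enumerate(jet.c)]

# Example: f(x) = x * exp(x), whose n-th derivative is (x + n) * exp(x).
X = variable(0.5, 3)
print(derivatives(X * jet_exp(X)))
```

One evaluation of the lifted function returns the value and the first three derivatives together, with no nested differentiation passes.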

3. Tensorized Weil Algebras

Theorem 3 (Complexity of Tensorized Weil Algebras): Consider tensorized Weil algebra: $W \cong \bigotimes_{j=1}^p \mathbb{R}[\varepsilon_j]/(\varepsilon_j^{\rho_j+1}), \quad \dim W = \prod_{j=1}^p (\rho_j + 1)$

A single evaluation of $f$ at the $W$-point $x_W := x + \sum_{j=1}^p \varepsilon_j v^{(j)}$ yields all mixed directional derivatives along $v^{(1)}, \dots, v^{(p)}$ of order at most $\rho_j$ in the $j$-th direction, with time complexity $O(\dim W \cdot Q)$, where $Q$ is the number of scalar operations in the original program.
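
The tensorized construction can be sketched in the same spirit. The example below is a hedged illustration, not the paper's implementation: `MultiJet`, `w_point`, and `mixed_derivatives` are hypothetical names, coefficients are kept in a dictionary keyed by multi-index, only `+` and `*` are lifted, and every $\rho_j \geq 1$ is assumed. The coefficient of $\varepsilon^\alpha$ in $f(x_W)$ is the mixed directional derivative divided by $\alpha!$.

```python
import math

class MultiJet:
    """Element of the tensor product of R[eps_j]/(eps_j^(rho_j+1)), as {multi-index: coeff}."""

    def __init__(self, orders, coeffs=None):
        self.orders = tuple(orders)
        self.c = dict(coeffs or {})

    def _lift(self, value):
        if isinstance(value, MultiJet):
            return value
        zero = (0,) * len(self.orders)
        return MultiJet(self.orders, {zero: float(value)})

    def __add__(self, other):
        other = self._lift(other)
        out = dict(self.c)
        for idx, b in other.c.items():
            out[idx] = out.get(idx, 0.0) + b
        return MultiJet(self.orders, out)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        out = {}
        for ia, a in self.c.items():
            for ib, b in other.c.items():
                idx = tuple(i + j for i, j in zip(ia, ib))
                # eps_j^(rho_j+1) = 0: drop terms exceeding any per-direction order.
                if all(i <= r for i, r in zip(idx, self.orders)):
                    out[idx] = out.get(idx, 0.0) + a * b
        return MultiJet(self.orders, out)

    __rmul__ = __mul__

def w_point(x, directions, orders):
    # Coordinates of x_W = x + sum_j eps_j * v^(j) (assumes every rho_j >= 1).
    p = len(orders)
    point = []
    for i, xi in enumerate(x):
        coeffs = {(0,) * p: float(xi)}
        for j, v in enumerate(directions):
            unit = tuple(1 if t == j else 0 for t in range(p))
            coeffs[unit] = float(v[i])
        point.append(MultiJet(orders, coeffs))
    return point

def mixed_derivatives(jet):
    # Multiply each coefficient by alpha! to recover the mixed derivative.
    return {idx: c * math.prod(math.factorial(i) for i in idx)
            for idx, c in jet.c.items()}

# Example: f(x, y) = x^2 * y + y^3 at (2, 3), coordinate directions, orders (2, 1).
X, Y = w_point([2.0, 3.0], [(1.0, 0.0), (0.0, 1.0)], (2, 1))
result = mixed_derivatives(X * X * Y + Y * Y * Y)
for idx in sorted(result):
    print(idx, result[idx])
```

Here $\dim W = (2+1)(1+1) = 6$, and the single evaluation fills exactly six entries: the value, $\partial_x$, $\partial_x^2$, $\partial_y$, $\partial_x\partial_y$, and $\partial_x^2\partial_y$.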

Technical Innovations

Geometric Unification: First unification of all AD modes under a differential geometric framework
Algebraic Exactness: Nilpotency makes the truncation algebraically exact, so no approximation error is introduced beyond floating-point rounding
Linear Complexity: Tensorized method avoids combinatorial explosion of traditional nested methods
No Reverse Tape: Weil-mode only requires storing coefficient arrays, eliminating computational graph storage

Experimental Setup

Theoretical Verification

The author primarily validates the method's effectiveness through theoretical analysis, including:

Correctness Verification: Through functorial properties
Stability Analysis: Providing explicit error bounds
Complexity Analysis: Theoretical comparison with traditional methods

Stability Analysis

Lemma 1 (Backward Stability of Reverse Sweep): For a straight-line program with primitives $\{\phi_i\}_{i=1}^L$, if each exact adjoint $\phi_i^*$ and its computed counterpart $\hat{\phi}_i^*$ satisfy: $\|\phi_i^*(v)\| \leq L_i\|v\|, \quad \|\hat{\phi}_i^*(v) - \phi_i^*(v)\| \leq \delta_i\|\phi_i^*(v)\|$

then the computed pullback satisfies: $\|\hat{f}^*(\bar{y})\| \leq \left(\prod_{i=1}^L (1+\delta_i)L_i\right)\|\bar{y}\|$
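
The bound follows from a one-step estimate applied along the reverse sweep. The derivation below is a sketch that assumes the computed pullback is the composition $\hat{f}^* = \hat{\phi}_1^* \circ \cdots \circ \hat{\phi}_L^*$; this ordering is an assumption made here for illustration and is not spelled out in the summary. For a single primitive, the triangle inequality and the two hypotheses give:

$$
\|\hat{\phi}_i^*(v)\| \leq \|\phi_i^*(v)\| + \|\hat{\phi}_i^*(v) - \phi_i^*(v)\| \leq (1+\delta_i)\|\phi_i^*(v)\| \leq (1+\delta_i)L_i\|v\|.
$$

Applying this estimate for $i = L, L-1, \dots, 1$, starting from $v = \bar{y}$, multiplies the factors $(1+\delta_i)L_i$ and yields the stated product bound.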

Complexity Comparison

Method	Time Complexity	Space Complexity	Tape Required
Nested JVP/VJP	$O(\binom{p+k}{k} \cdot Q)$	$O(L)$ (tape)	Yes
Tensorized Weil	$O(\prod_{j=1}^p(\rho_j+1) \cdot Q)$	$O(\dim W)$	No
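
The two cost factors in the table can be evaluated for concrete sizes. The helper names below are hypothetical, and the snippet only computes the multipliers of $Q$; it does not measure runtime. The two factors range over different index sets (total order at most $k$ versus per-direction order at most $\rho_j$), so they are best compared for the specific derivatives a task needs.

```python
import math

def nested_factor(p, k):
    # Binomial factor from the "Nested JVP/VJP" row: C(p + k, k).
    return math.comb(p + k, k)

def tensorized_factor(orders):
    # dim W from the "Tensorized Weil" row: product of (rho_j + 1).
    return math.prod(r + 1 for r in orders)

# Example sizes: p = 3 directions, total order k = 2.
print(nested_factor(3, 2))           # C(5, 2) = 10
print(tensorized_factor([1, 1, 1]))  # 2 * 2 * 2 = 8
print(tensorized_factor([2, 2, 2]))  # 3 * 3 * 3 = 27
```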

Experimental Results

Theoretical Results Verification

Coefficient Growth Envelope

Corollary 1: Assume $f \in C^{k+1}(B_r(x), \mathbb{R}^m)$ and that its derivatives satisfy $\|D^\ell f(z)\| \leq M_\ell$ for all $z \in B_r(x)$ (here $\ell$ is a derivative order, not the loss function). Then the Taylor coefficients satisfy: $\|f_\alpha(x)\| \leq \frac{M_{|\alpha|}}{\alpha!}$

Truncation Stability

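For step size $\rho < r$ and $\|z - x\| \leq \rho$, the remainder satisfies the standard Cauchy estimate: $\|R_{k+1}(z)\| \leq \frac{M_{k+1}}{(k+1)!}\rho^{k+1}$ (here $\rho$ is a radius, unrelated to the truncation orders $\rho_j$ of Theorem 3).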
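
Both bounds can be checked on a scalar example. The sketch below uses $f = \sin$, for which every derivative is bounded by $M_\ell = 1$; the function names are hypothetical and the choice of $\sin$ is an assumption made purely for illustration.

```python
import math

def sin_taylor_coeffs(x, k):
    # The n-th derivative of sin at x is sin(x + n*pi/2); divide by n! for the coefficient.
    return [math.sin(x + n * math.pi / 2) / math.factorial(n) for n in range(k + 1)]

def taylor_eval(coeffs, h):
    # Evaluate the truncated expansion at displacement h = z - x.
    return sum(c * h ** n for n, c in enumerate(coeffs))

def remainder_bound(M, k, rho):
    # M_{k+1} / (k+1)! * rho^(k+1)
    return M / math.factorial(k + 1) * rho ** (k + 1)

x, k, rho = 0.3, 4, 0.5
coeffs = sin_taylor_coeffs(x, k)

# Coefficient envelope: |f_n(x)| <= M_n / n! with M_n = 1.
print(all(abs(c) <= 1.0 / math.factorial(n) for n, c in enumerate(coeffs)))

# Remainder bound on a grid of points z with |z - x| <= rho.
worst = max(abs(math.sin(x + h) - taylor_eval(coeffs, h))
            for h in [rho * (i / 50.0 - 1.0) for i in range(101)])
print(worst, remainder_bound(1.0, k, rho))
```

The observed worst-case error on the grid should sit below the bound, and shrinking `rho` or raising `k` tightens it as the formula predicts.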

Practical Performance Analysis

While the paper primarily focuses on theoretical analysis, it provides key performance insights:

Memory Efficiency: Weil-mode avoids reverse tape storage
Parallelization-Friendly: Coefficient operations naturally support vectorization
Numerical Stability: Truncation errors can be explicitly controlled

Main Research Directions

Categorical Perspective on AD: Elliott (2018) and Fong et al. (2019) proposed functorial formulations of AD
Geometric AD Theory: Betancourt (2018) explored jet geometry applications in AD
Higher-Order AD Algorithms: Giles (2008), Fike and Alonso (2012) analyzed numerical stability

Advantages of This Work

Theoretical Completeness: First to provide a complete geometric theoretical framework for AD
Practicality: Tensorized Weil algebra method has practical application value
Unification: Unifies reverse, forward, and higher-order AD under a single framework

Conclusions and Discussion

Main Conclusions

Geometric Unification: All AD modes can be uniformly understood within a differential geometric framework
Computational Advantages: Tensorized Weil algebras provide efficient methods for higher-order derivative computation
Theoretical Completeness: Provides complete theoretical analysis of correctness, stability, and complexity

Limitations

Implementation Complexity: Practical implementation of Weil algebras requires carefully designed data structures
Scope of Applicability: Primarily applicable to scenarios requiring dense mixed derivatives
Numerical Precision: Higher-order computations may face numerical precision challenges

Future Directions

Intrinsic AD on Manifolds: Extension to general Riemannian manifolds
PDE-Constrained Optimization: Application to variational and PDE-constrained problems
Higher-Order Tensor Compression: Developing compression techniques for coefficient arrays
Systematic Primitive Lifting: Systematically lifting linear algebra and special functions to Weil algebras

In-Depth Evaluation

Strengths

Strong Theoretical Innovation: First complete geometric theoretical framework for AD
Mathematical Rigor: All theorems have complete mathematical proofs
High Practical Value: Tensorized Weil algebra method addresses real computational problems
Clear Exposition: Complex mathematical concepts are explained relatively clearly

Weaknesses

Lack of Experimental Validation: Primarily theoretical work, lacking actual algorithm implementation and performance testing
Limited Application Scenarios: Primarily applicable to specific scenarios requiring higher-order derivatives
Insufficient Implementation Details: Limited guidance for practical system implementation

Impact

Academic Value: Provides new mathematical foundations for AD theory
Application Potential: Important application prospects in scientific computing and geometric deep learning
Inspirational Value: Provides new perspectives for related research areas

Applicable Scenarios

Scientific Computing: Physical simulations requiring high-precision higher-order derivatives
Optimization Algorithms: Efficient implementation of second-order optimization methods
Geometric Deep Learning: Neural network training on manifolds
Meta-Learning: Adaptive algorithms requiring higher-order gradients

References

The paper cites 18 important references, primarily including:

Elliott (2018): Functional formulation of AD
Fong et al. (2019): Categorical perspective on backpropagation
Betancourt (2018): Geometric theory of higher-order AD
Baydin et al. (2018): AD survey
Kolář et al. (1993): Natural operations in differential geometry

Overall Assessment: This is a high-quality theoretical paper that provides a novel geometric framework for automatic differentiation. Although it lacks experimental validation, its theoretical contributions are significant and supply mathematical foundations for further work in the field. Its primary value lies in theoretical unification and methodological innovation, which makes it important for advancing AD theory.