2025-11-16T06:16:12.477685

Approximation theory for 1-Lipschitz ResNets

Murari, Furuya, Schönlieb

1-Lipschitz neural networks are fundamental for generative modelling, inverse problems, and robust classifiers. In this paper, we focus on 1-Lipschitz residual networks (ResNets) based on explicit Euler steps of negative gradient flows and study their approximation capabilities. Leveraging the Restricted Stone-Weierstrass Theorem, we first show that these 1-Lipschitz ResNets are dense in the set of scalar 1-Lipschitz functions on any compact domain when width and depth are allowed to grow. We also show that these networks can exactly represent scalar piecewise affine 1-Lipschitz functions. We then prove a stronger statement: by inserting norm-constrained linear maps between the residual blocks, the same density holds when the hidden width is fixed. Because every layer obeys simple norm constraints, the resulting models can be trained with off-the-shelf optimisers. This paper provides the first universal approximation guarantees for 1-Lipschitz ResNets, laying a rigorous foundation for their practical use.



Basic information

Paper ID: 2505.12003
Title: Approximation theory for 1-Lipschitz ResNets
Authors: Davide Murari (University of Cambridge), Takashi Furuya (Doshisha University, RIKEN AIP), Carola-Bibiane Schönlieb (University of Cambridge)
Categories: cs.LG cs.NA math.NA
Venue: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Paper link: https://arxiv.org/abs/2505.12003v2

Abstract

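This paper studies the approximation capabilities of 1-Lipschitz residual networks (ResNets) built from explicit Euler steps of negative gradient flows. Using the Restricted Stone-Weierstrass Theorem, it first shows that, when width and depth are allowed to grow, these 1-Lipschitz ResNets are dense in the set of scalar 1-Lipschitz functions on any compact domain. It also shows that these networks can exactly represent scalar piecewise affine 1-Lipschitz functions. It then proves a stronger result: by inserting norm-constrained linear maps between the residual blocks, the same density holds when the hidden width is fixed. Because every layer obeys simple norm constraints, the resulting models can be trained with off-the-shelf optimisers.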

Background and motivation

Why the problem matters

1-Lipschitz neural networks play a foundational role in several important areas:

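Generative modelling: the discriminator (critic) of a Wasserstein GAN must be 1-Lipschitz so that, via Kantorovich-Rubinstein duality, it yields a valid estimate of the 1-Wasserstein distance
Inverse problems: in Plug-and-Play algorithms, a 1-Lipschitz constraint on the learned denoiser is used to guarantee convergence of the iterative scheme
Robust classifiers: controlling a network's Lipschitz constant improves its robustness to adversarial attacks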

Limitations of existing approaches

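Reduced expressivity: constraining a network's Lipschitz constant usually reduces its expressive power, often causing a noticeable drop in performance
Missing theory: the approximation properties of constrained networks are poorly understood, and different constraint strategies can lead to markedly different expressive power
No guarantees for ResNets: existing 1-Lipschitz ResNets lack rigorous approximation guarantees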

Motivation

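The paper aims to fill the gap in the theoretical analysis of 1-Lipschitz ResNets, giving a rigorous mathematical basis for understanding their approximation capabilities and theoretical support for their practical use.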

Core contributions

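First universal approximation theorem: the first universal approximation guarantees for 1-Lipschitz ResNets, showing that ResNets built from negative gradient flows are dense in the set of scalar 1-Lipschitz functions
Fixed-width approximation: by introducing norm-constrained linear maps between residual blocks, universal approximation is shown to hold even when the network width is fixed
Two proof strategies: one based on the Restricted Stone-Weierstrass Theorem and a constructive one based on piecewise affine functions
Practical architecture: the proposed architecture uses explicit, simple constraints and can be trained with standard optimisers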

Method in detail

Problem setup

The object of study is the space of 1-Lipschitz functions on a compact set $X \subset \mathbb{R}^d$: $C_1(X,\mathbb{R}) = \{g : X \to \mathbb{R} \mid |g(y) - g(x)| \leq \|y - x\|_2, \ \forall x,y \in X\}$

The goal is to construct sets of neural networks that are dense in $C_1(X,\mathbb{R})$ with respect to the uniform (sup) norm on $X$.
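Membership in $C_1(X,\mathbb{R})$ can be probed numerically. The plain-Python sketch below computes the largest difference quotient over pairs of sample points; the helper name `lipschitz_ratio_lower_bound`, the test functions and the sample of $X = [-1, 1]^2$ are hypothetical choices for illustration. The result is only a lower bound on the Lipschitz constant: a value above 1 certifies $g \notin C_1(X,\mathbb{R})$, while a value at most 1 is evidence, not proof.

```python
import itertools
import math
import random


def lipschitz_ratio_lower_bound(g, points):
    """Hypothetical helper: max |g(y) - g(x)| / ||y - x||_2 over distinct sample pairs.

    This is a lower bound on the Lipschitz constant of g on the sampled set.
    """
    best = 0.0
    for x, y in itertools.combinations(points, 2):
        dist = math.dist(x, y)
        if dist > 0.0:
            best = max(best, abs(g(y) - g(x)) / dist)
    return best


def euclidean_norm(p):
    # ||p||_2 is 1-Lipschitz by the reverse triangle inequality.
    return math.hypot(*p)


def doubled_first_coordinate(p):
    # 2 * p_1 is 2-Lipschitz, so it is not in C_1(X, R).
    return 2.0 * p[0]


random.seed(0)
# Hypothetical sample of the compact set X = [-1, 1]^2.
samples = [(random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)) for _ in range(200)]

ratio_norm = lipschitz_ratio_lower_bound(euclidean_norm, samples)
ratio_doubled = lipschitz_ratio_lower_bound(doubled_first_coordinate, samples)
print(f"||x||_2 : {ratio_norm:.4f}")     # should not exceed 1 (up to rounding)
print(f"2 * x_1 : {ratio_doubled:.4f}")  # exceeds 1, so 2 * x_1 is not in C_1
```

Sampling can only refute membership; certifying it requires an argument such as the layer-wise norm constraints discussed below.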

Core building blocks

1-Lipschitz residual layer

Each layer is an explicit Euler step of a negative gradient flow: $\Phi_{\theta_\ell}(x) = x - \tau_\ell W_\ell^T \sigma(W_\ell x + b_\ell)$

where $\sigma = \text{ReLU}$ and the parameters satisfy $0 \leq \tau_\ell \leq 2/\|W_\ell\|_2^2$ and $\|W_\ell\|_2 \leq 1$.
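Why these constraints give a 1-Lipschitz layer: $W^T\sigma(Wx+b)$ is the gradient of $V(x) = \sum_i \gamma(w_i^T x + b_i)$ with $\gamma(s) = \tfrac12\max\{s,0\}^2$, so $\Phi$ is one Euler step of $\dot x = -\nabla V(x)$. Wherever $\Phi$ is differentiable its Jacobian is $I - \tau W^T D W$, with $D$ diagonal and entries in $\{0,1\}$. The symmetric matrix $W^T D W$ has eigenvalues in $[0, \|W\|_2^2]$, so the Jacobian's eigenvalues lie in $[1 - \tau\|W\|_2^2, 1] \subseteq [-1, 1]$ and its spectral norm is at most 1. The plain-Python sketch below checks this on random samples. As our own simplification (not the paper's), it rescales $W$ by its Frobenius norm, an upper bound on $\|W\|_2$, so that both constraints hold without computing a spectral norm; all function names are hypothetical.

```python
import math
import random


def relu(z):
    return max(z, 0.0)


def mat_vec(W, x):
    # W has shape (h, d) as a list of rows; returns W x.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def mat_T_vec(W, y):
    # Returns W^T y for W of shape (h, d) and y of length h.
    return [sum(W[i][j] * y[i] for i in range(len(W))) for j in range(len(W[0]))]


def frobenius_norm(W):
    return math.sqrt(sum(w * w for row in W for w in row))


def residual_layer(x, W, b, tau):
    """Phi(x) = x - tau * W^T relu(W x + b): one explicit Euler step."""
    activation = [relu(s + bi) for s, bi in zip(mat_vec(W, x), b)]
    correction = mat_T_vec(W, activation)
    return [xi - tau * ci for xi, ci in zip(x, correction)]


def random_constrained_layer(d, h, rng):
    """Hypothetical: random (W, b, tau) with ||W||_2 <= ||W||_F <= 1, tau = 2 / ||W||_F^2."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(h)]
    scale = max(1.0, frobenius_norm(W))
    W = [[w / scale for w in row] for row in W]
    b = [rng.gauss(0.0, 1.0) for _ in range(h)]
    tau = 2.0 / frobenius_norm(W) ** 2  # <= 2 / ||W||_2^2, so the step-size bound holds
    return W, b, tau


rng = random.Random(0)
W, b, tau = random_constrained_layer(d=3, h=5, rng=rng)
points = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(100)]
images = [residual_layer(p, W, b, tau) for p in points]
worst = max(
    math.dist(images[i], images[j]) / math.dist(points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
)
print(f"largest sampled ratio with admissible tau: {worst:.4f}")  # should not exceed 1

# Violating the step-size bound breaks the property: with W = [[1]], b = [0], tau = 3,
# Phi(x) = x - 3 relu(x) = -2x for x > 0, which has slope -2.
bad = [residual_layer([t], [[1.0]], [0.0], 3.0)[0] for t in (0.5, 1.0)]
print(bad)  # [-1.0, -2.0]
```

The counterexample at the end shows that the upper bound on $\tau_\ell$ is not cosmetic: a step of 3 with $\|W\|_2 = 1$ doubles distances.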

Network architectures

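Networks with unbounded width and depth (intersecting with $C_1(X,\mathbb{R})$ keeps only the compositions that are 1-Lipschitz): $\mathcal{G}_{d,\sigma}(X,\mathbb{R}) = C_1(X,\mathbb{R}) \cap \{v^T \circ \Phi_{\theta_L} \circ \cdots \circ \Phi_{\theta_1} \circ Q : X \to \mathbb{R}\}$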

Networks with a fixed hidden width, indexed by $h$: $\tilde{\mathcal{G}}_{d,\sigma,h}(X,\mathbb{R}) = \{v^T \circ \Phi_{\theta_L} \circ A_{L-1} \circ \cdots \circ A_1 \circ \Phi_{\theta_1} \circ Q : X \to \mathbb{R}\}$

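where each $A_i$ is a norm-constrained affine map, $Q$ maps the input into the hidden space, and $v^T$ is a linear read-out to $\mathbb{R}$.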
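Because a composition of 1-Lipschitz maps is 1-Lipschitz, a fixed-width network of this shape stays in $C_1(X,\mathbb{R})$ as long as every factor does. The sketch below assembles $v^T \circ \Phi_{\theta_2} \circ A_1 \circ \Phi_{\theta_1} \circ Q$ in plain Python. The constraints are our assumption, not necessarily the paper's: $\|Q\|_2 \le 1$, $A_1(z) = Mz + c$ with $\|M\|_2 \le 1$, and $\|v\|_2 \le 1$, each enforced by Frobenius rescaling; the widths and helper names are hypothetical. It uses plain $\Phi_\theta$ blocks rather than the special layers introduced for the fixed-width result, so it illustrates only the Lipschitz bound, not the approximation property.

```python
import math
import random


def relu(z):
    return max(z, 0.0)


def mat_vec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def mat_T_vec(W, y):
    return [sum(W[i][j] * y[i] for i in range(len(W))) for j in range(len(W[0]))]


def normalize(W):
    # Assumed norm constraint: rescale so that ||W||_2 <= ||W||_F <= 1.
    fro = math.sqrt(sum(w * w for row in W for w in row))
    s = max(1.0, fro)
    return [[w / s for w in row] for row in W]


def random_matrix(rows, cols, rng):
    return normalize([[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)])


def make_residual(width, hidden, rng):
    # Residual layer x - tau W^T relu(W x + b) with W of shape (hidden, width).
    W = random_matrix(hidden, width, rng)
    b = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    tau = 2.0 / sum(w * w for row in W for w in row)  # 2/||W||_F^2 <= 2/||W||_2^2

    def phi(x):
        act = [relu(s + bi) for s, bi in zip(mat_vec(W, x), b)]
        return [xi - tau * ci for xi, ci in zip(x, mat_T_vec(W, act))]

    return phi


def make_affine(width, rng):
    # Hypothetical norm-constrained affine map A(z) = M z + c with ||M||_2 <= 1.
    M = random_matrix(width, width, rng)
    c = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return lambda z: [s + ci for s, ci in zip(mat_vec(M, z), c)]


d, h = 2, 4  # hypothetical input dimension and fixed hidden width
rng = random.Random(1)
Q = random_matrix(h, d, rng)
phi1, phi2 = make_residual(h, 6, rng), make_residual(h, 6, rng)
A1 = make_affine(h, rng)
v = random_matrix(1, h, rng)[0]


def network(x):
    # v^T o Phi_2 o A_1 o Phi_1 o Q
    z = phi2(A1(phi1(mat_vec(Q, x))))
    return sum(vi * zi for vi, zi in zip(v, z))


pts = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(150)]
outs = [network(p) for p in pts]
worst = max(
    abs(outs[i] - outs[j]) / math.dist(pts[i], pts[j])
    for i in range(len(pts))
    for j in range(i + 1, len(pts))
)
print(f"largest sampled ratio: {worst:.4f}")  # should not exceed 1 (up to rounding)
```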

Technical contributions

1. Two proof strategies

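Stone-Weierstrass approach: show that the network set is a lattice (closed under pointwise maximum and minimum) that separates points, so that it satisfies the conditions of the Restricted Stone-Weierstrass Theorem
Constructive approach: show that the networks can exactly represent every scalar piecewise affine 1-Lipschitz function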
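A one-layer instance of exact representation (our own illustration, not taken from the paper): for $d = 1$, take $Q = 1$, $W = 1$, $b = 0$ and the largest admissible step $\tau = 2/\|W\|_2^2 = 2$. Then $\Phi(x) = x - 2\max\{x, 0\} = -|x|$, so the read-out $v = -1$ gives $v^T\Phi(x) = |x|$, a piecewise affine 1-Lipschitz function reproduced with no approximation error. With bias $b = -c$ the same layer gives $v^T\Phi(x) = |x - c| - c$. The Python check below compares both identities on a dyadic grid, where the floating-point arithmetic is exact.

```python
def relu(z):
    return max(z, 0.0)


def one_layer_network(x, b=0.0):
    """v^T Phi(Q x) with Q = 1, W = 1, bias b, tau = 2 and read-out v = -1 (scalar case)."""
    W, tau, v = 1.0, 2.0, -1.0
    phi = x - tau * W * relu(W * x + b)
    return v * phi


grid = [k / 8 for k in range(-16, 17)]  # points in [-2, 2], exactly representable
c = 0.5
abs_errors = [abs(one_layer_network(x) - abs(x)) for x in grid]
shift_errors = [abs(one_layer_network(x, b=-c) - (abs(x - c) - c)) for x in grid]
print(max(abs_errors), max(shift_errors))  # 0.0 0.0
```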

2. Fixed-width design

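A special residual layer is introduced, which sorts the first two coordinates, leaves the third unchanged, and applies a residual layer $\tilde{\Phi}_\theta$ to the remaining $h$ coordinates: $\tilde{\mathcal{E}}_{h,\sigma} = \left\{\Phi_\theta : \mathbb{R}^{h+3} \to \mathbb{R}^{h+3} \mid \Phi_\theta(x) = \begin{bmatrix} \max\{x_1, x_2\} \\ \min\{x_1, x_2\} \\ x_3 \\ \tilde{\Phi}_\theta(x_{4:}) \end{bmatrix}\right\}$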
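The max/min rows of this layer are compatible with the gradient-step form $\Phi(x) = x - \tau W^T\sigma(Wx + b)$ (our own check; the paper's construction may differ). With $W = \tfrac{1}{\sqrt{2}}[-1,\ 1,\ 0]$, $b = 0$ and $\tau = 2 = 2/\|W\|_2^2$, one gets $\tau W^T\sigma(Wx) = [-1,\ 1,\ 0]^T \max\{x_2 - x_1, 0\}$, hence first coordinate $x_1 + \max\{x_2 - x_1, 0\} = \max\{x_1, x_2\}$, second coordinate $x_2 - \max\{x_2 - x_1, 0\} = \min\{x_1, x_2\}$, and $x_3$ unchanged. The Python sketch below verifies this identity on random inputs up to rounding; `gradient_step` is a hypothetical helper name.

```python
import math
import random


def relu(z):
    return max(z, 0.0)


def gradient_step(x, w, b, tau):
    """Phi(x) = x - tau * w * relu(<w, x> + b) for a single-row weight matrix W = w^T."""
    s = relu(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [xi - tau * wi * s for xi, wi in zip(x, w)]


w = [-1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0), 0.0]  # ||w||_2 = 1
tau = 2.0  # = 2 / ||w||_2^2, the largest admissible step

rng = random.Random(2)
max_err = 0.0
for _ in range(1000):
    x = [rng.uniform(-5.0, 5.0) for _ in range(3)]
    y = gradient_step(x, w, 0.0, tau)
    target = [max(x[0], x[1]), min(x[0], x[1]), x[2]]
    max_err = max(max_err, max(abs(a - t) for a, t in zip(y, target)))
print(f"max deviation from (max, min, x3): {max_err:.2e}")  # rounding-level only
```

Since the step size sits exactly at the bound $2/\|W\|_2^2$, this sorting map is 1-Lipschitz while realising the pointwise max and min of two coordinates.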