Distributionally robust approximation property of neural networks

Ceylan, Prömel

The universal approximation property uniformly with respect to weakly compact families of measures is established for several classes of neural networks. To that end, we prove that these neural networks are dense in Orlicz spaces, thereby extending classical universal approximation theorems even beyond the traditional $L^p$-setting. The covered classes of neural networks include widely used architectures like feedforward neural networks with non-polynomial activation functions, deep narrow networks with ReLU activation functions and functional input neural networks.

Basic information

Paper ID: 2510.09177
Title: Distributionally robust approximation property of neural networks
Authors: Mihriban Ceylan, David J. Prömel
Categories: stat.ML cs.LG math.FA math.PR
Published: October 13, 2025
Paper link: https://arxiv.org/abs/2510.09177

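Research background and motivation

Problem definition

The core problem addressed in this work is to establish a distributionally robust approximation property for neural networks. Classical universal approximation theorems (UATs) only consider approximation in $L^p(\mu)$ for a single fixed measure $\mu$. This paper instead shows that neural networks can approximate functions uniformly over a weakly compact family of measures $\mathcal{M}$: for a given function $f$ and any $\varepsilon > 0$, there exists a neural network $\eta$ such that $\sup_{\nu \in \mathcal{M}} \|f - \eta\|_{L^1(\nu)} < \varepsilon$.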

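Importance of the research

Theoretical significance: extends the classical universal approximation theorems from the single-distribution setting to uniform approximation over a family of distributions.
Practical need: in machine learning practice, uncertainty about the data distribution, known as distributional uncertainty, is a pervasive challenge.
Application value: provides a theoretical foundation for areas such as distributionally robust learning, adversarial training, and learning from noisy data.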

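Limitations of existing approaches

Classical universal approximation theorems have the following limitations:

Single-distribution restriction: the approximation property is established only in $L^p(\mu)$ for one fixed measure $\mu$.
Restriction on function spaces: the results are mostly confined to the $L^p$ framework and lack a theory for more general function spaces.
Lack of robustness: they cannot handle distribution shift or distributional uncertainty.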

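Research motivation

The work is motivated by:

The prevalence of distributional uncertainty in real applications (for example, Knightian uncertainty and adversarial examples).
The need for theory that supports distributionally robust optimization and statistical learning.
The theoretical need to extend neural network approximation results from $L^p$ spaces to the more general Orlicz spaces.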

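Core contributions

Universal approximation theorem in Orlicz spaces: proves for the first time that several classes of neural networks are dense in Orlicz spaces with respect to the Luxemburg norm, a substantial generalization of the classical $L^p$ results.
Distributionally robust approximation property: establishes a distributionally robust universal approximation theorem for neural networks with respect to weakly compact families of measures, providing a theoretical basis for handling distributional uncertainty.
Broad coverage of network architectures: covers several important neural network architectures:
- Feedforward networks with bounded non-polynomial activation functions
- Deep narrow networks with ReLU activation
- Functional input neural networks
Novel theoretical framework: Orlicz space theory provides a unified mathematical framework for treating different loss functions (such as cross-entropy and KL divergence).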

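Method details

Task definition

Given a weakly compact family of measures $\mathcal{M}$ and a suitable function $f: \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$, for any $\varepsilon > 0$, find a neural network $\eta$ such that $\sup_{\nu \in \mathcal{M}} \|f - \eta\|_{L^1(\nu)} < \varepsilon$.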
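To make the worst-case criterion concrete, the following minimal sketch estimates $\sup_{\nu \in \mathcal{M}} \|f - \eta\|_{L^1(\nu)}$ by Monte Carlo. Everything in it is an assumption chosen for illustration and not taken from the paper: the finite family of Gaussian measures standing in for $\mathcal{M}$, the target `f = sin`, the small ReLU-style surrogate `eta`, and the helper name `worst_case_l1_error`.

```python
import numpy as np

def worst_case_l1_error(f, eta, samplers, n_samples=20_000, seed=0):
    """Monte Carlo estimate of max over nu of E_nu[|f - eta|] for a finite family."""
    rng = np.random.default_rng(seed)
    errors = []
    for sample in samplers:
        x = sample(rng, n_samples)                      # draws from one measure nu
        errors.append(float(np.mean(np.abs(f(x) - eta(x)))))
    return max(errors), errors

# Hypothetical target function.
f = np.sin

# Hypothetical surrogate network: clip(x, -1, 1) = relu(x + 1) - relu(x - 1) - 1.
def eta(x):
    return np.clip(x, -1.0, 1.0)

# Hypothetical family: unit-variance Gaussians with shifted means.
means = [-1.0, -0.5, 0.0, 0.5, 1.0]
samplers = [lambda rng, n, m=m: rng.normal(m, 1.0, size=n) for m in means]

worst, per_measure = worst_case_l1_error(f, eta, samplers)
print("per-measure L1 errors:", per_measure)
print("worst-case L1 error:", worst)
```

A single network is scored against every measure in the family and only the largest error counts, which is what distinguishes this criterion from the single-measure $L^p(\mu)$ setting.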

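Theoretical framework

Orlicz space framework

The paper builds its mathematical framework on the theory of Orlicz spaces. For a Young function $\varphi$, the Orlicz space is defined as: $L^\varphi(\mu; \mathbb{R}^{N_L}) := \{f: \mathbb{R}^{N_0} \to \mathbb{R}^{N_L} : \int_{\mathbb{R}^{N_0}} \varphi(\alpha\|f\|) \, d\mu < \infty \text{ for some } \alpha > 0\}$

It is equipped with the gauge (Luxemburg) norm: $N_{\varphi,\mu}(f) := \inf\{k > 0: \int_{\mathbb{R}^{N_0}} \varphi(\|f\|/k) \, d\mu \le 1\}$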
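The gauge norm generally has no closed form, but for a discrete measure it can be computed numerically. The sketch below uses bisection. It assumes that the measure is given by finitely many points with nonnegative weights and that `phi` is a nondecreasing, vectorized Young function. The function name `gauge_norm` and the example data are hypothetical. For $\varphi(t) = t^p$ the gauge norm reduces to the usual $L^p$ norm, which gives a simple sanity check.

```python
import numpy as np

def gauge_norm(values, weights, phi, tol=1e-10):
    """inf{k > 0 : sum_i weights_i * phi(|values_i| / k) <= 1}, found by bisection."""
    a = np.abs(np.asarray(values, dtype=float))
    w = np.asarray(weights, dtype=float)
    if not np.any(a[w > 0] > 0):
        return 0.0                      # f vanishes almost everywhere
    modular = lambda k: float(np.sum(w * phi(a / k)))
    lo, hi = 0.0, 1.0
    while modular(hi) > 1.0:            # enlarge hi until the constraint holds
        hi *= 2.0
    while hi - lo > tol * hi:           # feasible k form an interval [k*, inf)
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# Sanity check: phi(t) = t^2 with counting measure on two points is the Euclidean norm.
l2 = gauge_norm([3.0, 4.0], [1.0, 1.0], lambda t: t**2)
print(l2)  # approximately 5.0

# An N-function that is not a power: phi(t) = exp(t) - t - 1.
exp_norm = gauge_norm([3.0, 4.0], [0.5, 0.5], lambda t: np.exp(t) - t - 1.0)
print(exp_norm)
```

Bisection applies because the map $k \mapsto \int \varphi(\|f\|/k) \, d\mu$ is nonincreasing in $k$, so the set of admissible $k$ is an interval that is unbounded above.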

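Neural network definitions

Feedforward neural network: $\eta = w_L \circ \varrho \circ w_{L-1} \circ \cdots \circ \varrho \circ w_1$, where each $w_l$ is an affine map and $\varrho$ is the activation function applied componentwise.
Functional input neural network: $\eta(x) = \sum_{n=1}^N y_n \varrho(h_n(x))$, where $h_n \in \mathcal{H}$ and $\mathcal{H}$ is an additive family.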
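The feedforward composition can be written out directly. The following is a minimal NumPy sketch. The function name `feedforward`, the layer representation as `(A, b)` pairs, and the example weights are assumptions for illustration. It covers only the feedforward case, since this summary does not specify the additive family $\mathcal{H}$ closely enough to implement it faithfully.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def feedforward(x, layers, activation=relu):
    """Evaluate w_L ∘ ϱ ∘ w_{L-1} ∘ ... ∘ ϱ ∘ w_1 on a batch x of shape (n, N_0).

    `layers` is a list of (A, b) pairs, each encoding the affine map z -> A z + b.
    """
    *hidden, (A_last, b_last) = layers
    h = x
    for A, b in hidden:
        h = activation(h @ A.T + b)     # affine map followed by the activation
    return h @ A_last.T + b_last        # final affine map, no activation

# Two-layer ReLU network computing relu(x + 1) - relu(x - 1) - 1 = clip(x, -1, 1).
layers = [
    (np.array([[1.0], [1.0]]), np.array([1.0, -1.0])),   # w_1: R -> R^2
    (np.array([[1.0, -1.0]]), np.array([-1.0])),         # w_2: R^2 -> R
]
x = np.array([[-2.0], [0.5], [3.0]])
out = feedforward(x, layers)
print(out.ravel())  # [-1.   0.5  1. ]
```

The activation is applied after every affine map except the last, matching the order of composition in the formula.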

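Core theorems

Theorem 2.3 (universal approximation theorem in Orlicz spaces)

For an N-function $\varphi$ and a locally finite Borel measure $\mu$, neural networks are dense in the Orlicz heart $M^\varphi(\mu)$ with respect to the gauge norm. The result covers:

Bounded non-constant activation functions (finite measures)
ReLU activation function (locally finite measures)
Continuous non-polynomial activation functions (compactly supported measures)
Functional input neural networks (under specific conditions)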

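Theorem 3.1 (distributionally robust universal approximation theorem)

For a weakly compact family of measures $\mathcal{M}$ with associated Young pair $(\varphi_\mathcal{M}, \psi_\mathcal{M})$, and for any $f \in M^{\varphi_\mathcal{M}}(\mu; \mathbb{R}^{N_L})$ and $\varepsilon > 0$, there exists a neural network $\eta$ of the corresponding class such that: $\sup_{\nu \in \mathcal{M}} \|f - \eta\|_{L^1(\nu; \mathbb{R}^{N_L})} < \varepsilon$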
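One way to see how density in the Orlicz heart (Theorem 2.3) could yield the uniform $L^1$ bound is the generalized Hölder inequality for a complementary Young pair. The sketch below assumes, beyond what this summary states, that every $\nu \in \mathcal{M}$ has a density $\frac{d\nu}{d\mu}$ with respect to a common reference measure $\mu$ and that the constant $C$ defined below is finite and positive. It is an illustration and may differ from the argument in the paper.

```latex
% Assumption (not stated in the summary): 0 < C < \infty, where
%   C := \sup_{\nu \in \mathcal{M}} N_{\psi_\mathcal{M},\mu}\!\left(\tfrac{d\nu}{d\mu}\right).
\begin{align*}
\|f - \eta\|_{L^1(\nu)}
  &= \int_{\mathbb{R}^{N_0}} \|f - \eta\| \, \frac{d\nu}{d\mu} \, d\mu \\
  &\le 2 \, N_{\varphi_\mathcal{M},\mu}(f - \eta) \,
         N_{\psi_\mathcal{M},\mu}\!\left(\frac{d\nu}{d\mu}\right)
   && \text{(H\"older inequality for gauge norms)} \\
  &\le 2 C \, N_{\varphi_\mathcal{M},\mu}(f - \eta).
\end{align*}
% By density in the Orlicz heart, choose \eta with
%   N_{\varphi_\mathcal{M},\mu}(f - \eta) < \varepsilon / (2C).
% The bound is independent of \nu, so taking the supremum gives
%   \sup_{\nu \in \mathcal{M}} \|f - \eta\|_{L^1(\nu)} < \varepsilon.
```

The point of the sketch is that a single approximation in the $\varphi_\mathcal{M}$-gauge norm controls the $L^1(\nu)$ error for all $\nu \in \mathcal{M}$ simultaneously, since the dependence on $\nu$ enters only through the uniformly bounded factor.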