2025-11-15T09:01:12.242557

Numerical Methods for Kernel Slicing

Rux, Hertrich, Neumayer
Kernels are key in machine learning for modeling interactions. Unfortunately, brute-force computation of the related kernel sums scales quadratically with the number of samples. Recent Fourier-slicing methods lead to an improved linear complexity, provided that the kernel can be sliced and its Fourier coefficients are known. To obtain these coefficients, we view the slicing relation as an inverse problem and present two algorithms for their recovery. Extensive numerical experiments demonstrate the speed and accuracy of our methods.
academic

Numerical Methods for Kernel Slicing

Basic Information

  • Paper ID: 2510.11478
  • Title: Numerical Methods for Kernel Slicing
  • Authors: Nicolaj Rux (Chemnitz University of Technology), Johannes Hertrich (Université Paris Dauphine-PSL and Inria Mokaplan), Sebastian Neumayer (Chemnitz University of Technology)
  • Classification: math.NA, cs.NA
  • Publication Date: October 14, 2025
  • Paper Link: https://arxiv.org/abs/2510.11478v1

Abstract

Kernel functions are crucial for modeling interactions in machine learning. However, brute-force computation of kernel sums exhibits quadratic complexity growth with respect to sample size. Recent Fourier slicing methods can reduce this complexity to linear, provided that the kernel can be sliced and its Fourier coefficients are known. To obtain these coefficients, this paper formulates the slicing relationship as an inverse problem and proposes two recovery algorithms. Extensive numerical experiments demonstrate the speed and accuracy of the proposed methods.

Research Background and Motivation

Core Problem

Kernel methods are widely applied in machine learning for density estimation, support vector machine classification, principal component analysis, maximum mean discrepancy (MMD), and other tasks. The computational bottleneck in these applications typically involves evaluating expressions of the form:

sm:=n=1NF(xnym)wn,m=1,,Ms_m := \sum_{n=1}^N F(\|x_n - y_m\|)w_n, \quad m = 1,\ldots,M

where FC([0,))F \in C([0,\infty)) is a radial basis function, x1,,xN,y1,,yMRdx_1,\ldots,x_N, y_1,\ldots,y_M \in \mathbb{R}^d are sample points, and wRNw \in \mathbb{R}^N are weights.

Computational Complexity Challenges

Direct computation requires O(NMd)O(NMd) operations, which is infeasible for large datasets. Classical methods such as fast Fourier summation and fast multipole methods, while reducing complexity to O(M+N)O(M+N), suffer from exponential dependence on dimension d>4d > 4 due to reliance on fast Fourier transforms or spatial partitioning, rendering them impractical.

Advantages of Slicing Algorithms

The fundamental idea of slicing algorithms is to find a function fLloc1([0,))f \in L^1_{loc}([0,\infty)) such that:

F(x)=1ωd1Sd1f(ξ,x)dξF(\|x\|) = \frac{1}{\omega_{d-1}} \int_{S^{d-1}} f(|\langle\xi, x\rangle|)d\xi

where ωd1=2πd/2/Γ(d/2)\omega_{d-1} = 2\pi^{d/2}/\Gamma(d/2) is the surface measure of the dd-dimensional sphere. By discretizing the integral, kernel summation can be reduced to one-dimensional cases and computed efficiently using fast Fourier summation.

Core Contributions

  1. Formalization of slicing function recovery as an inverse problem, establishing a complete theoretical framework
  2. Proposal of two numerical algorithms for recovering cosine series coefficients required for fast Fourier summation
  3. Provision of rigorous error estimates, including analysis of forward error and slicing error
  4. Extensive numerical experiments validating the efficiency and accuracy of the methods on various kernel functions
  5. Extension of method applicability to handle unknown slicing functions without analytical knowledge

Methodology Details

Problem Formulation

Given a radial basis function F:[0,)RF: [0,\infty) \to \mathbb{R}, find a function f:[0,)Rf: [0,\infty) \to \mathbb{R} such that the slicing relationship F=Sd[f]F = S_d[f] holds, where SdS_d is a generalized Riemann-Liouville fractional integral operator:

Sd[f](s)=01f(ts)ϱd(t)dtS_d[f](s) = \int_0^1 f(ts)\varrho_d(t)dt

where ϱd(t):=cd(1t2)(d3)/2\varrho_d(t) := c_d(1-t^2)^{(d-3)/2}, cd:=2Γ(d/2)πΓ((d1)/2)c_d := \frac{2\Gamma(d/2)}{\sqrt{\pi}\Gamma((d-1)/2)}.

Model Architecture

1. Optimization Problem Formulation

The slicing function recovery is transformed into a regularized minimization problem:

a^=argminaRKSd[fa]FH2+τ2faG2\hat{a} = \arg\min_{a \in \mathbb{R}^K} \|S_d[f_a] - F\|_H^2 + \tau^2\|f_a\|_G^2

where fa=C1[a]f_a = C^{-1}[a] is a KK-term cosine series:

fa(t)=a0+2k=1K1akcos(πkt)f_a(t) = a_0 + \sqrt{2}\sum_{k=1}^{K-1} a_k \cos(\pi kt)

2. Spatial Domain Method (Algorithm 1)

  • Matrix Construction: Compute hk:=Sd[gk]h_k := S_d[g_k], where gkg_k are cosine basis functions
  • Discretization: Approximate integrals using Gauss-Legendre quadrature
  • Solution: Solve the least squares problem H^Tab^22+τ2Da22\|\hat{H}^T a - \hat{b}\|_2^2 + \tau^2\|Da\|_2^2

3. Frequency Domain Method (Algorithm 2)

  • Operator Representation: Construct matrix representation of operator S:=CSdC1S := C \circ S_d \circ C^{-1}
  • Coefficient Calculation: Utilize the relationship Sj,k=Sd[sinc(+j)+sinc(j)](k)S_{j,k} = S_d[\text{sinc}(\cdot + j) + \text{sinc}(\cdot - j)](k)
  • Optimization Solution: Solve the regularized problem in frequency domain space

Technical Innovations

  1. Theoretical Foundation: Establish boundedness theory of slicing operator SdS_d on different function spaces
  2. Numerical Stability: Address ill-posed problems through Tikhonov regularization
  3. Error Decomposition: Decompose total error into forward error and slicing error components
  4. Convergence Analysis: Prove convergence rates under function smoothness assumptions

Experimental Setup

Datasets

Multiple radial basis functions are tested:

  • Gaussian: F(s)=exp(s2/(2c2))F(s) = \exp(-s^2/(2c^2))
  • Laplace: F(s)=exp(cs)F(s) = \exp(-c|s|)
  • Inverse Multiquadric (IMQ): F(s)=(c2+s2)1/2F(s) = (c^2 + s^2)^{-1/2}
  • Thin Plate Spline (TPS): F(s)=(cs)2log(cs)F(s) = (cs)^2\log(|cs|)
  • Logarithmic Kernel (LOG): F(s)=log(cs)F(s) = \log(|cs|)
  • Bump Function and Multiquadric (MQ)

Evaluation Metrics

  • Forward Error: FK(s)F(s)|F_K(s) - F(s)|
  • Relative L2 Error: ss^2/s2\|s - \hat{s}\|_2/\|s\|_2
  • Runtime Comparison

Comparison Methods

  • Direct Method: Truncated Fourier series when analytical solution f=Sd1[F]f = S_d^{-1}[F] is known
  • PyKeOps: Highly optimized GPU brute-force computation package
  • Three Configurations: S-L2-H1, F-L2-H1, F-H1-H1

Implementation Details

  • Use L=210L = 2^{10} quadrature points
  • K=28K = 2^8 cosine coefficients in domain, J=210J = 2^{10} in range
  • Regularization parameter τ{106,107,104}\tau \in \{10^{-6}, 10^{-7}, 10^{-4}\}

Experimental Results

Main Results

Forward Error Analysis

For Laplace and Bump functions, forward error FK(s)F(s)|F_K(s) - F(s)| remains below 10210^{-2} across the entire interval [0,1][0,1], with slightly larger errors in irregular regions (e.g., at s=0s=0 for Laplace function).

Fast Kernel Summation Accuracy

In tests with d=1000d=1000 dimensions and N=M=104N=M=10^4 samples:

FunctionS-L2-H1F-L2-H1F-H1-H1Direct
Gaussian6.53×10⁻³6.62×10⁻³6.61×10⁻³6.56×10⁻³
Laplace8.58×10⁻³8.32×10⁻³1.30×10⁻²5.90×10⁻³
IMQ2.25×10⁻³2.27×10⁻³2.28×10⁻³2.26×10⁻³
LOG1.00×10⁻¹1.80×10⁻¹1.55×10⁻¹2.98×10¹

Runtime Comparison

  • Computational Overhead: Coefficient computation time approximately 0.1 seconds (GPU) to 1.3 seconds (CPU)
  • Acceleration Effect: Fast summation methods begin to outperform brute-force methods when N3×103N \geq 3 \times 10^3
  • Significant Speedup: Approximately 50-fold acceleration achieved for N=5×104N = 5 \times 10^4 samples

Ablation Studies

The choice of regularization parameter τ\tau is critical:

  • Excessively small τ\tau leads to numerical instability
  • Excessively large τ\tau results in over-regularization
  • Optimal values typically fall within the range 10610^{-6} to 10410^{-4}

Development of Slicing Methods

  • Originally appeared in random one-dimensional projections of Wasserstein distance
  • Extended to kernel metrics such as MMD
  • Closely related to random Fourier features but more general

Fast Kernel Summation Methods

  • Traditional Methods: Non-uniform fast Fourier transform, fast multipole methods
  • High-Dimensional Challenges: Curse of dimensionality limits applicability of traditional methods
  • GPU Implementation: KeOps and similar tools remain competitive at moderate dimensions

Theoretical Foundations

The slicing relationship has multiple names in harmonic analysis and fractional calculus:

  • Adjoint Radon transform
  • Generalized Riemann-Liouville fractional integral
  • Special case of Erdélyi-Kober integral

Conclusions and Discussion

Main Conclusions

  1. Theoretical Contribution: Establish complete slicing operator theory, including operator norm estimates and error bounds
  2. Numerical Methods: The two proposed algorithms effectively recover coefficients of unknown slicing functions
  3. Practical Value: Methods significantly outperform brute-force computation in high dimensions, applicable to large-scale applications

Limitations

  1. Dimension Dependence: While complexity is improved, still requires O(dP)O(dP) computational cost
  2. Regularization Sensitivity: Careful tuning of regularization parameters is necessary
  3. Smoothness Requirements: Convergence analysis depends on function smoothness assumptions

Future Directions

  1. Adaptive Parameter Selection: Develop methods for automatic regularization parameter selection
  2. More Efficient Quadrature: Explore specialized quadrature rules to enhance accuracy
  3. Application Extensions: Validate method practicality in concrete machine learning tasks

In-Depth Evaluation

Strengths

  1. Theoretical Rigor: Provides complete functional analysis framework, including operator boundedness and convergence analysis
  2. Method Practicality: Two algorithms each have advantages; spatial domain method is intuitive, frequency domain method is theoretically elegant
  3. Comprehensive Experiments: Tests multiple kernel functions from smooth to non-smooth, validating method robustness
  4. Excellent Performance: Achieves significant computational acceleration while maintaining accuracy

Weaknesses

  1. Parameter Tuning: Regularization parameter selection requires experience, lacking automated methods
  2. Memory Requirements: Matrix storage may become a bottleneck in extremely high-dimensional cases
  3. Special Case Handling: Method performance is limited for certain ill-conditioned kernels (e.g., LOG)

Impact

  1. Academic Value: Provides new theoretical tools and numerical techniques for high-dimensional kernel methods
  2. Practical Significance: Holds important value in large-scale machine learning applications
  3. Reproducibility: Open-source code provided, facilitating researcher adoption and extension

Applicable Scenarios

  • Large-Scale Machine Learning: Particularly suitable for kernel method applications with large sample sizes and high dimensions
  • Scientific Computing: Broad application prospects in numerical simulations requiring efficient kernel summation
  • Real-Time Systems: Enables fast online inference through pre-computed coefficients

References

The paper cites 52 relevant references spanning kernel methods, fast algorithms, harmonic analysis, and other fields, providing a solid theoretical foundation for the research.