Kernels are key in machine learning for modeling interactions. Unfortunately, brute-force computation of the related kernel sums scales quadratically with the number of samples. Recent Fourier-slicing methods lead to an improved linear complexity, provided that the kernel can be sliced and its Fourier coefficients are known. To obtain these coefficients, we view the slicing relation as an inverse problem and present two algorithms for their recovery. Extensive numerical experiments demonstrate the speed and accuracy of our methods.
Authors: Nicolaj Rux (Chemnitz University of Technology), Johannes Hertrich (Université Paris Dauphine-PSL and Inria Mokaplan), Sebastian Neumayer (Chemnitz University of Technology)
Kernel functions are crucial for modeling interactions in machine learning. However, brute-force computation of kernel sums exhibits quadratic complexity growth with respect to sample size. Recent Fourier slicing methods can reduce this complexity to linear, provided that the kernel can be sliced and its Fourier coefficients are known. To obtain these coefficients, this paper formulates the slicing relationship as an inverse problem and proposes two recovery algorithms. Extensive numerical experiments demonstrate the speed and accuracy of the proposed methods.
Kernel methods are widely applied in machine learning for density estimation, support vector machine classification, principal component analysis, maximum mean discrepancy (MMD), and other tasks. The computational bottleneck in these applications typically involves evaluating expressions of the form:
sm:=∑n=1NF(∥xn−ym∥)wn,m=1,…,M
where F∈C([0,∞)) is a radial basis function, x1,…,xN,y1,…,yM∈Rd are sample points, and w∈RN are weights.
Direct computation requires O(NMd) operations, which is infeasible for large datasets. Classical methods such as fast Fourier summation and fast multipole methods, while reducing complexity to O(M+N), suffer from exponential dependence on dimension d>4 due to reliance on fast Fourier transforms or spatial partitioning, rendering them impractical.
The fundamental idea of slicing algorithms is to find a function f∈Lloc1([0,∞)) such that:
F(∥x∥)=ωd−11∫Sd−1f(∣⟨ξ,x⟩∣)dξ
where ωd−1=2πd/2/Γ(d/2) is the surface measure of the d-dimensional sphere. By discretizing the integral, kernel summation can be reduced to one-dimensional cases and computed efficiently using fast Fourier summation.
Given a radial basis function F:[0,∞)→R, find a function f:[0,∞)→R such that the slicing relationship F=Sd[f] holds, where Sd is a generalized Riemann-Liouville fractional integral operator:
Sd[f](s)=∫01f(ts)ϱd(t)dt
where ϱd(t):=cd(1−t2)(d−3)/2, cd:=πΓ((d−1)/2)2Γ(d/2).
For Laplace and Bump functions, forward error ∣FK(s)−F(s)∣ remains below 10−2 across the entire interval [0,1], with slightly larger errors in irregular regions (e.g., at s=0 for Laplace function).
The paper cites 52 relevant references spanning kernel methods, fast algorithms, harmonic analysis, and other fields, providing a solid theoretical foundation for the research.