2025-11-19T21:37:14.535760

Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption

Lee, Lee, Kim et al.

Recent studies have explored the deployment of privacy-preserving deep neural networks utilizing homomorphic encryption (HE), especially for private inference (PI). Many works have attempted the approximation-aware training (AAT) approach in PI, changing the activation functions of a model to low-degree polynomials that are easier to compute on HE by allowing model retraining. However, due to constraints in the training environment, it is often necessary to consider post-training approximation (PTA), using the pre-trained parameters of the existing plaintext model without retraining. Existing PTA studies have uniformly approximated the activation function in all layers to a high degree to mitigate accuracy loss from approximation, leading to significant time consumption. This study proposes an optimized layerwise approximation (OLA), a systematic framework that optimizes both accuracy loss and time consumption by using different approximation polynomials for each layer in the PTA scenario. For efficient approximation, we reflect the layerwise impact on the classification accuracy by considering the actual input distribution of each activation function while constructing the optimization problem. Additionally, we provide a dynamic programming technique to solve the optimization problem and achieve the optimized layerwise degrees in polynomial time. As a result, the OLA method reduces inference times for the ResNet-20 model and the ResNet-32 model by 3.02 times and 2.82 times, respectively, compared to prior state-of-the-art implementations employing uniform degree polynomials. Furthermore, we successfully classified CIFAR-10 by replacing the GELU function in the ConvNeXt model with only 3-degree polynomials using the proposed method, without modifying the backbone model.

academic

Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption

Basic Information

Paper ID: 2310.10349
Title: Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption
Authors: Junghyun Lee, Joon-Woo Lee, Eunsang Lee, Young-Sik Kim, Yongwoo Lee, Yongjune Kim, Jong-Seon No
Classification: cs.CR (Cryptography and Security), cs.AI (Artificial Intelligence)
Publication Date: October 2023 (arXiv v4: October 14, 2025)
Paper Link: https://arxiv.org/abs/2310.10349

Abstract

This paper proposes an Optimized Layerwise Approximation (OLA) method for efficient private inference on Fully Homomorphic Encryption (FHE). The method optimizes accuracy loss and computational time by employing different approximation polynomials for each layer in post-training approximation (PTA) scenarios, significantly improving inference efficiency. The OLA method reduces inference time for ResNet-20 and ResNet-32 models by 3.02× and 2.82× respectively, and successfully replaces the GELU function in ConvNeXt models with only degree-3 polynomials.

Research Background and Motivation

Problem Definition

In privacy-preserving machine learning (PPML), Fully Homomorphic Encryption (FHE) enables direct computation on encrypted data. However, FHE schemes only support basic arithmetic operations (addition and multiplication) and cannot directly handle non-arithmetic activation functions (such as ReLU, GELU, sigmoid, etc.).

Problem Significance

Growing Privacy Demands: With the development of cloud computing, MLaaS (Machine Learning as a Service) requires protecting data privacy while providing services
Practical Requirements: Existing methods have excessively long inference times, making them difficult to meet practical application needs
Model Compatibility: Privacy-preserving inference must be achieved without retraining models

Limitations of Existing Methods

AAT Methods: Require model retraining and perform poorly on large-scale datasets
PTA Methods: Use uniform high-degree polynomial approximations across all layers, resulting in prolonged inference times
Computational Efficiency: Existing methods fail to account for different layers' varying impacts on classification accuracy

Research Motivation

Addressing the main bottleneck of PTA methods—excessive inference time—by proposing a systematic layerwise optimization framework that balances accuracy and efficiency through different polynomial degrees for different layers.

Core Contributions

Proposed OLA Framework: First to introduce layerwise optimized approximation for PTA scenarios, using different polynomial degrees for each layer
Distribution-Aware Approximation: Based on weighted least squares, considering actual input distributions of activation functions across layers
Dynamic Programming Algorithm: Provides polynomial-time complexity optimization algorithm for optimal degree allocation
Significant Performance Gains: Achieves 2.82-3.02× inference acceleration on ResNet and ConvNeXt models
Theoretical Analysis: Provides complete mathematical foundation and convergence proofs

Methodology Details

Task Definition

Input: Pre-trained deep neural network models containing non-arithmetic activation functions Output: Optimal polynomial degree allocation for each layer Constraints: Inference time budget K, accuracy loss threshold Objective: Minimize average loss variance while satisfying time constraints

Model Architecture

1. Distribution-Aware Approximation (Theorem 1)

For activation function f(x) and input distribution φ(x), the optimal degree-d approximation polynomial is:

P_φ[d; f](x) = Σ(l=0 to d) h_l(x) ∫ φ(t)f(t)h_l(t)dt

where {h_l(x)} are orthogonal polynomial bases obtained through the Gram-Schmidt process.

2. Average Loss Variance Modeling

Treating approximation error as a random variable, the loss function variance is:

Var[ΔL] = Σ(i=1 to N_L) A_i E_φi[d_i; f]

where:

A_i = (1/N_T) Σ_k Σ_j (∂L/∂a_{i,j})²: Impact weight of layer i on accuracy
E_φid_i; f: Minimized MSE for layer i

3. Optimization Problem Formulation

minimize: V(d) = Σ(i=1 to N_L) A_i E_i(d_i)
subject to: T(d) = Σ(i=1 to N_L) T_i(d_i) ≤ K

4. Dynamic Programming Solution (Algorithm 1)

Time Complexity: O(N_L × N_K × |S|)
Space Complexity: O(N_L × N_K)
Recurrence Relation: P(l+1,k) constructed based on optimal solutions of {P(l,k')}

Technical Innovations

Layerwise Differentiated Processing: First systematic approach to allocate different polynomial degrees to different layers
Input Distribution Modeling: Uses actual inter-layer data distributions rather than theoretical distributions
Scaled Distribution-Aware Approximation: Adjusts distribution variance through parameter r to improve approximation precision in low-probability regions
Modulus Chain Management: Optimizes FHE parameters for different degrees, reducing bootstrapping overhead

Experimental Setup

Datasets

CIFAR-10/100: Small-scale image classification datasets
ImageNet: Large-scale image classification dataset
Preprocessing: Standardization and data augmentation

Evaluation Metrics

Inference Time: Actual execution time in FHE environment
Top-1 Accuracy: Classification accuracy
τ(d): Discrete-time delay indicator
Speedup Ratio: Time reduction relative to baseline

Comparison Methods

Minimax Approximation: Lee et al. 4's composite minimax polynomial method
Uniform Degree Method: All layers using same degree polynomials
AAT Methods: HyPHEN, DeepReDuce and other retraining methods

Implementation Details

FHE Scheme: RNS-CKKS
Security Level: 128-bit
Degree Search Space: S = {3,7,15,31,63,88,127,154,210,255,261,393,511,603,703,813,917,1023}
Discretization Unit: ν = 1/4
Library: Lattigo v3.0.5

Experimental Results

Main Results

Model	Dataset	Method	Accuracy(%)	τ(d)	Speedup
ResNet-20	CIFAR-10	Minimax	91.55	2,788	-
ResNet-20	CIFAR-10	OLA	90.69	1,106	2.52×
ResNet-32	CIFAR-10	Minimax	92.45	4,624	-
ResNet-32	CIFAR-10	OLA	91.69	1,927	2.40×

FHE Actual Test Results:

ResNet-20: Inference time reduced from 1,231s to 407s (3.02× speedup)
ResNet-32: Inference time reduced from 1,913s to 679s (2.82× speedup)

Ablation Study

Component	Distribution-Aware	Dynamic Programming	ResNet-20 τ(d)	ResNet-110 τ(d)
Baseline	✗	✗	1,440	21,172
+Distribution-Aware	✓	✗	1,142	10,725
+Dynamic Programming	✓	✓	1,106	9,448

Findings:

Distribution-aware approximation contributes the largest performance improvement
Dynamic programming shows more significant effects on deeper networks (ResNet-110 reduction of 11.91%)

ConvNeXt Model Results

ConvNeXt-T (CIFAR-10): Achieves 91.42% accuracy using only degree-3 polynomials
ConvNeXt-S (ImageNet): Achieves 84.64% accuracy using polynomials with degree ≤ 31

Preprocessing Overhead Analysis

Dataset	Model	Input Distribution Analysis(s)	Dynamic Programming(s)
CIFAR-10	ResNet-20	8.12	7.76
CIFAR-10	ResNet-110	17.97	773.07
ImageNet	ResNet-18	9,510.94	6.23

HE-based PPML Research Directions

PTA Methods: Lee et al. 4,5, Kim et al. 6 - Focus on linear operation optimization
AAT Methods: HyPHEN 17, DeepReDuce 43 - Require model retraining
Hybrid Methods: Schemes combining HE and MPC

Non-Arithmetic Operation Handling

TFHE Schemes: Support bitwise operations with large memory overhead
CKKS Schemes: Support packing, requiring function approximation
Polynomial Approximation: Minimax, least squares and other methods

Advantages of This Work

First systematic framework for layerwise optimization
Complete theoretical foundation with sufficient experimental validation
Significant performance gains in PTA scenarios

Conclusions and Discussion

Main Conclusions

Effectiveness of Layerwise Approximation: Different layers indeed have varying impacts on classification accuracy, making layerwise optimization justified
Practical Improvement: Significant inference acceleration brings FHE-based PI closer to practical applications
Theoretical Completeness: Provides complete mathematical framework and efficient solution algorithms

Limitations

Preprocessing Overhead: For large-scale datasets (ImageNet), input distribution analysis requires considerable time
Memory Requirements: Dynamic programming algorithm consumes significant memory for deep networks
Activation Function Restrictions: Primarily targets univariate activation functions; extension to multivariate functions like softmax requires further development

Future Directions

Transformer Support: Extension to privacy-preserving inference for large language models
Multivariate Functions: Development of approximation methods for functions like softmax
Adaptive Optimization: Dynamic approximation strategy adjustment based on hardware resources
Federated Learning Integration: Combination with other privacy-preserving techniques

In-Depth Evaluation

Strengths

Strong Innovation: First systematic solution to layerwise optimization in PTA
Solid Theory: Rigorous mathematical derivations and complete theorem proofs
Comprehensive Experiments: Full validation across multiple datasets and model architectures
High Practical Value: Significant performance gains demonstrate practical application potential
Clear Writing: Well-structured paper with accurate technical descriptions

Weaknesses

Computational Complexity: While polynomial-time, may face challenges in ultra-large-scale networks
Parameter Sensitivity: Selection of scaling parameter r requires empirical tuning
Generalization Capability: Primarily validated on CNN architectures; applicability to other architectures requires further verification
Security Analysis: Lacks in-depth analysis of additional security risks introduced by approximation

Impact

Academic Contribution: Provides new optimization perspectives for FHE-based PPML
Practical Value: Advances privacy-preserving AI toward practical applications
Reproducibility: Provides detailed implementation details with open-source commitment
Inspirational Significance: Layerwise optimization concepts extensible to other privacy-computing scenarios

Applicable Scenarios

Cloud AI Services: Machine learning services requiring user data privacy protection
Medical AI: Diagnostic systems processing sensitive medical data
Financial Risk Control: Privacy-preserving credit assessment and risk analysis systems
Federated Learning: Complementary technique for secure aggregation

References

Lee et al. "Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed convolutions." ICML 2022.
Kim et al. "Optimized privacy-preserving cnn inference with fully homomorphic encryption." IEEE TIFS 2023.
Gilad-Bachrach et al. "Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy." ICML 2016.
Cheon et al. "A full rns variant of approximate homomorphic encryption." SAC 2018.

Summary: The proposed OLA method holds significant importance in the FHE-based private inference domain. By achieving substantial inference efficiency improvements through layerwise optimization, it establishes an important foundation for practical applications of privacy-preserving AI. Despite certain limitations, its innovation and practical value make it an important contribution to the field.