2025-11-19T21:37:14.535760

Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption

Lee, Lee, Kim et al.
Recent studies have explored the deployment of privacy-preserving deep neural networks utilizing homomorphic encryption (HE), especially for private inference (PI). Many works have attempted the approximation-aware training (AAT) approach in PI, changing the activation functions of a model to low-degree polynomials that are easier to compute on HE by allowing model retraining. However, due to constraints in the training environment, it is often necessary to consider post-training approximation (PTA), using the pre-trained parameters of the existing plaintext model without retraining. Existing PTA studies have uniformly approximated the activation function in all layers to a high degree to mitigate accuracy loss from approximation, leading to significant time consumption. This study proposes an optimized layerwise approximation (OLA), a systematic framework that optimizes both accuracy loss and time consumption by using different approximation polynomials for each layer in the PTA scenario. For efficient approximation, we reflect the layerwise impact on the classification accuracy by considering the actual input distribution of each activation function while constructing the optimization problem. Additionally, we provide a dynamic programming technique to solve the optimization problem and achieve the optimized layerwise degrees in polynomial time. As a result, the OLA method reduces inference times for the ResNet-20 model and the ResNet-32 model by 3.02 times and 2.82 times, respectively, compared to prior state-of-the-art implementations employing uniform degree polynomials. Furthermore, we successfully classified CIFAR-10 by replacing the GELU function in the ConvNeXt model with only 3-degree polynomials using the proposed method, without modifying the backbone model.
academic

Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption

Basic Information

  • Paper ID: 2310.10349
  • Title: Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption
  • Authors: Junghyun Lee, Joon-Woo Lee, Eunsang Lee, Young-Sik Kim, Yongwoo Lee, Yongjune Kim, Jong-Seon No
  • Classification: cs.CR (Cryptography and Security), cs.AI (Artificial Intelligence)
  • Publication Date: October 2023 (arXiv v4: October 14, 2025)
  • Paper Link: https://arxiv.org/abs/2310.10349

Abstract

This paper proposes an Optimized Layerwise Approximation (OLA) method for efficient private inference on Fully Homomorphic Encryption (FHE). The method optimizes accuracy loss and computational time by employing different approximation polynomials for each layer in post-training approximation (PTA) scenarios, significantly improving inference efficiency. The OLA method reduces inference time for ResNet-20 and ResNet-32 models by 3.02× and 2.82× respectively, and successfully replaces the GELU function in ConvNeXt models with only degree-3 polynomials.

Research Background and Motivation

Problem Definition

In privacy-preserving machine learning (PPML), Fully Homomorphic Encryption (FHE) enables direct computation on encrypted data. However, FHE schemes only support basic arithmetic operations (addition and multiplication) and cannot directly handle non-arithmetic activation functions (such as ReLU, GELU, sigmoid, etc.).

Problem Significance

  1. Growing Privacy Demands: With the development of cloud computing, MLaaS (Machine Learning as a Service) requires protecting data privacy while providing services
  2. Practical Requirements: Existing methods have excessively long inference times, making them difficult to meet practical application needs
  3. Model Compatibility: Privacy-preserving inference must be achieved without retraining models

Limitations of Existing Methods

  1. AAT Methods: Require model retraining and perform poorly on large-scale datasets
  2. PTA Methods: Use uniform high-degree polynomial approximations across all layers, resulting in prolonged inference times
  3. Computational Efficiency: Existing methods fail to account for different layers' varying impacts on classification accuracy

Research Motivation

Addressing the main bottleneck of PTA methods—excessive inference time—by proposing a systematic layerwise optimization framework that balances accuracy and efficiency through different polynomial degrees for different layers.

Core Contributions

  1. Proposed OLA Framework: First to introduce layerwise optimized approximation for PTA scenarios, using different polynomial degrees for each layer
  2. Distribution-Aware Approximation: Based on weighted least squares, considering actual input distributions of activation functions across layers
  3. Dynamic Programming Algorithm: Provides polynomial-time complexity optimization algorithm for optimal degree allocation
  4. Significant Performance Gains: Achieves 2.82-3.02× inference acceleration on ResNet and ConvNeXt models
  5. Theoretical Analysis: Provides complete mathematical foundation and convergence proofs

Methodology Details

Task Definition

Input: Pre-trained deep neural network models containing non-arithmetic activation functions Output: Optimal polynomial degree allocation for each layer Constraints: Inference time budget K, accuracy loss threshold Objective: Minimize average loss variance while satisfying time constraints

Model Architecture

1. Distribution-Aware Approximation (Theorem 1)

For activation function f(x) and input distribution φ(x), the optimal degree-d approximation polynomial is:

P_φ[d; f](x) = Σ(l=0 to d) h_l(x) ∫ φ(t)f(t)h_l(t)dt

where {h_l(x)} are orthogonal polynomial bases obtained through the Gram-Schmidt process.

2. Average Loss Variance Modeling

Treating approximation error as a random variable, the loss function variance is:

Var[ΔL] = Σ(i=1 to N_L) A_i E_φi[d_i; f]

where:

  • A_i = (1/N_T) Σ_k Σ_j (∂L/∂a_{i,j})²: Impact weight of layer i on accuracy
  • E_φid_i; f: Minimized MSE for layer i

3. Optimization Problem Formulation

minimize: V(d) = Σ(i=1 to N_L) A_i E_i(d_i)
subject to: T(d) = Σ(i=1 to N_L) T_i(d_i) ≤ K

4. Dynamic Programming Solution (Algorithm 1)

  • Time Complexity: O(N_L × N_K × |S|)
  • Space Complexity: O(N_L × N_K)
  • Recurrence Relation: P(l+1,k) constructed based on optimal solutions of {P(l,k')}

Technical Innovations

  1. Layerwise Differentiated Processing: First systematic approach to allocate different polynomial degrees to different layers
  2. Input Distribution Modeling: Uses actual inter-layer data distributions rather than theoretical distributions
  3. Scaled Distribution-Aware Approximation: Adjusts distribution variance through parameter r to improve approximation precision in low-probability regions
  4. Modulus Chain Management: Optimizes FHE parameters for different degrees, reducing bootstrapping overhead

Experimental Setup

Datasets

  • CIFAR-10/100: Small-scale image classification datasets
  • ImageNet: Large-scale image classification dataset
  • Preprocessing: Standardization and data augmentation

Evaluation Metrics

  • Inference Time: Actual execution time in FHE environment
  • Top-1 Accuracy: Classification accuracy
  • τ(d): Discrete-time delay indicator
  • Speedup Ratio: Time reduction relative to baseline

Comparison Methods

  • Minimax Approximation: Lee et al. 4's composite minimax polynomial method
  • Uniform Degree Method: All layers using same degree polynomials
  • AAT Methods: HyPHEN, DeepReDuce and other retraining methods

Implementation Details

  • FHE Scheme: RNS-CKKS
  • Security Level: 128-bit
  • Degree Search Space: S = {3,7,15,31,63,88,127,154,210,255,261,393,511,603,703,813,917,1023}
  • Discretization Unit: ν = 1/4
  • Library: Lattigo v3.0.5

Experimental Results

Main Results

ModelDatasetMethodAccuracy(%)τ(d)Speedup
ResNet-20CIFAR-10Minimax91.552,788-
ResNet-20CIFAR-10OLA90.691,1062.52×
ResNet-32CIFAR-10Minimax92.454,624-
ResNet-32CIFAR-10OLA91.691,9272.40×

FHE Actual Test Results:

  • ResNet-20: Inference time reduced from 1,231s to 407s (3.02× speedup)
  • ResNet-32: Inference time reduced from 1,913s to 679s (2.82× speedup)

Ablation Study

ComponentDistribution-AwareDynamic ProgrammingResNet-20 τ(d)ResNet-110 τ(d)
Baseline1,44021,172
+Distribution-Aware1,14210,725
+Dynamic Programming1,1069,448

Findings:

  • Distribution-aware approximation contributes the largest performance improvement
  • Dynamic programming shows more significant effects on deeper networks (ResNet-110 reduction of 11.91%)

ConvNeXt Model Results

  • ConvNeXt-T (CIFAR-10): Achieves 91.42% accuracy using only degree-3 polynomials
  • ConvNeXt-S (ImageNet): Achieves 84.64% accuracy using polynomials with degree ≤ 31

Preprocessing Overhead Analysis

DatasetModelInput Distribution Analysis(s)Dynamic Programming(s)
CIFAR-10ResNet-208.127.76
CIFAR-10ResNet-11017.97773.07
ImageNetResNet-189,510.946.23

HE-based PPML Research Directions

  1. PTA Methods: Lee et al. 4,5, Kim et al. 6 - Focus on linear operation optimization
  2. AAT Methods: HyPHEN 17, DeepReDuce 43 - Require model retraining
  3. Hybrid Methods: Schemes combining HE and MPC

Non-Arithmetic Operation Handling

  1. TFHE Schemes: Support bitwise operations with large memory overhead
  2. CKKS Schemes: Support packing, requiring function approximation
  3. Polynomial Approximation: Minimax, least squares and other methods

Advantages of This Work

  • First systematic framework for layerwise optimization
  • Complete theoretical foundation with sufficient experimental validation
  • Significant performance gains in PTA scenarios

Conclusions and Discussion

Main Conclusions

  1. Effectiveness of Layerwise Approximation: Different layers indeed have varying impacts on classification accuracy, making layerwise optimization justified
  2. Practical Improvement: Significant inference acceleration brings FHE-based PI closer to practical applications
  3. Theoretical Completeness: Provides complete mathematical framework and efficient solution algorithms

Limitations

  1. Preprocessing Overhead: For large-scale datasets (ImageNet), input distribution analysis requires considerable time
  2. Memory Requirements: Dynamic programming algorithm consumes significant memory for deep networks
  3. Activation Function Restrictions: Primarily targets univariate activation functions; extension to multivariate functions like softmax requires further development

Future Directions

  1. Transformer Support: Extension to privacy-preserving inference for large language models
  2. Multivariate Functions: Development of approximation methods for functions like softmax
  3. Adaptive Optimization: Dynamic approximation strategy adjustment based on hardware resources
  4. Federated Learning Integration: Combination with other privacy-preserving techniques

In-Depth Evaluation

Strengths

  1. Strong Innovation: First systematic solution to layerwise optimization in PTA
  2. Solid Theory: Rigorous mathematical derivations and complete theorem proofs
  3. Comprehensive Experiments: Full validation across multiple datasets and model architectures
  4. High Practical Value: Significant performance gains demonstrate practical application potential
  5. Clear Writing: Well-structured paper with accurate technical descriptions

Weaknesses

  1. Computational Complexity: While polynomial-time, may face challenges in ultra-large-scale networks
  2. Parameter Sensitivity: Selection of scaling parameter r requires empirical tuning
  3. Generalization Capability: Primarily validated on CNN architectures; applicability to other architectures requires further verification
  4. Security Analysis: Lacks in-depth analysis of additional security risks introduced by approximation

Impact

  1. Academic Contribution: Provides new optimization perspectives for FHE-based PPML
  2. Practical Value: Advances privacy-preserving AI toward practical applications
  3. Reproducibility: Provides detailed implementation details with open-source commitment
  4. Inspirational Significance: Layerwise optimization concepts extensible to other privacy-computing scenarios

Applicable Scenarios

  1. Cloud AI Services: Machine learning services requiring user data privacy protection
  2. Medical AI: Diagnostic systems processing sensitive medical data
  3. Financial Risk Control: Privacy-preserving credit assessment and risk analysis systems
  4. Federated Learning: Complementary technique for secure aggregation

References

  1. Lee et al. "Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed convolutions." ICML 2022.
  2. Kim et al. "Optimized privacy-preserving cnn inference with fully homomorphic encryption." IEEE TIFS 2023.
  3. Gilad-Bachrach et al. "Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy." ICML 2016.
  4. Cheon et al. "A full rns variant of approximate homomorphic encryption." SAC 2018.

Summary: The proposed OLA method holds significant importance in the FHE-based private inference domain. By achieving substantial inference efficiency improvements through layerwise optimization, it establishes an important foundation for practical applications of privacy-preserving AI. Despite certain limitations, its innovation and practical value make it an important contribution to the field.