Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption
Lee, Lee, Kim et al.
Recent studies have explored the deployment of privacy-preserving deep neural networks utilizing homomorphic encryption (HE), especially for private inference (PI). Many works have attempted the approximation-aware training (AAT) approach in PI, changing the activation functions of a model to low-degree polynomials that are easier to compute on HE by allowing model retraining. However, due to constraints in the training environment, it is often necessary to consider post-training approximation (PTA), using the pre-trained parameters of the existing plaintext model without retraining. Existing PTA studies have uniformly approximated the activation function in all layers to a high degree to mitigate accuracy loss from approximation, leading to significant time consumption. This study proposes an optimized layerwise approximation (OLA), a systematic framework that optimizes both accuracy loss and time consumption by using different approximation polynomials for each layer in the PTA scenario. For efficient approximation, we reflect the layerwise impact on the classification accuracy by considering the actual input distribution of each activation function while constructing the optimization problem. Additionally, we provide a dynamic programming technique to solve the optimization problem and achieve the optimized layerwise degrees in polynomial time. As a result, the OLA method reduces inference times for the ResNet-20 model and the ResNet-32 model by 3.02 times and 2.82 times, respectively, compared to prior state-of-the-art implementations employing uniform degree polynomials. Furthermore, we successfully classified CIFAR-10 by replacing the GELU function in the ConvNeXt model with only 3-degree polynomials using the proposed method, without modifying the backbone model.
This paper proposes an Optimized Layerwise Approximation (OLA) method for efficient private inference on Fully Homomorphic Encryption (FHE). The method optimizes accuracy loss and computational time by employing different approximation polynomials for each layer in post-training approximation (PTA) scenarios, significantly improving inference efficiency. The OLA method reduces inference time for ResNet-20 and ResNet-32 models by 3.02× and 2.82× respectively, and successfully replaces the GELU function in ConvNeXt models with only degree-3 polynomials.
In privacy-preserving machine learning (PPML), Fully Homomorphic Encryption (FHE) enables direct computation on encrypted data. However, FHE schemes support only basic arithmetic operations (addition and multiplication) and cannot directly evaluate non-arithmetic activation functions such as ReLU, GELU, and sigmoid.
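To make the constraint above concrete, the following is a minimal sketch of replacing GELU with a low-degree polynomial that can be evaluated with additions and multiplications only. It runs on plaintext floats as a stand-in for ciphertexts and uses no HE library. The helper names (`fit_poly`, `horner`), the interval [-4, 4], and the plain least-squares fit are illustrative assumptions, not the paper's procedure.

```python
import math
import numpy as np

def gelu(x):
    # Exact GELU via the Gaussian CDF: x * Phi(x).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_poly(f, degree, lo, hi, samples=2001):
    # Hypothetical helper: unweighted least-squares fit on a uniform grid.
    # Returns coefficients in ascending order (c0, c1, ..., c_degree).
    xs = np.linspace(lo, hi, samples)
    ys = np.array([f(float(x)) for x in xs])
    return np.polynomial.polynomial.polyfit(xs, ys, degree)

def horner(coeffs, x):
    # Evaluate the polynomial using only additions and multiplications,
    # the two operations that an FHE scheme provides.
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

# Degree-3 stand-in for GELU on an assumed input range of [-4, 4].
coeffs = fit_poly(gelu, degree=3, lo=-4.0, hi=4.0)
max_err = max(abs(horner(coeffs, x) - gelu(x)) for x in np.linspace(-4.0, 4.0, 401))
```

Horner's rule is used here only because it makes the add/multiply structure obvious. It is sequential, so on encrypted data an evaluation order with lower multiplicative depth would likely be preferred.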
Growing Privacy Demands: With the development of cloud computing, MLaaS (Machine Learning as a Service) requires protecting data privacy while providing services
Practical Requirements: Existing methods have inference times that are too long for practical applications
Model Compatibility: Privacy-preserving inference must be achieved without retraining models
The paper addresses the main bottleneck of PTA methods, excessive inference time, by proposing a systematic layerwise optimization framework that balances accuracy and efficiency by assigning a different polynomial degree to each layer.
Input: Pre-trained deep neural network models containing non-arithmetic activation functions
Output: Optimal polynomial degree allocation for each layer
Constraints: Inference time budget K, accuracy loss threshold
Objective: Minimize average loss variance while satisfying time constraints
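The problem setup above has the shape of a multiple-choice knapsack: pick one degree per layer so that the summed loss is minimal and the summed time stays within the budget K. Below is a minimal dynamic-programming sketch under simplifying assumptions: time costs are integers, the per-layer loss values are given as inputs, and losses add across layers. The function name `allocate_degrees` and all numbers are hypothetical, and the paper's actual objective and time model may differ.

```python
from typing import Dict, List, Tuple

def allocate_degrees(
    options: List[Dict[int, Tuple[int, float]]], budget: int
) -> Tuple[float, List[int]]:
    """options[i][d] = (time_cost, loss) when layer i uses degree d.

    Returns (minimal total loss, chosen degree per layer) subject to
    total time <= budget. Time costs are assumed to be integers.
    """
    INF = float("inf")
    # best[k]: minimal loss of the layers processed so far within time k.
    best = [0.0] * (budget + 1)
    choice = []  # choice[i][k]: degree chosen for layer i at budget k
    for layer in options:
        new_best = [INF] * (budget + 1)
        picked = [None] * (budget + 1)
        for k in range(budget + 1):
            for d, (t, loss) in layer.items():
                if t <= k and best[k - t] + loss < new_best[k]:
                    new_best[k] = best[k - t] + loss
                    picked[k] = d
        best = new_best
        choice.append(picked)
    if best[budget] == INF:
        raise ValueError("no allocation fits the time budget")
    # Backtrack from the last layer to recover the chosen degrees.
    degrees, k = [], budget
    for layer, picked in zip(reversed(options), reversed(choice)):
        d = picked[k]
        degrees.append(d)
        k -= layer[d][0]
    degrees.reverse()
    return best[budget], degrees

# Made-up example: three layers, each with a cheap degree-3 option and a
# slower, more accurate degree-7 option.
example = [
    {3: (1, 0.9), 7: (3, 0.2)},
    {3: (1, 0.1), 7: (3, 0.05)},
    {3: (1, 0.5), 7: (3, 0.1)},
]
total_loss, degrees = allocate_degrees(example, budget=7)
```

In this toy instance the budget admits only two degree-7 layers, and the sketch spends them on the first and third layers, where the loss reduction is largest. The table has (number of layers) × (K + 1) entries, so running time and memory grow linearly with both.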
Layerwise Differentiated Processing: First systematic approach to allocate different polynomial degrees to different layers
Input Distribution Modeling: Uses actual inter-layer data distributions rather than theoretical distributions
Scaled Distribution-Aware Approximation: Adjusts distribution variance through parameter r to improve approximation precision in low-probability regions
Modulus Chain Management: Optimizes FHE parameters for different degrees, reducing bootstrapping overhead
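One way to picture the scaled distribution-aware approximation listed above is as a weighted least-squares fit in which the weight follows the layer's input density, with its spread widened by a factor r. The sketch below assumes a Gaussian input distribution with mean `mu` and standard deviation `sigma`, and assumes that r multiplies the standard deviation. Both are illustrative assumptions about how r enters, as are the function name `weighted_fit` and the ReLU target.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def weighted_fit(f, degree, mu, sigma, r=1.0, half_width=6.0, samples=4001):
    # Hypothetical helper: fit f on mu +/- half_width * sigma, weighting
    # each point by a Gaussian density whose standard deviation is r * sigma.
    xs = np.linspace(mu - half_width * sigma, mu + half_width * sigma, samples)
    density = np.exp(-0.5 * ((xs - mu) / (r * sigma)) ** 2)
    # polyfit applies w to the unsquared residual, so pass sqrt(density)
    # to minimise the density-weighted squared error.
    return np.polynomial.polynomial.polyfit(
        xs, f(xs), degree, w=np.sqrt(density)
    )

# r = 1 follows the assumed input density; r = 2 gives the tails more weight.
narrow = weighted_fit(relu, degree=5, mu=0.0, sigma=1.0, r=1.0)
wide = weighted_fit(relu, degree=5, mu=0.0, sigma=1.0, r=2.0)
```

With r > 1 the fit pays more attention to rarely seen inputs, which would be expected to reduce error in the tails at some cost near the mean. In practice `mu` and `sigma` would be estimated from the activations actually observed at each layer.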
Effectiveness of Layerwise Approximation: Different layers have different impacts on classification accuracy, which justifies optimizing the approximation layer by layer
Practical Improvement: Significant inference acceleration brings FHE-based PI closer to practical applications
Theoretical Completeness: Provides complete mathematical framework and efficient solution algorithms
Preprocessing Overhead: For large-scale datasets (ImageNet), input distribution analysis requires considerable time
Memory Requirements: Dynamic programming algorithm consumes significant memory for deep networks
Activation Function Restrictions: Primarily targets univariate activation functions; extension to multivariate functions like softmax requires further development
Lee et al. "Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed convolutions." ICML 2022.
Kim et al. "Optimized privacy-preserving CNN inference with fully homomorphic encryption." IEEE TIFS 2023.
Gilad-Bachrach et al. "CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy." ICML 2016.
Cheon et al. "A full RNS variant of approximate homomorphic encryption." SAC 2018.
Summary: In the PTA setting, OLA replaces uniform high-degree approximation with per-layer polynomial degrees chosen by dynamic programming, reducing FHE inference time by 3.02× on ResNet-20 and 2.82× on ResNet-32 relative to prior uniform-degree implementations, without retraining. Its main open issues are the preprocessing cost of input distribution analysis on large datasets, the memory use of the dynamic program on deep networks, and the restriction to univariate activation functions.