Improving deep neural network performance through sampling

Ghantasala, Li, Jaiswal et al.

Energy efficient sampling with probabilistic neurons or p-bits has been demonstrated in the context of Boltzmann machines and it is natural to ask if these approaches can be extended to the field of generative AI where energy costs have become prohibitively large. However, this very active field is dominated by feedforward deep neural networks (DNNs) which primarily use multi-bit deterministic neurons with no role for sampling. In this paper we first show that it is feasible to obtain superior accuracy through the use of multiple samples generated by probabilistic networks. This possibility raises the question of which option is energetically preferable for improving accuracy: generating more samples, or adding more bits to a single deterministic sample. We provide a simple expression that can be used to estimate these energy tradeoffs and illustrate it with results for different algorithms and architectures.

Basic Information

Paper ID: 2507.07763
Title: Improving deep neural network performance through sampling
Authors: Lakshmi A. Ghantasala, Ming-Che Li, Risi Jaiswal, Behtash Behin-Aein, Joseph Makin, Shreyas Sen, Supriyo Datta
Classification: cond-mat.dis-nn
Publication Date: October 27, 2025 (arXiv preprint)
Institution: Purdue University Elmore School of Electrical and Computer Engineering
Paper Link: https://arxiv.org/abs/2507.07763

Abstract

This paper asks whether the energy-efficient sampling demonstrated with probabilistic neurons (p-bits) in Boltzmann machines can be extended to generative AI. That field is dominated by feedforward deep neural networks built from multi-bit deterministic neurons, which leave no role for sampling. The paper first shows that averaging multiple samples generated by probabilistic networks can yield superior accuracy. It then poses the core question: to improve accuracy, is it more energy-efficient to generate more samples or to add more bits to a single deterministic sample? The paper provides a simple expression for estimating these energy trade-offs and illustrates it with results for different algorithms and architectures.

Research Background and Motivation

Problem Background

Energy Consumption Crisis: The energy cost of generative AI has reached prohibitive levels, urgently requiring energy-efficient optimization solutions
Technical Disparity: Probabilistic neurons (p-bits) in Boltzmann machines have demonstrated significant energy efficiency advantages, yet feedforward deep neural networks still primarily use multi-bit deterministic neurons
Sampling Deficiency: Current mainstream DNN architectures lack sampling mechanisms, limiting their capabilities in probabilistic inference

Research Motivation

Extending p-bits Applications: Extending the energy efficiency advantages of p-bits verified in Ising computing to the machine learning domain
Energy-Accuracy Trade-off: Systematically analyzing the energy trade-off between the number of samples and bit precision
Unified Evaluation Framework: Establishing a universal energy consumption evaluation framework applicable to different probabilistic DNN implementations

Core Contributions

Proposed the probabilistic DNN (p-DNN) framework: Integrating p-bits into feedforward deep neural networks to enable sampling-based inference
Developed sample-aware training methods: Significantly improving probabilistic network performance through multi-sample averaging training strategies
Established an energy consumption analysis framework: Proposing a universal elementary operation energy model applicable to evaluating energy trade-offs across different architectures and algorithms
Verified practical feasibility: Through FPGA implementation validation, confirming the accuracy of theoretical analysis and demonstrating the practical value of the method
Provided quantitative insights: Demonstrating that only 2 samples are needed to surpass deterministic baselines, and 10 samples can match the accuracy of 3-bit deterministic models

Methodology Details

Task Definition

This paper investigates how to introduce probabilistic sampling mechanisms into deep neural networks to achieve better energy-accuracy trade-offs. Specifically, it includes:

Input: Traditional multi-bit deterministic DNN
Output: Probabilistic DNN based on p-bits, capable of generating multiple samples and improving performance through averaging
Constraints: Optimizing overall energy efficiency while maintaining or improving accuracy

Model Architecture

1. p-DNN Basic Building Blocks

The paper defines the basic operational unit of p-DNN (Figure 1), with its energy model as:

$\epsilon_{EO} = n b_w \epsilon_{wM} + (n+1) b_a \epsilon_{aM} + \epsilon_S(n, b_a, b_w) + \epsilon_N$

Where:

$\epsilon_{wM}, \epsilon_{aM}$ : Weight and activation memory access energy consumption
$\epsilon_S$ : Synaptic computation energy consumption
$\epsilon_N$ : Neuron energy consumption
$n$ : Fan-in connection count
$b_w, b_a$ : Weight and activation bit precision

2. Multi-sample Energy Model

For T samples, the energy model is revised to:

$\epsilon_{EO} = n b_w \epsilon_{wM} + T[(n+1) b_a \epsilon_{aM} + \epsilon_S(n, b_a, b_w) + \epsilon_N]$

The weight-loading term $n b_w \epsilon_{wM}$ is paid once and is not multiplied by $T$, so when weight loading dominates the energy, the marginal cost of each additional sample is relatively low.
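
The multi-sample expression can be evaluated directly to see how the marginal cost of a sample depends on the weight-loading term. The following is a minimal sketch, not the paper's code: the function names are hypothetical, $\epsilon_S$ is passed as a plain number rather than as a function of $(n, b_a, b_w)$, and the numeric values are placeholders in arbitrary units chosen only so that weight loading dominates.

```python
def energy_per_operation(n, b_w, b_a, e_wM, e_aM, e_S, e_N, T=1):
    """Energy of one elementary operation for T samples.

    Mirrors eps_EO = n*b_w*eps_wM + T*[(n+1)*b_a*eps_aM + eps_S + eps_N].
    e_S is a number here; in the source it depends on (n, b_a, b_w).
    """
    weight_load = n * b_w * e_wM                   # paid once, shared by all samples
    per_sample = (n + 1) * b_a * e_aM + e_S + e_N  # paid again for every sample
    return weight_load + T * per_sample


def marginal_sample_cost(n, b_w, b_a, e_wM, e_aM, e_S, e_N):
    """Fractional energy increase when going from one sample to two."""
    e1 = energy_per_operation(n, b_w, b_a, e_wM, e_aM, e_S, e_N, T=1)
    e2 = energy_per_operation(n, b_w, b_a, e_wM, e_aM, e_S, e_N, T=2)
    return (e2 - e1) / e1


# Placeholder values in arbitrary units (assumptions, NOT taken from the paper).
params = dict(n=1000, b_w=8, b_a=1, e_wM=100.0, e_aM=1.0, e_S=500.0, e_N=10.0)
print(marginal_sample_cost(**params))
```

Lowering `e_wM` relative to the per-sample terms in this sketch raises the marginal cost, which is the regime where adding samples stops being cheap.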

3. Sample-Aware Training Strategy

Forward Propagation: Adding randomness to activation functions at each layer to generate multiple samples
Loss Computation: Computing loss based on multi-sample averaged results
Backpropagation: Using straight-through estimators to handle gradients of stochastic activations
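
The three steps above can be sketched on a toy two-layer network. This is an illustrative sketch under stated assumptions, not the paper's training code: the network shape, the squared-error loss, the plain gradient step, and the helper names `p_bit_samples` and `train_step` are all hypothetical, and the straight-through surrogate used here (the derivative of tanh) is one common choice that the summary does not confirm. Because the readout is linear, averaging the hidden samples is equivalent to averaging the outputs of T separate stochastic forward passes.

```python
import numpy as np


def p_bit_samples(z, T, rng):
    """Draw T bipolar samples per neuron: +1 if tanh(z) >= r else -1, r ~ U(-1, 1)."""
    r = rng.uniform(-1.0, 1.0, size=(T,) + z.shape)
    return np.where(np.tanh(z)[None, ...] - r >= 0, 1.0, -1.0)


def train_step(X, Y, W1, W2, T, lr, rng):
    """One sample-aware gradient step; returns updated weights and the loss."""
    # Forward propagation: stochastic hidden layer, averaged over T samples.
    z = X @ W1                                  # pre-activations, shape (N, H)
    h = p_bit_samples(z, T, rng).mean(axis=0)   # multi-sample average, shape (N, H)
    y = h @ W2                                  # linear readout, shape (N, K)

    # Loss computation on the multi-sample averaged output.
    err = y - Y
    loss = float(np.mean(err ** 2))

    # Backpropagation with a straight-through estimator (assumed surrogate):
    # the non-differentiable sampling step is replaced by d tanh(z) / dz.
    dy = 2.0 * err / err.size
    dW2 = h.T @ dy
    dh = dy @ W2.T
    dz = dh * (1.0 - np.tanh(z) ** 2)
    dW1 = X.T @ dz

    return W1 - lr * dW1, W2 - lr * dW2, loss


# Toy usage with random data (shapes and hyperparameters are placeholders).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
Y = rng.normal(size=(32, 1))
W1 = 0.1 * rng.normal(size=(4, 8))
W2 = 0.1 * rng.normal(size=(8, 1))
W1, W2, loss = train_step(X, Y, W1, W2, T=10, lr=0.1, rng=rng)
print(loss)
```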

Technical Innovations

1. MAC to AC Simplification

Simplifying traditional multiply-accumulate (MAC) operations to accumulate-only (AC) operations:

Deterministic: $w_1x_1 + w_2x_2 + ... + w_nx_n$ (requires multiplication)
Probabilistic: Selectively accumulating weight subsets (addition only)
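
The equivalence behind this simplification can be checked numerically. The sketch below assumes binary activations, shown in both a {0, 1} and a {-1, +1} convention (the summary does not say which one the paper uses), and the function names are hypothetical. With binary inputs the weighted sum reduces to adding, or adding and subtracting, a subset of the weights, so no multiplier is needed.

```python
import numpy as np


def mac(weights, activations):
    """Reference multiply-accumulate: w_1*x_1 + ... + w_n*x_n."""
    return float(np.dot(weights, activations))


def accumulate_only(weights, bits):
    """Accumulate-only for bits in {0, 1}: add the weights whose bit is 1."""
    return float(weights[bits == 1].sum())


def accumulate_only_bipolar(weights, spins):
    """Accumulate-only for spins in {-1, +1}: add where +1, subtract where -1."""
    return float(weights[spins == 1].sum() - weights[spins == -1].sum())


weights = np.array([0.5, -1.25, 2.0, 0.75])
bits = np.array([1, 0, 1, 1])
spins = 2 * bits - 1  # map {0, 1} to {-1, +1}

print(np.isclose(mac(weights, bits), accumulate_only(weights, bits)))
print(np.isclose(mac(weights, spins), accumulate_only_bipolar(weights, spins)))
```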

2. p-bit Activation Function

Employing a probabilistic activation of the form $b = \text{sign}(\tanh(W) - \text{rand}_{[-1,+1]})$, where $\text{rand}_{[-1,+1]}$ is a random number drawn from the interval between $-1$ and $+1$ that provides the sampling randomness.
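
A short simulation shows why averaging such samples recovers a graded value. This is a sketch under assumptions: the random number is taken to be uniform on $[-1, +1]$ (the standard p-bit convention), the argument of tanh is treated as the neuron's input, and the names `p_bit` and `sample_mean` are hypothetical. Under the uniform assumption the output is $+1$ with probability $(1 + \tanh(I))/2$, so the mean over many samples approaches $\tanh(I)$.

```python
import numpy as np


def p_bit(inputs, rng):
    """One bipolar sample per input: +1 if tanh(input) >= r else -1, r ~ U(-1, 1)."""
    inputs = np.asarray(inputs, dtype=float)
    r = rng.uniform(-1.0, 1.0, size=inputs.shape)
    return np.where(np.tanh(inputs) - r >= 0, 1, -1)


def sample_mean(inputs, T, rng):
    """Average of T independent p-bit samples for each input."""
    return np.mean([p_bit(inputs, rng) for _ in range(T)], axis=0)


rng = np.random.default_rng(0)
inputs = np.array([-1.0, 0.0, 0.5, 2.0])
print(sample_mean(inputs, T=10, rng=rng))     # coarse estimate from few samples
print(sample_mean(inputs, T=10000, rng=rng))  # approaches tanh(inputs)
print(np.tanh(inputs))
```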

3. Noise Injection Method

Adding noise to already-trained deterministic models to obtain sampling benefits without requiring retraining.

Experimental Setup

Datasets

CIFAR-10: For image classification tasks, 50,000 training images, 10,000 test images
CelebA: For facial image generation, 162,770 training images, scaled to 64×64×3
MNIST: For FPGA verification experiments on digit generation tasks

Evaluation Metrics

Classification Tasks: Accuracy
Generation Tasks: Fréchet Inception Distance (FID)
Energy Metrics: Energy per inference (J/inference), energy gain ratio

Comparison Methods

32-bit deterministic DNN baseline
Quantized models with different bit widths (1-bit, 3-bit, etc.)
Random bit stream methods

Implementation Details

Optimizer: ADAM optimizer
Learning Rate: 1e-3 (classification), 1e-4 (generation)
Training Epochs: 1000 epochs
Batch Size: 64
Weight Initialization: Glorot initialization

Experimental Results

Main Results

1. Image Classification Performance

1 Sample: p-DNN matches 32-bit deterministic baseline accuracy
2 Samples: Surpasses deterministic baseline performance
10 Samples: Achieves accuracy level of 3-bit deterministic models

2. Image Generation Quality

Sample-Aware Training: Significantly improves generated image quality with FID scores approaching 32-bit baseline
Training-Testing Matching: Best results when training and testing use the same number of samples
Progressive Improvement: Image quality continuously improves with increasing sample count

3. Energy Analysis Results

Memory Dominance: DNN energy consumption is primarily determined by memory access, with computation accounting for a small portion
Sampling Advantage: In DRAM scenarios, adding one sample increases energy by only 0.7% but improves accuracy by 2%
Overall Gains: Under 1% accuracy tolerance, p-DNN achieves over 2x energy reduction compared to 32-bit DNN

Ablation Studies

1. Activation Function Comparison

Sigmoid vs Tanh: Both activation functions perform similarly in probabilistic models
Deterministic Differences: Tanh deterministic models perform poorly, highlighting the robustness of probabilistic models

2. Noise Injection Verification

No Retraining Required: Simple noise injection achieves performance improvement with 2 samples
Monotonic Improvement: Performance improves monotonically as more samples are added, indicating that the method is stable

FPGA Verification Results

Energy Verification: Measured energy gain closely matches the theoretical prediction (2.5x vs 2.3x)
Hardware Efficiency: MAC-related CLB LUT usage reduced by 2.9x
RNG Overhead: Random number generator energy and area overhead negligible in the overall system

Related Work

p-bits and Ising Computing

Boltzmann Machine Applications: p-bits have demonstrated significant energy efficiency advantages in optimization and sampling problems
Hardware Implementation: Physical p-bits implementations based on s-MTJ, Zener diodes, etc.
Architecture Reuse: Existing BM hardware can be directly utilized for p-DNN implementation

Neural Network Quantization

Weight Quantization: Extensive work has reduced weight precision to 4 bits or lower
Activation Quantization: Activations are harder to quantize than weights; reducing them below 8 bits without performance loss is typically challenging
Binary Networks: BinaryConnect, Binarized Neural Networks, and other 1-bit network methods

Stochastic Computing

Bit Stream Computing: Traditional methods using random bit streams to represent continuous signals
Fundamental Differences: p-DNN's sampling mechanism differs in principle from random bit streams

Conclusions and Discussion

Main Conclusions

Feasibility Verification: Probabilistic sampling can effectively improve DNN performance, with significant gains achievable from a small number of samples
Energy Advantages: In memory-dominated modern AI systems, the computational overhead of sampling is nearly negligible
Runtime Adjustability: p-DNN can dynamically adjust sample count at runtime, flexibly balancing energy and accuracy
Hardware Friendliness: Existing p-bit hardware architectures can directly support p-DNN implementation

Limitations

Sample Requirements: Some tasks may require numerous samples to achieve ideal performance
Training Complexity: Sample-aware training increases the complexity of the training process
Memory Dependency: Energy advantages largely depend on memory access costs being dominant
Application Scope: Primarily verified on vision tasks; applicability in other domains requires further investigation

Future Directions

Large Language Model Applications: Extending p-DNN to larger-scale models like LLMs
Analog Implementation: Exploring analog circuit-based p-bit implementations for further energy reduction
In-Memory Computing Integration: Combining with in-memory computing architectures to maximize energy efficiency advantages
Advanced Sampling Strategies: Developing sample combination methods beyond simple averaging

In-Depth Evaluation

Strengths

Strong Innovation: First systematic introduction of p-bits into feedforward DNNs, opening new research directions
Solid Theory: Provides a comprehensive energy analysis framework with strong generality and extensibility
Sufficient Experiments: Covers multiple tasks including classification and generation, with FPGA verification of practical feasibility
High Practical Value: Provides practically viable optimization solutions in the context of current AI energy crisis
In-Depth Analysis: Thoroughly analyzes memory vs. computation energy trade-offs, providing important insights

Limitations

Scale Constraints: Experiments primarily conducted on relatively small models; performance on large-scale models remains to be verified
Task Coverage: Mainly focused on vision tasks; applicability in other domains like NLP is unclear
Baseline Comparisons: Comparisons with latest quantization and compression methods are insufficient
Theoretical Analysis: Lacks deeper theoretical explanation for why small sample counts achieve significant improvements

Impact

Academic Value: Provides new ideas and methods for combining probabilistic computing with deep learning
Engineering Significance: Important guidance for AI hardware design, particularly in energy efficiency optimization
Industry Prospects: Broad application prospects in edge computing and mobile device AI applications

Applicable Scenarios

Resource-Constrained Environments: Mobile devices, IoT devices, and other energy-sensitive scenarios
Real-Time Inference: Applications requiring flexible trade-offs between latency and accuracy
Large-Scale Deployment: Data centers and scenarios requiring processing massive requests
Edge Computing: Edge devices with limited network bandwidth and computational resources

References

The paper cites multiple important related works, including:

Li et al. 2025 ISSCC: 65nm ASIC QMC implementation
Hubara et al.: Pioneering work on quantized neural networks
Courbariaux et al.: BinaryConnect binary neural networks
Jacob et al.: Integer quantization training methods

Overall Assessment: This is a high-quality research paper making important contributions at the intersection of probabilistic computing and deep learning. The paper not only proposes innovative technical solutions but also provides a comprehensive theoretical analysis framework and experimental validation, demonstrating strong academic value and practical significance. While there is room for improvement in certain aspects, overall it represents an important advancement in the field.