Improving deep neural network performance through sampling

Ghantasala, Li, Jaiswal et al.
Energy efficient sampling with probabilistic neurons or p-bits has been demonstrated in the context of Boltzmann machines and it is natural to ask if these approaches can be extended to the field of generative AI where energy costs have become prohibitively large. However, this very active field is dominated by feedforward deep neural networks (DNNs) which primarily use multi-bit deterministic neurons with no role for sampling. In this paper we first show that it is feasible to obtain superior accuracy through the use of multiple samples generated by probabilistic networks. This possibility raises the question of which option is energetically preferable for improving accuracy: generating more samples, or adding more bits to a single deterministic sample. We provide a simple expression that can be used to estimate these energy tradeoffs and illustrate it with results for different algorithms and architectures.

Basic Information

  • Paper ID: 2507.07763
  • Title: Improving deep neural network performance through sampling
  • Authors: Lakshmi A. Ghantasala, Ming-Che Li, Risi Jaiswal, Behtash Behin-Aein, Joseph Makin, Shreyas Sen, Supriyo Datta
  • Classification: cond-mat.dis-nn
  • Publication Date: October 27, 2025 (arXiv preprint)
  • Institution: Purdue University Elmore School of Electrical and Computer Engineering
  • Paper Link: https://arxiv.org/abs/2507.07763

Abstract

This paper explores whether the energy-efficient sampling demonstrated with probabilistic neurons (p-bits) in Boltzmann machines can be extended to generative AI. Feedforward deep neural networks, which dominate that field, primarily use multi-bit deterministic neurons and have no role for sampling. The paper first shows that averaging multiple samples generated by probabilistic networks can yield superior accuracy. It then poses a core question: to improve accuracy, is it more energy-efficient to generate more samples or to add bits to a single deterministic sample? The paper gives a simple expression for estimating this energy trade-off and illustrates it with results for different algorithms and architectures.

Research Background and Motivation

Problem Background

  1. Energy Consumption Crisis: The energy cost of generative AI has reached prohibitive levels, urgently requiring energy-efficient optimization solutions
  2. Technical Disparity: Probabilistic neurons (p-bits) in Boltzmann machines have demonstrated significant energy efficiency advantages, yet feedforward deep neural networks still primarily use multi-bit deterministic neurons
  3. Sampling Deficiency: Current mainstream DNN architectures lack sampling mechanisms, limiting their capabilities in probabilistic inference

Research Motivation

  1. Extending p-bit Applications: Carrying the energy-efficiency advantages of p-bits, already demonstrated in Ising computing, over to machine learning
  2. Energy-Accuracy Trade-off: Systematically analyzing how energy trades off between sample count and bit precision
  3. Unified Evaluation Framework: Establishing a universal energy consumption evaluation framework applicable to different probabilistic DNN implementations

Core Contributions

  1. Proposed the probabilistic DNN (p-DNN) framework: Integrating p-bits into feedforward deep neural networks to enable sampling-based inference
  2. Developed sample-aware training methods: Significantly improving probabilistic network performance through multi-sample averaging training strategies
  3. Established an energy consumption analysis framework: Proposing a universal elementary operation energy model applicable to evaluating energy trade-offs across different architectures and algorithms
  4. Verified practical feasibility: An FPGA implementation confirms the theoretical energy analysis and demonstrates the practical value of the method
  5. Provided quantitative insights: Demonstrating that only 2 samples are needed to surpass deterministic baselines, and 10 samples can match the accuracy of 3-bit deterministic models

Methodology Details

Task Definition

This paper investigates how to introduce probabilistic sampling into deep neural networks to achieve better energy-accuracy trade-offs. The task can be framed as follows:

  • Input: Traditional multi-bit deterministic DNN
  • Output: Probabilistic DNN based on p-bits, capable of generating multiple samples and improving performance through averaging
  • Constraints: Optimizing overall energy efficiency while maintaining or improving accuracy

Model Architecture

1. p-DNN Basic Building Blocks

The paper defines the basic operational unit of p-DNN (Figure 1), with its energy model as:

$$\epsilon_{EO} = n b_w \epsilon_{wM} + (n+1) b_a \epsilon_{aM} + \epsilon_S(n, b_a, b_w) + \epsilon_N$$

Where:

  • $\epsilon_{wM}, \epsilon_{aM}$: Weight and activation memory access energy
  • $\epsilon_S$: Synaptic computation energy
  • $\epsilon_N$: Neuron energy
  • $n$: Fan-in (number of input connections)
  • $b_w, b_a$: Weight and activation bit precision

2. Multi-sample Energy Model

For T samples, the energy model is revised to:

$$\epsilon_{EO} = n b_w \epsilon_{wM} + T\left[(n+1) b_a \epsilon_{aM} + \epsilon_S(n, b_a, b_w) + \epsilon_N\right]$$

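The weight-loading term $n b_w \epsilon_{wM}$ is paid once, and only the bracketed term scales with $T$. When weight loading dominates the total, the marginal cost of each additional sample is therefore relatively low.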
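
As a rough illustration of the multi-sample expression, the sketch below evaluates the elementary-operation energy for one and two samples. The function name and every numeric energy value are hypothetical placeholders in arbitrary units, not figures from the paper, and the synaptic term is passed in as an already-evaluated number rather than as a function of (n, b_a, b_w).

```python
def elementary_op_energy(n, b_w, b_a, eps_wm, eps_am, eps_s, eps_n, samples=1):
    """Energy of one elementary operation under the multi-sample model.

    eps_s is a plain number here; the caller is assumed to have evaluated
    the synaptic cost for the chosen n, b_a and b_w already.
    """
    weight_load = n * b_w * eps_wm                       # paid once, shared by all samples
    per_sample = (n + 1) * b_a * eps_am + eps_s + eps_n  # paid again for every sample
    return weight_load + samples * per_sample


# Hypothetical values (arbitrary units), chosen so that weight memory dominates.
params = dict(n=256, b_w=8, b_a=1, eps_wm=100.0, eps_am=1.0, eps_s=50.0, eps_n=5.0)
one = elementary_op_energy(samples=1, **params)
two = elementary_op_energy(samples=2, **params)
marginal = (two - one) / one
print(f"one sample: {one:.0f}, two samples: {two:.0f}, marginal cost: {marginal:.2%}")
```

With these made-up numbers the second sample adds well under 1% to the total, which is the qualitative behavior the expression predicts when weight loading dominates.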

3. Sample-Aware Training Strategy

  • Forward Propagation: Adding randomness to activation functions at each layer to generate multiple samples
  • Loss Computation: Computing loss based on multi-sample averaged results
  • Backpropagation: Using straight-through estimators to handle gradients of stochastic activations
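
The three steps above can be sketched for a single stochastic neuron. This is a minimal illustration under stated assumptions, not the paper's training code: the helper names `pbit` and `train_step`, the squared-error loss, the learning rate, and the choice of the derivative of tanh as the straight-through surrogate gradient are all assumptions made for the example.

```python
import math
import random


def pbit(z, rng):
    """Stochastic binary activation: +1 with probability (1 + tanh(z)) / 2."""
    return 1.0 if math.tanh(z) > rng.uniform(-1.0, 1.0) else -1.0


def train_step(w, x, target, num_samples, lr, rng):
    """One sample-aware update for a single p-bit neuron (illustrative only)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    # Forward: draw several stochastic outputs and average them.
    mean_out = sum(pbit(z, rng) for _ in range(num_samples)) / num_samples
    # Loss: squared error computed on the multi-sample average.
    loss = (mean_out - target) ** 2
    # Backward: straight-through-style surrogate. The non-differentiable
    # sampling step is replaced by the derivative of its expected value, tanh(z).
    dloss_dmean = 2.0 * (mean_out - target)
    dmean_dz = 1.0 - math.tanh(z) ** 2
    grad = [dloss_dmean * dmean_dz * xi for xi in x]
    new_w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return new_w, loss


rng = random.Random(0)
w = [0.0, 0.0]
x, target = [1.0, -1.0], 0.8
for _ in range(2000):
    w, loss = train_step(w, x, target, num_samples=8, lr=0.05, rng=rng)
z_final = sum(wi * xi for wi, xi in zip(w, x))
print(f"tanh(z) after training: {math.tanh(z_final):.2f} (target {target})")
```

Because the loss is taken on the averaged output, the number of samples used in training directly shapes the gradient noise, which is one plausible reason matching the training and testing sample counts matters.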

Technical Innovations

1. MAC to AC Simplification

Simplifying traditional multiply-accumulate (MAC) operations to accumulate-only (AC) operations:

  • Deterministic: $w_1 x_1 + w_2 x_2 + \dots + w_n x_n$ (requires multiplication)
  • Probabilistic: Selectively accumulating weight subsets (addition only)
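
A minimal sketch of the difference, assuming binary activations encoded as {0, 1} (the function names are illustrative): when each input is a single bit, the bit only decides whether a weight is added, so no multiplier is needed.

```python
def mac_dot(weights, activations):
    """Conventional multiply-accumulate with multi-bit activations."""
    total = 0.0
    for w, x in zip(weights, activations):
        total += w * x          # one multiplication per synapse
    return total


def ac_dot(weights, bits):
    """Accumulate-only version for binary activations in {0, 1}."""
    total = 0.0
    for w, b in zip(weights, bits):
        if b:                   # the bit only gates whether w is added
            total += w
    return total


weights = [0.5, -1.25, 2.0, 0.75]
bits = [1, 0, 1, 1]
result = ac_dot(weights, bits)
print(result, result == mac_dot(weights, bits))
```

With a {-1, +1} encoding the same idea becomes add-or-subtract instead of add-or-skip.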

2. p-bit Activation Function

Employing probabilistic activation of the form $b = \text{sign}(\tanh(W) - \text{rand}\{-1,+1\})$, where the random number supplies the sampling randomness.
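
The sketch below samples this activation many times for a fixed input. It assumes the random term is drawn uniformly from the real interval [-1, +1], which is an interpretation of the notation rather than something stated here; under that assumption the output is +1 with probability (1 + tanh(W)) / 2, so the sample mean should approach tanh(W).

```python
import math
import random


def pbit_sample(w_in, rng):
    """Return +1 or -1 according to sign(tanh(w_in) - r), r uniform in [-1, 1]."""
    r = rng.uniform(-1.0, 1.0)   # assumption: uniform real-valued random number
    return 1 if math.tanh(w_in) - r > 0 else -1


rng = random.Random(42)
w_in = 0.5
num_samples = 20000
mean = sum(pbit_sample(w_in, rng) for _ in range(num_samples)) / num_samples
print(f"sample mean {mean:.3f} vs tanh({w_in}) = {math.tanh(w_in):.3f}")
```

This is why averaging a handful of 1-bit samples can stand in for a multi-bit deterministic activation: the average is an estimate of the underlying continuous value.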

3. Noise Injection Method

Adding noise to already-trained deterministic models to obtain sampling benefits without requiring retraining.
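
One way to picture this is a single 1-bit deterministic neuron whose pre-activation is perturbed before thresholding. The sketch assumes zero-mean Gaussian noise on the pre-activation and simple averaging; the noise distribution, its scale `sigma`, and the function names are assumptions for illustration, since the summary does not specify how the noise is injected.

```python
import random


def deterministic_neuron(z):
    """1-bit deterministic activation of an already-trained model."""
    return 1.0 if z > 0 else -1.0


def noisy_average(z, num_samples, sigma, rng):
    """Average several evaluations with noise added to the pre-activation."""
    total = 0.0
    for _ in range(num_samples):
        total += deterministic_neuron(z + rng.gauss(0.0, sigma))  # assumed Gaussian noise
    return total / num_samples


rng = random.Random(1)
for z in (0.1, 0.5, 2.0):
    avg = noisy_average(z, num_samples=1000, sigma=1.0, rng=rng)
    print(f"z={z}: deterministic {deterministic_neuron(z):+.0f}, noisy average {avg:+.2f}")
```

The deterministic output is +1 for all three inputs, whereas the noisy average grows with z, so averaging recovers graded information that a single thresholded bit discards. No weights are changed in the process.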

Experimental Setup

Datasets

  1. CIFAR-10: For image classification tasks, 50,000 training images, 10,000 test images
  2. CelebA: For facial image generation, 162,770 training images, scaled to 64×64×3
  3. MNIST: For FPGA verification experiments on digit generation tasks

Evaluation Metrics

  • Classification Tasks: Accuracy
  • Generation Tasks: Fréchet Inception Distance (FID)
  • Energy Metrics: Energy per inference (J/inference), energy gain ratio
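
For reference, FID is conventionally computed from the mean and covariance of Inception-network features of real and generated images, and lower values indicate closer distributions. The subscripts r (real) and g (generated) are notation chosen here, not taken from the paper.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```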

Comparison Methods

  • 32-bit deterministic DNN baseline
  • Quantized models with different bit widths (1-bit, 3-bit, etc.)
  • Random bit stream methods

Implementation Details

  • Optimizer: Adam
  • Learning Rate: 1e-3 (classification), 1e-4 (generation)
  • Training Epochs: 1000
  • Batch Size: 64
  • Weight Initialization: Glorot initialization

Experimental Results

Main Results

1. Image Classification Performance

  • 1 Sample: p-DNN matches 32-bit deterministic baseline accuracy
  • 2 Samples: Surpasses deterministic baseline performance
  • 10 Samples: Matches the accuracy of 3-bit deterministic models

2. Image Generation Quality

  • Sample-Aware Training: Significantly improves generated image quality with FID scores approaching 32-bit baseline
  • Training-Testing Matching: Best results when training and testing use the same number of samples
  • Progressive Improvement: Image quality continuously improves with increasing sample count

3. Energy Analysis Results

  • Memory Dominance: DNN energy consumption is primarily determined by memory access, with computation accounting for a small portion
  • Sampling Advantage: In DRAM scenarios, adding one sample increases energy by only 0.7% but improves accuracy by 2%
  • Overall Gains: Under 1% accuracy tolerance, p-DNN achieves over 2x energy reduction compared to 32-bit DNN
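
To get a feel for the DRAM figure, the sketch below extrapolates the 0.7% per-sample overhead to larger sample counts. It assumes every additional sample costs the same 0.7% of the single-sample energy, which follows the linear-in-T form of the energy model but is an extrapolation, not a result reported in the paper; the function name is illustrative.

```python
def relative_energy(num_samples, per_sample_overhead=0.007):
    """Energy relative to one sample, assuming each extra sample costs the same."""
    return 1.0 + per_sample_overhead * (num_samples - 1)


for t in (1, 2, 10):
    print(f"{t} sample(s): {relative_energy(t):.3f}x the single-sample energy")
```

Under that assumption, ten samples would cost about 6.3% more energy than one.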

Ablation Studies

1. Activation Function Comparison

  • Sigmoid vs Tanh: Both activation functions perform similarly in probabilistic models
  • Deterministic Differences: Tanh deterministic models perform poorly, highlighting the robustness of probabilistic models

2. Noise Injection Verification

  • No Retraining Required: Simple noise injection achieves performance improvement with 2 samples
  • Monotonic Improvement: Performance improvement is monotonic, demonstrating method stability

FPGA Verification Results

  • Energy Verification: Measured energy gain closely matches the theoretical prediction (2.5x vs 2.3x gain)
  • Hardware Efficiency: MAC-related CLB LUT usage reduced by 2.9x
  • RNG Overhead: Random number generator energy and area overhead negligible in the overall system

Related Work

p-bits and Ising Computing

  • Boltzmann Machine Applications: p-bits have demonstrated significant energy efficiency advantages in optimization and sampling problems
  • Hardware Implementation: Physical p-bit implementations based on stochastic magnetic tunnel junctions (s-MTJs), Zener diodes, etc.
  • Architecture Reuse: Existing BM hardware can be directly utilized for p-DNN implementation

Neural Network Quantization

  • Weight Quantization: Extensive work has reduced weight precision to 4 bits or lower
  • Activation Quantization: Activations are harder to quantize; reducing them below 8 bits without accuracy loss is typically challenging
  • Binary Networks: BinaryConnect, Binarized Neural Networks, and other 1-bit network methods

Stochastic Computing

  • Bit Stream Computing: Traditional methods using random bit streams to represent continuous signals
  • Fundamental Differences: p-DNN's sampling mechanism differs in principle from random bit streams

Conclusions and Discussion

Main Conclusions

  1. Feasibility Verification: Probabilistic sampling can effectively improve DNN performance, with significant gains achievable from a small number of samples
  2. Energy Advantages: In memory-dominated modern AI systems, the computational overhead of sampling is nearly negligible
  3. Runtime Adjustability: p-DNN can dynamically adjust sample count at runtime, flexibly balancing energy and accuracy
  4. Hardware Friendliness: Existing p-bit hardware architectures can directly support p-DNN implementation

Limitations

  1. Sample Requirements: Some tasks may require numerous samples to achieve ideal performance
  2. Training Complexity: Sample-aware training increases the complexity of the training process
  3. Memory Dependency: Energy advantages largely depend on memory access costs being dominant
  4. Application Scope: Primarily verified on vision tasks; applicability in other domains requires further investigation

Future Directions

  1. Large Language Model Applications: Extending p-DNN to larger-scale models like LLMs
  2. Analog Implementation: Exploring analog circuit-based p-bit implementations for further energy reduction
  3. In-Memory Computing Integration: Combining with in-memory computing architectures to maximize energy efficiency advantages
  4. Advanced Sampling Strategies: Developing sample combination methods beyond simple averaging

In-Depth Evaluation

Strengths

  1. Strong Innovation: First systematic introduction of p-bits into feedforward DNNs, opening new research directions
  2. Solid Theory: Provides a comprehensive energy analysis framework with strong generality and extensibility
  3. Sufficient Experiments: Covers multiple tasks including classification and generation, with FPGA verification of practical feasibility
  4. High Practical Value: Provides a practically viable optimization approach in the context of the current AI energy crisis
  5. In-Depth Analysis: Thoroughly analyzes memory vs. computation energy trade-offs, providing important insights

Limitations

  1. Scale Constraints: Experiments primarily conducted on relatively small models; performance on large-scale models remains to be verified
  2. Task Coverage: Mainly focused on vision tasks; applicability in other domains like NLP is unclear
  3. Baseline Comparisons: Comparisons with the latest quantization and compression methods are insufficient
  4. Theoretical Analysis: Lacks deeper theoretical explanation for why small sample counts achieve significant improvements

Impact

  1. Academic Value: Provides new ideas and methods for combining probabilistic computing with deep learning
  2. Engineering Significance: Important guidance for AI hardware design, particularly in energy efficiency optimization
  3. Industry Prospects: Broad application prospects in edge computing and mobile device AI applications

Applicable Scenarios

  1. Resource-Constrained Environments: Mobile devices, IoT devices, and other energy-sensitive scenarios
  2. Real-Time Inference: Applications requiring flexible trade-offs between latency and accuracy
  3. Large-Scale Deployment: Data centers and other settings that must serve very large request volumes
  4. Edge Computing: Edge devices with limited network bandwidth and computational resources

References

The paper cites multiple important related works, including:

  • Li et al. 2025 ISSCC: 65nm ASIC QMC implementation
  • Hubara et al.: Pioneering work on quantized neural networks
  • Courbariaux et al.: BinaryConnect binary neural networks
  • Jacob et al.: Integer quantization training methods

Overall Assessment: This is a high-quality research paper making important contributions at the intersection of probabilistic computing and deep learning. The paper not only proposes innovative technical solutions but also provides a comprehensive theoretical analysis framework and experimental validation, demonstrating strong academic value and practical significance. While there is room for improvement in certain aspects, overall it represents an important advancement in the field.