Energy efficient sampling with probabilistic neurons or p-bits has been demonstrated in the context of Boltzmann machines and it is natural to ask if these approaches can be extended to the field of generative AI where energy costs have become prohibitively large. However, this very active field is dominated by feedforward deep neural networks (DNNs) which primarily use multi-bit deterministic neurons with no role for sampling. In this paper we first show that it is feasible to obtain superior accuracy through the use of multiple samples generated by probabilistic networks. This possibility raises the question of which option is energetically preferable for improving accuracy: generating more samples, or adding more bits to a single deterministic sample. We provide a simple expression that can be used to estimate these energy tradeoffs and illustrate it with results for different algorithms and architectures.
- Paper ID: 2507.07763
- Title: Improving deep neural network performance through sampling
- Authors: Lakshmi A. Ghantasala, Ming-Che Li, Risi Jaiswal, Behtash Behin-Aein, Joseph Makin, Shreyas Sen, Supriyo Datta
- Classification: cond-mat.dis-nn
- Publication Date: October 27, 2025 (arXiv preprint)
- Institution: Purdue University Elmore School of Electrical and Computer Engineering
- Paper Link: https://arxiv.org/abs/2507.07763
This paper explores whether the energy-efficient sampling demonstrated with probabilistic neurons (p-bits) in Boltzmann machines can be extended to generative AI. Feedforward deep neural networks primarily use multi-bit deterministic neurons and have no sampling mechanism; the paper first demonstrates that multiple samples generated by a probabilistic network can achieve superior accuracy. It then poses a core question: to improve accuracy, is it more energy-efficient to generate more samples or to increase the bit precision of a single deterministic sample? The paper provides a simple expression for estimating this energy trade-off and illustrates it with results for different algorithms and architectures.
- Energy Consumption Crisis: The energy cost of generative AI has reached prohibitive levels, urgently requiring energy-efficient optimization solutions
- Technical Disparity: Probabilistic neurons (p-bits) in Boltzmann machines have demonstrated significant energy efficiency advantages, yet feedforward deep neural networks still primarily use multi-bit deterministic neurons
- Sampling Deficiency: Current mainstream DNN architectures lack sampling mechanisms, limiting their capabilities in probabilistic inference
- Extending p-bit Applications: Carrying the energy efficiency advantages that p-bits have demonstrated in Ising computing over to machine learning
- Energy-Accuracy Trade-off: Systematically analyzing the energy trade-off between sample count and bit precision
- Unified Evaluation Framework: Establishing a general energy evaluation framework applicable to different probabilistic DNN implementations
- Proposed the probabilistic DNN (p-DNN) framework: Integrating p-bits into feedforward deep neural networks to enable sampling-based inference
- Developed sample-aware training methods: Significantly improving probabilistic network performance through multi-sample averaging training strategies
- Established an energy consumption analysis framework: Proposing a universal elementary operation energy model applicable to evaluating energy trade-offs across different architectures and algorithms
- Verified practical feasibility: An FPGA implementation confirms the accuracy of the theoretical analysis and demonstrates the practical value of the method
- Provided quantitative insights: Demonstrating that only 2 samples are needed to surpass deterministic baselines, and 10 samples can match the accuracy of 3-bit deterministic models
This paper investigates how to introduce probabilistic sampling mechanisms into deep neural networks to achieve better energy-accuracy trade-offs. Specifically, it includes:
- Input: Traditional multi-bit deterministic DNN
- Output: Probabilistic DNN based on p-bits, capable of generating multiple samples and improving performance through averaging
- Constraints: Optimizing overall energy efficiency while maintaining or improving accuracy
The paper defines the basic operational unit of p-DNN (Figure 1), with its energy model as:
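$$\epsilon_{EO} = n\, b_w\, \epsilon_w^M + (n+1)\, b_a\, \epsilon_a^M + \epsilon_S(n, b_a, b_w) + \epsilon_N$$
Where:
- $\epsilon_w^M$, $\epsilon_a^M$: Weight and activation memory access energy consumption
- $\epsilon_S$: Synaptic computation energy consumption
- $\epsilon_N$: Neuron energy consumption
- $n$: Fan-in connection count
- $b_w$, $b_a$: Weight and activation bit precision
For $T$ samples, the energy model is revised to:
$$\epsilon_{EO} = n\, b_w\, \epsilon_w^M + T\left[(n+1)\, b_a\, \epsilon_a^M + \epsilon_S(n, b_a, b_w) + \epsilon_N\right]$$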
Weights are loaded once and reused across all T samples, so when weight-loading energy dominates, the marginal cost of each additional sample is relatively low.
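The effect of sharing the weight load across samples can be checked numerically. The sketch below is only an illustration of the T-sample model above: the function name `energy_per_eo`, the parameter names, and every numeric value are hypothetical (arbitrary units, chosen so that weight loading dominates), and the synapse energy is passed in as a constant rather than as a function of n and the bit widths.

```python
def energy_per_eo(n, b_w, b_a, e_wm, e_am, e_s, e_n, samples=1):
    """Energy of one elementary operation evaluated for `samples` samples.

    The weight-loading term is paid once; activation access, synapse and
    neuron energy are paid once per sample.
    """
    weight_load = n * b_w * e_wm
    per_sample = (n + 1) * b_a * e_am + e_s + e_n
    return weight_load + samples * per_sample


# Hypothetical values in arbitrary units (not taken from the paper).
params = dict(n=256, b_w=8, b_a=1, e_wm=100.0, e_am=1.0, e_s=50.0, e_n=5.0)

one = energy_per_eo(samples=1, **params)
two = energy_per_eo(samples=2, **params)
marginal = (two - one) / one  # relative cost of the second sample
print(f"1 sample: {one:.0f}  2 samples: {two:.0f}  extra: {marginal:.2%}")
```

With these made-up numbers the second sample adds well under 1% to the total, which is the qualitative behavior the model predicts when memory access for weights is the dominant cost.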
- Forward Propagation: Adding randomness to activation functions at each layer to generate multiple samples
- Loss Computation: Computing loss based on multi-sample averaged results
- Backpropagation: Using straight-through estimators to handle gradients of stochastic activations
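The three steps above can be sketched on a toy problem with a single stochastic neuron. This is a minimal illustration, not the paper's training code: the toy target, the function names `pbit` and `train`, the hyperparameters, and the particular straight-through surrogate (using the derivative of tanh in place of the gradient of the binary sample) are all assumptions made for the example.

```python
import math
import random


def pbit(z, rng):
    """One binary sample: +1 with probability (1 + tanh(z)) / 2, else -1."""
    return 1.0 if math.tanh(z) - rng.uniform(-1.0, 1.0) > 0 else -1.0


def train(num_samples, steps=4000, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0  # single trainable weight
    for _ in range(steps):
        x = rng.uniform(-2.0, 2.0)
        target = math.tanh(1.5 * x)  # toy target, so the ideal weight is 1.5
        z = w * x
        # Forward: draw several stochastic samples and average them.
        mean_out = sum(pbit(z, rng) for _ in range(num_samples)) / num_samples
        # Loss on the averaged output: (mean_out - target) ** 2.
        dloss_dout = 2.0 * (mean_out - target)
        # Backward: the binary sample has no useful gradient, so the
        # surrogate d(out)/dz = 1 - tanh(z)^2 is passed straight through.
        dout_dz = 1.0 - math.tanh(z) ** 2
        w -= lr * dloss_dout * dout_dz * x
    return w


print("learned weight with 8 samples:", round(train(num_samples=8), 3))
```

Averaging more samples per step reduces the variance of the gradient estimate, so the learned weight fluctuates less around its ideal value.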
Simplifying traditional multiply-accumulate (MAC) operations to accumulate-only (AC) operations:
- Deterministic: w1x1+w2x2+...+wnxn (requires multiplication)
- Probabilistic: Selectively accumulating weight subsets (addition only)
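A small sketch makes the MAC-to-AC reduction concrete. It assumes the probabilistic activations are binary with values in {0, 1}, so that the weighted sum reduces to adding the weights whose input bit is set; the encoding and the function names `mac` and `accumulate_only` are assumptions for illustration.

```python
def mac(weights, activations):
    """Multiply-accumulate: one multiplication per synapse."""
    total = 0.0
    for w, x in zip(weights, activations):
        total += w * x
    return total


def accumulate_only(weights, bits):
    """Accumulate-only: add a weight only when its binary input is 1."""
    total = 0.0
    for w, b in zip(weights, bits):
        if b:
            total += w
    return total


weights = [0.5, -1.25, 2.0, 0.75]
bits = [1, 0, 1, 1]
# Both paths give the same sum, but the second uses no multiplications.
print(mac(weights, bits), accumulate_only(weights, bits))
```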
Employing probabilistic activation of the form b = sign(tanh(W) − rand(−1, +1)), where rand(−1, +1) is a random number drawn uniformly between −1 and +1 that provides the sampling randomness.
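The following sketch shows why averaging such binary samples recovers a graded value. It assumes the random term is uniform on [−1, +1], in which case each sample is +1 with probability (1 + tanh(z)) / 2 and the sample mean approaches tanh(z). The name `pbit_sample` and the input value are hypothetical.

```python
import math
import random


def pbit_sample(z, rng):
    """Binary sample sign(tanh(z) - r), with r uniform on [-1, +1]."""
    return 1 if math.tanh(z) - rng.uniform(-1.0, 1.0) > 0 else -1


rng = random.Random(0)
z = 0.5  # example neuron input
for num_samples in (1, 10, 1000):
    mean = sum(pbit_sample(z, rng) for _ in range(num_samples)) / num_samples
    print(f"T={num_samples:5d}  mean={mean:+.3f}  tanh(z)={math.tanh(z):+.3f}")
```

A single sample carries one bit of information, while the mean over many samples behaves like a multi-bit estimate of the deterministic activation.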
Adding noise to already-trained deterministic models to obtain sampling benefits without requiring retraining.
- CIFAR-10: For image classification tasks, 50,000 training images, 10,000 test images
- CelebA: For facial image generation, 162,770 training images, scaled to 64×64×3
- MNIST: For FPGA verification experiments on digit generation tasks
- Classification Tasks: Accuracy
- Generation Tasks: Fréchet Inception Distance (FID)
- Energy Metrics: Energy per inference (J/inference), energy gain ratio
- 32-bit deterministic DNN baseline
- Quantized models with different bit widths (1-bit, 3-bit, etc.)
- Random bit stream methods
- Optimizer: Adam
- Learning Rate: 1e-3 (classification), 1e-4 (generation)
- Training Epochs: 1000
- Batch Size: 64
- Weight Initialization: Glorot initialization
- 1 Sample: p-DNN matches 32-bit deterministic baseline accuracy
- 2 Samples: Surpasses deterministic baseline performance
- 10 Samples: Achieves accuracy level of 3-bit deterministic models
- Sample-Aware Training: Significantly improves generated image quality with FID scores approaching 32-bit baseline
- Training-Testing Matching: Best results when training and testing use the same number of samples
- Progressive Improvement: Image quality continuously improves with increasing sample count
- Memory Dominance: DNN energy consumption is primarily determined by memory access, with computation accounting for a small portion
- Sampling Advantage: In DRAM scenarios, adding one sample increases energy by only 0.7% but improves accuracy by 2%
- Overall Gains: Under 1% accuracy tolerance, p-DNN achieves over 2x energy reduction compared to 32-bit DNN
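To see how the 0.7% figure accumulates over several samples, the T-sample energy model can be written as a ratio. Here $A$ (the one-time weight-loading energy) and $B$ (the energy paid per sample) are shorthand introduced only for this illustration, and reading the 0.7% as $B/(A+B)$ is an assumption:

$$\frac{E(T)}{E(1)} = \frac{A + T B}{A + B} = 1 + (T-1)\,\frac{B}{A+B}, \qquad \frac{B}{A+B} = 0.007 \;\Rightarrow\; \frac{E(10)}{E(1)} = 1 + 9 \times 0.007 = 1.063$$

Under that reading, ten samples would cost roughly 6% more energy than one in the DRAM scenario.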
- Sigmoid vs Tanh: Both activation functions perform similarly in probabilistic models
- Deterministic Differences: Tanh deterministic models perform poorly, highlighting the robustness of probabilistic models
- No Retraining Required: Simple noise injection achieves performance improvement with 2 samples
- Monotonic Improvement: Performance improvement is monotonic, demonstrating method stability
- Energy Verification: Measured energy closely matches theoretical predictions (2.5x vs. 2.3x gain)
- Hardware Efficiency: MAC-related CLB LUT usage reduced by 2.9x
- RNG Overhead: Random number generator energy and area overhead negligible in the overall system
- Boltzmann Machine Applications: p-bits have demonstrated significant energy efficiency advantages in optimization and sampling problems
- Hardware Implementation: Physical p-bits implementations based on s-MTJ, Zener diodes, etc.
- Architecture Reuse: Existing BM hardware can be directly utilized for p-DNN implementation
- Weight Quantization: Extensive work has reduced weight precision to 4 bits or lower
- Activation Quantization: Quantizing activations is harder; going below 8 bits without performance loss is typically challenging
- Binary Networks: BinaryConnect, Binarized Neural Networks, and other 1-bit network methods
- Bit Stream Computing: Traditional methods using random bit streams to represent continuous signals
- Fundamental Differences: p-DNN's sampling mechanism differs in principle from random bit streams
- Feasibility Verification: Probabilistic sampling can effectively improve DNN performance, with significant gains achievable from a small number of samples
- Energy Advantages: In memory-dominated modern AI systems, the computational overhead of sampling is nearly negligible
- Runtime Adjustability: p-DNN can dynamically adjust sample count at runtime, flexibly balancing energy and accuracy
- Hardware Friendliness: Existing p-bit hardware architectures can directly support p-DNN implementation
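Runtime adjustability can be pictured as choosing the sample count that fits an energy budget. The sketch below assumes the linear form of the T-sample energy model (a one-time weight-loading cost plus a fixed cost per sample); the function name `max_samples` and the example numbers are hypothetical, in arbitrary units.

```python
def max_samples(budget, weight_load, per_sample):
    """Largest T with weight_load + T * per_sample <= budget (0 if none fits)."""
    if budget < weight_load + per_sample:
        return 0
    return int((budget - weight_load) // per_sample)


# Example: weights cost 1000 units to load, each sample costs 10 units.
for budget in (1005, 1020, 1100):
    print(budget, "->", max_samples(budget, 1000, 10), "samples")
```

A deployed system could apply the same idea per request, spending more samples only when the budget or the required accuracy calls for it.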
- Sample Requirements: Some tasks may require numerous samples to achieve ideal performance
- Training Complexity: Sample-aware training increases the complexity of the training process
- Memory Dependency: Energy advantages largely depend on memory access costs being dominant
- Application Scope: Primarily verified on vision tasks; applicability in other domains requires further investigation
- Large Language Model Applications: Extending p-DNN to larger-scale models like LLMs
- Analog Implementation: Exploring analog circuit-based p-bit implementations for further energy reduction
- In-Memory Computing Integration: Combining with in-memory computing architectures to maximize energy efficiency advantages
- Advanced Sampling Strategies: Developing sample combination methods beyond simple averaging
- Strong Innovation: First systematic introduction of p-bits into feedforward DNNs, opening new research directions
- Solid Theory: Provides a comprehensive energy analysis framework with strong generality and extensibility
- Thorough Experiments: Covers multiple tasks including classification and generation, with FPGA verification of practical feasibility
- High Practical Value: Provides practically viable optimization solutions in the context of the current AI energy crisis
- In-Depth Analysis: Thoroughly analyzes memory vs. computation energy trade-offs, providing important insights
- Scale Constraints: Experiments primarily conducted on relatively small models; performance on large-scale models remains to be verified
- Task Coverage: Mainly focused on vision tasks; applicability in other domains like NLP is unclear
- Baseline Comparisons: Comparisons with latest quantization and compression methods are insufficient
- Theoretical Analysis: Lacks deeper theoretical explanation for why small sample counts achieve significant improvements
- Academic Value: Provides new ideas and methods for combining probabilistic computing with deep learning
- Engineering Significance: Important guidance for AI hardware design, particularly in energy efficiency optimization
- Industry Prospects: Broad application prospects in edge computing and mobile device AI applications
- Resource-Constrained Environments: Mobile devices, IoT devices, and other energy-sensitive scenarios
- Real-Time Inference: Applications requiring flexible trade-offs between latency and accuracy
- Large-Scale Deployment: Data centers and other scenarios that must process massive request volumes
- Edge Computing: Edge devices with limited network bandwidth and computational resources
The paper cites multiple important related works, including:
- Li et al. 2025 ISSCC: 65nm ASIC QMC implementation
- Hubara et al.: Pioneering work on quantized neural networks
- Courbariaux et al.: BinaryConnect binary neural networks
- Jacob et al.: Integer quantization training methods
Overall Assessment: This is a high-quality research paper making important contributions at the intersection of probabilistic computing and deep learning. The paper not only proposes innovative technical solutions but also provides a comprehensive theoretical analysis framework and experimental validation, demonstrating strong academic value and practical significance. While there is room for improvement in certain aspects, overall it represents an important advancement in the field.