End-to-End Neural Network Compression and Deployment for Hardware Acceleration Using PQuant and hls4ml
Description
As the demand for efficient machine learning on resource-limited devices grows, model compression techniques like pruning and quantization have become increasingly vital. Despite their importance, these methods are typically developed in isolation, and while some libraries attempt to offer unified interfaces for compression, they often lack support for deployment tools such as hls4ml. To bridge this gap, we developed PQuant, a Python library designed to streamline the process of training and compressing machine learning models. PQuant offers a unified interface for applying a range of pruning and quantization techniques, catering to users with minimal background in compression while still providing detailed configuration options for advanced use. Notably, it features built-in compatibility with hls4ml, enabling seamless deployment of compressed models on FPGA-based accelerators. This makes PQuant a versatile resource both for researchers exploring compression strategies and for developers targeting efficient implementation on edge devices or custom hardware platforms. We will present the PQuant library, report the performance of several compression algorithms implemented with it, and demonstrate the conversion flow of a neural network model from an uncompressed state to optimized FPGA firmware.
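A minimal sketch of that conversion flow is shown below, under stated assumptions: the PQuant names (`pquant.get_default_config`, `pquant.compress`, and the config keys) are hypothetical placeholders, since the library's exact API is not given here, while the hls4ml calls (`config_from_keras_model`, `convert_from_keras_model`, `compile`, `build`) are part of hls4ml's documented Keras interface.

```python
# Sketch of the compression-to-firmware flow, assuming a Keras model.
# PQuant names below are hypothetical placeholders for its unified
# configuration/compression entry points; the hls4ml calls follow
# that library's documented conversion interface.
from tensorflow import keras
import hls4ml
import pquant  # assumed import name

# A small example model to compress.
model = keras.Sequential([
    keras.Input(shape=(16,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(5, activation="softmax"),
])

# 1. Choose pruning/quantization settings through a unified config
#    (helper name and schema assumed for illustration).
config = pquant.get_default_config()          # hypothetical helper
config["pruning"]["method"] = "magnitude"     # assumed config keys
config["quantization"]["bits"] = 8            # assumed config keys

# 2. Train and compress the model (hypothetical entry point; the real
#    API may also take training data and hyperparameters).
compressed_model = pquant.compress(model, config)

# 3. Convert the compressed model into an hls4ml project.
hls_config = hls4ml.utils.config_from_keras_model(
    compressed_model, granularity="name"
)
hls_model = hls4ml.converters.convert_from_keras_model(
    compressed_model,
    hls_config=hls_config,
    output_dir="pquant_hls4ml_prj",
    part="xcu250-figd2104-2L-e",  # example FPGA part
)

# 4. Compile for bit-accurate emulation, then run HLS synthesis to
#    generate the FPGA firmware.
hls_model.compile()
hls_model.build(csim=False, synth=True)
```

The compression choices in step 1 are what make step 4 cheap: pruned weights and reduced bit widths translate directly into fewer and smaller multipliers in the generated HLS design, which is why coupling the compression and deployment stages in one pipeline is useful.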
Files
| Name | Size |
|---|---|
| PQuant_fastML.pdf (md5:ed49469b02ff4ebae43c048f7fa89126) | 1.6 MB |
Additional details
Funding
- Schmidt Family Foundation
Conference
- Title: Fast Machine Learning for Science Conference 2025
- Dates: 1-5 September 2025