End-to-end hardware-aware model compression and deployment with PQuant and hls4ml
Description
Machine learning model compression techniques, such as pruning and quantization, are increasingly important for optimizing model execution, especially on resource-constrained devices. However, these techniques are typically developed independently of one another, and while libraries exist that aim to unify them under a single interface, none offer integration with hardware deployment libraries such as hls4ml. To address this, we introduce PQuant, a Python library that simplifies the training and compression of machine learning models by providing a single interface for applying a variety of pruning and quantization methods. PQuant is designed to be accessible to users without specialized knowledge of compression algorithms, while still offering deep configurability. It integrates with hls4ml, allowing compressed models to be deployed directly on FPGA-based accelerators. This makes it a valuable tool both for researchers comparing compression strategies and for practitioners targeting efficient deployment on edge devices and custom hardware.
We present a Python library for training pruned and quantized machine learning models. The library supports multiple pruning methods as well as quantization and high-granularity quantization, and it integrates with hls4ml for hardware deployment.
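The description outlines a two-stage workflow: compress a model with PQuant, then hand it to hls4ml for FPGA deployment. The sketch below illustrates that flow under stated assumptions: the `pquant` import and the `get_default_config`/`train_compressed_model` names are hypothetical placeholders, not PQuant's confirmed API, and are therefore left commented out; the `hls4ml` calls (`config_from_keras_model`, `convert_from_keras_model`, `compile`) are hls4ml's real conversion interface.

```python
# Sketch of the end-to-end flow described in the abstract. The pquant
# calls are hypothetical placeholders (names assumed for illustration);
# the hls4ml calls use the library's actual conversion API.
import hls4ml
from tensorflow import keras

# A small dense model standing in for the user's network.
model = keras.Sequential([
    keras.Input(shape=(16,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# --- Compression with PQuant (hypothetical interface) ---------------------
# import pquant
# config = pquant.get_default_config(pruning="magnitude", quantization="hgq")
# model = pquant.train_compressed_model(model, config, training_data)

# --- Hardware deployment with hls4ml ---------------------------------------
hls_config = hls4ml.utils.config_from_keras_model(model, granularity="name")
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=hls_config,
    output_dir="pquant_hls_prj",
    backend="Vitis",  # FPGA backend; 'Vivado' is another common choice
)
hls_model.compile()  # builds the C simulation of the generated firmware
```

hls4ml translates the (compressed) Keras graph into an HLS project, where pruned weights reduce resource usage and quantization settings map to fixed-point precisions in the generated firmware.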
Files
| Name | Size |
|---|---|
| PQuant_ACAT.pdf (md5:aa4605766d16906c37b8139934cc3ccc) | 1.6 MB |
Additional details
Conference
- Acronym: ACAT 2025
- Dates: 8-12 September 2025
- Place: Hamburg, Germany