Published September 9, 2025 | Version v1
Presentation (Open Access)

End-to-end hardware-aware model compression and deployment with PQuant and hls4ml

Description

Machine learning model compression techniques—such as pruning and quantization—are becoming increasingly important to optimize model execution, especially for resource-constrained devices. However, these techniques are developed independently of each other, and while there exist libraries that aim to unify these methods under a single interface, none of them offer integration with hardware deployment libraries such as hls4ml. To address this, we introduce PQuant, a Python library that simplifies the training and compression of machine learning models by providing an interface for applying a variety of pruning and quantization methods. PQuant is designed to be accessible to users without specialized knowledge of compression algorithms, while still offering deep configurability. It integrates with hls4ml, allowing compressed models to be directly utilized by FPGA-based accelerators. This makes it a valuable tool for both researchers comparing compression strategies and practitioners targeting efficient deployment on edge devices and custom hardware.

We present a Python library for training pruned and quantized machine learning models. The library provides multiple pruning methods, supports quantization, including high-granularity quantization, and integrates with hls4ml for hardware deployment.
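As a concrete illustration of the two compression techniques described above, the sketch below applies magnitude pruning followed by symmetric uniform quantization to a flat list of weights. This is plain Python for illustration only; it does not show PQuant's actual interface, and the function names are hypothetical.

```python
# Illustrative sketch only -- not PQuant's API. Demonstrates the two
# compression steps the abstract names: pruning and quantization.

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)  # number of weights to zero
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

def uniform_quantize(weights, bits):
    """Snap weights onto a symmetric fixed-point grid with `bits` bits."""
    scale = max(abs(w) for w in weights) or 1.0
    levels = (1 << (bits - 1)) - 1  # e.g. 7 representable magnitudes at 4 bits
    return [round(w / scale * levels) / levels * scale for w in weights]

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)      # half the weights zeroed
quantized = uniform_quantize(pruned, bits=4)          # 4-bit fixed-point grid
```

Tools like hls4ml can then exploit the resulting sparsity and reduced bit widths when generating FPGA firmware, which is the deployment path the library targets.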

Files

PQuant_ACAT.pdf (1.6 MB, md5:aa4605766d16906c37b8139934cc3ccc)

Additional details

Conference

Acronym: ACAT 2025
Dates: 8–12 September 2025
Place: Hamburg, Germany