Published September 9, 2025 | Version v1
Presentation (Open Access)

End-to-end hardware-aware model compression and deployment with PQuant and hls4ml

Description

Machine learning model compression techniques—such as pruning and quantization—are becoming increasingly important to optimize model execution, especially for resource-constrained devices. However, these techniques are developed independently of each other, and while there exist libraries that aim to unify these methods under a single interface, none of them offer integration with hardware deployment libraries such as hls4ml. To address this, we introduce PQuant, a Python library that simplifies the training and compression of machine learning models by providing an interface for applying a variety of pruning and quantization methods. PQuant is designed to be accessible to users without specialized knowledge of compression algorithms, while still offering deep configurability. It integrates with hls4ml, allowing compressed models to be directly utilized by FPGA-based accelerators. This makes it a valuable tool for both researchers comparing compression strategies and practitioners targeting efficient deployment on edge devices and custom hardware.

We present a Python library for training pruned and quantized machine learning models. The library provides multiple pruning methods, supports quantization, including high-granularity quantization, and integrates with hls4ml for hardware deployment.
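As a concrete illustration of the two compression techniques described above, the sketch below applies magnitude pruning followed by symmetric uniform quantization to a flat list of weights. This is plain Python for illustration only; it does not show PQuant's actual interface, and the function names are hypothetical.

```python
# Illustrative sketch only -- not PQuant's API. Demonstrates the two
# compression steps the abstract names: pruning and quantization.

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)  # number of weights to zero
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

def uniform_quantize(weights, bits):
    """Snap weights onto a symmetric fixed-point grid with `bits` bits."""
    scale = max(abs(w) for w in weights) or 1.0
    levels = (1 << (bits - 1)) - 1  # e.g. 7 representable magnitudes at 4 bits
    return [round(w / scale * levels) / levels * scale for w in weights]

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)      # half the weights zeroed
quantized = uniform_quantize(pruned, bits=4)          # 4-bit fixed-point grid
```

Tools like hls4ml can then exploit the resulting sparsity and reduced bit widths when generating FPGA firmware, which is the deployment path the library targets.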

Files

PQuant_ACAT.pdf (1.6 MB, md5:aa4605766d16906c37b8139934cc3ccc)

Additional details

Conference

Acronym: ACAT 2025
Dates: 8–12 September 2025
Place: Hamburg, Germany