Published September 2, 2025 | Version v1
Presentation | Open Access

End-to-End Neural Network Compression and Deployment for Hardware Acceleration Using PQuant and hls4ml

Authors/Creators

Description

As the demand for efficient machine learning on resource-limited devices grows, model compression techniques like pruning and quantization have become increasingly vital. Despite their importance, these methods are typically developed in isolation, and while some libraries attempt to offer unified interfaces for compression, they often lack support for deployment tools such as hls4ml. To bridge this gap, we developed PQuant, a Python library designed to streamline the training and compression of machine learning models. PQuant offers a unified interface for applying a range of pruning and quantization techniques, catering to users with minimal background in compression while still providing detailed configuration options for advanced use. Notably, it features built-in compatibility with hls4ml, enabling seamless deployment of compressed models on FPGA-based accelerators. This makes PQuant a versatile resource both for researchers exploring compression strategies and for developers targeting efficient implementation on edge devices or custom hardware platforms. We will present the PQuant library and the performance of several compression algorithms implemented with it, and we will demonstrate the conversion flow of a neural network model from an uncompressed state to optimized FPGA firmware.
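The abstract describes the end-to-end flow from an uncompressed model to FPGA firmware, but the record itself contains no code. The sketch below illustrates that flow under stated assumptions: the compression step is a plain magnitude-pruning stand-in written directly against Keras weights, not PQuant's actual interface (which this record does not show), while the conversion calls (hls4ml.utils.config_from_keras_model and hls4ml.converters.convert_from_keras_model) are hls4ml's documented Keras entry points. The model architecture, output directory, and FPGA part number are illustrative placeholders.

```python
# Minimal sketch: uncompressed Keras model -> pruned weights -> hls4ml project.
# The pruning step is a simple magnitude-pruning stand-in, NOT PQuant's API;
# it only emulates the kind of sparse weights a compression library produces.
import numpy as np
import hls4ml
from tensorflow import keras

# A small dense network standing in for the user's model.
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(16,), name='fc1'),
    keras.layers.Dense(5, activation='softmax', name='output'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
# ... normal training would happen here ...

# Stand-in compression: zero out the smallest 50% of each layer's weights
# (unstructured magnitude pruning).
for layer in model.layers:
    weights = layer.get_weights()
    if not weights:
        continue
    w = weights[0]
    threshold = np.quantile(np.abs(w), 0.5)
    weights[0] = np.where(np.abs(w) < threshold, 0.0, w)
    layer.set_weights(weights)

# hls4ml conversion using its documented Keras entry points.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='pquant_demo_prj',   # placeholder project directory
    part='xcu250-figd2104-2L-e',    # placeholder FPGA part number
)
hls_model.compile()                 # builds the C-simulation bridge
# hls_model.build(csim=False)       # would invoke HLS synthesis for firmware
```

In the actual workflow presented here, PQuant's unified interface would replace the manual pruning loop, and its built-in hls4ml compatibility would hand the compressed model to the same conversion step shown above.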

Files

PQuant_fastML.pdf (1.6 MB)
md5:ed49469b02ff4ebae43c048f7fa89126

Additional details

Funding

Schmidt Family Foundation

Conference

Title: Fast Machine Learning for Science Conference 2025
Dates: 1-5 September 2025