Pushing Matrix-Vector Multiplication Performance on AMD AI Engines for Low-Latency Edge Inference
Authors/Creators
Dimitrios Danopoulos
Description
Matrix-vector (GEMV) operations are a common building block in many deep learning models, particularly for the large dense layers found in convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs). Despite their importance, GEMV kernels have historically underperformed compared to matrix-matrix (GEMM) operations due to their lower arithmetic intensity and limited data reuse, making them harder to scale efficiently. This work presents the first comprehensive analysis and optimization of matrix-vector operations using AMD’s AI Engines on the latest AIE-ML architecture. It addresses key bottlenecks in deploying AI models that rely on such operations for low-latency edge inference, such as meeting the tight real-time requirements of the CERN trigger system. Our proposed GEMV kernel achieves high throughput and low latency by exploiting the AI Engine array, scaling efficiently across tiles both horizontally and vertically via a custom placement strategy. Furthermore, we introduce a novel graph connection mechanism that enables efficient pipelining across multiple layers. The design is modular and integrates in a straightforward manner with frameworks such as hls4ml. Our multi-layer implementation achieves close to microsecond-level latency, demonstrating its suitability for ultra-low-latency applications. These results make AMD's AI Engines a realistic middle-ground solution that can offer the scalability that FPGAs struggle to reach for large models, while maintaining the ultra-low latency that GPUs typically cannot provide.
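To make the arithmetic-intensity argument concrete, here is a minimal reference GEMV in plain C++ (a sketch for illustration, not the optimized AIE-ML kernel from the paper): each weight is loaded exactly once and participates in a single multiply-accumulate, so the kernel performs roughly 2mn flops against about mn weight loads, leaving little data to reuse compared with GEMM.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Reference GEMV: y = A * x, with A an m-by-n row-major matrix.
// Every element A[i*n + j] is read once and used in one
// multiply-accumulate, so there is no weight reuse to exploit:
// ~2*m*n flops against ~m*n weight loads.
void gemv(const float* A, const float* x, float* y,
          std::size_t m, std::size_t n) {
    for (std::size_t i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < n; ++j) {
            acc += A[i * n + j] * x[j];  // one MAC per weight load
        }
        y[i] = acc;
    }
}

int main() {
    const std::size_t m = 4, n = 3;
    std::vector<float> A(m * n, 1.0f), x(n, 2.0f), y(m);
    gemv(A.data(), x.data(), y.data(), m, n);
    for (float v : y) std::printf("%.1f\n", v);  // prints 6.0 four times
}
```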
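The custom placement strategy and graph connection mechanism themselves are described in the linked PDF. As a rough sketch of the programming model involved, the following shows how two GEMV layers could be chained in AMD's standard ADF graph API so that the second layer starts consuming results while the first is still producing them. The kernel names, data types, and file paths are placeholders, not the paper's actual design, and the code targets the Vitis AI Engine toolchain rather than a host compiler.

```cpp
#include <adf.h>
using namespace adf;

// Hypothetical per-layer GEMV kernels, defined in separate source
// files; these signatures are placeholders for illustration only.
void gemv_layer0(input_stream_int32* in, output_stream_int32* out);
void gemv_layer1(input_stream_int32* in, output_stream_int32* out);

// Two GEMV layers connected kernel-to-kernel over streams, so the
// layers form a pipeline: PLIO -> layer0 -> layer1 -> PLIO.
class MlpGraph : public graph {
public:
    input_plio  in;
    output_plio out;
    kernel layer0, layer1;

    MlpGraph() {
        in  = input_plio::create("DataIn",  plio_32_bits, "data/input.txt");
        out = output_plio::create("DataOut", plio_32_bits, "data/output.txt");

        layer0 = kernel::create(gemv_layer0);
        layer1 = kernel::create(gemv_layer1);
        source(layer0) = "gemv_layer0.cc";
        source(layer1) = "gemv_layer1.cc";
        runtime<ratio>(layer0) = 0.9;
        runtime<ratio>(layer1) = 0.9;

        // Stream connections let layer1 begin as soon as layer0
        // emits its first outputs, pipelining the two layers.
        connect<stream>(in.out[0],     layer0.in[0]);
        connect<stream>(layer0.out[0], layer1.in[0]);
        connect<stream>(layer1.out[0], out.in[0]);
    }
};

MlpGraph mlp_graph;  // top-level graph instance
```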
Files
- fastML_2025_dimitrios_danopoulos.pdf (2.0 MB, md5:55ae5a604d1217f5257e2816725ea467)
Additional details
Funding
- Schmidt Family Foundation
Conference
- Title: Fast Machine Learning for Science Conference 2025
- Dates: 1–5 September 2025
- Place: ETH Zurich