Published September 8, 2025 | Version v1
Presentation | Open Access

TMVA SOFIE: Enhancements in ML Inference through Graph Optimizations and Heterogeneous Architectures

  • 1. European Organization for Nuclear Research
  • 2. University of Alabama (US)
  • 3. The University of Manchester (GB)
  • 4. CERN

Description

With the upcoming High-Luminosity upgrades at the LHC, data generation rates are expected to increase significantly. This calls for highly efficient architectures for machine learning inference in experimental workflows like event reconstruction, simulation, and data analysis.
Within the ML4EP team at CERN, we have developed SOFIE, a tool in the ROOT/TMVA package that translates externally trained deep learning models, such as those in ONNX format or trained in Keras or PyTorch, into an intermediate representation (IR). This IR is then used to generate optimized C++ code for fast and lightweight inference, with BLAS as the only external dependency. The generated code can be embedded in any project, allowing inference functions to be called directly on event data, and it remains open to user-defined modifications. This makes SOFIE both efficient and flexible for integration into high-energy physics workflows.
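For illustration, the typical workflow follows the pattern of the ROOT SOFIE tutorials: a one-time code-generation step, then embedding the emitted header in the user's application. This is a minimal sketch; the model file name is hypothetical, and exact class and namespace names depend on the ROOT version and on the model name.

    // Step 1: one-time code generation from a trained ONNX model
    // (hypothetical file "model.onnx"); class names follow the
    // TMVA::Experimental::SOFIE interface shipped with ROOT.
    #include "TMVA/RModel.hxx"
    #include "TMVA/RModelParser_ONNX.hxx"

    void GenerateInferenceCode() {
       using namespace TMVA::Experimental::SOFIE;
       RModelParser_ONNX parser;
       RModel model = parser.Parse("model.onnx");
       model.Generate();                    // build the C++ inference code in memory
       model.OutputGenerated("model.hxx");  // emit header plus weight file
    }

    // Step 2: embed the generated header in any project. In the
    // tutorial pattern, a generated Session object exposes an infer()
    // call taking raw input data (e.g. a float pointer):
    //
    //   #include "model.hxx"
    //   TMVA_SOFIE_model::Session session("model.dat");
    //   std::vector<float> output = session.infer(eventData);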
SOFIE supports a broad range of ML operators, primarily based on the ONNX standard, along with additional operations common in other frameworks and custom user-defined functions. It also supports inference for in-memory graph neural network models trained with DeepMind’s Graph Nets library. The tool has been validated on experiment models such as ParticleNet, ATLAS GNNs, and Smart Pixels.
Recent developments in SOFIE include performance gains from Structure-of-Arrays-based memory allocation, which enables memory reuse, extends to GPU memory, and supports user-provided memory handlers (see the sketch below). Together with operator fusion and kernel-level optimizations, these enhancements significantly reduce data movement and improve inference latency.
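The idea behind a user-provided memory handler can be pictured as follows. This is a purely hypothetical sketch of the concept; the interface and class names below are illustrative and are not SOFIE's actual API.

    #include <cstddef>
    #include <cstdlib>

    // Hypothetical handler interface: the application, not the
    // generated code, decides where intermediate tensors live
    // (a reusable pool, pinned host memory, a GPU arena, ...).
    struct MemoryHandler {
       virtual void* allocate(std::size_t bytes) = 0;
       virtual void deallocate(void* ptr) = 0;
       virtual ~MemoryHandler() = default;
    };

    // A trivial arena that hands out slices of one preallocated
    // block, so all intermediate tensors of an inference call reuse
    // the same memory from event to event.
    struct ArenaHandler : MemoryHandler {
       explicit ArenaHandler(std::size_t bytes)
          : base(static_cast<char*>(std::malloc(bytes))), offset(0) {}
       void* allocate(std::size_t bytes) override {
          void* p = base + offset;
          // round up so successive slices stay 64-byte spaced
          offset += (bytes + 63) & ~std::size_t{63};
          return p;
       }
       void deallocate(void*) override {}  // arena freed all at once
       void reset() { offset = 0; }        // reuse between events
       ~ArenaHandler() override { std::free(base); }
       char* base;
       std::size_t offset;
    };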
SOFIE now also supports portable GPU inference through integrations with SYCL and ALPAKA, using backends such as cuBLAS (for NVIDIA) and rocBLAS (for AMD). This gives users the flexibility to select a GPU stack based on platform preference. We present recent optimizations and heterogeneous computing support in SOFIE, benchmarking its performance against other inference frameworks to demonstrate its efficiency and portability.
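To make the portability point concrete, the sketch below shows the kind of device selection SYCL provides, which lets the same inference code target an NVIDIA or AMD GPU (or fall back to the CPU), with BLAS calls dispatched to the matching vendor library. This is a generic SYCL 2020 illustration, not SOFIE's actual generated output.

    #include <sycl/sycl.hpp>
    #include <iostream>

    int main() {
       // The default selector prefers a GPU visible to the SYCL
       // runtime (CUDA, HIP, or Level Zero backend) and otherwise
       // falls back to the CPU; code built on this queue runs
       // unchanged on either device.
       sycl::queue q{sycl::default_selector_v};
       std::cout << "Running on: "
                 << q.get_device().get_info<sycl::info::device::name>()
                 << '\n';

       // On this queue, a GEMM inside the inference graph would be
       // routed to cuBLAS on NVIDIA or rocBLAS on AMD through the
       // chosen SYCL BLAS backend; the actual call site lives in the
       // generated code, so it is only noted here as a comment.
       return 0;
    }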

Files

ACAT25-SOFIE.pdf (4.5 MB)
md5:49f3cd10865a49b498d249d1c92047e5

Additional details

Funding

Schmidt Family Foundation

Conference

Acronym
ACAT2025
Dates
8–12 September 2025
Place
Hamburg, Germany