COLLIDE-2V - 750 Million Dual-View LHC Event Dataset for Low-Latency ML
Description
Modern foundation models (FMs) have pushed the frontiers of language, vision, and multi-modal tasks by training ever-larger neural networks (NNs) on unprecedented volumes of data. FMs have yet to be established in collider physics, which lacks both a comparably sized, general-purpose dataset on which to pre-train universal event representations and a clearly demonstrated need. Real-time event identification presents one possible need, since the LHC requires fast classification and selection across all collisions. To address this, we construct COLLIDE-2V, a dual-view LHC collision dataset: a 50 TB public dataset comprising ~750 million proton-proton events generated with MadGraph + Pythia + Delphes under High-Luminosity LHC conditions (<μ> = 200). Spanning everything from minimum-bias and γ+jets to top, Higgs, di-boson, multi-boson, exotic long-lived signatures and dark showers, the sample covers 50+ distinct processes and >99% of the CMS Run-3 trigger menu in a single coherent format. To allow for effective real-time event interpretation, each event is provided twice, as Parquet files that retain physics-critical content:
- Offline - a full CMS-like reconstruction emulated by a tuned Delphes card
- L1T - a low-latency, lower-resolution view obtained via a custom Level-1 Trigger (L1T) parameterisation (degraded vertex, track and calorimeter performance, altered PUPPI, |η| ≤ 2.5 tracking, pT thresholds, etc.)
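To make the idea of a degraded second view concrete, the following is a minimal toy sketch of how an L1T-like view could be derived from offline-like tracks. It is not the authors' Delphes parameterisation. The `Track` class, the `l1t_view` function, the pT threshold and the smearing width are hypothetical choices made for illustration; only the |η| ≤ 2.5 tracking acceptance comes from the description above.

```python
import random
from dataclasses import dataclass


@dataclass
class Track:
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle in radians


ETA_MAX = 2.5      # tracking acceptance quoted in the description
PT_MIN = 2.0       # GeV; assumed threshold, not taken from the dataset
PT_REL_RES = 0.10  # assumed 10% relative pT smearing, not taken from the dataset


def l1t_view(offline_tracks, rng):
    """Return a degraded copy of the offline tracks (toy model)."""
    degraded = []
    for trk in offline_tracks:
        # Drop tracks outside the tracking acceptance.
        if abs(trk.eta) > ETA_MAX:
            continue
        # Smear pT with a Gaussian to mimic coarser resolution.
        smeared_pt = trk.pt * rng.gauss(1.0, PT_REL_RES)
        # Apply the pT threshold after smearing.
        if smeared_pt < PT_MIN:
            continue
        degraded.append(Track(smeared_pt, trk.eta, trk.phi))
    return degraded


offline = [
    Track(pt=25.0, eta=0.3, phi=1.0),   # central and hard: kept
    Track(pt=40.0, eta=3.1, phi=-2.0),  # outside |eta| <= 2.5: dropped
    Track(pt=0.5, eta=-1.2, phi=0.4),   # below the pT threshold: dropped
]
l1t = l1t_view(offline, random.Random(0))
print(len(offline), "offline tracks ->", len(l1t), "L1T-like tracks")
```

Passing in a seeded `random.Random` keeps the toy reproducible. A realistic parameterisation would also degrade vertexing and calorimeter response, which this sketch leaves out.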
As a proof of concept, COLLIDE-2V supports a wide spectrum of research applications, from few-shot transfer learning, fine-tuning, pileup mitigation, detector-level generative modelling and cross-experiment benchmarking to fast simulation surrogates, real-time trigger inference and entirely novel anomaly detection, thereby accelerating the shift from handcrafted topology cuts to data-driven decision making throughout the HL-LHC program.
Files
| Name | Size |
|---|---|
| FM_Collide2V_EMoreno.pdf (md5:065104c42f4d0548a0bba4edfad9707d) | 10.3 MB |
Additional details
Funding
- Schmidt Family Foundation
Conference
- Acronym: FASTML25
- Dates: 1-5 September 2025
- Place: Zurich, Switzerland