Published May 15, 2024
Version v1
Thesis
Open access
Performance Auto-tuning Framework for GPU Applications
Description
Optimizing GPU applications for performance and portability across diverse architectures is challenging because of the complexity of GPU programming and the diversity of hardware. This thesis presents a performance auto-tuning framework for GPU applications that addresses these challenges. Efficient search techniques were developed to reduce the search space and the computational overhead of auto-tuning GPU kernels. OpenTuner was extended to support GPU kernel auto-tuning and to incorporate advanced search algorithms such as basin hopping and Bayesian optimization. A machine-learning-based search-space reduction method used boosted trees to predict promising kernel parameter configurations. Multi-fidelity optimization balanced exploration and exploitation by evaluating configurations at different fidelity levels. An auto-tuning interface was developed and loosely integrated with the CMS Software (CMSSW) used in high-energy physics experiments. This loose coupling allows the autotuner to work with other software packages and enables other autotuners to optimize CMSSW, enhancing flexibility and portability. The framework was evaluated using real-world GPU kernels from CMSSW across different GPU architectures. A benchmarking methodology proposed by other researchers was applied to compare different search techniques, providing practical insights into how autotuners should be benchmarked. The optimized kernels outperformed the default configurations, improving execution speed and resource utilization. The framework effectively reduced the search space and computational overhead, meeting the objectives of enhancing performance and portability. Limitations include the limited diversity of the hardware tested, a focus on specific machine learning models, and an emphasis on single-objective tuning that does not consider other factors such as power efficiency.
Future work includes extending the framework to other hardware architectures, experimenting with more advanced machine learning and reinforcement learning techniques, applying the autotuner to other software packages, developing dynamic tuning mechanisms, and improving the user interfaces.
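To make the multi-fidelity idea concrete, the sketch below uses successive halving, one common multi-fidelity scheme; it is not necessarily the technique used in the thesis. Fidelity is modeled here as the number of timed repetitions per configuration (an assumption; problem size is another common fidelity knob). The search space, `synthetic_runtime`, and `successive_halving` are hypothetical illustrations, and the synthetic cost function stands in for real kernel launches.

```python
import itertools
import random
import statistics

# Hypothetical tuning space for a GPU kernel (illustrative, not from the thesis).
SEARCH_SPACE = {
    "block_size": [32, 64, 128, 256, 512, 1024],
    "items_per_thread": [1, 2, 4, 8],
}


def synthetic_runtime(config, rng):
    """Hypothetical stand-in for one timed kernel launch, in milliseconds.

    The synthetic landscape is minimal at block_size=256, items_per_thread=4;
    bounded uniform noise mimics run-to-run timing jitter.
    """
    base = (1.0
            + abs(config["block_size"] - 256) / 256
            + abs(config["items_per_thread"] - 4) / 4)
    return base + rng.uniform(-0.1, 0.1)


def successive_halving(space, eta=2, min_reps=1, seed=0):
    """Evaluate all configs cheaply (few repetitions = low fidelity), keep the
    best 1/eta, re-evaluate survivors with eta times more repetitions, repeat."""
    rng = random.Random(seed)
    names = list(space)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*space.values())]
    reps, launches = min_reps, 0
    while len(candidates) > 1:
        # Mean runtime over `reps` launches; the index breaks ties without comparing dicts.
        scored = [(statistics.fmean(synthetic_runtime(c, rng) for _ in range(reps)), i)
                  for i, c in enumerate(candidates)]
        launches += reps * len(candidates)
        scored.sort()
        keep = max(1, len(candidates) // eta)
        candidates = [candidates[i] for _, i in scored[:keep]]
        reps *= eta  # survivors are measured at higher fidelity
    return candidates[0], launches


if __name__ == "__main__":
    best, launches = successive_halving(SEARCH_SPACE)
    print(best, launches)  # {'block_size': 256, 'items_per_thread': 4} 96
```

With 24 configurations and `eta=2`, each rung costs 24 launches (24×1, 12×2, 6×4, 3×8), so the sketch spends 96 launches in total, whereas measuring every configuration at the highest fidelity used (8 repetitions) would cost 192. Cheap, noisy measurements prune clearly poor configurations early, and the measurement budget is concentrated on the promising ones.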
Files
| Name | Size | MD5 checksum |
|---|---|---|
| CERN-THESIS-2024-344.pdf | 1.8 MB | c2e572fd30e45d429cccd1497ebfc7c4 |
Additional details
Additional titles
- Translated title (Arabic)
- إطار عمل للضبط الآلي لأداء برمجيات معالجات الرُّسوم (English: "A Framework for Automatic Performance Tuning of Graphics Processor Software")
Identifiers
- CDS
- 2925698
- CDS Report Number
- CERN-THESIS-2024-344
Related works
- Is version of
- Thesis: 2900093 (Inspire)
CERN
- Department
- EP
- Programme
- No program participation
- Accelerator
- CERN LHC
- Experiment
- CMS