Published May 15, 2024 | Version v1
Thesis Open

Performance Auto-tuning Framework for GPU Applications

Authors/Creators

  • 1. University of Bahrain

Contributors

  • 1. University of Bahrain

Description

Optimizing GPU applications for performance and portability across diverse architectures is challenging because of the complexity of GPU programming and the variety of hardware. This thesis presents a performance auto-tuning framework for GPU applications that addresses these challenges. Efficient search techniques were developed to reduce the search space and the computational overhead of auto-tuning GPU kernels. OpenTuner was extended to support GPU kernel auto-tuning, incorporating advanced search algorithms such as basin hopping and Bayesian optimization. A machine-learning-based search space reduction method used boosted trees to predict promising kernel parameter configurations. Multi-fidelity optimization balanced exploration and exploitation by evaluating configurations at different fidelity levels. An auto-tuning interface was developed and loosely integrated with the CMS Software (CMSSW) used in high-energy physics experiments. This loose coupling allows the auto-tuner to work with other software packages and enables other auto-tuners to optimize CMSSW, enhancing flexibility and portability. The framework was evaluated using real-world GPU kernels from CMSSW across different GPU architectures. A benchmarking methodology proposed by other researchers was applied to compare the search techniques, providing practical insights into auto-tuner benchmarking methodologies. The optimized kernels outperformed the default configurations, improving execution speed and resource utilization. The framework effectively reduced the search space and computational overhead, meeting the objectives of enhancing performance and portability. Limitations include the narrow range of hardware tested, the focus on specific machine learning models, and the emphasis on single-objective tuning, which leaves out other factors such as power efficiency.
Future work involves expanding to other hardware architectures, experimenting with advanced machine learning and reinforcement learning techniques, applying the auto-tuner to different software packages, developing dynamic tuning mechanisms, and enhancing user interfaces.
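
As a rough illustration of the multi-fidelity idea mentioned in the description, the sketch below uses successive halving, one common multi-fidelity scheme. The description does not say which scheme the thesis uses, so this is an assumption made for illustration only. The parameter names (block_size, unroll), the synthetic cost model, and the function names are all hypothetical; a real tuner would time an actual GPU kernel instead of calling measure_runtime.

```python
import random


def measure_runtime(config, fidelity, rng, noise=0.3):
    """Hypothetical stand-in for timing a GPU kernel.

    `fidelity` plays the role of the number of timed repetitions:
    averaging more repetitions reduces the measurement noise.
    """
    block_size, unroll = config
    # Synthetic cost with its unique minimum at block_size=256, unroll=4.
    true_cost = abs(block_size - 256) / 256 + abs(unroll - 4) / 4 + 1.0
    samples = [true_cost + rng.gauss(0.0, noise) for _ in range(fidelity)]
    return sum(samples) / len(samples)


def successive_halving(configs, min_fidelity=1, eta=2, noise=0.3, seed=0):
    """Keep the best 1/eta of the candidates each round and
    multiply the fidelity by eta, until one candidate remains."""
    rng = random.Random(seed)
    survivors = list(configs)
    fidelity = min_fidelity
    while len(survivors) > 1:
        # Cheap, noisy measurements early; costlier, more precise ones later.
        ranked = sorted(
            survivors,
            key=lambda c: measure_runtime(c, fidelity, rng, noise),
        )
        survivors = ranked[:max(1, len(ranked) // eta)]
        fidelity *= eta
    return survivors[0]


if __name__ == "__main__":
    # Hypothetical search space: (block_size, unroll) pairs.
    search_space = [
        (b, u) for b in (32, 64, 128, 256, 512, 1024) for u in (1, 2, 4, 8)
    ]
    print(successive_halving(search_space))
```

Most of the measurement budget goes to the few configurations that survive the early rounds, which is how a scheme of this kind trades broad exploration at low fidelity against careful evaluation at high fidelity.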

Files

Name
CERN-THESIS-2024-344.pdf
Size
1.8 MB
Checksum
md5:c2e572fd30e45d429cccd1497ebfc7c4

Additional details

Additional titles

Translated title (Arabic)
إطار عمل للضبط الآلي لأداء برمجيات معالجات الرُّسوم

Identifiers

CDS
2925698
CDS Report Number
CERN-THESIS-2024-344

Related works

Is version of
Thesis: 2900093 (Inspire)

CERN

Department
EP
Programme
No program participation
Accelerator
CERN LHC
Experiment
CMS