Systematic detection of memory related performance bottlenecks in GPGPU programs
Journal of Systems Architecture
Graphics processing units (GPUs) pose an attractive choice for designing high-performance and energy-efficient software systems. This is because GPUs are capable of executing massively parallel applications. However, the performance of GPUs is limited by the contention in memory subsystems, often resulting in substantial delays and effectively reducing the parallelism. In this paper, we propose GRAB, an automated debugger to aid the development of efficient GPU kernels. GRAB systematically detects, classifies and discovers the root causes of memory-performance bottlenecks in GPUs. We have implemented GRAB and evaluated it with several open-source GPU kernels, including two real-life case studies. We show the usage of GRAB through improvement of GPU kernels on a real NVIDIA Tegra K1 hardware – a widely used GPU for mobile and handheld devices. The guidance obtained from GRAB leads to an overall improvement of up to 64%.
[HCEP16] Adrian Horga, Sudipta Chattopadhyay, Petru Eles, Zebo Peng, "Systematic detection of memory related performance bottlenecks in GPGPU programs", Journal of Systems Architecture