
Systematic detection of memory related performance bottlenecks in GPGPU programs

Adrian Horga
Sudipta Chattopadhyay
Petru Eles
Zebo Peng

Journal of Systems Architecture

Graphics processing units (GPUs) are an attractive choice for designing high-performance and energy-efficient software systems because they can execute massively parallel applications. However, GPU performance is limited by contention in the memory subsystem, which often causes substantial delays and effectively reduces parallelism. In this paper, we propose GRAB, an automated debugger that aids the development of efficient GPU kernels. GRAB systematically detects and classifies memory-performance bottlenecks in GPUs and discovers their root causes. We have implemented GRAB and evaluated it on several open-source GPU kernels, including two real-life case studies. We demonstrate the use of GRAB by improving GPU kernels on real NVIDIA Tegra K1 hardware, a GPU widely used in mobile and handheld devices. The guidance obtained from GRAB leads to an overall performance improvement of up to 64%.

[HCEP16] Adrian Horga, Sudipta Chattopadhyay, Petru Eles, Zebo Peng, "Systematic detection of memory related performance bottlenecks in GPGPU programs", Journal of Systems Architecture