
Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications

Arian Maghazeh
Sudipta Chattopadhyay
Petru Eles
Zebo Peng

2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy

ABSTRACT
We present a software approach to address the data latency issue for certain GPU applications. Each application is modeled as a kernel graph, in which the nodes represent individual GPU kernels and the edges capture data dependencies. Our technique exploits the GPU L2 cache to accelerate parameter passing between the kernels. The key idea is that, instead of having each kernel process the entire input in one invocation, we subdivide the input into fragments that fit in the cache and, ideally, process each fragment in one continuous sequence of kernel invocations. Our proposed technique is oblivious to kernel functionalities and requires minimal source code modification. We demonstrate our technique on a full-fledged image processing application and improve its performance by 30% on average across various settings.
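
The scheduling idea in the abstract can be illustrated with a minimal host-side sketch. It is not the authors' implementation: it assumes a linear chain of element-wise kernels rather than a general kernel graph, it models kernels as plain Python functions, and it ignores neighbourhood (halo) dependencies that real image processing kernels may have. The names `run_untiled`, `run_tiled`, and `fragment_size`, as well as the cache and buffer figures, are hypothetical.

```python
from typing import Callable, List, Sequence

# A "kernel" here is a stand-in for a GPU kernel: it maps a buffer to a buffer.
Kernel = Callable[[List[float]], List[float]]


def run_untiled(kernels: Sequence[Kernel], data: Sequence[float]) -> List[float]:
    """Baseline order: every kernel processes the entire input in one invocation."""
    buf = list(data)
    for kernel in kernels:
        buf = kernel(buf)  # the intermediate result spans the whole input
    return buf


def run_tiled(kernels: Sequence[Kernel], data: Sequence[float],
              fragment_elems: int) -> List[float]:
    """Tiled order: each fragment passes through the whole kernel chain
    before the next fragment is started."""
    if fragment_elems <= 0:
        raise ValueError("fragment_elems must be positive")
    out: List[float] = []
    for start in range(0, len(data), fragment_elems):
        frag = list(data[start:start + fragment_elems])
        for kernel in kernels:
            frag = kernel(frag)  # the intermediate result is only one fragment
        out.extend(frag)
    return out


def fragment_size(cache_bytes: int, bytes_per_elem: int, live_buffers: int) -> int:
    """Assumed sizing rule: the buffers that are live at once must share the cache."""
    return max(1, cache_bytes // (bytes_per_elem * live_buffers))


# Hypothetical three-kernel chain of element-wise operations.
chain = [
    lambda xs: [2.0 * x for x in xs],
    lambda xs: [x + 1.0 for x in xs],
    lambda xs: [x * x for x in xs],
]
samples = [float(i) for i in range(10)]
```

Both orders compute the same result for element-wise kernels; only the invocation order differs. In the tiled order each intermediate buffer is one fragment long, which is what would allow it to remain in the L2 cache between consecutive kernel invocations on a GPU.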


[MCEP19] Arian Maghazeh, Sudipta Chattopadhyay, Petru Eles, Zebo Peng, "Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications", 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy