Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications
2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy
We present a software approach to address the data-latency issue for certain GPU applications. Each application is modeled as a kernel graph, where the nodes represent individual GPU kernels and the edges capture data dependencies. Our technique exploits the GPU L2 cache to accelerate parameter passing between the kernels. The key idea is that, instead of having each kernel process the entire input in one invocation, we subdivide the input into fragments (which fit in the cache) and, ideally, process each fragment in one continuous sequence of kernel invocations. Our proposed technique is oblivious to kernel functionalities and requires minimal source-code modification. We demonstrate our technique on a full-fledged image processing application, improving performance by 30% on average across various settings.
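The execution-order change behind kernel tiling can be illustrated with a minimal host-side sketch (hypothetical names; the paper targets real GPU kernels, which are modeled here as plain functions over a list of elements). The baseline schedule runs each kernel over the entire input before the next kernel starts, so intermediate results are evicted from the small L2 cache; the tiled schedule pushes one cache-sized fragment through the whole kernel chain before moving to the next fragment.

```python
def run_untiled(kernels, data):
    """Baseline: each kernel processes the ENTIRE input before the next
    kernel starts, so intermediates do not stay resident in cache."""
    for k in kernels:
        data = [k(x) for x in data]
    return data

def run_tiled(kernels, data, fragment_size):
    """Tiled: split the input into fragments that fit in the cache and
    run the whole kernel chain on each fragment; the intermediate
    results of a fragment stay 'hot' between kernel invocations."""
    out = []
    for start in range(0, len(data), fragment_size):
        fragment = data[start:start + fragment_size]
        for k in kernels:
            fragment = [k(x) for x in fragment]
        out.extend(fragment)
    return out

# Example kernel graph: a simple two-stage chain (hypothetical kernels).
kernels = [lambda x: x + 1, lambda x: x * 2]
data = list(range(8))

# Both schedules compute the same result; only the execution order
# (and hence cache behavior) differs.
assert run_tiled(kernels, data, fragment_size=3) == run_untiled(kernels, data)
```

Note that this mirrors the paper's key property: tiling changes only the invocation schedule, not the kernels themselves, which is why the technique can stay oblivious to kernel functionality.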
[MCEP19] Arian Maghazeh, Sudipta Chattopadhyay, Petru Eles, Zebo Peng, "Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications", 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), Florence, Italy