Software Model Checking for GPGPU Programs, Towards a Verification Tool

Unmesh Bordoloi, Ahmed Rezine

The tremendous computing power GPUs are capable of makes of them the epicenter of an unprecedented attention for applications other than graphics and gaming. Apart from the highly parallel nature of the programs to be run on GPUs, the sought after gain in computing power is only achieved with low level tuning at threads level and is therefore veryerror prone. In fact the level of intricacy involved when writing such programs is already a problem and will become a major bottleneck in spreading the technology. Only very recent and rare works started looking into using formal methods for helping GPU programmers avoiding errors like data races, incorrect synchronizations or assertions violations. These are at their infancy and directly import techniques adapted for other (sequential) systems with simple approximations for concurrency. Besides that, theonly help we are aware of right now takes a concrete input and explores a tiny portion of the possible thread scheduling looking for such errors. This easily misses common errors and makes of GPU programming a nightmare task. There is therefore still a lot of work to do in order to come up with helpful and scalable tools for today's and tomorrow's GPGPU software. We state in this paper our intention in building in Linköping a agship verication tool that will take CUDA code and track and report, with minimal assistance from the programmer, errors like data races, incorrect synchronizations or assertions violations. In order to achieve this ambitious and vital goal for the widespread of GPU programming, webuild on our experience using and implementing CUDA and GPU code and on our latest work in the verication of multicore and concurrent programs. In fact, GPU programs like those written in CUDA are suitable for verication as they typically neither manipulate pointer arithmetics nor allowrecursion. This restricts the focus to concurrency and array manipulation, combined with intra and inter procedural analysis. To give a avor of where we start from, we report on our experiments in automatically verifying two synchronization algorithms that appeared in a recent paper proposing effiient barriers for inter-block synchronization. Unlike any other verication approach for GPU programs,we can show that the algorithms neither deadlock nor violate the barrier condition regardless of the number of threads. We also capture bugs in case basic relations are violated between the number of blocks and the number of threads per block.

Published in urn:nbn:se:liu:diva-76411,

Last version (pdf) 2011