Announced May 31, 2012

Master thesis project (30hp): Parallelizing the NEMO ocean model application for GPU-based systems using the SkePU skeleton programming library

The highly regarded ocean model code NEMO has been identified by the large pan-European project PRACE as being of particular interest for supercomputing and climatology. NEMO has therefore attracted considerable development resources within the PRACE partnership. Consequently, NEMO is now a highly parallelized and scalable piece of scientific software, fully capable of working in the petascale range offered by the upper echelon of the Top500 supercomputers.

Moving to the exascale range, however, is the next challenge, and the race to get there is on worldwide. Reaching exascale will require new programming paradigms that can take advantage of the specialised floating point hardware needed to take supercomputing there. To date, this specialised hardware has commonly been based on GPU designs tailored for the computer games industry, but this is changing, both in how GPUs are designed and in the emergence of other floating point accelerators from non-gaming chip producers. These rapidly changing accelerator designs pose new challenges for programming them efficiently.

To take advantage of the new emerging chip architectures efficiently, with a minimum of duplicated programming labour, skeleton programming for floating point accelerators in the form of SkePU, developed by Prof. Christoph Kessler's group at Linköping University, is an attractive proposition. SkePU offers a high-level programming abstraction layer consisting of an extension to C++ and a separate dynamic optimising runtime system, which currently targets different GPUs as well as lower-level languages and libraries such as CUDA and OpenCL.

This master thesis project is about helping NEMO reach exascale computing.
This will be done by preparing it for current and upcoming floating point accelerators using the SkePU framework. The computational heavy lifting in NEMO is performed by a very limited set of Fortran 90 subroutines. In this project, these subroutines will be ported to C++ and the SkePU framework, and subsequently optimised and benchmarked against the original code. You will work with cutting-edge GPGPU hardware at NSC.

NEMO
----

From a software perspective, NEMO is a large Fortran 90 package containing several ocean simulation models. The code is available from http://www.nemo-ocean.eu. A typical NEMO model compilation deals with around 800 subroutines in 250 source files, in total around 100 000 lines of source code.

The performance of NEMO depends on parallel scaling as well as single-node performance. Single-node performance, in turn, depends on I/O performance and computational performance. While I/O can be a bottleneck in some scenarios, here we are interested in improving computational performance using the GPU.

Profiling of NEMO runs shows that most of its runtime is spent in only a few subroutines. Apart from MPI communication time, these routines spend most of their time in nested loops performing map or map&reduce types of operations. These loops should first be extracted into separate functions, which will internally use SkePU. We would like to explore whether SkePU's programming model can be applied to these loops for effective GPU acceleration.

SkePU
-----

Skeletons are predefined generic components for frequently occurring data and control flow patterns, for which parallel and/or accelerator-specific implementations may exist. Skeletons are instantiated to functions by parameterizing them with problem-specific user code, and they operate on user data wrapped in special container data structures, such as Vector.
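To make the idea of map and map&reduce skeletons concrete, here is a minimal C++ sketch. The `Map` and `MapReduce` templates are hypothetical stand-ins written for illustration only; they are not SkePU's actual API, and their single sequential backend stands in for the OpenMP, CUDA and OpenCL implementations a real skeleton library would choose between. The kernels `step_field` and `weighted_sum` are likewise invented examples of the kind of nested loop described for NEMO, not code taken from NEMO, and `std::vector` is used where SkePU would use its own containers such as Vector. The sketch assumes a purely element-wise loop, so a contiguous 3-D Fortran array can be treated as one flat array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL skeleton templates for illustration only: these are NOT the
// SkePU API. The user supplies only the element-wise function(s); how the
// loop is executed is hidden inside the skeleton. Here the only backend is a
// plain sequential loop, standing in for OpenMP/CUDA/OpenCL backends.
template <typename UserFunc>
class Map {
public:
    explicit Map(UserFunc f) : f_(f) {}

    // out[i] = f(a[i], b[i]) for every i; out may alias a or b, since each
    // element is read before it is written at the same index.
    template <typename T>
    void operator()(const std::vector<T>& a, const std::vector<T>& b,
                    std::vector<T>& out) const {
        assert(a.size() == b.size() && a.size() == out.size());
        for (std::size_t i = 0; i < a.size(); ++i) {
            out[i] = f_(a[i], b[i]);
        }
    }

private:
    UserFunc f_;
};

template <typename MapFunc, typename ReduceFunc>
class MapReduce {
public:
    MapReduce(MapFunc m, ReduceFunc r) : m_(m), r_(r) {}

    // Folds r over the mapped values m(a[i], b[i]), starting from init.
    template <typename T>
    T operator()(const std::vector<T>& a, const std::vector<T>& b,
                 T init) const {
        assert(a.size() == b.size());
        T acc = init;
        for (std::size_t i = 0; i < a.size(); ++i) {
            acc = r_(acc, m_(a[i], b[i]));
        }
        return acc;
    }

private:
    MapFunc m_;
    ReduceFunc r_;
};

// HYPOTHETICAL NEMO-style kernel (not actual NEMO code): the triple loop
//   field(i,j,k) = field(i,j,k) + dt * tend(i,j,k)
// updates every element independently. A contiguous Fortran array is one
// flat block in memory, so the whole loop nest becomes a single Map.
void step_field(std::vector<double>& field, const std::vector<double>& tend,
                double dt) {
    Map update([dt](double f, double t) { return f + dt * t; });
    update(field, tend, field);  // in-place update
}

// HYPOTHETICAL map&reduce kernel: sum over all cells of field * volume.
double weighted_sum(const std::vector<double>& field,
                    const std::vector<double>& volume) {
    MapReduce dot([](double f, double v) { return f * v; },
                  [](double x, double y) { return x + y; });
    return dot(field, volume, 0.0);
}
```

For example, with `field = {1, 2, 3}`, `tend = {10, 20, 30}` and `dt = 0.5`, `step_field` leaves `field` as `{6, 12, 18}`, and `weighted_sum` with unit volumes then returns 36. Loops that read neighbouring grid points (stencils) do not fit this flat element-wise form directly and would need a different skeleton or a restructured loop.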
Skeleton programming thus hides all platform-specific implementation issues, such as thread management, synchronization and communication, leaving a sequential programming interface to the user. However, not all applications can easily be expressed with the provided set of skeletons and container data structures.

SkePU (http://www.ida.liu.se/~chrke/skepu) is a C++ template library for skeleton programming developed in the EU FP7 PEPPHER project (http://www.peppher.eu). It provides six data-parallel skeletons and one task-parallel skeleton, each with different implementations for different platforms, including sequential C, OpenMP, CUDA and OpenCL. SkePU is tunable, i.e. it can automatically adapt to new target platform configurations in order to optimize performance.

Project goal and tasks
----------------------

* Porting the most computationally intensive subroutines of NEMO to C++.
* Understanding the control and data flow patterns in these subroutines, and rewriting their code as equivalent combinations of SkePU skeletons.
* Evaluating the result (e.g. by comparing to CPU-only implementations and studying various design alternatives).
* Demonstrating performance portability across different GPU configurations.
* Providing feedback and suggestions for the further design of SkePU.

Requirements
------------

* Good C/C++ coding skills are essential.
* Good knowledge of Fortran and MPI.
* Knowledge of mixed C/Fortran programming.
* Knowledge of CUDA and/or OpenCL would be useful.
* We recommend the courses TDDD56 Multicore and GPU Programming and TDDC78 Programming parallel computers.

Contact
-------

* Supervisor at NSC: Johan Raber, e-Science coordinator, NSC (@nsc.liu.se)
* Supervisor at IDA: Usman Dastgeer, IDA (.@liu.se)
* Examiner: Christoph Kessler, IDA (.@liu.se)