Announced May 31, 2012

Master thesis project (30hp):


 Parallelizing the NEMO ocean model application for GPU-based systems
           using the SkePU skeleton programming library


The highly regarded ocean model code NEMO has been identified by the
large pan-European project PRACE as being of particular interest in the
context of supercomputing and climatology. NEMO has therefore attracted
considerable development resources within the PRACE partnership.
Consequently, NEMO is now a highly parallelized and scalable piece of
scientific software, fully capable of running in the petascale range
offered by the upper echelon of the Top500 supercomputers. Moving to the
exascale range, however, is the next challenge, and the race to reach it
is on worldwide.

Reaching exascale computing will require new programming paradigms that
can take advantage of the specialised floating point hardware needed to
get there. To date, such hardware has commonly been based on GPU designs
tailored for the computer games industry, but this is changing, both in
how GPUs are designed and in the design of other floating point
accelerators from non-gaming chip producers. These rapidly changing
accelerator designs pose new challenges in programming them
efficiently.

To take efficient advantage of the newly emerging chip architectures
with a minimum of duplicated programming effort, skeleton programming
for floating point accelerators in the form of SkePU, developed by
Prof. Christoph Kessler's group at Linköping University, is an
attractive proposition. SkePU offers a high-level programming
abstraction layer consisting of an extension to C++ and a separate
dynamic optimising runtime system, which currently targets different
GPUs and lower-level languages and libraries such as CUDA and OpenCL.

This master thesis project is about helping NEMO reach exascale
computing by preparing it for current and upcoming floating point
accelerators using the SkePU framework. The computational heavy lifting
in NEMO is performed by a very limited set of Fortran 90 subroutines. In
this project, these subroutines will be ported to C++ and the SkePU
framework, then optimised and benchmarked against the original code.
You will work with cutting-edge GPGPU hardware at NSC.

NEMO
----

From a software perspective, NEMO is a large Fortran 90 package
containing several ocean simulation models. The code is available from
http://www.nemo-ocean.eu. A typical NEMO model compilation deals with
around 800 subroutines in 250 source files, in total around 100 000
lines of source code. The performance of NEMO depends on parallel
scaling as well as single-node performance, and single-node performance
in turn depends on both I/O and computational performance. While I/O
can be a bottleneck in some scenarios, here we are interested in
improving computational performance using the GPU.

Profiling of NEMO runs shows that most of its runtime is spent in only
a few subroutines. Apart from MPI communication time, these routines
spend most of their time in nested loops performing map or map-reduce
type operations. These loops should first be extracted into separate
functions, which will internally use SkePU. We would like to explore
whether SkePU's programming model can be applied to these loops for
effective GPU acceleration.
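To illustrate the kind of loop meant here, the sketch below shows a triple loop nest in Fortran style (as a comment) and its extraction into separate C++ functions. It assumes the 3-D arrays are stored contiguously in Fortran (column-major) order with no padding, so the nest collapses into one flat loop; the loop body, the array names `field` and `mask` and the helper `idx3` are hypothetical examples, not code from NEMO. `apply_mask` is a pure map, while `masked_max_abs` is a map followed by a max reduction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// 0-based column-major (Fortran-order) offset of element (i, j, k)
// in an array of extent ni x nj x nk.
inline std::size_t idx3(std::size_t i, std::size_t j, std::size_t k,
                        std::size_t ni, std::size_t nj) {
    return i + ni * (j + nj * k);
}

// Illustrative Fortran loop nest:
//   DO jk = 1, jpk
//     DO jj = 1, jpj
//       DO ji = 1, jpi
//         field(ji,jj,jk) = field(ji,jj,jk) * mask(ji,jj,jk)
//       END DO
//     END DO
//   END DO
// Every element is updated independently, so this is a map over all
// elements, and the three loops collapse into one flat loop.
void apply_mask(std::vector<double>& field, const std::vector<double>& mask) {
    assert(field.size() == mask.size());
    for (std::size_t n = 0; n < field.size(); ++n) {
        field[n] *= mask[n];
    }
}

// Map-reduce: map each element to |field * mask|, then reduce with max.
double masked_max_abs(const std::vector<double>& field,
                      const std::vector<double>& mask) {
    assert(field.size() == mask.size());
    double result = 0.0;
    for (std::size_t n = 0; n < field.size(); ++n) {
        double v = std::fabs(field[n] * mask[n]);  // map step
        if (v > result) result = v;                // reduce step (max)
    }
    return result;
}
```

Once a loop is isolated in a function with this shape, its body is exactly the user function a Map or MapReduce skeleton would be parameterized with.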

SkePU
-----
Skeletons are pre-defined generic components for frequently
occurring data and control flow patterns for which parallel 
and/or accelerator-specific implementations may exist.
Skeletons are instantiated to functions by parameterizing
them in problem-specific user code, and operate on user data
wrapped in special container data structures, such as Vector.
Skeleton programming thus hides all platform-specific implementation
issues such as thread management, synchronization and communication,
leaving a sequential programming interface to the user.
However, not all applications can be easily expressed by the
provided set of skeletons and container data structures.
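To make the skeleton idea concrete, here is a minimal sketch in plain C++17 of a container and two skeletons, assuming a sequential backend only. The names `Vector`, `Map` and `MapReduce` are hypothetical stand-ins chosen for this illustration; they are not SkePU's actual API, which should be taken from the SkePU documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container wrapper. A real library would also track where
// the data currently lives (host or device) to avoid needless transfers.
template <typename T>
struct Vector {
    std::vector<T> data;
    explicit Vector(std::size_t n, T init = T()) : data(n, init) {}
    std::size_t size() const { return data.size(); }
    T& operator[](std::size_t i) { return data[i]; }
    const T& operator[](std::size_t i) const { return data[i]; }
};

// Map skeleton: applies a user function element-wise to two inputs.
// Only a sequential backend is shown; OpenMP, CUDA or OpenCL backends
// could sit behind the same interface.
template <typename F>
class Map {
public:
    explicit Map(F f) : f_(std::move(f)) {}
    template <typename T>
    void operator()(const Vector<T>& a, const Vector<T>& b, Vector<T>& out) const {
        assert(a.size() == b.size() && a.size() == out.size());
        for (std::size_t i = 0; i < a.size(); ++i) out[i] = f_(a[i], b[i]);
    }
private:
    F f_;
};

// MapReduce skeleton: element-wise map followed by a reduction.
template <typename MapF, typename RedF>
class MapReduce {
public:
    MapReduce(MapF m, RedF r) : m_(std::move(m)), r_(std::move(r)) {}
    template <typename T>
    T operator()(const Vector<T>& a, const Vector<T>& b, T init) const {
        assert(a.size() == b.size());
        T acc = init;
        for (std::size_t i = 0; i < a.size(); ++i) acc = r_(acc, m_(a[i], b[i]));
        return acc;
    }
private:
    MapF m_;
    RedF r_;
};
```

A user instantiates a skeleton with a function, for example `Map add([](double x, double y) { return x + y; });`, and then calls `add(a, b, c);` without writing the loop, thread management or data transfers themselves.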

SkePU (http://www.ida.liu.se/~chrke/skepu) is a C++ template
library for skeleton programming developed in the EU FP7 PEPPHER
project (http://www.peppher.eu). It provides six data-parallel skeletons
and one task-parallel skeleton, each with different implementations
for different platforms, including sequential C, OpenMP, CUDA
and OpenCL. SkePU is tunable, i.e. it can automatically adapt
to new target platform configurations in order to optimize performance.
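One way such tuning can work is to measure each backend at a few problem sizes and record which is fastest, then dispatch on problem size at run time. The sketch below assumes exactly that scheme; `ExecutionPlan`, `Backend` and `tune` are names made up for this illustration and are not SkePU's API. The cost function is passed in so the selection logic can be exercised without real timing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Backend { Sequential, OpenMP, CUDA };

// Hypothetical execution plan: (upper size bound, backend) pairs,
// sorted by bound. Sizes above the last bound use the last backend.
struct ExecutionPlan {
    struct Entry { std::size_t max_size; Backend backend; };
    std::vector<Entry> entries;

    Backend select(std::size_t n) const {
        for (const Entry& e : entries)
            if (n <= e.max_size) return e.backend;
        return entries.empty() ? Backend::Sequential : entries.back().backend;
    }
};

// For each training size (assumed ascending), pick the backend with the
// lowest measured cost. measure(backend, n) would time a real run.
template <typename Measure>
ExecutionPlan tune(const std::vector<std::size_t>& sizes,
                   const std::vector<Backend>& backends, Measure measure) {
    assert(!backends.empty());
    ExecutionPlan plan;
    for (std::size_t n : sizes) {
        Backend best = backends.front();
        double best_cost = measure(best, n);
        for (Backend b : backends) {
            double c = measure(b, n);
            if (c < best_cost) { best_cost = c; best = b; }
        }
        plan.entries.push_back({n, best});
    }
    return plan;
}
```

With a toy cost model where the GPU has a large fixed overhead but a low per-element cost, small problems stay on the CPU and large ones move to the GPU, which is the behaviour a tuned library aims for.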

Project goal and tasks
----------------------
* Porting the most computationally intensive subroutines of NEMO to C++.
* Understanding the control and data flow patterns in these subroutines,
  and rewriting their code as equivalent combinations of SkePU skeletons.
* Evaluating the result (e.g. by comparing to CPU-only implementations,
  and studying various design alternatives).
* Demonstrating performance portability for different GPU configurations.
* Providing feedback and suggestions for the further design of SkePU.
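The evaluation task needs two basic checks: that a ported kernel reproduces the reference results, and how long it takes. The helpers below are a minimal sketch of both, assuming results are compared element-wise against a reference array (for example, output of the original Fortran code); the function names and the error measure are choices made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest element-wise error of `got` against `ref`: relative where
// |ref| >= 1, absolute where |ref| < 1 (avoids dividing by near-zero).
double max_rel_error(const std::vector<double>& ref, const std::vector<double>& got) {
    assert(ref.size() == got.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        double scale = std::max(std::fabs(ref[i]), 1.0);
        worst = std::max(worst, std::fabs(got[i] - ref[i]) / scale);
    }
    return worst;
}

// Median wall-clock time in seconds over `reps` runs of f
// (upper median when reps is even).
template <typename F>
double median_seconds(F f, int reps) {
    assert(reps > 0);
    std::vector<double> t;
    for (int r = 0; r < reps; ++r) {
        auto start = std::chrono::steady_clock::now();
        f();
        auto stop = std::chrono::steady_clock::now();
        t.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::nth_element(t.begin(), t.begin() + t.size() / 2, t.end());
    return t[t.size() / 2];
}
```

When timing a GPU backend, the timed function must wait for the device to finish (in CUDA, for example, with `cudaDeviceSynchronize()`), since kernel launches return before the kernel completes. Exact bitwise equality with the Fortran results should not be expected, because a parallel reduction may sum in a different order.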

Requirements
------------
* Good C/C++ coding skills are essential.
* Good knowledge of Fortran and MPI.
* Knowledge of mixed C/Fortran programming.
* Knowledge of CUDA and/or OpenCL would be useful.
* We recommend TDDD56 Multicore and GPU Programming
  and TDDC78 Programming parallel computers.

Contact
-------
* Supervisor at NSC:  Johan Raber, e-Science coordinator, NSC  (<lastname>@nsc.liu.se)
* Supervisor at IDA:  Usman Dastgeer, IDA  (<firstname>.<lastname>@liu.se)
* Examiner: Christoph Kessler, IDA  (<firstname>.<lastname>@liu.se)