Fourth Swedish Workshop on Multicore Computing
November 23-25, 2011, Linköping University
Tutorials (Thursday 23/11 afternoon)
StarPU: Exploiting heterogeneous, accelerator-based multicore machines
Speaker: Dr. Samuel Thibault, INRIA / LaBRI, University of Bordeaux, France
Abstract:
Heterogeneous accelerator-based parallel machines, featuring manycore CPUs and GPU accelerators, provide an unprecedented amount of processing power per node. Dealing with such a large number of heterogeneous processing units, which provide highly unbalanced computing power, is one of the biggest challenges that developers of HPC applications have to face. To fully tap into the potential of these heterogeneous machines, pure offloading approaches, which consist of running an application on regular processors while offloading part of the code to accelerators, are not sufficient.
This talk will present the StarPU project, which aims at providing portable optimized performance on clusters of heterogeneous multicore+accelerator machines to task-based applications. The goal is to relieve the programmer of the technical aspects of data management and task scheduling, while applying theoretical task scheduling algorithms to actual application execution to improve performance. It also provides performance feedback through task profiling and trace analysis. This approach has been used successfully, for instance, to integrate in a few weeks the PLASMA (CPUs) and MAGMA (GPUs) Cholesky, QR and LU factorizations into a CPU+GPU implementation whose efficiency is very close to peak performance.
Speaker's bio:
Dr. Samuel Thibault is an Assistant Professor at the University of Bordeaux, a member of the Runtime INRIA Team, and one of the main architects of the StarPU, Marcel, and hwloc projects. His main research interests lie in parallel computing, scheduling on heterogeneous multiprocessor architectures (multicore, NUMA, GPU), and leveraging virtualization for HPC.
Offload C++ - Concepts and Application
Speaker: Dr. George Russell, Director of Quality Assurance at Codeplay Software Ltd, UK
This tutorial introduces and motivates Offload C++: a language, compiler, libraries and programming model for writing portable, parallel, high-performance software for heterogeneous multi-core systems. It outlines some of the challenges of programming for heterogeneous multi-core systems, such as API complexity, portability and ease of development. Motivations for the programming model of Offload will be given, including increasing execution efficiency and reducing power consumption through exploitation of data locality.
A series of examples is given to introduce the fundamentals of Offload, with reference to the Cell BE processor and GPU devices. These will cover a range of programming models, including multi-threading, loop parallelisation, data-parallel and task-parallel programming. Further examples will describe how to improve the performance of Offloaded code, and how to achieve performance portability.
The tutorial concludes with an overview of how, and to what extent, C++ programming can be brought to GPU devices, and a discussion of the ongoing research in this area as a part of the EU PEPPHER project and integration with the StarPU system.
George Russell, PhD, has variously been a Compiler Engineer, Team Lead, and Tester on compiler projects targeting heterogeneous multi-core processor and GPU architectures over the past six years at Codeplay. He has been involved in the development of Offload C++ for the Cell Linux platform, and in research on targeting Offload C++ to OpenCL devices, such as GPUs.