DF21500
Multicore Computing (7.5hp)
Architectures, Models, Methods, Languages

HT1/2013

Due to a heavy workload and unexpected changes in course responsibilities, this Multicore Computing course, originally announced for spring 2013, was moved to early fall 2013.
The course starts on Monday, 26 August 2013, at 13:15 in room Donald Knuth.

Recommended for
Graduate (CUGS, CIS, ...) students interested in the areas of parallel computer architecture, parallel programming, software engineering, optimization, compiler construction, or algorithms and complexity.
The course is hosted by the CUGS graduate school as an Advanced Course.
Undergraduate and master-level students are kindly referred to the similar course TDDD56 Multicore and GPU Programming (ht2) instead.

Registration (internal access only).
(If you have no IDA research account, please contact the CUGS secretary, Anne Moe (annes \at ida.liu.se) for registration.)
List of participants 2013 (internal access only).

Organization
Lectures (ca. 32h), programming exercises, optional theoretical exercises for self-assessment, programming lab assignment, student presentations.
The course will be held as a block course with two intensive weeks in Linköping.

The course was last given
in 2011 and 2009.

Goals
The course emphasizes fundamental aspects of shared-memory and accelerator-based parallel programming, such as shared memory parallel architecture concepts, programming models, performance models, parallel algorithmic paradigms, parallelization techniques and strategies, scheduling algorithms, optimization, composition of parallel programs, and concepts of modern parallel programming languages and systems. Practical exercises help to apply the theoretical concepts of the course to solve concrete problems in a real-world multicore system.

Prerequisites
Data structures and algorithms (e.g. TDDB57) are absolutely required; knowledge in complexity theory and compiler construction is useful. Some basic knowledge of computer architecture is assumed. A basic course in concurrent programming (e.g. TDDB68) is recommended.
Programming skills in C and some familiarity with Linux (or a similar OS) are necessary for the practical exercises.

Contents/Schedule
Lecture block in 2 intensive weeks (weeks 35, 36) in Linköping:

Monday 26/8: 13-16
Tuesday 27/8 and Wednesday 28/8: 9-12 and 13:15-16
Thursday 29/8: 9-12

Monday 2/9: 13-16
Tuesday 3/9 and Wednesday 4/9: 9-12 and 13:15-16
Thursday 5/9: 9-12
Labs start in weeks 35 and 36 and continue in weeks 37 and 38.
Student presentation day: Wednesday 18/9, 13:15-17:00 (Program)
Written exam: Friday 20/9, 13:00-16:00

Rooms:
Lectures in room Donald Knuth, Labs in IDA Multicore Lab.

Lecture Notes

are made available here for registered participants. Slides may be updated shortly before each lecture.

Schedule (prel.)

(still under construction, but the dates are fixed)

The mapping of topics to time slots given below is approximate; if necessary, we may shift contents between slots as appropriate.

Lectures are given in the room Donald Knuth.
Lab sessions are given in the room Konrad Zuse (IDA Multicore Lab)

Day          | Time        | Room                    | Lecturer     | Topic
26 aug 2013  | 13:15-13:30 | D. Knuth                | C. Kessler   | Organization and overview
26 aug       | 13:30-16:00 | D. Knuth                | C. Kessler   | The Multicore Challenge: introduction to multicore architecture concepts and trends. SIMD computing.
27 aug       | 09:15-12:00 | D. Knuth                | C. Kessler   | Parallel programming with threads and tasks. Shared-memory architecture concepts and performance issues.
27 aug       | 13:30-15:00 | D. Knuth                | C. Kessler   | Review of parallel computing theory, Part I: PRAM model; parallel time, cost, speedup; Amdahl's Law, Gustafson's Law, Brent's Theorem; scalability; NC; basic data-parallel algorithmic building blocks (parallel prefix sums, list ranking); delay model; BSP model; LogP model. Algorithmic design pattern: parallel divide-and-conquer.
27 aug       | 15:30-17:00 | D. Knuth / K. Zuse      | N. Melot     | Lab introduction and CPU lab kick-off
28 aug       | 09:15-12:00 | D. Knuth                | C. Kessler   | Parallel computing theory (cont.). Parallel sorting algorithms.
28 aug       | 13:15-16:00 | D. Knuth                | C. Kessler   | Nonblocking synchronization
29 aug       | 08:15-10:00 | D. Knuth                | I. Ragnemalm | GPU architecture, CUDA
29 aug       | 10:15-12:00 | K. Zuse (Multicore Lab) | N. Melot     | CPU lab session
Day    | Time        | Room               | Lecturer    | Topic
2 sep  | 13:15-15:00 | D. Knuth           | N. Melot    | Lesson (theory exercise solving session). Please prepare the theory exercises before the lesson.
2 sep  | 15:15-16:00 | D. Knuth           | C. Kessler  | Algorithmic multithreading with Cilk
3 sep  | 09:15-10:30 | D. Knuth           | C. Kessler  | Generic parallel programming with skeletons: concepts and generative realization for GPUs with SkePU. Mid-term evaluation.
3 sep  | 10:45-12:00 | D. Knuth           | W. Löwe     | Bridging models and machines: BSP, LogP. Slides (2011)
3 sep  | 13:15-14:30 | D. Knuth           | W. Löwe     | Simulations between models. Slides (2011)
3 sep  | 14:45-17:00 | D. Knuth           | W. Löwe     | Static clustering and scheduling of task graphs. Scheduling malleable tasks. Slides (2011)
4 sep  | 09:15-10:00 | D. Knuth           | C. Kessler  | Algorithmic design pattern: on-chip pipelining for Cell.
4 sep  | 10:15-12:00 | D. Knuth / K. Zuse | U. Dastgeer | OpenCL; GPU lesson and lab kick-off: GPU programming with CUDA and OpenCL (continue on your own)
4 sep  | 13:15-16:00 | D. Knuth           | C. Kessler  | Parallelization of sequential programs. Run-time parallelization.
5 sep  | 09:15-10:00 | D. Knuth           | C. Kessler  | Guidelines for paper presentations. Distribution of papers for student presentations. Paper assignment.
5 sep  | 10:15-11:45 | D. Knuth           | C. Kessler  | Autotuning overview. Optimized composition of performance-aware parallel components. Performance portability for heterogeneous multicore systems: the PEPPHER approach.
5 sep  | 11:45-12:00 | D. Knuth           | C. Kessler  | Wrap-up and evaluation.
Day     | Time        | Room     | Lecturer    | Topic
9 sep   | 10:00-12:00 | K. Zuse  | N. Melot    | CPU lab session
13 sep  | 10:00-12:00 | K. Zuse  | U. Dastgeer | GPU lab session
17 sep  | 10:00-12:00 | K. Zuse  | U. Dastgeer | GPU lab session
18 sep  | 13:15-17:00 | D. Knuth |             | Student presentation session
19 sep  | 10:00-12:00 | K. Zuse  | U. Dastgeer | GPU lab session
20 sep  | 13:00-16:00 | D. Knuth |             | Written exam
Remark: Breaks are not shown in this schedule, and will be set as appropriate.


Labs

In 2013 we offer a single lab suite covering both CPU and GPU labs (3hp). Both parts need to be done.

  1. CPU Lab: TBA

    Supervision and examination: Nicolas Melot

  2. GPU-Lab: Programming assignment for GPUs with CUDA or OpenCL.

    Prerequisites for the labs: C programming, Linux, and a basic course in concurrent programming and operating systems (e.g. TDDB68).

    Supervision and examination: Usman Dastgeer

Exercises

Some theory exercises are collected here.

Literature

See also the Literature list for TDDD56.

Background reading on the theory of parallel computation and on parallel algorithms:

Background reading on non-blocking data structures:

Further literature references may be added later.

Teachers

Examination

TEN1: Written exam, 3hp.

UPG1: Both CPU and GPU programming exercises solved correctly (lab reports accepted), 3hp

UPG2: Paper presentation, opposition, and written summary (1-2 pages, deadline 4/10/2013) of the presented paper, 1.5hp.

Credit
7.5hp if all examination moments are fulfilled.
Partial credits can be given, too, but the written exam must be passed.

If you have already taken TDDD56 Multicore and GPU Programming, which overlaps with most of this course, you only need to do the presentation moment (UPG2) and can receive 1.5hp for the course.

Comments
There is a partial overlap in contents with the earlier course FDA125 Advanced Parallel Programming, last given in 2007, corresponding to 1p = 1.5hp in the written exam.
It overlaps to a major extent (ca. 5hp) with the master-level course TDDD56 Multicore and GPU Programming and to a minor extent (1.5hp) with the master-level course TDDC78 Programming of Parallel Computers.

In contrast to FDA125 and TDDC78, this course focuses on shared-memory and multicore systems, including heterogeneous multicore and GPU architectures, and adds more material on software engineering techniques for parallel systems, such as design patterns for synchronization and optimized composition of parallel components. We also put more emphasis on parallelization techniques that are especially useful for multicore systems, such as SIMD programming, speculative parallelization, transactional programming, and lock-free synchronization. Scheduling for shared-memory systems is considered in more detail, in particular scheduling algorithms for malleable task graphs.


This page is maintained by Christoph Kessler (chrke \at ida.liu.se)