Due to a heavy workload and unexpected changes in course responsibilities,
this Multicore Computing course, originally announced for spring 2013,
was moved to early fall 2013.
The course starts on Monday, 26 August 2013 at 13:15 in room Donald Knuth.
The course addresses graduate (CUGS, CIS, ...) students interested in the areas of parallel computer architecture, parallel programming, software engineering, optimization, compiler construction, or algorithms and complexity.
The course is hosted by the CUGS graduate school as an Advanced Course.
Undergraduate and master-level students are kindly referred to the similar course TDDD56 Multicore and GPU Programming (ht2) instead.
Registration (internal access only).
(If you have no IDA research account, please contact the CUGS secretary, Anne Moe (annes \at ida.liu.se) for registration.)
List of participants 2013 (internal access only).
Lectures (ca. 32h), programming exercises, optional theoretical exercises for self-assessment, programming lab assignment, student presentations.
The course will be held as a block course with two intensive weeks in Linköping.
The course was previously given in 2011 and in 2009.
The course emphasizes fundamental aspects of shared-memory and accelerator-based parallel programming, such as shared memory parallel architecture concepts, programming models, performance models, parallel algorithmic paradigms, parallelization techniques and strategies, scheduling algorithms, optimization, composition of parallel programs, and concepts of modern parallel programming languages and systems. Practical exercises help to apply the theoretical concepts of the course to solve concrete problems in a real-world multicore system.
A course in data structures and algorithms (e.g. TDDB57) is required; knowledge of complexity theory and compiler construction is useful. Some basic knowledge of computer architecture is assumed. A basic course in concurrent programming (e.g. TDDB68) is recommended.
Programming in C and some familiarity with Linux (or a similar OS) are necessary for the practical exercises.
Lecture block in 2 intensive weeks (weeks 35 and 36) in Linköping:
Monday 26/8 13:15-16; Tuesday and Wednesday 9-12 and 13:15-16; Thursday 29/8 9-12.
Monday 2/9 13:15-16; Tuesday and Wednesday 9-12 and 13:15-16; Thursday 5/9 9-12.
Labs start in weeks 35 and 36 and continue in weeks 37 and 38.
(still under construction, but the dates are fixed)
The mapping of topics to time slots as given below is approximate; if necessary, we might dynamically shift contents back and forth as appropriate.
Lectures are given in the room Donald Knuth.
Lab sessions are given in the room Konrad Zuse (IDA Multicore Lab)
|26 aug||13:15-13:30||D. Knuth||C. Kessler||Organization and overview|
|26 aug||13:30-16:00||D. Knuth||C. Kessler||
The Multicore Challenge:
Introduction to Multicore architecture concepts and trends.|
|27 aug||09:15-12:00||D. Knuth||C. Kessler||Parallel programming with threads and tasks.
Shared memory architecture concepts and performance issues|
|27 aug||13:30-15:00||D. Knuth||C. Kessler||Review of parallel computing theory, Part I|
PRAM model. Parallel time, cost, speedup. Amdahl's Law, Gustafson's Law, Brent's Theorem. Scalability. NC. Basic data-parallel algorithmic building blocks: Parallel prefix sums, list ranking. Delay model. BSP model. LogP model. - Algorithmic design pattern: Parallel divide-and-conquer.
|27 aug||15:30-17:00||D. Knuth / K. Zuse||N. Melot||Lab introduction and CPU lab kick-off|
|28 aug||09:15-12:00||D. Knuth||C. Kessler||
Parallel computing theory (cont.)
Parallel sorting algorithms|
|28 aug||13:15-16:00||D. Knuth||C. Kessler||Nonblocking synchronization|
|29 aug||08:15-10:00||D. Knuth||I. Ragnemalm||
|29 aug||10:15-12:00||K. Zuse||N. Melot||CPU lab session|
|2 sep||13:15-15:00||D. Knuth||N. Melot||Lesson (theory exercise solving session)|
Please prepare the theory exercises before the lesson.
|2 sep||15:15-16:00||D. Knuth||C. Kessler||Algorithmic multithreading with Cilk|
|3 sep||09:15-10:30||D. Knuth||C. Kessler||
Generic parallel programming with skeletons:
Concepts and generative realization for GPUs.|
|3 sep||10:45-12:00||D. Knuth||W. Löwe||Bridging models and machines: BSP, LogP. Slides (2011)|
|3 sep||13:15-14:30||D. Knuth||W. Löwe||Simulations between models. Slides (2011)|
|3 sep||14:45-17:00||D. Knuth||W. Löwe||
Static clustering and scheduling of task graphs.
Scheduling malleable tasks.|
|4 sep||09:15-10:00||D. Knuth||C. Kessler||Algorithmic design pattern: On-Chip pipelining for Cell.|
|4 sep||10:15-12:00||D. Knuth / K. Zuse||Lab kick-off: GPU programming with CUDA and OpenCL (continue on your own)|
|4 sep||13:15-16:00||D. Knuth||C. Kessler||Parallelization of sequential programs|
|5 sep||09:15-10:00||D. Knuth||C. Kessler||
Guidelines for Paper Presentations.
Distribution of papers for student presentations.|
|5 sep||10:15-11:45||D. Knuth||C. Kessler||
Optimized composition of performance-aware parallel components.
Performance portability for heterogeneous multicore systems: The PEPPHER Approach.|
|5 sep||11:45-12:00||D. Knuth||C. Kessler||Wrap-up and evaluation.|
|9 sep||10:00-12:00||K. Zuse||N. Melot||CPU lab session|
|13 sep||10:00-12:00||K. Zuse||U. Dastgeer||GPU lab session|
|17 sep||10:00-12:00||K. Zuse||U. Dastgeer||GPU lab session|
|18 sep||13:15-17:00||D. Knuth||Student presentation session|
|19 sep||10:00-12:00||K. Zuse||U. Dastgeer||GPU lab session|
|20 sep||13:00-16:00||D. Knuth||Written exam|
In 2013 we offer a single lab suite (3hp) covering both CPU and GPU labs; both parts must be completed.
Supervision and examination (CPU labs): Nicolas Melot
Prerequisites for the labs: C programming, Linux, and a basic course in concurrent programming and operating systems (e.g. TDDB68).
Supervision and examination (GPU labs): Usman Dastgeer
Some theory exercises are collected here.
See also the Literature list for TDDD56.
Background reading on the theory of parallel computation and on parallel algorithms:
Further literature references may be added later.
TEN1: Written exam, 3hp.
UPG2: Presentation, opposition, and written summary
(1-2 pages, deadline 4/10/2013) of the presented paper, 1.5hp.
The course gives 7.5hp if all examination moments are fulfilled.
Partial credits can be given, too, but the written exam must be passed.
If you have already taken TDDD56 Multicore and GPU Programming, which overlaps with most of this course, you only need to do the presentation moment (UPG2) and can get 1.5hp for the course.
There is a partial overlap in contents with the earlier course FDA125 Advanced Parallel Programming, which was last given in 2007, corresponding to 1p = 1.5hp in the written exam.
It overlaps to a major extent (ca. 5hp) with the master-level course TDDD56 Multicore and GPU Programming and to a minor extent (1.5hp) with the master-level course TDDC78 Programming of Parallel Computers.
In contrast to FDA125 and TDDC78, this course focuses on shared memory and multicore systems, including heterogeneous multicore and GPU architectures, and adds more material on software engineering techniques for parallel systems, such as design patterns for synchronization and optimized composition of parallel components. We also place more emphasis on parallelization techniques that are especially useful for multicore systems, such as SIMD programming, speculative parallelization, transactional programming, and lock-free synchronization. Scheduling for shared memory systems is considered in more detail, in particular scheduling algorithms for malleable task graphs.