Due to a heavy workload and unexpected changes in course responsibilities,
this Multicore Computing course, originally announced for spring 2013,
was moved to early fall 2013.
The course starts on Monday, 26 August 2013 at 13:15 in room Donald Knuth.
The course addresses graduate (CUGS, CIS, ...) students interested in the areas of parallel computer architecture, parallel programming, software engineering, optimization, compiler construction, or algorithms and complexity.
The course is hosted by the CUGS graduate school as an Advanced Course.
Undergraduate and master-level students are kindly referred to the similar course TDDD56 Multicore and GPU Programming (ht2) instead.
Registration (internal access only).
(If you have no IDA research account, please contact the CUGS secretary, Anne Moe (annes \at ida.liu.se) for registration.)
List of participants 2013 (internal access only).
Lectures (ca. 32h), programming exercises, optional theoretical exercises for self-assessment, programming lab assignment, student presentations.
The course will be held as a block course with two intensive weeks in Linköping.
The course was previously given in 2011 and in 2009.
The course emphasizes fundamental aspects of shared-memory and accelerator-based parallel programming, such as shared memory parallel architecture concepts, programming models, performance models, parallel algorithmic paradigms, parallelization techniques and strategies, scheduling algorithms, optimization, composition of parallel programs, and concepts of modern parallel programming languages and systems. Practical exercises help to apply the theoretical concepts of the course to solve concrete problems in a real-world multicore system.
A course in data structures and algorithms (e.g. TDDB57) is required; knowledge of complexity theory and compiler construction is useful. Some basic knowledge of computer architecture is assumed. A basic course in concurrent programming (e.g. TDDB68) is recommended.
Programming in C and some familiarity with Linux (or a similar OS) are necessary for the practical exercises.
Lecture block in 2 intensive weeks (weeks 35 and 36) in Linköping:
Monday 26/8 13:15-16; Tuesday and Wednesday 9-12 and 13:15-16; Thursday 29/8 9-12.
Monday 2/9 13:15-16; Tuesday and Wednesday 9-12 and 13:15-16; Thursday 5/9 9-12.
Labs start in weeks 35 and 36 and continue in weeks 37 and 38.
(still under construction, but the dates are fixed)
The mapping of topics to time slots as given below is approximate; if necessary, we might dynamically shift contents back and forth as appropriate.
Lectures are given in the room Donald Knuth.
Lab sessions are given in the room Konrad Zuse (IDA Multicore Lab)
|26 aug||13:15-13:30||D. Knuth||C. Kessler||Organization and overview|
|26 aug||13:30-16:00||D. Knuth||C. Kessler||
The Multicore Challenge:
Introduction to Multicore architecture concepts and trends.|
|27 aug||09:15-12:00||D. Knuth||C. Kessler||Parallel programming with threads and tasks.
Shared memory architecture concepts and performance issues|
|27 aug||13:30-15:00||D. Knuth||C. Kessler||Review of parallel computing theory, Part I|
PRAM model. Parallel time, cost, speedup. Amdahl's Law, Gustafson's Law, Brent's Theorem. Scalability. NC. Basic data-parallel algorithmic building blocks: Parallel prefix sums, list ranking. Delay model. BSP model. LogP model. - Algorithmic design pattern: Parallel divide-and-conquer.
|27 aug||15:30-17:00||D. Knuth / K. Zuse||N. Melot||Lab introduction and CPU lab kick-off|
|28 aug||09:15-12:00||D. Knuth||C. Kessler||
Parallel computing theory (cont.)
Parallel sorting algorithms|
|28 aug||13:15-16:00||D. Knuth||C. Kessler||Nonblocking synchronization|
|29 aug||08:15-10:00||D. Knuth||I. Ragnemalm||
|29 aug||10:15-12:00||K. Zuse||N. Melot||CPU lab session|
|2 sep||13:15-15:00||D. Knuth||N. Melot||Lesson (theory exercise solving session)|
Please prepare the theory exercises before the lesson.
|2 sep||15:15-16:00||D. Knuth||C. Kessler||Algorithmic multithreading with Cilk|
|3 sep||09:15-10:30||D. Knuth||C. Kessler||
Generic parallel programming with skeletons:
Concepts and generative realization for GPUs.|
|3 sep||10:45-12:00||D. Knuth||W. Löwe||Bridging models and machines: BSP, LogP. Slides (2011)|
|3 sep||13:15-14:30||D. Knuth||W. Löwe||Simulations between models. Slides (2011)|
|3 sep||14:45-17:00||D. Knuth||W. Löwe||
Static clustering and scheduling of task graphs.
Scheduling malleable tasks.|
|4 sep||09:15-10:00||D. Knuth||C. Kessler||Algorithmic design pattern: On-Chip pipelining for Cell.|
|4 sep||10:15-12:00||D. Knuth / K. Zuse||Lab kick-off: GPU programming with CUDA and OpenCL (continue on your own)|
|4 sep||13:15-16:00||D. Knuth||C. Kessler||Parallelization of sequential programs|
|5 sep||09:15-10:00||D. Knuth||C. Kessler||
Guidelines for Paper Presentations.
Distribution of papers for student presentations.|
|5 sep||10:15-11:45||D. Knuth||C. Kessler||
Optimized composition of performance-aware parallel components.
Performance portability for heterogeneous multicore systems: The PEPPHER Approach.|
|5 sep||11:45-12:00||D. Knuth||C. Kessler||Wrap-up and evaluation.|
|9 sep||10:00-12:00||K. Zuse||N. Melot||CPU lab session|
|13 sep||10:00-12:00||K. Zuse||U. Dastgeer||GPU lab session|
|17 sep||10:00-12:00||K. Zuse||U. Dastgeer||GPU lab session|
|18 sep||13:15-17:00||D. Knuth||Student presentation session|
|19 sep||10:00-12:00||K. Zuse||U. Dastgeer||GPU lab session|
|20 sep||13:00-16:00||D. Knuth||Written exam|
In 2013 we offer a single lab suite (3hp) covering both CPU and GPU labs; both parts must be completed.
Supervision and examination (CPU labs): Nicolas Melot
Prerequisites for the labs: C programming, Linux, and a basic course in concurrent programming and operating systems (e.g. TDDB68).
Supervision and examination (GPU labs): Usman Dastgeer
Some theory exercises are collected here.
See also the Literature list for TDDD56.
Background reading on the theory of parallel computation and on parallel algorithms:
Further literature references may be added later.
TEN1: Written exam, 3hp.
UPG2: Presentation, opposition, and written summary
(1-2 pages, deadline 4/10/2013) of the presented paper, 1.5hp.
The course gives 7.5hp if all examination moments are fulfilled.
Partial credits can be given, too, but the written exam must be passed.
If you have already taken TDDD56 Multicore and GPU Programming, which overlaps with most of this course, you only need to do the presentation moment (UPG2) and can get 1.5hp for the course.
There is a partial overlap in contents with the earlier course FDA125 Advanced Parallel Programming, which was last given in 2007, corresponding to 1p = 1.5hp in the written exam.
It overlaps to a major extent (ca. 5hp) with the master-level course TDDD56 Multicore and GPU Programming and to a minor extent (1.5hp) with the master-level course TDDC78 Programming of Parallel Computers.
In contrast to FDA125 and TDDC78, this course focuses on shared memory and multicore systems, including heterogeneous multicore and GPU architectures, and adds more material on software engineering techniques for parallel systems, such as design patterns for synchronization and optimized composition of parallel components. We also place more emphasis on parallelization techniques that are especially useful for multicore systems, such as SIMD programming, speculative parallelization, transactional programming, and lock-free synchronization. Scheduling for shared memory systems is considered in more detail, in particular scheduling algorithms for malleable task graphs.