TDDD56 Multicore and GPU Programming
Timetable and Lecture Plan
Schedule (as available on the LiU schedule server)
Certain lecture notes and other handouts with restricted access
are located here.
The lecture notes and other material may be updated during the course as appropriate.
Lecturers: Christoph Kessler (CK), Ingemar Ragnemalm (IR).
Assistants: August Ernstsson (AE), N.N. (NN), Ingemar Ragnemalm (IR).
Lectures marked with asterisks overlap fully (**) or partly (*) with similar lectures in TDDC78 Programming of Parallel Computers - Methods and Tools. These lectures are optional if you have already taken TDDC78, but may still be useful as revision. This repetition of the common core topics is necessary so that the two courses can be taken independently or in either order.
- Lecture 1:
Organization, Overview. (Slides PDF 3x2), (Slides PDF 2x1)
Motivation: The Multicore Challenge. Multicore Architecture Concepts. (CK)
- Lecture 2: (**)
Shared memory architecture concepts and performance issues. (CK)
- Lecture 3:
Parallel Programming with Threads and Tasks. (CK)
- Lecture 3 (cont., 45min) (CK)
- Lecture 4:
Non-blocking synchronization. (CK)
- Lecture 5: (*)
Theory: Parallel programming and cost models. Analysis of parallel algorithms.
- Lecture 6: (*)
Theory (cont.): Brent's Theorem. Speedup anomalies. Amdahl's Law. Fundamental parallel algorithms: parallel prefix sums, parallel list ranking. (CK)
- Lecture 7:
Parallel sorting algorithms: Simple parallel quicksort, Fully parallel quicksort, Parallel samplesort, Bitonic sort, Parallel Mergesort. (CK)
- Lecture 8: (45min)
Parallel algorithmic design patterns: Towards skeleton programming. (AE)
- Lesson 2: (45min)
Introduction to skeleton programming in SkePU, and to CPU Lab 3. (AE)
- Lecture 9:
GPU architecture and trends (IR)
- Lecture 10:
Introduction to CUDA programming. (IR)
- Lecture 11:
CUDA programming. GPU lab introduction. (IR)
- Lecture 12:
Sorting on GPU. Advanced CUDA issues. (IR)
- Lecture 13:
Introduction to OpenCL. (IR)
- Lesson 3:
OpenCL. Shader programming.
Selected exercises. (IR)
- Lesson 4:
Selected CPU/theory exercises. (AE)
Please solve suggested exercises in advance to be prepared. See our compendium on Design and Analysis of Parallel Algorithms for background information, important definitions, and further exercises.
- Lecture 14: (**)
Parallelization of sequential programs. (CK)
There are two lab passes (see the schedule).
During each lab pass, several lab groups run in parallel: 3 in pass A and 2 in pass B.
- Group_A: up to 48 students in total:
32 students in room System-och-Bild-labbet (2C:525B) (groups A1b and A2),
plus up to 16 students (group A1a) in room IDA Multicore-lab (B 327:197, above Cafe Java).
CPU labs (weeks 46-48) are jointly supervised by August Ernstsson (A1a (max 8), A1b (max 16)) and Alexander Wilkens (A2 (max 24)).
GPU labs (weeks 49-51, A1a+A1b+A2) are supervised by Ingemar Ragnemalm and Alexander Wilkens.
- Group_B: 32 students in room System-och-Bild-labbet (2C:525B), supervised by August Ernstsson (B1) and Alexander Wilkens (B2) (CPU and GPU labs, weeks 46-51).
Remarks: Groups in pass A are recommended for Norrköping-based students (Wednesday afternoons 13-17). The pass A subgroups A1a, A1b and A2 run in parallel (A1b and A2 in the same room). Note that A1a and A1b have different assistants in the CPU and GPU parts.
Register for one of these groups in webreg by Friday of the first week;
after that, remaining places will be given to persons on the waiting list.
The maximum course capacity is 64 students (now 64+16).
Find a lab partner. As the course is fully booked, we will merge any singleton groups and move students between groups as necessary.
Presence in the lab sessions is mandatory.
Deadlines: see the lab page.
Page responsible: Christoph W Kessler
Last updated: 2018-11-15