Hide menu

TDDD56 Multicore and GPU Programming


Reading directions per lecture topic can be found here.


Since the 2018 edition of the course, we have two new course compendia that cover some (but not all) of the lectures of the course. These are made available to registered course participants via the course homepage.

  • C. Kessler: Design and Analysis of Parallel Algorithms: An Introduction. Compendium, version May 2020, (c) 2020.
    Available for registered participants of TDDD56. Not for general distribution.
    Covers the three TDDD56 lectures on analysis of parallel algorithms and on parallel sorting.

  • I. Ragnemalm: Attack in Packs. Compendium, (c) 2018.
    Available for registered participants of TDDD56. Not for general distribution.
    Covers the TDDD56 GPU lectures.

So far, there is no single textbook (yet) that covers the entire course contents.
In principle you should be able to follow the course based on the OH material, the two course compendia, and the online references (e.g., articles) given further below.
If you prefer to work with a textbook, we recommend that you choose one general introductory text to parallel programming (not needed if you already have taken TDDC78 earlier), such as Wilkinson/Allen or Lin/Snyder, and possibly complement this with a book or online material about GPU programming. If you want to work with a GPU programming book, too, we recommend, in the first hand, Sanders/Kandrot CUDA by example, see below. We also give a selection of reference books for secondary reading below.
All books listed below should be available in the LiU Campus Valla library, some also as reference copy or e-books.

  • C. Lin, L. Snyder: Principles of Parallel Programming. Pearson/Addison Wesley, 2008. 978-0-321-54942.
    (General introduction text. The Campus-Valla library has a copy for loan and will also have a reference copy or e-book. There is an Errata available for the first printing.)
  • T. Mattson, B. Sanders, B. Massingill: Patterns for Parallel Programming. Addison Wesley, 2005. 978-0-321-22811-6
    (More a survey, could be useful as a secondary book. The Campus-Valla library has a copy for loan.)
  • M. Herlihy and N. Shavit: The Art of Multiprocessor Programming. Morgan Kaufmann Publishers, 2008. 978-0-12-370591-4
    (More advanced issues in shared-memory synchronization, esp. about non-blocking concurrent data structures. The Campus-Valla library has one copy for loan and also as e-book.)
  • B. Wilkinson, M. Allen: Parallel Programming, Second Edition. Pearson/Prentice Hall, 2005. 0-13-140563-2
    (General introduction text. The course book in TDDC78. The Campus-Valla library has one reference copy and one copy for loan.)
  • J. Sanders, E. Kandrot: CUDA by example. Addison-Wesley, 2011. 978-0-13-138768-3
    (Recommended further reading, for GPU programming in CUDA. The Campus-Valla library has one copy for loan.)
  • David B. Kirk and Wen-mei W. Hwu: Programming Massively Parallel Processors: A Hands-on Approach.
    Morgan Kaufmann, 2010 (first edition), 2012 (second edition). ISBN 0123814723
    (Only secondary reading, for GPU programming in CUDA. Available in the Campus-Valla library as e-book.)
  • A. Grama, G. Karypis, V. Kumar, A. Gupta: Introduction to Parallel Computing, 2nd Edition. Addison-Wesley, 2003.
    (General introduction text with focus on parallel algorithms. The Campus-Valla library has a copy for loan.)
  • A. Munshi, B. Gaster, T. Mattsson, J. Fung, D. Ginsburg: OpenCL Programming Guide. Addison-Wesley, 2011. Print-edition 978-0-321-74964-2
    (Background reading on OpenCL. The Campus-Valla library has acquired one copy for loan.)
  • B. Gaster, L. Howes, D. Kaeli, P. Mistry Heterogeneous Computing with OpenCL. Morgan Kaufmann Publishers, 2011, 978-0123877666
    (Background reading on OpenCL. The Campus-Valla library will acquire one copy for loan.)

On-line material and references

On shared-memory parallel architecture concepts:
  • J. Hennessy, D. Patterson: Computer Architecture, A Quantitative Approach, 4th ed.. Morgan Kaufmann, 2007.
    The second edition (1996), third edition (2002) and fourth edition (2007) each contain a chapter on multiprocessor architecture. The current fifth edition should also be fine. The Campus-Valla library has copies.
    As further reading:
  • D. Culler, J. Singh, A. Gupta: Parallel Computer Architecture. Morgan Kaufmann, 1998. Contains an in-depth treatment of SMP and CC-NUMA shared-memory architectures.
On non-blocking synchronization: On the design and analysis of parallel algorithms: On GPU architecture and programming: On programming and optimization techniques for multicore CPU architectures: This list may be extended during the course.

Reading Directions per Lecture/Topic

The following list gives directions to background reading especially where the information given on the slides should not suffice.
Bibliographical details of the referenced textbooks can be found in the literature list above.

Background knowledge (Prerequisites)

    Not all participants of this course have a core computer science background, hence some technical terms might not be well known to everyone even if the topic is listed in the course prerequisites in the syllabus. Nevertheless, here are some links to catch up before / during the first days of the course.
  • Big-O notation is covered in the appendix of C. Kessler's compendium. See also this Slide set on the analysis of sequential algorithms and Big-O notation from an earlier course in data structures and algorithms.
  • Central terms about concurrent programming (processes, threads, race conditions, synchronization, mutual exclusion, locks, deadlocks, ...) are recapitulated and explained in the slide set of the third lecture, even though there is no time to go through all this in detail again in TDDD56. If you need more background reading about these concepts, read e.g. Chapters 3, 4 and 6 of (any recent edition of) Galvin, Silberschatz, Gagne: Operating System Concepts, or any other good textbook on operating systems concepts.
  • If there are any further items that are new to you but not to most others, please let us know.

On multicore architecture concepts and trends (Lecture 1)

  • Chapter 2 of Lin/Snyder (until p. 53) provides some introductory material about multicore architecture concepts. (There is nothing about multicore and accelerator hardware concepts in the Wilkinson/Allen book.)
    For a more in-depth review of general concepts in (parallel) computer architecture, such as VLIW, hardware multithreading, multicore, caches, (shared) memory etc., see e.g. Hennessy, Patterson: Computer Architecture, a quantitative approach, 4th edition 2006, 5th edition 2011, or later, or other modern textbooks about computer architecture.

On shared memory architecture, cache coherence, false sharing etc. (Lecture 2)

  • The MSI and MESI coherence protocols (bus snooping) are described e.g. in Section 5.3 of the book by Culler, Pal Singh and Gupta, Parallel Computer Architecture, 1998.
  • Consistency models are summarized e.g. in Wilkinson/Allen Parallel Programming 2e, Section 9.3.

On programming with threads and tasks (Lecture 3)

  • A survey of shared memory programming (processes, threads, critical sections, locks, thread pools, OpenMP) is given e.g. in Wilkinson/Allen Parallel Programming 2e, Chapter 8, or in Chapter 6 of Lin/Snyder Principles of Parallel Programming.
  • Dynamic load balancing is described e.g. in Wilkinson/Allen Parallel Programming 2e, Section 7.2.
  • Programming with light-weight tasks (a la Cilk and Futures) and dynamic scheduling by work-stealing is shortly presented e.g. in Herlihy/Shavit The Art of Multiprocessor Programming, Chapter 16.

On non-blocking synchronization (Lecture 4)

  • Chapters 9, 10 and 11 of Herlihy/Shavit The Art of Multiprocessor Programming give more information about lock-free data structures (linked lists, stacks, queues), non-blocking synchronization, and the ABA problem.

On design and analysis of parallel algorithms (Lecture 5-6)

  • This material is completely covered in C. Kessler's compendium.
  • Most of the theory, including a proof of Brent's Theorem, is presented e.g. in Chapter 2 of Keller, Kessler, Träff: Practical PRAM Programming, Wiley 2001.
  • Speedup etc. is introduced e.g. in Wilkinson/Allen Parallel Programming 2e, Section 1.1, or in Lin/Snyder Principles of Parallel Programming, Chapter 3.
  • The EREW PRAM Prefix Sums algorithm is described e.g. in the Hillis/Steele paper on Data-parallel algorithms or in Wilkinson/Allen Parallel Programming 2e, Section 6.2.1.
    More about parallel prefix algorithms can be found e.g. in the book by JaJa, or in Chapter 2 of Jordan and Alaghband Fundamentals of Parallel Processing.

On parallel sorting (Lecture 7)

  • This material is completely covered in C. Kessler's compendium.
  • Chapter 10 of Wilkinson/Allen Parallel Programming 2e contains a presentation of several parallel sorting algorithms.

On parallel patterns and skeleton programming (Lecture 8)

  • Parallel patterns are partly covered in Chapter 1 of C. Kessler's compendium.
  • Information about programming in SkePU can be found on the SkePU web page.

On dependence analysis, loop transformations and parallelization (Lecture 13)

  • A future chapter (Chapter 5) in the compendium about this topic is currently in preparation; the first pages are now available for pre-view in the 2023 edition of the compendium. Until then, we refer for further details to a standard textbook such as: R. Allen, K. Kennedy: Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002.

Page responsible: Christoph W Kessler
Last updated: 2023-12-12