Recommended for
Graduate (CUGS, ECSEL, ...) students interested in the areas of parallel computer
architecture, parallel programming,
compiler construction, or algorithms and complexity.
The course is hosted by the
CUGS graduate school
as an Advanced Course.
Interested undergraduate students are also welcome.
Registration / List of participants (internal access only)
Organization
Lectures (ca. 32h),
programming exercises, optional
theoretical exercises for self-assessment, programming lab assignment,
student presentations.
The course will be held as block course with two intensive weeks
in Linköping.
The course was last given
2003.
This course complements the existing graduate course FDA101 on
Parallel Programming (4p), which is aliased to the undergraduate course
TDDC78.
Goals
The course emphasizes fundamental aspects of parallel programming such as
parallel architectures and programming models, performance models, parallel
complexity classes, parallel algorithmic paradigms, parallelization strategies, and
the design and implementation of parallel programming languages. Practical
exercises help to apply the theoretical concepts of the course to solve concrete
problems in different parallel programming models.
Prerequisites
Data structures and algorithms (e.g. TDDB57) are absolutely required; knowledge
in complexity theory and compiler construction is useful. Processprogramming
(e.g. TDDB63/68/72) and Parallel Programming (e.g. FDA101/TDDC78 or TANA77) are
recommended.
Programming in C is necessary for the practical exercises.
Contents/Schedule
Date | Time | Location | Topic (with links to slides (PS/PDF) where available) | Lecturer |
Week 9: | ||||
Monday Feb 26, 2007 | 9-12 | Knuth | Introduction. I. Parallel computer architecture concepts (*). PDF (2007) |
C. Kessler |
" | 13-16 | Knuth |
II. Basic theory of parallel computation PRAM model. Time, work, cost. NC. Speedup and Amdahl's Law (*), Self-simulation and Brent's Theorem, Scalability and Gustafssons Law (*). Fundamental PRAM algorithms (reduction, parallel prefix, list ranking). PRAM variants, simulation results and separation theorems. PDF (2007) |
C. Kessler |
Tuesday Feb 27, 2007 | 9-12 | Knuth |
III. Lesson 1: An introduction to PRAM programming in Fork. Fork language principles. Using the compiler, PRAM simulator and system software. PDF (2007) |
C. Kessler |
Tuesday Feb 27, 2007 | 13-15:30 | Knuth |
IV. Parallel algorithmic paradigms and programming techniques, with example
problems. Data parallelism. Parallel divide-and-conquer, recursive doubling. Accelerated cascading. Synchronization mechanisms in asynchronous parallel data structures: parallel FIFO queue. Pipelining. Domain decomposition and irregular parallelism. (PDF (2007)) |
C. Kessler |
" | 15:40-16 | " | Load balancing in irregular parallel divide-and-conquer computations. (PDF (2007)) | M. Eriksson |
Wednesday Feb 28, 2007 | 9-12 | Knuth | V. Message passing models: Delay model, classical BSP model of Valiant, BSP-model of McColl, LogP, LogGP. (PDF) | W. Löwe |
" | 13-14 | Knuth | V. (cont.) BSP - LogP simulation (PDF) | W. Löwe |
" | 14-16 | Knuth | VI. Distributed shared memory: realizations in CC-NUMA architectures and software DSM emulations. Memory consistency models. (PDF (2007)) | C. Kessler |
Thursday March 1, 2007 | 10-12 | von Neumann |
VIII. Lesson 2: Message passing programming in MPI. Recapitulation from TDDC78 and examples. (PDF) |
M. Chalabine |
Thursday March 1, 2007 | 13-16 | von Neumann |
VII. Concepts of parallel programming languages and environments Case study: Cilk (PDF (2007)) Survey of parallel language concepts (PDF (2007)) with an introduction to transactional memory |
C. Kessler |
Week 10: | ||||
Monday March 5, 2007 | 9-11 | Knuth |
X. Realization of PRAMs PRAM emulations on distributed-memory architectures: Hashed address space, Pipelining and Multithreading. Ranade's Fluent Machine and its cost-effective massively parallel realization in hardware. Low-level synchronization mechanisms. PDF |
C. Kessler |
" | 14:15-15 | Alan Turing | Performance issues in emulated shared memory computing (also SaS division seminar) (PDF) |
M. Forsell, VTT, guest lecturer |
" | 15:30-17 | Donald Knuth |
Implementing PRAM on a chip Abstract: In this lecture we will give a short overview of benefits and problems of single chip designs in PRAM implementation, describe a single chip sparse mesh based EREW PRAM architecture as well as extensions of it realizing the CRCW PRAM and arbitrary ordered multioperation CRCW PRAM models. Some simulation results are provided. (PDF) |
M. Forsell, VTT, guest lecturer |
Tuesday March 6, 2007 | 14-16 | Knuth |
IX. Generic parallel programming with skeletons. Concepts and skeleton-based programming environments. (PS) |
M. Chalabine |
Tuesday March 6, 2007 | 16-16:30 | Knuth | Distribution of papers for student presentations. | C. Kessler |
Wednesday March 7, 2007 | 9-11 | Knuth | XI. Compiling for parallel computers: Dependence analysis, loop transformations, loop parallelization, idiom recognition. (PDF) | W. Löwe |
11-12 | Run-time parallelization, thread-level speculation. (PDF) | C. Kessler | ||
Wednesday March 7, 2007 | 13-15 | Knuth | XII. Optimization of data layout. (PDF) | W. Löwe |
Thursday Feb 8, 2007 | 09-12 | von Neumann | XIII. Clustering and scheduling of processes (PDF) | W. Löwe |
Thursday Feb 8, 2007 | 13-14 | von Neumann | XIII. Clustering and scheduling of processes (cont.) | W. Löwe |
Thursday | 14-14:30 | " | Wrap-up. Discussion. Self-evaluation. | W. Löwe M. Chalabine |
Friday March 16, 2007 | 9-17? | Knuth |
Student presentation session Preliminary schedule of presentations (original list of papers) | C. Kessler |
Friday March 30, 2007 | 14:00-18:00 | von Neumann | Exam | |
Friday 25/5, 2007 | 14:00-18:00 | Donald Knuth | Exam (second opportunity) | C. Kessler |
Papers for Student Presentations
Here is a preliminary list of papers for the student presentations.
Labs
Exercises
Literature
More references
Teachers
Christoph Kessler, Linköpings universitet (course leader, lecturer, examiner)
Welf Löwe, Univ. Växjö (lecturer, examiner)
Martti Forsell, VTT - Technical Research Center of Finland (guest lecturer)
Mikhail Chalabine, Linköpings universitet (lesson)
Examination
TEN1:
Written exam
(Welf, Christoph)
3p. Date: Friday 30/3/2007, 14-18, room
John von Neumann.
Re-exam: Friday 25/5/2007, 14-18, room Donald Knuth.
UPG1:
Programming exercise, Fork (lab report) 1p. Deadline:
31/03/2007.
UPG2:
Paper presentation, opposition, and written summary (1-2 pages) of the presented
paper, 1p.
Deadline for the written summary: 31/03/2007.
Old exams
Credit
5 credits if all examination moments are fulfilled.
Partial credits can be given, too.
Comments
There is a partial overlap in contents with FDA101 Parallel Programming /
TDDC78. These topics are part of the lectures to make the course
self-contained but will not be taken up in the exam (TEN1).
Accordingly, some of the lectures are optional for
those students who already took FDA101;
these are marked by an asterisk (*) in the Contents section above.
A main difference to FDA101 / TDDC78 / TANA77 is not only in depth but also in scope: While FDA101 emphasizes scientific computing, numerical applications, and tools, this course focuses on theory and non-numerical algorithms.