FDA125
Advanced Parallel Programming: Models, Languages, Algorithms (5p)

VT1/2007

Recommended for
Graduate (CUGS, ECSEL, ...) students interested in the areas of parallel computer architecture, parallel programming, compiler construction, or algorithms and complexity.
The course is hosted by the CUGS graduate school as an Advanced Course.
Interested undergraduate students are also welcome.

Registration / List of participants (internal access only)

Organization
Lectures (ca. 32h), programming exercises, optional theoretical exercises for self-assessment, programming lab assignment, student presentations.
The course will be held as block course with two intensive weeks in Linköping.

The course was last given
2003.
This course complements the existing graduate course FDA101 on Parallel Programming (4p), which is aliased to the undergraduate course TDDC78.

Goals
The course emphasizes fundamental aspects of parallel programming such as parallel architectures and programming models, performance models, parallel complexity classes, parallel algorithmic paradigms, parallelization strategies, and the design and implementation of parallel programming languages. Practical exercises help to apply the theoretical concepts of the course to solve concrete problems in different parallel programming models.

Prerequisites
Data structures and algorithms (e.g. TDDB57) are absolutely required; knowledge in complexity theory and compiler construction is useful. Processprogramming (e.g. TDDB63/68/72) and Parallel Programming (e.g. FDA101/TDDC78 or TANA77) are recommended.
Programming in C is necessary for the practical exercises.

Contents/Schedule

Date Time Location Topic (with links to slides (PS/PDF) where available) Lecturer
Week 9:
Monday
Feb 26, 2007
9-12 Knuth Introduction.
I. Parallel computer architecture concepts (*). PDF (2007)
C. Kessler
" 13-16 Knuth II. Basic theory of parallel computation
PRAM model. Time, work, cost. NC. Speedup and Amdahl's Law (*), Self-simulation and Brent's Theorem, Scalability and Gustafssons Law (*). Fundamental PRAM algorithms (reduction, parallel prefix, list ranking). PRAM variants, simulation results and separation theorems. PDF (2007)
C. Kessler
Tuesday
Feb 27, 2007
9-12 Knuth III. Lesson 1: An introduction to PRAM programming in Fork.
Fork language principles. Using the compiler, PRAM simulator and system software. PDF (2007)
C. Kessler
Tuesday
Feb 27, 2007
13-15:30 Knuth IV. Parallel algorithmic paradigms and programming techniques, with example problems.
Data parallelism. Parallel divide-and-conquer, recursive doubling. Accelerated cascading. Synchronization mechanisms in asynchronous parallel data structures: parallel FIFO queue. Pipelining. Domain decomposition and irregular parallelism. (PDF (2007))
C. Kessler
"15:40-16" Load balancing in irregular parallel divide-and-conquer computations. (PDF (2007)) M. Eriksson
Wednesday
Feb 28, 2007
9-12 Knuth V. Message passing models: Delay model, classical BSP model of Valiant, BSP-model of McColl, LogP, LogGP. (PDF) W. Löwe
" 13-14 Knuth V. (cont.) BSP - LogP simulation (PDF) W. Löwe
" 14-16 Knuth VI. Distributed shared memory: realizations in CC-NUMA architectures and software DSM emulations. Memory consistency models. (PDF (2007)) C. Kessler
Thursday
March 1, 2007
10-12 von Neumann VIII. Lesson 2: Message passing programming in MPI.
Recapitulation from TDDC78 and examples. (PDF)
M. Chalabine
Thursday
March 1, 2007
13-16 von Neumann VII. Concepts of parallel programming languages and environments
Case study: Cilk (PDF (2007))
Survey of parallel language concepts (PDF (2007))
with an introduction to transactional memory
C. Kessler
Week 10:
Monday
March 5, 2007
9-11 Knuth X. Realization of PRAMs
PRAM emulations on distributed-memory architectures: Hashed address space, Pipelining and Multithreading. Ranade's Fluent Machine and its cost-effective massively parallel realization in hardware. Low-level synchronization mechanisms. PDF
C. Kessler
"14:15-15 Alan Turing Performance issues in emulated shared memory computing
(also SaS division seminar) (PDF)
M. Forsell, VTT, guest lecturer
"15:30-17 Donald Knuth Implementing PRAM on a chip
Abstract: In this lecture we will give a short overview of benefits and problems of single chip designs in PRAM implementation, describe a single chip sparse mesh based EREW PRAM architecture as well as extensions of it realizing the CRCW PRAM and arbitrary ordered multioperation CRCW PRAM models. Some simulation results are provided. (PDF)
M. Forsell, VTT, guest lecturer
Tuesday
March 6, 2007
14-16 Knuth IX. Generic parallel programming with skeletons.
Concepts and skeleton-based programming environments. (PS)
M. Chalabine
Tuesday
March 6, 2007
16-16:30 Knuth Distribution of papers for student presentations. C. Kessler
Wednesday
March 7, 2007
9-11 Knuth XI. Compiling for parallel computers: Dependence analysis, loop transformations, loop parallelization, idiom recognition. (PDF) W. Löwe
11-12 Run-time parallelization, thread-level speculation. (PDF) C. Kessler
Wednesday
March 7, 2007
13-15 Knuth XII. Optimization of data layout. (PDF) W. Löwe
Thursday
Feb 8, 2007
09-12 von Neumann XIII. Clustering and scheduling of processes (PDF) W. Löwe
Thursday
Feb 8, 2007
13-14 von Neumann XIII. Clustering and scheduling of processes (cont.) W. Löwe
Thursday14-14:30 " Wrap-up. Discussion. Self-evaluation. W. Löwe
M. Chalabine
Friday
March 16, 2007
9-17?Knuth Student presentation session
Preliminary schedule of presentations
(original list of papers)
C. Kessler
Friday
March 30, 2007
14:00-18:00 von NeumannExam
Friday
25/5, 2007
14:00-18:00 Donald KnuthExam (second opportunity) C. Kessler
Remark: Lectures begin c.t. (cum tempore, i.e. at xx:15) unless otherwise stated.

Papers for Student Presentations

Here is a preliminary list of papers for the student presentations.

Labs

For doing the Fork lab you need (remote) access to a SUN computer running Solaris. Use either any IDA SUN server or my own machine mir28.ida.liu.se.
OBS:
If you install Fork locally: The Makefile used to build the compiler and its documentation uses dvips to create postscript files. However, IDAs version of dvips automatically sends the ps file to the printer, which is not intended. In my own configuration I locally redefined dvips to match this behaviour. Another simple workaround is to comment out these lines in the Makefile or to use the IDA version's flags -o to specify the output file name. - C.K.

Exercises

  1. Exercises for Topic I (Parallel Computer Architecture
  2. Exercises for Topic II (PRAM Model and Theory)
  3. Exercises for Topic III (Programming in Fork)
  4. Exercises for Topic IV (Parallel Algoritmic Techniques)
  5. Exercises for Topic VI (Distributed Shared Memory)
  6. Exercises for Topic VII (Cilk)
  7. Exercises for Topic VIII (MPI)
  8. Exercises for Topic IX (Skeleton programming)
  9. Exercises for Topic X (PRAM emulation)
These exercises are for self-assessment. Printed copies of the current exercises will be distributed at each lecture. Students are encouraged to meet and solve exercises together in small, self-organizing working groups after the lectures (the lecture room is reserved for the whole day). There is no submission of solutions and no correction.

Literature

More references

Teachers
Christoph Kessler, Linköpings universitet (course leader, lecturer, examiner)
Welf Löwe, Univ. Växjö (lecturer, examiner)
Martti Forsell, VTT - Technical Research Center of Finland (guest lecturer)
Mikhail Chalabine, Linköpings universitet (lesson)

Examination
TEN1: Written exam (Welf, Christoph) 3p. Date: Friday 30/3/2007, 14-18, room John von Neumann.
Re-exam: Friday 25/5/2007, 14-18, room Donald Knuth.
UPG1: Programming exercise, Fork (lab report) 1p. Deadline: 31/03/2007.
UPG2: Paper presentation, opposition, and written summary (1-2 pages) of the presented paper, 1p.
Deadline for the written summary: 31/03/2007.

Old exams

Credit
5 credits if all examination moments are fulfilled.
Partial credits can be given, too.

Comments
There is a partial overlap in contents with FDA101 Parallel Programming / TDDC78. These topics are part of the lectures to make the course self-contained but will not be taken up in the exam (TEN1). Accordingly, some of the lectures are optional for those students who already took FDA101; these are marked by an asterisk (*) in the Contents section above.

A main difference to FDA101 / TDDC78 / TANA77 is not only in depth but also in scope: While FDA101 emphasizes scientific computing, numerical applications, and tools, this course focuses on theory and non-numerical algorithms.


This page is maintained by Christoph Kessler (chrke \at ida.liu.se)