Multicore computingDF21500, 2009VT
|
|
Course plan
Lectures
Ca. 32h, given in block format in 2 intensive weeks in February and early March 2009.
Recommended for
Graduate (CUGS, ECSEL, ...) students interested in the areas of parallel computer architecture, parallel programming, software engineering, optimization, compiler construction, or algorithms and complexity.
The course was last given
This is a new course.
It partly overlaps (1p = 1.5hp) with FDA125 Advanced Parallel Programming,
which was last given 2007.
This course also complements the existing graduate course FDA101 on Parallel
Programming (4p), which is aliased to the undergraduate course TDDC78.
Goals
The course emphasizes fundamental aspects of shared-memory parallel programming such as shared memory parallel architecture concepts, programming models, performance models, parallel algorithmic paradigms, parallelization techniques and strategies, scheduling algorithms, optimization, composition of parallel programs, and concepts of modern parallel programming languages and systems. Practical exercises help to apply the theoretical concepts of the course to solve concrete problems in a real-world multicore system.
Prerequisites
Data structures and algorithms (e.g. TDDB57) are absolutely required; knowledge
in complexity theory and compiler construction is useful. Some basic knowledge
of computer architecture is assumed. A basic course in concurrent programming
(e.g. TDDB63/68/72) and parallel programming (e.g. FDA101/TDDC78 or TANA77) are
recommended.
Programming in C and some familiarity with Linux (or similar OS) is necessary
for the practical exercises.
Contents
I. Architecture
* Multicore architecture issues (incl SMT, SMP, CC-NUMA)
* Cache locality and memory hierarchy
* Shared memory emulation and consistency issues
* Reconfigurable parallel systems, FPGA, accelerators
* PRAM-on-chip (?)
* GPU computing (?)
II. Languages and environments
* Cilk
* UPC
* OpenMP 3.0
* New HPC languages: X10, Chapel, Fortress
* Stream processing and GPU languages: Cg, Brook, Cuda, YAPI (?)
III. Parallelization techniques
* Design patterns for concurrency / synchronization
* Dependence analysis
* Automatic parallelization
* Runtime parallelization and speculative parallelization
* Lock-free synchronization
* Transactional programming
* Task scheduling and clustering
IV. Optimizations
* Feedback directed optimization
* Hybrid parallelism OpenMP/MPI
* Skeleton based parallel programming
* Composition of parallel programs from parallel components
* Scheduling malleable task graphs
V. Selected algorithms and applications
* TBA
Organization
Lectures (ca. 32h), programming exercises, optional theoretical exercises for
self-assessment, programming lab assignment, student presentations.
The lecture series of the course will be held in block format with two
intensive weeks in Linköping.
Literature
To be announced on the course homepage.
Lecturers
Christoph Kessler, Linköpings universitet
Welf Löwe, Univ. Växjö
Examiner
Christoph Kessler, Linköpings universitet
Welf Löwe, Univ. Växjö
Examination
TEN1: Written exam (Welf, Christoph) 3p = 4.5hp. Due to partial overlap, 2p =
3hp for those who already have passed the exam in FDA125 Advanced Parallel
Programming earlier.
UPG1: Programming exercise (lab report) 1p = 1.5hp.
UPG2: Paper presentation, opposition, and written summary (1-2 pages) of the
presented paper, 1p = 1.5hp.
Credit
5p = 7.5hp if all examination moments are fulfilled.
Partial credits can be given, too.
Organized by
CUGS
Comments
There is a partial overlap in contents with FDA125 Advanced Parallel
Programming, which corresponds to 1p = 1.5hp in the written exam.
In contrast to FDA125, we focus in this course on shared memory systems and add
more material on software engineering techniques for parallel systems, such as
design patterns for synchronization and optimized composition of parallel
components. We also give more emphasis to parallelization techniques that are
especially useful for multicore systems, such as speculative parallelization,
transactional programming and lock-free synchronization. Scheduling for shared
memory systems will be considered in more detail, in particular scheduling
algorithms for malleable task graphs.
Page responsible: Director of Graduate Studies