Multicore Computing
DF21500, HT2013
Course plan
Lectures
Ca. 32h, usually given in block format over two intensive weeks in February and early March; next time in 2013.
Recommended for
Graduate (CUGS, CIS, ISY, ...) students interested in the areas of parallel computer architecture, parallel programming, general-purpose GPU programming, software engineering, optimization, compiler construction, or algorithms and complexity.
The course was last given
in VT2011.
Goals
The course emphasizes fundamental aspects of shared-memory parallel programming and accelerator (GPU) programming, such as shared-memory parallel architecture concepts, programming models, performance models, parallel algorithmic paradigms, parallelization techniques and strategies, scheduling algorithms, optimization, composition of parallel programs, and concepts of modern parallel programming languages and systems. Practical exercises help students apply the theoretical concepts of the course to concrete problems on a real-world multicore system.
Prerequisites
Data structures and algorithms are absolutely required; some knowledge of
complexity theory and compiler construction is useful. Some basic knowledge of
computer architecture is assumed. Basic courses in concurrent programming
(e.g. TDDB68) and parallel programming (e.g. TDDC78 or TANA77) are recommended.
Programming in C and some familiarity with Linux (or a similar OS) are necessary
for the practical exercises.
Contents
(some advanced topics may be added or changed depending on the availability of
guest lecturers)
I. Architecture
* Multicore architecture issues (incl. SMT, SMP, CC-NUMA, NCC-NUMA)
* Short review: cache locality and the memory hierarchy (see the sketch after this list)
* Shared memory emulation and consistency issues
* Heterogeneous multicores
* GPU computing
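For illustration, a minimal C sketch of the cache-locality effect reviewed above (an illustrative example; the matrix size N is chosen arbitrarily): row-major traversal of a C matrix has unit stride and reuses cache lines, whereas column-major traversal of the same data strides by N doubles and typically misses the cache far more often.

    /* Illustrative sketch (not from the course material):
       memory-hierarchy effects of loop order. */
    #include <stdio.h>
    #define N 2048
    static double m[N][N];

    double sum_rowmajor(void) {   /* cache-friendly: unit stride */
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }

    double sum_colmajor(void) {   /* cache-hostile: stride of N doubles */
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    int main(void) {
        printf("%f %f\n", sum_rowmajor(), sum_colmajor());
        return 0;
    }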
II. Languages and environments
* pthreads (see the sketch after this list)
* Cilk
* UPC
* OpenMP 3.0
* New HPC languages: X10, Chapel, Fortress
* Stream processing and GPU languages: Cg, Brook, CUDA, OpenCL
* Offload C++
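To give a flavor of the shared-memory programming style in this part, a minimal pthreads sketch (illustrative only; all names are chosen for the example): each thread sums a disjoint block of an array into its own result slot, so no locking is needed.

    /* Illustrative sketch: parallel array sum with POSIX threads.
       Compile with: gcc -O2 sum.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 1000000            /* divisible by NTHREADS */

    static double a[N];
    static double partial[NTHREADS];

    static void *sum_block(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += a[i];
        partial[id] = s;         /* private slot per thread: no lock needed */
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++) a[i] = 1.0;
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, sum_block, (void *)i);
        double sum = 0.0;
        for (long i = 0; i < NTHREADS; i++) {
            pthread_join(t[i], NULL);   /* wait, then combine results */
            sum += partial[i];
        }
        printf("sum = %.0f\n", sum);    /* expected: 1000000 */
        return 0;
    }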
III. Parallelization techniques
* Design patterns for concurrency / synchronization
* Dependence analysis
* Automatic parallelization
* Runtime parallelization and speculative parallelization
* Lock-free synchronization (see the sketch after this list)
* Transactional programming
* Task scheduling and clustering
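As an illustration of lock-free synchronization, a minimal compare-and-swap retry loop (a sketch, assuming a C11 compiler with <stdatomic.h>; the thread count and iteration count are arbitrary):

    /* Illustrative sketch: lock-free counter using a CAS retry loop. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int counter;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            int old = atomic_load(&counter);
            /* Retry until no other thread modified counter between our
               load and our update; on failure, 'old' is reloaded. */
            while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
                ;
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %d (expected 400000)\n", atomic_load(&counter));
        return 0;
    }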
IV. Optimizations
* Feedback-directed optimization
* Task mapping and on-chip pipelining
* Skeleton-based parallel programming (see the sketch after this list)
* Optimized composition of parallel programs from parallel components
* Scheduling malleable task graphs
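Finally, a tiny sketch of the skeleton idea: a generic map separates the computation pattern from the user function, so a parallel backend (e.g. OpenMP or a GPU) could replace the sequential loop without changing user code. All names here (map_skel, unary_fn, square) are invented for the example.

    /* Illustrative sketch: a 'map' skeleton with a sequential backend. */
    #include <stddef.h>
    #include <stdio.h>

    typedef double (*unary_fn)(double);

    /* Apply f elementwise; only this backend would change for a
       parallel implementation. */
    static void map_skel(unary_fn f, const double *in, double *out, size_t n) {
        for (size_t i = 0; i < n; i++)
            out[i] = f(in[i]);
    }

    static double square(double x) { return x * x; }

    int main(void) {
        double in[] = {1, 2, 3, 4, 5}, out[5];
        map_skel(square, in, out, 5);
        for (int i = 0; i < 5; i++)
            printf("%g ", out[i]);      /* prints: 1 4 9 16 25 */
        printf("\n");
        return 0;
    }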
Organization
Lectures (ca. 32h), programming exercises, optional theoretical exercises for
self-assessment, programming lab assignment, student presentations.
The lecture series of the course will be held in block format, over two
intensive weeks in Linköping.
Literature
To be announced on the course homepage.
Lecturers
Christoph Kessler, Linköpings universitet
Welf Löwe, Univ. Växjö, and further guest lecturers
Examiner
Christoph Kessler, Linköpings universitet
Examination
TEN1: Written exam, 3.5p, mandatory.
UPG3: Programming exercise (lab report), 3p, optional.
UPG2: Paper presentation, opposition, and written summary (1-2 pages) of the
presented paper, 1.5p, optional.
Credit
8p if all examination moments are fulfilled.
Partial credit can also be given, but the exam must be passed.
Admission to the exam requires attending at least 50% of the lectures and
lessons.
Organized by
CUGS
Comments
Note: new weights of the examination moments, in total now 8p (earlier 7.5p);
this might require a new course code.
Lecture and lab contents mostly overlap with TDDD56 Multicore and GPU
Programming.
Page responsible: Anne Moe