Professor of Computer Science at Linköping University, Sweden
Prof. Dr. Christoph Kessler
PELAB - Programming Environments Laboratory
Software and Systems Division
Department of Computer and Information Science (IDA)
S - 581 83
phone +46 13 28 2406
mobile +46 70 3666687
fax +46 13 28 58 99
email: Christoph.Kessler \at liu.se
URKUND-address: chrke55.liu \at analys.urkund.se
- Parallel computing
- Parallel programming models, languages, compilers, run-time systems,
tools, libraries, algorithms
- especially for heterogeneous multicore / manycore platforms such as Cell, GPU-based systems, NoC
- Composition of parallel programs from parallel components
- Optimized composition, autotuning
- Resource allocation, mapping, frequency scaling, and scheduling
of parallel computations to parallel systems
- Modeling of hardware and software properties for energy and
- Performance portability
- Compiler technology
- Code generation for instruction-level parallel and embedded processors
- especially, clustered VLIW DSP processors
- Optimization problems in code generation
- Program analysis and transformation, including
automatic and semiautomatic parallelization
List of publications
Current or recently completed projects:
EU FP7 project Execution Models for Energy-Efficient Computing Systems (EXCESS)
Sep. 2013-Aug. 2016.
- Application synthesis infrastructure, system modeling,
optimization techniques and autotuning
for holistic energy optimization
- Partly based on our work for PEPPHER
- Leading Workpackage 1 (Execution, Platform and Programming Models for Energy Optimization)
- SkePU: auto-tunable skeleton programming library for Multicore CPU and Multi-GPU systems
- MeterPU: generic, portable measurement abstraction library for Multicore CPU and Multi-GPU systems
- Global Composition Framework
EU ICT COST Action
IC1406 High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)
- On-Chip Pipelining:
Mapping and Scheduling Moldable Streaming Tasks on Many-Core Processors
We consider the combined problem of core allocation, mapping and discrete
voltage/frequency scaling for moldable (i.e., parallelizable)
streaming tasks on manycore processors, with the goal of minimizing
energy usage under a given throughput constraint.
This is partly based on our earlier work on on-chip pipelining.
- Crown Scheduling
- Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Many-Core Systems,
recent article accepted for ACM TACO, to be presented at the
SeRC-OpCoReS: Optimized Composition and Runtime Support for e-Science
Partially funded by the Swedish e-Science Research Center (SeRC),
Core section on Parallel and Distributed Algorithms and Tools, since 2011.
Generic parallel components (skeletons):
Auto-tunable skeleton programming library for GPU-based systems.
Skeleton programming library for Cell/B.E.
project (contract research).
This project realizes a CRCW PRAM on a chip. We are developing
high-level language and system support and a compiler backend
for the REPLICA architecture.
- PELAB research group on compiler technology and parallel computing
Integrated Code Generation for Instruction-Level Parallel Architectures
OPTIMIST: Optimization algorithms for integrated code generation
OPTIMIST is a retargetable, highly optimizing code generator
for superscalar, VLIW, clustered VLIW, DSP and embedded processor architectures.
To achieve high code quality,
it simultaneously considers the optimization problems for
instruction selection (including cluster assignment and
and register allocation.
Partially funded 2001-2007 by CENIIT
and 2004-2005 by SSF RISE.
- Integrated Software Pipelining
Optimal code generation for loops, integrating instruction selection,
scheduling and register allocation, including optimal spill code generation and scheduling,
for embedded, VLIW and clustered VLIW processors.
Funded 2006-2008 and 2010-2012
by Vetenskapsrådet (VR)
and 2006-2011 by the CUGS
DSP Platform for Emerging Telecommunication and Multimedia (ePUMA)
Optimizing DSP streaming applications for memory access cost
on a new reconfigurable chip multiprocessor.
WP3: Classification of memory access patterns in DSP applications;
program analysis for memory access structures, and
automatic selection of most suitable network configuration for
parallel memory access.
Funded 2008-2011 by SSF
PRT Pattern Recognition Tool
Generic tool for automated recognition of computational patterns in legacy C programs,
e.g. for pattern-based automatic parallelization.
On-chip pipelining of memory-intensive computations
on multi-/manycore processors (Cell/B.E. and Intel SCC)
Restructuring memory-intensive, streamable computations such as parallel mergesort
to use on-chip forwarding of intermediate data between Cell SPEs
reduces the overall volume of off-chip memory accesses,
which makes the application less memory-bound and results in faster computation.
We develop mapping algorithms that optimize trade-offs between computational load balance,
on-chip buffer requirements and on-chip communication volume in on-chip pipelining.
Applied to mergesort on Cell, this speeds up the dominating global
merge phase of CellSort by up to 70% on QS-20 and up to 143% on PlayStation-3,
see our paper at Euro-Par 2010.
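The trade-off between load balance and on-chip communication volume can be illustrated with a small sketch. It is not the mapping algorithm from the paper: it uses a plain greedy longest-processing-time heuristic, and it assumes a complete binary merge tree in which a merger at level d handles a 1/2^d share of the data stream. All function names are hypothetical and chosen for illustration only.

```python
import heapq

def merge_tree_tasks(depth):
    """Return {task_id: relative_load} for a complete binary merge tree.

    Task ids use heap numbering (root = 1, children of i are 2i and 2i+1).
    Assumption: a merger at level d handles a 1/2**d share of the stream.
    """
    return {i: 1.0 / (1 << (i.bit_length() - 1)) for i in range(1, 1 << depth)}

def greedy_map(tasks, cores):
    """Greedy heuristic: place the heaviest task on the least-loaded core."""
    heap = [(0.0, c) for c in range(cores)]
    heapq.heapify(heap)
    mapping = {}
    for task, load in sorted(tasks.items(), key=lambda kv: (-kv[1], kv[0])):
        core_load, core = heapq.heappop(heap)
        mapping[task] = core
        heapq.heappush(heap, (core_load + load, core))
    return mapping

def core_loads(tasks, mapping, cores):
    """Total relative load placed on each core."""
    loads = [0.0] * cores
    for task, core in mapping.items():
        loads[core] += tasks[task]
    return loads

def cross_core_volume(tasks, mapping):
    """Sum of stream shares forwarded between mergers on different cores."""
    return sum(load for t, load in tasks.items()
               if t > 1 and mapping[t] != mapping[t // 2])

tasks = merge_tree_tasks(3)          # 7 mergers on 3 levels
mapping = greedy_map(tasks, 3)       # map onto 3 cores
print(core_loads(tasks, mapping, 3), cross_core_volume(tasks, mapping))
```

The greedy heuristic only looks at load, so it may place a merger far from its parent and increase the forwarded volume. A mapping that also accounts for buffer space and communication would have to weigh these objectives against each other.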
Fork95 Language Definition and Compiler
for the SB-PRAM, a scalable, massively parallel shared memory MIMD computer
with uniform memory access time that works synchronously at the instruction level.
The complete project is described in my recent
The compiler and tools developed for the SB-PRAM are now used in programming
teaching parallel algorithms.
A tool for automatic detection of sparse matrix computations and data structures
in application programs by static and dynamic pattern matching techniques,
which can be used for automatic parallelization and aggressive program transformations.
(The successor of the former
project at Saarbrücken.)
Funded 1997-2000 by Deutsche Forschungsgemeinschaft (DFG)
Design and implementation of a MIMD parallel global address space (PGAS) language
based on the BSP (bulk-synchronous parallel) programming model,
supporting shared variables and nested parallelism
on top of message passing architectures.
NestStep provides deadlock-free, deterministic parallel execution with
BSP-compliant synchronicity and memory consistency.
NestStep has been implemented for MPI clusters and for the
heterogeneous multicore processor Cell/B.E.
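The superstep discipline behind this can be sketched in a few lines. The sketch is not NestStep code and does not use its API: Python threads stand in for the processors, and the combine rule (summing all contributions at the superstep boundary) is an assumption made for illustration. Within a superstep every worker reads the same value of the shared variable; updates become visible only after the barrier.

```python
import threading

def run_bsp(num_procs, num_steps, initial=0):
    """Run supersteps over one shared variable; return its value after each step."""
    shared = {"x": initial}        # value visible during the current superstep
    pending = [0] * num_procs      # per-worker contributions for this superstep
    history = []

    def combine():
        # Executed by exactly one thread once all workers reached the barrier.
        shared["x"] += sum(pending)
        history.append(shared["x"])

    barrier = threading.Barrier(num_procs, action=combine)

    def worker(rank):
        for _ in range(num_steps):
            local_view = shared["x"]           # same snapshot for all workers
            pending[rank] = local_view + rank  # local computation only
            barrier.wait()                     # superstep boundary: combine, then go on

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return history

print(run_bsp(4, 2, initial=1))
```

Because shared state changes only inside the barrier action, the result does not depend on thread interleaving, which mirrors the deterministic execution described above.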
Interactive Invasive Parallelization
User-guided composition of parallel software with an incremental
aspect-oriented parallelization approach.
Covers both automatic parallelization,
skeleton-based structured parallel programming and semiautomatic
Support for automatic roundtrip engineering in aspect weaving.
Part of the RISE project
funded 2002-2005 and 2006-2007
Some recent / upcoming events:
Dagstuhl research seminar 10191 on
Program Composition and Optimization:
Autotuning, Scheduling, Metaprogramming and Beyond, May 9-12, 2010
MCC-2011 Fourth Swedish Multicore Computing Workshop, Linköping, Sweden, 23-25 Nov 2011
International Workshop on Multi-/Many-Core Computing Systems, in connection with PACT'13, Edinburgh, Sep. 7, 2013
- MCC-2014 workshop
- HiPEAC-2015 conference
- NPC-2015 conference
Undergraduate and master-level courses
I also give guest lectures in
TDDD93 Large-Scale Distributed Systems and Networks (since 2015).
List of all courses ever given
Master thesis projects
Multicore Lab (since 2012)
IEEE Computer Society
- TCSC Scalable Computing
HiPEAC European Network of Excellence on High Performance and Embedded Architecture and Compilation
EAPLS European Association
for Programming Languages and Systems
GI Gesellschaft für Informatik
- GI/ITG-Fachgruppe PARS Parallel-Algorithmen, -rechnerstrukturen und -systemsoftware
- GI-Fachgruppe 2.1.4 Programmiersprachen und Rechenkonzepte
- GI-Arbeitskreis Software Engineering für parallele Systeme (SEPARS)
VDI Verein Deutscher Ingenieure
The Swedish Multicore Initiative
SeRC Swedish E-Science Research Center
Christoph Kessler (chrke \at ida.liu.se)