Former Projects:
-
Execution Models for Energy-Efficient Computing Systems (EXCESS)
EU FP7 project, Sep. 2013-Aug. 2016.
- Language and tool infrastructure for energy-aware application synthesis,
system modeling, performance and energy modeling,
optimization techniques and autotuning
for holistic energy optimization for heterogeneous multicore systems
- Partly based on our
previous work in the FP7 project PEPPHER
- Leading Workpackage 1 (Execution, Platform and Programming Models for Energy Optimization)
- SkePU: auto-tunable skeleton programming library for Multicore CPU and Multi-GPU systems
- MeterPU: generic, portable measurement abstraction library for Multicore CPU and Multi-GPU systems
- Global Composition Framework
- XPDL extensible platform description language
- Automated performance modeling for guiding automatic selection in multi-variant computations
- Short overview of our contributions to EXCESS (Proc. EXCESS workshop Gothenburg, Sweden, Aug. 2016)
-
Performance Portability and Programmability for Heterogeneous
Many-core Architectures (PEPPHER)
EU FP7 project, Jan. 2010 - Dec. 2012.
- Skeleton and Pattern Based Programming Environments
- BlockLib:
Skeleton programming library for Cell/B.E.
-
PRT Pattern Recognition Tool:
Generic tool for automated recognition of computational patterns in legacy C programs,
e.g. for pattern-based automatic parallelization.
-
SeRC-OpCoReS: Optimized Composition and Runtime Support for e-Science, 2011-2018,
Swedish e-Science Research Center (SeRC),
core section on
Parallel and Distributed Algorithms and Tools (2011-2015) and
Parallel Software and Data Engineering (2016-2018).
- Integrated Code Generation for Instruction-Level Parallel Architectures
-
OPTIMIST: Optimization algorithms for integrated code generation
OPTIMIST is a retargetable, highly optimizing code generator
for superscalar, VLIW, clustered VLIW, DSP and embedded processor architectures.
To achieve high code quality,
it simultaneously considers the optimization problems for
instruction selection (including cluster assignment and
resource allocation),
instruction scheduling,
and register allocation.
Partially funded 2001-2007 by CENIIT
and 2004-2005 by SSF RISE.
- Integrated Software Pipelining
Optimal code generation for loops, integrating instruction selection,
cluster assignment,
scheduling and register allocation including optimal spill code generation and scheduling,
for embedded, VLIW and clustered VLIW processors.
Funded 2006-2008 and 2010-2012
by Vetenskapsrådet (VR)
and 2006-2011 by the CUGS
graduate school.
-
REPLICA
project (contract research).
This VTT project developed a reconfigurable shared memory chip multiprocessor supporting strong memory consistency
(CRCW PRAM on a chip). We developed a
high-level parallel programming language, a compiler backend and system support
for the REPLICA architecture.
-
DSP Platform for Emerging Telecommunication and Multimedia (ePUMA)
Optimizing DSP streaming applications for memory access cost
on a new reconfigurable chip multiprocessor.
WP3: Classification of memory access patterns in DSP applications;
program analysis for memory access structures, and
automatic selection of most suitable network configuration for
parallel memory access.
Funded 2008-2011 by SSF.
-
On-chip pipelining of memory-intensive computations
on multi-/manycore processors (Cell/B.E. and Intel SCC)
Restructuring memory-intensive, streamable computations such as parallel mergesort
to use on-chip forwarding of intermediate data between Cell SPEs
reduces the overall volume of off-chip memory accesses,
making the application less memory bound and resulting in faster computation.
We developed mapping algorithms that optimize trade-offs between computational load balance,
on-chip buffer requirements and on-chip communication volume in on-chip pipelining.
Applied to mergesort on Cell, this speeds up the dominating global
merge phase of CellSort by up to 70% on QS-20 and up to 143% on PlayStation-3,
see our paper at Euro-Par 2010.
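
As a rough illustration of the pipelining idea only (this is not the Cell implementation), the sketch below chains lazy merge stages into a binary tree, so that each merged element is forwarded directly to the next stage instead of being written to an intermediate array. Python generators stand in for on-chip forwarding between SPEs; the names `merge_stage` and `pipelined_merge` are hypothetical, and the mapping and buffer-size optimization described above is not modeled.

```python
import heapq
from typing import Iterable, Iterator, List


def merge_stage(left: Iterable[int], right: Iterable[int]) -> Iterator[int]:
    """One merger task: consume two sorted streams, emit one sorted stream.

    heapq.merge is lazy, so each element is handed on as soon as it is
    produced rather than being stored in an intermediate buffer.
    """
    return heapq.merge(left, right)


def pipelined_merge(runs: List[List[int]]) -> Iterator[int]:
    """Connect merge stages into a binary tree over already-sorted runs."""
    streams: List[Iterable[int]] = [iter(r) for r in runs]
    while len(streams) > 1:
        next_level: List[Iterable[int]] = []
        # Pair up neighbouring streams; each pair feeds one merge stage.
        for i in range(0, len(streams) - 1, 2):
            next_level.append(merge_stage(streams[i], streams[i + 1]))
        # An unpaired last stream is passed through to the next level.
        if len(streams) % 2 == 1:
            next_level.append(streams[-1])
        streams = next_level
    return iter(streams[0]) if streams else iter(())


runs = [[1, 5, 9], [2, 6], [0, 7, 8], [3, 4]]
result = list(pipelined_merge(runs))
```

Only the root of the tree produces data that has to leave the pipeline; in the on-chip setting this corresponds to the single stream written back to off-chip memory.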
-
Fork:
Fork95 Language Definition and Compiler
for the
SB-PRAM,
a scalable, massively parallel shared memory MIMD computer
with uniform memory access time that works synchronously at the instruction level.
The complete project is described in my recent
book.
The compiler and tools developed for the SB-PRAM are now used in programming
labs for
teaching parallel algorithms.
-
SPARAMAT
A tool for automatic detection of sparse matrix computations and data structures
in application programs by static and dynamic pattern matching techniques,
which can be used for automatic parallelization and aggressive program transformations.
(The successor of the former
PARAMAT
project at Saarbrücken.)
Funded 1997-2000 by Deutsche Forschungsgemeinschaft (DFG).
-
NestStep
Design and implementation of a MIMD parallel global address space (PGAS) language
based on the BSP (bulk-synchronous parallel) programming model,
supporting shared variables and nested parallelism
on top of message passing architectures.
NestStep provides deadlock-free, deterministic parallel execution with
BSP-compliant synchronicity and memory consistency.
NestStep has been implemented for MPI clusters and for the
heterogeneous multicore processor Cell/B.E.
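
To make the BSP-style consistency concrete, here is a minimal sequential simulation of supersteps in Python. It is not NestStep syntax or its runtime: the function `superstep`, the dictionary of shared variables, and the choice of addition as the combine rule are all assumptions made for illustration. What it shows is the general principle that every process reads the same snapshot during a superstep, and that updates become visible only at the barrier, combined in a fixed order.

```python
from typing import Callable, Dict, List

Shared = Dict[str, int]


def superstep(shared: Shared,
              n_procs: int,
              body: Callable[[int, Shared], Shared]) -> Shared:
    """Simulate one BSP superstep over `n_procs` processes.

    Every process sees the same snapshot of the shared variables.
    Its updates are collected and applied only at the barrier.
    """
    snapshot = dict(shared)
    # Each process gets a private copy, so local writes cannot leak early.
    updates: List[Shared] = [body(rank, dict(snapshot))
                             for rank in range(n_procs)]
    combined = dict(snapshot)
    # Combine in rank order (assumed rule: sum the contributions),
    # which keeps the result deterministic.
    for update in updates:
        for name, delta in update.items():
            combined[name] = combined.get(name, 0) + delta
    return combined


start = {"sum": 0}
# Superstep 1: process `rank` contributes rank + 1.
after_first = superstep(start, 4, lambda rank, view: {"sum": rank + 1})
# Superstep 2: every process contributes the value it observed.
after_second = superstep(after_first, 4, lambda rank, view: {"sum": view["sum"]})
```

In the second superstep all four processes observe the same value of `sum`, regardless of the order in which they run, because no update is applied before the barrier.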
-
Interactive Invasive Parallelization
User-guided composition of parallel software with an incremental
aspect-oriented parallelization approach.
Covers automatic parallelization,
skeleton-based structured parallel programming and semiautomatic
program restructuring.
Support for automatic roundtrip engineering in aspect weaving.
Part of the RISE project
funded 2002-2005 and 2006-2007
by SSF.