OPEN MASTER THESIS PROJECTS
Research Group on Compiler Technology and Parallel Computing
The following master thesis projects are currently available in my group:
All projects on this page require a solid background in either compiler construction
or parallel programming (preferably both); at least one major course in these areas
(preferably at master level, including programming labs) should have been passed successfully.
Note to non-LIU students (FAQ):
If you want to do a thesis project with us, you must be registered
on a master (or bachelor) program at Linköping University.
It is generally not possible to do such projects remotely.
Performance modeling for multi-kernel GPU computing (30hp)
GPU programming using CUDA is gaining popularity as GPUs increasingly become
part of mainstream computing.
Already, 62 systems in the TOP500 list
are GPU-based (Nov. 2012 listing) and millions of GPUs are sold
every year for mobile and traditional computing domains.
Modern GPUs have already become general-purpose and task-parallel with the
introduction of caches and the possibility of concurrent execution of multiple kernels.
The goal of this master thesis project is to investigate concurrent execution
capabilities of modern NVIDIA (Fermi and Kepler) GPUs.
The idea is to take different applications with different
performance (computation, communication) characteristics and see how
they behave when running concurrently with each other.
Based on experimental findings, we will try to build a model
to predict execution behaviour of a computational kernel when running with other
computational kernels, given the information about GPU resource, and
computational/communication needs of each computational kernel.
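The kind of model aimed at can be illustrated with a deliberately simple sketch. All names below are ours, and the interference assumption (compute phases overlap perfectly across kernels while DRAM bandwidth is shared evenly) is a starting hypothesis for illustration, not a validated model of Fermi/Kepler hardware:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-kernel profile: time the kernel spends compute-bound
// vs. memory-bound when running alone on the GPU (in seconds).
struct KernelProfile {
    double compute_time;  // time limited by the SMs
    double memory_time;   // time limited by DRAM bandwidth
};

// Naive interference model: compute resources overlap across concurrent
// kernels, while each kernel's memory phase stretches by the number of
// co-runners that also touch memory.
double predicted_corun_time(const KernelProfile& k,
                            const std::vector<KernelProfile>& corunners) {
    double bw_sharers = 1.0;  // the kernel itself
    for (const KernelProfile& c : corunners)
        if (c.memory_time > 0.0)
            bw_sharers += 1.0;
    return std::max(k.compute_time, k.memory_time * bw_sharers);
}
```

For example, a kernel with 1 s of compute and 2 s of memory time, co-running with one memory-intensive kernel, would be predicted to take max(1, 2·2) = 4 s under this assumption. The thesis would replace such guesses with experimentally fitted behaviour.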
Prerequisites: TDDD56 Multicore and GPU Programming, or equivalent course
that includes CUDA and OpenCL programming.
Background in C/C++ programming and computer architecture.
Contact: Usman Dastgeer or Christoph Kessler.
Evaluation of Energy-Efficient Scheduling Algorithms for Streaming Computations on Embedded Parallel Architectures (30hp)
With massively deployed embedded systems in various industrial domains
such as telephony or distributed sensors, the need for more powerful and
power-efficient processing devices is constantly increasing. Performance
and energy consumption depend not only on hardware design, but also on
how well software can exploit hardware capabilities. Today,
energy efficiency is largely explored through the design of efficient
and scalable parallel algorithms to provide more throughput without the
need to increase clock frequencies. However, good scheduling techniques for
multiprocessors are also crucial to achieve both throughput and energy
saving. We consider the domain of stream processing,
whose applications range from signal processing in telephony or sensors
to multimedia encoding and decoding tasks.
Streaming pipelines are optimized under a throughput constraint
to minimize energy consumption through parallelism and frequency scaling.
Several energy-aware scheduling techniques are described in the
research literature for streaming computation on embedded
multiprocessor systems, including our own recent work.
However, choosing one for a particular development project is difficult,
as published techniques are often assessed within an imperfect theoretical
framework, or evaluated on a variety of architectures; in particular, on a
different architecture than the one chosen for the project. This thesis project
consists of developing a portable scheduler-testing framework
for a subset of architectures common in the parallel and embedded systems
research community (e.g., Tilera, SCC, MPPA, or Epiphany). The
framework will be used to compare several schedulers regarding the
energy consumption of the schedules they produce. This work involves
programming on many-core architectures developed to investigate today's
and tomorrow's processors, and represents an opportunity to get a first
but thorough experience with this technology. This thesis work is also a
good first experience in academic research and may lead to a scientific publication.
Parallelizing the NEMO ocean model application for GPU-based systems
using the SkePU skeleton programming library (30hp)
Prerequisites:
Good C/C++ coding skills are essential.
Good knowledge of Fortran and MPI.
Knowledge of C + Fortran mixed programming.
Knowledge of CUDA and/or OpenCL would be useful.
We recommend TDDD56 Multicore and GPU Programming
and TDDC78 Programming parallel computers.
Supervisor at NSC: Johan Raber, e-Science coordinator, NSC (@nsc.liu.se)
Supervisor at IDA: Usman Dastgeer, IDA
Examiner: Christoph Kessler, IDA
- Dynamic Optimization of Interprocessor Communication
in the MPI Back-End of the SkePU skeleton programming library (30hp)
By harnessing the computational power of modern GPUs
via General-Purpose Computing on Graphics Processing Units (GPGPU),
very fast calculations can be performed with a GPU cluster.
This thesis project is about extending an existing MPI
cluster back-end implementation of the
SkePU skeleton programming library
by data types that allow for the dynamic optimization of interprocessor communication,
and evaluating the implementation with several test programs
including a computationally intensive application.
The overall problem includes developing methods for
determining the optimal partitioning
of the problem, automated performance tuning for the best use of
resources, possibly in a non-dedicated environment;
also, devising new SkePU skeletons for
some computation/communication patterns
in the considered scientific computing problem.
An application from computational fluid dynamics
will be used as a case study.
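As an illustration of the partitioning side of the problem, here is a toy 1D block decomposition together with its halo-exchange cost, as used in many MPI stencil codes. The function names are ours and unrelated to the actual SkePU API:

```cpp
#include <utility>

// Split rows [0, n) as evenly as possible over p MPI ranks; the first
// n % p ranks get one extra row. Returns the half-open range [begin, end)
// owned by `rank`.
std::pair<int, int> block_range(int n, int p, int rank) {
    int base = n / p, rem = n % p;
    int begin = rank * base + (rank < rem ? rank : rem);
    int len = base + (rank < rem ? 1 : 0);
    return {begin, begin + len};
}

// Ghost rows exchanged per time step for a 1D stencil of width `halo`:
// each of the p - 1 internal boundaries is crossed in both directions.
long halo_volume(int p, int halo, long row_bytes) {
    return 2L * (p - 1) * halo * row_bytes;
}
```

A dynamic optimizer, as targeted by this project, would repartition (and possibly replicate) data at run time when measured communication cost dominates, rather than fixing the decomposition up front.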
This Master thesis project covers the following tasks:
- Research survey of related work.
- Design and implementation of new skeleton backends in C/C++, MPI and CUDA/OpenCL.
- Skeleton-based refactoring of the given benchmark application and experimental evaluation.
- Documentation of the results in a thesis report.
Prerequisites: Courses in programming of parallel computers and
GPU computing (TDDC78 and TDDD56 or equivalent).
Good background in OpenCL, CUDA, MPI, C/C++, algorithms, Linux.
- Multicore Programming with OpenCL (30hp)
Project description by Syntronic AB (PDF)
Prerequisites: TDDD56 Multicore and GPU Programming or TDDC78 Programming of parallel computers, and TDDB68 Concurrent programming and operating systems (or equivalent courses), with completed labs. Good programming skills in C. Good command of the Swedish language in written and oral communication.
Further information and application:
Åsa Detterfelt, asa.detterfelt (at) telia.com, 013-25 05 01.
Examinator: Christoph Kessler, IDA
- Generating target specific code for automatically detected
algorithmic patterns in C source programs (30 ECTS)
The goal of this project is to combine and extend an already existing
tool for automatic pattern recognition in C code with one or several
code generators for specific target back-ends.
Motivation for using patterns:
Using patterns to describe programs serves three main goals.
The first is that given pattern instance combinations
can easily be mapped to combinations of kernel implementations
for a given architecture; this yields a high level of reuse
of kernel implementations. Second, the patterns can be seen
as a high-level programming abstraction for the architecture,
leading to a component-based programming style.
The third goal is to automatically categorize legacy C program
parts as occurrences of patterns, using pattern-matching techniques
on existing source code to provide an automated migration path
and improved portability.
Project work description
A prototype of a pattern recognition tool has already been developed in an earlier project. Now your task is to combine this tool with code generators in order to generate target-specific code. This work includes, among other things:
- Extend the set of already existing patterns to raise the recognition rate.
- Select target architectures, both high-level ones such as OpenMP or POSIX threads, and low-level ones such as specific hardware architectures (e.g., GPUs). This can also involve runtime systems such as StarPU.
- Implement the code generators for the selected architectures.
- Introduce performance-aware target components.
- Test and evaluate.
(- If the result is satisfactory: write, submit and possibly present a scientific paper about it at a scientific workshop or conference.)
Prerequisites:
TDDB44 Compiler construction or a similar course.
Good programming skills in C and Java.
Computer architecture course.
Course in component based software.
Parallel programming course.
Since this is a cross-domain project, you will probably be assigned two supervisors.
Contact for further information:
Supervisor Erik Hansson
or examiner Christoph Kessler (christoph.kessler (at) liu.se)
- Sparse-Matrix support for the SkePU library for portable CPU/GPU programming (30hp)
This thesis project will extend the functionality of the
SkePU library for high-level, portable programming of
GPU-based systems, which was developed in
an earlier Master thesis project.
A matrix is called sparse if most of its entries are zeros,
so that a compressed storage format is more time- and space-efficient
than the traditional 2D array representation.
In this master thesis project you will extend SkePU with
support for sparse matrix computations.
In particular, you will design a smart container data structure for
representation of generic 2D sparse matrices and implement several of the
data-parallel skeletons of SkePU so that they can be applied to sparse matrices
in the same way as to dense matrices, with back-ends in sequential C++,
OpenMP, CUDA and OpenCL.
The implementation will be evaluated quantitatively on several GPU-based platforms.
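For illustration, a minimal CSR (compressed sparse row) container with a sequential sparse matrix-vector product might look as follows. The names are ours and do not reflect SkePU's actual API:

```cpp
#include <vector>

// Minimal CSR sparse matrix: nonzeros are stored row by row, with
// col_index giving each nonzero's column and row_ptr[r]..row_ptr[r+1]
// delimiting row r's nonzeros.
struct CsrMatrix {
    int rows = 0, cols = 0;
    std::vector<double> values;
    std::vector<int> col_index;
    std::vector<int> row_ptr;

    // Sequential sparse matrix-vector product y = A * x. A CUDA/OpenCL
    // backend would parallelize over rows (one thread or warp per row).
    std::vector<double> spmv(const std::vector<double>& x) const {
        std::vector<double> y(rows, 0.0);
        for (int r = 0; r < rows; ++r)
            for (int i = row_ptr[r]; i < row_ptr[r + 1]; ++i)
                y[r] += values[i] * x[col_index[i]];
        return y;
    }
};
```

In the thesis, such a container would become a SkePU "smart container" so that existing data-parallel skeletons apply to it transparently, with data movement between host and device handled behind the scenes.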
Further information is available on request, see the contact information below.
The library is developed in C++ and OpenMP, and has implementations for
CUDA and OpenCL. The prerequisites for this Master thesis
project are good C++ programming skills and knowledge of GPU and parallel
programming (e.g., TDDD56 and TDDC78).
This is a research-oriented project.
Contact: Usman Dastgeer, Christoph Kessler.
- [prel. taken] Performance Modeling for CUDA Applications (30hp)
Modern programmable graphics processors (GPUs) offer great speedup for
data-parallel computations by providing thousands of threads and low thread
overhead. However, tuning an application for these unconventional
architectures is still largely a black art.
This thesis project is about investigating the main factors
(including micro-architectural and memory locality features)
that affect the performance of running applications on GPUs.
By understanding this, it will be possible to devise a
performance modeling framework to predict an application's performance
based on certain application and device characteristics.
This could then be used to (re)configure a GPU program before execution
in order to automatically tune its performance.
This is a research-oriented project. If the result looks publishable, we
will encourage you to jointly write and submit a research paper to a
conference, and will sponsor its presentation if accepted.
Further information is available on request.
Prerequisites: Programming in C, parallel programming (TDDC78/TANA77),
computer architecture and basic compiler knowledge.
Contact: Usman Dastgeer, Christoph Kessler
- Implementation and Performance Investigation for a
Motion Estimation Algorithm on a GPU (30hp)
Development cycles for hardware and software products are becoming
shorter and shorter, and time to market correlates strongly
with a product's success.
In the area of high-resolution video processing, however,
complex video algorithms must be implemented that operate
on huge data sets. Simulations on general-purpose CPUs are time consuming,
and hardware solutions are not feasible in an early definition phase
of the algorithm. There are, however, new generations of processors
on the market which at least claim to be able to process complex video
algorithms in real time.
The goal of this master thesis is to implement an existing
block-matching algorithm for motion estimation between two video images
on a modern GPU. Motion estimation algorithms are used, for example,
for real-time frame-rate conversion to 120/240 Hz, or in 3D TV applications
to estimate the disparity between the stereo 3D signals or to convert
the S-3D signals to autostereo-3D signals.
The target is to reach real-time performance for the execution time
of the algorithm. If necessary, the algorithm has to be modified for
that purpose. Furthermore, the performance shall be evaluated,
e.g., as a function of the video sequence's resolution, frame rate, or bit width.
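As a reference point for the GPU implementation, a sequential full-search block matcher based on the sum of absolute differences (SAD) can be sketched as follows. All names are ours; on a GPU, one thread block would typically handle one image block, with candidate SADs computed in parallel:

```cpp
#include <climits>
#include <cstdint>
#include <vector>

struct MotionVector { int dx, dy; };

// Full search: for every displacement within +/- range, compute the SAD
// between the bsize x bsize block at (bx, by) in the current frame and
// the displaced block in the previous frame; keep the minimum.
MotionVector best_match(const std::vector<uint8_t>& cur,
                        const std::vector<uint8_t>& prev,
                        int width, int height,
                        int bx, int by, int bsize, int range) {
    MotionVector best{0, 0};
    long best_sad = LONG_MAX;
    for (int dy = -range; dy <= range; ++dy)
        for (int dx = -range; dx <= range; ++dx) {
            // skip candidates that leave the frame
            if (bx + dx < 0 || by + dy < 0 ||
                bx + dx + bsize > width || by + dy + bsize > height)
                continue;
            long sad = 0;
            for (int y = 0; y < bsize; ++y)
                for (int x = 0; x < bsize; ++x) {
                    int a = cur[(by + y) * width + (bx + x)];
                    int b = prev[(by + dy + y) * width + (bx + dx + x)];
                    sad += a > b ? a - b : b - a;
                }
            if (sad < best_sad) { best_sad = sad; best = {dx, dy}; }
        }
    return best;
}
```

Real block matchers add sub-pixel refinement and smarter search strategies; the thesis would start from the project's given algorithm rather than this sketch.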
The Master thesis project covers the following tasks:
- Study of block matching algorithm for motion estimation
- Programming in C/C++ on GPU (NVIDIA Fermi)
- Implementation and performance investigations of an algorithm
- Documentation of the results
Working Environment: PC, NVIDIA GPU (CUDA or OpenCL)
Course in programming of parallel computers (TDDC78 or equivalent).
Good background in image processing.
Contact: Christoph Kessler.
The thesis project will be co-supervised by:
Markus Schu, 3D Impact Media, Munich, Germany.
- Source-to-source Translator from Fork to CUDA (30hp)
Modern graphics processing units (GPUs) such as those produced by
NVIDIA and AMD/ATI offer massive computing power for data parallel
computations with hundreds of parallel threads.
At the same time, synchronization is fast.
These are actually properties that are characteristic for the classical
PRAM (Parallel Random Access Machine) model of parallel computation
(see e.g. the book
Practical PRAM Programming).
A previous thesis project in Germany described how
the classical PRAM model of parallel execution
can be mapped to CUDA GPUs
and how, in particular, the PRAM programming language Fork
(or a subset of it) could be mapped to CUDA.
This project will retarget
the existing Fork compiler to generate code in CUDA,
the current programming platform for modern NVIDIA GPUs,
and develop optimizations in the translation process
to improve performance.
Prerequisites: Programming in C, reading knowledge of German,
Compiler construction (e.g. TDDB44, TDDD16, TDDC86),
Programming parallel computers (e.g. TDDC78).
Integrated Code Generation in the LLVM Compiler System (30hp)
LLVM is a modern open-source compiler framework. This project will explore
how to apply
and implement our OPTIMIST
approach for integrated code generation in the LLVM compiler system.
Further information on request.
Prerequisites: Required: TDDB44 Compiler Construction or similar course.
Recommended: TDDC86 Compiler Optimizations and Code Generation.
Further thesis projects in compiler technology and
parallel computing are available on request (chrke at ida.liu.se).
Responsible for this page: Christoph Kessler, IDA