OPEN MASTER THESIS PROJECTS
Research Group on Compiler Technology and Parallel Computing
The following master thesis projects are currently available in my group:
All projects on this page require a solid background in compiler construction or parallel programming (preferably both): you should have successfully completed at least one major course in these areas, preferably at master level and including programming labs.
Note to non-LIU students (FAQ): If you want to do a thesis project with us, you must be registered in a master's (or bachelor's) program at Linköping University. It is generally not possible to do such projects remotely.
Support for generalized stencil computations in SkePU (30hp)
As a case study, this project will consider an open-source high-performance computing application from medical image processing that is currently implemented in C++ and CUDA. It will investigate what is required to express the application's performance-critical parts with the existing SkePU skeletons, and develop any extensions to the SkePU library that are needed to express the application more conveniently with SkePU skeletons.
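In a stencil computation, each output element is computed from a fixed neighbourhood of input elements around the same position, as in many image-filtering steps. As a reference point only, the following plain sequential C++ sketch shows a 2D stencil with a user-supplied neighbourhood function. The names `stencil2d`, `clamp_index` and the offset accessor `at(di, dj)`, as well as the clamp-to-border boundary handling, are illustrative assumptions and not SkePU's API or the design this project will produce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Clamp an index to [0, hi]; used to handle neighbours outside the grid.
inline long clamp_index(long v, long hi) {
    return v < 0 ? 0 : (v > hi ? hi : v);
}

// Hypothetical sequential 2D stencil (not SkePU's API):
// out(i, j) = f(at), where at(di, dj) returns in(i + di, j + dj).
// Assumption: out-of-range neighbours are clamped to the nearest border element.
template <typename F>
std::vector<float> stencil2d(const std::vector<float>& in,
                             std::size_t rows, std::size_t cols, F f) {
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());
    const long last_row = static_cast<long>(rows) - 1;
    const long last_col = static_cast<long>(cols) - 1;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            // Neighbourhood accessor relative to the current point (i, j).
            auto at = [&](long di, long dj) {
                long r = clamp_index(static_cast<long>(i) + di, last_row);
                long c = clamp_index(static_cast<long>(j) + dj, last_col);
                return in[static_cast<std::size_t>(r) * cols + static_cast<std::size_t>(c)];
            };
            out[i * cols + j] = f(at);  // each output element is computed independently
        }
    }
    return out;
}
```

For example, passing `[](auto at) { return (at(0,0) + at(-1,0) + at(1,0) + at(0,-1) + at(0,1)) / 5.0f; }` computes a 5-point average. Because every output element is computed independently, the loop nest is the part that a skeleton back-end would parallelize on CPUs or GPUs.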
Prerequisites: TDDD56 Multicore and GPU Programming, or similar course on parallel programming. Advanced C/C++ programming skills.
Contact: Christoph Kessler, Usman Dastgeer
Systematic Concurrent Debugging (30hp or 2x30hp)
Contact: Ahmed Rezine or Christoph Kessler
Performance modeling for multi-kernel GPU computing (30hp)
GPU programming using CUDA is becoming popular as GPUs increasingly become part of mainstream computing. Already 62 systems in the TOP500 list (November 2012) are GPU-based, and millions of GPUs are sold every year for both mobile and traditional computing. Modern GPUs have become general-purpose and task-parallel with the introduction of caches and the possibility of executing multiple computations concurrently.
The goal of this master thesis project is to investigate the concurrent execution capabilities of modern NVIDIA GPUs (Fermi and Kepler). The idea is to take applications with different performance characteristics (computation, communication) and observe how they behave when running concurrently with each other. Based on the experimental findings, we will try to build a model that predicts the execution behaviour of a computational kernel when it runs together with other kernels, given information about the GPU's resources and the computation and communication needs of each kernel.
Prerequisites: TDDD56 Multicore and GPU Programming, or equivalent course that includes CUDA and OpenCL programming. Background in C/C++ programming and computer architecture.
Contact: Usman Dastgeer or Christoph Kessler.
[TAKEN] Evaluation of Energy-Efficient Scheduling Algorithms for Streaming Computations on Embedded Parallel Architectures (30hp)
With embedded systems massively deployed in various industrial domains such as telephony or distributed sensors, the need for more powerful and power-efficient processing devices is constantly increasing. Performance and energy consumption depend not only on hardware design, but also on how well software takes advantage of hardware capabilities. Today, energy efficiency is largely pursued through the design of efficient and scalable parallel algorithms that provide more throughput without the need to increase clock frequencies. However, good scheduling techniques for multiprocessors are also crucial to achieve both high throughput and energy savings. We consider the domain of stream processing, whose applications range from signal processing in telephony or sensors to multimedia encoding and decoding. Streaming pipelines are optimized under a throughput constraint to minimize energy consumption through parallelism and frequency scaling.
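To make the trade-off between parallelism and frequency scaling concrete, the following C++ toy model rests on loudly stated assumptions: a task of `work` cycles must complete within one pipeline `period`, it runs on `cores` cores with perfect speedup, only a discrete set of frequency levels is available, and each busy core draws power proportional to f³ (a common first-order approximation when voltage is scaled together with frequency; static power is ignored). Under these assumptions the energy per period is cores · f³ · work/(cores · f) = work · f², so extra cores pay off by allowing a lower frequency. The helper `pick_frequency` and the `Choice` struct are hypothetical; this is neither Crown Scheduling nor any of the schedulers to be evaluated.

```cpp
#include <cassert>
#include <vector>

// Result of the hypothetical frequency choice.
struct Choice {
    bool feasible;     // false if no level meets the period
    double frequency;  // chosen frequency level (arbitrary units)
    double energy;     // energy per period under the assumed model
};

// Hypothetical helper (toy model, not an existing scheduler):
// returns the lowest level in 'levels' (sorted ascending) at which 'work'
// cycles, split with perfect speedup over 'cores' cores, finish within 'period'.
// Assumption: each busy core draws dynamic power f^3; static power is ignored.
Choice pick_frequency(double work, int cores, double period,
                      const std::vector<double>& levels) {
    assert(cores > 0 && period > 0.0);
    for (double f : levels) {
        double time = work / (cores * f);  // execution time with perfect speedup
        if (time <= period) {
            // cores * f^3 * time simplifies to work * f^2.
            double energy = cores * f * f * f * time;
            return Choice{true, f, energy};
        }
    }
    return Choice{false, 0.0, 0.0};  // throughput constraint cannot be met
}
```

For example, with work 8, period 2 and levels {1, 2, 4}, one core must run at frequency 4 (energy 128), while two cores can run at frequency 2 (energy 32). Real platforms add static power and imperfect speedup, which this toy model ignores.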
Several energy-aware scheduling techniques for streaming computations on embedded multiprocessor systems are described in the research literature, including our own recent work on Crown Scheduling. However, choosing one for a particular development project is difficult, as they may have been assessed only within an imperfect theoretical framework, or on a variety of architectures that may not include the one chosen for the project. This thesis project consists of developing a portable scheduler testing framework for a subset of architectures common in the parallel and embedded systems research community (e.g., Tilera, SCC, MPPA, Epiphany). The framework will be used to compare several schedulers with respect to the energy consumption of the schedules they produce. The work involves programming many-core architectures developed to investigate today's and tomorrow's processors, and represents an opportunity to gain a first but thorough experience with this technology. This thesis work is also a good first experience in academic research and may lead to scientific publications.
Contact: Nicolas Melot, Christoph Kessler
[TAKEN] Parallelizing the NEMO ocean model application for GPU-based systems using the SkePU skeleton programming library (30hp)
Prerequisites: Good C/C++ coding skills are essential. Good knowledge of Fortran and MPI, and of mixed C/Fortran programming. Knowledge of CUDA and/or OpenCL would be useful. We recommend TDDD56 Multicore and GPU Programming and TDDC78 Programming parallel computers.
Supervisor at NSC: Johan Raber, e-Science coordinator, NSC
Supervisor at IDA: Usman Dastgeer, IDA
Examiner: Christoph Kessler, IDA
Dynamic Optimization of Interprocessor Communication in the MPI Back-End of the SkePU skeleton programming library (30hp)
By harnessing the computational power of modern GPUs through General-Purpose Computing on Graphics Processing Units (GPGPU), a GPU cluster can perform very fast calculations.
This thesis project is about extending an existing MPI cluster back-end implementation of the SkePU skeleton programming library with data types that allow for the dynamic optimization of inter-node communication, and evaluating the implementation with several test programs, including a computationally intensive application.
The overall problem includes developing methods for determining the optimal partitioning of the problem and automated performance tuning for the best use of resources, possibly in a non-dedicated environment, as well as devising new SkePU skeletons for some of the computation and communication patterns in the considered scientific computing problem. An application from computational fluid dynamics will be used as a case study.
This Master thesis project covers the following tasks:
- Research survey of related work.
- Design and implementation of new skeleton backends in C/C++, MPI and CUDA/OpenCL.
- Skeleton-based refactoring of the given benchmark application and experimental evaluation.
- Documentation of the results in the thesis report.
Prerequisites: Courses in programming of parallel computers and GPU computing (TDDC78 and TDDD56 or equivalent). Good background in OpenCL, CUDA, MPI, C/C++, algorithms, Linux.
Contact: Christoph Kessler.
[TAKEN] Multicore Programming with OpenCL (30hp)
Project description by Syntronic AB (PDF)
Prerequisites: TDDD56 Multicore and GPU Programming or TDDC78 Programming of parallel computers, and TDDB68 Concurrent programming and operating systems (or equivalent courses), with completed labs. Good programming skills in C. Good command of Swedish, both written and spoken.
Further information and application: Åsa Detterfelt, asa.detterfelt (at) telia.com, 013-25 05 01.
Examiner: Christoph Kessler, IDA
Generating target-specific code for automatically detected algorithmic patterns in C source programs (30 ECTS)
The goal of this project is to combine and extend an already existing tool for automatic pattern recognition in C code with one or several code generators for specific target back-ends.
Motivation for using patterns: Using patterns to describe programs has three main goals. First, combinations of pattern instances can easily be mapped to combinations of kernel implementations for a given architecture, which yields a high level of reuse of kernel implementations. Second, the patterns can be seen as a high-level programming abstraction for the architecture, leading to a component-based programming style. Third, legacy C program parts can be automatically categorized as occurrences of patterns, using pattern matching techniques on existing source code, to provide an automated migration path and improved portability.
Project work description
A prototype of a pattern recognition tool has already been developed in an earlier project. Your task is now to combine this tool with code generators so that target-specific code can be generated. Among other things, this work includes:
- Extend the set of already existing patterns to raise the recognition rate.
- Select target architectures, both high-level ones such as OpenMP and POSIX threads and low-level ones such as specific hardware (e.g., GPUs). This can also involve runtime systems such as StarPU.
- Implement the code generators for the selected architectures.
- Introduce performance-aware target components.
- Test and evaluate.
(- If the result is satisfactory: write, submit and possibly present a scientific paper about it at a scientific workshop or conference.)
Prerequisites: TDDB44 Compiler construction or similar course. Good programming skills in C and Java. Computer architecture course. Course in component-based software. Parallel programming course.
Since this is a cross domain project work you will probably be assigned two supervisors (handledare).
Contact for further information:
Supervisor Erik Hansson or examiner Christoph Kessler (christoph.kessler (at) liu.se)
[Preliminarily taken HT14] Sparse-matrix support for the SkePU library for portable CPU/GPU programming (30hp)
This thesis project will extend the functionality of the SkePU library for high-level, portable programming of GPU-based systems, which was developed in our group.
A matrix is called sparse if most of its entries are zero, so that a compressed storage format is more time- and space-efficient than the traditional 2D array representation. In this master thesis project you will extend SkePU with support for sparse matrix computations. In particular, you will design a smart container data structure for representing generic 2D sparse matrices and implement several of SkePU's data-parallel skeletons so that they can be applied to sparse matrices in the same way as to dense matrices, with back-ends in sequential C++, OpenMP, CUDA and OpenCL. The implementation will be evaluated quantitatively on several GPU-based platforms. Further information is available on request; see the contact information below.
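One widely used compressed format is compressed sparse row (CSR), which stores only the nonzero values, their column indices, and one offset per row. The C++ sketch below is a hypothetical illustration (`CsrMatrix`, `from_dense` and `spmv` are invented names, not SkePU's container or API) of how such a representation is built and how a sparse matrix-vector product iterates only over the nonzeros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed sparse row (CSR) container; not part of SkePU.
struct CsrMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<double> values;        // nonzero values, stored row by row
    std::vector<std::size_t> col_idx;  // column index of each nonzero
    std::vector<std::size_t> row_ptr;  // row r occupies [row_ptr[r], row_ptr[r + 1])
};

// Build a CSR matrix from a dense row-major array, keeping only nonzeros.
CsrMatrix from_dense(const std::vector<double>& dense, std::size_t rows, std::size_t cols) {
    assert(dense.size() == rows * cols);
    CsrMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.row_ptr.push_back(0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            double v = dense[r * cols + c];
            if (v != 0.0) {
                m.values.push_back(v);
                m.col_idx.push_back(c);
            }
        }
        m.row_ptr.push_back(m.values.size());  // end offset of row r
    }
    return m;
}

// Sparse matrix-vector product y = A * x; each row is computed independently.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.cols);
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t r = 0; r < a.rows; ++r) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k)
            sum += a.values[k] * x[a.col_idx[k]];
        y[r] = sum;
    }
    return y;
}
```

For the 3×3 matrix with rows (4, 0, 0), (0, 0, 5), (1, 0, 2), CSR stores 4 values, 4 column indices and 4 row offsets, and multiplying by x = (1, 2, 3) gives (4, 15, 7). For a matrix this small, CSR is not smaller than the dense array; the savings appear when the nonzeros are a small fraction of all entries. Rows can be processed in parallel, although load balance then depends on how the nonzeros are distributed over the rows.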
The library is written in C++ with OpenMP and has back-end implementations for CUDA and OpenCL. The prerequisites for this Master thesis project are good C++ programming skills and knowledge of GPU and parallel programming (e.g., TDDD56 and TDDC78).
This is a research-oriented project.
Contact: Usman Dastgeer, Christoph Kessler.
Source-to-source Translator from Fork to CUDA (30hp)
Modern graphics processing units (GPUs) such as those produced by NVIDIA and AMD/ATI offer massive computing power for data-parallel computations with hundreds of parallel threads. At the same time, synchronization is fast.
These properties are characteristic of the classical PRAM (Parallel Random Access Machine) model of parallel computation (see, e.g., the book Practical PRAM Programming).
A previous thesis project in Germany described how the classical PRAM model of parallel execution can be mapped to CUDA GPUs, and in particular how the PRAM programming language Fork (or a subset of it) could be mapped to CUDA.
This project will retarget the existing Fork compiler to generate code in CUDA, the current programming platform for modern NVIDIA GPUs, and develop optimizations in the translation process to improve performance.
Prerequisites: Programming in C, reading knowledge of German, compiler construction (e.g. TDDB44, TDDD16, TDDC86), programming parallel computers (e.g. TDDC78).
Integrated Code Generation in the LLVM Compiler System (30hp)
LLVM is a modern open-source compiler framework. This project will explore how to apply and implement our OPTIMIST approach for integrated code generation in the LLVM compiler system. Further information on request.
Prerequisites: Required: TDDB44 Compiler Construction or similar course. Recommended: TDDC86 Compiler Optimizations and Code Generation.
Further thesis projects in compiler technology and
on request (chrke at ida.liu.se).
Responsible for this page: Christoph Kessler, IDA
Page responsible: Webmaster
Last updated: 2014-08-18