OPEN MASTER THESIS PROJECTS
Research Group on Compiler Technology and Parallel Computing
The following master thesis projects are currently available in my group:
|
Note:
All projects on this page require a solid background in either compiler construction
or parallel programming (preferably both); at least one major course (preferably
at master level including programming labs) in these areas should be passed successfully.
|
| |
|
Note to non-LIU students:
If you want to do a thesis project with us, you must be registered
on a master (or bachelor) program at Linköping university.
It is generally not possible to do such projects remotely.
|
- Automatic translation between parallel programming languages (30 ECTS)
Today there exist several C-based parallel programming languages which
target different parallel processor architectures. Even though many of these
architectures have the same programming model, usually each
architecture has to be programmed in a (more or less) specific
language. These languages tend to have different levels of abstraction,
and need to be supported by (vendor) specific tool-chains.
The outcome of the project will be a tool that translates program source
code from a (high-level) C-based parallel language, to one or several
different, possibly domain-specific, C based parallel target languages.
The work includes among other things:
- Select the target architectures and source/target languages.
- Extend an existing compiler front-end's intermediate representation
(IR) to support both the source language and the target language.
- Write transformation passes.
- Select test cases.
- Perform evaluations.
Prerequisites: Compiler Construction or Compilers and Interpreters or similar course. Good
programming skills in C and Java. Some understanding of multicore programming.
Contact for further information: Erik Hansson or Christoph Kessler.
- Communication Sensitive Load Balancing of Irregular
MPI Applications on Clusters (30hp)
Message Passing Interface (MPI) is a standardized and portable
message-passing system to function on a wide variety of parallel
computers with distributed memory architectures.
In many parallel applications from scientific computing,
MPI is used to parallelize time consuming sequential code
to decrease the overall execution time.
In an MPI application, different MPI processes execute
the same code on different data (SPMD style) on different machines
or nodes (each containing several cores) and exchange information
with each other using message passing.
Most of the time, the application (or computation) is irregular
such that after partitioning, orchestration, communication and mapping,
different MPI processes get different work loads, causing the load
on different nodes to imbalance.
Furthermore, increased communication of MPI processes across nodes
increases the overall execution time.
The MPI standard provides a way to give hints to the underlying MPI system
for efficiently mapping the MPI processes to the nodes such that those
MPI processes that communicate more are mapped to the same node.
But it does not support the mechanism to guide the mapping of MPI
processes to nodes according to computational load.
The goal of this thesis is to design, implement and evaluate a
communication sensitive load balancing algorithm.
Some graph based, evolutionary algorithms or heuristics based
on swarm intelligence can also be considered.
The algorithm will be evaluated using different benchmark
applications on supercomputers.
The Master thesis project covers the following tasks:
- Research survey of related work.
- Design and implementation of algorithm in C/C++ on NSC supercomputing center.
- Evaluation on different benchmark applications.
- Documentation of the results in thesis report.
Begin: ASAP.
Prerequisites: Course in programming of parallel computers
(TDDC78 or equivalent). Good background in C/C++, algorithms, Linux.
Contact: Christoph Kessler,
Mudassar Majeed (mudassar.majeed@liu.se)
- Multicore-Programmering med OpenCL (30hp)
Project description by Syntronic AB (PDF)
Prerequisites: TDDD56 Multicore and GPU Programming or TDDC78 Programming of parallel computers, and TDDB68 Concurrent programming and operating systems (or equivalent courses), with completed labs. Good programming skills in C. Good commandment of Swedish language in written and oral communication.
Further information and application:
Åsa Detterfelt, asa.detterfelt (at) telia.com, 013-25 05 01.
Examinator: Christoph Kessler, IDA
- Generating target specific code for automatically detected
algorithmic patterns in C source programs (30 ECTS)
The goal of this project is to combine and extend an already existing
tool for automatic pattern recognition in C code with one or several
code generators for specific target back-ends.
Motivation of using patterns:
Using patterns to describe programs has three main goals.
The first one is that given pattern instance combinations
can easily be mapped to combinations of kernel implementations
for a given architecture; this yields a high level of reuse
of kernel implementations. Secondly the patterns can be seen
as a high level programming abstraction for the architecture,
leading to a component-based programming style.
The third goal is to automatically categorizing legacy C program
parts as occurrences of patterns, using pattern matching techniques
on existing source code to provide an automated migration path
and improved portability.
Project work description
A prototype of a pattern recognition tool has already been developed in an earlier project. Now it is your task is to combine this tool with code generators to be a able to generate target specific code. This work includes among other things:
- Extend the set of already existing patterns to raise the recognition rate.
- Select target architectures, both high-level such as OpenMP, Posix threads, etc and low level such as different hardware architectures, such as GPUs. It can also involve runtime systems such as StarPU.
- Implement the code generators for the selected architectures.
- Introduce performance aware target components.
- Test and evaluate.
(- If the result is satisfactory: write, submit and possibly present a scientific paper about it at a scientific workshop or conference.)
Prerequisites:
TDDB44 Compiler construction or similar course.
Good programming skills in C and Java.
Computer architecture course.
Course in component based software.
Parallel programming course.
Since this is a cross domain project work you will probably be assigned two supervisors (handledare).
Contact for further information:
Supervisor Erik Hansson
or examiner Christoph Kessler (christoph.kessler (at) liu.se)
- Efficient sorting on the ePUMA Multicore DSP Processor (30hp)
Sorting is a classical problem in computer science and often used as a benchmark
for evaluating the performance of novel computer architectures.
The goal of this project is to develop, implement and evaluate efficient
sorting kernels
for the novel ePUMA processor, an energy-efficient multicore DSP
processor for emerging applications in telecommunication and multimedia,
which is being developed in a research project at Linköping University.
We will start from two classical sorting algorithms, mergesort and radix
sort, for integer and fixed-point sorting, first at the local (intra-core) level
and then integrated with inter-core parallelism and off-chip memory access.
The project will find out which algorithm(s) is/are more suitable for this
architecture and how they can possibly be combined and adapted to
exploit this unique architecture effectively.
This is a research-oriented project. If successful, we will support you
in writing a conference paper and, if accepted, presenting it.
Prerequisites: Parallel programming (e.g., TDDC78),
computer architecture.
Experience in C and assembler-level programming.
Data structures and algorithms.
Contact: Erik Hansson, Christoph Kessler
- Extension of the SkePU library for portable CPU/GPU programming (30hp)
This thesis is about extending the functionality of the
SkePU library for high-level, portable programming of
GPU-based systems, which was developed in
an earlier Master thesis project.
Support for different combinations of data parallel skeletons for sparse
2D (Matrix) data structures will be a major contribution of this Master
thesis project. Furthermore, support for some task parallel skeletons
(Farm, Pipe) can be considered.
Further information is available on request, see the contact information below.
The library is developed in C++, OpenMP, and has implementations for
CUDA and OpenCL. The prerequisites for this Master thesis
project are good C++ programming skills and knowledge of GPU and parallel
programming (e.g., TDDD56 and TDDC78).
This is a research oriented project.
Contact: Usman Dastgeer, Christoph Kessler.
- [prel. taken] Performance Modeling for CUDA Applications (30hp)
Modern programmable graphics processors (GPUs) offer great speedup for
data-parallel computations by providing thousands of threads and low thread
overhead. However, tuning an application for these unconventional
architectures is still a black art to a major extent.
This thesis project is about investigating the main factors
(including micro-architectural and memory locality features)
that affect the performance of running applications on GPUs.
By understanding this, it will be possible to devise a
performance modeling framework to predict an application's performance
based on certain application and device characteristics.
This could then be used to (re)configure a GPU program before execution
in order to automatically tune its performance.
This is a research oriented project. If the result looks publishable, we
will encourage you to jointly write and submit a research paper to a
conference and sponsor presentation if accepted.
Further information is available on request.
Prerequisites: Programming in C, parallel programming (TDDC78/TANA77),
computer architecture and basic compiler knowledge.
Contact: Usman Dastgeer, Christoph Kessler
- Implementation and Performance Investigation for a
Motion Estimation Algorithm on a GPU (30hp)
The development cycles for hard- and software products will
become shorter and shorter. The product introduction time shows a strong
correlation to the success of the product itself.
Nevertheless, in the area of high resolution video processing
it is required to implement complex video algorithms, which operate
on a huge date set. Simulations with general purpose CPUs are time consuming
and hardware solutions are not possible in an early definition phase
of the algorithm. There are however some new generations of processors
on the market, which at least claim to be able to process complex video
algorithms in real time.
The goal of this master thesis shall be to implement an existing
block-matching algorithm for motion estimation between two video images
on a modern GPU. Motion estimation algorithms are for example used
in 120/240 Hz for real time frame rate conversion or in 3D TV applications
to estimate the disparity between the stereo 3D signals or to convert
the S-3D signals to Autostereo-3D signals.
The target is to reach real-time performance for the execution time
of the algorithm. If necessary the algorithm has to be modified for
that purpose. Furthermore the performance shall be evaluated
e.g. in dependence on the video sequence resolution, frame rate or bit width.
The Master thesis project covers the following tasks:
- Study of block matching algorithm for motion estimation
- Programming in C/C++ on GPU (NVIDIA Fermi)
- Implementation and performance investigations of an algorithm
- Documentation of the results
Working Environment: PC, NVIDIA-GPU (Cuda or OpenCL)
Begin: asap.
Prerequisites:
Course in programming of parallel computers (TDDC78 or equivalent).
Good background in image processing.
C/C++, Linux.
Contact: Christoph Kessler.
The thesis project will be co-supervised by:
Markus Schu, 3D Impact Media, Munich, Germany.
- [taken] Optimized composition of parallel components on a Linux Cluster
(30hp)
Further information on request.
Prerequisites: Programming in C, Linux, some background in parallel programming in MPI, e.g. TDDC78/TANA77.
- Optimized composition of parallel components on a multicore server
(30hp)
Further information on request.
Prerequisites: Good programming skills in C, Linux; Concurrent programming and operating systems e.g. TDDB68, some background in parallel programming with pthreads or OpenMP, e.g. TDDC78/TANA77.
- ePUMA compiler backend (several topics), 30hp
Task description:
ePUMA is
a large joint research project with the
division of computer architecture
at ISY.
This project develops a novel embedded chip multiprocessor design
optimized for the DSP and multimedia domain;
the processor will be used for future radio base
station, radar, and HDTV applications.
We offer several final year projects in the software toolchain
of the ePUMA project; for instance,
designing a LLVM compiler back-end for either the
host processor or the SIMD cores.
The projects will be mainly carried out at the computer architecture
division at ISY but co-supervised and examinated by me.
The benefits students can get for these projects:
You will be paid (after successful
completion of the project) according to your
contributions. You will learn the methods of design,
verification, and implementation of compilers for embedded
processors.
Prerequisites:
Good knowledge of compilers (e.g., TDDB44, TDDD16 or TDDC86)
and hardware (e.g., you should have taken TSEA44 or an equivalent course,
and it is an advantage if you have taken the course TSEA26.)
Contact: Joar Sohl (joar \at isy.liu.se) and Erik Hansson.
- Bulk-synchronous parallel programming with NestStep on GPGPU (30hp)
NestStep is a C-based PGAS (partitioned global address space)
language for bulk-synchronous parallel (BSP) programming, a
parallel programming model that leads to code that is well-structured,
deadlock-free, and relatively easy to analyze for performance.
In this project, you will port the existing NestStep frontend and
run-time system (which exists today for Linux clusters and Cell BE)
to a GPGPU system, using either CUDA or OpenCL as technical platform.
Prerequisites: TDDC78/TANA77 Programming of parallel computers or
similar course, with completed lab series.
TDDB44 Compiler construction or similar course.
Good programming skills in C/C++, and in Java.
- Source-to-source Translator from Fork to CUDA (30hp)
Modern graphics processing units (GPUs) such as those produced by
NVIDIA and AMD/ATI offer massive computing power for data parallel
computations with hundreds of parallel threads.
At the same time, synchronization is fast.
These are actually properties that are characteristic for the classical
PRAM (Parallel Random Access Machine) model of parallel computation
(see e.g. the book
Practical PRAM Programming).
A previous thesis project in Germany described how
how the classical PRAM model
of parallel execution can be mapped to CUDA GPUs
and how especially the PRAM programming language
Fork
(or a subset of it) could be mapped to CUDA.
This project will retarget
the existing Fork compiler to generate code in CUDA,
the current programming platform for modern NVIDIA GPUs,
and develop optimizations in the translation process
to improve performance.
Prerequisites: Programming in C, Reading German language,
Compiler construction (e.g. TDDB44, TDDD16, TDDC86),
Programming parallel computers (e.g. TDDC78).
-
Parallel PRAM simulation with thread migration on a Multicore server (30 hp)
pramsim is a Solaris/Linux-based simulator for the
SB-PRAM,
a parallel computer with the strongest parallel programming
model ever realized, a so-called Combining CRCW-PRAM.
The PRAM model postulates that a set of
of processors run synchronously at the level of assembler
instructions. They have uniform access to a shared memory, and in addition each
can access a local memory. The differentiation between both
memories is by the most significant address bit. The shared memory
can be accessed also by combining instructions such as global reductions
and multiprefix operations. The instruction
memory is separate and local to each processor. More details about
the SB-PRAM and pramsim can be found in
Keller, Kessler, Träff: Practical PRAM Programming, Wiley, 2001.
Today, pramsim is used e.g. for
teaching and research on parallel algorithms.
Simulating a parallel computer on a sequential one results in very long
simulation times, and naturally leads to the idea to run pramsim
on a parallel computer. Various approaches for parallelization on
shared-memory and
distributed-memory systems such as Linux clusters have been proposed
in the literature, those for shared memory systems look most promising,
but the problem is challenging.
In this project, an alternative approach based on thread
migration should be implemented, evaluated, and compared to the
previous approaches.
Prerequisites:
TDDC78 Programming of parallel computers, or equivalent background
in parallel computing. Good programming skills in C.
Some knowledge of Linux.
-
Integrated Code Generation in the LLVM Compiler System (30hp)
LLVM is a modern open-source compiler framework. This project will explore
how to apply
and implement our OPTIMIST
approach for integrated code generation in the LLVM compiler system.
Further information on request.
Prerequisites: Required: TDDB44 Compiler Construction or similar course.
Recommended: TDDC86 Compiler Optimizations and Code Generation.
Further thesis projects in compiler technology and
parallel programming
on request (chrke at ida.liu.se).
Back to my master thesis students page
More thesis projects at PELAB
Responsible for this page: Christoph Kessler, IDA