OPEN MASTER THESIS PROJECTS
Research Group on Compiler Technology and Parallel Computing
The following master thesis projects are currently available in my group:
Note:
All projects on this page require a solid background in either compiler construction
or parallel programming (preferably both); at least one major course (preferably
at master level including programming labs) in these areas should be passed successfully.
Note to non-LIU students (FAQ):
If you want to do a thesis project with us, you must be registered
on a master (or bachelor) program at Linköping University.
It is generally not possible to do such projects remotely.
- OpenMP for the REPLICA architecture (30hp)
OpenMP is a popular extension of programming languages such as C
for thread-based parallel programming. It allows the programmer
to (re-)write applications to utilize modern multicore processors
with shared memory in a relatively easy and high-level way (as
compared to threading libraries).
In OpenMP the programmer can annotate regions of a sequential program,
in particular loops, with directives (pragmas) to instruct the compiler
to generate code that can be executed in parallel by multiple threads.
REPLICA is a new PRAM-like multicore hardware architecture with
configurable emulated shared memory (CESM).
The goal of this master thesis is to write an OpenMP to REPLICA translator/compiler,
so that existing OpenMP programs can be compiled for and be executed on
the REPLICA architecture. The work may also include defining new REPLICA
specific extensions to OpenMP to make use of the hardware features
in an efficient way.
The implementation can be based on an already existing compiler/translator framework.
Prerequisites: Parallel programming.
Background in C/C++, assembly programming, and compiler construction.
Contact: Erik Hansson or Christoph Kessler.
- Parallelizing the NEMO ocean model application for GPU-based systems
using the SkePU skeleton programming library (30hp)
Requirements:
Good C/C++ coding skills are essential.
Good knowledge of Fortran and MPI.
Knowledge of C + Fortran mixed programming.
Knowledge of CUDA and/or OpenCL would be useful.
We recommend TDDD56 Multicore and GPU Programming
and TDDC78 Programming parallel computers.
Contact:
Supervisor at NSC: Johan Raber, e-Science coordinator, NSC (@nsc.liu.se)
Supervisor at IDA: Usman Dastgeer, IDA
Examiner: Christoph Kessler, IDA
- Evaluating and Comparing Parallel Programming Models, Languages and Hardware Platforms
for Scalable Shared-Memory Computing (30hp)
The purpose of this master thesis is to quantitatively evaluate the REPLICA architecture,
a family of novel scalable chip multiprocessors with a configurable emulated
shared memory (CESM) architecture whose computation model is based on the PRAM model.
By investigating REPLICA's programming model and programming language and
comparing it both to similar parallel architectures and to more diverse ones,
the goal is to find out which types of algorithms are highly (or less) suitable for REPLICA,
but also to show how REPLICA is positioned relative to existing architectures,
both in performance and in complexity of programming.
The work includes (but is not limited to):
- Select different architectures to compare with, such as SB-PRAM, XMT,
GPGPUs, Multicore CPU etc.
- Select a set of parallel computational kernels from existing
well-known benchmarks and literature.
- Implement/port the kernels to the selected architectures/languages.
- Evaluate and compare.
You will gain both theoretical and practical experience in programming
a wide range of existing parallel architectures.
Prerequisites: Parallel programming. Computer architecture. Good background in C/C++, assembly programming, and basic compiler construction.
Contact for further information: Erik Hansson or Christoph Kessler.
- Extension and evaluation of a run-time system for synchronous parallel tasks (30 ECTS)
In an earlier pilot study we demonstrated the feasibility of a task-based runtime system
for a PRAM-like shared memory parallel system
that handles the creation, scheduling, resource allocation and dispatching
of single-threaded and multi-threaded (parallel) tasks.
This project will improve the runtime system design and its implementation,
add an automated tuning feature,
experiment with various resource allocation and scheduling strategies,
and systematically evaluate it with a number of irregular parallel
example applications. The implementation will use the C-based parallel
language Fork
and the SBPRAM simulator for execution.
Literature:
J. Keller, C. Kessler, J. Träff: Practical PRAM Programming, Wiley Interscience, 2001.
C. Kessler, E. Hansson:
Flexible Scheduling and Thread Allocation for Synchronous Parallel Tasks.
Proc. PASA-2012 workshop, München, Germany, Feb. 2012.
Prerequisites: TDDB68 Concurrent programming and operating systems and
a course in parallel/multicore programming such as TDDD56 or TDDC78.
Good C programming skills. Linux or Solaris.
Some familiarity with assembler programming is helpful.
Contact: Erik Hansson, Christoph Kessler
- [taken] A Cluster Back-End for the SkePU skeleton programming library (30hp)
By harnessing the computational power of modern GPUs
via General-Purpose Computing on Graphics Processing Units (GPGPU),
very fast calculations can be performed with a GPU cluster.
This thesis project is about implementing
the data-parallel skeletons
in the
SkePU skeleton programming library
for execution
on an MPI Linux cluster of (multicore) nodes, each with one or more GPUs,
and evaluating the implementation with several test programs
including a computationally intensive application.
The overall problem includes developing methods for
determining the optimal partitioning of the problem
and for automated performance tuning for the best use of
resources, possibly in a non-dedicated environment,
as well as devising new SkePU skeletons for
some computation/communication patterns
in the considered scientific computing problem.
An application from computational fluid dynamics
will be used as a case study.
This Master thesis project covers the following tasks:
- Research survey of related work.
- Design and implementation of new skeleton backends in C/C++, MPI and CUDA/OpenCL.
- Skeleton-based refactoring of the given benchmark application and experimental evaluation.
- Documentation of the results in the thesis report.
Begin: ASAP.
Prerequisites: Courses in programming of parallel computers and
GPU computing (TDDC78 and TDDD56 or equivalent).
Good background in OpenCL, CUDA, MPI, C/C++, algorithms, Linux.
Contact: Mudassar Majeed,
Christoph Kessler.
- Communication Sensitive Load Balancing of Irregular
MPI Applications on Clusters (30hp)
Message Passing Interface (MPI) is a standardized and portable
message-passing system to function on a wide variety of parallel
computers with distributed memory architectures.
In many parallel applications from scientific computing,
MPI is used to parallelize time consuming sequential code
to decrease the overall execution time.
In an MPI application, different MPI processes execute
the same code on different data (SPMD style) on different machines
or nodes (each containing several cores) and exchange information
with each other using message passing.
Most of the time, the application (or computation) is irregular,
such that after partitioning, orchestration, communication and mapping,
different MPI processes get different work loads, causing the load
on different nodes to become imbalanced.
Furthermore, increased communication of MPI processes across nodes
increases the overall execution time.
The MPI standard provides a way to give hints to the underlying MPI system
for efficiently mapping the MPI processes to the nodes such that those
MPI processes that communicate more are mapped to the same node.
However, it does not provide a mechanism to guide the mapping of MPI
processes to nodes according to computational load.
The goal of this thesis is to design, implement and evaluate a
communication sensitive load balancing algorithm.
Graph-based or evolutionary algorithms, or heuristics based
on swarm intelligence, may also be considered.
The algorithm will be evaluated using different benchmark
applications on supercomputers.
The Master thesis project covers the following tasks:
- Research survey of related work.
- Design and implementation of the algorithm in C/C++ at the NSC supercomputing center.
- Evaluation on different benchmark applications.
- Documentation of the results in the thesis report.
Begin: ASAP.
Prerequisites: Course in programming of parallel computers
(TDDC78 or equivalent). Good background in C/C++, algorithms, Linux.
Contact: Mudassar Majeed,
Christoph Kessler.
- Multicore Programming with OpenCL (30hp)
Project description by Syntronic AB (PDF)
Prerequisites: TDDD56 Multicore and GPU Programming or TDDC78 Programming of parallel computers, and TDDB68 Concurrent programming and operating systems (or equivalent courses), with completed labs. Good programming skills in C. Good command of the Swedish language in written and oral communication.
Further information and application:
Åsa Detterfelt, asa.detterfelt (at) telia.com, 013-25 05 01.
Examiner: Christoph Kessler, IDA
- Generating target specific code for automatically detected
algorithmic patterns in C source programs (30 ECTS)
The goal of this project is to combine and extend an already existing
tool for automatic pattern recognition in C code with one or several
code generators for specific target back-ends.
Motivation for using patterns:
Describing programs with patterns has three main goals.
The first is that given pattern instance combinations
can easily be mapped to combinations of kernel implementations
for a given architecture; this yields a high level of reuse
of kernel implementations. Secondly, the patterns can be seen
as a high-level programming abstraction for the architecture,
leading to a component-based programming style.
The third goal is to automatically categorize legacy C program
parts as occurrences of patterns, using pattern matching techniques
on existing source code to provide an automated migration path
and improved portability.
Project work description
A prototype of a pattern recognition tool has already been developed in an earlier project. Your task is to combine this tool with code generators in order to generate target-specific code. This work includes, among other things:
- Extend the set of already existing patterns to raise the recognition rate.
- Select target architectures, both high-level ones such as OpenMP or POSIX threads and low-level ones such as different hardware architectures, e.g. GPUs. This can also involve runtime systems such as StarPU.
- Implement the code generators for the selected architectures.
- Introduce performance aware target components.
- Test and evaluate.
(- If the result is satisfactory: write, submit and possibly present a scientific paper about it at a scientific workshop or conference.)
Prerequisites:
TDDB44 Compiler construction or similar course.
Good programming skills in C and Java.
Computer architecture course.
Course in component based software.
Parallel programming course.
Since this is a cross-domain project, you will probably be assigned two supervisors.
Contact for further information:
Supervisor Erik Hansson
or examiner Christoph Kessler (christoph.kessler (at) liu.se)
- Efficient sorting on the ePUMA Multicore DSP Processor (30hp)
Sorting is a classical problem in computer science and often used as a benchmark
for evaluating the performance of novel computer architectures.
The goal of this project is to develop, implement and evaluate efficient
sorting kernels
for the novel ePUMA processor, an energy-efficient multicore DSP
processor for emerging applications in telecommunication and multimedia,
which is being developed in a research project at Linköping University.
We will start from two classical sorting algorithms, mergesort and radix
sort, for integer and fixed-point sorting, first at the local (intra-core) level
and then integrated with inter-core parallelism and off-chip memory access.
The project will find out which algorithm(s) is/are more suitable for this
architecture and how they can possibly be combined and adapted to
exploit this unique architecture effectively.
This is a research-oriented project. If successful, we will support you
in writing a conference paper and, if accepted, presenting it.
Prerequisites: Parallel programming (e.g., TDDC78),
computer architecture.
Experience in C and assembler-level programming.
Data structures and algorithms.
Contact: Erik Hansson, Christoph Kessler
- Sparse-Matrix support for the SkePU library for portable CPU/GPU programming (30hp)
This thesis project will extend the functionality of the
SkePU library for high-level, portable programming of
GPU-based systems, which was developed in
an earlier Master thesis project.
A matrix is called sparse if most of its entries are zeroes
such that a compressed storage format is more time and space efficient
than the traditional 2D array representation.
In this master thesis project you will extend SkePU with
support for sparse matrix computations.
In particular, you will design a smart container data structure for
representation of generic 2D sparse matrices and implement several of the
data-parallel skeletons of SkePU so that they can be applied to sparse matrices
in the same way as to dense matrices, with back-ends in sequential C++,
OpenMP, CUDA and OpenCL.
The implementation will be evaluated quantitatively on several GPU based platforms.
Further information is available on request, see the contact information below.
The library is developed in C++ and OpenMP, with implementations for
CUDA and OpenCL. The prerequisites for this Master thesis
project are good C++ programming skills and knowledge of GPU and parallel
programming (e.g., TDDD56 and TDDC78).
This is a research oriented project.
Contact: Usman Dastgeer, Christoph Kessler.
- [prel. taken] Performance Modeling for CUDA Applications (30hp)
Modern programmable graphics processors (GPUs) offer great speedup for
data-parallel computations by providing thousands of threads and low thread
overhead. However, tuning an application for these unconventional
architectures is still a black art to a major extent.
This thesis project is about investigating the main factors
(including micro-architectural and memory locality features)
that affect the performance of running applications on GPUs.
By understanding this, it will be possible to devise a
performance modeling framework to predict an application's performance
based on certain application and device characteristics.
This could then be used to (re)configure a GPU program before execution
in order to automatically tune its performance.
This is a research-oriented project. If the result looks publishable, we
will encourage you to jointly write and submit a research paper to a
conference, and we will sponsor the presentation if it is accepted.
Further information is available on request.
Prerequisites: Programming in C, parallel programming (TDDC78/TANA77),
computer architecture and basic compiler knowledge.
Contact: Usman Dastgeer, Christoph Kessler
- Implementation and Performance Investigation for a
Motion Estimation Algorithm on a GPU (30hp)
Development cycles for hardware and software products are becoming
shorter and shorter, and time to market strongly
correlates with the success of the product itself.
Nevertheless, in the area of high-resolution video processing,
complex video algorithms must be implemented that operate
on huge data sets. Simulations with general-purpose CPUs are time consuming,
and hardware solutions are not possible in an early definition phase
of the algorithm. There are, however, some new generations of processors
on the market which at least claim to be able to process complex video
algorithms in real time.
The goal of this master thesis is to implement an existing
block-matching algorithm for motion estimation between two video images
on a modern GPU. Motion estimation algorithms are used, for example,
in 120/240 Hz real-time frame rate conversion, and in 3D TV applications
to estimate the disparity between the stereo 3D signals or to convert
S-3D signals to Autostereo-3D signals.
The target is to reach real-time performance for the execution time
of the algorithm; if necessary, the algorithm has to be modified for
that purpose. Furthermore, the performance shall be evaluated,
e.g., as a function of video sequence resolution, frame rate, or bit width.
The Master thesis project covers the following tasks:
- Study of block matching algorithm for motion estimation
- Programming in C/C++ on GPU (NVIDIA Fermi)
- Implementation and performance investigations of an algorithm
- Documentation of the results
Working Environment: PC, NVIDIA-GPU (Cuda or OpenCL)
Begin: ASAP.
Prerequisites:
Course in programming of parallel computers (TDDC78 or equivalent).
Good background in image processing.
C/C++, Linux.
Contact: Christoph Kessler.
The thesis project will be co-supervised by:
Markus Schu, 3D Impact Media, Munich, Germany.
- [taken] Optimized composition of parallel components on a Linux Cluster
(30hp)
Further information on request.
Prerequisites: Programming in C, Linux, some background in parallel programming in MPI, e.g. TDDC78/TANA77.
- Optimized composition of parallel components on a multicore server
(30hp)
Further information on request.
Prerequisites: Good programming skills in C, Linux; Concurrent programming and operating systems e.g. TDDB68, some background in parallel programming with pthreads or OpenMP, e.g. TDDC78/TANA77.
- ePUMA compiler backend (several topics), 30hp
Task description:
ePUMA is
a large joint research project with the
division of computer architecture
at ISY.
This project develops a novel embedded chip multiprocessor design
optimized for the DSP and multimedia domain;
the processor will be used for future radio base
station, radar, and HDTV applications.
We offer several final year projects in the software toolchain
of the ePUMA project; for instance,
designing a LLVM compiler back-end for either the
host processor or the SIMD cores.
The projects will be mainly carried out at the computer architecture
division at ISY but co-supervised and examined by me.
Benefits for students:
You will be paid (after successful
completion of the project) according to your
contributions, and you will learn the methods of design,
verification, and implementation of compilers for embedded
processors.
Prerequisites:
Good knowledge of compilers (e.g., TDDB44, TDDD16 or TDDC86)
and hardware (e.g., you should have taken TSEA44 or an equivalent course,
and it is an advantage if you have taken the course TSEA26.)
Contact: Joar Sohl (joar \at isy.liu.se) and Erik Hansson.
- Bulk-synchronous parallel programming with NestStep on GPGPU (30hp)
NestStep is a C-based PGAS (partitioned global address space)
language for bulk-synchronous parallel (BSP) programming, a
parallel programming model that leads to code that is well-structured,
deadlock-free, and relatively easy to analyze for performance.
In this project, you will port the existing NestStep frontend and
run-time system (which exists today for Linux clusters and Cell BE)
to a GPGPU system, using either CUDA or OpenCL as technical platform.
Prerequisites: TDDC78/TANA77 Programming of parallel computers or
similar course, with completed lab series.
TDDB44 Compiler construction or similar course.
Good programming skills in C/C++, and in Java.
- Source-to-source Translator from Fork to CUDA (30hp)
Modern graphics processing units (GPUs) such as those produced by
NVIDIA and AMD/ATI offer massive computing power for data parallel
computations with hundreds of parallel threads.
At the same time, synchronization is fast.
These are actually properties that are characteristic for the classical
PRAM (Parallel Random Access Machine) model of parallel computation
(see e.g. the book
Practical PRAM Programming).
A previous thesis project in Germany described how
the classical PRAM model
of parallel execution can be mapped to CUDA GPUs
and how especially the PRAM programming language
Fork
(or a subset of it) could be mapped to CUDA.
This project will retarget
the existing Fork compiler to generate code in CUDA,
the current programming platform for modern NVIDIA GPUs,
and develop optimizations in the translation process
to improve performance.
Prerequisites: Programming in C, Reading German language,
Compiler construction (e.g. TDDB44, TDDD16, TDDC86),
Programming parallel computers (e.g. TDDC78).
- Integrated Code Generation in the LLVM Compiler System (30hp)
LLVM is a modern open-source compiler framework. This project will explore
how to apply
and implement our OPTIMIST
approach for integrated code generation in the LLVM compiler system.
Further information on request.
Prerequisites: Required: TDDB44 Compiler Construction or similar course.
Recommended: TDDC86 Compiler Optimizations and Code Generation.
Further thesis projects in compiler technology and
parallel programming
on request (chrke at ida.liu.se).
Responsible for this page: Christoph Kessler, IDA