PELAB

OPEN MASTER THESIS PROJECTS

Research Group on Compiler Technology and Parallel Computing

Christoph Kessler

The following master thesis projects are currently available in my group:

Note:

All projects on this page require a solid background in either compiler construction or parallel programming (preferably both); at least one major course (preferably at master level including programming labs) in these areas should be passed successfully.

Note to non-LIU students (FAQ): If you want to do a thesis project with us, you must be registered in a master (or bachelor) program at Linköping University. It is generally not possible to do such projects remotely.

  • OpenMP for the REPLICA architecture (30hp)

    OpenMP is a popular extension of programming languages such as C for thread-based parallel programming. It allows the programmer to (re-)write applications to utilize modern multicore processors with shared memory in a relatively easy and high-level way (compared to threading libraries). In OpenMP, the programmer annotates regions of a sequential program, in particular loops, with directives (pragmas) that instruct the compiler to generate code that can be executed in parallel by multiple threads.
    REPLICA is a new PRAM-like multicore hardware architecture with configurable emulated shared memory (CESM).
    The goal of this master thesis is to write an OpenMP-to-REPLICA translator/compiler, so that existing OpenMP programs can be compiled for and executed on the REPLICA architecture. The work may also include defining new REPLICA-specific extensions to OpenMP to make use of the hardware features in an efficient way.
    The implementation can be based on an already existing compiler/translator framework.
    Prerequisites: Parallel programming. Background in C/C++, assembly programming, and compiler construction.
    Contact: Erik Hansson or Christoph Kessler.

  • Parallelizing the NEMO ocean model application for GPU-based systems using the SkePU skeleton programming library (30hp)
    Requirements: Good C/C++ coding skills are essential. Good knowledge of Fortran and MPI. Knowledge of C + Fortran mixed programming. Knowledge of CUDA and/or OpenCL would be useful. We recommend TDDD56 Multicore and GPU Programming and TDDC78 Programming parallel computers.
    Contact:
    Supervisor at NSC: Johan Raber, e-Science coordinator, NSC (@nsc.liu.se)
    Supervisor at IDA: Usman Dastgeer, IDA
    Examiner: Christoph Kessler, IDA
  • Evaluating and Comparing Parallel Programming Models, Languages and Hardware Platforms for Scalable Shared-Memory Computing (30hp)
    The purpose of this master thesis is to quantitatively evaluate the REPLICA architecture, a family of novel scalable chip multiprocessors with configurable emulated shared memory (CESM), whose computation model is based on the PRAM model.
    By investigating REPLICA's programming model and programming language and comparing them both to similar parallel architectures and to more diverse ones, the goal is to find out which types of algorithms are highly (or less) suitable for REPLICA, and also to show how REPLICA is positioned compared to existing architectures, both in performance and in complexity of programming.
    The work includes (but is not limited to):
    - Select different architectures to compare with, such as SB-PRAM, XMT, GPGPUs, multicore CPUs, etc.
    - Select a set of parallel computational kernels from existing well-known benchmarks and literature.
    - Implement/port the kernels to the selected architectures/languages.
    - Evaluate and compare.
    You will gain both theoretical and practical experience in programming a wide range of existing parallel architectures.
    Prerequisites: Parallel programming. Computer architecture. Good background in C/C++, assembly programming, and basic compiler construction.
    Contact for further information: Erik Hansson or Christoph Kessler.
  • Extension and evaluation of a run-time system for synchronous parallel tasks (30 ECTS)
    In an earlier pilot study we demonstrated the feasibility of a task-based runtime system for a PRAM-like shared memory parallel system that handles the creation, scheduling, resource allocation and dispatching of single-threaded and multi-threaded (parallel) tasks. This project will improve the runtime system design and its implementation, add an automated tuning feature, experiment with various resource allocation and scheduling strategies, and systematically evaluate it with a number of irregular parallel example applications. The implementation will use the C-based parallel language Fork and the SB-PRAM simulator for execution.
    Literature:
    J. Keller, C. Kessler, J. Träff: Practical PRAM Programming, Wiley Interscience, 2001.
    C. Kessler, E. Hansson: Flexible Scheduling and Thread Allocation for Synchronous Parallel Tasks. Proc. PASA-2012 workshop, Munich, Germany, Feb. 2012.
    Prerequisites: TDDB68 Concurrent programming and operating systems and a course in parallel/multicore programming such as TDDD56 or TDDC78. Good C programming skills. Linux or Solaris. Some familiarity with assembler programming is helpful.
    Contact: Erik Hansson, Christoph Kessler
  • [taken] A Cluster Back-End for the SkePU skeleton programming library (30hp)
    By harnessing the computational power of modern GPUs via General-Purpose Computing on Graphics Processing Units (GPGPU), very fast calculations can be performed with a GPU cluster.
    This thesis project is about implementing the data-parallel skeletons in the SkePU skeleton programming library for execution on an MPI Linux cluster of (multicore) nodes, each with one or more GPUs, and evaluating the implementation with several test programs including a computationally intensive application.
    The overall problem includes developing methods for determining the optimal partitioning of the problem and automated performance tuning for the best use of resources, possibly in a non-dedicated environment, as well as devising new SkePU skeletons for computation/communication patterns in the considered scientific computing problem. An application from computational fluid dynamics will be used as a case study.
    This Master thesis project covers the following tasks:
    - Research survey of related work.
    - Design and implementation of new skeleton backends in C/C++, MPI and CUDA/OpenCL.
    - Skeleton-based refactoring of the given benchmark application and experimental evaluation.
    - Documentation of the results in a thesis report.
    Begin: ASAP.
    Prerequisites: Courses in programming of parallel computers and GPU computing (TDDC78 and TDDD56 or equivalent). Good background in OpenCL, CUDA, MPI, C/C++, algorithms, Linux.
    Contact: Mudassar Majeed, Christoph Kessler.
  • Communication Sensitive Load Balancing of Irregular MPI Applications on Clusters (30hp)
    Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on a wide variety of parallel computers with distributed memory architectures. In many parallel applications from scientific computing, MPI is used to parallelize time-consuming sequential code to decrease the overall execution time. In an MPI application, different MPI processes execute the same code on different data (SPMD style) on different machines or nodes (each containing several cores) and exchange information with each other using message passing. Most of the time, the application (or computation) is irregular, so that after partitioning, orchestration, communication and mapping, different MPI processes get different work loads, causing the load on the nodes to become imbalanced. Furthermore, increased communication between MPI processes across nodes increases the overall execution time. The MPI standard provides a way to give hints to the underlying MPI system for efficiently mapping the MPI processes to the nodes such that processes that communicate more are mapped to the same node, but it does not provide a mechanism to guide the mapping of MPI processes to nodes according to computational load.
    The goal of this thesis is to design, implement and evaluate a communication-sensitive load balancing algorithm. Graph-based algorithms, evolutionary algorithms, or heuristics based on swarm intelligence can also be considered. The algorithm will be evaluated using different benchmark applications on supercomputers.
    The Master thesis project covers the following tasks:
    - Research survey of related work.
    - Design and implementation of the algorithm in C/C++ at the NSC supercomputing center.
    - Evaluation on different benchmark applications.
    - Documentation of the results in a thesis report.
    Begin: ASAP.
    Prerequisites: Course in programming of parallel computers (TDDC78 or equivalent). Good background in C/C++, algorithms, Linux.
    Contact: Mudassar Majeed, Christoph Kessler.
  • Multicore Programming with OpenCL (30hp)
    Project description by Syntronic AB (PDF)
    Prerequisites: TDDD56 Multicore and GPU Programming or TDDC78 Programming of parallel computers, and TDDB68 Concurrent programming and operating systems (or equivalent courses), with completed labs. Good programming skills in C. Good command of Swedish in written and oral communication.
    Further information and application: Åsa Detterfelt, asa.detterfelt (at) telia.com, 013-25 05 01.
    Examiner: Christoph Kessler, IDA
  • Generating target specific code for automatically detected algorithmic patterns in C source programs (30 ECTS)
    The goal of this project is to combine and extend an already existing tool for automatic pattern recognition in C code with one or several code generators for specific target back-ends.
    Motivation for using patterns: Using patterns to describe programs has three main goals. The first is that given pattern instance combinations can easily be mapped to combinations of kernel implementations for a given architecture; this yields a high level of reuse of kernel implementations. Second, the patterns can be seen as a high-level programming abstraction for the architecture, leading to a component-based programming style. The third is to automatically categorize legacy C program parts as occurrences of patterns, using pattern matching techniques on existing source code to provide an automated migration path and improved portability.
    Project work description
    A prototype of a pattern recognition tool has already been developed in an earlier project. Your task is now to combine this tool with code generators to be able to generate target-specific code. This work includes, among other things:
    - Extend the set of already existing patterns to raise the recognition rate.
    - Select target architectures, both high-level (such as OpenMP or POSIX threads) and low-level (such as different hardware architectures, e.g. GPUs). This can also involve runtime systems such as StarPU.
    - Implement the code generators for the selected architectures.
    - Introduce performance aware target components.
    - Test and evaluate.
    (- If the result is satisfactory: write, submit and possibly present a scientific paper about it at a scientific workshop or conference.)
    Prerequisites: TDDB44 Compiler construction or similar course. Good programming skills in C and Java. Computer architecture course. Course in component based software. Parallel programming course.
    Since this is a cross-domain project, you will probably be assigned two supervisors.
    Contact for further information:
    Supervisor Erik Hansson or examiner Christoph Kessler (christoph.kessler (at) liu.se)
  • Efficient sorting on the ePUMA Multicore DSP Processor (30hp)
    Sorting is a classical problem in computer science and often used as a benchmark for evaluating the performance of novel computer architectures.
    The goal of this project is to develop, implement and evaluate efficient sorting kernels for the novel ePUMA processor, an energy-efficient multicore DSP processor for emerging applications in telecommunication and multimedia, which is being developed in a research project at Linköping University.
    We will start from two classical sorting algorithms, mergesort and radix sort, for integer and fixed-point sorting, first at the local (intra-core) level and then integrated with inter-core parallelism and off-chip memory access. The project will find out which algorithm(s) is/are more suitable for this architecture and how they can possibly be combined and adapted to exploit this unique architecture effectively.
    This is a research-oriented project. If successful, we will support you in writing a conference paper and, if accepted, presenting it.
    Prerequisites: Parallel programming (e.g., TDDC78), computer architecture. Experience in C and assembler-level programming. Data structures and algorithms.
    Contact: Erik Hansson, Christoph Kessler
  • Sparse-Matrix support for the SkePU library for portable CPU/GPU programming (30hp)
    This thesis project will extend the functionality of the SkePU library for high-level, portable programming of GPU-based systems, which was developed in an earlier Master thesis project.
    A matrix is called sparse if most of its entries are zeroes such that a compressed storage format is more time and space efficient than the traditional 2D array representation. In this master thesis project you will extend SkePU with support for sparse matrix computations. In particular, you will design a smart container data structure for representation of generic 2D sparse matrices and implement several of the data-parallel skeletons of SkePU so that they can be applied to sparse matrices in the same way as to dense matrices, with back-ends in sequential C++, OpenMP, CUDA and OpenCL. The implementation will be evaluated quantitatively on several GPU based platforms. Further information is available on request, see the contact information below.
    The library is developed in C++, OpenMP, and has implementations for CUDA and OpenCL. The prerequisites for this Master thesis project are good C++ programming skills and knowledge of GPU and parallel programming (e.g., TDDD56 and TDDC78).
    This is a research oriented project.
    Contact: Usman Dastgeer, Christoph Kessler.
  • [prel. taken] Performance Modeling for CUDA Applications (30hp)
    Modern programmable graphics processors (GPUs) offer great speedup for data-parallel computations by providing thousands of threads and low thread overhead. However, tuning an application for these unconventional architectures is still a black art to a major extent.
    This thesis project is about investigating the main factors (including micro-architectural and memory locality features) that affect the performance of running applications on GPUs. By understanding this, it will be possible to devise a performance modeling framework to predict an application's performance based on certain application and device characteristics. This could then be used to (re)configure a GPU program before execution in order to automatically tune its performance.
    This is a research oriented project. If the result looks publishable, we will encourage you to jointly write and submit a research paper to a conference and sponsor presentation if accepted.
    Further information is available on request.
    Prerequisites: Programming in C, parallel programming (TDDC78/TANA77), computer architecture and basic compiler knowledge.
    Contact: Usman Dastgeer, Christoph Kessler
  • Implementation and Performance Investigation for a Motion Estimation Algorithm on a GPU (30hp)
    The development cycles for hardware and software products are becoming shorter and shorter, and time to market strongly correlates with the success of the product itself. Nevertheless, in the area of high-resolution video processing it is necessary to implement complex video algorithms that operate on huge data sets. Simulations with general-purpose CPUs are time-consuming, and hardware solutions are not possible in an early definition phase of the algorithm. There are, however, new generations of processors on the market which at least claim to be able to process complex video algorithms in real time.
    The goal of this master thesis is to implement an existing block-matching algorithm for motion estimation between two video images on a modern GPU. Motion estimation algorithms are used, for example, in 120/240 Hz real-time frame rate conversion, or in 3D TV applications to estimate the disparity between stereo 3D signals or to convert S-3D signals to Autostereo-3D signals.
    The target is to reach real-time performance for the execution of the algorithm. If necessary, the algorithm has to be modified for that purpose. Furthermore, the performance shall be evaluated, e.g. depending on video sequence resolution, frame rate, or bit width.
    The Master thesis project covers the following tasks:
    - Study of block matching algorithm for motion estimation
    - Programming in C/C++ on GPU (NVIDIA Fermi)
    - Implementation and performance investigations of an algorithm
    - Documentation of the results
    Working Environment: PC, NVIDIA-GPU (Cuda or OpenCL)
    Begin: ASAP.
    Prerequisites: Course in programming of parallel computers (TDDC78 or equivalent). Good background in image processing. C/C++, Linux.
    Contact: Christoph Kessler.
    The thesis project will be co-supervised by:
    Markus Schu, 3D Impact Media, Munich, Germany.

  • [taken] Optimized composition of parallel components on a Linux Cluster (30hp)
    Further information on request.
    Prerequisites: Programming in C, Linux, some background in parallel programming in MPI, e.g. TDDC78/TANA77.

  • Optimized composition of parallel components on a multicore server (30hp)
    Further information on request.
    Prerequisites: Good programming skills in C, Linux; Concurrent programming and operating systems e.g. TDDB68, some background in parallel programming with pthreads or OpenMP, e.g. TDDC78/TANA77.
  • ePUMA compiler backend (several topics), 30hp
    Task description: ePUMA is a large joint research project with the division of computer architecture at ISY. This project develops a novel embedded chip multiprocessor design optimized for the DSP and multimedia domain; the processor will be used for future radio base station, radar, and HDTV applications.
    We offer several final year projects in the software toolchain of the ePUMA project; for instance, designing a LLVM compiler back-end for either the host processor or the SIMD cores.
    The projects will be mainly carried out at the computer architecture division at ISY but co-supervised and examined by me.
    The benefits students can get for these projects: You will be paid (after successful completion of the project) according to your contributions. You will learn the methods of design, verification, and implementation of compilers for embedded processors.
    Prerequisites: Good knowledge of compilers (e.g., TDDB44, TDDD16 or TDDC86) and hardware (e.g., you should have taken TSEA44 or an equivalent course, and it is an advantage if you have taken the course TSEA26.)
    Contact: Joar Sohl (joar \at isy.liu.se) and Erik Hansson.
  • Bulk-synchronous parallel programming with NestStep on GPGPU (30hp)
    NestStep is a C-based PGAS (partitioned global address space) language for bulk-synchronous parallel (BSP) programming, a parallel programming model that leads to code that is well-structured, deadlock-free, and relatively easy to analyze for performance.
    In this project, you will port the existing NestStep frontend and run-time system (which exists today for Linux clusters and Cell BE) to a GPGPU system, using either CUDA or OpenCL as technical platform.
    Prerequisites: TDDC78/TANA77 Programming of parallel computers or similar course, with completed lab series. TDDB44 Compiler construction or similar course. Good programming skills in C/C++, and in Java.
  • Source-to-source Translator from Fork to CUDA (30hp)
    Modern graphics processing units (GPUs) such as those produced by NVIDIA and AMD/ATI offer massive computing power for data parallel computations with hundreds of parallel threads. At the same time, synchronization is fast.
    These are actually properties that are characteristic for the classical PRAM (Parallel Random Access Machine) model of parallel computation (see e.g. the book Practical PRAM Programming).
    A previous thesis project in Germany described how the classical PRAM model of parallel execution can be mapped to CUDA GPUs, and in particular how the PRAM programming language Fork (or a subset of it) could be mapped to CUDA.
    This project will retarget the existing Fork compiler to generate code in CUDA, the current programming platform for modern NVIDIA GPUs, and develop optimizations in the translation process to improve performance.
    Prerequisites: Programming in C, reading knowledge of German, compiler construction (e.g. TDDB44, TDDD16, TDDC86), programming of parallel computers (e.g. TDDC78).
  • Integrated Code Generation in the LLVM Compiler System (30hp)
    LLVM is a modern open-source compiler framework. This project will explore how to apply and implement our OPTIMIST approach for integrated code generation in the LLVM compiler system. Further information on request.
    Prerequisites: Required: TDDB44 Compiler Construction or similar course. Recommended: TDDC86 Compiler Optimizations and Code Generation.

Further thesis projects in compiler technology and parallel programming
on request (chrke at ida.liu.se).

Back to my master thesis students page

More thesis projects at PELAB



Responsible for this page: Christoph Kessler, IDA

Last updated: 2012-10-09