Responsible for this page: Webmaster, webmaster@ida.liu.se
Page last updated: 2012-01-20
LiU » IDA » SaS » PELAB » Research group on compiler technology and parallel computing » Thesis projects


A - Z | More search functions

[ Go to content ]
Go to LiU.se



PELAB

OPEN MASTER THESIS PROJECTS

Research Group on Compiler Technology and Parallel Computing

Christoph Kessler

The following master thesis projects are currently available in my group:

Note: All projects on this page require a solid background in either compiler construction or parallel programming (preferably both); at least one major course (preferably at master level including programming labs) in these areas should be passed successfully.

Note to non-LIU students: If you want to do a thesis project with us, you must be registered on a master (or bachelor) program at Linköping university. It is generally not possible to do such projects remotely.

  • Automatic translation between parallel programming languages (30 ECTS)
    Today there exist several C-based parallel programming languages which target different parallel processor architectures. Even though many of these architectures have the same programming model, usually each architecture has to be programmed in a (more or less) specific language. These languages tend to have different levels of abstraction, and need to be supported by (vendor) specific tool-chains.
    The outcome of the project will be a tool that translates program source code from a (high-level) C-based parallel language, to one or several different, possibly domain-specific, C based parallel target languages.
    The work includes among other things:
    - Select the target architectures and source/target languages.
    - Extend an existing compiler front-end's intermediate representation (IR) to support both the source language and the target language.
    - Write transformation passes.
    - Select test cases.
    - Perform evaluations.
    Prerequisites: Compiler Construction or Compilers and Interpreters or similar course. Good programming skills in C and Java. Some understanding of multicore programming.
    Contact for further information: Erik Hansson or Christoph Kessler.
  • Communication Sensitive Load Balancing of Irregular MPI Applications on Clusters (30hp)
    Message Passing Interface (MPI) is a standardized and portable message-passing system to function on a wide variety of parallel computers with distributed memory architectures. In many parallel applications from scientific computing, MPI is used to parallelize time consuming sequential code to decrease the overall execution time. In an MPI application, different MPI processes execute the same code on different data (SPMD style) on different machines or nodes (each containing several cores) and exchange information with each other using message passing. Most of the time, the application (or computation) is irregular such that after partitioning, orchestration, communication and mapping, different MPI processes get different work loads, causing the load on different nodes to imbalance. Furthermore, increased communication of MPI processes across nodes increases the overall execution time. The MPI standard provides a way to give hints to the underlying MPI system for efficiently mapping the MPI processes to the nodes such that those MPI processes that communicate more are mapped to the same node. But it does not support the mechanism to guide the mapping of MPI processes to nodes according to computational load.
    The goal of this thesis is to design, implement and evaluate a communication sensitive load balancing algorithm. Some graph based, evolutionary algorithms or heuristics based on swarm intelligence can also be considered. The algorithm will be evaluated using different benchmark applications on supercomputers.
    The Master thesis project covers the following tasks:
    - Research survey of related work.
    - Design and implementation of algorithm in C/C++ on NSC supercomputing center.
    - Evaluation on different benchmark applications.
    - Documentation of the results in thesis report.
    Begin: ASAP.
    Prerequisites: Course in programming of parallel computers (TDDC78 or equivalent). Good background in C/C++, algorithms, Linux.
    Contact: Christoph Kessler, Mudassar Majeed (mudassar.majeed@liu.se)
  • Multicore-Programmering med OpenCL (30hp)
    Project description by Syntronic AB (PDF)
    Prerequisites: TDDD56 Multicore and GPU Programming or TDDC78 Programming of parallel computers, and TDDB68 Concurrent programming and operating systems (or equivalent courses), with completed labs. Good programming skills in C. Good commandment of Swedish language in written and oral communication.
    Further information and application: Åsa Detterfelt, asa.detterfelt (at) telia.com, 013-25 05 01.
    Examinator: Christoph Kessler, IDA
  • Generating target specific code for automatically detected algorithmic patterns in C source programs (30 ECTS)
    The goal of this project is to combine and extend an already existing tool for automatic pattern recognition in C code with one or several code generators for specific target back-ends.
    Motivation of using patterns: Using patterns to describe programs has three main goals. The first one is that given pattern instance combinations can easily be mapped to combinations of kernel implementations for a given architecture; this yields a high level of reuse of kernel implementations. Secondly the patterns can be seen as a high level programming abstraction for the architecture, leading to a component-based programming style. The third goal is to automatically categorizing legacy C program parts as occurrences of patterns, using pattern matching techniques on existing source code to provide an automated migration path and improved portability.
    Project work description
    A prototype of a pattern recognition tool has already been developed in an earlier project. Now it is your task is to combine this tool with code generators to be a able to generate target specific code. This work includes among other things:
    - Extend the set of already existing patterns to raise the recognition rate.
    - Select target architectures, both high-level such as OpenMP, Posix threads, etc and low level such as different hardware architectures, such as GPUs. It can also involve runtime systems such as StarPU.
    - Implement the code generators for the selected architectures.
    - Introduce performance aware target components.
    - Test and evaluate.
    (- If the result is satisfactory: write, submit and possibly present a scientific paper about it at a scientific workshop or conference.)
    Prerequisites: TDDB44 Compiler construction or similar course. Good programming skills in C and Java. Computer architecture course. Course in component based software. Parallel programming course.
    Since this is a cross domain project work you will probably be assigned two supervisors (handledare).
    Contact for further information:
    Supervisor Erik Hansson or examiner Christoph Kessler (christoph.kessler (at) liu.se)
  • Efficient sorting on the ePUMA Multicore DSP Processor (30hp)
    Sorting is a classical problem in computer science and often used as a benchmark for evaluating the performance of novel computer architectures.
    The goal of this project is to develop, implement and evaluate efficient sorting kernels for the novel ePUMA processor, an energy-efficient multicore DSP processor for emerging applications in telecommunication and multimedia, which is being developed in a research project at Linköping University.
    We will start from two classical sorting algorithms, mergesort and radix sort, for integer and fixed-point sorting, first at the local (intra-core) level and then integrated with inter-core parallelism and off-chip memory access. The project will find out which algorithm(s) is/are more suitable for this architecture and how they can possibly be combined and adapted to exploit this unique architecture effectively.
    This is a research-oriented project. If successful, we will support you in writing a conference paper and, if accepted, presenting it.
    Prerequisites: Parallel programming (e.g., TDDC78), computer architecture. Experience in C and assembler-level programming. Data structures and algorithms.
    Contact: Erik Hansson, Christoph Kessler
  • Extension of the SkePU library for portable CPU/GPU programming (30hp)
    This thesis is about extending the functionality of the SkePU library for high-level, portable programming of GPU-based systems, which was developed in an earlier Master thesis project. Support for different combinations of data parallel skeletons for sparse 2D (Matrix) data structures will be a major contribution of this Master thesis project. Furthermore, support for some task parallel skeletons (Farm, Pipe) can be considered. Further information is available on request, see the contact information below.
    The library is developed in C++, OpenMP, and has implementations for CUDA and OpenCL. The prerequisites for this Master thesis project are good C++ programming skills and knowledge of GPU and parallel programming (e.g., TDDD56 and TDDC78).
    This is a research oriented project.
    Contact: Usman Dastgeer, Christoph Kessler.
  • [prel. taken] Performance Modeling for CUDA Applications (30hp)
    Modern programmable graphics processors (GPUs) offer great speedup for data-parallel computations by providing thousands of threads and low thread overhead. However, tuning an application for these unconventional architectures is still a black art to a major extent.
    This thesis project is about investigating the main factors (including micro-architectural and memory locality features) that affect the performance of running applications on GPUs. By understanding this, it will be possible to devise a performance modeling framework to predict an application's performance based on certain application and device characteristics. This could then be used to (re)configure a GPU program before execution in order to automatically tune its performance.
    This is a research oriented project. If the result looks publishable, we will encourage you to jointly write and submit a research paper to a conference and sponsor presentation if accepted.
    Further information is available on request.
    Prerequisites: Programming in C, parallel programming (TDDC78/TANA77), computer architecture and basic compiler knowledge.
    Contact: Usman Dastgeer, Christoph Kessler
  • Implementation and Performance Investigation for a Motion Estimation Algorithm on a GPU (30hp)
    The development cycles for hard- and software products will become shorter and shorter. The product introduction time shows a strong correlation to the success of the product itself. Nevertheless, in the area of high resolution video processing it is required to implement complex video algorithms, which operate on a huge date set. Simulations with general purpose CPUs are time consuming and hardware solutions are not possible in an early definition phase of the algorithm. There are however some new generations of processors on the market, which at least claim to be able to process complex video algorithms in real time.
    The goal of this master thesis shall be to implement an existing block-matching algorithm for motion estimation between two video images on a modern GPU. Motion estimation algorithms are for example used in 120/240 Hz for real time frame rate conversion or in 3D TV applications to estimate the disparity between the stereo 3D signals or to convert the S-3D signals to Autostereo-3D signals.
    The target is to reach real-time performance for the execution time of the algorithm. If necessary the algorithm has to be modified for that purpose. Furthermore the performance shall be evaluated e.g. in dependence on the video sequence resolution, frame rate or bit width.
    The Master thesis project covers the following tasks:
    - Study of block matching algorithm for motion estimation
    - Programming in C/C++ on GPU (NVIDIA Fermi)
    - Implementation and performance investigations of an algorithm
    - Documentation of the results
    Working Environment: PC, NVIDIA-GPU (Cuda or OpenCL)
    Begin: asap.
    Prerequisites: Course in programming of parallel computers (TDDC78 or equivalent). Good background in image processing. C/C++, Linux.
    Contact: Christoph Kessler.
    The thesis project will be co-supervised by:
    Markus Schu, 3D Impact Media, Munich, Germany.

  • [taken] Optimized composition of parallel components on a Linux Cluster (30hp)
    Further information on request.
    Prerequisites: Programming in C, Linux, some background in parallel programming in MPI, e.g. TDDC78/TANA77.

  • Optimized composition of parallel components on a multicore server (30hp)
    Further information on request.
    Prerequisites: Good programming skills in C, Linux; Concurrent programming and operating systems e.g. TDDB68, some background in parallel programming with pthreads or OpenMP, e.g. TDDC78/TANA77.
  • ePUMA compiler backend (several topics), 30hp
    Task description: ePUMA is a large joint research project with the division of computer architecture at ISY. This project develops a novel embedded chip multiprocessor design optimized for the DSP and multimedia domain; the processor will be used for future radio base station, radar, and HDTV applications.
    We offer several final year projects in the software toolchain of the ePUMA project; for instance, designing a LLVM compiler back-end for either the host processor or the SIMD cores.
    The projects will be mainly carried out at the computer architecture division at ISY but co-supervised and examinated by me.
    The benefits students can get for these projects: You will be paid (after successful completion of the project) according to your contributions. You will learn the methods of design, verification, and implementation of compilers for embedded processors.
    Prerequisites: Good knowledge of compilers (e.g., TDDB44, TDDD16 or TDDC86) and hardware (e.g., you should have taken TSEA44 or an equivalent course, and it is an advantage if you have taken the course TSEA26.)
    Contact: Joar Sohl (joar \at isy.liu.se) and Erik Hansson.
  • Bulk-synchronous parallel programming with NestStep on GPGPU (30hp)
    NestStep is a C-based PGAS (partitioned global address space) language for bulk-synchronous parallel (BSP) programming, a parallel programming model that leads to code that is well-structured, deadlock-free, and relatively easy to analyze for performance.
    In this project, you will port the existing NestStep frontend and run-time system (which exists today for Linux clusters and Cell BE) to a GPGPU system, using either CUDA or OpenCL as technical platform.
    Prerequisites: TDDC78/TANA77 Programming of parallel computers or similar course, with completed lab series. TDDB44 Compiler construction or similar course. Good programming skills in C/C++, and in Java.
  • Source-to-source Translator from Fork to CUDA (30hp)
    Modern graphics processing units (GPUs) such as those produced by NVIDIA and AMD/ATI offer massive computing power for data parallel computations with hundreds of parallel threads. At the same time, synchronization is fast.
    These are actually properties that are characteristic for the classical PRAM (Parallel Random Access Machine) model of parallel computation (see e.g. the book Practical PRAM Programming).
    A previous thesis project in Germany described how how the classical PRAM model of parallel execution can be mapped to CUDA GPUs and how especially the PRAM programming language Fork (or a subset of it) could be mapped to CUDA.
    This project will retarget the existing Fork compiler to generate code in CUDA, the current programming platform for modern NVIDIA GPUs, and develop optimizations in the translation process to improve performance.
    Prerequisites: Programming in C, Reading German language, Compiler construction (e.g. TDDB44, TDDD16, TDDC86), Programming parallel computers (e.g. TDDC78).
  • Parallel PRAM simulation with thread migration on a Multicore server (30 hp)
    pramsim is a Solaris/Linux-based simulator for the SB-PRAM, a parallel computer with the strongest parallel programming model ever realized, a so-called Combining CRCW-PRAM. The PRAM model postulates that a set of of processors run synchronously at the level of assembler instructions. They have uniform access to a shared memory, and in addition each can access a local memory. The differentiation between both memories is by the most significant address bit. The shared memory can be accessed also by combining instructions such as global reductions and multiprefix operations. The instruction memory is separate and local to each processor. More details about the SB-PRAM and pramsim can be found in Keller, Kessler, Träff: Practical PRAM Programming, Wiley, 2001. Today, pramsim is used e.g. for teaching and research on parallel algorithms.
    Simulating a parallel computer on a sequential one results in very long simulation times, and naturally leads to the idea to run pramsim on a parallel computer. Various approaches for parallelization on shared-memory and distributed-memory systems such as Linux clusters have been proposed in the literature, those for shared memory systems look most promising, but the problem is challenging. In this project, an alternative approach based on thread migration should be implemented, evaluated, and compared to the previous approaches.
    Prerequisites: TDDC78 Programming of parallel computers, or equivalent background in parallel computing. Good programming skills in C. Some knowledge of Linux.

  • Integrated Code Generation in the LLVM Compiler System (30hp)
    LLVM is a modern open-source compiler framework. This project will explore how to apply and implement our OPTIMIST approach for integrated code generation in the LLVM compiler system. Further information on request.
    Prerequisites: Required: TDDB44 Compiler Construction or similar course. Recommended: TDDC86 Compiler Optimizations and Code Generation.

Further thesis projects in compiler technology and parallel programming
on request (chrke at ida.liu.se).

Back to my master thesis students page

More thesis projects at PELAB



Responsible for this page: Christoph Kessler, IDA