Note:
Most projects in this list require a solid background in compiler construction, parallel programming, or both; at least one major course in these areas (preferably at master level, including programming labs) should have been passed successfully. Specific prerequisites are listed below. Note to non-LiU students (FAQ): If you want to do a thesis project with us, you must be registered in a master (or bachelor) program at Linköping University. It is generally not possible to do such projects remotely.
[TAKEN (F.B.)] SkePU backend for CUDA tensor cores (30hp)
Background:
High-level parallel programming aims to abstract challenging aspects of
parallel and heterogeneous systems for non-expert programmers. Algorithmic
skeletons are an interface approach based on computational patterns, such
as map, reduce, and stencil operations. These patterns can be instantiated
by providing a custom operator ("user-function"), which is then applied to
a supplied dataset in parallel according to the particular pattern semantics.
Skeleton programming frameworks and libraries such as SkePU implement
skeletons as C++ constructs and provide "backends" for parallelism in
multi-core CPUs, GPU accelerators, and multi-node clusters. The skeletons
are typically provided as libraries, or in the case of SkePU, as a framework
with both a library and a custom compiler toolchain.
The SkePU library is implemented in modern C++ and involves template metaprogramming.
SkePU is a long-term open-source effort at PELAB, Linköping University.
This project will explore AI accelerator architectures such as Google TPUs,
NPUs, and Nvidia Tensor Cores as new SkePU targets. Implementation and experimentation
work will be limited to one of these platforms, namely Tensor Cores.
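To make the skeleton idea concrete, the following is a minimal sketch of a map skeleton in plain C++. It is an illustration under stated assumptions, not SkePU's API: the names `map_skeleton` and `add` are hypothetical, and parallelism comes from `std::thread` rather than from any SkePU backend. It shows the essential split between a generic parallel pattern and a problem-specific user-function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical minimal "map" skeleton (NOT the SkePU API). It applies a
// user-function element-wise to two equally long input vectors and splits
// the index range into contiguous chunks, one per std::thread.
template <typename T, typename UserFunc>
std::vector<T> map_skeleton(UserFunc f, const std::vector<T>& a,
                            const std::vector<T>& b, unsigned num_threads = 4)
{
    assert(a.size() == b.size());
    assert(num_threads > 0);
    const std::size_t n = a.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<T> result(n);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break; // fewer elements than threads
        // Each worker writes a disjoint slice of result, so no locking is needed.
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                result[i] = f(a[i], b[i]);
        });
    }
    for (auto& w : workers) w.join();
    return result;
}

// Example user-function: a side-effect-free scalar operation.
inline float add(float x, float y) { return x + y; }

// Usage: auto sum = map_skeleton(add, v1, v2); // sum[i] == v1[i] + v2[i]
```

The user-function is deliberately free of side effects, since the skeleton may call it concurrently and in any order.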
Task:
This thesis project will explore several different parallel AI accelerators,
identify methods for efficiently mapping BLAS and CNN operations to them,
develop a new SkePU backend for Nvidia tensor cores,
and evaluate its performance. The possibilities and performance implications of hybrid computing
involving both CUDA and tensor cores in parallel [Ho et al. 2022]
should also be investigated, possibly taking inspiration from earlier work demonstrating
hybrid computing on CPU and CUDA cores [Öhberg et al. 2020].
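Mapping a BLAS operation such as GEMM to tensor cores largely comes down to decomposing it into fixed-size matrix tiles. The sketch below emulates this decomposition on the CPU: `tile_mma` stands in for a single tensor-core multiply-accumulate on 16x16x16 tiles (one of the fragment shapes offered by CUDA's WMMA API for half-precision inputs), and `tiled_gemm` issues the GEMM tile by tile. The `Matrix` type and both functions are hypothetical; everything is computed in single precision on the CPU, whereas an actual backend would use WMMA or a library such as cuBLAS on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tile edge length, chosen to match a 16x16x16 tensor-core operation.
constexpr std::size_t TILE = 16;

// Hypothetical row-major dense matrix used only for this sketch.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Emulates one tensor-core operation C_tile += A_tile * B_tile.
// Elements outside the matrix bounds are skipped, which is equivalent to
// zero-padding the matrices up to a multiple of TILE.
void tile_mma(const Matrix& A, const Matrix& B, Matrix& C,
              std::size_t ti, std::size_t tj, std::size_t tk)
{
    for (std::size_t i = ti; i < ti + TILE && i < C.rows; ++i)
        for (std::size_t j = tj; j < tj + TILE && j < C.cols; ++j) {
            float acc = C.at(i, j);
            for (std::size_t k = tk; k < tk + TILE && k < A.cols; ++k)
                acc += A.at(i, k) * B.at(k, j);
            C.at(i, j) = acc;
        }
}

// C = A * B, decomposed into a grid of TILE x TILE x TILE operations:
// the granularity at which a tensor-core backend would have to issue work.
Matrix tiled_gemm(const Matrix& A, const Matrix& B)
{
    assert(A.cols == B.rows);
    Matrix C(A.rows, B.cols);
    for (std::size_t ti = 0; ti < C.rows; ti += TILE)
        for (std::size_t tj = 0; tj < C.cols; tj += TILE)
            for (std::size_t tk = 0; tk < A.cols; tk += TILE)
                tile_mma(A, B, C, ti, tj, tk);
    return C;
}
```

Matrices whose dimensions are not multiples of the tile size need padding or edge handling, which is one of the mapping questions such a backend has to answer.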
Prerequisites: TDDD56 Multicore and GPU Programming (mandatory),
Advanced Programming in C++ (mandatory), TDDE65 Programming of parallel computers (recommended). Linux programming skills.
Contact:
August Ernstsson,
Christoph Kessler
High-level program optimization in the SkePU precompiler (30hp)
Background:
High-level parallel programming aims to abstract challenging aspects of
parallel and heterogeneous systems for non-expert programmers. Algorithmic
skeletons are an interface approach based on computational patterns, such
as map, reduce, and stencil operations. These patterns can be instantiated
by providing a custom operator ("user-function"), which is then applied to
a supplied dataset in parallel according to the particular pattern semantics.
Skeleton programming frameworks and libraries such as SkePU implement
skeletons as C++ constructs and provide "backends" for parallelism in
multi-core CPUs, GPU accelerators, and multi-node clusters. The skeletons
are typically provided as libraries, or in the case of SkePU, as a framework
with both a library and a custom pre-compiler toolchain.
The pre-compiler is a source-to-source compiler based on LLVM clang.
It performs a rather light-weight source code transformation; in particular, it
generates platform-specific code variants from the user-functions that
can be used with the different SkePU back-ends.
SkePU is a long-term open-source effort at PELAB, Linköping University.
Task:
In this project, the pre-compiler will be extended by more advanced
code transformations. For example, pattern-matching on the clang intermediate
program representation can be applied to rewrite identified code structures
into equivalent ones that are better supported in SkePU or in platform-specific libraries.
Please contact us directly for further information.
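As a toy illustration of such pattern-based rewriting, the sketch below matches the pattern `(x * y) + z` in a small expression tree and rewrites it into a call `fma(x, y, z)`, a fused multiply-add that many platforms support natively. The `Expr` structure and all helper names are hypothetical stand-ins; the actual pre-compiler would operate on clang's AST (for example via clang's AST matchers), which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy expression tree standing in for a compiler's program representation
// (hypothetical; clang's real AST is far richer).
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
    std::string kind;          // "var", "add", "mul" or "call"
    std::string name;          // variable name or callee name
    std::vector<ExprPtr> args; // operands or call arguments
};

ExprPtr make_var(const std::string& n) { return std::make_shared<Expr>(Expr{"var", n, {}}); }
ExprPtr make_add(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{"add", "", {a, b}}); }
ExprPtr make_mul(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{"mul", "", {a, b}}); }

// Bottom-up rewrite: every subtree (x * y) + z becomes fma(x, y, z).
// Only the left-operand form is matched; z + (x * y) is left unchanged.
ExprPtr rewrite_fma(const ExprPtr& e)
{
    if (e->kind == "var") return e;
    std::vector<ExprPtr> new_args;
    for (const auto& a : e->args) new_args.push_back(rewrite_fma(a));
    if (e->kind == "add" && new_args[0]->kind == "mul") {
        const ExprPtr& m = new_args[0];
        return std::make_shared<Expr>(Expr{"call", "fma", {m->args[0], m->args[1], new_args[1]}});
    }
    return std::make_shared<Expr>(Expr{e->kind, e->name, new_args});
}

// Prints an expression in infix form, with calls as name(arg, ...).
std::string print_expr(const ExprPtr& e)
{
    if (e->kind == "var") return e->name;
    if (e->kind == "add") return "(" + print_expr(e->args[0]) + " + " + print_expr(e->args[1]) + ")";
    if (e->kind == "mul") return "(" + print_expr(e->args[0]) + " * " + print_expr(e->args[1]) + ")";
    std::string s = e->name + "(";
    for (std::size_t i = 0; i < e->args.size(); ++i)
        s += (i ? ", " : "") + print_expr(e->args[i]);
    return s + ")";
}
```

Since `fma` rounds once instead of twice, a real transformation of this kind would have to respect the floating-point semantics the programmer expects.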
Prerequisites: TDDB44 Compiler Construction (mandatory),
TDDD56 Multicore and GPU Programming (mandatory),
Advanced Programming in C++ (recommended). Programming skills in Linux.
Some background in artificial neural networks (DNN, CNN, ...) can be useful.
Contact:
August Ernstsson,
Christoph Kessler
Editor integration with source code analysis and debugging for the SkePU high-level parallel programming framework (30hp or 2x30hp)
Background:
High-level parallel programming aims to abstract challenging aspects of
parallel and heterogeneous systems for non-expert programmers. Algorithmic
skeletons are an interface approach based on computational patterns, such
as map, reduce, and stencil operations. These patterns can be instantiated
by providing a custom operator ("user-function"), which is then applied to
a supplied dataset in parallel according to the particular pattern semantics.
Skeleton programming frameworks and libraries such as SkePU implement
skeletons as C++ constructs and provide "backends" for parallelism in
multi-core CPUs, GPU accelerators, and multi-node clusters. The skeletons
are typically provided as libraries, or in the case of SkePU, as a framework
with both a library and a custom compiler toolchain. In effect, SkePU forms a
skeleton programming language "embedded" in C++. While mostly C++-compatible,
a SkePU program (when executing in a parallel context) introduces additional
rules and semantics for certain programming constructs, in
particular the user-functions. If the programmer violates these requirements,
the result may be a run-time fault such as aborted execution or non-deterministic output.
In contrast, errors in the source code syntax will result in compile-time faults.
However, as SkePU's library component is implemented as a header-only template
metaprogramming library, compiler errors tend to be very long,
deeply nested and with unintelligible implementation details exposed
to the high-level user.
Task: In this project, we aim to develop source-code editor integration for the
SkePU framework. The programming environment shall be aware of fundamental
SkePU constructs such as skeletons, user-functions, and smart data-containers.
This integration is intended to help the programmer to
write correct source code from the start (e.g. by providing code completion)
as well as to simplify debugging of already written code.
The main candidate approach for implementing the editor integration
is by conforming to the Language Server Protocol (LSP). LSP is an
open, JSON-based message specification for communication between
IDEs (or other editors) and "language servers", i.e., separate binaries or libraries
providing language-specific information to the editor about the files being
processed. Using LSP for this project has two main benefits:
1. The resulting implementation is open and editor-agnostic.
2. An LSP server for C/C++ is already available in LLVM/clang through the clangd project.
SkePU's precompiler is also based on LLVM/clang, so it may be
possible to integrate it with clangd.
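For orientation, the LSP base protocol frames each JSON-RPC message with a `Content-Length` header giving the length of the content in bytes, followed by an empty line and the JSON content. The sketch below builds such a message for a `textDocument/completion` request, i.e., what an editor would send when asking a language server for code completion at a cursor position. The helper names are hypothetical, and the JSON is assembled by hand without escaping to keep the example dependency-free; a real language server would use a JSON library.

```cpp
#include <cassert>
#include <string>

// Frames a JSON-RPC payload per the LSP base protocol: a header part with the
// byte length of the content, an empty line (\r\n\r\n), then the content.
std::string frame_lsp_message(const std::string& json)
{
    return "Content-Length: " + std::to_string(json.size()) + "\r\n\r\n" + json;
}

// Builds a "textDocument/completion" request for a (hypothetical) SkePU-aware
// language server. Assumes uri contains no characters needing JSON escaping.
std::string completion_request(int id, const std::string& uri, int line, int character)
{
    return "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(id) +
           ",\"method\":\"textDocument/completion\","
           "\"params\":{\"textDocument\":{\"uri\":\"" + uri + "\"},"
           "\"position\":{\"line\":" + std::to_string(line) +
           ",\"character\":" + std::to_string(character) + "}}}";
}
```

A SkePU-aware server could answer such a request with skeleton-specific suggestions, for instance the user-functions in scope that match a skeleton's expected signature.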
Prerequisites:
Mandatory: Advanced C++ programming; Compiler construction fundamentals; Basic understanding of parallel programming concepts.
Useful: Experience with Linux, LLVM, JSON, CUDA, OpenCL.
Contact August Ernstsson
or Christoph Kessler
for further information on this project.
Nested Parallelism in Algorithmic Skeleton Programming Frameworks (30hp or 2x30hp)
Background:
High-level parallel programming aims to abstract challenging aspects of parallel
and heterogeneous systems for non-expert programmers. Algorithmic skeletons
are an interface approach based on computational patterns, such
as map, reduce, and stencil operations. These patterns can be instantiated
by providing a custom operator ("user-function"), which is then applied to
a supplied dataset in parallel according to the particular pattern semantics.
Skeleton programming frameworks and libraries such as SkePU and
Muesli implement skeletons as C++ constructs and provide "backends"
for parallelism in multi-core CPUs, GPU accelerators, and multi-node clusters.
This can result in a very high degree of available parallelism in the
target system. For simple programs, the skeleton abstraction works well and
can utilize the parallelism efficiently with very few lines of code. However,
for more complex applications, choosing the right skeleton patterns
can be difficult, and sometimes there are no suitable patterns available
in the provided skeleton set.
Task: This project aims to extend the skeleton abstraction in SkePU and/or
Muesli with multi-level or "nested" parallelism. The goal is to investigate
whether the option to invoke new skeleton patterns from within a user-function
can improve parallelization efficiency, programmer productivity, or
both, and in which type(s) of applications this feature is advantageous.
The execution contexts outside and within a skeleton/user-function differ
greatly in the implementation of SkePU, which makes the addition of
nested parallelism nontrivial. There are open questions regarding the syntax
of nested skeleton calls, whether the set of available skeleton calls should
be restricted for nested calls (likely to be the case), and how resource
allocation is affected by the introduction of nested parallelism.
The aim is to incur no overhead from this feature when it is not used, and minimal
overhead also for programs using nested parallelism.
(It is therefore not advisable to allocate resources dynamically during execution of a nested
skeleton. Heuristics, static analysis, or other tools could be used to predict
a sufficient amount of resources beforehand.)
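The following sketch illustrates what a nested skeleton call could look like: an outer map over matrix rows whose user-function itself invokes an inner reduce skeleton. Everything here is hypothetical plain C++, not SkePU or Muesli code. It reflects one of the design options discussed above: all worker threads are created once by the outer level, and the nested call runs sequentially on the calling worker, so no resources are allocated dynamically inside the user-function.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical inner skeleton: sum-reduction of one row. In a nested context
// it runs sequentially on the thread that called it (no new threads).
double reduce_sum(const std::vector<double>& row)
{
    return std::accumulate(row.begin(), row.end(), 0.0);
}

// Hypothetical outer map skeleton over matrix rows. The user-function gets a
// whole row and may itself invoke a skeleton. The worker threads are created
// up front by the outer level, i.e., resources are fixed before execution.
template <typename UserFunc>
std::vector<double> map_rows(UserFunc f, const std::vector<std::vector<double>>& m,
                             unsigned num_threads)
{
    assert(num_threads > 0);
    std::vector<double> out(m.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back([&, t] {
            // Cyclic distribution of rows over the pre-created workers.
            for (std::size_t i = t; i < m.size(); i += num_threads)
                out[i] = f(m[i]);
        });
    for (auto& w : workers) w.join();
    return out;
}

// User-function containing a nested skeleton call: the mean of a row.
double row_mean(const std::vector<double>& row)
{
    return row.empty() ? 0.0 : reduce_sum(row) / static_cast<double>(row.size());
}

// Usage: auto means = map_rows(row_mean, matrix, 4);
```

Running the inner call sequentially is only one option; letting it use a statically predicted share of the threads would be an alternative that the project could evaluate.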
International collaboration:
This project can, depending on the time frame, be conducted in collaboration
with researchers from the University of Münster, Germany.
Prerequisites:
Mandatory: Advanced C/C++ programming; Good understanding of parallel programming concepts;
Basic GPU programming with CUDA and/or OpenCL.
Useful: Prior experience with SkePU, e.g. through the TDDD56 lab series.
Contact August Ernstsson
or Christoph Kessler
for further information on this project.
[taken] Software Testing Methodology and Framework for High-Level Parallel Programs (30hp, 2x30hp, or 16hp)
SkePU is an open-source programming framework for portable, high-level, single-source programming of heterogeneous parallel computer systems, such as systems with GPU-accelerated multicore CPUs.
In SkePU programs, parallelism is expressed using so-called (algorithmic) skeletons,
which are generic, high-level programming constructs derived from higher-order functions
such as map, reduce, scan, stencil etc., that can be instantiated by customization
in problem-specific sequential code, and for which efficient parallel and
accelerator-specific implementations are provided. SkePU programs look like
well-structured sequential C++ code; instantiated skeletons can
be invoked like any manually written C++ function, but automatically inherit
the skeleton's generic parallel implementations (also known as back-ends).
Unlike most other high-level parallel programming frameworks,
the SkePU skeletons are variadic (they can take any number of data-container operands)
and polymorphic in both operand shape
(accepting data-container operands of any shape, i.e., vectors, matrices,
tensors) and element type. In addition, many skeletons can be configured
to specialize their behavior.
Hence, a very large number of such combinations can occur in practice,
but only a few of them are currently tested.
For SkePU development, it is nevertheless desirable to check automatically that, after
changes to a specific data-container shape or a specific skeleton type, SkePU still works
consistently across all (or many) possible combinations.
One possible approach to automating this is fuzz testing.
This thesis project will develop a methodology for systematically generating test cases for SkePU
programs and, depending on scope, also realize distributed parallel testing on GPU clusters.
The project scope and depth can be configured to match a 16hp, 30hp or 2x30hp project.
This is a research-oriented project. If the result looks publishable,
we will encourage you to jointly write and submit a research paper to
a conference and support your presentation.
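As a possible starting point for systematically generating test cases, the configuration space can be enumerated as a Cartesian product of dimensions such as skeleton type, container shape, element type, and number of operands, and then sampled randomly in a simple fuzzing-like fashion. The dimensions and values used below are illustrative assumptions, not SkePU's actual test setup, and generating and running the corresponding C++ test program for each `TestCase` is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// One generated test configuration (hypothetical dimensions).
struct TestCase {
    std::string skeleton;
    std::string container;
    std::string element_type;
    int arity; // number of data-container operands
};

// Systematic generation: the full Cartesian product of all dimensions.
std::vector<TestCase> generate_test_cases(const std::vector<std::string>& skeletons,
                                          const std::vector<std::string>& containers,
                                          const std::vector<std::string>& element_types,
                                          int max_arity)
{
    std::vector<TestCase> cases;
    for (const auto& s : skeletons)
        for (const auto& c : containers)
            for (const auto& e : element_types)
                for (int a = 1; a <= max_arity; ++a)
                    cases.push_back({s, c, e, a});
    return cases;
}

// Fuzzing-style variant: draw n configurations uniformly at random, with a
// fixed seed so that a failing configuration can be reproduced.
std::vector<TestCase> sample_test_cases(const std::vector<TestCase>& all,
                                        std::size_t n, unsigned seed)
{
    assert(!all.empty());
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, all.size() - 1);
    std::vector<TestCase> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(all[pick(rng)]);
    return out;
}
```

Since the full product grows multiplicatively with every added dimension, random sampling (or distributing the enumerated cases over a GPU cluster) becomes relevant quickly.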
Prerequisites: Multithreaded (OpenMP) and GPU (CUDA, OpenCL) programming
(e.g. TDDD56), advanced C++ programming skills;
good background in software engineering, esp. software testing. Linux.
Contact:
Christoph Kessler.
[RESERVED (V.E.)] Skeleton computing as a service (30hp)
Background:
High-level parallel programming aims to abstract challenging aspects of
parallel and heterogeneous systems for non-expert programmers. Algorithmic
skeletons are an interface approach based on computational patterns, such
as map, reduce, and stencil operations. These patterns can be instantiated
by providing a custom operator ("user-function"), which is then applied to
a supplied dataset in parallel according to the particular pattern semantics.
Skeleton programming frameworks and libraries such as SkePU implement
skeletons as C++ constructs and provide "backends" for parallelism in
multi-core CPUs, GPU accelerators, and multi-node clusters.
The skeletons are typically provided as libraries, or in the case of SkePU, as a framework
with both a library and a custom compiler toolchain.
The SkePU library is implemented in modern C++ and involves template metaprogramming.
SkePU is a long-term open-source effort at PELAB, Linköping University.
Task:
This thesis project will develop a method for setting up SkePU skeleton instantiations
and computations as microservices on heterogeneous parallel computing resources
in the cloud or in edge computing resources for portable remote execution.
This includes the specification and generation of efficient interfaces and
efficient operand data transfer,
the remote deployment of a SkePU microservice with skeleton instantiation and
invocation mechanisms, and the evaluation of the implementation for performance,
portability, ease of use, and for security weaknesses.
The project should also elaborate on suitable (remotely verifiable) restrictions
on user functions to be used with such services to avoid security loopholes,
and implement a simple rule-based source code checker for user functions
to statically verify the absence of "dangerous" constructs, or at least
avoid known attack patterns with high probability.
Inspiration for the service implementation can be taken from CORBA
and subsequent component-based frameworks, from MapReduce and Spark,
and from a recent master thesis project that extended
SkePU for execution of stream-parallel applications in distributed systems.
An experimental testbed with a number of Raspberry Pi units
and GPU-accelerated servers is available for the evaluation.
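In its very simplest form, the rule-based checker for user functions described above could scan the source text for denylisted identifiers. The sketch below does exactly that. The denylist is a hypothetical example, not a vetted security policy, and the purely lexical approach does not distinguish comments or string literals from code and cannot see through macros or function pointers, so it is easy to circumvent; a more robust checker would inspect the clang AST, on which SkePU's pre-compiler already builds.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical denylist of identifiers treated as "dangerous" in a user
// function submitted to a remote skeleton service (illustrative only).
const std::set<std::string> kForbidden = {
    "asm", "goto", "system", "popen", "fopen", "fork",
    "malloc", "free", "new", "delete", "reinterpret_cast", "volatile"
};

// Splits source text into maximal runs of alphanumeric characters and '_'
// (identifiers, keywords, and number literals).
std::vector<std::string> identifiers(const std::string& src)
{
    std::vector<std::string> out;
    std::string cur;
    for (char ch : src) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c) || c == '_') {
            cur += ch;
        } else if (!cur.empty()) {
            out.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}

// Returns the forbidden tokens found, in order of occurrence; an empty result
// means the user function passed this (very coarse) check.
std::vector<std::string> check_user_function(const std::string& src)
{
    std::vector<std::string> violations;
    for (const auto& tok : identifiers(src))
        if (kForbidden.count(tok)) violations.push_back(tok);
    return violations;
}
```

Returning the list of violations rather than a plain yes/no lets the service report to the user which construct was rejected.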
Prerequisites: TDDD56 Multicore and GPU Programming (mandatory),
TDDD25 Distributed Systems (mandatory), Advanced Programming in C++ (mandatory), Linux, operating systems, network programming.
Contact:
Christoph Kessler,
August Ernstsson.
Parallel I/O for skeleton programs in SkePU (30hp or 16hp)
The C++ based portable skeleton programming framework
SkePU for heterogeneous
multicore systems and clusters is designed to work on data types
usually residing in main memory, so-called data-containers.
This thesis project will investigate how SkePU data-containers can
efficiently interface with the Hadoop Distributed File System HDFS
in order to provide distributed parallel I/O on large distributed
files. A prototype of the solution will be implemented in the
open-source SkePU framework and
evaluated with several simple big-data analytics computations.
The project can be configured for Master or Bachelor thesis level.
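One basic question when interfacing data-containers with HDFS is which byte range of a large distributed file each parallel reader should load. The sketch below assigns contiguous runs of whole blocks to the readers so that every range starts at a block boundary; the block size is a parameter (HDFS commonly uses 128 MiB by default). The function and type names are hypothetical, and the actual reads through an HDFS client library are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open byte range [begin, end) of a file, assigned to one reader.
struct ByteRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// Assigns contiguous runs of whole blocks to num_readers readers. Every range
// starts on a block boundary, and block counts differ by at most one.
std::vector<ByteRange> partition_by_blocks(std::uint64_t file_size,
                                           std::uint64_t block_size,
                                           unsigned num_readers)
{
    assert(block_size > 0 && num_readers > 0);
    const std::uint64_t num_blocks = (file_size + block_size - 1) / block_size;
    const std::uint64_t base = num_blocks / num_readers;
    const std::uint64_t extra = num_blocks % num_readers;

    std::vector<ByteRange> ranges;
    std::uint64_t block = 0;
    for (unsigned r = 0; r < num_readers; ++r) {
        const std::uint64_t count = base + (r < extra ? 1 : 0);
        const std::uint64_t begin = std::min(block * block_size, file_size);
        block += count;
        // The last block may be partial, so clamp to the file size.
        const std::uint64_t end = std::min(block * block_size, file_size);
        ranges.push_back({begin, end});
    }
    return ranges;
}
```

Aligning ranges to blocks is a design choice that would let each reader be scheduled near a replica of its blocks; records crossing block boundaries would still need separate handling.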
Prerequisites: Advanced C++ (esp., template metaprogramming),
Big-Data Analytics and/or parallel programming courses,
some familiarity with Linux, git, cmake, HDFS.
Contact:
Christoph Kessler.
Further thesis projects for LiU-based students with interest in compiler technology and/or parallel programming are available on request, please contact me.
Responsible for this page: Christoph Kessler, IDA