Open thesis projects at PELAB by Daniel Varro

Internal Thesis Projects

Code smell detection in machine learning programs (30hp)

Code smells and anti-patterns identify poor solutions to recurring design problems in complex software systems. While such solutions do not necessarily contain bugs, their presence significantly degrades software quality attributes such as maintainability. As many machine learning (ML) applications are developed by non-expert programmers, ML programs are probably even more susceptible to code smells. Recently, researchers have developed an extensive catalogue of code smells in ML programs (https://arxiv.org/pdf/2203.13746.pdf). The main objective of the thesis is to develop efficient static source code analysis techniques to automatically detect such code smells in ML programs.
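To give a flavour of what static detection can look like, here is a minimal sketch using Python's standard `ast` module. It flags equality comparisons against `np.nan`, which always evaluate to False and are a typical example of the kind of smell such catalogues describe. The class name `NanComparisonDetector`, the helper `detect_nan_comparisons`, and the choice of this particular smell are illustrative assumptions, not part of the project specification.

```python
import ast


class NanComparisonDetector(ast.NodeVisitor):
    """Collects `==` / `!=` comparisons where one operand looks like `<module>.nan`."""

    def __init__(self):
        self.findings = []  # (line, column) of each suspicious comparison

    @staticmethod
    def _is_nan(node):
        # Assumption: any attribute access ending in `.nan` (np.nan, numpy.nan, ...)
        return isinstance(node, ast.Attribute) and node.attr == "nan"

    def visit_Compare(self, node):
        # A chained comparison `a == b == c` has operands [a, b, c] and two operators.
        operands = [node.left, *node.comparators]
        for op, left, right in zip(node.ops, operands, operands[1:]):
            if isinstance(op, (ast.Eq, ast.NotEq)) and (
                self._is_nan(left) or self._is_nan(right)
            ):
                self.findings.append((node.lineno, node.col_offset))
        self.generic_visit(node)


def detect_nan_comparisons(source):
    """Parse `source` without executing it and return the flagged positions."""
    detector = NanComparisonDetector()
    detector.visit(ast.parse(source))
    return detector.findings


# Hypothetical input program; it is only parsed, never run.
SAMPLE = """\
import numpy as np
value = np.nan
if value == np.nan:
    print("missing")
if np.isnan(value):
    print("missing")
"""
```

Calling `detect_nan_comparisons(SAMPLE)` reports the comparison on the third line of the sample and ignores the correct `np.isnan` call. A purely syntactic check like this is cheap but shallow; a thesis-level detector would likely need data-flow or type information to handle aliases and indirect uses.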

This is a research-intensive master's thesis project (the dominant style of MSc thesis projects at US and Canadian universities). If you make significant progress during your thesis, you will be encouraged and supported to write a joint research paper, submit it to a leading software engineering conference, and present it if accepted.

Prerequisites: advanced programming skills in Python and/or Java, past experience in developing machine learning programs, and foundations of compiler construction (especially parsers). Experience in program analysis is advantageous but not required.

Contact: Prof. Daniel Varro

A verification benchmark for evaluating scene graph generation approaches (30hp)

Scene graph generation (SGG) is a common challenge in vision-based machine learning components used in autonomous systems. SGG takes a camera image as input and derives a scene graph with nodes representing relevant objects (with their attributes) and edges capturing key relations between objects. ML approaches developed for SGG are often evaluated on public benchmarks with ground truth scene graphs (such as CLEVR and CLEVR-XAI). Existing benchmarks primarily focus on providing data to train SGG components, but the use of such realistic (in-distribution) datasets is insufficient from a safety assurance perspective. The main objective of this thesis is to develop a verification benchmark for evaluating SGG approaches by automatically synthesizing the images of the dataset from a semantically diverse set of scene graphs that serve as ground truth.
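As a rough illustration of the data structure involved, the sketch below models a scene graph as attributed objects plus labelled relations, and counts how many structurally distinct graphs a set contains. All names (`SceneObject`, `SceneGraph`, `signature`, `count_distinct_signatures`) are hypothetical, and using a name-independent signature as a proxy for semantic diversity is an assumption made here for illustration, not the project's definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    name: str
    attributes: tuple  # sorted (key, value) pairs, e.g. (("color", "red"), ("shape", "cube"))


@dataclass
class SceneGraph:
    objects: dict = field(default_factory=dict)  # name -> SceneObject
    relations: set = field(default_factory=set)  # (subject, predicate, object) triples

    def add_object(self, name, **attributes):
        self.objects[name] = SceneObject(name, tuple(sorted(attributes.items())))

    def add_relation(self, subject, predicate, obj):
        if subject not in self.objects or obj not in self.objects:
            raise KeyError("both endpoints must be added as objects first")
        self.relations.add((subject, predicate, obj))

    def signature(self):
        # Abstracts away object names: only attribute sets and predicates remain.
        return frozenset(
            (self.objects[s].attributes, p, self.objects[o].attributes)
            for s, p, o in self.relations
        )


def count_distinct_signatures(graphs):
    """Number of structurally different graphs under the signature abstraction."""
    return len({g.signature() for g in graphs})


def make_graph(first, second, predicate):
    # Helper for the example: a red cube and a blue sphere joined by one relation.
    g = SceneGraph()
    g.add_object(first, color="red", shape="cube")
    g.add_object(second, color="blue", shape="sphere")
    g.add_relation(first, predicate, second)
    return g
```

For example, `make_graph("a", "b", "left_of")` and `make_graph("x", "y", "left_of")` differ only in object names and share one signature, while `make_graph("a", "b", "behind")` yields a second one. A benchmark generator could use such a measure to avoid rendering many images of essentially the same scene.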

This is a research-intensive master's thesis project. It may involve international collaboration with graduate students at McGill University (Canada). If you make significant progress during your thesis, you will be encouraged and supported to write a joint research paper, submit it to a leading software engineering conference, and present it if accepted.

Prerequisites: programming skills in Python and/or Java and experience in machine learning. Past experience with rendering frameworks (e.g. Blender) or simulators (such as Carla or iGibson) is a plus but not required.

Contact: Prof. Daniel Varro

Page responsible: Kristian Sandahl
Last updated: 2022-10-18

IDA - Department of Computer and Information Science
