Available Thesis Projects

Below are some suggested topics for master's thesis projects. Don't hesitate to contact me if you are interested!

Generating a fuzzing evaluation suite

Fuzzing is a technique for testing a program's resilience to malformed inputs using semi-random input generation. In recent years, fuzzing has become one of the most important means of discovering security vulnerabilities in software, and there is consequently a very active research effort to develop more powerful fuzzers. However, a widely recognized impediment to fuzzing research is the lack of high-quality test suites for comparing the bug-finding effectiveness of different fuzzers. While it is trivial to use an old, known-buggy version of a program for evaluation, the lack of ground truth means that there is no automatic way to know how many unique bugs a fuzzer triggered when it finds some crashing inputs. (For example, even if a fuzzer finds 100 unique inputs that crash a piece of software, these crashes could all be due to the same bug.)

This project aims to remedy the problem by automatically creating a set of deliberately vulnerable versions of a program, where each version contains only one known vulnerability. This way, a fuzzing input that crashes the program can easily be triaged to find out which unique bug it triggers, and accurate statistics on a fuzzer's ability to find unique bugs can be gathered. The goal of the project is to devise a method that can, in a controlled way, re-introduce a specific bug from a previous version into a newer version of a program.
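As a rough illustration of the triage idea, the sketch below replays each crashing input against every single-bug version and records which versions crash. Everything in it is hypothetical: the "builds" are modelled as plain Python functions that raise an exception when their one injected bug is triggered, and the names `triage`, `build_bug_a` and `build_bug_b` are made up for the example. A real harness would instead run the compiled binaries in subprocesses and check for crash signals.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical model: a single-bug build is a function that raises an
# exception on inputs that trigger its one injected bug.
Build = Callable[[bytes], None]


def crashes(build: Build, data: bytes) -> bool:
    """Return True if running the build on the input 'crashes' it."""
    try:
        build(data)
    except Exception:
        return True
    return False


def triage(inputs: Iterable[bytes], builds: Dict[str, Build]) -> Dict[str, List[bytes]]:
    """Map each bug ID to the crashing inputs that trigger it."""
    hits: Dict[str, List[bytes]] = defaultdict(list)
    for data in inputs:
        for bug_id, build in builds.items():
            if crashes(build, data):
                hits[bug_id].append(data)
    return dict(hits)


# Two toy single-bug builds (assumptions made up for this example).
def build_bug_a(data: bytes) -> None:
    if data.startswith(b"A"):
        raise RuntimeError("bug A triggered")


def build_bug_b(data: bytes) -> None:
    if len(data) > 4:
        raise RuntimeError("bug B triggered")


BUILDS = {"bug_a": build_bug_a, "bug_b": build_bug_b}

# Four distinct crashing inputs reported by some fuzzer.
crashing_inputs = [b"A1", b"A2", b"A3", b"zzzzzz"]
report = triage(crashing_inputs, BUILDS)
unique_bugs = sorted(report)
```

Here four crashing inputs map to only two unique bugs, which is the kind of statistic that crash counts alone cannot provide.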

Prerequisites: Good programming skills, experience in software development on Linux, experience with C/C++ programming. Some basic familiarity with fuzzing is recommended (e.g. from TDDC90). Having taken a course on compiler construction is also recommended.

Detecting anti-reverse engineering (disassembly desynchronization) using deep learning

One challenge when analyzing malicious code is that malware authors often use anti-reverse engineering tricks to hinder analysis of malicious binaries. One commonly used technique is disassembly desynchronization, which works by inserting special code constructs that trick disassemblers into disassembling junk data that is interspersed with the actual machine code. This "derails" the disassembly process, so that an incorrect disassembly is produced for a region of code. As part of a research project on using machine learning to detect disassembly desynchronization, we have two proposed topics for master's theses. In both projects you would be working together with me (Ulf) and a PhD student who does research on machine learning for malware defense.

  1. An important first step towards our research goal is to create accurate ground truth that can be used for training and evaluating the machine learning model. To this end, the aim of this project is to design and implement a tool that can generate Linux binaries obfuscated with disassembly desynchronization at known locations in the code. The proposed approach for achieving this is to implement a tool that applies disassembly desynchronization at the intermediate assembly-code level during compilation from, for example, C code. The project entails selecting suitable software libraries to base the implementation on, designing and implementing the algorithm to apply disassembly desynchronization, and evaluating the correctness and effectiveness of the prototype tool.

  2. The aim of this project is to train a machine learning system to automatically detect incorrectly disassembled regions of code, so that the disassembly can be repaired. The project entails studying in-the-wild techniques for disassembly desynchronization, generating and compiling training data for the machine learning system, and designing and evaluating the machine learning approach together with your supervisors.
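To make the desynchronization effect described above more concrete, here is a minimal sketch using a made-up toy instruction set (not x86, and not the tooling used in the research project). A junk byte is placed after an unconditional jump that skips over it. A linear-sweep decoder, which ignores control flow, treats the junk byte as the start of a three-byte instruction and swallows the two real instructions that follow. The opcodes, the `TOY_ISA` table and both decoder functions are assumptions invented for this example.

```python
from typing import List, Tuple

# Hypothetical toy instruction set: opcode -> (mnemonic, total length in bytes).
TOY_ISA = {
    0x01: ("nop", 1),
    0x02: ("jmp", 2),    # opcode + 8-bit forward offset
    0x03: ("loadi", 3),  # opcode + 16-bit immediate
    0x04: ("ret", 1),
}

Insn = Tuple[int, str]  # (offset, mnemonic)


def linear_sweep(code: bytes) -> List[Insn]:
    """Decode instructions back to back, ignoring control flow."""
    out, pc = [], 0
    while pc < len(code):
        name, size = TOY_ISA.get(code[pc], ("(bad)", 1))
        out.append((pc, name))
        pc += size
    return out


def follow_flow(code: bytes) -> List[Insn]:
    """Decode along the executed path, following forward jumps.

    Assumes well-formed code in which every jmp has its offset byte.
    """
    out, pc = [], 0
    while pc < len(code):
        name, size = TOY_ISA.get(code[pc], ("(bad)", 1))
        out.append((pc, name))
        if name == "ret":
            break
        if name == "jmp":
            pc += size + code[pc + 1]  # skip ahead by the jump offset
        else:
            pc += size
    return out


# jmp +1 ; <junk byte 0x03> ; nop ; nop ; ret
CODE = bytes([0x02, 0x01, 0x03, 0x01, 0x01, 0x04])

swept = linear_sweep(CODE)     # what a naive disassembler reports
executed = follow_flow(CODE)   # what actually runs

# Offsets where the two decodings disagree: the desynchronized region.
desynced = sorted({off for off, _ in swept} ^ {off for off, _ in executed})
```

In this toy case the linear sweep reports `jmp`, `loadi`, `ret`, while the executed path is `jmp`, `nop`, `nop`, `ret`. The two decodings disagree at offsets 2 to 4 and agree again at offset 5. A tool that inserts the junk bytes itself knows these offsets in advance, which is what would make them usable as ground-truth labels.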

Prerequisites: Good programming skills, some knowledge of assembly-level programming. Some familiarity with machine learning is highly recommended for the second project.

Page responsible: Ulf Kargen
Last updated: 2020-03-30