
Tuning large language models to generate diverse and non-flaky software test-cases

Large-scale software engineering relies on continuous practices to achieve fast and smooth integration and deployment of software changes. A vital part of this is automated testing for quality assurance of the software. If this part fails, software changes are delayed and queue up. A common impediment to continuous practices is so-called flaky tests: tests that change their verdict from pass to fail non-deterministically, even when no changes have been made to the software under test. In previous research we have used machine learning to detect flaky tests, and in the proposed project we want to adapt a large language model to generate tests that

  • have a low probability of becoming flaky,
  • have low similarity to existing tests, and
  • have a high probability of covering the parts of the software that were recently changed (an illustrative sketch combining these criteria follows this list).
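For illustration only, the sketch below shows one way the three criteria could be combined into a single score for ranking candidate tests produced by a language model. The field names, estimators, and weights are hypothetical placeholders and not part of the project's actual method; a sketch under assumed inputs, not a definitive implementation.

```python
# Hypothetical sketch: rank candidate tests by a composite score that penalises
# predicted flakiness and redundancy and rewards coverage of recent changes.
# All names and weights are illustrative assumptions, not the project's method.

from dataclasses import dataclass


@dataclass
class CandidateTest:
    source: str             # generated test code
    p_flaky: float          # predicted probability that the test is flaky (0..1)
    max_similarity: float   # highest similarity to any existing test (0..1)
    p_covers_change: float  # predicted probability of covering recently changed code (0..1)


def score(test: CandidateTest,
          w_flaky: float = 1.0,
          w_similar: float = 1.0,
          w_coverage: float = 1.0) -> float:
    """Higher is better: reward change coverage, penalise flakiness and similarity."""
    return (w_coverage * test.p_covers_change
            - w_flaky * test.p_flaky
            - w_similar * test.max_similarity)


if __name__ == "__main__":
    candidates = [
        CandidateTest("def test_a(): ...", p_flaky=0.05, max_similarity=0.20, p_covers_change=0.90),
        CandidateTest("def test_b(): ...", p_flaky=0.40, max_similarity=0.10, p_covers_change=0.95),
    ]
    best = max(candidates, key=score)
    print(best.source)
```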

Participants:

Kristian Sandahl

Azeem Ahmad

Xin Sun