Large Language Models for Software Engineering, 2025 HT
Course plan
No of lectures
5 lectures, 4 lab sessions, 3 seminars
Recommended for
All PhD students interested in applying LLMs to software engineering. The course may also be of interest to PhD students applying LLMs to other areas of computer science or engineering.
The course was last given
Goals
The aim of the course is to give the student a basic overview of how LLMs work and how they are tuned to perform software engineering tasks.
Prerequisites
- Good knowledge of Python. All the assignments will be in Python.
- Basics of probability, statistics, and algebra.
- Foundations of machine learning: one previous introductory MSc-level course on machine learning.
Organization
Lectures, labs, student presentations, and discussions.
Content
- Natural language processing and deep learning basics: neural networks and transformer architectures.
- Pre-training and fine-tuning for Software Engineering: CodeBERT.
- The first benchmark for code intelligence: CodeXGLUE.
- The foundational large language models (LLMs) of code: StarCoder, GitHub
Copilot, etc.
- From foundational models to code assistants: ChatGPT, WizardCoder, etc.
- State of the art benchmarks for LLMs of code.
- LLM agents for Software Engineering.
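To give a flavour of the transformer basics covered in the first part of the course, below is a minimal sketch of single-head scaled dot-product attention, the core operation of the transformer architecture. This is an illustrative toy in plain Python, not course material; the variable names and toy inputs are made up for the example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # Single-head scaled dot-product attention on plain lists of vectors:
    # each query attends over all keys, and the resulting weights mix the values.
    d_k = len(K[0])
    outputs = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return outputs

# Toy run: 2 query tokens attending over 3 key/value tokens.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(Q, K, V)
```

Real transformer layers add learned projection matrices, multiple heads, and batching on top of this, but the weighting-and-mixing step is the same.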
Literature
- Tunstall, L., von Werra, L., & Wolf, T. (2022). Natural Language Processing with Transformers. O'Reilly Media.
- Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., ... & Zhou, M. (2020). CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155.
- Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., ... & Liu, S. (2021). CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664.
- Li, R., Allal, L. B., Zi, Y., Muennighoff, N., Kocetkov, D., Mou, C., ... & de Vries, H. (2023). StarCoder: May the source be with you! arXiv preprint arXiv:2305.06161.
- Auffarth, B. (2023). Generative AI with LangChain. Packt Publishing.
Lectures
- Lecture: Introduction and the first part of the background
- Lecture: Neural networks and transformer architectures
- Lab session/tutorial
- Lecture: The first code LLM, benchmark, and metrics; CodeBERT and CodeXGLUE
- Lab session/tutorial
- Lecture: Foundational code models and code assistants; GitHub Copilot, StarCoder, WizardCoder, etc.
- Lab session/tutorial
- Lecture: LLM agents for Software Engineering
- Lab session/tutorial
- 3 seminar sessions for student presentations and discussions
Examination
3 assignments after the lab sessions.
1 reading assignment: select and read one LLM4SE paper from a pool of papers and present it. The presentations will take place in one of the seminars.
Mandatory student presentations and active participation in seminar
discussions.
Examiner
José Antonio Hernández López / Dániel Varró
Credits
4.5hp
Comments
Page responsible: Anne Moe
