Large Language Models for Software Engineering, 2025 HT
Course plan
No of lectures
5 lectures, 4 lab sessions, 3 seminars
Recommended for
All PhD students interested in applying LLMs to software engineering. The course may also be of interest to PhD students applying LLMs to other areas of computer science or engineering.
The course was last given
Goals
The aim of the course is to give the student a basic overview of how LLMs work and how they are tuned to perform software engineering tasks.
Prerequisites
- Good knowledge of Python. All the assignments will be in Python.
- Basics of probability, statistics, and algebra.
- Foundations of machine learning: one previous introductory MSc-level course on machine learning.
Organization
Lectures, labs, student presentations, and discussions.
Content
- Natural language processing and deep learning basics: neural networks and transformer architectures.
- Pre-training and fine-tuning for Software Engineering: CodeBERT.
- The first benchmark for code intelligence: CodeXGLUE.
- The foundational large language models (LLMs) of code: StarCoder, GitHub
Copilot, etc.
- From foundational models to code assistants: ChatGPT, WizardCoder, etc.
- State of the art benchmarks for LLMs of code.
- LLM agents for Software Engineering.
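To give a flavour of the transformer basics covered in the first part of the course, below is a minimal sketch of single-head scaled dot-product attention, the core operation of the transformer architecture. This is an illustrative toy in plain Python, not course material; the variable names and toy inputs are made up for the example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # Single-head scaled dot-product attention on plain lists of vectors:
    # each query attends over all keys, and the resulting weights mix the values.
    d_k = len(K[0])
    outputs = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return outputs

# Toy run: 2 query tokens attending over 3 key/value tokens.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = attention(Q, K, V)
```

Real transformer layers add learned projection matrices, multiple heads, and batching on top of this, but the weighting-and-mixing step is the same.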
Literature
- Tunstall, L., von Werra, L., & Wolf, T. (2022). Natural Language Processing with Transformers. O'Reilly Media.
- Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., ... & Zhou, M. (2020). CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155.
- Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., ... & Liu, S. (2021). CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664.
- Li, R., Allal, L. B., Zi, Y., Muennighoff, N., Kocetkov, D., Mou, C., ... & de Vries, H. (2023). StarCoder: May the source be with you! arXiv preprint arXiv:2305.06161.
- Auffarth, B. (2023). Generative AI with LangChain. Packt Publishing.
Lectures
- Lecture: Introduction and the first part of the background
- Lecture: Neural networks and transformer architectures
- Lab session/tutorial
- Lecture: The first code LLM, benchmark, and metrics; CodeBERT and CodeXGLUE
- Lab session/tutorial
- Lecture: Foundational code models and code assistants; GitHub Copilot, StarCoder, WizardCoder, etc.
- Lab session/tutorial
- Lecture: LLM agents for Software Engineering
- Lab session/tutorial
- 3 seminar sessions for student presentations and discussions
Examination
3 assignments after the lab sessions.
1 reading assignment: select and read one LLM4SE paper from a pool of papers and present it. The presentations will take place in one of the seminars.
Mandatory student presentations and active participation in seminar
discussions.
Examiner
José Antonio Hernández López / Dániel Varró
Credits
4.5hp
Comments
Page responsible: Anne Moe
