Trustworthy Machine Learning

2026HT, 6.0 credits

Status Open for interest registrations
School IDA-gemensam (IDA)
Division CYBER
Owner Buse Atli
Homepage TBD

Course plan

No of lectures

6-8 seminars (depending on the number of participants)

Recommended for

This course is intended for PhD students from AI/ML or cybersecurity disciplines who want to gain deeper knowledge about security and privacy vulnerabilities in current machine learning systems, model and data governance, and compliance with legal frameworks.

The course was last given

This is a new course.

Goals

- Apply threat modeling methods to identify and analyze vulnerabilities within machine learning (ML) systems.
- Explain and evaluate key security and privacy risks associated with ML systems, as well as solutions to mitigate such risks.
- Critically analyze and present seminal papers that contribute to different aspects of trustworthy machine learning, including robustness, privacy, truthfulness, and integrity.
- Explore and assess relevant legal and governance frameworks that influence the development and deployment of ML systems.
- Evaluate contemporary ML systems from the perspective of trustworthy ML principles, and identify gaps between theory and practice.

Prerequisites

- Some background in machine learning/deep learning or related courses.

Organization

The course mainly consists of student presentations and seminar discussions focused on recent research articles.

Content

The course introduces PhD students to methods and frameworks for analyzing and ensuring trustworthiness in machine learning (ML) systems. The first two sessions are delivered in lecture format and use case studies, providing an overview of threat modeling, security, and privacy considerations in ML systems. The remaining sessions consist of student presentations and seminar discussions focused on recent research articles addressing various aspects of trustworthy machine learning, including robustness, data and model confidentiality, truthfulness, integrity, verifiability, and auditability.

Literature

The course literature consists of a number of peer-reviewed papers. A detailed list of research articles will be provided at the beginning of the course.

Three examples of papers are provided below. Note that although these are among the most cited seminal papers in the field, they may not be included when the course starts, as the field is rapidly evolving.

- Goodfellow, I., Shlens, J., and Szegedy, C. (2015) "Explaining and Harnessing Adversarial Examples". 2015 International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1412.6572
- Adi, Y., et al. (2018) "Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring". 27th USENIX Security Symposium. https://www.usenix.org/conference/usenixsecurity18/presentation/adi
- Tabassi, E. (2023), Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST Trustworthy and Responsible AI, National Institute of Standards and Technology, https://doi.org/10.6028/NIST.AI.100-1.

Lectures

The course consists of two lectures followed by a series of student-led seminars. The schedule may be adjusted according to participants' presentation preferences and the selected topics.

Lectures:
1. Course Introduction and Threat Modeling in Machine Learning: Overview of trustworthy ML concepts, threat modeling methodologies, and common vulnerabilities in ML pipelines.
2. Case Studies and AI Governance: Discussion of real-world case studies illustrating security and privacy challenges in ML systems, and an introduction to governance and regulatory frameworks in ML systems.

Seminars:
1. Test-Time Integrity: Attacks and defenses related to inference-time manipulation, adversarial examples, and reliability.
2. Training-Time Integrity: Data poisoning, backdoor attacks, and robustness of training pipelines.
3. Model Confidentiality: Model extraction, intellectual property protection, and secure model sharing.
4. Data Confidentiality: Privacy-preserving machine learning, membership inference, and differential privacy mechanisms.
5. AI Governance, Verification and Auditability: Technical measures to ensure compliance, accountability, truthfulness and transparency.
6. Trustworthiness in Practice: Trustworthiness in real-world applications and as a multi-objective problem.

Examination

Examination consists of at least 80% attendance, a mandatory oral presentation, active participation in the discussions following the presentations, weekly reading assignments in preparation for the seminar sessions, and a final deliverable.

The final deliverable is a written essay (5-6 pages) on a chosen aspect of trustworthiness requirements in machine learning, in which the student is expected to define the selected concept, review the state of the art, critically analyze existing approaches, and discuss promising research directions or open challenges.

Examiner

Buse Atli

Credits

6 hp

Comments

The course can be given on Zoom/Hybrid if there are non-local participants.

Ethics statement: The course includes topics that involve techniques capable of causing harm. In this course, we emphasize the ethical use of these techniques, strictly for non-commercial research and educational purposes. Any unethical activity (e.g., using lecture materials, seminal papers, or assignments for harmful purposes, or spreading or exploiting vulnerabilities in AI/ML services) is strictly prohibited.


Page responsible: Anne Moe