
Parallel and Distributed Processing of Large Data Streams and Graphs

2024HT

Status Open for interest registrations
School IDA-gemensam (IDA)
Division PELAB
Owner Christoph Kessler
Homepage https://www.ida.liu.se/~chrke55/courses/PDPGS/





Course plan

Recommended for

Graduate students with some background in parallel computing and/or big-data analytics, who are interested in domain-specific programming abstractions and frameworks for parallel and distributed computing on big graphs and/or data streams, and for efficient implementation of such computations on parallel, distributed and heterogeneous computer systems.

Organization and Schedule

The course is planned for early autumn (ht1) 2024.

- Lectures (ca. 6-8h)
- Written or oral exam (format TBD), ht1/2024, 1.5hp
- Paper presentation and opposition by participants, 1.5hp
- Programming miniproject, with demo and a summary report, 2hp

The course was last given

This is a new course.

Goals/Contents

Graphs are a common abstraction in mathematics and computer science for representing real-world entities and their relations. Examples include web pages, persons in social networks, persons or places on maps, cells in biological networks, objects or particles in complex computer graphics, finite element meshes, and documents with citations. Many of these scenarios easily scale up to billions of vertices and trillions of edges. Processing data represented as large graphs requires suitable domain-specific programming abstractions as well as techniques for efficient execution on parallel, distributed and accelerator-based computing resources, which is currently an active topic of research.

Data stream processing is another fundamental form of big-data processing that calls for parallel and distributed computing solutions to satisfy application-specific throughput and latency requirements. Here, too, domain-specific frameworks and implementation techniques exist.

In this course, we consider techniques for distributed and parallel computing on large graphs and data streams, including the theoretical foundations as well as domain-specific programming models and implementation techniques for parallel, distributed and heterogeneous systems.

Prerequisites

We expect some background in programming for parallel, accelerator and/or cluster systems from a previous course, such as TDDC78 or TDDD56.
Having taken a course on big-data analytics such as TDDE31/732A54 is also useful but not required.
Programming skills and familiarity with Linux are useful for the miniproject.

Literature

Will be announced later.

Examination

- Active participation in the introductory lectures and the presentation sessions.
- Written or oral exam, format TBD (1.5hp)
- Successful individual paper presentation with opposition (1.5hp)
- Successfully completed miniproject with demo and report (2hp)

Credit

5hp upon successful completion of all examination components.

Related Courses

This course builds upon the IDA master-level courses Programming of Parallel Computers - Methods and Tools (TDDC78) or Multicore and GPU Programming (TDDD56), and Big Data Analytics (TDDE31/732A54).


Page responsible: Director of Graduate Studies