Parallel and Distributed Processing of Large Data Streams and Graphs, 2024HT
Course plan
Recommended for
Graduate students with some background in parallel computing and/or big-data analytics who are interested in domain-specific programming abstractions and frameworks for parallel and distributed computing on big graphs and/or data streams, and in the efficient implementation of such computations on parallel, distributed and heterogeneous computer systems.
Organization and Schedule
The course is planned for early autumn (ht1) 2024.
- Lectures (ca. 6-8h)
- Written or oral exam TBD, ht1/2024, 1.5hp
- Paper presentation by participants and opposition, 1.5hp
- Programming miniproject, with demo and a summary report, 2hp
The course was last given
This is a new course.
Goals/Contents
Graphs are a common abstraction in mathematics and computer science used to
represent real-world entities and their relations, such as web pages, persons
in social networks, persons or places on maps, cells in biological networks,
objects or particles in complex computer graphics, finite element meshes,
documents with citations, and many other scenarios that may easily scale up to
many billions of vertices and trillions of edges. Processing data represented
as large graphs requires suitable domain-specific programming abstractions as
well as techniques for efficient execution on parallel, distributed and
accelerator-based computing resources, which is currently an active area of
research.
Data stream processing is another fundamental form of big-data processing that
calls for parallel and distributed computing solutions to satisfy
application-specific throughput and latency requirements. Here, too,
domain-specific frameworks and implementation techniques exist.
In this course, we consider techniques for distributed and parallel computing
on large graphs and data streams, including the theoretical foundations as well
as domain-specific programming models and implementation techniques for
parallel, distributed and heterogeneous systems.
Prerequisites
We expect some background in programming for parallel, accelerator and/or
cluster systems from a previous course, such as TDDC78 or TDDD56.
Having taken a course on big-data analytics such as TDDE31/732A54 is also
useful but not required.
Programming skills and familiarity with Linux are useful for the miniproject.
Literature
Will be announced later.
Examination
- Active participation in the introductory lectures and the presentation
sessions.
- Written or oral exam TBD (1.5hp)
- Successful individual paper presentation with opposition (1.5hp)
- Successfully completed miniproject with demo and report (2hp)
Credit
5hp if all examination components are passed.
Related Courses
This course builds upon the IDA master-level courses Programming of parallel computers - methods and tools (TDDC78) or Multicore and GPU programming (TDDD56), and Big Data Analytics (TDDE31/732A54).
Page responsible: Director of Graduate Studies