6th International Workshop on Multi-/Many-core Computing Systems
September 7, 2013, Edinburgh, Scotland, UK
in conjunction with the
22nd International Conference on
Parallel Architectures and Compilation Techniques
Keynote session - Chair: Francois Bodin (IRISA)
Dataflow Language Compilation for a Single Chip Massively Parallel Processor
Contributed Papers Session 1: Performance Optimization - Chair: Guang Gao (Univ. of Delaware)
Automatic Extraction of Multi-Objective Aware Parallelism for
Optimizing Sparse Matrix Vector Multiplication on Emerging
Quantifying the Performance Impacts of Using Local Memory for Many-Core
Topology-aware Equipartitioning with Coscheduling on Multicore Systems
Keynote session - Chair: Lasse Natvig (NTNU Trondheim)
Contributed Papers Session 2: Portability - Chair: Lasse Natvig (NTNU Trondheim)
One OpenCL to Rule Them All?
Contributed Papers Session 3: Compiler/Run-time Support and Data Structures - Chair: Benoit Dupont de Dinechin (Kalray)
Algorithmic Species Revisited: A Program Code Classification Based on
Towards a Compiler/Runtime Synergy to Predict the Scalability of
ELB-Trees: An Efficient and Lock-free B-tree Derivative
Paper presentation is limited to 25 minutes + 5 minutes for discussion.
Note that registration for the
includes all PACT workshops and tutorials on 7 and 8 September.
See also the program of the PACT conference itself, on 9-11 September 2013.
Keynote presentation (morning)
Benoit Dupont de Dinechin (CTO, Kalray, France):
"Dataflow Language Compilation for a Single Chip Massively Parallel Processor"
The Kalray MPPA-256 processor (Multi-Purpose Processing Array) integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These cores are distributed across 16 compute clusters and 4 I/O subsystems. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem.
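As a quick sanity check on the figures above, the totals can be tallied in a few lines of Python. This is only arithmetic on the numbers quoted in the abstract; it assumes the NoC description is read as one node per compute cluster and four nodes for each of the 4 I/O subsystems, and all names are illustrative.

```python
# Figures quoted in the abstract.
PE_CORES = 256
RM_CORES = 32
COMPUTE_CLUSTERS = 16
IO_SUBSYSTEMS = 4

# Assumption: one NoC node per compute cluster, four per I/O subsystem.
NODES_PER_CLUSTER = 1
NODES_PER_IO_SUBSYSTEM = 4


def total_cores():
    """Processing engine cores plus resource management cores."""
    return PE_CORES + RM_CORES


def noc_nodes():
    """NoC nodes contributed by compute clusters and I/O subsystems."""
    return (COMPUTE_CLUSTERS * NODES_PER_CLUSTER
            + IO_SUBSYSTEMS * NODES_PER_IO_SUBSYSTEM)


print("cores:", total_cores(), "NoC nodes:", noc_nodes())
```

Under that reading, the compute clusters and the I/O subsystems contribute the same number of NoC nodes.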
The Kalray MPPA software development kit includes a complete programming environment for a C-based dataflow language, whose compiler fully automates the distributed execution of tasks across the processing, memory, communication and synchronization resources of the MPPA architecture.
We first introduce the model of computation of the Kalray dataflow language, which is based on cyclostatic dataflow with extensions such as the firing thresholds of Karp & Miller computation graphs. We then describe the main steps of dataflow compilation to a distributed execution platform. These include: task sequencing, communication buffer sizing, task clustering, DMA engine exploitation, place & route, NoC bandwidth allocation, and generation of run-time tables. Finally, we discuss the suitability and restrictions of this and related static dataflow models of computation with regard to the dynamic and real-time requirements of embedded applications targeted by the MPPA processor.
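To make two of the compilation steps named above more concrete (task sequencing and communication buffer sizing), here is a minimal Python sketch of a two-actor cyclostatic dataflow edge. It is not the Kalray compiler's algorithm: the actors A and B, their per-phase rates, the consumer-first firing policy, and the function names are all assumptions chosen for illustration, and firing thresholds are left out.

```python
from math import gcd


def repetition_cycles(prod, cons):
    """Smallest numbers of full phase cycles (producer, consumer)
    after which tokens produced equal tokens consumed on the edge."""
    p, c = sum(prod), sum(cons)
    g = gcd(p, c)
    return c // g, p // g


def simulate(prod, cons):
    """Sequence one graph iteration of the edge A -> B.

    prod[i] is the number of tokens A produces in its i-th phase,
    cons[j] the number B consumes in its j-th phase. B fires as soon
    as it has enough tokens, otherwise A fires. Returns the firing
    sequence and the peak number of tokens held on the edge.
    """
    cycles_a, cycles_b = repetition_cycles(prod, cons)
    fires_a, fires_b = cycles_a * len(prod), cycles_b * len(cons)
    a = b = tokens = peak = 0
    schedule = []
    while a < fires_a or b < fires_b:
        need = cons[b % len(cons)] if b < fires_b else None
        if need is not None and tokens >= need:
            tokens -= need          # consumer phase fires
            b += 1
            schedule.append("B")
        elif a < fires_a:
            tokens += prod[a % len(prod)]  # producer phase fires
            a += 1
            peak = max(peak, tokens)
            schedule.append("A")
        else:
            raise RuntimeError("deadlock: consumer starved")
    return schedule, peak


# Hypothetical rates: A produces 2 then 1 tokens, B consumes 1 then 3.
schedule, peak = simulate([2, 1], [1, 3])
print("".join(schedule), "buffer slots needed:", peak)
```

The peak occupancy is the buffer capacity this particular firing order needs; a different sequencing policy would generally give a different bound, which is why sequencing and buffer sizing are usually considered together.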
About the speaker:
Benoit Dupont de Dinechin is the CTO of Kalray and one of the main architects of the MPPA MANYCORE. He joined Kalray in 2009 as head of the software development group. Prior to Kalray, he led the development of production compilers and architecture description tools for DSP and VLIW cores at STMicroelectronics. Benoit contributed to the production compiler of the Cray T3E while working at the Cray Research park between 1995 and 1998. He holds an engineering degree from the Ecole Nationale Supérieure de l'Aéronautique et de l'Espace, and earned a PhD from the University of Paris 6 under the supervision of Paul Feautrier.
Keynote presentation (afternoon)
Oliver Pell (Vice President of Engineering, Maxeler, UK):
"Multiscale Dataflow Computing"
Complexity of computation is a function of the underlying representation. We are extending this basic concept to consider the representation of computational problems on the application level, the model level, the architecture level, the arithmetic level and the gate level of computation. In particular, the first step is to consider and optimize the discretization of a problem in time, space and value. Discretization of value is particularly painful, both in physics, where atomic discretization ruins many nice theories, and in computation, where most people just blindly use IEEE double precision floating point so they don't have to worry about details, until they do. Multiscale Dataflow Computing provides a process by which one can optimize the discretization of time, space and value based on a particular underlying computer architecture, and in fact, iterate the molding of the computer architecture and the discretization of the computational challenge.
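The idea of choosing the discretization of value, rather than defaulting to double precision, can be illustrated with a small Python sketch. It is not Maxeler's method or tooling: the fixed-point model, the function names, and the sample values are assumptions for illustration only.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits,
    i.e. onto a fixed-point grid with frac_bits fractional bits.
    Python's round() resolves exact ties to the even integer."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def max_error(values, frac_bits):
    """Largest absolute error introduced by quantizing the values."""
    return max(abs(v - quantize(v, frac_bits)) for v in values)


def min_frac_bits(values, tolerance, limit=52):
    """Fewest fractional bits whose worst-case error stays within
    tolerance, or None if no width up to limit is enough."""
    for bits in range(limit + 1):
        if max_error(values, bits) <= tolerance:
            return bits
    return None


# Hypothetical data set and error budget.
samples = [0.1, 0.25, 0.7]
bits = min_frac_bits(samples, 1e-3)
print("fractional bits needed:", bits)
print("worst-case error:", max_error(samples, bits))
```

Rounding to the nearest grid point bounds the error by half a grid step, so each extra fractional bit halves the worst case; the search simply stops at the narrowest width that meets the stated tolerance.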
The above methods have been able to achieve 10-50x faster computation per cubic foot and per Watt, resulting in fewer nodes per computation and therefore exponentially improved reliability and resiliency. Results published by users worldwide include financial modelling (American Finance Technology Award for most cutting edge technology, 2011), commercial deployment in the oil and gas industry (see Society of Exploration Geophysicists meetings and reports), weather modelling (reducing the time to compute a Local Area Model, LAM, from 2 hours to 2 minutes) and even sparse matrix solvers that cannot be parallelized, running 20-40x faster.
About the speaker:
Oliver Pell is Vice President of Engineering at Maxeler, London, UK.
Page responsible: Christoph Kessler
Last updated: 2013-11-10