Program for MuCoCoS-2013 6th Int. Workshop on Multi-/Many-core Computing Systems

6th International Workshop on Multi-/Many-core Computing Systems

September 7, 2013, Edinburgh, Scotland, UK

in conjunction with the 22nd International Conference on
Parallel Architectures and Compilation Techniques (PACT-2013)


Program as PDF




Christoph Kessler (Linköping University) and Sabri Pllana (Linnaeus University)
[Message from the MuCoCoS-2013 Workshop Chairs (PDF)]


Keynote session - Chair: Francois Bodin (IRISA)

Keynote: Dataflow Language Compilation for a Single Chip Massively Parallel Processor
Benoit Dupont de Dinechin (Kalray)
[Abstract (PDF)]


(Coffee break)


Contributed Papers Session 1: Performance Optimization - Chair: Guang Gao (Univ. of Delaware)

Automatic Extraction of Multi-Objective Aware Parallelism for Heterogeneous MPSoCs
Daniel Cordes, Michael Engel, Olaf Neugebauer, and Peter Marwedel (TU Dortmund)
[Slides (PDF)] [Paper in IEEE Xplore]

Optimizing Sparse Matrix Vector Multiplication on Emerging Multicores
Orhan Kislal, Wei Ding, Mahmut Kandemir (Pennsylvania State University) and Ilteris Demirkiran (Embry-Riddle Aeronautical University)
[Slides (PDF)] [Paper in IEEE Xplore]

Quantifying the Performance Impacts of Using Local Memory for Many-Core Processors
Jianbin Fang (TU Delft), Ana Lucia Varbanescu (Univ. of Amsterdam) and Henk Sips (TU Delft)
[Slides (PDF)] [Paper in IEEE Xplore]

Topology-aware Equipartitioning with Coscheduling on Multicore Systems
Jan H. Schönherr, Ben Juurlink and Jan Richling (TU Berlin)
[Paper in IEEE Xplore]


(Lunch break)


Keynote session - Chair: Lasse Natvig (NTNU Trondheim)

Keynote: Multiscale Dataflow Computing
Oliver Pell (Maxeler)
[Abstract (PDF)]


Contributed Papers Session 2: Portability - Chair: Lasse Natvig (NTNU Trondheim)

One OpenCL to Rule Them All?
Romain Dolbeau (CAPS entreprise), Francois Bodin (IRISA), and Guillaume Colin de Verdiere (CEA, DAM, DIF)
[Paper in IEEE Xplore]


(Coffee break)


Contributed Papers Session 3: Compiler/Run-time Support and Data Structures - Chair: Benoit Dupont de Dinechin (Kalray)

Algorithmic Species Revisited: A Program Code Classification Based on Array References
Cedric Nugteren, Rosilde Corvino, and Henk Corporaal (Eindhoven University of Technology)
[Slides (PDF)] [Paper in IEEE Xplore]

Towards a Compiler/Runtime Synergy to Predict the Scalability of Parallel Loops
Georgios Chatzopoulos (National Techn. Univ. of Athens), Kornilios Kourtis (ETH Zurich), Nectarios Koziris and Georgios Goumas (National Techn. Univ. of Athens)
[Slides (PDF)] [Paper in IEEE Xplore]

ELB-Trees: An Efficient and Lock-free B-tree Derivative
Lars F. Bonnichsen, Sven Karlsson, Christian W. Probst (Technical University of Denmark)
[Paper in IEEE Xplore]


Paper presentation is limited to 25 minutes + 5 minutes for discussion.

Note that registration for the pre-PACT program, including MuCoCoS-2013, includes all PACT workshops and tutorials on 7 and 8 September.
See also the program of the PACT conference itself, on 9-11 September 2013.

Keynote presentation (morning)

Benoit Dupont de Dinechin (CTO, Kalray, France):

"Dataflow Language Compilation for a Single Chip Massively Parallel Processor"

The Kalray MPPA-256 processor (Multi-Purpose Processing Array) integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28 nm CMOS chip. These cores are distributed across 16 compute clusters and 4 I/O subsystems. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem.
The Kalray MPPA software development kit includes a complete programming environment for a C-based dataflow language, whose compiler fully automates the distributed execution of tasks across the processing, memory, communication and synchronization resources of the MPPA architecture.
We first introduce the model of computation of the Kalray dataflow language, which is based on cyclostatic dataflow with extensions such as the firing thresholds of Karp & Miller computation graphs. We then describe the main steps of dataflow compilation to a distributed execution platform. These include: task sequencing, communication buffer sizing, task clustering, DMA engine exploitation, place & route, NoC bandwidth allocation, and generation of run-time tables. Finally, we discuss the suitability and restrictions of this and related static dataflow models of computation with regard to the dynamic and real-time requirements of embedded applications targeted by the MPPA processor.

About the speaker:
Benoit Dupont de Dinechin is the CTO of Kalray and one of the main architects of the MPPA MANYCORE. He joined Kalray in 2009 as head of the software development group. Prior to Kalray, he led the development of production compilers and architecture description tools for DSP and VLIW cores at STMicroelectronics. Benoit contributed to the production compiler of the Cray T3E while working at the Cray Research park between 1995 and 1998. He holds an engineering degree from the Ecole Nationale Supérieure de l'Aéronautique et de l'Espace, and earned a PhD from the University of Paris 6 under the supervision of Paul Feautrier.

Keynote presentation (afternoon)

Oliver Pell (Vice President of Engineering, Maxeler, UK):

"Multiscale Dataflow Computing"

Complexity of computation is a function of the underlying representation. We are extending this basic concept to consider the representation of computational problems on the application level, the model level, the architecture level, the arithmetic level and the gate level of computation. In particular, the first step is to consider and optimize the discretization of a problem in time, space and value. Discretization of value is particularly painful, both in physics, where atomic discretization ruins many nice theories, and in computation, where most people just blindly use IEEE double precision floating point so they don't have to worry about details, until they do. Multiscale Dataflow Computing provides a process by which one can optimize the discretization of time, space and value based on a particular underlying computer architecture, and in fact, iterate the molding of the computer architecture and the discretization of the computational challenge.
The above methods have achieved 10-50x faster computation per cubic foot and per Watt, resulting in fewer nodes per computation and therefore exponentially improved reliability and resiliency. Results published by users worldwide include financial modelling (American Finance Technology Award for most cutting edge technology, 2011), commercial deployment in the oil and gas industry (see Society of Exploration Geophysicists meetings and reports), weather modelling (reducing the time to compute a Local Area Model (LAM) from 2 hours to 2 minutes) and even sparse matrix solvers which cannot be parallelized, running 20-40x faster.

About the speaker:
Oliver Pell is Vice President of Engineering at Maxeler, London, UK.

Page responsible: Christoph Kessler
Last updated: 2013-11-10