Fourth Swedish Workshop on Multicore Computing
November 23-25, 2011, Linköping University
Multicore un-unplugged: tales from the mobile processor trenches
Speaker: Dr. David Moloney, CTO Movidius Ltd., Dublin, Ireland
Movidius is a fabless semiconductor company based in Dublin, Ireland, established in 2005. Since its inception, the company's vision has been to radically change the approach to providing multimedia functionality in mobile devices. The incumbent approach has consisted of one or more processors augmented by DSP and fixed-function cores. Having evaluated the available cores and the emerging multimedia requirements, Movidius decided to build a new processor and multicore architecture optimized from the ground up for power efficiency.
The resulting SHAVE processor is a hybrid stream processor architecture combining the best features of GPUs, DSPs and RISC, with 8/16/32-bit integer and 16/32-bit floating-point arithmetic as well as unique features such as hardware support for sparse data structures. SHAVE spans a very broad spectrum of applications, from game physics to 3D HD video encode, on the Movidius 65nm Myriad SoC, which contains 8 SHAVEs as well as an on-board 32-bit RISC and numerous programmable peripherals. The Movidius product roadmap integrates 16 or more SHAVEs in 28nm and lower geometries.
The process of HW/SW co-design of the SHAVE ISA and the Myriad multicore SoC architecture will be described, together with the lessons learned; as the old Chinese proverb says, "be careful what you wish for". The detailed SHAVE architecture and its unique features will be presented, along with the SoC implementation details, including the power optimization and power-saving features of the Myriad SoC. Results based on silicon will be outlined in comparison to the state of the art. Finally, experiences gained in the design of a range of Movidius multimedia applications for the Myriad architecture and SHAVE processor will be described, along with details of the software tools and development flow.
David Moloney holds a B.Eng. degree from Dublin City University and a Ph.D. from Trinity College Dublin. For the past 25 years he has worked in microelectronics, starting in 1985 with Infineon in Munich and ST Microelectronics in Milan, before returning to Ireland in 1994 to help found a series of start-up technology companies including Parthus-CEVA and Silansys. David is currently co-founder (2005) and CTO of Movidius Ltd., a fabless semiconductor company headquartered in Dublin and focused on the design of software-programmable multimedia accelerator SoCs. He holds 18 US patents and has authored numerous conference and journal papers on DSP and computer architecture. David is a member of the IEEE.
Speaker: Dr. Victor Pankratius, Karlsruhe Institute of Technology, Germany
The increasing variety of multicore platforms complicates parallel application development and performance tuning. This talk outlines new perspectives on how auto-tuning can be employed beyond scientific applications. I will discuss recent work on auto-tuning in the context of software architectures, database query optimization, and application-level performance optimization on the Single-Chip Cloud Computer. The talk elaborates on the reasons why in the long run we have to make every performance-critical parallel application auto-tuned by default. In addition to performance optimization, auto-tuning simplifies the development of complex applications and makes portability easier. To realize the vision of ubiquitous auto-tuning, I will present an OS-integrated approach that tunes all multicore applications while they are running.
Dr. Pankratius heads the Multicore Software Engineering investigator group at the Karlsruhe Institute of Technology, Germany. He also serves as the elected chairman of the Software Engineering for Parallel Systems (SEPARS) international working group. Dr. Pankratius' research concentrates on how to make parallel programming easier and covers a range of research topics including auto-tuning, language design, language usability, debugging, and empirical studies. Contact him at http://www.victorpankratius.com.
Design Challenges for Scalable Concurrent Data Structures
Speaker: Prof. Philippas Tsigas, Chalmers University of Technology
Concurrent data structure designers strive to maintain the consistency of data structures while keeping the use of mutual exclusion and expensive synchronization to a minimum, in order to prevent the data structure from becoming a sequential bottleneck. We will discuss the challenges that arise when locks and mutual exclusion are eliminated from the design space of concurrent data structures, as well as part of our effort to design efficient lock-free data structures. We will close by briefly discussing our implementation efforts, which aim to help programmability by providing the major findings of the concurrent data structures community in a form that makes them directly accessible to non-experts.
Philippas Tsigas is a Professor in the Department of Computing Science and Engineering at Chalmers University of Technology. He received a BSc in Mathematics from the University of Patras, Greece, and a Ph.D. in Computer Engineering and Informatics from the same university in 1994. From 1993 to 1994 he was with the National Research Institute for Mathematics and Computer Science in the Netherlands (CWI), Amsterdam. From 1995 to 1997 he was with the Max-Planck Institute for Computer Science, Saarbrücken, Germany. He joined Chalmers University of Technology in 1997. He is the co-founder and head of the Distributed Computing and Systems research group at Chalmers. His research interests include parallel and distributed computing, parallel and distributed systems, and information visualization. He is the initiator and one of the developers of NOBLE, a library of non-blocking data structures. For more information, including contact details, see www.cse.chalmers.se/~tsigas
Speaker: Dr. David Black-Schaffer, Uppsala University
GPUs have been hyped as the solution to nearly every intensive processing problem for the past half-decade, but in reality they are a mixed bag. This talk will take a look at the potential offered by these systems in terms of performance and efficiency, and discuss some of their major weaknesses in terms of programmability and applicability.
Dr. Black-Schaffer received his Ph.D. in computer architecture from Stanford University, where he worked on programming systems for power-efficient architectures for embedded media processing. After graduating he worked at Apple Inc., designing and implementing the OpenCL standard for heterogeneous multicore computing. He is now an assistant professor in the architecture research group at Uppsala University, where his interests are in efficient runtime systems for parallel execution.