DF21500 Multicore Computing
Papers for Student Presentations (18/9/2013)

Choose one paper from the following list for presentation and another (preferably related) one for opposition.

NON-BLOCKING PARALLEL DATA STRUCTURES

D. Cederman, Ph. Tsigas:
Supporting Lock-Free Composition of Concurrent Data Objects.
Proceedings of the 7th ACM conference on Computing frontiers, pp. 53-62. 2010.
Maged M. Michael, Michael L. Scott:
Non-Blocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors.
J. Parallel and Distr. Comput. 51(1): 1-26, May 1998.

TRANSACTIONAL MEMORY

Maurice Herlihy, J. Eliot B. Moss:
Transactional memory: Architectural support for lock-free data structures.
Proc. ISCA'93 20th Int. Symp. on Computer Architecture, pp. 289-300, 1993.

PARALLEL LANGUAGES AND ENVIRONMENTS

Q. Hou et al.:
BSGP: Bulk-synchronous GPU programming.
ACM Trans. Graph. 27(3), article 19, 2008.
R. Chen et al.:
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling.
Proc. PACT-2010 Conference, ACM.

SCHEDULING

M. Nijhuis et al.:
Mapping and synchronizing streaming applications on Cell processors.
Proc. HiPEAC'09 conference, Jan. 2009.

AUTOTUNING FOR MULTICORE

N. Thomas et al.:
A Framework for Adaptive Algorithm Selection in STAPL.
Proc. ACM SIGPLAN Symp. Prin. Prac. Par. Prog. (PPOPP), pp. 277-288, Chicago, Illinois, Jun 2005.
Markus Püschel et al.:
SPIRAL: Code Generation for DSP Transforms.
Proceedings of the IEEE 93(2):232-275, 2005.
S. Williams et al.:
PERI - Auto-tuning memory-intensive kernels for multicore.
SciDAC 2008, Journal of Physics: Conference Series 125(2008) 012038, IOP Publishing.
B. Jang et al.:
Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures.
IEEE Trans. Par. Distr. Syst. 22(1), Jan. 2011.

PARALLEL ALGORITHMS

M. Amber Hassaan, Martin Burtscher, and Keshav Pingali:
Ordered vs. Unordered: a Comparison of Parallelism and Work-efficiency in Irregular algorithms.
Proc. PPoPP'11 ACM Symp. on Principles and Practice of Parallel Programming, 2011.
D. Cederman, Ph. Tsigas:
GPU-Quicksort: A practical Quicksort algorithm for graphics processors.
ACM Journal of Experimental Algorithmics, Vol. 14, Article No. 1.4, July 2009.
N. Leischner, V. Osipov, P. Sanders:
GPU Sample Sort.
Proc. IPDPS-2010, April 2010.
A. Azevedo et al.:
Parallel H.264 Decoding on an Embedded Multicore Processor.
Proc. HiPEAC'09 conference, Jan. 2009.

PERFORMANCE ANALYSIS

V. Adve, M. Vernon:
Parallel program performance prediction using deterministic task graph analysis.
ACM Trans. on Computer Systems 22(1), Feb. 2004.
S. Baghsorkhi et al.:
An adaptive performance modeling tool for GPU architectures.
Proc. ACM PPoPP-2010.

OTHER OPTIMIZATIONS AND ANALYSES

N. Vasudevan et al.:
Simple and fast biased locks.
Proc. PACT-2010, ACM.

Task: Prepare a 25-minute presentation of your chosen paper and at least 3 questions on the other paper for opposition.
After the presentation, hand in a written summary (2-3 pages) of the paper you presented.

Please send me your presentation slides (ppt or pdf) for approval at least 48 hours before your presentation. If you do not receive a reply, you may proceed and present.

This page is maintained by Christoph Kessler (chrke \at ida.liu.se)
