DF21500 Multicore Computing
Papers for Student Presentations (23/3/2011)

Choose one paper from the following list for presentation and another (preferably related) one for opposition.

NON-BLOCKING PARALLEL DATA STRUCTURES

D. Cederman, Ph. Tsigas:
Supporting Lock-Free Composition of Concurrent Data Objects.
Proceedings of the 7th ACM conference on Computing frontiers, pp. 53-62. 2010.
Maged M. Michael, Michael L. Scott:
Non-Blocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors
J. Parallel and Distr. Comput. 51(1): 1-26, May 1998.

TRANSACTIONAL MEMORY

Maurice Herlihy, J. Eliot B. Moss:
Transactional memory: Architectural support for lock-free data structures.
Proc. ISCA'93 20th Int. Symp. on Computer Architecture, pp. 289-300, 1993.
M. Ansari et al.:
Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering
Proc. HiPEAC-2009

PARALLEL LANGUAGES AND ENVIRONMENTS

Eduard Ayguade et al.:
The Design of OpenMP Tasks.
IEEE Trans. on Par. and Distr. Syst. 20(3), March 2009.
T. D. Han and T. Abdelrahman:
hiCUDA: High-Level GPGPU Programming.
IEEE Trans. Par. Distr. Syst. 22(1), Jan. 2011.
P. Charles et al.:
X10: An Object-Oriented Approach to Non-Uniform Cluster Computing
Proc. OOPSLA-2005
Ganesh Bikshandi et al.:
Design and Use of htalib - A Library for Hierarchically Tiled Arrays.
Proc. LCPC-2006, Springer LNCS 4382:17-32, 2008.
Q. Hou et al.:
BSGP: Bulk-synchronous GPU programming.
ACM Trans. Graph. 27(3), article 19, 2008.
R. Chen et al.:
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proc. PACT-2010 Conference, ACM.

SCHEDULING

Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier.
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures.
In Proc. Euro-Par 2009, LNCS 5704, pp. 863-874, 2009.
Cédric Augonnet, Samuel Thibault, and Raymond Namyst.
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures.
In Proceedings of the International Euro-Par Workshops 2009, HPPC'09, volume 6043 of Lecture Notes in Computer Science, Delft, The Netherlands, pages 56-65, August 2009. Springer.
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias:
Provably Efficient Scheduling for Languages with Fine-Grained Parallelism
J. of the ACM 46(2), March 1999, pp. 281-321.
M. Nijhuis et al.:
Mapping and synchronizing streaming applications on Cell processors
Proc. HiPEAC'09 conference, Jan. 2009
H. Park et al.:
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures.
Proc. 17th int. conference on Parallel architectures and compilation techniques (PACT), 2008.

AUTOTUNING FOR MULTICORE

X. Li, M. Garzaran, D. Padua:
A Dynamically Tuned Sorting Library
Proc. CGO-2004
N. Thomas et al.:
A Framework for Adaptive Algorithm Selection in STAPL.
Proc. ACM SIGPLAN Symp. Prin. Prac. Par. Prog. (PPOPP), pp. 277-288, Chicago, Illinois, Jun 2005.
Markus Püschel et al.:
SPIRAL: Code Generation for DSP Transforms
Proceedings of the IEEE 93(2):232-275, 2005
S. Williams et al.:
PERI - Auto-tuning memory-intensive kernels for multicore
SciDAC 2008, Journal of Physics: Conference Series 125(2008) 012038, IOP Publishing
B. Jang et al.:
Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures.
IEEE Trans. Par. Distr. Syst. 22(1), Jan. 2011.

PARALLEL ALGORITHMS

D. Cederman, Ph. Tsigas:
GPU-Quicksort: A practical Quicksort algorithm for graphics processors.
ACM Journal of Experimental Algorithmics, Vol. 14, Article No. 1.4, July 2009.
N. Leischner, V. Osipov, P. Sanders:
GPU Sample Sort.
Proc. IPDPS-2010, April 2010.
Scarpazza, Villa, Petrini:
Efficient Breadth-First Search on the Cell/BE Processor
IEEE Trans. Par. and Distr. Syst. 19(10):1381-1395, 2008.
Gary J. Katz and Joseph T. Kider, Jr:
All-pairs shortest-paths for large graphs on the GPU
23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2008.
A. Azevedo et al.:
Parallel H.264 Decoding on an Embedded Multicore Processor.
Proc. HiPEAC'09 conference, Jan. 2009

PERFORMANCE ANALYSIS

S. Williams, A. Waterman, D. Patterson:
Roofline: an insightful visual performance model for multicore architectures.
Comm. ACM 52(4), April 2009.
Presenter: (prel.) Thierry Vilmart, Opponent: TBA
V. Adve, M. Vernon:
Parallel program performance prediction using deterministic task graph analysis.
ACM Trans. on Computer Systems 22(1), Feb. 2004.
S. Baghsorkhi et al.:
An adaptive performance modeling tool for GPU architectures.
Proc. ACM PPoPP-2010.

OTHER OPTIMIZATIONS AND ANALYSES

I. Sung et al.:
Data layout transformation exploiting memory-level parallelism in structured grid many-core applications
Proc. PACT-2010, ACM.
N. Vasudevan et al.:
Simple and fast biased locks
Proc. PACT-2010, ACM.
G. Chen and P. Stenström:
A Methodology for Diagnosing Critical Section Bottlenecks in Multithreaded Applications.
Proc. MULTIPROG-2011 workshop, Heraklion, Jan. 2011, pp. 21-35.

Task: Prepare a 20-minute presentation of your chosen paper and at least 3 questions on the other paper for the opposition.
After the presentation, hand in a written summary (2-3 pages) of the paper you presented.

Please send me your presentation slides (ppt or pdf) for approval at least 48h before your presentation. If you do not get any reply, you can proceed and present.

This page is maintained by Christoph Kessler (chrke \at ida.liu.se)
