Papers for Student Presentations (23/3/2011)
Choose one paper from the following list for presentation and another (preferably related) one for opposition.
D. Cederman, Ph. Tsigas:
Supporting Lock-Free Composition of Concurrent Data Objects.
Proceedings of the 7th ACM conference on Computing frontiers, pp. 53-62.
2010.
Maged M. Michael, Michael L. Scott:
Non-Blocking Algorithms and Preemption-Safe Locking
on Multiprogrammed Shared Memory Multiprocessors
J. Parallel and Distr. Comput. 51(1): 1-26, May 1998.
Maurice Herlihy, J. Eliot B. Moss:
Transactional memory: Architectural support
for lock-free data structures.
Proc. ISCA'93 20th Int. Symp. on Computer Architecture, pp. 289-300, 1993.
M. Ansari et al.:
Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering
Proc. HiPEAC-2009
Eduard Ayguade et al.:
The Design of OpenMP Tasks.
IEEE Trans. on Par. and Distr. Syst. 20(3), March 2009.
T. D. Han and T. Abdelrahman:
hiCUDA: High-Level GPGPU Programming.
IEEE Trans. Par. Distr. Syst. 22(1), Jan. 2011.
P. Charles et al.:
X10: An Object-Oriented Approach to Non-Uniform Cluster Computing
Proc. OOPSLA-2005
Ganesh Bikshandi et al.:
Design and Use of htalib - A Library for Hierarchically Tiled Arrays.
Proc. LCPC-2006, Springer LNCS 4382:17-32, 2008.
Q. Hou et al.:
BSGP: Bulk-synchronous GPU programming.
ACM Trans. Graph. 27(3), article 19, 2008.
R. Chen et al.:
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proc. PACT-2010 Conference, ACM.
Cédric Augonnet, Samuel Thibault, Raymond Namyst, Pierre-André Wacrenier:
STARPU: A Unified Platform for Task Scheduling on
Heterogeneous Multicore Architectures.
In Proc. Euro-Par 2009, LNCS 5704, pp. 863-874, 2009.
Cédric Augonnet, Samuel Thibault, Raymond Namyst:
Automatic Calibration of Performance Models on Heterogeneous
Multicore Architectures.
In Proceedings of the International Euro-Par Workshops 2009, HPPC'09,
volume 6043 of Lecture Notes in Computer Science, Delft,
The Netherlands, pages 56-65, August 2009. Springer.
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias:
Provably Efficient Scheduling for Languages
with Fine-Grained Parallelism
J. of the ACM 46(2), March 1999, pp. 281-321.
M. Nijhuis et al.:
Mapping and synchronizing streaming applications on Cell processors
Proc. HiPEAC'09 conference, Jan. 2009
H. Park et al.:
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures.
Proc. 17th int. conference on Parallel architectures and compilation techniques (PACT), 2008.
X. Li, M. Garzaran, D. Padua:
A Dynamically Tuned Sorting Library
Proc. CGO-2004
N. Thomas et al.:
A Framework for Adaptive Algorithm Selection in STAPL.
Proc. ACM SIGPLAN Symp. Prin. Prac. Par. Prog. (PPOPP), pp. 277-288, Chicago, Illinois, Jun 2005.
Markus Püschel et al.:
SPIRAL: Code Generation for DSP Transforms
Proceedings of the IEEE 93(2):232-275, 2005
S. Williams et al.:
PERI - Auto-tuning memory-intensive kernels for multicore
SciDAC 2008,
Journal of Physics: Conference Series 125(2008) 012038, IOP Publishing
B. Jang et al.:
Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures.
IEEE Trans. Par. Distr. Syst. 22(1), Jan. 2011.
D. Cederman, Ph. Tsigas:
GPU-Quicksort: A practical Quicksort algorithm for graphics processors.
ACM Journal of Experimental Algorithmics, Vol. 14, Article No. 1.4, July 2009.
N. Leischner, V. Osipov, P. Sanders:
GPU Sample Sort.
Proc. IPDPS-2010, April 2010.
Scarpazza, Villa, Petrini:
Efficient Breadth-First Search on the Cell/BE Processor
IEEE Trans. Par. and Distr. Syst. 19(10):1381-1395, 2008.
Gary J. Katz and Joseph T. Kider, Jr:
All-pairs shortest-paths for large graphs on the GPU
23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2008.
A. Azevedo et al.:
Parallel H.264 Decoding on an Embedded Multicore Processor.
Proc. HiPEAC'09 conference, Jan. 2009
S. Williams, A. Waterman, D. Patterson:
Roofline: an insightful visual performance model for multicore architectures.
Comm. ACM 52(4), April 2009.
Presenter: (prel.) Thierry Vilmart, Opponent: TBA
V. Adve, M. Vernon:
Parallel program performance prediction using deterministic task graph analysis.
ACM Trans. on Computer Systems 22(1), Feb. 2004.
S. Baghsorkhi et al.:
An adaptive performance modeling tool for GPU architectures.
Proc. ACM PPoPP-2010.
I. Sung et al.:
Data layout transformation exploiting memory-level parallelism in structured grid many-core applications
Proc. PACT-2010, ACM.
N. Vasudevan et al.:
Simple and fast biased locks
Proc. PACT-2010, ACM.
G. Chen and P. Stenström:
A Methodology for Diagnosing Critical Section
Bottlenecks in Multithreaded Applications.
Proc. MULTIPROG-2011 workshop, Heraklion, Jan. 2011, pp. 21-35.
Task:
Prepare a 20-minute presentation of your chosen paper and at least 3 questions
on the other paper, for which you act as opponent.
After the presentation, hand in a written summary (2-3 pages) of the paper
you presented.
Please send me your presentation slides (ppt or pdf) for approval at least 48 hours before your presentation. If you do not receive a reply, you may go ahead and present.