DF21500 Multicore Computing

Papers for Student Presentations (23/3/2011)

Choose one paper from the following list for presentation and another (preferably related) one for opposition.


  1. D. Cederman, Ph. Tsigas:
    Supporting Lock-Free Composition of Concurrent Data Objects.
    Proceedings of the 7th ACM conference on Computing frontiers, pp. 53-62. 2010.

  2. Maged M. Michael, Michael L. Scott:
    Non-Blocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors
    J. Parallel and Distr. Comput. 51(1): 1-26, May 1998.


  4. Maurice Herlihy, J. Eliot B. Moss:
    Transactional memory: Architectural support for lock-free data structures.
    Proc. ISCA'93 20th Int. Symp. on Computer Architecture, pp. 289-300, 1993.

  5. M. Ansari et al.:
    Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering
    Proc. HiPEAC-2009


  7. Eduard Ayguade et al.:
    The Design of OpenMP Tasks.
    IEEE Trans. on Par. and Distr. Syst. 20(3), March 2009.

  8. T. D. Han and T. Abdelrahman:
    hiCUDA: High-Level GPGPU Programming.
    IEEE Trans. Par. Distr. Syst. 22(1), Jan. 2011.

  9. P. Charles et al.:
    X10: An Object-Oriented Approach to Non-Uniform Cluster Computing
    Proc. OOPSLA-2005

  10. Ganesh Bikshandi et al.:
    Design and Use of htalib - A Library for Hierarchically Tiled Arrays.
    Proc. LCPC-2006, Springer LNCS 4382:17-32, 2008.

  11. Q. Hou et al.:
    BSGP: Bulk-synchronous GPU programming.
    ACM Trans. Graph. 27(3), article 19, 2008.

  12. R. Chen et al.:
    Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
    Proc. PACT-2010 Conference, ACM.


  14. Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier.
    STARPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures.
    In Proc. Euro-Par 2009, LNCS 5704, pp. 863-874, 2009.

  15. Cédric Augonnet, Samuel Thibault, and Raymond Namyst.
    Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures.
    In Proceedings of the International Euro-Par Workshops 2009, HPPC'09, volume 6043 of Lecture Notes in Computer Science, Delft, The Netherlands, pages 56-65, August 2009. Springer.

  16. Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias:
    Provably Efficient Scheduling for Languages with Fine-Grained Parallelism
    J. of the ACM 46(2), March 1999, pp. 281-321.

  17. M. Nijhuis et al.:
    Mapping and synchronizing streaming applications on Cell processors
    Proc. HiPEAC'09 conference, Jan. 2009

  18. H. Park et al.:
    Edge-centric modulo scheduling for coarse-grained reconfigurable architectures.
    Proc. 17th int. conference on Parallel architectures and compilation techniques (PACT), 2008.

  20. X. Li, M. Garzaran, D. Padua:
    A Dynamically Tuned Sorting Library
    Proc. CGO-2004

  21. N. Thomas et al.:
    A Framework for Adaptive Algorithm Selection in STAPL.
    Proc. ACM SIGPLAN Symp. Prin. Prac. Par. Prog. (PPOPP), pp. 277-288, Chicago, Illinois, Jun 2005.

  22. Markus Püschel et al.:
    SPIRAL: Code Generation for DSP Transforms
    Proceedings of the IEEE 93(2):232-275, 2005

  23. S. Williams et al.:
    PERI - Auto-tuning memory-intensive kernels for multicore
    SciDAC 2008, Journal of Physics: Conference Series 125(2008) 012038, IOP Publishing

  24. B. Jang et al.:
    Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures.
    IEEE Trans. Par. Distr. Syst. 22(1), Jan. 2011.


  26. D. Cederman, Ph. Tsigas:
    GPU-Quicksort: A practical Quicksort algorithm for graphics processors.
    ACM Journal of Experimental Algorithmics, Vol. 14, Article No. 1.4, July 2009.

  27. N. Leischner, V. Osipov, P. Sanders:
    GPU Sample Sort.
    Proc. IPDPS-2010, April 2010.

  28. Scarpazza, Villa, Petrini:
    Efficient Breadth-First Search on the Cell/BE Processor
    IEEE Trans. Par. and Distr. Syst. 19(10):1381-1395, 2008.

  29. Gary J. Katz and Joseph T. Kider, Jr:
    All-pairs shortest-paths for large graphs on the GPU
    23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, 2008.

  30. A. Azevedo et al.:
    Parallel H.264 Decoding on an Embedded Multicore Processor.
    Proc. HiPEAC'09 conference, Jan. 2009


  32. S. Williams, A. Waterman, D. Patterson:
    Roofline: an insightful visual performance model for multicore architectures.
    Comm. ACM 52(4), April 2009.

  33. V. Adve, M. Vernon:
    Parallel program performance prediction using deterministic task graph analysis.
    ACM Trans. on Computer Systems 22(1), Feb. 2004.

  34. S. Baghsorkhi et al.:
    An adaptive performance modeling tool for GPU architectures.
    Proc. ACM PPoPP-2010.

  36. I. Sung et al.:
    Data layout transformation exploiting memory-level parallelism in structured grid many-core applications
    Proc. PACT-2010, ACM.

  37. N. Vasudevan et al.:
    Simple and fast biased locks
    Proc. PACT-2010, ACM.

  38. G. Chen and P. Stenström:
    A Methodology for Diagnosing Critical Section Bottlenecks in Multithreaded Applications.
    Proc. MULTIPROG-2011 workshop, Heraklion, Jan. 2011, pp. 21-35.

Task: Prepare a 20-minute presentation of your chosen paper and at least 3 questions on the other paper for opposition.
After the presentation, hand in a 2-3 page written summary of the paper you presented.

Please send me your presentation slides (ppt or pdf) for approval at least 48 hours before your presentation. If you do not receive a reply, you may proceed and present.

