DF21500 Multicore Computing

Papers for Student Presentations (18/9/2013)

Choose one paper from the following list for presentation and another (preferably related) one for opposition.


  1. D. Cederman, Ph. Tsigas:
    Supporting Lock-Free Composition of Concurrent Data Objects.
    Proceedings of the 7th ACM conference on Computing frontiers, pp. 53-62. 2010.

  2. Maged M. Michael, Michael L. Scott:
    Non-Blocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors.
    J. Parallel and Distr. Comput. 51(1): 1-26, May 1998.


  4. Maurice Herlihy, J. Eliot B. Moss:
    Transactional memory: Architectural support for lock-free data structures.
    Proc. ISCA'93 20th Int. Symp. on Computer Architecture, pp. 289-300, 1993.


  6. Q. Hou et al.:
    BSGP: Bulk-synchronous GPU programming.
    ACM Trans. Graph. 27(3), article 19, 2008.

  7. R. Chen et al.:
    Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling.
    Proc. PACT-2010 Conference, ACM.


  9. M. Nijhuis et al.:
    Mapping and synchronizing streaming applications on Cell processors.
    Proc. HiPEAC'09 conference, Jan. 2009.


  11. N. Thomas et al.:
    A Framework for Adaptive Algorithm Selection in STAPL.
    Proc. ACM SIGPLAN Symp. Prin. Prac. Par. Prog. (PPOPP), pp. 277-288, Chicago, Illinois, Jun 2005.

  12. Markus Püschel et al.:
    SPIRAL: Code Generation for DSP Transforms.
    Proceedings of the IEEE 93(2):232-275, 2005.

  13. S. Williams et al.:
    PERI - Auto-tuning memory-intensive kernels for multicore.
    SciDAC 2008, Journal of Physics: Conference Series 125(2008) 012038, IOP Publishing.

  14. B. Jang et al.:
    Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures.
    IEEE Trans. Par. Distr. Syst. 22(1), Jan. 2011.


  16. M. Amber Hassaan, Martin Burtscher, and Keshav Pingali:
    Ordered vs. Unordered: a Comparison of Parallelism and Work-efficiency in Irregular algorithms.
    Proc. PPoPP'11 ACM Symp. on Principles and Practice of Parallel Programming, 2011.

  17. D. Cederman, Ph. Tsigas:
    GPU-Quicksort: A practical Quicksort algorithm for graphics processors.
    ACM Journal of Experimental Algorithmics, Vol. 14, Article No. 1.4, July 2009.

  18. N. Leischner, V. Osipov, P. Sanders:
    GPU Sample Sort.
    Proc. IPDPS-2010, April 2010.

  19. A. Azevedo et al.:
    Parallel H.264 Decoding on an Embedded Multicore Processor.
    Proc. HiPEAC'09 conference, Jan. 2009.


  21. V. Adve, M. Vernon:
    Parallel program performance prediction using deterministic task graph analysis.
    ACM Trans. on Computer Systems 22(1), Feb. 2004.

  22. S. Baghsorkhi et al.:
    An adaptive performance modeling tool for GPU architectures.
    Proc. ACM PPoPP-2010.

  23. Other Optimizations and Analyses

  24. N. Vasudevan et al.:
    Simple and fast biased locks.
    Proc. PACT-2010, ACM.

Task: Prepare a 25-minute presentation of your chosen paper and at least 3 questions on the other paper for opposition.
After the presentation, hand in a 2-3 page written summary of the paper you presented.

Please send me your presentation slides (ppt or pdf) for approval at least 48 hours before your presentation. If you do not receive a reply, you may proceed and present.

This page is maintained by Christoph Kessler (chrke \at ida.liu.se)