List of Papers
-
J. Handy and T. Coughlin.
Semiconductor Architectures Enable Compute in Memory
Computer, vol. 56, no. 5, pp. 126-129, May 2023
-
Muthukrishnan H, Nellans D, Lustig D, et al.
Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
Proc. 48th Annual International Symposium on Computer Architecture, 2021
-
Jang J W, Lee S, Kim D, et al.
Sparsity-Aware and Re-configurable NPU Architecture for Samsung Flagship Mobile SoC
Proc. 48th Annual International Symposium on Computer Architecture, 2021
-
Wang M, Ta T, Cheng L, et al.
Efficiently Supporting Dynamic Task Parallelism on Heterogeneous Cache-Coherent Systems
Proc. 47th Annual International Symposium on Computer Architecture, 2020
-
K. Wang, et al.
IntelliNoC: A Holistic Design Framework for Energy-Efficient and Reliable On-Chip Communication for Manycores
46th International Symposium on Computer Architecture, 2019
-
A. Pattnaik, et al.
Opportunistic Computing in GPU Architectures
46th International Symposium on Computer Architecture, 2019
-
Jeff Dean, David Patterson, and Cliff Young (Google Brain)
A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution
Architectural Support for Programming Languages and Operating Systems (ASPLOS'17)
-
Norman P. Jouppi, et al.
In-Datacenter Performance Analysis of a Tensor Processing Unit
Proc. 44th International Symposium on Computer Architecture, 2017.
-
Martin Abadi, et al.
TensorFlow: A System for Large-Scale Machine Learning
Proc. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.
-
Shawn Hershey, et al.
CNN Architectures for Large-Scale Audio Classification
Proc. IEEE ICASSP, 2017.
-
Gorkem Asilioglu, et al.
LaZy Superscalar
ISCA, 2015.
-
Eyerman, Stijn and Eeckhout, Lieven
The Benefit of SMT in the Multi-core Era: Flexibility Towards Degrees of Thread-level Parallelism
ASPLOS, 2014.
-
E. Shiu and J. Ko (Google)
System design challenges for future consumer devices: From glass to Chromebooks
ICEP, 2016.
-
Arkaprava Basu, et al. (AMD Research)
Software Assisted Hardware Cache Coherence for Heterogeneous Processors
MEMSYS, 2016.
-
Whitepaper sponsored by AMD
HSA: A New Architecture for Heterogeneous Computing
TIRIAS research, 2013.
-
Loi, I; Benini, L
A Multi Banked - Multi Ported - non Blocking Shared L2 Cache for MPSoC Platforms
Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014.
-
Pricopi, M; Mitra, T
Bahurupi: A polymorphic heterogeneous multi-core architecture
ACM Transactions on Architecture and Code Optimization (TACO), 2012.
-
Minji Kim ; Jinyong Lee ; Younglok Kim
Fast and flexible pipelined multi-processor architecture for multimedia device
7th International Symposium on Communication Systems Networks and Digital Signal Processing (CSNDSP), 2010
-
Shekofteh, S.K. ; Deldari, H. ; Khalkhali, M.B.
Reducing cache contention in a multi-core processor via a scheduler
3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), 2010
-
Cohen, J. ; Garland, M.
Novel Architectures: Solving Computational Problems with GPU Computing
Computing in Science & Engineering, 2009
-
Chengming Zou; Chunfen Xia; Guanghui Zhao
Numerical Parallel Processing Based on GPU with CUDA Architecture
International Conference on Wireless Networks and Information Systems (WNIS), 2009
-
Zamith, M.P.M. ; Clua, E.W.G. ; Conci, A. ; Montenegro, A.
Parallel processing between GPU and CPU: Concepts in a game architecture
Computer Graphics, Imaging and Visualisation (CGIV), 2007
-
Hankins, R.A.; Chinya, G.N.; Collins, J.D.; Wang, P.H.; Rakvic, R.; Hong Wang; Shen, J.P.
Multiple Instruction Stream Processor
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 114-127, 2006.
-
Jichuan Chang; Sohi, G.S.
Cooperative Caching for Chip Multiprocessors
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 264-276, 2006.
-
Alameldeen, A.R.; Wood, D.A.
Interactions Between Compression and Prefetching in Chip Multiprocessors
13th Intl. Symp. on High Performance Computer Architecture (HPCA), pp. 228-239, 2007.
-
Strauss, K., Shen, X., and Torrellas, J. 2006.
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors.
33rd Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 327-338.
-
Hoseok Chang, Junho Cho, and Wonyong Sung. 2006.
Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit.
IEEE Work. on Signal Processing Systems Design and Implementation (SIPS), pp. 71-76.
Page responsible: Zebo Peng
Last updated: 2023-09-04