List of Papers
-
Nikos Hardavellas
The Rise and Fall of Dark Silicon
USENIX, 2012.
-
Arkaprava Basu, et al. (AMD Research)
Software Assisted Hardware Cache Coherence for Heterogeneous Processors
MEMSYS, 2016.
-
Whitepaper sponsored by AMD
HSA: A New Architecture for Heterogeneous Computing
TIRIAS Research, 2013.
-
Joshua Ho & Ryan Smith
NVIDIA Tegra X1 Preview & Architecture Analysis
anandtech.com, 2015.
-
Ryan Smith
ARM's Mali Midgard Architecture Explored
anandtech.com, 2015.
-
Loi, I; Benini, L
A Multi Banked - Multi Ported - non Blocking Shared L2 Cache for MPSoC Platforms
Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014.
-
Shriraman, A. ; Hongzhou Zhao ; Dwarkadas, S.
An Application-Tailored Approach to Hardware Cache Coherence
Computer, 2013.
-
Branover, A.; Foley, D.; Steinman, M.
AMD FUSION APU: LLANO
IEEE Micro, 2012.
-
Benini, L. ; Flamand, E. ; Fuin, D. ; Melpignano, D.
P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012
-
Pricopi, M; Mitra, T
Bahurupi: A polymorphic heterogeneous multi-core architecture
ACM Transactions on Architecture and Code Optimization (TACO), 2012.
-
Minji Kim ; Jinyong Lee ; Younglok Kim
Fast and flexible pipelined multi-processor architecture for multimedia device
7th International Symposium on Communication Systems Networks and Digital Signal Processing (CSNDSP), 2010
-
Shekofteh, S.K. ; Deldari, H. ; Khalkhali, M.B.
Reducing cache contention in a multi-core processor via a scheduler
3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), 2010
-
Kalla, R. ; Sinharoy, B. ; Starke, W.J. ; Floyd, M.
Power7: IBM's Next-Generation Server Processor
IEEE Micro, 2010.
-
Gueron, S.
Intel's New AES Instructions for Enhanced Performance and Security
16th International Workshop, Fast Software Encryption (FSE) 2009
-
Tuan, V.M. ; Katsura, N. ; Matsutani, H. ; Amano, H.
Evaluation of a multicore reconfigurable architecture with variable core sizes
IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2009.
-
Al Maashri, A.; Guangyu Sun ; Xiangyu Dong ; Narayanan, V. ; Yuan Xie
3D GPU architecture using cache stacking: Performance, cost, power and thermal analysis
IEEE International Conference on Computer Design (ICCD), 2009.
-
Cohen, J. ; Garland, M.
Novel Architectures: Solving Computational Problems with GPU Computing
Computing in Science & Engineering, 2009
-
Chengming Zou; Chunfen Xia; Guanghui Zhao
Numerical Parallel Processing Based on GPU with CUDA Architecture
International Conference on Wireless Networks and Information Systems (WNIS), 2009
-
Zamith, M.P.M. ; Clua, E.W.G. ; Conci, A. ; Montenegro, A.
Parallel processing between GPU and CPU: Concepts in a game architecture
Computer Graphics, Imaging and Visualisation (CGIV), 2007
-
del Barrio, V.M.; Gonzalez, C. ; Roca, J. ; Fernandez, A. ; Espasa, R.
ATTILA: a cycle-level execution-driven simulator for modern GPU architectures
IEEE International Symposium on Performance Analysis of Systems and Software, 2006
-
Teodorescu, R.; Torrellas, J.
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors.
35th Intl. Symp. on Computer Architecture (ISCA), pp. 363-374, 2008.
-
Loh, G.H.
3D-Stacked Memory Architectures for Multi-core Processors.
35th Intl. Symp. on Computer Architecture (ISCA), pp. 453-464, 2008.
-
Hankins, R.A.; Chinya, G.N.; Collins, J.D.; Wang, P.H.; Rakvic, R.; Hong Wang; Shen, J.P.
Multiple Instruction Stream Processor
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 114-127, 2006.
-
Jichuan Chang; Sohi, G.S.
Cooperative Caching for Chip Multiprocessors
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 264-276, 2006.
-
Dybdahl, H.; Stenstrom, P.
An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors
13th Intl. Symp. on High Performance Computer Architecture (HPCA), pp. 2-12, 2007.
-
Alameldeen, A.R.; Wood, D.A.
Interactions Between Compression and Prefetching in Chip Multiprocessors
13th Intl. Symp. on High Performance Computer Architecture (HPCA), pp. 228-239, 2007.
-
Strauss, K., Shen, X., and Torrellas, J. 2006.
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors.
33rd Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 327-338.
-
Jaehyuk Huh, Changkyu Kim, Shafi, H., Lixin Zhang, Burger, D., and Keckler, S.W. 2007.
A NUCA Substrate for Flexible CMP Cache Sharing.
IEEE Trans. Parallel and Distributed Systems 18(8), pp. 1028-1040.
-
Izadi, B.A., and Ozguner, F. 2003.
Enhanced Cluster k-Ary n-Cube, A Fault-Tolerant Multiprocessor.
IEEE Trans. Computers 52 (11), pp. 1443-1453.
-
Hoseok Chang, Junho Cho, and Wonyong Sung. 2006.
Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit.
IEEE Work. on Signal Processing Systems Design and Implementation (SIPS), pp. 71-76.
-
Hoare, R., Tung, S., and Werger, K. 2004.
An 88-way Multiprocessor within an FPGA with Customizable Instructions.
18th Intl. Parallel and Distributed Processing Symp., pp. 258-266.
-
Junho Cho, Hoseok Chang, and Wonyong Sung. 2006.
An FPGA Based SIMD Processor with A Vector Memory Unit.
IEEE Intl. Symp. on Circuits and Systems (ISCAS), 4 pp.
-
Taylor, M.D., Lee, W., Amarasinghe, S.P., and Agarwal, A. 2005.
Scalar Operand Networks.
IEEE Trans. Parallel and Distributed Systems 16(2), pp. 145-162.
-
Speight, E., Shafi, H., Lixin Zhang, and Rajamony, R. 2005.
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors.
32nd Intl. Symp. on Computer Architecture (ISCA), pp. 346-356.
-
Dunigan, T.H., Jr., Vetter, J.S., and Worley, P.H. 2004.
Performance Evaluation of the Cray X1 Distributed Shared Memory Architecture.
12th Ann. IEEE Symp. on High Performance Interconnects, pp. 20-25.
-
Hwa-Joon Oh, Mueller, S.M., Jacobi, C., Tran, K.D., Cottier, S.R., Michael, B.W., Nishikawa, H., Totsuka, Y., Namatame, T., Yano, N., Machida, T., and Dhong, S.H. 2006.
A Fully Pipelined Single-Precision Floating-Point Unit in the Synergistic Processor Element of a CELL Processor.
IEEE J. of Solid-State Circuits 41(4), pp. 759-771.
-
Lu Peng, Jih-Kwon Peir, Prakash, T.K., Yen-Kuang Chen, and Koppelman, D. 2007.
Memory Performance and Scalability of Intel's and AMD's Dual-Core Processors: A Case Study.
IEEE Intl. Performance, Computing, and Communications Conf., pp. 55-64.
-
Ye, T.T., and De Micheli, G. 2003.
Physical Planning for On-Chip Multiprocessor Networks and Switch Fabrics.
IEEE Intl. Conf. Application-Specific Systems, Architectures, and Processors (ASAP), pp. 97-107.
-
Huang, K., Grunert, D., and Thiele, L. 2007.
Windowed FIFOs for FPGA-based Multiprocessor Systems.
IEEE Intl. Conf. Application-Specific Systems, Architectures, and Processors (ASAP), pp. 36-41.
-
Herbordt, M.C., Cravy, J., and Lin, C. 2003.
Memory Considerations for High Performance SIMD Systems with On-Chip Control.
IEEE Intl. Workshop on Computer Architectures for Machine Perception, 12 pp.
-
Ipek, E., Mutlu, O., Martinez, J.F., and Caruana, R. 2008.
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach.
35th Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 39-50.
-
Kang, J.-Y., Gupta, S., and Gaudiot, J.-L. 2008.
An Efficient Data-Distribution Mechanism in a Processor-In-Memory (PIM) Architecture Applied to Motion Estimation.
IEEE Trans. Computers 57(3), pp. 375-388.
-
Murali, S., Atienza, D., Meloni, P., Carta, S., Benini, L., De Micheli, G., and Raffo, L. 2007.
Synthesis of Predictable Networks-on-Chip-Based Interconnect Architectures for Chip Multiprocessors.
IEEE Trans. Very Large Scale Integration (VLSI) Systems 15(8), pp. 869-880.
-
Pasricha, S., Dutt, N.D., and Ben-Romdhane, M. 2007.
BMSYN: Bus Matrix Communication Architecture Synthesis for MPSoC.
IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 26(8), pp. 1454-1464.
-
Sun, F., Jha, N.K., Ravi, S., and Raghunathan, A. 2005.
Synthesis of Application-specific Heterogeneous Multiprocessor Architectures using Extensible Processors.
18th Intl. Conf. VLSI Design, pp. 551-556.
-
Bocchi, M., de Dominicis, M., Mucci, C., Deledda, A., Campi, F., Lodi, A., Toma, M., and Guerrieri, R. 2006.
Design and Implementation of a Reconfigurable Heterogeneous Multiprocessor SoC.
IEEE Conf. Custom Integrated Circuits, pp. 93-96.
-
Wun, B., and Crowley, P. 2006.
Network I/O Acceleration in Heterogeneous Multicore Processors.
14th IEEE Symp. on High-Performance Interconnects, pp. 9-14.
-
Bobda, C., and Ahmadinia, A. 2005.
Dynamic Interconnection of Reconfigurable Modules on Reconfigurable Devices.
IEEE Design & Test of Computers 22(5), pp. 443-451.
-
Nava, M.D., Blouet, P., Teninge, P., Coppola, M., Ben-Ismail, T., Picchiottino, S., and Wilson, R. 2005.
An Open Platform for Developing Multiprocessor SoCs.
IEEE Computer 38(7), pp. 60-67.
-
Villa, F.J., Acacio, M.E., and Garcia, J.M. 2006.
On the Evaluation of Dense Chip-Multiprocessor Architectures.
Intl. Conf. Embedded Computer Systems: Architectures, Modeling and Simulation (IC-SAMOS), pp. 21-27.
-
Taeweon Suh, Daehyun Kim, and Lee, H.-H.S. 2005.
Cache Coherence Support for Non-Shared Bus Architecture on Heterogeneous MPSoCs.
42nd Design Automation Conference (DAC), pp. 553-558.
-
Bonorden, O., Bruls, N., Kastens, U., Dinh Khoi Le, auf der Heide, F.M., Niemann, J.-C., Porrmann, M., Ruckert, U., Slowik, A., and Thies, M. 2003.
A Holistic Methodology for Network Processor Design.
28th Ann. IEEE Intl. Conf. Local Computer Networks, pp. 583-592.
-
Manhee Lee, Minseon Ahn, and Eun Jung Kim. 2007.
I2SEMS: Interconnects-Independent Security Enhanced Shared Memory Multiprocessor Systems.
16th Intl. Conf. Parallel Architecture and Compilation Techniques (PACT), pp. 94-103.
-
Monchiero, M., Palermo, G., Silvano, C., and Villa, O. 2006.
Exploration of Distributed Shared Memory Architectures for NoC-based Multiprocessors.
Intl. Conf. Embedded Computer Systems: Architectures, Modeling and Simulation (IC-SAMOS), pp. 144-151.
-
Jianjun Guo, Mingche Lai, Zhengyuan Pang, Libo Huang, Fangyuan Chen, Kui Dai, and Zhiying Wang. 2008.
Memory System Design for a Multi-core Processor.
Intl. Conf. Complex, Intelligent and Software Intensive Systems (CISIS), pp. 601-606.
-
Lv, T., Ozer, I.B., Chakradhar, S.T., Jiang Xu, Wolf, W., and Henkel, J. 2005.
A Methodology for Architectural Design of Multimedia Multiprocessor SoCs.
IEEE Design & Test of Computers 22(1), pp. 18-26.
-
Ankur Agarwal, Mustafa, M., and Pandya, A.S. 2006.
QOS Driven Network-on-Chip Design for Real Time Systems.
Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1291-1295.
-
Feihui Li, Nicopoulos, C., Richardson, T., Yuan Xie, Narayanan, V., and Kandemir, M. 2006.
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory.
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 130-141.
-
Bhuyan, L.N., and Hujun Wang. 2003.
Switch MSHR: A Technique to Reduce Remote Read Memory Access Time in CC-NUMA Multiprocessors.
IEEE Trans. Computers 52(5), pp. 617-632.
-
Pitter, C., and Schoeberl, M. 2008.
Performance Evaluation of a Java Chip-Multiprocessor.
Intl. Symp. on Industrial Embedded Systems (SIES), pp. 34-42.
-
Hager, G., Zeiser, T., and Wellein, G. 2008.
Data Access Optimizations for Highly Threaded Multi-Core CPUs with Multiple Memory Controllers.
Intl. Parallel and Distributed Processing Symp., pp. 1-7.
-
Yingmin Li, Lee, B., Brooks, D., Zhigang Hu, and Skadron, K. 2006.
Impact of Thermal Constraints on Multi-Core Architectures.
In Proc. 10th Intersociety Conf. on Thermal and Thermomechanical Phenomena in Electronics Systems, 8 pp.
-
Daewook Kim, Manho Kim, and Sobelman, G.E. 2006.
DCOS: Cache Embedded Switch Architecture for Distributed Shared Memory Multiprocessor SoCs.
IEEE Intl. Symp. on Circuits and Systems (ISCAS), 4 pp.
-
Beltran, M., and Guzman, A. 2008.
Designing HIPAOC: High Performance Architecture On Chip.
Intl. Symp. on Industrial Embedded Systems (SIES), pp. 233-236.
-
Godiwala, N., Leonard, J., and Reilly, M. 2008.
A Network Fabric for Scalable Multiprocessor Systems.
16th IEEE Symp. on High Performance Interconnects, pp. 137-144.
-
Berg, E., Zeffer, H., and Hagersten, E. 2006.
A Statistical Multiprocessor Cache Model.
IEEE Intl. Symp. on Performance Analysis of Systems and Software, pp. 89-99.
-
Pham, D.C., et al. 2006.
Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor.
IEEE J. of Solid-State Circuits 41(1), pp. 179-196.
-
Liqun Cheng, Muralimanohar, N., Ramani, K., Balasubramonian, R., and Carter, J.B. 2006.
Interconnect-Aware Coherence Protocols for Chip Multiprocessors.
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 339-351.
-
Ramazani, A., Monteiro, E., Dandache, A., and Lepley, B. 2003.
A Methodology to Design a Multimedia Processor Core.
10th IEEE Intl. Conf. Electronics, Circuits and Systems (3), pp. 998-1001.
-
Paver, N.C., Khan, M.H., Aldrich, B.C., and Emmons, C.D. 2003.
Accelerating Mobile Video Applications using Intel Wireless MMX Technology.
IEEE Workshop on Signal Processing Systems, pp. 207-212.
-
Martin, M.M.K., Harper, P.J., Sorin, D.J., Hill, M.D., and Wood, D.A. 2003.
Using Destination-Set Prediction to Improve the Latency/Bandwidth Tradeoff in Shared-Memory Multiprocessors.
30th Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 206-217.
-
Frachtenberg, E., Petrini, F., Fernandez, J., and Pakin, S. 2006.
STORM: Scalable Resource Management for Large-Scale Parallel Computers.
IEEE Trans. Computers 55(12), pp. 1572-1587.
-
Feehrer, J., Rotker, P., Shih, M., Gingras, P., Yakutis, P., Phillips, S., Heath, J., and Turullols, S. 2008.
Coherency Hub Design for Multi-Node Victoria Falls Server Systems.
16th IEEE Symp. on High Performance Interconnects, pp. 43-50.
-
Wangyuan Zhang and Tao Li. 2008.
Managing Multi-Core Soft-Error Reliability Through Utility-driven Cross Domain Optimization.
In Proc. IEEE Intl. Conf. Application-Specific Systems, Architectures, and Processors (ASAP), pp. 132-137.
-
Suhendra, V., and Mitra, T. 2008.
Exploring Locking & Partitioning for Predictable Shared Caches on Multi-Cores.
45th Design Automation Conference (DAC), pp. 300-303.
-
Karlsson, M., and Hagersten, E. 2007.
Conserving Memory Bandwidth in Chip Multiprocessors with Runahead Execution.
IEEE Intl. Parallel and Distributed Processing Symp., pp. 1-10.
-
Ozturk, O., Kandemir, M., Chen, G., Irwin, M.J., and Karakoy, M. 2005.
Customized On-Chip Memories for Embedded Chip Multiprocessors.
Proc. of the Asia and South Pacific Design Automation Conf. (ASP-DAC), pp. 743-748.
-
Xu, M., Thulasiraman, P., and Thulasiram, R.K. 2008.
Exploiting Data Locality in FFT using Indirect Swap Network on Cell/B.E.
In Proc. 22nd Intl. Symp. on High Performance Computing Systems and Applications (HPCS), pp. 88-94.
-
Dreslinski, R.G., Bo Zhai, Mudge, T., Blaauw, D., and Sylvester, D. 2007.
An Energy Efficient Parallel Architecture Using Near Threshold Operation.
16th Intl. Conf. Parallel Architecture and Compilation Techniques (PACT), pp. 175-188.
-
Dash, A., and Petrov, P. 2006.
Energy-Efficient Cache Coherence for Embedded Multi-Processor Systems through Application-Driven Snoop Filtering.
9th EUROMICRO Conference on Digital System Design (DSD), pp. 79-82.
-
Francesco, P., Antonio, P., and Marchal, P. 2005.
Flexible Hardware/Software Support for Message Passing on a Distributed Shared Memory Architecture.
Design, Automation and Test in Europe (DATE), pp. 736-741.
-
Shadich, R., and McLoughlin, I.V. 2005.
A Scalable Parallel Computational Core for Embedded Processing.
IEEE TENCON Region 10, pp. 1-6.
-
Pasricha, S., Young-Hwan Park, Kurdahi, F.J., and Dutt, N. 2006.
System-Level Power-Performance Trade-Offs in Bus Matrix Communication Architecture Synthesis.
CODES+ISSS, pp. 300-305.
-
Tseng, J.H., Hao Yu, Nagar, S., Dubey, N., Franke, H., and Pattnaik, P. 2007.
Performance Studies of Commercial Workloads on a Multi-core System.
10th IEEE Intl. Symp. on Workload Characterization, pp. 57-65.
Page responsible: Zebo Peng
Last updated: 2016-12-12