List of Papers
- AMD FUSION APU: LLANO, added 2014
- A Multi Banked - Multi Ported - non Blocking Shared L2 Cache for MPSoC Platforms, added 2014
- P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator, added 2012
- Bahurupi: A polymorphic heterogeneous multi-core architecture, added 2012
- Intel's New AES Instructions for Enhanced Performance and Security, added 2011
- Evaluation of a multicore reconfigurable architecture with variable core sizes, added 2010
- Fast and flexible pipelined multi-processor architecture for multimedia device, added 2010
- Reducing cache contention in a multi-core processor via a scheduler, added 2010
- Power7: IBM's Next-Generation Server Processor, added 2010
- 3D GPU architecture using cache stacking: Performance, cost, power and thermal analysis, added 2010
- Novel Architectures: Solving Computational Problems with GPU Computing, 2009
- Numerical Parallel Processing Based on GPU with CUDA Architecture, 2009
- Parallel processing between GPU and CPU: Concepts in a game architecture, 2007
- ATTILA: a cycle-level execution-driven simulator for modern GPU architectures, 2006
-
Teodorescu, R.; Torrellas, J.
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors.
35th Intl. Symp. on Computer Architecture (ISCA), pp. 363-374, 2008.
-
Loh, G.H.
3D-Stacked Memory Architectures for Multi-core Processors.
35th Intl. Symp. on Computer Architecture (ISCA), pp. 453-464, 2008.
-
Donald, J.; Martonosi, M.
Techniques for Multicore Thermal Management: Classification and New Exploration
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 78-88, 2006.
-
Hankins, R.A.; Chinya, G.N.; Collins, J.D.; Wang, P.H.; Rakvic, R.; Hong Wang; Shen, J.P.
Multiple Instruction Stream Processor
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 114-127, 2006.
-
Jichuan Chang; Sohi, G.S.
Cooperative Caching for Chip Multiprocessors
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 264-276, 2006.
-
Van Meter, R.; Munro, W.J.; Nemoto, K.; Itoh, K.M.
Distributed Arithmetic on a Quantum Multicomputer
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 354-365, 2006.
-
Dybdahl, H.; Stenstrom, P.
An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors
13th Intl. Symp. on High Performance Computer Architecture (HPCA), pp. 2-12, 2007.
-
Alameldeen, A.R.; Wood, D.A.
Interactions Between Compression and Prefetching in Chip Multiprocessors
13th Intl. Symp. on High Performance Computer Architecture (HPCA), pp. 228-239, 2007.
-
Strauss, K., Shen, X., and Torrellas, J. 2006.
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors.
33rd Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 327-338.
-
Rabah, M., and Kanoun, K. 2003.
Performability Evaluation of Multipurpose Multiprocessor Systems: The “Separation of Concerns” Approach.
IEEE Trans. Computers 52(2), pp. 223-236.
-
Huh, J., Kim, C., Shafi, H., Zhang, L., Burger, D., and Keckler, S.W. 2007.
A NUCA Substrate for Flexible CMP Cache Sharing.
IEEE Trans. Parallel and Distributed Systems 18(8), pp. 1028-1040.
-
Izadi, B.A., and Ozguner, F. 2003.
Enhanced Cluster k-Ary n-Cube, A Fault-Tolerant Multiprocessor.
IEEE Trans. Computers 52(11), pp. 1443-1453.
-
Hoseok Chang, Junho Cho, and Wonyong Sung. 2006.
Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit.
IEEE Work. on Signal Processing Systems Design and Implementation (SIPS), pp. 71-76.
-
Kaneko, S., et al. 2004.
A 600-MHz Single-Chip Multiprocessor With 4.8-GB/s Internal Shared Pipelined Bus and 512-kB Internal Memory.
IEEE J. of Solid-State Circuits 39(1), pp. 184-193.
-
Hoare, R., Tung, S., and Werger, K. 2004.
An 88-way Multiprocessor within an FPGA with Customizable Instructions.
18th Intl. Parallel and Distributed Processing Symp., pp. 258-266.
-
Junho Cho, Hoseok Chang, and Wonyong Sung. 2006.
An FPGA Based SIMD Processor with A Vector Memory Unit.
IEEE Intl. Symp. on Circuits and Systems (ISCAS), 4 pp.
-
Taylor, M.D., Lee, W., Amarasinghe, S.P., and Agarwal, A. 2005.
Scalar Operand Networks.
IEEE Trans. Parallel and Distributed Systems 16(2), pp. 145-162.
-
Speight, E., Shafi, H., Lixin Zhang, and Rajamony, R. 2005.
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors.
32nd Intl. Symp. on Computer Architecture (ISCA), pp. 346-356.
-
Dunigan, T.H., Jr., Vetter, J.S., and Worley, P.H. 2004.
Performance Evaluation of the Cray X1 Distributed Shared Memory Architecture.
12th Ann. IEEE Symp. on High Performance Interconnects, pp. 20-25.
-
Cvetanovic, Z. 2003.
Performance Analysis of the Alpha 21364-based HP GS1280 Multiprocessor.
30th Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 218-228.
-
Hwa-Joon Oh, Mueller, S.M., Jacobi, C., Tran, K.D., Cottier, S.R., Michael, B.W., Nishikawa, H., Totsuka, Y., Namatame, T., Yano, N., Machida, T., and Dhong, S.H. 2006.
A Fully Pipelined Single-Precision Floating-Point Unit in the Synergistic Processor Element of a CELL Processor.
IEEE J. of Solid-State Circuits 41(4), pp. 759-771.
-
Lu Peng, Jih-Kwon Peir, Prakash, T.K., Yen-Kuang Chen, and Koppelman, D. 2007.
Memory Performance and Scalability of Intel’s and AMD’s Dual-Core Processors: A Case Study.
IEEE Intl. Performance, Computing, and Communications Conf., pp. 55-64.
-
Ye, T.T., and De Micheli, G. 2003.
Physical Planning for On-Chip Multiprocessor Networks and Switch Fabrics.
IEEE Intl. Conf. Application-Specific Systems, Architectures, and Processors (ASAP), pp. 97-107.
-
Martin, M.M.K., Hill, M.D., and Wood, D.A. 2003.
Token Coherence: Decoupling Performance and Correctness.
30th Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 182-193.
-
Kodi, A.K., and Louri, A. 2004.
A Scalable Architecture for Distributed Shared Memory Multiprocessors using Optical Interconnects.
18th Intl. Parallel and Distributed Processing Symp., pp. 11-21.
-
Huang, K., Grunert, D., and Thiele, L. 2007.
Windowed FIFOs for FPGA-based Multiprocessor Systems.
IEEE Intl. Conf. Application-Specific Systems, Architectures, and Processors (ASAP), pp. 36-41.
-
Herbordt, M.C., Cravy, J., and Lin, C. 2003.
Memory Considerations for High Performance SIMD Systems with On-Chip Control.
IEEE Intl. Workshop on Computer Architectures for Machine Perception, 12 pp.
-
Ipek, E., Mutlu, O., Martinez, J.F., and Caruana, R. 2008.
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach.
35th Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 39-50.
-
Kang, J.-Y., Gupta, S., and Gaudiot, J.-L. 2008.
An Efficient Data-Distribution Mechanism in a Processor-In-Memory (PIM) Architecture Applied to Motion Estimation.
IEEE Trans. Computers 57(3), pp. 375-388.
-
Murali, S., Atienza, D., Meloni, P., Carta, S., Benini, L., De Micheli, G., and Raffo, L. 2007.
Synthesis of Predictable Networks-on-Chip-Based Interconnect Architectures for Chip Multiprocessors.
IEEE Trans. Very Large Scale Integration (VLSI) Systems 15(8), pp. 869-880.
-
Pasricha, S., Dutt, N.D., and Ben-Romdhane, M. 2007.
BMSYN: Bus Matrix Communication Architecture Synthesis for MPSoC.
IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 26(8), pp. 1454-1464.
-
Sun, F., Jha, N.K., Ravi, S., and Raghunathan, A. 2005.
Synthesis of Application-specific Heterogeneous Multiprocessor Architectures using Extensible Processors.
18th Intl. Conf. VLSI Design, pp. 551-556.
-
Bocchi, M., de Dominicis, M., Mucci, C., Deledda, A., Campi, F., Lodi, A., Toma, M., and Guerrieri, R. 2006.
Design and Implementation of a Reconfigurable Heterogeneous Multiprocessor SoC.
IEEE Conf. Custom Integrated Circuits, pp. 93-96.
-
Wun, B., and Crowley, P. 2006.
Network I/O Acceleration in Heterogeneous Multicore Processors.
14th IEEE Symp. on High-Performance Interconnects, pp. 9-14.
-
Gohringer, D., Hubner, M., Schatz, V., and Becker, J. 2008.
Runtime Adaptive Multi-Processor System-on-Chip: RAMPSoC.
IEEE Intl. Symp. on Parallel and Distributed Processing (IPDPS), pp. 1-7.
-
Shacham, A., Bergman, K., and Carloni, L.P. 2007.
On the Design of a Photonic Network-on-Chip.
First Intl. Symp. on Networks-on-Chip (NOCS), pp. 53-64.
-
Bobda, C., and Ahmadinia, A. 2005.
Dynamic Interconnection of Reconfigurable Modules on Reconfigurable Devices.
IEEE Design & Test of Computers 22(5), pp. 443-451.
-
Nava, M.D., Blouet, P., Teninge, P., Coppola, M., Ben-Ismail, T., Picchiottino, S., and Wilson, R. 2005.
An Open Platform for Developing Multiprocessor SoCs.
IEEE Computer 38(7), pp. 60-67.
-
Villa, F.J., Acacio, M.E., and Garcia, J.M. 2006.
On the Evaluation of Dense Chip-Multiprocessor Architectures.
Intl. Conf. Embedded Computer Systems: Architectures, Modeling and Simulation (IC-SAMOS), pp. 21-27.
-
Taeweon Suh, Daehyun Kim, and Lee, H.-H.S. 2005.
Cache Coherence Support for Non-Shared Bus Architecture on Heterogeneous MPSoCs.
42nd Design Automation Conference (DAC), pp. 553-558.
-
Bonorden, O., Bruls, N., Kastens, U., Dinh Khoi Le, auf der Heide, F.M., Niemann, J.-C., Porrmann, M., Ruckert, U., Slowik, A., and Thies, M. 2003.
A Holistic Methodology for Network Processor Design.
28th Ann. IEEE Intl. Conf. Local Computer Networks, pp. 583-592.
-
Manhee Lee, Minseon Ahn, and Eun Jung Kim. 2007.
I2SEMS: Interconnects-Independent Security Enhanced Shared Memory Multiprocessor Systems.
16th Intl. Conf. Parallel Architecture and Compilation Techniques (PACT), pp. 94-103.
-
Monchiero, M., Palermo, G., Silvano, C., and Villa, O. 2006.
Exploration of Distributed Shared Memory Architectures for NoC-based Multiprocessors.
Intl. Conf. Embedded Computer Systems: Architectures, Modeling and Simulation (IC-SAMOS), pp. 144-151.
-
Luo, Y., Bhuyan, L.N., and Chen, X. 2003.
Shared Memory Multiprocessor Architectures for Software IP Routers.
IEEE Trans. Parallel and Distributed Systems 14(12), pp. 1240-1249.
-
Jianjun Guo, Mingche Lai, Zhengyuan Pang, Libo Huang, Fangyuan Chen, Kui Dai, and Zhiying Wang. 2008.
Memory System Design for a Multi-core Processor.
Intl. Conf. Complex, Intelligent and Software Intensive Systems (CISIS), pp. 601-606.
-
Lv, T., Ozer, I.B., Chakradhar, S.T., Jiang Xu, Wolf, W., and Henkel, J. 2005.
A Methodology for Architectural Design of Multimedia Multiprocessor SoCs.
IEEE Design & Test of Computers 22(1), pp. 18-26.
-
Ankur Agarwal, Mustafa, M., and Pandya, A.S. 2006.
QOS Driven Network-on-Chip Design for Real Time Systems.
Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1291-1295.
-
Feihui Li, Nicopoulos, C., Richardson, T., Yuan Xie, Narayanan, V., and Kandemir, M. 2006.
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory.
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 130-141.
-
Bhuyan, L.N., and Hujun Wang. 2003.
Switch MSHR: A Technique to Reduce Remote Read Memory Access Time in CC-NUMA Multiprocessors.
IEEE Trans. Computers 52(5), pp. 617-632.
-
Pitter, C., and Schoeberl, M. 2008.
Performance Evaluation of a Java Chip-Multiprocessor.
Intl. Symp. on Industrial Embedded Systems (SIES), pp. 34-42.
-
Hager, G., Zeiser, T., and Wellein, G. 2008.
Data Access Optimizations for Highly Threaded Multi-Core CPUs with Multiple Memory Controllers.
Intl. Parallel and Distributed Processing Symp., pp. 1-7.
-
Yingmin Li, Lee, B., Brooks, D., Zhigang Hu, and Skadron, K. 2006.
Impact of Thermal Constraints on Multi-Core Architectures.
In Proc. 10th Intersociety Conf. on Thermal and Thermomechanical Phenomena in Electronics Systems, 8 pp.
-
Daewook Kim, Manho Kim, and Sobelman, G.E. 2006.
DCOS: Cache Embedded Switch Architecture for Distributed Shared Memory Multiprocessor SoCs.
IEEE Intl. Symp. on Circuits and Systems (ISCAS), 4 pp.
-
Kumar, R., Zyuban, V., and Tullsen, D.M. 2005.
Interconnections in Multi-core Architectures: Understanding Mechanisms, Overheads and Scaling.
32nd Intl. Symp. on Computer Architecture (ISCA), pp. 408-419.
-
Beltran, M., and Guzman, A. 2008.
Designing HIPAOC: High Performance Architecture On Chip.
Intl. Symp. on Industrial Embedded Systems (SIES), pp. 233-236.
-
Godiwala, N., Leonard, J., and Reilly, M. 2008.
A Network Fabric for Scalable Multiprocessor Systems.
16th IEEE Symp. on High Performance Interconnects, pp. 137-144.
-
Berg, E., Zeffer, H., and Hagersten, E. 2006.
A Statistical Multiprocessor Cache Model.
IEEE Intl. Symp. on Performance Analysis of Systems and Software, pp. 89-99.
-
Pham, D.C., et al. 2006.
Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor.
IEEE J. of Solid-State Circuits 41(1), pp. 179-196.
-
Noda, H., et al. 2007.
The Circuits and Robust Design Methodology of the Massively Parallel Processor Based on the Matrix Architecture.
IEEE J. of Solid-State Circuits 42(4), pp. 804-812.
-
Wang, X., and Ziavras, S.G. 2006.
Exploiting Mixed-Mode Parallelism for Matrix Operations on the HERA Architecture through Reconfiguration.
IEE Proc. Computers and Digital Techniques 153(4), pp. 249-260.
-
Chai, L., Gao, Q., and Panda, D.K. 2007.
Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System.
7th IEEE Intl. Symp. on Cluster Computing and the Grid, pp. 471-478.
-
Liqun Cheng, Muralimanohar, N., Ramani, K., Balasubramonian, R., and Carter, J.B. 2006.
Interconnect-Aware Coherence Protocols for Chip Multiprocessors.
33rd Intl. Symp. on Computer Architecture (ISCA), pp. 339-351.
-
Sinha, P., Sinha, A., and Basu, D. 2005.
A Reconfigurable “SFMD Architecture” For a Class of Signal Processing Applications.
In Proc. 7th IEEE CAS Symp. on Emerging Technologies: Circuits and Systems for 4G Mobile Wireless Communications, pp. 46-49.
-
Ramazani, A., Monteiro, E., Dandache, A., and Lepley, B. 2003.
A Methodology to Design a Multimedia Processor Core.
10th IEEE Intl. Conf. Electronics, Circuits and Systems (3), pp. 998-1001.
-
Paver, N.C., Khan, M.H., Aldrich, B.C., and Emmons, C.D. 2003.
Accelerating Mobile Video Applications using Intel Wireless MMX Technology.
IEEE Workshop on Signal Processing Systems, pp. 207-212.
-
Martin, M.M.K., Harper, P.J., Sorin, D.J., Hill, M.D., and Wood, D.A. 2003.
Using Destination-Set Prediction to Improve the Latency/Bandwidth Tradeoff in Shared-Memory Multiprocessors.
30th Ann. Intl. Symp. on Computer Architecture (ISCA), pp. 206-217.
-
Gold, B.T., Kim, J., Smolens, J.C., Chung, E.S., Liaskovitis, V., Nurvitadhi, E., Falsafi, B., Hoe, J.C., and Nowatzyk, A.G. 2005.
TRUSS: A Reliable, Scalable Server Architecture.
IEEE Micro 25(6), pp. 51-59.
-
Frachtenberg, E., Petrini, F., Fernandez, J., and Pakin, S. 2006.
STORM: Scalable Resource Management for Large-Scale Parallel Computers.
IEEE Trans. Computers 55(12), pp. 1572-1587.
-
Feehrer, J., Rotker, P., Shih, M., Gingras, P., Yakutis, P., Phillips, S., Heath, J., and Turullols, S. 2008.
Coherency Hub Design for Multi-Node Victoria Falls Server Systems.
16th IEEE Symp. on High Performance Interconnects, pp. 43-50.
-
Wangyuan Zhang and Tao Li. 2008.
Managing Multi-Core Soft-Error Reliability Through Utility-driven Cross Domain Optimization.
In Proc. IEEE Intl. Conf. Application-Specific Systems, Architectures, and Processors (ASAP), pp. 132-137.
-
Suhendra, V., and Mitra, T. 2008.
Exploring Locking & Partitioning for Predictable Shared Caches on Multi-Cores.
45th Design Automation Conference (DAC), pp. 300-303.
-
Schoeberl, M. 2007.
A Time-Triggered Network-on-Chip.
Intl. Conf. Field Programmable Logic and Applications (FPL), pp. 377-382.
-
Iqbal, M.M. 2008.
Morero Cluster of Workstations (COW) First Practical Approach towards Home Grown Supercomputers in Pakistan.
In Proc. 2nd Intl. Conf. Electrical Engineering, pp. 1-5.
-
Karlsson, M., and Hagersten, E. 2007.
Conserving Memory Bandwidth in Chip Multiprocessors with Runahead Execution.
IEEE Intl. Parallel and Distributed Processing Symp., pp. 1-10.
-
Ozturk, O., Kandemir, M., Chen, G., Irwin, M.J., and Karakoy, M. 2005.
Customized On-Chip Memories for Embedded Chip Multiprocessors.
In Proc. of the Asia and South Pacific Design Automation Conf. (ASP-DAC), pp. 743-748.
-
Xu, M., Thulasiraman, P., and Thulasiram, R.K. 2008.
Exploiting Data Locality in FFT using Indirect Swap Network on Cell/B.E.
In Proc. 22nd Intl. Symp. on High Performance Computing Systems and Applications (HPCS), pp. 88-94.
-
Jizhu Lu, Perrone, M., Albayraktaroglu, K., and Franklin, M. 2008.
HMMer-Cell: High Performance Protein Profile Searching on the Cell/B.E. Processor.
IEEE Intl. Symp. on Performance Analysis of Systems and Software, pp. 223-232.
-
Dreslinski, R.G., Bo Zhai, Mudge, T., Blaauw, D., and Sylvester, D. 2007.
An Energy Efficient Parallel Architecture Using Near Threshold Operation.
16th Intl. Conf. Parallel Architecture and Compilation Techniques (PACT), pp. 175-188.
-
Dash, A., and Petrov, P. 2006.
Energy-Efficient Cache Coherence for Embedded Multi-Processor Systems through Application-Driven Snoop Filtering.
9th EUROMICRO Conference on Digital System Design (DSD), pp. 79-82.
-
Francesco, P., Antonio, P., and Marchal, P. 2005.
Flexible Hardware/Software Support for Message Passing on a Distributed Shared Memory Architecture.
Design, Automation and Test in Europe (DATE), pp. 736-741.
-
Shadich, R., and McLoughlin, I.V. 2005.
A Scalable Parallel Computational Core for Embedded Processing.
IEEE TENCON Region 10, pp. 1-6.
-
Pasricha, S., Young-Hwan Park, Kurdahi, F.J., and Dutt, N. 2006.
System-Level Power-Performance Trade-Offs in Bus Matrix Communication Architecture Synthesis.
CODES+ISSS, pp. 300-305.
-
Tseng, J.H., Hao Yu, Nagar, S., Dubey, N., Franke, H., and Pattnaik, P. 2007.
Performance Studies of Commercial Workloads on a Multi-core System.
10th IEEE Intl. Symp. on Workload Characterization, pp. 57-65.
-
Viswanath, V. 2004.
Multi-log Processor – Towards Scalable Event-Driven Multiprocessors.
Euromicro Symp. on Digital System Design (DSD), pp. 279-286.
Page responsible: Zebo Peng
Last updated: 2014-10-29