Lab Assignment 3: Superscalar Processors
Objective
The purpose of this assignment is to gain insight into superscalar processors: how a superscalar design affects the performance of the system, what the performance-cost trade-off looks like, and why different programs benefit differently from a superscalar processor.
Time Allocation
4 hours (2 lab sessions) are allocated for this lab.
Background and additional help
You should review the following resources before you start working on this lab.
- Notes from the lecture on superscalar processors
- Chapters 14.4 (on instruction pipelining), 16.1 and 16.2 (on superscalar processors) in the course book
- Section 4.4 from The SimpleScalar User's Guide to understand more about the sim-outorder simulator
Assignments
You need to perform several successive architectural modifications of the superscalar processor, with the goal of reducing the complexity of the architecture while avoiding significant performance degradation. For this lab, you will be using the sim-outorder simulator, which supports out-of-order issue and execution of instructions.
- Performance-cost trade-off
For this part, you should use the go.ss benchmark from the ~/simplescalar/cde-root subdirectory. Use 2 7 as program arguments. You should use the following parameters to modify the superscalar architecture. Each parameter value must be a power of two within the allowed range.
-decode:width (allowed range: 1-32)
-issue:width (allowed range: 1-32)
-commit:width (allowed range: 1-32)
-ruu:size (allowed range: 2-512)
-res:ialu (allowed range: 1-8)
-res:imult (allowed range: 1-8)
-res:fpalu (allowed range: 1-8)
-res:fpmult (allowed range: 1-8)
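The "power of two within the allowed range" rule can be made concrete with a short sketch. The Python below is not part of the lab material; the names RANGES, ALLOWED and powers_of_two are made up for illustration. It lists the legal values of each parameter and counts how many full configurations exist.

```python
from math import prod

# (lowest, highest) allowed value per parameter, copied from the list above.
RANGES = {
    "decode:width": (1, 32),
    "issue:width": (1, 32),
    "commit:width": (1, 32),
    "ruu:size": (2, 512),
    "res:ialu": (1, 8),
    "res:imult": (1, 8),
    "res:fpalu": (1, 8),
    "res:fpmult": (1, 8),
}


def powers_of_two(low, high):
    """Return every power of two v with low <= v <= high, in ascending order."""
    values = []
    v = 1
    while v <= high:
        if v >= low:
            values.append(v)
        v *= 2
    return values


ALLOWED = {name: powers_of_two(low, high) for name, (low, high) in RANGES.items()}

for name, values in ALLOWED.items():
    print(f"-{name}: {values}")

# Size of the exhaustive search space: the product of the per-parameter counts.
total_combinations = prod(len(values) for values in ALLOWED.values())
print(total_combinations)  # 6 * 6 * 6 * 9 * 4 * 4 * 4 * 4 = 497664
```

The three width parameters have 6 legal values each, -ruu:size has 9, and each functional-unit count has 4.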
- What is a good strategy to study the impact of each parameter, individually, on performance (in terms of number of cycles)? You don't want to try all possible combinations of parameter values (which is around 500,000 combinations)!
Hint. Think of changing only one parameter at a time, while setting the others to fixed values (what values?). You can create your own configuration file (as instructed in Lab 0), set the above parameters to proper fixed values, and change the value of only one parameter. For example, to use your own configuration file and set -decode:width to 8, you can use:
sim-outorder -config <your_config_file> -decode:width 8 go.ss 2 7
- Using scatter charts, show the impact of each parameter on performance. Make sure the Y-axis starts from 0. You can use one chart for decode, issue, and commit; one for ruu; and one for ialu, imult, fpalu, and fpmult.
- Which parameters had the least impact on performance? Explain the reason.
- Considering the following cost function, which configuration has the lowest cost, provided that the total number of cycles does not exceed 1,000,000 cycles by more than 2%?
cost = (decode + issue + commit + ialu + imult + fpalu + fpmult) x 10 + ruu_size
- Looking at the results, what can you conclude about the ILP degree of the benchmark?
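The cost function and the cycle constraint can be written down as a small sketch. The Python below is not part of the lab material. It reads "not more than 2% above 1,000,000" as a limit of 1,020,000 cycles, which is an interpretation. The helper names and the example configuration are hypothetical, and the cycle counts must come from your own simulation runs.

```python
CYCLE_TARGET = 1_000_000
# "Not more than 2% above the target", computed in integers to avoid rounding.
CYCLE_LIMIT = CYCLE_TARGET * 102 // 100  # 1_020_000

# Terms that are summed and multiplied by 10 in the cost function.
WEIGHTED_KEYS = ("decode", "issue", "commit", "ialu", "imult", "fpalu", "fpmult")


def cost(config):
    """cost = (decode + issue + commit + ialu + imult + fpalu + fpmult) x 10 + ruu_size"""
    return sum(config[key] for key in WEIGHTED_KEYS) * 10 + config["ruu_size"]


def cheapest_feasible(results):
    """Return the lowest-cost configuration whose cycle count is within the limit.

    `results` is a list of (config, cycles) pairs, where `cycles` is the cycle
    count measured for that configuration. Returns None if no run qualifies.
    """
    feasible = [config for config, cycles in results if cycles <= CYCLE_LIMIT]
    return min(feasible, key=cost, default=None)


# Arbitrary example configuration, chosen only to show the arithmetic.
example = {
    "decode": 4, "issue": 4, "commit": 4,
    "ialu": 4, "imult": 1, "fpalu": 4, "fpmult": 1,
    "ruu_size": 16,
}
print(cost(example))  # (4 + 4 + 4 + 4 + 1 + 4 + 1) * 10 + 16 = 236
```

Because every width and functional-unit count is weighted by 10 while ruu_size is counted once, halving a width of 8 saves 40 cost units, the same as shrinking the RUU by 40 entries.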
- Different programs, different performance
For this part, you use the pc.ss benchmark (i.e., pointer chaser) from the ~/simplescalar/cde-root subdirectory. Repeat the same procedure using the same strategy and answer the following questions.
- Compare the parameter impacts on performance in this benchmark and the previous one. Explain the difference.
- Which configuration attains the best performance-cost trade-off?
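Since both parts repeat the same one-parameter-at-a-time procedure, the command lines can be generated rather than typed by hand. The Python sketch below is not part of the lab material. It builds commands in the form shown in the hint; the configuration file name my.cfg and the helper sweep_commands are hypothetical. It uses go.ss with arguments 2 7, because the text does not state program arguments for pc.ss.

```python
# Allowed values per parameter: powers of two within the ranges listed earlier.
ALLOWED = {
    "decode:width": [1, 2, 4, 8, 16, 32],
    "issue:width": [1, 2, 4, 8, 16, 32],
    "commit:width": [1, 2, 4, 8, 16, 32],
    "ruu:size": [2, 4, 8, 16, 32, 64, 128, 256, 512],
    "res:ialu": [1, 2, 4, 8],
    "res:imult": [1, 2, 4, 8],
    "res:fpalu": [1, 2, 4, 8],
    "res:fpmult": [1, 2, 4, 8],
}


def sweep_commands(benchmark, program_args, config_file, allowed=ALLOWED):
    """Build one sim-outorder command per (parameter, value) pair.

    Every command overrides exactly one parameter on the command line; all
    other parameters keep the fixed values stored in `config_file`.
    """
    commands = []
    for name, values in allowed.items():
        for value in values:
            commands.append(
                ["sim-outorder", "-config", config_file,
                 f"-{name}", str(value), benchmark, *program_args]
            )
    return commands


# "my.cfg" is a placeholder for your own configuration file from Lab 0.
commands = sweep_commands("go.ss", ["2", "7"], "my.cfg")
for command in commands:
    print(" ".join(command))
```

Each command is kept as an argument list, so it can be passed to subprocess.run unchanged if you want the script to launch the simulations as well.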
Page responsible: Zebo Peng
Last updated: 2017-11-02