Lab Assignment 3: Superscalar Processors

Objective

The purpose of this assignment is to gain insight into superscalar processors: how a superscalar processor affects system performance, what the performance/cost trade-off looks like, and why different programs benefit differently from a superscalar processor.

Time Allocation

4 hours (2 lab sessions) are allocated for this lab.

Background and additional help

You should review the following resources before you start working on this lab.
  • Notes from the lecture on superscalar processors
  • Chapters 14.4 (on instruction pipelining), 16.1 and 16.2 (on superscalar processors) in the course book
  • Section 4.4 from The SimpleScalar User's Guide to understand more about the sim-outorder simulator

Assignments

You need to perform several successive architectural modifications of the superscalar processor, with the goal of reducing the complexity of the architecture while avoiding significant performance degradation. For this lab, you will use the sim-outorder simulator, which supports out-of-order issue and execution of instructions.
  1. Performance/cost trade-off
    For this part, use the go.ss benchmark from the ~/simplescalar/cde-root subdirectory, with 2 7 as program arguments. Modify the superscalar architecture through the parameters listed below. Each parameter must be set to a power of two within its allowed range. Use the superscalar.cfg configuration file (from the ~/simplescalar/cde-root subdirectory) as the base configuration.
      -decode:width (allowed range: 1-32) (maximum number of instructions that are decoded in a cycle)
      -issue:width (allowed range: 1-32) (maximum number of instructions that are issued in a cycle)
      -commit:width (allowed range: 1-32) (maximum number of instructions that are committed in a cycle)
      -res:ialu (allowed range: 1-8) (number of integer ALUs)
      -res:imult (allowed range: 1-8) (number of integer multipliers/dividers)
      -res:fpalu (allowed range: 1-8) (number of floating point ALUs)
      -res:fpmult (allowed range: 1-8) (number of floating point multipliers/dividers)
    • What is a good strategy for studying the impact of each parameter individually on performance (measured as the number of cycles)? You do not want to try all possible combinations of parameter values (with powers of two only, that is 6^3 × 4^4 = 55,296 combinations)!

      Hint. Think of changing only one parameter at a time while keeping the other parameters in the configuration file at fixed values (what values?). For example, you can use superscalar.cfg as the configuration file to set the values of all parameters and then override -decode:width with 8 using the following command:

        ./sim-outorder -config superscalar.cfg -decode:width 8 go.ss 2 7

      Note. Check your strategy with your lab assistant before you start running the commands.

    • Using scatter charts, show the impact of each parameter on performance. Make sure the Y-axis starts at 0. You can use one chart for decode, issue, and commit, and another for ialu, imult, fpalu, and fpmult.
    • Which parameters have the least impact on performance? Explain why.
    • Considering the following cost function, which configuration has the lowest cost, given that the total number of cycles must not exceed 1,000,000 by more than 2%?
        cost = (decode + issue + commit + ialu + imult + fpalu + fpmult)
    • Looking at the results, what can you conclude about the degree of instruction-level parallelism (ILP) in the benchmark?
  2. Different programs with different performances
    For this part, use the pc.ss benchmark (i.e., pointer chaser) from the ~/simplescalar/cde-root subdirectory. Repeat the procedure from part 1 with the same strategy and plot the charts again. Then answer the following questions.
    • Compare the impact of each parameter on performance for this benchmark with the previous one. Explain the difference with respect to the degree of ILP.
    • Which configuration attains the best performance/cost trade-off?
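
The one-parameter-at-a-time procedure and the cost constraint from part 1 can be scripted. The following Python sketch is not part of the lab material; the function names (sweep_commands, cost, cheapest_within_limit) are hypothetical, and it assumes that each run overrides exactly one flag on top of superscalar.cfg, as in the hint. It only builds the command lines and ranks results you have already measured; it does not run the simulator, and choosing the fixed baseline values remains up to you.

```python
import shlex

# Allowed power-of-two values, taken from the ranges listed in part 1.
WIDTHS = [1, 2, 4, 8, 16, 32]   # decode, issue, commit widths
UNITS = [1, 2, 4, 8]            # functional unit counts

PARAMS = {
    "-decode:width": WIDTHS,
    "-issue:width": WIDTHS,
    "-commit:width": WIDTHS,
    "-res:ialu": UNITS,
    "-res:imult": UNITS,
    "-res:fpalu": UNITS,
    "-res:fpmult": UNITS,
}

# "Must not surpass 1,000,000 by more than 2%" expressed as an integer bound.
CYCLE_LIMIT = 1_000_000 + 1_000_000 * 2 // 100


def sweep_commands(benchmark, args, config="superscalar.cfg"):
    """Build one command per (flag, value) pair, overriding a single flag."""
    commands = []
    for flag, values in PARAMS.items():
        for value in values:
            argv = ["./sim-outorder", "-config", config,
                    flag, str(value), benchmark, *args]
            commands.append(shlex.join(argv))
    return commands


def cost(cfg):
    """Cost function from part 1: the sum of the seven parameter values."""
    return sum(cfg[flag] for flag in PARAMS)


def cheapest_within_limit(results, limit=CYCLE_LIMIT):
    """Pick the lowest-cost (cfg, cycles) pair whose cycle count is within limit.

    results is a list of (cfg, cycles) pairs measured by you; cfg maps every
    flag in PARAMS to its value. Ties on cost go to the fewer cycles.
    Returns None if no configuration meets the limit.
    """
    feasible = [(cfg, cycles) for cfg, cycles in results if cycles <= limit]
    return min(feasible, key=lambda r: (cost(r[0]), r[1]), default=None)
```

Calling sweep_commands("go.ss", ["2", "7"]) yields 34 command lines (3 × 6 width values plus 4 × 4 unit counts), which is the number of runs per benchmark under this strategy. Note that a one-at-a-time sweep only measures configurations that differ from the baseline in a single parameter, so any combined configuration you propose for the cost question still has to be simulated before it is passed to cheapest_within_limit.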

What to report

  • Answers (with sufficient explanations) to each of the previous problems.

Page responsible: Zebo Peng
Last updated: 2023-10-31