Lab Assignment 3: Superscalar Processors

Objective

The purpose of this assignment is to gain insight into superscalar processors: how they affect system performance, the performance/cost trade-off, and why different programs benefit differently from a superscalar processor.

Time Allocation

4 hours (2 lab sessions) are allocated for this lab.

Background and additional help

You should review the following resources before you start working on this lab.
  • Notes from the lecture on superscalar processors
  • Chapters 14.4 (on instruction pipelining), 16.1 and 16.2 (on superscalar processors) in the course book
  • Section 4.4 from The SimpleScalar User's Guide to understand more about the sim-outorder simulator

Assignments

You need to perform several successive architectural modifications of the superscalar processor, with the goal of reducing the complexity of the architecture while avoiding significant performance degradation. For this lab, you will use the sim-outorder simulator, which supports out-of-order issue and execution of instructions.
  1. Performance/cost trade-off
    For this part, use the go.ss benchmark from the ~/simplescalar/cde-root subdirectory, with 2 7 as program arguments. Use the following parameters to modify the superscalar architecture. Each parameter value must be a power of two within the allowed range.
      -decode:width (allowed range: 1-32)
      -issue:width (allowed range: 1-32)
      -commit:width (allowed range: 1-32)
      -ruu:size (allowed range: 4-512)
      -res:ialu (allowed range: 1-8)
      -res:imult (allowed range: 1-8)
      -res:fpalu (allowed range: 1-8)
      -res:fpmult (allowed range: 1-8)
    • What is a good strategy to study the impact of each parameter, individually, on performance (in terms of number of cycles)? You don't want to try all possible combinations of parameter values (there are 442,368 of them: 6 x 6 x 6 x 8 x 4 x 4 x 4 x 4)!

      Hint. Think of changing only one parameter at a time, while keeping the others at fixed values (what values?). You can create your own configuration file (as instructed in Lab 0), set the above parameters to suitable fixed values, and then change the value of only one parameter on the command line. For example, to use your own configuration file and set -decode:width to 8, you can use:

        ./sim-outorder -config <your_config_file> -decode:width 8 go.ss 2 7
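The one-parameter-at-a-time idea from the hint can be sketched as a small Python script that only builds the command lines. This is a minimal sketch, not a prescribed solution: the file name my.cfg and the function names are assumptions made for illustration, and the fixed baseline values are deliberately left to your own configuration file.

```python
# Allowed ranges from the assignment; every value must be a power of two.
RANGES = {
    "-decode:width": (1, 32),
    "-issue:width": (1, 32),
    "-commit:width": (1, 32),
    "-ruu:size": (4, 512),
    "-res:ialu": (1, 8),
    "-res:imult": (1, 8),
    "-res:fpalu": (1, 8),
    "-res:fpmult": (1, 8),
}


def powers_of_two(lo, hi):
    """Return every power of two in the closed interval [lo, hi]."""
    values = []
    v = 1
    while v <= hi:
        if v >= lo:
            values.append(v)
        v *= 2
    return values


def sweep_commands(config_file, benchmark="go.ss", args=("2", "7")):
    """Build one sim-outorder command per (parameter, value) pair.

    Only the swept parameter appears on the command line; all other
    parameters keep the fixed values stored in config_file.
    """
    commands = []
    for flag, (lo, hi) in RANGES.items():
        for value in powers_of_two(lo, hi):
            commands.append(
                ["./sim-outorder", "-config", config_file,
                 flag, str(value), benchmark, *args]
            )
    return commands


if __name__ == "__main__":
    # "my.cfg" is a hypothetical configuration file name.
    cmds = sweep_commands("my.cfg")
    print(len(cmds), "runs in total")
    for cmd in cmds:
        print(" ".join(cmd))
```

Sweeping each parameter separately needs 6 + 6 + 6 + 8 + 4 + 4 + 4 + 4 = 42 simulator runs, compared with the full cross product of all parameter values.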

    • Note. It is recommended that you check your strategy with your lab assistant before you start running the simulations. Running the simulations and collecting the data is easier if you use scripts.
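As one possible way to collect the data with a script, the sketch below extracts the cycle count from the simulator's statistics. It assumes that the statistics contain a line starting with sim_cycle followed by the number of cycles, and that the simulator may write its statistics to standard error; verify both against the output you actually get. The function names are illustrative only.

```python
import re
import subprocess

# Matches a statistics line of the assumed form:
#   sim_cycle    <number> # total simulation time in cycles
SIM_CYCLE_RE = re.compile(r"^sim_cycle\s+(\d+)", re.MULTILINE)


def extract_sim_cycle(simulator_output):
    """Return the cycle count found in the output text, or None if absent."""
    match = SIM_CYCLE_RE.search(simulator_output)
    return int(match.group(1)) if match else None


def run_and_get_cycles(command):
    """Run one simulator command (a list of strings) and return its cycle count.

    Both stdout and stderr are searched, since the statistics may be
    written to either stream (an assumption to check on your setup).
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return extract_sim_cycle(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    # Made-up sample text for illustration; the number is not a real result.
    sample = "sim_num_insn  500 # total number of instructions committed\n" \
             "sim_cycle     1234 # total simulation time in cycles\n"
    print(extract_sim_cycle(sample))
```

Storing each (parameter, value, cycles) triple in a CSV file makes it straightforward to produce the charts afterwards.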

    • Using scatter charts, show the impact of each parameter on performance. Make sure the Y-axis starts at 0. You can use one chart for decode, issue, and commit; one for ruu; and one for ialu, imult, fpalu, and fpmult.
    • Which parameters have the least impact on performance? Explain why.
    • Considering the following cost function, which configuration has the lowest cost, given that the total number of cycles must not exceed 1,000,000 by more than 2%? (After determining your configuration, you can run the simulation to check that it satisfies the cycle constraint.)
        cost = (decode + issue + commit + ialu + imult + fpalu + fpmult) x 10 + ruu_size
    • Looking at the results, what can you conclude about the ILP (Instruction-Level Parallelism) degree of the benchmark?
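The cost function and the cycle constraint from part 1 can be written down directly, which is handy for ranking candidate configurations in a script. This is a minimal sketch: the function names are illustrative, and reading the constraint as "at most 1,020,000 cycles" is an interpretation of the 2% margin.

```python
CYCLE_LIMIT = 1_000_000   # target number of cycles from the assignment
MARGIN_PERCENT = 2        # allowed excess over the target, in percent


def cost(decode, issue, commit, ialu, imult, fpalu, fpmult, ruu_size):
    """Cost function from the assignment text."""
    return (decode + issue + commit + ialu + imult + fpalu + fpmult) * 10 + ruu_size


def meets_cycle_constraint(cycles):
    """True if cycles exceeds CYCLE_LIMIT by at most MARGIN_PERCENT percent.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return cycles * 100 <= CYCLE_LIMIT * (100 + MARGIN_PERCENT)


if __name__ == "__main__":
    # Smallest allowed value for every parameter: the lower bound on cost.
    print(cost(1, 1, 1, 1, 1, 1, 1, 4))
    # An arbitrary example configuration, not a recommended answer.
    print(cost(4, 4, 4, 2, 1, 1, 1, 16))
```

Whether a cheap configuration is acceptable still has to be confirmed by running the simulation and passing the measured cycle count to meets_cycle_constraint.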
  2. Different programs with different performances
    For this part, use the pc.ss benchmark (i.e., pointer chaser) from the ~/simplescalar/cde-root subdirectory. Repeat the same procedure, using the same strategy as in part 1, and plot the charts again. Then answer the following questions.
    • Compare the impact of the parameters on performance for this benchmark with their impact for the previous one. Explain the difference.
    • Which configuration attains the best performance/cost trade-off? Ignore non-significant improvements.

Page responsible: Zebo Peng
Last updated: 2023-09-27