
Lab Assignment 3: Superscalar Processors

Objective

The purpose of this assignment is to gain insight into superscalar processors: how they affect system performance, the performance-cost trade-off, and why different programs benefit differently from a superscalar processor.

Time Allocation

4 hours (2 lab sessions) are allocated for this lab.

Background and additional help

You should review the following resources before you start working on this lab.
  • Notes from the lecture on superscalar processors
  • Chapters 14.4 (on instruction pipelining), 16.1 and 16.2 (on superscalar processors) in the course book
  • Section 4.4 from The SimpleScalar User's Guide to understand more about the sim-outorder simulator

Assignments

You need to perform several successive architectural modifications of the superscalar processor, with the goal of reducing the complexity of the architecture while avoiding significant performance degradation. For this lab, you will use the sim-outorder simulator, which supports out-of-order issue and execution of instructions.
  1. Performance-cost trade-off
    For this part, use the go.ss benchmark from the ~/simplescalar/cde-root subdirectory, with 2 7 as program arguments. Modify the superscalar architecture using the following parameters. Each parameter value must be a power of two within the allowed range.
      -decode:width (allowed range: 1-32)
      -issue:width (allowed range: 1-32)
      -commit:width (allowed range: 1-32)
      -ruu:size (allowed range: 2-512)
      -res:ialu (allowed range: 1-8)
      -res:imult (allowed range: 1-8)
      -res:fpalu (allowed range: 1-8)
      -res:fpmult (allowed range: 1-8)
    • What is a good strategy to study the impact of each parameter, individually, on performance (in terms of number of cycles)? You don't want to try all possible combinations of parameter values (which is around 500,000 combinations)!

      Hint: Change only one parameter at a time while keeping the others at fixed values (what values?). You can create your own configuration file (as instructed in Lab 0), set the above parameters to suitable fixed values in it, and then override the value of a single parameter on the command line. For example, to use your own configuration file and set -decode:width to 8, you can use:

        sim-outorder -config <your_config_file> -decode:width 8 go.ss 2 7
    • Using scatter charts, show the impact of each parameter on performance. Make sure the Y-axis starts at 0. You can use one chart for decode, issue, and commit; one for ruu; and one for ialu, imult, fpalu, and fpmult.
    • Which parameters had the least impact on performance? Explain why.
    • Considering the following cost function, which configuration has the lowest cost, given that the total number of cycles must not exceed 1,000,000 by more than 2% (i.e., at most 1,020,000 cycles)?
        cost = (decode + issue + commit + ialu + imult + fpalu + fpmult) x 10 + ruu_size
    • Looking at the results, what can you conclude about the ILP degree of the benchmark?
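
The one-parameter-at-a-time idea from the hint can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`powers_of_two`, `one_at_a_time_commands`) and the file name `my.cfg` are hypothetical, and the fixed baseline values are assumed to live in your own configuration file, so the sketch does not choose them for you. It enumerates the power-of-two values allowed for each parameter, counts the full design space, and builds one command line per (parameter, value) pair.

```python
from math import prod

# Allowed inclusive ranges, copied from the parameter list in part 1.
RANGES = {
    "-decode:width": (1, 32),
    "-issue:width": (1, 32),
    "-commit:width": (1, 32),
    "-ruu:size": (2, 512),
    "-res:ialu": (1, 8),
    "-res:imult": (1, 8),
    "-res:fpalu": (1, 8),
    "-res:fpmult": (1, 8),
}


def powers_of_two(lo, hi):
    """Return every power of two v with lo <= v <= hi."""
    values = []
    v = 1
    while v <= hi:
        if v >= lo:
            values.append(v)
        v *= 2
    return values


VALUES = {flag: powers_of_two(lo, hi) for flag, (lo, hi) in RANGES.items()}


def total_combinations():
    """Size of the full design space (every parameter varied at once)."""
    return prod(len(values) for values in VALUES.values())


def one_at_a_time_commands(config_file, benchmark="go.ss", args=("2", "7")):
    """Build one command per (parameter, value) pair.

    All other parameters keep the fixed values from config_file, which is
    a placeholder for the configuration file you create yourself.
    """
    commands = []
    for flag, values in VALUES.items():
        for value in values:
            commands.append(
                ["sim-outorder", "-config", config_file,
                 flag, str(value), benchmark, *args]
            )
    return commands


if __name__ == "__main__":
    print(total_combinations())            # 497664
    commands = one_at_a_time_commands("my.cfg")
    print(len(commands))                   # 43
    print(" ".join(commands[0]))
```

Varying one parameter at a time needs 43 simulator runs instead of 497,664, which is why exhaustive search is not a sensible strategy here.
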
  2. Different programs, different performance
    For this part, use the pc.ss benchmark (i.e., pointer chaser) from the ~/simplescalar/cde-root subdirectory. Repeat the same procedure using the same strategy and answer the following questions.
    • Compare the parameter impacts on performance in this benchmark and the previous one. Explain the difference.
    • Which configuration attains the best performance-cost trade-off?
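
The cost function and the cycle limit from part 1 can be written down as a small Python sketch, which may help when comparing candidate configurations. The dictionary keys, the function names, and the `cheapest` helper are hypothetical choices made for this illustration; the cycle counts must come from your own simulator runs, and none are supplied here.

```python
CYCLE_TARGET = 1_000_000
# The cycle count may exceed the target by at most 2%.
CYCLE_LIMIT = CYCLE_TARGET + CYCLE_TARGET * 2 // 100  # 1,020,000


def cost(cfg):
    """cost = (decode + issue + commit + ialu + imult + fpalu + fpmult) x 10 + ruu_size"""
    weighted = (cfg["decode"] + cfg["issue"] + cfg["commit"]
                + cfg["ialu"] + cfg["imult"] + cfg["fpalu"] + cfg["fpmult"])
    return weighted * 10 + cfg["ruu_size"]


def within_limit(cycles):
    """True if a measured cycle count satisfies the 2% constraint."""
    return cycles <= CYCLE_LIMIT


def cheapest(results):
    """Pick the lowest-cost entry among those that satisfy the cycle limit.

    results is a list of (cfg, cycles) pairs from your own measurements.
    Returns None if no configuration satisfies the limit.
    """
    feasible = [(cfg, cycles) for cfg, cycles in results if within_limit(cycles)]
    if not feasible:
        return None
    return min(feasible, key=lambda pair: cost(pair[0]))


if __name__ == "__main__":
    largest = {"decode": 32, "issue": 32, "commit": 32, "ialu": 8,
               "imult": 8, "fpalu": 8, "fpmult": 8, "ruu_size": 512}
    smallest = {"decode": 1, "issue": 1, "commit": 1, "ialu": 1,
                "imult": 1, "fpalu": 1, "fpmult": 1, "ruu_size": 2}
    print(cost(largest))   # 1792
    print(cost(smallest))  # 72
```

The two extreme configurations bound the cost between 72 and 1792; the interesting configurations lie between them, where the cost drops while the cycle count stays within the limit.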

Page responsible: Zebo Peng
Last updated: 2017-11-02