Lab Assignment 1: Cache Memories
Objective
The purpose of this lab is to understand the functionality of cache memories,
and to get an insight into various trade-offs related to the design of systems with cache memories.
Time allocation
4 hours (2 lab sessions) are allocated for this lab.
Preparation
You should review the following resources before you start working on this lab:
Assignments
-
Cache basics
Solve the following problems and include
the solution in the submitted report.
-
Locality of data
This assignment requires you to use sim-cache
for the simulations on two different architectures.
You need to run two test programs (test1.ss, test2.ss) on each of the two architectures.
The source files of the test programs have the .c suffix, their binaries have the .ss suffix, and the simulation configuration files have the .cfg suffix. All of these files are already available in the simplescalar/cde-root directory, which you copied earlier in Lab 0.
To complete this assignment, please follow the instructions listed below:
-
Look into the configuration files,
cache1.cfg and
cache2.cfg, used for the simulations.
Compare the caches in the configuration files.
-
Report which fragments of the source code (in both test programs) exhibit spatial and temporal locality of the data.
-
Consider the nested loops in both programs. Further consider all possible combinations of test programs
and cache configurations (i.e. test1.c with cache1.cfg and cache2.cfg, as well as test2.c with cache1.cfg
and cache2.cfg). For each such combination, report the number of times the variable "sum" is incremented between
two consecutive cache misses. Identify the combinations of test program and cache configuration with the best
and the worst miss ratio.
-
Run simulations for each test program (
test1.ss and test2.ss)
using both configuration files (cache1.cfg and cache2.cfg). To
run a simulation with a given configuration, follow the instructions in the Preparation section.
Find the combination of the test program and cache configuration with the best and the worst
miss ratio. Verify that your simulation results agree with your answer in the previous
question.
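The effect of spatial locality on the miss ratio can be illustrated with a small direct-mapped cache model. This is only a sketch: the array dimensions, element size and cache parameters below are hypothetical placeholders and are not taken from test1.c, test2.c, cache1.cfg or cache2.cfg, and the model ignores everything except the addresses of the array accesses.

```python
def count_misses(addresses, num_sets, line_size):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_sets          # tag of the line currently held by each set
    misses = 0
    for addr in addresses:
        block = addr // line_size     # memory block containing this byte
        index = block % num_sets      # set that the block maps to
        tag = block // num_sets       # identifies the block within that set
        if tags[index] != tag:        # cold miss or conflict miss
            tags[index] = tag
            misses += 1
    return misses


def matrix_addresses(rows, cols, elem_size, row_major):
    """Byte addresses touched when visiting every element of a rows x cols
    array that is stored row by row (as in C)."""
    if row_major:
        order = ((i, j) for i in range(rows) for j in range(cols))
    else:
        order = ((i, j) for j in range(cols) for i in range(rows))
    return [(i * cols + j) * elem_size for i, j in order]


# Hypothetical parameters, NOT the values used in the lab files.
ROWS, COLS, ELEM = 64, 64, 4          # 64 x 64 array of 4-byte elements
SETS, LINE = 32, 32                   # 32 sets of 32-byte lines (1 KiB cache)

for label, row_major in (("row-wise", True), ("column-wise", False)):
    addrs = matrix_addresses(ROWS, COLS, ELEM, row_major)
    misses = count_misses(addrs, SETS, LINE)
    print(f"{label}: {misses} misses, miss ratio {misses / len(addrs):.3f}, "
          f"{len(addrs) / misses:.1f} accesses per miss")
```

With these placeholder values, the row-wise traversal misses once per cache line (every 8 accesses), while the column-wise traversal misses on every access. The same reasoning, applied to the real loop order and the real cache parameters, gives the number of "sum" increments between consecutive misses.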
-
Evaluation of cache configurations
This assignment requires you to use sim-cheetah in order to evaluate the performance of several cache configurations.
- Create a configuration file for
sim-cheetah,
so that the simulations cover the following cache configurations:
- caches with a number of sets between 128 and 4096,
- caches with a level of associativity between 1 and 8,
- caches with a line size of 32 bytes and a LRU replacement policy.
ATTENTION: For arguments that are commented with "log base 2",
you must supply the base-2 logarithm of the value (the exponent), not the value itself.
For example, to set the line size of the cache to 64 bytes,
set 6 instead of 64 (since 2^6 = 64) in the line that starts with the -l switch.
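As a quick check of the "log base 2" convention, the exponents for the values mentioned in this assignment can be computed as follows. This is a minimal sketch; the helper name log2_exact is made up for illustration, and which switch takes which exponent must still be read from the comments in the configuration file.

```python
def log2_exact(value):
    """Return n such that 2**n == value; reject values that are not powers of two."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{value} is not a power of two")
    return value.bit_length() - 1


# Values taken from the assignment text above.
for description, value in (
    ("line size of 32 bytes", 32),
    ("line size of 64 bytes", 64),
    ("128 sets", 128),
    ("4096 sets", 4096),
    ("associativity of 8", 8),
):
    print(f"{description}: write {log2_exact(value)} instead of {value}")
```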
Using the go.ss benchmark,
run the simulation for the three cases in which the cache stores
either only data (data cache), or only instructions (instruction cache), or both (unified cache). To specify whether the cache stores only data, only instructions, or both, modify the -refs parameter (# reference stream to analyze, i.e., {inst|data|unified}) in the configuration file.
Here is an example of running go.ss on SimpleScalar:
sim-cheetah -config config-file go.ss 3 7
where config-file is the configuration file and go.ss is a simple program in which the computer plays the game "Go" against itself. The first parameter (3) sets the level of the player, and the second parameter (7) sets the size of the Go board.
-
For each of the three simulations, extract the results from the output files,
and create a plot that shows how the miss ratio depends on the associativity.
- Answer the following questions.
- How does the miss ratio depend on the associativity and on the number of sets?
- Assume that "go.ss" has almost the same number of instruction and data accesses. Further assume that
we can afford only one cache. Based on your plots, explain which variant of cache
(instruction, data or unified) you would choose to run this application.
- Consider your plot for the instruction cache. Assume that a designer can spend
at most 8000 SEK on an instruction cache and that each byte of
an instruction cache costs 0.01 SEK. With this budget and cost, the largest affordable
instruction cache is 8000 / 0.01 = 800000 bytes. For the given maximum budget and cost per byte, find the exact cache configuration that minimizes the following cost function:
cost function = (100000 * miss ratio) + (0.01 * cache size in bytes)
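The search for the cheapest configuration can be organized as in the sketch below. It assumes that the cache size in bytes equals number of sets * associativity * line size, with the 32-byte line size required above, and that you fill the miss_ratios dictionary with the values from your own sim-cheetah output. The function and variable names are illustrative only.

```python
MAX_CACHE_BYTES = 800000   # 8000 SEK budget / 0.01 SEK per byte
LINE_SIZE = 32             # bytes, as required for this assignment


def cache_size_bytes(num_sets, assoc, line_size=LINE_SIZE):
    """Total data capacity of a set-associative cache."""
    return num_sets * assoc * line_size


def cost(miss_ratio, size_bytes):
    """Cost function from the assignment text."""
    return 100000 * miss_ratio + 0.01 * size_bytes


def best_configuration(miss_ratios):
    """miss_ratios maps (num_sets, assoc) to the simulated miss ratio.
    Returns (cost, num_sets, assoc, size_bytes) of the cheapest configuration
    within the budget, or None if no configuration fits."""
    best = None
    for (num_sets, assoc), ratio in miss_ratios.items():
        size = cache_size_bytes(num_sets, assoc)
        if size > MAX_CACHE_BYTES:      # over the 8000 SEK budget
            continue
        value = cost(ratio, size)
        if best is None or value < best[0]:
            best = (value, num_sets, assoc, size)
    return best


# The largest configuration in the simulated range does not fit the budget:
print(cache_size_bytes(4096, 8), ">", MAX_CACHE_BYTES)
```

Note that a cache with 4096 sets and associativity 8 holds 1048576 bytes, which exceeds the 800000-byte limit, so the budget check is needed before comparing costs.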
What to report
- Answers (with sufficient explanations) to each of the previous problems. You may use figures when necessary.