#### **Seminar 1**

# **Memory Hierarchy**

# Let us remember the following two slides from Lecture 2-3

# **Cache Organization**

#### Example:

- □ a cache of 64 Kbytes
- data transfer between cache and main memory is in *blocks* of 4 bytes; we say the cache is organized in *lines* of 4 bytes;
- a main memory of 16 Mbytes; each byte is addressable by a 24-bit address (2<sup>24</sup>=16M)
  - the cache consists of 2<sup>14</sup> (16K) lines
  - the main memory consists of 2<sup>22</sup> (4M) blocks

#### **Questions**:

- when we bring a block from main memory into the cache where (in which line) do we put it?
- □ when we look for the content of a certain memory address
  - in which cache line do we look for it?
  - how do we know if we have found the right information (*hit*) or not (*miss*)?

# **Set Associative Mapping**

Two-way set associative cache

- □ 13 bits are needed to identify the cache set
- **22** bits are needed to address a block in main memory
- □ *tag* size is 22 13 = 9 bits



If *miss*, the block is placed in one of the two cache lines in the set corresponding to the 13 bits field in the memory address. The *replacement algorithm* decides which line to use.

A set-associative cache consists of 64 lines, divided into four-line sets. Main memory contains 4K blocks of 128 bytes each. Show the format of the main memory addresses.

A set-associative cache consists of 64 lines, divided into four-line sets. Main memory contains 4K blocks of 128 bytes each. Show the format of the main memory addresses.

Solution:

■ 4K = 2<sup>12</sup> blocks in main memory ⇒ 12 bits are needed to address a block in main memory.

A set-associative cache consists of 64 lines, divided into four-line sets. Main memory contains 4K blocks of 128 bytes each. Show the format of the main memory addresses.

Solution:

- 4K = 2<sup>12</sup> blocks in main memory ⇒ 12 bits are needed to address a block in main memory.
- We have  $64/4 = 16 = 2^4$  sets  $\Rightarrow$  4 bits are needed to identify the cache set.
- *tag* size is 12 4 = 8 bits.

A set-associative cache consists of 64 lines, divided into four-line sets. Main memory contains 4K blocks of 128 bytes each. Show the format of the main memory addresses.

Solution:

- 4K = 2<sup>12</sup> blocks in main memory ⇒ 12 bits are needed to address a block in main memory.
- We have  $64/4 = 16 = 2^4$  sets  $\Rightarrow$  4 bits are needed to identify the cache set.
- tag size is 12 4 = 8 bits.
- Block/line length = 128 = 2<sup>7</sup> ⇒ 7 bits are needed to select a byte out of a block/line.





A two-way set-associative cache has lines of 16 bytes and a total size of 8 Kbytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.

A two-way set-associative cache has lines of 16 bytes and a total size of 8 Kbytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.

#### Solution:

 Main memory consists of 2<sup>26</sup>/2<sup>4</sup> = 2<sup>22</sup> blocks ⇒ 22 bits are needed to address a block in main memory.

A two-way set-associative cache has lines of 16 bytes and a total size of 8 Kbytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.

#### Solution:

- Main memory consists of 2<sup>26</sup>/2<sup>4</sup> = 2<sup>22</sup> blocks ⇒ 22 bits are needed to address a block in main memory.
- We have 2<sup>13</sup>/2<sup>4</sup> = 2<sup>9</sup> lines in the cache; we have 2<sup>9</sup>/2 = 2<sup>8</sup> sets ⇒ 8 bits are needed to identify the cache set.
- *tag* size is 22 8 = 14 bits.

A two-way set-associative cache has lines of 16 bytes and a total size of 8 Kbytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.

#### Solution:

- Main memory consists of 2<sup>26</sup>/2<sup>4</sup> = 2<sup>22</sup> blocks ⇒ 22 bits are needed to address a block in main memory.
- We have 2<sup>13</sup>/2<sup>4</sup> = 2<sup>9</sup> lines in the cache; we have 2<sup>9</sup>/2 = 2<sup>8</sup> sets ⇒ 8 bits are needed to identify the cache set.
- *tag* size is 22 8 = 14 bits.
- Block/line length = 16 = 2<sup>4</sup> ⇒ 4 bits are needed to select a byte out of a block/ line.



Consider a machine with a byte addressable main memory of 2<sup>16</sup> bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used.

- a. How is a 16-bit memory address divided into tag, line number, and byte number?
- b. Into what line would bytes with each of the following addresses be stored?
   0001 0001 0001 1011
   1100 0011 0011 0100
   1101 0000 0001 1101
   1010 1010 1010
- c. Suppose the byte with address 0001 1010 0001 1010 is stored in the cache. What are the addresses of the other bytes stored along with it?
- d. How many total bytes can be stored in the cache?
- e. Why is the tag also stored in the cache?

Consider a machine with a byte addressable main memory of 2<sup>16</sup> bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used.

a.

 Main memory consists of 2<sup>16</sup>/2<sup>3</sup> = 2<sup>13</sup> blocks ⇒ 13 bits are needed to address a block in main memory.

Consider a machine with a byte addressable main memory of 2<sup>16</sup> bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used.

#### a.

- Main memory consists of 2<sup>16</sup>/2<sup>3</sup> = 2<sup>13</sup> blocks ⇒ 13 bits are needed to address a block in main memory.
- We have 2<sup>5</sup> lines in the cache ⇒ 5 bits are needed to identify the line in the cache.
- *tag* size is 13 5 = 8 bits.

Consider a machine with a byte addressable main memory of 2<sup>16</sup> bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used.

#### a.

- Main memory consists of 2<sup>16</sup>/2<sup>3</sup> = 2<sup>13</sup> blocks ⇒ 13 bits are needed to address a block in main memory.
- We have 2<sup>5</sup> lines in the cache ⇒ 5 bits are needed to identify the line in the cache.
- tag size is 13 5 = 8 bits.
- **Block/line length = 2^3 \Rightarrow 3 bits are needed to select a byte out of a block/line.**





0001 0001 0001 1011 1100 0011 0011 0100 1101 0000 0001 1101 1010 1010 1010 1010





b.

Datorarkitektur Se 1



#### C.

0001 1010 0001 1010 is stored in line 3

When this byte is loaded into the cache, the whole block of 8 bytes is loaded; the main memory addresses of those 8 bytes are:



d.

Cache size is  $(2^5 \text{ lines}) \times (2^3 \text{ bytes per line}) = 2^8 = 256 \text{ bytes}.$ 

**e.** 

In order to distinguish between the 2<sup>8</sup> different blocks that can be stored in the same cache line (see lecture 2).

**Consider the following code:** 

```
for (i=0; i < 20; i++)
for (j = 0; j < 10; j++)
a[i] = a[i]*(j+1);
```

a. Give one example of the spacial locality in the code.b. Give one example of the temporal locality in the code.

#### **Consider the following code:**

```
for (i=0; i < 20; i++)
for (j = 0; j < 10; j++)
a[i] = a[i]*(j+1);
```

a. Give one example of the spacial locality in the code.b. Give one example of the temporal locality in the code.

#### Solution:

#### a.

- When a[i] is used, in the next iteration of the outer loop a[i+1] will be used.
- After an instruction is executed, immediately the following one is executed.

#### b.

Inside the inner loop, the same a[i] is accessed ten times in a short interval.

Consider an L1 cache with an access time of 1 ns and a hit ratio of H = 0.95. Suppose that we can change the cache design (size of the cache, cache organization) such that we increase H to 0.97, but increase the cache access time to 1.5 ns. What conditions must be met for this change to result in improved performance?

Consider an L1 cache with an access time of 1 ns and a hit ratio of H = 0.95. Suppose that we can change the cache design (size of the cache, cache organization) such that we increase H to 0.97, but increase the cache access time to 1.5 ns. What conditions must be met for this change to result in improved performance?

**Remember from Lecture 2:** 

Average access time  $T_s$ :

$$T_s = H \times T_1 + (1 - H) \times (T_1 + T_2) = T_1 + (1 - H) \times T_2$$

- $T_1$ : Access time to cache
- *T*<sub>2</sub> : Access time to main memory
- *H* : Hit ratio

Consider an L1 cache with an access time of 1 ns and a hit ratio of H = 0.95. Suppose that we can change the cache design (size of the cache, cache organization) such that we increase H to 0.97, but increase the cache access time to 1.5 ns. What conditions must be met for this change to result in improved performance?

$$T_{s} = H \times T_{1} + (1 - H) \times (T_{1} + T_{2}) = T_{1} + (1 - H) \times T_{2}$$

#### **Solution**

$$T_s^1 = 1 + (1 - 0.95) \times T_2 = 1 + 0.05 T_2$$

Consider an L1 cache with an access time of 1 ns and a hit ratio of H = 0.95. Suppose that we can change the cache design (size of the cache, cache organization) such that we increase H to 0.97, but increase the cache access time to 1.5 ns. What conditions must be met for this change to result in improved performance?

$$T_{s} = H \times T_{1} + (1 - H) \times (T_{1} + T_{2}) = T_{1} + (1 - H) \times T_{2}$$

#### **Solution**

$$T_s^1 = 1 + (1 - 0.95) \times T_2 = 1 + 0.05 T_2$$
  
 $T_s^2 = 1.5 + (1 - 0.97) \times T_2 = 1.5 + 0.03 T_2$ 

Consider an L1 cache with an access time of 1 ns and a hit ratio of H = 0.95. Suppose that we can change the cache design (size of the cache, cache organization) such that we increase H to 0.97, but increase the cache access time to 1.5 ns. What conditions must be met for this change to result in improved performance?

$$T_s = H \times T_1 + (1 - H) \times (T_1 + T_2) = T_1 + (1 - H) \times T_2$$

#### **Solution**

$$T_s^2 = 1,5 + (1 - 0,97) \times T_2 = 1,5 + 0,03 T_2$$
  
 $T_s^1 = 1 + (1 - 0,95) \times T_2 = 1 + 0,05 T_2$   
 $1 + 0,05 T_2 > 1,5 + 0,03 T_2 \implies T_2 > 25 \text{ ns}$ 

Increasing hit ratio to 0.97 at the cost of increasing cache access time to 1.5 will improve average memory access time in the case that main memory access time is larger than 25 ns.

A Computer has a cache, main memory, and a disk used for virtual memory. If a referenced word is in the cache, 20ns are required to access it. If it is in main memory but not in the cache, 60 ns are needed to load it into the cache, and then the reference is started again. If the word is not in main memory, 12 ms are required to fetch the word from disk, followed by 60 ns to copy it to the cache, and then the reference is started again. The cache hit ratio is 0.9 and the main memory hit ratio is 0.6. What is the average time in nanoseconds required to access a referenced word on this system?

A Computer has a cache, main memory, and a disk used for virtual memory. If a referenced word is in the cache, 20ns are required to access it. If it is in main memory but not in the cache, 60 ns are needed to load it into the cache, and then the reference is started again. If the word is not in main memory, 12 ms are required to fetch the word from disk, followed by 60 ns to copy it to the cache, and then the reference is started again. The cache hit ratio is 0.9 and the main memory hit ratio is 0.6. What is the average time in nanoseconds required to access a referenced word on this system?

#### **Solution**

Let's generalise the formula from Lecture 2 (see Problem 5) to a three level memory including cache, main memory, and disk:

 $T_{s} = H_{1} \times T_{1} + (1 - H_{1}) \times H_{2} \times (T_{1} + T_{2}) + (1 - H_{1}) \times (1 - H_{2}) \times (T_{1} + T_{2} + T_{3})$ 

- T<sub>1</sub>, T<sub>2</sub>, T<sub>3</sub> : access times to cache, main memory, disk
- H<sub>1</sub>, H<sub>2</sub>: Hit ration to cache, main memory
- H<sub>1</sub>: Probability to hit the cache
- 1 H<sub>1</sub>: Probability to go to main memory
- (1 H<sub>1</sub>) x H<sub>2</sub>: Probability to fetch from main memory
- (1 H<sub>1</sub>) x (1 H<sub>2</sub>): Probability to fetch from disk

A Computer has a cache, main memory, and a disk used for virtual memory. If a referenced word is in the cache, 20ns are required to access it. If it is in main memory but not in the cache, 60 ns are needed to load it into the cache, and then the reference is started again. If the word is not in main memory, 12 ms are required to fetch the word from disk, followed by 60 ns to copy it to the cache, and then the reference is started again. The cache hit ratio is 0.9 and the main memory hit ratio is 0.6. What is the average time in nanoseconds required to access a referenced word on this system?

#### **Solution**

Let's generalise the formula from Lecture 2 (see Problem 5) to a three level memory including cache, main memory, and disk:

 $T_s = H_1 \times T_1 + (1 - H_1) \times H_2 \times (T_1 + T_2) + (1 - H_1) \times (1 - H_2) \times (T_1 + T_2 + T_3)$ 

- **T**<sub>1</sub>, T<sub>2</sub>, T<sub>3</sub> : access times to cache, main memory, disk
- **H**<sub>1</sub>, H<sub>2</sub>: Hit ration to cache, main memory
- H<sub>1</sub>: Probability to hit the cache
- 1 H<sub>1</sub>: Probability to go to main memory
- (1 H<sub>1</sub>) x H<sub>2</sub>: Probability to fetch from main memory
- $(1 H_1) \times (1 H_2)$ : Probability to fetch from disk

 $T_s = 0.9 \times 20 + 0.1 \times 0.6 \times 80 + 0.1 \times 0.4 \times 12000080 = 480026 ns$