### **Datorarkitektur**

#### (Advanced Computer Architecture)

Petru Eles Institutionen för Datavetenskap (IDA) Linköpings Universitet

> email: petru.eles@liu.se http://www.ida.liu.se/~petel phone:013 281396 B building, room 329:220

#### **Course Information**

<u>Web page</u>: http://www.ida.liu.se/~TDDI03

Examination: written, January 13th, 2025, kl. 14 - 18.

**Lecture notes:** available from the web page, latest 24 hours before the lecture.

<u>Text book</u>: William Stallings: *Computer Organization and Architecture*, Pearson/Prentice Hall, 9th edition, 2013 or 10th edition, 2016 or 11th edition, 2021.

In addition to lectures: Two big seminars!

# **Preliminary Course Plan**

#### Lecture 1

*Introduction*: Outline, Basic computer architecture and organization, Basic functions of a computer and its main components, The von Neumann architecture.

This is to refresh your memory!!!

#### Lectures 2 and 3

*The Memory System*: Memory hierarchy, Cache memories, Virtual memories, Memory management.

#### Lectures 4 and 5

*Instruction Pipelining*: Organization of pipelined units, Pipeline hazards, Reducing branch penalties, Branch prediction strategies.

#### Lectures 6

*RISC Architectures*: An analysis of instruction execution for code generated from high-level language programs, Compiling for RISC architectures, Main characteristics of RISC architectures, RISC-CISC trade-offs.

## **Preliminary Course Plan**

#### Lectures 7 and 8

Superscalar Architectures: Instruction level parallelism and machine parallelism, Hardware techniques for performance enhancement, Data dependencies, Policies for parallel instruction execution, Limitations of the superscalar approach.

#### Lectures 9 and 10

*VLIW Architectures*: The VLIW approach - advantages and limitations. Compiling for VLIW architectures. The Merced (Itanium) architecture.

#### Lectures 11 and 12

Architectures for Parallel Computation: Parallel programs, Performance of parallel computers, A classification of computer architectures, Array processors, Multiprocessors, Multicomputers, Vector processors. Multiprocessors on chip. Multithreaded processors. General purpose graphic processors.

#### COMPUTER ARCHITECTURE (BASIC ISSUES)

- **1. What is a Computer/Computer System?**
- 2. The von Neumann Architecture
- **3. Application Specific vs. General-Purpose**
- 4. Representation of Data and Instructions
- **5. Instruction Execution**
- 6. The Control Unit
- 7. The Computer System
- 8. Main and Secondary Memory

#### 9. The Intel x86 and ARM Families





A computer is a data processing machine which is operated <u>automatically</u> under the control of a list of <u>instructions</u> (called a program) stored in its main memory.





Besides the "core" we also have the peripherals.
 <u>Computer peripherals</u> include input devices, output devices, and secondary memories. That makes the whole to a *Computer System*.















#### **Input/Output Devices**

They connect the computer to the external world. Connection is via controllers/adaptors.



### How Does It Work? The von Neumann Architecture

- All computers in use, simple or complicated, big or small, cheap or expensive work according to the same basic concept, known as the <u>von Neumann architecture</u>:
  - Data and instructions are both stored in the main memory (stored program concept);
  - The content of the memory is addressable by location (without regard to what is stored in that location);
  - Instructions are executed sequentially (from one instruction to the next, in order of their location in memory) unless the order is explicitly modified.

#### CPU



The basic von Neumann organization (architecture):

- Central processing unit (CPU) contains:
  - Control unit (CU) that coordinates the execution of instructions;
  - Arithmetic/logic unit (ALU) that performs arithmetic and logic operations;
  - □ A set of registers.
- Main memory.



#### **Register Organization**

- The set of registers within the CPU represents the top level of the memory hierarchy inside the computer system:
  - <u>User visible registers</u>: can be accessed by programs, for data storing.
  - <u>Control and Status registers</u>: used by the Control Unit to control the operation of the CPU; not directly accessible by the programmer.

#### CPU



#### **User Visible Registers**

A set of registers which can be used without restrictions as operands for any operation and as address registers; these are so called general-purpose registers.

#### CPU



#### **Control and Status Registers**

- Program Counter (PC): holds the address of the instruction to be fetched and executed.
- Instruction Register (IR): holds the last instruction fetched.
- <u>Program Status Word</u> (PSW): Condition
  Code Flags + other bits defining the status of the CPU.

#### CPU



#### Arithmetic Logic Unit (ALU)

Performs arithmetic and logic operations.
 There might be several of them in a CPU.
 ALUs are different, depending on the data type they operate on: integer ALU, floating point ALU, etc.

#### CPU



#### Arithmetic Logic Unit (ALU)

Performs arithmetic and logic operations.
 There might be several of them in a CPU.
 ALUs are different, depending on the data type they operate on: integer ALU, floating point ALU, etc.

#### **Control Unit**

- The control unit generates the appropriate signals such that all other components of the CPU and the computer system, together, execute the current instruction.
- The current instruction to execute is stored in the instruction register (IR); it is the instruction whose memory address is stored in the program counter (PC)

### **Representation of Data**

- Inside a computer, data and control information (instructions) are all represented in binary format which uses only two basic symbols: "0" and "1".
- The two basic symbols are represented by electronics signals.

Numeric data are represented using the binary system, in which the positional values are powers of 2:

$$100101 = 1*2^{0} + 0*2^{1} + 1*2^{2} + 0*2^{3} + 0*2^{4} + 1*2^{5}$$
  
10110 = 0\*2<sup>0</sup> + 1\*2<sup>1</sup> + 1\*2<sup>2</sup> + 0\*2<sup>3</sup> + 1\*2<sup>4</sup>

Binary numbers are added, subtracted, multiplied and divided (by the ALU) directly; it is not needed to convert them to decimal numbers first.
 100101 + 10110 = 111011

1

- A CPU can only execute *machine instructions*,
- Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.

- A CPU can only execute *machine instructions*,
- Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.

A machine instruction is represented as a sequence of bits (binary digits). These bits are organized into *fields* that define:

- A CPU can only execute *machine instructions*,
- Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.



- A machine instruction is represented as a sequence of bits (binary digits). These bits are organized into *fields* that define:
  - □ What has to be done (the operation code).

- A CPU can only execute *machine instructions*,
- Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.



- A machine instruction is represented as a sequence of bits (binary digits).
  These bits are organized into *fields* that define:
  - What has to be done (the operation code).
  - **To whom the operation applies (source operands).**

- A CPU can only execute *machine instructions*,
- Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.



- A machine instruction is represented as a sequence of bits (binary digits).
  These bits are organized into *fields* that define:
  - What has to be done (the operation code).
  - **To whom the operation applies (source operands).**
  - Where does the result go (destination operand); in this example CPU it is assumed that the result of the operation is stored in the same place where the second operand was stored; no additional field is needed.

- A CPU can only execute *machine instructions*,
- Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.

Number of bits, number and length of the fields and their order is particular to each computer; this defines the *instruction format* of that computer.

### **Types of Machine Instructions**

- Machine instructions are of four types:
  - **Data transfer between memory and CPU registers**
  - □ Arithmetic and logic operations
  - Program control (test and branch); these are those instructions that change the flow of instruction execution by *jumping* to an instruction *different* from the instruction following the current one in memory.
  - □ I/O transfer

You see, there are very simple things a machine instruction does! But many machine instructions, together, perform the big thing!

#### **Instruction Execution**

#### Let's imagine you write in a program the following instruction:

Z := (Y + X) \* 3;

The instruction will be executed by the CPU as a sequence of four machine instructions!

#### **Instruction Execution**

Let's imagine you write in a program the following instruction: Z := (Y + X) \* 3;

| w                                               | emory address at<br>hich the instruc-<br>on/data is stored | Content of the memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|-------------------------------------------------|------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Move value of Y to Reg                          | <b>2</b> 00001000                                          | 0000101110001010<br>Move addr of Y Reg 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| Add value of X to Reg<br>(result kept in Reg 2) | <b>2</b> 00001001                                          | 0001101110000010      Transform        Add      addr of X Reg 2      Transform                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| Multiply Reg 2 with<br>(result kept in Reg      |                                                            | Add      addr of X Reg 2      Image: Note of the second seco |
| Store Reg 2 at address of                       | F <b>Z</b> 00001011                                        | 0001001110010010<br>Move addr of Z Reg 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| Value of X:<br>Value of Y:                      |                                                            | 00000000000001011-X<br>0000000000000011-Y                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| Final value of Z: 4<br>Datorarkitektur Fö 1     | <b>42</b> 01110010                                         | $000000000101010 - Z \int_{33 \text{ of } 60}^{50}$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |

### Let's Follow the Instruction Execution



## Let's Follow the Instruction Execution



### **Let's Follow the Instruction Execution**















### **Compilers**

High Level Language (e.g. C, C++, Java)

We have written in our program:

What the computer executes is:



Machine instructions for the particular processor that runs the program.

# **Compilers**

High Level Language (e.g. C, C++, Java)

We have written in our program:

What the computer executes is:



Machine instructions for the particular processor that runs the program.

Who brings us from our program to the machine instructions?



A compiler is a program that translates programs written in a high level language into machine code to be executed on a certain processor.

Datorarkitektur Fö 1

# **The Machine Cycle**

Many things have to be done to execute a simple machine instruction:

- Fetch instruction
- Decode instruction
- Execute instruction

# The Machine Cycle

Many things have to be done to execute a simple machine instruction:

- **Fetch instruction**
- **Decode instruction**
- Execute instruction
  Execute instruction
  Execute instruction

# **The Machine Cycle**

Many things have to be done to execute a simple machine instruction:



Datorarkitektur Fö 1

#### **Case Studies**

#### INTEL x86 Family

- The most successful example of a modern CISC (complex instruction set computer) microprocessor
- ARM Family
  - The most successful RISC (reduced instruction set computer) microprocessor ever.

#### The INTEL x86

The number one microprocessor used in non-embedded systems.

First member: INTEL 4004 in 1971

First general purpose microprocessor: INTEL 8080 in 1974

# **Milestones: INTEL x86**

- **8080 (1974)** 
  - □ First general purpose microprocessor;
  - □ 8 bits;
  - □ Used in first personal computer: *Altair*.
- **8086 (1978)** 
  - □ 16 bits
  - **Something like an instruction cache**
  - □ First one used in IBM PC
- **80286 (1982)** 
  - **Huge increase in addressable memory (16 MByte instead of 1 MByte)**
- **80386 (1985)** 
  - □ 32 bits
- **80486 (1989)** 
  - Complex cache structure and pipelining
  - □ Math coprocessor

# **Milestones: INTEL x86**

- Pentium (1993)
  - Introduces superscalar technology
- Pentium Pro (1995)
  - Advanced superscalar techniques
  - Branch prediction and speculative execution
- Pentium II (1997)
  - □ Intel MMX technology (instruction set extension for multimedia)
- Pentium III (1999)
  - **Additional floating point instructions**
  - Support for 3D graphics software
- Pentium 4 (2000)
  - Further improvements on the line of Pentium III
- **Core (2006)** 
  - □ Solo: single core
  - Duo: Dual core two processors on a chip

### **Milestones: INTEL x86**

- **Core 2 (2006)** 
  - □ 64 bits
  - Dual: two processors on a chip
- Core 2 Quad (2007)
  - □ Four processors on a chip
- Core i7 (2009/2010), mobile version 2011
  - □ Six processors, Hyperthreading
- Core i9 (2017)
  - **10-18 processors, Hyperthreading**

Backward compatibility: newer versions can run the programs running on older versions.

# **The ARM Family**

ARM processors are widely used in embedded systems (games, phones, multimedia applications, various hand-held devices, consumer products, automotive, medical equipment, wireless etc.).

High performance, low size/cost, low power consumption

ARM Cambridge, UK, are designing single and multiprocessor architectures and *licence* them to manufacturers.

# **Milestones: The ARM Family**

- ARM1 and ARM2 (1985)
  - □ 32 bit RISC processor
  - □ 3 stage pipeline
- ARM3 (1989)
  - □ cache memory
- ARM6 (1992)
  - **Floating point unit**
- **ARM7 (1994)** 
  - □ Most successful family.
  - □ First used as part of complete Systems on Chip (SoC)
- **ARM8 (1996)** 
  - □ 5 stage pipeline
- **ARM9 (1997)** 
  - □ separate data/instruction cache

# **Milestones: The ARM Family**

**StrongArm (1996)** 

□ Special version of ARM9; developed with DEC and, later, bought by Intel.

- **ARM10 (2000)** 
  - □ 6 stage pipeline
- ARM11 (2002)
  - □ 9 stage pipeline
  - Media processing extensions
- XScale (2002)
  - Successor to StrongArm, by Intel
  - □ 7/8 stage pipeline
  - Dynamic voltage & frequency management
- **Cortex (2005)** 
  - □ 13 stage pipeline
  - □ superscalar

## **Milestones: The ARM Family**

- ARM11 MPCore (2005)
  - □ Multicore based on ARM11; up to 4 cores
- Cortex-A9 MPCore (2008)
  - Multicore based on out-of-order superscalar Cortex-A9 core; up to 4 cores
- Cortex- A15 MPCore (2011, commercial 2012)
  - Multicore based on an out-of-order superscalar Cortex-A15 core;
    2 clusters, 4 cores each.
- ARM Cortex-A57 (2014, commercial 2016)
  - Out-of-order superscalar, 64-bit instruction set.
    Used in chips by *Qualcomm*, *Samsung*, *Nvidia*.

- Memory hierarchy:
  - □ cache memory
  - □ virtual memory
  - memory management;

- Memory hierarchy:
  - □ cache memory
  - □ virtual memory
  - memory management;
- Advanced CPU structures and instruction execution strategies:
  - **pipelining**
  - □ **RISC** architectures
  - superscalar architectures
  - □ VLIW architectures

- Memory hierarchy:
  - □ cache memory
  - □ virtual memory
  - memory management;
- Advanced CPU structures and instruction execution strategies:
  - **pipelining**
  - □ **RISC** architectures
  - superscalar architectures
  - □ VLIW architectures
- System Architectures for parallel computing
  - □ performance of parallel computers
  - parallel computer architectures
  - multicore and multithreaded processors
  - general purpose graphic processors