TDDC86 Compiler Optimizations and Code Generation DF00100 Advanced Compiler Construction



# Multi-Level Intermediate Representations

Local CSE, DAGs, Lowering Call Sequences

**Survey of some Compiler Frameworks** 

Christoph Kessler, IDA, Linköpings universitet, 2012.





### **Multi-Level IR**

- **Multi-level IR**, e.g.
  - AST abstract syntax tree implicit control and data flow
  - HIR high-level IR
  - MIR medium-level IR
  - LIR low-level IR, symbolic registers
  - VLIR very low-level IR, target specific, target registers
- Standard form and possibly also SSA (static single assignment) form
- Open form (tree, graph) and/or closed (linearized, flattened) form
  - For expressions: Trees vs DAGs (directed acyclic graphs)
- Translation by lowering
  - ② Analysis / Optimization engines can work on the most appropriate level of abstraction
  - Clean separation of compiler phases, somewhat easier to extend and debug
  - $\ensuremath{\textcircled{\otimes}}$  Framework gets larger and slower
- C. Kessler, IDA, Linköpings universitet.
- TDDC86 Compiler Optimizations and Code Generation









F95



#### WGS UNT





# Symbol table



#### Some typical fields in a symbol table entry

| Field Name | Field Type      | Meaning                                                                               |
|------------|-----------------|---------------------------------------------------------------------------------------|
| name       | char *          | the symbol's identifier                                                               |
| sclass     | enum { STATIC,} | storage class                                                                         |
| size       | int             | size in bytes                                                                         |
| type       | struct type *   | source language data type                                                             |
| basetype   | struct type *   | source-lang. type of elements of a constructed type                                   |
| machtype   | enum { }        | machine type corresponding to<br>source type (or element type if<br>constructed type) |
| basereg    | char *          | base register to compute address                                                      |
| disp       | int             | displacement to address on stack                                                      |
| reg        | char *          | name of register containing the symbol's value                                        |

TDDC86 Compiler Optimizations and Code Generation DF00100 Advanced Compiler Construction



# **Multi-Level IR Overview**









- loop structures and bounds explicit endfor
- array subscripts explicit
- → suitable for data dependence analysis and loop transformation / parallelization
- artificial entry node for the procedure
- assignments var = expr
- unassigned expressions, e.g. conditionals

11

• function calls

C. Kessler, IDA, Linköpings universitet.

TDDC86 Compiler Optimizations and Code Generation

a[i] = 2



### **Generating a CFG from AST**



- Straightforward for structured programming languages
  - Traverse AST and compose control flow graph recursively
  - As in syntax-directed translation, but separate pass
  - Stitching points: single entry, single exit point of control; symbolic labels for linearization













- SSA form makes data flow (esp., def-use chains) explicit
- Certain program analyses and transformations are easier to implement or more efficient on SSA-representation
- (Up to now) SSA is not suitable for code generation

**SSA-Form vs. Standard Form of IR** 

- Requires transformation back to standard form
- Comes later...



### MIR – medium-level intermediate representation



- "language independent"
- control flow reduced to simple branches, call, return
- variable accesses still in terms of symbol table names
- explicit code for procedure / block entry / exit
- suitable for most optimizations
- basis for code generation



**TDDC86** Compiler Optimizations and Code Generation DF00100 Advanced Compiler Construction



# Flattening 1: From HIR to MIR



# HIR→MIR (1): Flattening the expressions

By a postorder traversal of each expression tree in the CFG:

- Decompose the nodes of the expression trees (operators, ...) into simple operations (ADD, SUB, MUL, ...)
- Infer the types of operands and results (language semantics)
  - annotate each operation by its (result) type
  - insert explicit conversion operations where necessary
- Flatten each expression tree (= partial order of evaluation) to a sequence of operations (= total order of evaluation) using temporary variables t1, t2, ... to keep track of data flow

21

• This is static scheduling! May have an impact on space / time requirements

C. Kessler, IDA, Linköpings universitet.

**TDDC86 Compiler Optimizations and Code Generation** 

# HIR→MIR (2): Lowering Array References (1)

- HIR: t1 = a [ i, j+2 ]
- the Lvalue of a [ i, j+2 ] is (on a 32-bit architecture) (addr a) + 4 \* ( i \* 20 + j + 2 )

| MIR:<br>t1 = j + 2<br>t2 = i * 20<br>t3 = t1 + t2<br>t4 = 4 * t3<br>t5 = addr a<br>t6 = t5 + t4<br>t7 = *t6 |    |                                                   |
|-------------------------------------------------------------------------------------------------------------|----|---------------------------------------------------|
| t7 = *t6<br>C. Kessler, IDA, Linköpings universitet.                                                        | 22 | TDDC86 Compiler Optimizations and Code Generation |



### **Control flow graph**



4

- Depth-first search of the control flow graph
- Topological ordering of the operations, starting with *entry* node
  - at conditional branches: one exit fall-through, other exit branch to a label
- **Basic blocks** = maximum-length subsequences of statements containing no branch nor join of control flow

23

**Basic block graph** obtained from CFG by merging statements in a basic block to a single node

C. Kessler, IDA, Linköpings universitet.

**TDDC86** Compiler Optimizations and Code Generation

- Nodes: primitive operations (e.g., quadruples)
- Edges: control flow transitions
- Example: 1: (JEQZ, 5, 0, 0) (ASGN, 2, 0, 2: A) (ADD 3: 3, B) Α, (JUMP, 7, 0, 0) 4: (ASGN, 23, 5: 0, A ) (SUB 6: B ) Α, 1, (MUL, 7: Β, C ) Α, (ADD, С, 1, A) 8:

Β,

2,

0)

24

(JNEZ,

C. Kessler, IDA, Linköpings universitet.

9:

#### **Basic block**



- A **basic block** is a sequence of textually consecutive operations (e.g. MIR operations, LIR operations, quadruples) that contains no branches (except perhaps its last operation) and no branch targets (except perhaps its first operation).
  - Always executed in same order from entry to exit

| A.k.a. straight-line code                  |       | 1: | (JEQZ, | 5,  | 0, | 0)  | <b>B1</b> |      |
|--------------------------------------------|-------|----|--------|-----|----|-----|-----------|------|
|                                            | f - 2 | 2: | (ASGN, | 2,  | 0, | A)  | <b>B2</b> |      |
|                                            | 3     | 3: | (ADD   | А,  | 3, | B ) |           |      |
|                                            |       | 4: | (JUMP, | 7,  | 0, | 0)  |           |      |
|                                            | Ę     | 5: | (ASGN, | 23, | 0, | A)  | <b>B3</b> |      |
|                                            | e     | 5: | (SUB   | А,  | 1, | B ) |           |      |
|                                            |       | 7: | ( MUL, | А,  | В, | C ) | <b>B4</b> |      |
|                                            | 8     | 3: | (ADD,  | C,  | 1, | A)  |           |      |
| C. Kessler, IDA, Linköpings universitet. 2 | 5 9   | 9: | (JNEZ, | В,  | 2, | 0)  | 1         | tion |



#### LIR – low-level intermediate representation



- in GCC: Register-transfer language (RTL)
- usually architecture dependent
  - e.g. equivalents of target instructions + addressing modes for IR operations
  - variable accesses in terms of target memory addresses









TDDC86 Compiler Optimizations and Code Generation





#### MIR→LIR: Storage Binding



- mapping variables (symbol table items) to addresses
- (virtual) register allocation
- procedure frame layout implies addressing of formal parameters and local variables relative to frame pointer fp, and parameter passing (call sequences)

31

- for accesses, generate Load and Store operations
- further lowering of the program representation

#### MIR→LIR translation example



#### MIR→LIR: Procedure call sequence (1) [Muchnick 5.6]



**TDDC86 Compiler Optimizations and Code Generation** 

- call instruction assembles arguments and transfers control to callee
- evaluate each argument (reference vs. value param.) and
  - push it on the stack, or write it to a parameter register
- determine code address of the callee (mostly, compile-time or link-time constant)
- store caller-save registers (usually, push on the stack)

33

save return address (usually in a register) and branch to code *entry* of callee.

C. Kessler, IDA, Linköpings universitet.

C. Kessler, IDA, Linköpings universitet,

TDDC86 Compiler Optimizations and Code Generation

**TDDC86** Compiler Optimizations and Code Generation

### MIR→LIR: Procedure call sequence (2)

#### Procedure prologue

C. Kessler, IDA, Linköpings universitet.

executed on entry to the procedure

- save old frame pointer fp
- old stack pointer sp becomes new frame pointer fp
- determine new sp (creating space for local variables)
- save callee-save registers

MIR→LIR: Procedure call sequence (3)





**TDDC86 Compiler Optimizations and Code Generation** 

#### Procedure epilogue

executed at return from procedure

- restore callee-save registers
- put return value (if existing) in appropriate place (reg/stack)

35

- restore old values for sp and fp
- branch to return address

Caller cleans up upon return:

restore caller-save registers

C. Kessler, IDA, Linköpings universitet.

use the return value (if applicable)

**From Trees to DAGs:** 

34

# Common Subexpression Elimination (CSE)

E.g., at MIR→LIR Lowering

Christoph Kessler, IDA, Linköpings universitet, 2012.

| From  | Trees to | DAGs: |
|-------|----------|-------|
| Local | CSE      |       |

start with an empty DAG.

for each (MIR) operation  $v = (t, op, u_1, u_2)$ 

for each operand  $u_i$  of v

if u<sub>i</sub> not yet represented by a DAG node d<sub>i</sub>
create DAG leaf node d<sub>i</sub> ← new dagnode( u<sub>i</sub>, LEAF, NULL, NULL )
// LEAF denotes a value live on entry to basic block

if there is no parent node p of all  $d_i$  with operator op in the DAG

37

create  $p \leftarrow \text{new dagnode}(v, op, d_1, d_2)$ 

label *p* with *t* // result (temp.) variable

remove t as label of any other node in the DAG

for all non-removed labels create assignments.

C. Kessler, IDA, Linköpings universitet.

TDDC86 Compiler Optimizations and Code Generation

 $t \leftarrow u_1 \ op \ u_2$ 















#### Register allocation

- determine what values to keep in a register
- "symbolic registers", "virtual registers"

#### Register assignment

- assign virtual to physical registers
- Two values cannot be mapped to the same register if they are *alive* simultaneously, i.e. their *live ranges* overlap (depends on schedule).

C. Kessler, IDA, Linköpings universitet.

42 TDDC8

TDDC86 Compiler Optimizations and Code Generation

### **On LIR/VLIR: Instruction scheduling**



 reorders the instructions (LIR/VLIR) (subject to precedence constraints given by dependences) to minimize

43

- space requirements (# registers)
- time requirements (# CPU cycles)
- power consumption

• ...

#### C. Kessler, IDA, Linköpings universitet.

TDDC86 Compiler Optimizations and Code Generation

# Remarks on IR design (1) [Cooper'02]



AST? DAGs? Call graph? Control flow graph? Program dep. graph? SSA? ...

- Level of abstraction is critical for implementation cost and opportunities:
  - representation chosen affects the entire compiler

**Example 1:** Addressing for arrays and aggregates (structs)

- source level AST: hides entire address computation A[i+1][j]
- pointer formulation: may hide critical knowledge (bounds)
- low-level code: may make it hard to see the reference
- $\rightarrow$  "best" representation depends on how it is used
  - for dependence-based transformations: source-level IR (AST, HIR)
  - for fast execution: pointer formulation (MIR, LIR)
  - for optimizing address computation: low-level repr. (LIR, VLIR, target)

44



TDDC86 Compiler Optimizations and Code Generation

# **Remarks on IR Design (2)**



- **Example 2**: Representation for comparison&branch
- fundamentally, 3 different operations:
  - Compare  $\rightarrow$  convert result to boolean  $\rightarrow$  branch combined in different ways by processor architects
- "best" representation may depend on target machine

| ■ r7 = (x < y) | cmp x y (sets CC) | r7 = (x < y) |
|----------------|-------------------|--------------|
| br r7, L12     | brLT L12          | [r7] br L12  |

45

 $\rightarrow$  design problem for a retargetable compiler

#### C. Kessler, IDA, Linköpings universitet.

TDDC86 Compiler Optimizations and Code Generation

# Summary

#### Multi-level IR

- Translation by lowering
- © Program analyses and transformations can work on the most appropriate level of abstraction
- © Clean separation of compiler phases
- ☺ Compiler framework gets larger and slower









# **APPENDIX – For Self-Study**

# **Compiler Frameworks**

#### A (non-exhaustive) survey

with a focus on open-source frameworks

Christoph Kessler, IDA, Linköpings universitet, 2012.

- Dragon-book style C compiler implementation in C
- Very small (20K Loc), well documented, well tested, widely used
- Open source: http://www.cs.princeton.edu/software/lcc
- Textbook A retargetable C compiler [Fraser, Hanson 1995] contains complete source code
- One-pass compiler, fast
- C frontend (hand-crafted scanner and recursive descent parser) with own C preprocessor
- Low-level IR
  - Basic-block graph containing DAGs of quadruples
  - No AST
- Interface to IBURG code generator generator
  - Example code generators for MIPS, SPARC, Alpha, x86 processors
  - Tree pattern matching + dynamic programming
- Few optimizations only
  - local common subexpr. elimination, constant folding
- Good choice for source-to-target compiling if a prototype is needed soon



8

#### GCC 4.x

- Gnu Compiler Collection (earlier: Gnu C Compiler)
- Compilers for C, C++, Fortran, Java, Objective-C, Ada ...
  - sometimes with own extensions, e.g. Gnu-C
- Open-source, developed since 1985
- Very large
- 3 IR formats (all language independent)
  - GENERIC: tree representation for whole function (also statements)
  - GIMPLE (simple version of GENERIC for optimizations) based on tree's but expressions in quadruple form. High-level, low-level and SSA-low-level form.
  - RTL (Register Transfer Language, low-level, Lisp-like) (the traditional GCC-IR) only word-sized data types; stack explicit; statement scope
- Many optimizations
- Many target architectures
- Version 4.x (since ~2004) has strong support for retargetable code generation

49

- Machine description in .md file
- Reservation tables for instruction scheduler generation
- Good choice if one has the time to get into the framework

#### C. Kessler, IDA, Linköpings universitet.

**TDDC86 Compiler Optimizations and Code Generation** 

# **Open64 / ORC Open Research Compiler**



- Based on SGI Pro-64 Compiler for MIPS processor, written in C++, went open source in 2000
- Several tracks of development (Open64, ORC, …)
- For Intel Itanium (IA-64) and x86 (IA-32) processors. Also retargeted to x86-64, Ceva DSP, Tensilica, XScale, ARM ... "simple to retarget" (?)
- Languages: C, C++, Fortran95 (uses GCC as frontend), OpenMP and UPC (for parallel programming)
- Industrial strength, with contributions from Intel, Pathscale, …
- Open source: www.open64.net, ipf-orc.sourceforge.net
- 6-layer IR:
  - WHIRL (VH, H, M, L, VL) 5 levels of abstraction
    - All levels semantically equivalent
    - Each level a lower level subset of the higher form
  - and target-specific very low-level CGIR
- Many optimizations, many third-party contributed components C. Kessler, IDA, Linköpings universitet.



# LLVM

#### (llvm.org)

- **LLVM** (Univ. of Illinois at Urbana Champaign)
  - "Low-level virtual machine"
  - Front-ends (Clang, GCC) for C, C++, Objective-C, Fortran, ...
  - One IR level: a LIR + SSA-LIR,
    - Inearized form, printable, shippable, but target-dependent,
    - "LLVM instruction set"
  - compiles to many target platforms
    - ▶ x86, Itanium, ARM, Alpha, SPARC, PowerPC, Cell SPE, ...
    - And to low-level C
  - Link-time interprocedural analysis and optimization framework for whole-program analysis

52

- JIT support available for x86, PowerPC
- Open source
- C. Kessler, IDA, Linköpings universitet.

**TDDC86 Compiler Optimizations and Code Generation** 







- VEX: "VLIW EXample"
  - Generic clustered VLIW Architecture and Instruction Set
- From the book by Fisher, Faraboschi, Young: Embedded Computing, Morgan Kaufmann 2005
  - www.vliw.org/book
- Developed at HP Research
  - Based on the compiler for HP/ST Lx (ST200 DSP)
- Compiler, Libraries, Simulator and Tools available in binary form from HP for non-commercial use
  - IR not accessible, but CFGs and DAGs can be dumped or visualized
- Transformations controllable by options and/or #pragmas
  - Scalar optimizations, loop unrolling, prefetching, function inlining, ...

53

• Global scheduling (esp., trace scheduling), but no software pipelining

C. Kessler, IDA, Linköpings universitet.

**TDDC86 Compiler Optimizations and Code Generation** 



#### A commercial compiler framework

www.ace.nl

Christoph Kessler, IDA, Linköpings universitet, 2012.





### Engine



- Modular compiler building block
- Performs a well-defined task
- Focus on algorithms, not compiler configuration
- Parameters are handles on the underlying common IR repository
- Execution may be in a separate process or as subroutine call the engine writer does not know!
- View of an engine class: the part of the common IR repository that it can access (scope set by access rights: read, write, create)
- Examples: Analyzers, Lowerers, Optimizers, Translators, Support

57

C. Kessler, IDA, Linköpings universitet.

TDDC86 Compiler Optimizations and Code Generation

#### **Composite Engines in CoSy**



- Built from simple engines or from other composite engines by combining engines in interaction schemes (Loop, Pipeline, Fork, Parallel, Speculative, ...)
- Described in EDL (Engine Description Language)
- View defined by the joint effect of constituent engines
- A compiler is nothing more than a large composite engine

58



C. Kessler, IDA, Linköpings universitet.

TDDC86 Compiler Optimizations and Code Generation









- Component classes (engine class)
- Component instances (engines)
- Basic components are implemented in C
- Interaction schemes (cf. skeletons) form complex connectors
  - SEQUENTIAL
  - PIPELINE
  - DATAPARALLEL
  - SPECULATIVE
- EDL can embed automatically
  - Single-call-components into pipes
  - p<> means a stream of p-items
  - EDL can map their protocols to each other (p vs p<>)

C. Kessler, IDA, Linköpings universitet.

ControlFlowAnalyser cfa; CommonSubExprEliminator cse; LoopVariableSimplifier lvs;

ENGINE CLASS <u>optimizer</u> (procedure p)

PIPELINE cfa(p); cse(p); lvs(p);

#### ENGINE CLASS compiler ( file f )

....
Token token;
Module m;
PIPELINE // lexer takes file, delivers token stream:
 lexer( IN f, OUT token<> );
 // Parser delivers a module
 parser( IN token<>, OUT m );
 sema( m );
 decompose( m, p<> );
 // here comes a stream of procedures
 // from the module
 optimizer( p<> );
 backend( p<> );

·····

#### **Evaluation of CoSy**



- The outer call layers of the compiler are generated from view description specifications
  - Adapter, coordination, communication, encapsulation
  - Sequential and parallel implementation can be exchanged
  - There is also a non-commercial prototype [Martin Alt: On Parallel Compilation. PhD thesis, 1997, Univ. Saarbrücken]

61

- Access layer to the repository must be efficient (solved by generation of macros)
- Because of views, a CoSy-compiler is very simply extensible
  - That's why it is expensive
  - Reconfiguration of a compiler within an hour

#### C. Kessler, IDA, Linköpings universitet.

TDDC86 Compiler Optimizations and Code Generation

# Source-to-Source compiler frameworks



- Cetus
  - C / OpenMP source-to-source compiler written in Java.
  - Open source
- ROSE
  - C++ source-to-source compiler
  - Open source
- Tools and generators
  - TXL source-to-source transformation system
  - ANTLR frontend generator
- C. Kessler, IDA, Linköpings universitet.

. . .

62 TDDC

TDDC86 Compiler Optimizations and Code Generation

# More frameworks (mostly historical) ...



- Some influential frameworks of the 1990s
  - ...some of them still active today
  - **SUIF** Stanford university intermediate format, suif.stanford.edu
  - Trimaran (for instruction-level parallel processors) www.trimaran.org
  - Polaris (Fortran) UIUC
  - Jikes RVM (Java) IBM
  - Soot (Java)
  - GMD Toolbox / Cocolab Cocktail<sup>™</sup> compiler generation tool suite

63

- and many others ...
- And many more for the embedded domain ...

C. Kessler, IDA, Linköpings universitet.

TDDC86 Compiler Optimizations and Code Generation

11