Lab 5: Optimization

The aim of this lab is to perform optimization on the internal form constructed and type checked in previous labs.

Optimization

Optimization is a whole field of research in itself, and we will only briefly discuss the various types here. Refer to the lectures and course book (especially Chapters 9 and 10) for a more in-depth treatment.

When to optimize?

Optimization can be applied during several compiling phases. While the program is represented as some form of tree, it is relatively easy to transform the tree by use of tree traversals that analyze the trees and nodes and replace or rearrange them if possible (indeed, that is what we will do in this lab), in order to generate better code later on. Later on, the program needs to be linearized into list form, since assembler and ultimately processors are linear, for example into quadruples. These quadruple lists are suited for other kinds of optimization, where unnecessary statements can be eliminated, loop invariants moved outside loops, and many other things. Finally the assembler code generated can be analyzed and optimized, this time with special properties of the target processor in mind.

Constant folding

We will only implement a very simple algorithm for optimization known as constant folding. The idea is to identify constant arithmetic expressions at compile time and replace them with the result right away, saving us some run-time evaluations. The optimizing is done on the AST internal form, i.e., before we transform the program into quadruples, and is destructive in the sense that it modifies the AST, replacing subtrees with their evaluations.

As an example, the DIESEL code

if (a > 2) then
  a := 2 + 3;
end;

will be optimized to

if (a > 2) then
  a := 5;
end;

Obviously, most programmers are smart enough not to write code like this themselves, but there are cases where it can come in useful. A more intelligent version could analyze program blocks and identify variables whose value don’t change during the block’s lifetime, and pre-calculate expressions involving this variable once instead of several times, for example. However, we will stick with the basic version outlined above in this lab.

Expressed as AST nodes, the above optimization will look like this (node types are represented by their DIESEL symbols out of space considerations):

Example of constant folding.

Implementation

In this lab, you are to implement the constant folding algorithm as described above for DIESEL, by editing the file optimize.cc.

The tree traversal will be done using recursive method calls, much like the type checking in the previous lab. Optimization is performed one program block at a time. It is started from parser.y by a call to do_optimize() in a global optimizer object which is an instance of the ast_optimizer class, which will in turn start an optimize() call which will propagate down the AST and try to identify subtrees eligible for optimization.

The ast_optimizer class is a standalone class, again much like the semantic class, used as a wrapper for optimization. Its instantiation is a global variable, optimizer, which is declared in optimize.cc. The actual optimize() methods are part of the AST nodes, but their implementations lie in the optimize.cc file. The interface is kept clean in order to make it easy to change the algorithm used without having to fiddle with other parts of the compiler.

Some of the methods have already been implemented; look at them to get an idea of what you are supposed to do. Most of them you are to complete yourself (look for Your code here comments). Depending on your solution, it might be convenient to add new methods to the ast_optimizer class. If you do so, you will also need to add the declarations of these methods to the optimize.hh file. Remember to try to keep the interface as clean as possible, though.

Files that you need to change

optimize.hh and optimize.cc

contain optimizing code for the AST nodes as well as the declaration and implementation of the ast_optimizer class. These are the files you will edit in this lab.

Other files of interest

All these files are the same as in lab 4.

ast.cc

contains (part of) the implementations of the AST nodes.

ast.hh

contains the definitions for the AST nodes. You will need to perform some safe downcasting in this lab, and the methods for doing that can be found here.

parser.y

is the input file to bison.

quads.hh and quads.cc

contain quad generation code. You won’t need to edit these files in this lab.

codegen.hh and codegen.cc

contain assembler generation code. You won’t need to edit these files until the last lab.

error.hh, error.cc, symtab.hh, symbol.cc, symtab.cc, scanner.l, semantic.hh, and semantic.cc

use your versions from the earlier labs.

main.cc

this is the compiler wrapper, parsing flags and the like. Same as in the previous labs.

Makefile

and diesel use the same files as in the last lab.

Declarations and routines available in ast.hh, optimize.hh, and optimize.cc

The recursive optimize() method common to all the AST classes takes no arguments, and does not return anything; it either performs a destructive update of the AST, or passes the call onward to its children. The declarations can be found in ast.hh. You will have to implement their method bodies yourself.

The ast_optimizer class (currently) only contains one method of external interest and some utility functions that you may want to use:

class ast_optimizer

Public Functions

void do_optimize(ast_stmt_list *body)

Starts recursive (destructive) optimization of a block.

This method has already been written for you, and is rather trivial. It is the interface to parser.y.

Parameters

body – a list of statements representing the block body to be optimized.

bool is_binop(ast_expression*)

Returns true if the argument is a subclass of ast_binaryoperation. It is needed to find out which nodes are eligible for optimization.

ast_expression *fold_constants(ast_expression*)

This is a convenient method used in optimize.cc. It has to be public so the ast_* nodes can access it. Another solution would be to make it a static method in the optimize.cc file… A matter of preference.

As mentioned before, it is quite likely that you will want to add methods to the ast_optimizer class. Feel free to do so at your leisure, but remember to keep the interface as clean as possible. See the code comments for more information.

Program specification

  1. Your program must be able to handle optimization of all operations derived from the ast_binaryoperation class.

  2. Your program needs only optimize subtrees whose leaf nodes are instances of ast_real, ast_integer or ast_id (only those ast_id nodes that represent constants should be treated).

  3. Your program does not need to be able to optimize expressions containing ast_cast nodes, but feel free to implement it anyway as an exercise (it should not be too much extra work).

  4. Your program does not need to handle optimizing of binary relations, but feel free to implement it anyway as an exercise. It will be very similar to handling binary operators.

  5. Your program must preserve the code structure, i.e., the destructive updates must not change the result of running the compiled program in any way.

  6. Optimization should be done one block at a time. Apart from the calls to do_optimize(), no optimization code is to reside in parser.y.

Debugging help

As previously, you can send all AST nodes to cout.

How do I know it’s correct?

Compare the AST before and after optimization has been performed. Study the subtrees representing arithmetic expressions only involving constants. Make sure that they have been replaced by their result. An example is added as an enclosure.

Reporting your work

  • For the demonstration, run the test program for the test case opttest1.d and discuss your code.

  • Hand in your code as described in Handing in Results.

Example execution

Running make lab5 will take the sample input and compare it to the sample output. You should test your code on more examples than this.

./diesel -b -p -a ../testpgm/opttest1.d
Listing 13 trace/opttest1.trace
An AST will be printed for each block.
No quads will be generated.

Unoptimized AST for "F"
Statement list (preceding, last_stmt)
+-NULL
+-Return (value)
  +-Id (L) [INTEGER]

Optimized AST for "F"
Statement list (preceding, last_stmt)
+-NULL
+-Return (value)
  +-Id (L) [INTEGER]

Unoptimized AST for "PROC"
Statement list (preceding, last_stmt)
+-NULL
+-Assignment (left, right)
  +-Id (I) [INTEGER]
  +-Id (L) [INTEGER]

Optimized AST for "PROC"
Statement list (preceding, last_stmt)
+-NULL
+-Assignment (left, right)
  +-Id (I) [INTEGER]
  +-Id (L) [INTEGER]

Unoptimized AST for global level
Statement list (preceding, last_stmt)
+-Statement list (preceding, last_stmt)
| +-Statement list (preceding, last_stmt)
| | +-Statement list (preceding, last_stmt)
| | | +-Statement list (preceding, last_stmt)
| | | | +-Statement list (preceding, last_stmt)
| | | | | +-Statement list (preceding, last_stmt)
| | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | | | | | +-NULL
| | | | | | | | | | | +-Assignment (left, right)
| | | | | | | | | | |   +-Id (I) [INTEGER]
| | | | | | | | | | |   +-Add (left, right) [INTEGER]
| | | | | | | | | | |     +-Id (FOO) [INTEGER]
| | | | | | | | | | |     +-Integer [2]
| | | | | | | | | | +-Assignment (left, right)
| | | | | | | | | |   +-Id (I) [INTEGER]
| | | | | | | | | |   +-Mult (left, right) [INTEGER]
| | | | | | | | | |     +-Integer [3]
| | | | | | | | | |     +-Integer [1]
| | | | | | | | | +-Assignment (left, right)
| | | | | | | | |   +-Id (I) [INTEGER]
| | | | | | | | |   +-Unary minus (expr) [INTEGER]
| | | | | | | | |     +-Mult (left, right) [INTEGER]
| | | | | | | | |       +-Integer [3]
| | | | | | | | |       +-Integer [1]
| | | | | | | | +-Assignment (left, right)
| | | | | | | |   +-Id (I) [INTEGER]
| | | | | | | |   +-Sub (left, right) [INTEGER]
| | | | | | | |     +-Id (I) [INTEGER]
| | | | | | | |     +-Mult (left, right) [INTEGER]
| | | | | | | |       +-Integer [3]
| | | | | | | |       +-Integer [1]
| | | | | | | +-Assignment (left, right)
| | | | | | |   +-Id (I) [INTEGER]
| | | | | | |   +-Or (left, right) [INTEGER]
| | | | | | |     +-Id (I) [INTEGER]
| | | | | | |     +-Mult (left, right) [INTEGER]
| | | | | | |       +-Integer [3]
| | | | | | |       +-Integer [1]
| | | | | | +-Assignment (left, right)
| | | | | |   +-Id (I) [INTEGER]
| | | | | |   +-Add (left, right) [INTEGER]
| | | | | |     +-Add (left, right) [INTEGER]
| | | | | |     | +-Id (I) [INTEGER]
| | | | | |     | +-Integer [1]
| | | | | |     +-Add (left, right) [INTEGER]
| | | | | |       +-Id (I) [INTEGER]
| | | | | |       +-Unary minus (expr) [INTEGER]
| | | | | |         +-Add (left, right) [INTEGER]
| | | | | |           +-Integer [63]
| | | | | |           +-Integer [3]
| | | | | +-Assignment (left, right)
| | | | |   +-Id (I) [INTEGER]
| | | | |   +-Or (left, right) [INTEGER]
| | | | |     +-Add (left, right) [INTEGER]
| | | | |     | +-Id (I) [INTEGER]
| | | | |     | +-Integer [1]
| | | | |     +-Add (left, right) [INTEGER]
| | | | |       +-Id (I) [INTEGER]
| | | | |       +-Unary minus (expr) [INTEGER]
| | | | |         +-Add (left, right) [INTEGER]
| | | | |           +-Integer [64]
| | | | |           +-Integer [3]
| | | | +-Assignment (left, right)
| | | |   +-Id (I) [INTEGER]
| | | |   +-Function call (function, arguments) [INTEGER]
| | | |     +-Id (F) [INTEGER]
| | | |     +-Expression list (preceding, last_expr)
| | | |       +-NULL
| | | |       +-Mult (left, right) [INTEGER]
| | | |         +-Integer [3]
| | | |         +-Integer [1]
| | | +-Assignment (left, right)
| | |   +-Id (A) [INTEGER]
| | |   +-Add (left, right) [INTEGER]
| | |     +-Id (I) [INTEGER]
| | |     +-Integer [1]
| | +-Procedure call (procedure, arguments)
| |   +-Id (PROC) [VOID]
| |   +-Expression list (preceding, last_expr)
| |     +-NULL
| |     +-Mult (left, right) [INTEGER]
| |       +-Integer [3]
| |       +-Integer [1]
| +-If (condition, then, elsif, else)
|   +-Less than (left, right) [INTEGER]
|   | +-Sub (left, right) [INTEGER]
|   | | +-Integer [2]
|   | | +-Integer [3]
|   | +-Integer [4]
|   +-Statement list (preceding, last_stmt)
|   | +-NULL
|   | +-Assignment (left, right)
|   |   +-Id (I) [INTEGER]
|   |   +-Add (left, right) [INTEGER]
|   |     +-Id (A) [INTEGER]
|   |     +-Integer [5]
|   +-NULL
|   +-NULL
+-If (condition, then, elsif, else)
  +-And (left, right) [INTEGER]
  | +-Integer [5]
  | +-Integer [2]
  +-Statement list (preceding, last_stmt)
  | +-Statement list (preceding, last_stmt)
  | | +-NULL
  | | +-Assignment (left, right)
  | |   +-Id (P) [REAL]
  | |   +-Divide (left, right) [REAL]
  | |     +-Real [15]
  | |     +-Cast [REAL]
  | |       +-Integer [2]
  | +-Assignment (left, right)
  |   +-Id (Q) [REAL]
  |   +-Add (left, right) [REAL]
  |     +-Real [5.6]
  |     +-Real [1]
  +-Elsif list (preceding, last_elsif)
  | +-NULL
  | +-Elsif (condition, body)
  |   +-Or (left, right) [INTEGER]
  |   | +-Integer [3]
  |   | +-Integer [1]
  |   +-Statement list (preceding, last_stmt)
  |     +-NULL
  |     +-Assignment (left, right)
  |       +-Id (Q) [REAL]
  |       +-Add (left, right) [REAL]
  |         +-Real [6.6]
  |         +-Real [1]
  +-NULL

Optimized AST for global level
Statement list (preceding, last_stmt)
+-Statement list (preceding, last_stmt)
| +-Statement list (preceding, last_stmt)
| | +-Statement list (preceding, last_stmt)
| | | +-Statement list (preceding, last_stmt)
| | | | +-Statement list (preceding, last_stmt)
| | | | | +-Statement list (preceding, last_stmt)
| | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | | | | +-Statement list (preceding, last_stmt)
| | | | | | | | | | | +-NULL
| | | | | | | | | | | +-Assignment (left, right)
| | | | | | | | | | |   +-Id (I) [INTEGER]
| | | | | | | | | | |   +-Integer [4]
| | | | | | | | | | +-Assignment (left, right)
| | | | | | | | | |   +-Id (I) [INTEGER]
| | | | | | | | | |   +-Integer [3]
| | | | | | | | | +-Assignment (left, right)
| | | | | | | | |   +-Id (I) [INTEGER]
| | | | | | | | |   +-Unary minus (expr) [INTEGER]
| | | | | | | | |     +-Integer [3]
| | | | | | | | +-Assignment (left, right)
| | | | | | | |   +-Id (I) [INTEGER]
| | | | | | | |   +-Sub (left, right) [INTEGER]
| | | | | | | |     +-Id (I) [INTEGER]
| | | | | | | |     +-Integer [3]
| | | | | | | +-Assignment (left, right)
| | | | | | |   +-Id (I) [INTEGER]
| | | | | | |   +-Or (left, right) [INTEGER]
| | | | | | |     +-Id (I) [INTEGER]
| | | | | | |     +-Integer [3]
| | | | | | +-Assignment (left, right)
| | | | | |   +-Id (I) [INTEGER]
| | | | | |   +-Add (left, right) [INTEGER]
| | | | | |     +-Add (left, right) [INTEGER]
| | | | | |     | +-Id (I) [INTEGER]
| | | | | |     | +-Integer [1]
| | | | | |     +-Add (left, right) [INTEGER]
| | | | | |       +-Id (I) [INTEGER]
| | | | | |       +-Unary minus (expr) [INTEGER]
| | | | | |         +-Integer [66]
| | | | | +-Assignment (left, right)
| | | | |   +-Id (I) [INTEGER]
| | | | |   +-Or (left, right) [INTEGER]
| | | | |     +-Add (left, right) [INTEGER]
| | | | |     | +-Id (I) [INTEGER]
| | | | |     | +-Integer [1]
| | | | |     +-Add (left, right) [INTEGER]
| | | | |       +-Id (I) [INTEGER]
| | | | |       +-Unary minus (expr) [INTEGER]
| | | | |         +-Integer [67]
| | | | +-Assignment (left, right)
| | | |   +-Id (I) [INTEGER]
| | | |   +-Function call (function, arguments) [INTEGER]
| | | |     +-Id (F) [INTEGER]
| | | |     +-Expression list (preceding, last_expr)
| | | |       +-NULL
| | | |       +-Integer [3]
| | | +-Assignment (left, right)
| | |   +-Id (A) [INTEGER]
| | |   +-Add (left, right) [INTEGER]
| | |     +-Id (I) [INTEGER]
| | |     +-Integer [1]
| | +-Procedure call (procedure, arguments)
| |   +-Id (PROC) [VOID]
| |   +-Expression list (preceding, last_expr)
| |     +-NULL
| |     +-Integer [3]
| +-If (condition, then, elsif, else)
|   +-Less than (left, right) [INTEGER]
|   | +-Integer [-1]
|   | +-Integer [4]
|   +-Statement list (preceding, last_stmt)
|   | +-NULL
|   | +-Assignment (left, right)
|   |   +-Id (I) [INTEGER]
|   |   +-Add (left, right) [INTEGER]
|   |     +-Id (A) [INTEGER]
|   |     +-Integer [5]
|   +-NULL
|   +-NULL
+-If (condition, then, elsif, else)
  +-Integer [1]
  +-Statement list (preceding, last_stmt)
  | +-Statement list (preceding, last_stmt)
  | | +-NULL
  | | +-Assignment (left, right)
  | |   +-Id (P) [REAL]
  | |   +-Divide (left, right) [REAL]
  | |     +-Real [15]
  | |     +-Cast [REAL]
  | |       +-Integer [2]
  | +-Assignment (left, right)
  |   +-Id (Q) [REAL]
  |   +-Real [6.6]
  +-Elsif list (preceding, last_elsif)
  | +-NULL
  | +-Elsif (condition, body)
  |   +-Integer [1]
  |   +-Statement list (preceding, last_stmt)
  |     +-NULL
  |     +-Assignment (left, right)
  |       +-Id (Q) [REAL]
  |       +-Real [7.6]
  +-NULL

Note

If you optimize more nodes than required, the traces may be different in this and subsequent labs. If you do implement such extra features, it is suggested that you can easily disable it to compare against the provided traces to save time.