Lab 6: Intermediary Code

In this lab we will study intermediary code generation, especially the representation called quadruples.

Introduction

Intermediary code:

  • is closer to machine code without being machine-dependent.

  • can handle temporary variables (intermediate storage of results).

  • means higher portability. Intermediary code can easily be expanded to, for example, assembler – which we will study in the next lab.

  • offers the possibility of performing code optimization such as code transposition and register allocation.

Various types of intermediary code are:

  • Infix notation

  • Postfix notation (reverse Polish notation)

  • Three-address code:

    • Triples

    • Quadruples

The intermediary code we will use is quadruples where each instruction has four fields:

operator    operand1    operand2    result

operand1 and operand2 denote the operands and result denotes the result of the operation, usually a temporary variable. The temporary variables can be stored in registers or on the stack. In the lab the temporary variables will be stored together with the local variables on the stack.

Example:

(A + B) * (C + D) - E
Table 12 Intermediate code generation using quadruples

Operator

Operand1

Operand2

Result

+

A

B

T1

+

C

D

T2

*

T1

T2

T3

-

T3

E

T4

where T1, T2, T3, and T4 are temporary variables.

As you perhaps have worked out, operand1, operand2 and result are often indexes in the symbol table.

Specifications for DIESEL

Traversing the abstract syntax tree, after type checking and optimization has been performed, one block at a time, generates quadruples.

Example of AST tree.

The code given as an AST tree above is translated to:

q_iplus

10

11

13

q_idiv

13

12

14

q_iassign

14

0

9

The numbers are sym_index positions given in the following symbol table:

9

10

11

12

13

14

A

B

C

D

T1

T2

Note that operations are typed, i.e. there are both q_iplus (integer addition) and q_rplus (real addition). The operation to select depends on the node type if it is an arithmetic operation, but on the children’s types if it is a relational operation. For example, if the tree node operation is ast_add and both operands are of integer type, we can use q_iplus.

Handling real numbers

When generating assembler code, life will be a lot easier if we can assume that all real numbers are stored in 64 bits. We can do this by storing real numbers as integers, in the IEEE format. This is done by some pointer conversion magic in symbol_table::ieee(), which is already written. It takes as argument a double number, and returns a long representing it in the 64-bit IEEE format.

Whenever you generate a quad representing or treating a real number, such as a q_rload quad below, you will need to first convert the real number (which is of type double in our C++ program) to an IEEE integer. This is done quite conveniently:

sym_tab->ieee(value); // value is the double you want to convert.

Quadruple operations

(operation, operand1, operand2, result)

This is the structure used when describing the 39 different quadruple operations required for DIESEL. The result can usually also be considered the destination.

Load operations

(q_rload, long, -, sym_index)

Loads a temporary variable with a real number represented as a 64-bit integer in the IEEE format. Note that the operand is interpreted as a real number, not as an index to the symbol table. The operation is needed to describe expressions such as 3.14 * D.

If q_rload did not exist, how could we know what it says here?:

(q_rmult, 10785233, 114, 143)

Instead we generate the quadruples:

(q_rload, 10785233, -, 143) 1

(q_rmult, 143, 114, 144)

where we can determine that 143 is an index to the symbol table for a temporary variable which contains 3.14. The alternative would be to introduce extra fields in the operands specifying how they are to be interpreted.

(q_iload, long, -, sym_index)

Loads a temporary variable with an integer constant, in a similar way to q_rload.

Store operations

Store quadruples are usually preceded by an address calculation for an array element (see q_lindex).

(operation, sym_index, -, address)

Operation

Description

q_rstore

Assignment of a real number value to memory address.

q_istore

Assignment of an integer value to memory address.

Unary operations

(operation, sym_index, -, sym_index)

Operation

Description

q_inot

Logical negation.

q_ruminus

Unary minus for real numbers.

q_iuminus

Unary minus for integers.

q_rassign

Assignment of real value to real variable.

q_iassign

Assignment of integer value to integer variable.

q_itor

Type conversion from integer to real.

Binary operations

Note that logical operations and comparisons return 1 for true and 0 for false although the input considers any non-zero argument to be true.

(operation, sym_index, sym_index, sym_index)

Operation

Description

q_rplus

Real number addition.

q_iplus

Integer addition.

q_rminus

Real number subtraction

q_iminus

Integer subtraction.

q_ior

Logical “or”.

q_iand

Logical “and”.

q_rmult

Real number multiplication.

q_imult

Integer multiplication.

q_rdivide

Real number division.

q_idivide

Integer division.

q_imod

Integer remainder.

q_req

Real equality comparison.

q_ieq

Integer equality comparison.

q_rne

Real inequality comparison.

q_ine

Integer inequality comparison.

q_rlt

Real less than comparison.

q_ilt

Integer less than comparison.

q_rgt

Real greater than comparison.

q_igt

Integer greater than comparison.

Subroutines

(q_call, sym_index, no.params, sym_index)

q_call is the call of a procedure or function whose position in the symbol table is given by operand1 and the number of parameters is given by operand2. The quadruple is preceded by zero or more parameters (see q_param). A function returns the value in result. For a procedure the result should be NULL_SYM.

(operation, labelno., sym_index, -)

Returns a value from a function. operand2 is the function’s result value. operand1 specifies the number on the label which is placed at the end of the function (last_label).

Operation

Description

q_rreturn

Returns a real number.

q_ireturn

Returns an integer.

(q_param, sym_index, -, -)

Specifies that operand1 is to be sent as an argument (actual parameter) to a procedure/function.

Example:

foo(a, bar(b), c);

gives us:

q_param

11

0

0

q_param

10

0

0

q_call

13

1

14

q_param

14

0

0

q_param

9

0

0

q_call

12

3

0

where the symbol table is defined below:

9

10

11

12

13

14

A

B

C

FOO

BAR

T1

Pay special attention to the treatment of the case above, when the result of a function call becomes the parameter of another function. Understanding this will come in handy in the last lab.

Array indexing

(q_lindex, sym_index, sym_index, sym_index)

Calculates the address (l-value index) for a new array element. This is used as the assignment is made to an array element, operand1 is the array, operand2 holds the index into the array. The address is put in result and can as far as we are concerned here be stored in an integer, since DIESEL does not allow user-defined types.

(q_rrindex, sym_index, sym_index, sym_index)

Real r-value indexing. Retrieves the value of an element in an array (of real). operand1 is the array, operand2 holds the index into the array. The element’s value is put in result.

(q_irindex, sym_index, sym_index, sym_index)

Integer r-value indexing. Retrieves the value of an element in an array (of integer). operand1 is the array, operand2 holds the index into the array. The element’s value is put in result.

The example illustrates q_lindex and q_irindex (a is assumed to be declared as an integer array) :

a[a[1]] := a[2];

gives us:

q_iload

2

0

10

q_irindex

9

10

11

q_iload

1

0

12

q_irindex

9

12

13

q_lindex

9

13

14

q_istore

11

0

14

where the symbol table is given below:

9

10

11

12

13

14

A

T1

T2

T3

T4

T5

Other operations

(q_labl, labelno., -, -)

Marks a position to where a jump can be made.

(q_jmp, labelno., -, -)

Unconditional jump to the specified position.

(q_jmpf, labelno., -, -)

Conditional jump to the specified position. The jump is made if operand2 is zero (hence the name, “jump if false”).

(q_nop, -, -, -)

Illegal operation.

Implementation

In this lab, you will write the routines for converting the internal form we have been working with so far, the abstract syntax tree, into quadruples (quads).

This is done one block at a time by traversing the AST. The quad generation is started from parser.y with a call to do_quads(), which in turn starts a recursive call generate_quads() which propagates down the AST for the program block we’re currently treating (if you’re starting to recognize a pattern from the semantic and optimizing labs here, you’re definitely on to something). This method returns a sym_index to wherever its result was stored, if any. A difference is that the do_quads() method is declared in the classes ast_functionhead and ast_procedurehead instead of in a separate class, like in the two previous labs. The reasons are that such a class would have been small enough to be trivial, and also to show a variant approach to making the interfaces between the compiler phases as clean as possible.

In some cases methods with other names than generate_quads() need to be defined. These cases can easily be found by reading the quads.cc file, and/or the ast.hh file. Read the code comments for more information on the individual cases. Studying the pre-written implementations of a few methods will prove extremely helpful in understanding what is going on here.

The result of the traversal is a quad_list containing all the quads generated while traversing the AST for a program block. New quads are added onto this list as they are created. The quad_list is a class of its own, declared in quads.hh, which is written with future quad optimization in mind, even if this lab course currently does not include any such.

The files that you need to modify

quads.hh and quads.cc

contain quad generation code for the AST nodes as well as the declaration and implementation of the quadruple, quad_list, quad_list_element and quad_list_iterator classes. These are the files you will edit in this lab.

symtab.cc

contains symbol table implementation. You will need to complete one more method in this lab. Other than that, use your version from the earlier labs.

Other files of interest

parser.y

is the input file to bison.

ast.hh

contains the definitions for the AST nodes.

ast.cc

contains (part of) the implementations of the AST nodes.

codegen.hh and codegen.cc

contain assembler generation code. You will not need to edit these files in this lab.

error.hh, error.cc, symtab.hh, symbol.cc, scanner.l, semantic.hh, semantic.cc, optimize.hh, and optimize.cc

use your versions from the earlier labs.

main.cc

this is the compiler wrapper, parsing flags and the like. Same as in the previous labs.

Makefile

and diesel use the same files as in the last lab.

Declarations and routines available in ast.hh, quads.hh, and quads.cc

Type declarations

quad_op_type contains the various quadruple operations.

Variable declarations

op_code contains the quad_op_type for a quadruple.

The operand1, operand2, and result fields described above are stored in the variables sym1, sym2, sym3, int1, int2, and int3 of the quadruple class, respectively. A more memory-efficient solution would have been to use unions for these three fields, but having them as separate variables lead to slightly less obfuscated code in the next lab. You will have to refer to the list above (see also the comment at the top of the file quads.hh describing the quads and their arguments) to figure out which quad wants what arguments (you would have needed to do that with unions too).

quad_list::last_label is the number of the q_labl quadruple which concludes each block. Quadruple generation for a block always starts by the programmer deciding what number the final q_labl is to have. Why? Well, it has to do with the return-quadruples which in assembler are implemented as (forward) jumps to the block’s exit. But the q_labl quadruple we want to jump to has not yet been generated. To avoid ‘backpatching’ we therefore jump to the q_labl which is going to get the number last_label. It is generated eventually as the last quadruple in the block. This is taken care of by the do_quads() methods.

quad_nr is used to number the quadruples. Numbering is used when the quadruples are written in the trace files; it has no significance to the compiling itself other than to assist debugging. head and tail are simply pointers to the first and last elements on a quad list. The list needs to be able to be iterated over, both for debug printouts and for assembler expansion later on (and optimization, if it were to be implemented).

Class declarations

quadruple is the class implementing the actual quads. It contains the variables sym1, sym2, … described above, and a quad_op denoting what quad it is. The class has a few constructors which support the various argument combinations that can arise. For the fields that don’t require an argument, pass a NULL_SYM to the constructor.

quad_list_element is a wrapper class for the quads seen as list elements, and contains links to the next quad on the list. It’s only purpose is to keep the code representation (the quadruple class) separate from the list implementation (the quad_list class).

quad_list_iterator is a convenience class for iterating over a quad_list. You will not have to bother yourself with this class, all code using it is already written.

quad_list is the class that actually contains the list of quads. The + operator has been overloaded to make it simple to add new quads to a quad list. Look at the prewritten methods for examples. It supports iteration over the list by use of the quad_list_iterator class. It also contains the last_label which is the number of the last q_labl quad the list has. Each program block will be transformed into a quad_list, which will in turn be passed as an argument to the assembler code generator in the next lab. The list is created in parser.y at the appropriate places. See that file for more information.

The following routines have already been written:

quad_list *ast_procedurehead::do_quads(ast_stmt_list *s)

Starts the generation of a quad list for a program block pointed to by the argument. It then adds on the last label quad, and returns the list when it is done.

Parameters

s – the procedure body to generate quads for.

quad_list *ast_functionhead::do_quads(ast_stmt_list *s)

Starts the generation of a quad list for a program block pointed to by the argument. It then adds on the last label quad, and returns the list when it is done.

Parameters

s – the function body to generate quads for.

Furthermore, the implementation for ast_while nodes, as well as assignments (which need some special treatment) have been written for you. Study them carefully; they will be very helpful as examples.

Program specification

  1. Complete the empty generate_quads() method bodies in quads.cc.

  2. In file sym_tab.cc: Complete the empty method body for

    sym_index symbol_table::gen_temp_var(sym_index)

    Given a symbol table index to a type (e.g., integer_type etc.), generates, installs and returns index to a temporary variable of that type. It is not meaningful to generate a temporary variable of void_type.

    Choose a method for giving the temporary variables names which can not collide with the user.

    Note

    The implementation used for the traces returns identifiers with exactly length MAX_TEMP_VAR_LENGTH. It pads the remainder of the identifier with spaces, giving temporary names such as:

    "$1      "
    "$2      "
    // ...
    "$1234   "
    
  3. Pay special attention to the handling of parameters. Make sure they are generated in the correct order.

  4. Try to generalise, e.g. treat all binary operators using the same routine, all binary relations using another routine, etc. Example:

    sym_index ast_add::generate_quads(quad_list &q) {
       return do_binaryoperation(q, q_iplus, q_rplus, this);
    }
    

Debugging help

You can send any individual quad, as well as a quad list, directly to cout. Also note that there is a flag to the diesel script, -q, which will print the quad list for each block during compiling.

How do I know it’s correct?

An example execution is included as a reference. Start out by compiling simple test programs, and compare the lists you get to what you believe they should look like. Test one AST node type at a time if you run into trouble.

Reporting your work

  • For the demonstration, run the test program for the test case quadtest1.d and discuss your code.

  • Hand in your code as described in Handing in Results.

Example execution

Running make lab6 will take the sample input and compare it to the sample output. You should test your code on more examples than this.

./diesel -b -q -y ../testpgm/quadtest1.d
Listing 14 trace/quadtest1.trace
Symbol table will be printed after compilation.
A quad list will be printed for each block.

Quad list for "FOO"
    1    q_itor     I          -          $1         
    2    q_rlt      $1         X          $2         
    3    q_jmpf     6          $2         -          
    4    q_iload    1          -          $3         
    5    q_iplus    I          $3         $4         
    6    q_iassign  $4         -          I          
    7    q_iload    1          -          $5         
    8    q_itor     $5         -          $6         
    9    q_rminus   X          $6         $7         
   10    q_rassign  $7         -          X          
   11    q_jmp      7          -          -          
   12    q_labl     6          -          -          
   13    q_itor     I          -          $8         
   14    q_rne      $8         X          $9         
   15    q_inot     $9         -          $10        
   16    q_jmpf     8          $10        -          
   17    q_jmp      7          -          -          
   18    q_labl     8          -          -          
   19    q_iload    0          -          $11        
   20    q_ine      I          $11        $12        
   21    q_jmpf     9          $12        -          
   22    q_iuminus  I          -          $13        
   23    q_iassign  $13        -          I          
   24    q_jmp      10         -          -          
   25    q_labl     9          -          -          
   26    q_iload    7          -          $14        
   27    q_iassign  $14        -          I          
   28    q_labl     10         -          -          
   29    q_iload    1          -          $15        
   30    q_itor     $15        -          $16        
   31    q_rplus    X          $16        $17        
   32    q_rassign  $17        -          X          
   33    q_iload    33         -          $18        
   34    q_param    $18        -          -          
   35    q_call     WRITE      1          (null)     

   36    q_labl     7          -          -          
   37    q_ireturn  5          I          -          
   38    q_labl     5          -          -          

Generating assembler for function "FOO"

Quad list for global level
    1    q_iload    2          -          $19        
    2    q_iload    1          -          $20        
    3    q_lindex   A          $20        $21        
    4    q_istore   $19        -          $21        
    5    q_iload    1          -          $22        
    6    q_irindex  A          $22        $23        
    7    q_iload    1          -          $24        
    8    q_irindex  A          $24        $25        
    9    q_iload    1          -          $26        
   10    q_iminus   $25        $26        $27        
   11    q_lindex   A          $27        $28        
   12    q_istore   $23        -          $28        
   13    q_iload    3          -          $29        
   14    q_itor     $29        -          $30        
   15    q_rassign  $30        -          X          
   16    q_param    X          -          -          
   17    q_call     TRUNC      1          $31        
   18    q_iassign  $31        -          I          
   19    q_iload    4          -          $32        
   20    q_itor     $32        -          $33        
   21    q_iload    4          -          $34        
   22    q_itor     $34        -          $35        
   23    q_iload    2          -          $36        
   24    q_itor     $36        -          $37        
   25    q_rdivide  $35        $37        $38        
   26    q_rplus    $33        $38        $39        
   27    q_rassign  $39        -          X          
   28    q_param    X          -          -          
   29    q_iload    3          -          $41        
   30    q_imult    $41        I          $42        
   31    q_param    $42        -          -          
   32    q_call     FOO        2          $40        
   33    q_iassign  $40        -          I          
   34    q_labl     19         -          -          

Generating assembler, global level
7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG8QUADTEST4SIZE1A1I1X3FOO8$1      8$2      8$3      8$4      8$5      8$6      8$7      8$8      8$9      8$10     8$11     8$12     8$13     8$14     8$15     8$16     8$17     8$18     8$19     8$20     8$21     8$22     8$23     8$24     8$25     8$26     8$27     8$28     8$29     8$30     8$31     8$32     8$33     8$34     8$35     8$36     8$37     8$38     8$39     8$40     8$41     8$42     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^ (pool_pos = 462)

Symbol table (size = 58):
Pos  Name      Lev Hash Back Offs Type      Tag
-----------------------------------------------
  0: GLOBAL.     0   -1  159    0 GLOBAL.   SYM_PROC      lbl = -1 ar_size = 0  
  1: VOID        0   -1   82    0 VOID      SYM_NAMETYPE  
  2: INTEGER     0   -1  462    0 VOID      SYM_NAMETYPE  
  3: REAL        0   -1  324    0 VOID      SYM_NAMETYPE  
  4: READ        0   -1  316    0 INTEGER   SYM_FUNC      lbl = 0  ar_size = 0  
  5: WRITE       0   -1  139    0 VOID      SYM_PROC      lbl = 1  ar_size = 0  
  6: INT-ARG     0   -1  210    0 INTEGER   SYM_PARAM     
  7: TRUNC       0   -1  332    0 INTEGER   SYM_FUNC      lbl = 2  ar_size = 0  
  8: REAL-ARG    0   -1  427    0 REAL      SYM_PARAM     
  9: QUADTEST    0   -1  203    0 VOID      SYM_PROC      lbl = 3  ar_size = 288
 10: SIZE        1   -1  475    0 INTEGER   SYM_CONST     value = 10
 11: A           1   -1   65    0 INTEGER   SYM_ARRAY     card = 10  
 12: I           1   -1   73   80 INTEGER   SYM_VAR       
 13: X           1   -1   88   88 REAL      SYM_VAR       
 14: FOO         1   -1   68    0 INTEGER   SYM_FUNC      lbl = 4  ar_size = 144
 15: I           2   -1   73    0 INTEGER   SYM_PARAM     
 16: X           2   -1   88    8 REAL      SYM_PARAM     prec = I           
 17: $1          2   -1  341    0 REAL      SYM_VAR       
 18: $2          2   -1   22    8 INTEGER   SYM_VAR       
 19: $3          2   -1  215   16 INTEGER   SYM_VAR       
 20: $4          2   -1  408   24 INTEGER   SYM_VAR       
 21: $5          2   -1   89   32 INTEGER   SYM_VAR       
 22: $6          2   -1  282   40 REAL      SYM_VAR       
 23: $7          2   -1  475   48 REAL      SYM_VAR       
 24: $8          2   -1  156   56 REAL      SYM_VAR       
 25: $9          2   -1  349   64 INTEGER   SYM_VAR       
 26: $10         2   -1  357   72 INTEGER   SYM_VAR       
 27: $11         2   -1    6   80 INTEGER   SYM_VAR       
 28: $12         2   -1  167   88 INTEGER   SYM_VAR       
 29: $13         2   -1  328   96 INTEGER   SYM_VAR       
 30: $14         2   -1  489  104 INTEGER   SYM_VAR       
 31: $15         2   -1  138  112 INTEGER   SYM_VAR       
 32: $16         2   -1  299  120 REAL      SYM_VAR       
 33: $17         2   -1  460  128 REAL      SYM_VAR       
 34: $18         2   -1  109  136 INTEGER   SYM_VAR       
 35: $19         1   -1  270   96 INTEGER   SYM_VAR       
 36: $20         1   -1   38  104 INTEGER   SYM_VAR       
 37: $21         1   -1  199  112 INTEGER   SYM_VAR       
 38: $22         1   -1  360  120 INTEGER   SYM_VAR       
 39: $23         1   -1    9  128 INTEGER   SYM_VAR       
 40: $24         1   -1  170  136 INTEGER   SYM_VAR       
 41: $25         1   -1  331  144 INTEGER   SYM_VAR       
 42: $26         1   -1  492  152 INTEGER   SYM_VAR       
 43: $27         1   -1  141  160 INTEGER   SYM_VAR       
 44: $28         1   -1  302  168 INTEGER   SYM_VAR       
 45: $29         1   -1  463  176 INTEGER   SYM_VAR       
 46: $30         1   -1  231  184 REAL      SYM_VAR       
 47: $31         1   -1  392  192 INTEGER   SYM_VAR       
 48: $32         1   -1   41  200 INTEGER   SYM_VAR       
 49: $33         1   -1  202  208 REAL      SYM_VAR       
 50: $34         1   -1  363  216 INTEGER   SYM_VAR       
 51: $35         1   -1   12  224 REAL      SYM_VAR       
 52: $36         1   -1  173  232 INTEGER   SYM_VAR       
 53: $37         1   -1  334  240 REAL      SYM_VAR       
 54: $38         1   -1  495  248 REAL      SYM_VAR       
 55: $39         1   -1  144  256 REAL      SYM_VAR       
 56: $40         1   -1  424  264 INTEGER   SYM_VAR       
 57: $41         1   -1   73  272 INTEGER   SYM_VAR       
 58: $42         1   -1  234  280 INTEGER   SYM_VAR       
1

10785233 is the decimal representation of 3.14 stored as a 64-bit (double-precision) float in IEEE format.