Lab 6: Intermediary Code¶
In this lab we will study intermediary code generation, especially the representation called quadruples.
Introduction¶
Intermediary code:
is closer to machine code without being machine-dependent.
can handle temporary variables (intermediate storage of results).
means higher portability. Intermediary code can easily be expanded to, for example, assembler – which we will study in the next lab.
offers the possibility of performing code optimization such as code transposition and register allocation.
Various types of intermediary code are:
Infix notation
Postfix notation (reverse Polish notation)
Three-address code:
Triples
Quadruples
The intermediary code we will use is quadruples where each instruction has four fields:
operator operand1 operand2 result
operand1
and operand2
denote the operands and result
denotes the
result of the operation, usually a temporary variable. The temporary
variables can be stored in registers or on the stack. In the lab the
temporary variables will be stored together with the local variables on
the stack.
Example:
(A + B) * (C + D) - E
Operator |
Operand1 |
Operand2 |
Result |
---|---|---|---|
+ |
A |
B |
T1 |
+ |
C |
D |
T2 |
* |
T1 |
T2 |
T3 |
- |
T3 |
E |
T4 |
where T1
, T2
, T3
, and T4
are temporary variables.
As you perhaps have worked out, operand1
, operand2
and result
are
often indexes in the symbol table.
Specifications for DIESEL¶
Traversing the abstract syntax tree, after type checking and optimization has been performed, one block at a time, generates quadruples.
The code given as an AST tree above is translated to:
q_iplus
10
11
13
q_idiv
13
12
14
q_iassign
14
0
9
The numbers are sym_index
positions given in the following symbol table:
9
10
11
12
13
14
A
B
C
D
T1
T2
Note that operations are typed, i.e. there are both q_iplus
(integer
addition) and q_rplus
(real addition). The operation to select depends
on the node type if it is an arithmetic operation, but on the children’s
types if it is a relational operation. For example, if the tree node
operation is ast_add
and both operands are of integer type, we can use
q_iplus
.
Handling real numbers¶
When generating assembler code, life will be a lot easier if we can assume that all real numbers are stored in 64 bits.
We can do this by storing real numbers as integers, in the IEEE format.
This is done by some pointer conversion magic in symbol_table::ieee()
, which is already written.
It takes as argument a double number, and returns a long representing it in the 64-bit IEEE format.
Whenever you generate a quad representing or treating a real number, such as a q_rload
quad below, you will need to first convert the real number (which is of type double
in our C++ program) to an IEEE integer. This is done quite conveniently:
sym_tab->ieee(value); // value is the double you want to convert.
Quadruple operations¶
(operation, operand1, operand2, result)
This is the structure used when describing the 39 different quadruple operations required for DIESEL. The result can usually also be considered the destination.
Load operations¶
(q_rload, long, -, sym_index)
Loads a temporary variable with a real number represented as a 64-bit integer in the IEEE format. Note that the operand is interpreted as a real number, not as an index to the symbol table. The operation is needed to describe expressions such as 3.14 * D.
If
q_rload
did not exist, how could we know what it says here?:(q_rmult, 10785233, 114, 143)
Instead we generate the quadruples:
(q_rload, 10785233, -, 143)
1(q_rmult, 143, 114, 144)
where we can determine that
143
is an index to the symbol table for a temporary variable which contains3.14
. The alternative would be to introduce extra fields in the operands specifying how they are to be interpreted.(q_iload, long, -, sym_index)
Loads a temporary variable with an integer constant, in a similar way to q_rload.
Store operations¶
Store quadruples are usually preceded by an address calculation for an
array element (see q_lindex
).
(operation, sym_index, -, address)
Operation
Description
q_rstore
Assignment of a real number value to memory address.
q_istore
Assignment of an integer value to memory address.
Unary operations¶
(operation, sym_index, -, sym_index)
Operation
Description
q_inot
Logical negation.
q_ruminus
Unary minus for real numbers.
q_iuminus
Unary minus for integers.
q_rassign
Assignment of real value to real variable.
q_iassign
Assignment of integer value to integer variable.
q_itor
Type conversion from integer to real.
Binary operations¶
Note that logical operations and comparisons return 1
for true
and 0
for false although the input considers any non-zero argument
to be true.
(operation, sym_index, sym_index, sym_index)
Operation
Description
q_rplus
Real number addition.
q_iplus
Integer addition.
q_rminus
Real number subtraction
q_iminus
Integer subtraction.
q_ior
Logical “or”.
q_iand
Logical “and”.
q_rmult
Real number multiplication.
q_imult
Integer multiplication.
q_rdivide
Real number division.
q_idivide
Integer division.
q_imod
Integer remainder.
q_req
Real equality comparison.
q_ieq
Integer equality comparison.
q_rne
Real inequality comparison.
q_ine
Integer inequality comparison.
q_rlt
Real less than comparison.
q_ilt
Integer less than comparison.
q_rgt
Real greater than comparison.
q_igt
Integer greater than comparison.
Subroutines¶
(q_call, sym_index, no.params, sym_index)
q_call
is the call of a procedure or function whose position in the symbol table is given byoperand1
and the number of parameters is given byoperand2
. The quadruple is preceded by zero or more parameters (seeq_param
). A function returns the value in result. For a procedure the result should beNULL_SYM
.(operation, labelno., sym_index, -)
Returns a value from a function.
operand2
is the function’s result value.operand1
specifies the number on the label which is placed at the end of the function (last_label
).Operation
Description
q_rreturn
Returns a real number.
q_ireturn
Returns an integer.
(q_param, sym_index, -, -)
Specifies that
operand1
is to be sent as an argument (actual parameter) to a procedure/function.
Example:
foo(a, bar(b), c);
gives us:
q_param |
11 |
0 |
0 |
q_param |
10 |
0 |
0 |
q_call |
13 |
1 |
14 |
q_param |
14 |
0 |
0 |
q_param |
9 |
0 |
0 |
q_call |
12 |
3 |
0 |
where the symbol table is defined below:
9 |
10 |
11 |
12 |
13 |
14 |
---|---|---|---|---|---|
A |
B |
C |
FOO |
BAR |
T1 |
Pay special attention to the treatment of the case above, when the result of a function call becomes the parameter of another function. Understanding this will come in handy in the last lab.
Array indexing¶
(q_lindex, sym_index, sym_index, sym_index)
Calculates the address (l-value index) for a new array element. This is used as the assignment is made to an array element,
operand1
is the array,operand2
holds the index into the array. The address is put in result and can as far as we are concerned here be stored in an integer, since DIESEL does not allow user-defined types.(q_rrindex, sym_index, sym_index, sym_index)
Real r-value indexing. Retrieves the value of an element in an array (of real).
operand1
is the array,operand2
holds the index into the array. The element’s value is put in result.(q_irindex, sym_index, sym_index, sym_index)
Integer r-value indexing. Retrieves the value of an element in an array (of integer).
operand1
is the array,operand2
holds the index into the array. The element’s value is put in result.
The example illustrates q_lindex
and q_irindex
(a is assumed to be
declared as an integer array) :
a[a[1]] := a[2];
gives us:
q_iload |
2 |
0 |
10 |
q_irindex |
9 |
10 |
11 |
q_iload |
1 |
0 |
12 |
q_irindex |
9 |
12 |
13 |
q_lindex |
9 |
13 |
14 |
q_istore |
11 |
0 |
14 |
where the symbol table is given below:
9 |
10 |
11 |
12 |
13 |
14 |
---|---|---|---|---|---|
A |
T1 |
T2 |
T3 |
T4 |
T5 |
Other operations¶
(q_labl, labelno., -, -)
Marks a position to where a jump can be made.
(q_jmp, labelno., -, -)
Unconditional jump to the specified position.
(q_jmpf, labelno., -, -)
Conditional jump to the specified position. The jump is made if
operand2
is zero (hence the name, “jump if false”).(q_nop, -, -, -)
Illegal operation.
Implementation¶
In this lab, you will write the routines for converting the internal form we have been working with so far, the abstract syntax tree, into quadruples (quads).
This is done one block at a time by traversing the AST.
The quad generation is started from parser.y with a call to do_quads()
, which in turn starts a recursive call generate_quads()
which propagates down the AST for the program block we’re currently treating (if you’re starting to recognize a pattern from the semantic and optimizing labs
here, you’re definitely on to something).
This method returns a sym_index
to wherever its result was stored, if any.
A difference is that the do_quads()
method is declared in the classes ast_functionhead
and ast_procedurehead
instead of in a separate class, like in the two previous labs.
The reasons are that such a class would have been small enough to be trivial, and also to show a variant approach to making the interfaces between the compiler phases as clean as possible.
In some cases methods with other names than generate_quads()
need to be defined.
These cases can easily be found by reading the quads.cc
file, and/or the ast.hh
file.
Read the code comments for more information on the individual cases.
Studying the pre-written implementations of a few methods will prove extremely helpful in understanding what is going on here.
The result of the traversal is a quad_list
containing all the quads generated while traversing the AST for a program block.
New quads are added onto this list as they are created.
The quad_list
is a class of its own, declared in quads.hh
, which is written with future quad optimization in mind, even if this lab course currently does not include any such.
The files that you need to modify¶
- quads.hh and quads.cc
contain quad generation code for the AST nodes as well as the declaration and implementation of the quadruple,
quad_list
,quad_list_element
andquad_list_iterator
classes. These are the files you will edit in this lab.- symtab.cc
contains symbol table implementation. You will need to complete one more method in this lab. Other than that, use your version from the earlier labs.
Other files of interest¶
- parser.y
is the input file to bison.
- ast.hh
contains the definitions for the AST nodes.
- ast.cc
contains (part of) the implementations of the AST nodes.
- codegen.hh and codegen.cc
contain assembler generation code. You will not need to edit these files in this lab.
- error.hh, error.cc, symtab.hh, symbol.cc, scanner.l, semantic.hh, semantic.cc, optimize.hh, and optimize.cc
use your versions from the earlier labs.
- main.cc
this is the compiler wrapper, parsing flags and the like. Same as in the previous labs.
- Makefile
and diesel use the same files as in the last lab.
Declarations and routines available in ast.hh, quads.hh, and quads.cc¶
Type declarations¶
quad_op_type
contains the various quadruple operations.
Variable declarations¶
op_code
contains the quad_op_type
for a quadruple.
The operand1
, operand2
, and result
fields described above are stored in the variables sym1
, sym2
, sym3
, int1
, int2
, and int3
of the quadruple
class, respectively.
A more memory-efficient solution would have been to use unions for these three fields, but having them as separate variables lead to slightly less obfuscated code in the next lab.
You will have to refer to the list above (see also the comment at the top of the file quads.hh
describing the quads and their arguments) to figure out which quad wants what arguments (you would have needed to do that with unions too).
quad_list::last_label
is the number of the q_labl
quadruple which concludes each block.
Quadruple generation for a block always starts by the programmer deciding what number the final q_labl
is to have.
Why? Well, it has to do with the return
-quadruples which in assembler are implemented as (forward) jumps to the block’s exit.
But the q_labl
quadruple we want to jump to has not yet been generated.
To avoid ‘backpatching’ we therefore jump to the q_labl
which is going to get the number last_label
.
It is generated eventually as the last quadruple in the block.
This is taken care of by the do_quads()
methods.
quad_nr
is used to number the quadruples.
Numbering is used when the quadruples are written in the trace files; it has no significance to the compiling itself other than to assist debugging.
head
and tail
are simply pointers to the first and last elements on a quad list.
The list needs to be able to be iterated over, both for debug printouts and for assembler expansion later on (and optimization, if it were to be implemented).
Class declarations¶
quadruple
is the class implementing the actual quads. It contains the
variables sym1
, sym2
, … described above, and a quad_op
denoting
what quad it is. The class has a few constructors which support the
various argument combinations that can arise. For the fields that don’t
require an argument, pass a NULL_SYM
to the constructor.
quad_list_element
is a wrapper class for the quads seen as list
elements, and contains links to the next quad on the list. It’s only
purpose is to keep the code representation (the quadruple
class)
separate from the list implementation (the quad_list
class).
quad_list_iterator
is a convenience class for iterating over a
quad_list
. You will not have to bother yourself with this class, all
code using it is already written.
quad_list
is the class that actually contains the list of quads. The
+
operator has been overloaded to make it simple to add new quads to a
quad list. Look at the prewritten methods for examples. It supports
iteration over the list by use of the quad_list_iterator
class.
It also contains the last_label
which is the number of the last q_labl
quad the list has.
Each program block will be transformed into a quad_list
, which will in turn be passed as an argument to the assembler code generator in the next lab.
The list is created in parser.y
at the appropriate places.
See that file for more information.
The following routines have already been written:
-
quad_list *
ast_procedurehead
::
do_quads
(ast_stmt_list *s) Starts the generation of a quad list for a program block pointed to by the argument. It then adds on the last label quad, and returns the list when it is done.
- Parameters
s – the procedure body to generate quads for.
-
quad_list *
ast_functionhead
::
do_quads
(ast_stmt_list *s) Starts the generation of a quad list for a program block pointed to by the argument. It then adds on the last label quad, and returns the list when it is done.
- Parameters
s – the function body to generate quads for.
Furthermore, the implementation for ast_while
nodes, as well as assignments (which need some special treatment) have been written for you.
Study them carefully; they will be very helpful as examples.
Program specification¶
Complete the empty
generate_quads()
method bodies inquads.cc
.In file
sym_tab.cc
: Complete the empty method body for-
sym_index
symbol_table
::
gen_temp_var
(sym_index) Given a symbol table index to a type (e.g.,
integer_type
etc.), generates, installs and returns index to a temporary variable of that type. It is not meaningful to generate a temporary variable ofvoid_type
.Choose a method for giving the temporary variables names which can not collide with the user.
Note
The implementation used for the traces returns identifiers with exactly length
MAX_TEMP_VAR_LENGTH
. It pads the remainder of the identifier with spaces, giving temporary names such as:"$1 " "$2 " // ... "$1234 "
-
sym_index
Pay special attention to the handling of parameters. Make sure they are generated in the correct order.
Try to generalise, e.g. treat all binary operators using the same routine, all binary relations using another routine, etc. Example:
sym_index ast_add::generate_quads(quad_list &q) { return do_binaryoperation(q, q_iplus, q_rplus, this); }
Debugging help¶
You can send any individual quad, as well as a quad list, directly to
cout
. Also note that there is a flag to the diesel
script, -q
,
which will print the quad list for each block during compiling.
How do I know it’s correct?¶
An example execution is included as a reference. Start out by compiling simple test programs, and compare the lists you get to what you believe they should look like. Test one AST node type at a time if you run into trouble.
Reporting your work¶
For the demonstration, run the test program for the test case
quadtest1.d
and discuss your code.Hand in your code as described in Handing in Results.
Example execution¶
Running make lab6
will take the sample input and compare it to the
sample output. You should test your code on more examples than this.
./diesel -b -q -y ../testpgm/quadtest1.d
Symbol table will be printed after compilation.
A quad list will be printed for each block.
Quad list for "FOO"
1 q_itor I - $1
2 q_rlt $1 X $2
3 q_jmpf 6 $2 -
4 q_iload 1 - $3
5 q_iplus I $3 $4
6 q_iassign $4 - I
7 q_iload 1 - $5
8 q_itor $5 - $6
9 q_rminus X $6 $7
10 q_rassign $7 - X
11 q_jmp 7 - -
12 q_labl 6 - -
13 q_itor I - $8
14 q_rne $8 X $9
15 q_inot $9 - $10
16 q_jmpf 8 $10 -
17 q_jmp 7 - -
18 q_labl 8 - -
19 q_iload 0 - $11
20 q_ine I $11 $12
21 q_jmpf 9 $12 -
22 q_iuminus I - $13
23 q_iassign $13 - I
24 q_jmp 10 - -
25 q_labl 9 - -
26 q_iload 7 - $14
27 q_iassign $14 - I
28 q_labl 10 - -
29 q_iload 1 - $15
30 q_itor $15 - $16
31 q_rplus X $16 $17
32 q_rassign $17 - X
33 q_iload 33 - $18
34 q_param $18 - -
35 q_call WRITE 1 (null)
36 q_labl 7 - -
37 q_ireturn 5 I -
38 q_labl 5 - -
Generating assembler for function "FOO"
Quad list for global level
1 q_iload 2 - $19
2 q_iload 1 - $20
3 q_lindex A $20 $21
4 q_istore $19 - $21
5 q_iload 1 - $22
6 q_irindex A $22 $23
7 q_iload 1 - $24
8 q_irindex A $24 $25
9 q_iload 1 - $26
10 q_iminus $25 $26 $27
11 q_lindex A $27 $28
12 q_istore $23 - $28
13 q_iload 3 - $29
14 q_itor $29 - $30
15 q_rassign $30 - X
16 q_param X - -
17 q_call TRUNC 1 $31
18 q_iassign $31 - I
19 q_iload 4 - $32
20 q_itor $32 - $33
21 q_iload 4 - $34
22 q_itor $34 - $35
23 q_iload 2 - $36
24 q_itor $36 - $37
25 q_rdivide $35 $37 $38
26 q_rplus $33 $38 $39
27 q_rassign $39 - X
28 q_param X - -
29 q_iload 3 - $41
30 q_imult $41 I $42
31 q_param $42 - -
32 q_call FOO 2 $40
33 q_iassign $40 - I
34 q_labl 19 - -
Generating assembler, global level
7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG8QUADTEST4SIZE1A1I1X3FOO8$1 8$2 8$3 8$4 8$5 8$6 8$7 8$8 8$9 8$10 8$11 8$12 8$13 8$14 8$15 8$16 8$17 8$18 8$19 8$20 8$21 8$22 8$23 8$24 8$25 8$26 8$27 8$28 8$29 8$30 8$31 8$32 8$33 8$34 8$35 8$36 8$37 8$38 8$39 8$40 8$41 8$42
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^ (pool_pos = 462)
Symbol table (size = 58):
Pos Name Lev Hash Back Offs Type Tag
-----------------------------------------------
0: GLOBAL. 0 -1 159 0 GLOBAL. SYM_PROC lbl = -1 ar_size = 0
1: VOID 0 -1 82 0 VOID SYM_NAMETYPE
2: INTEGER 0 -1 462 0 VOID SYM_NAMETYPE
3: REAL 0 -1 324 0 VOID SYM_NAMETYPE
4: READ 0 -1 316 0 INTEGER SYM_FUNC lbl = 0 ar_size = 0
5: WRITE 0 -1 139 0 VOID SYM_PROC lbl = 1 ar_size = 0
6: INT-ARG 0 -1 210 0 INTEGER SYM_PARAM
7: TRUNC 0 -1 332 0 INTEGER SYM_FUNC lbl = 2 ar_size = 0
8: REAL-ARG 0 -1 427 0 REAL SYM_PARAM
9: QUADTEST 0 -1 203 0 VOID SYM_PROC lbl = 3 ar_size = 288
10: SIZE 1 -1 475 0 INTEGER SYM_CONST value = 10
11: A 1 -1 65 0 INTEGER SYM_ARRAY card = 10
12: I 1 -1 73 80 INTEGER SYM_VAR
13: X 1 -1 88 88 REAL SYM_VAR
14: FOO 1 -1 68 0 INTEGER SYM_FUNC lbl = 4 ar_size = 144
15: I 2 -1 73 0 INTEGER SYM_PARAM
16: X 2 -1 88 8 REAL SYM_PARAM prec = I
17: $1 2 -1 341 0 REAL SYM_VAR
18: $2 2 -1 22 8 INTEGER SYM_VAR
19: $3 2 -1 215 16 INTEGER SYM_VAR
20: $4 2 -1 408 24 INTEGER SYM_VAR
21: $5 2 -1 89 32 INTEGER SYM_VAR
22: $6 2 -1 282 40 REAL SYM_VAR
23: $7 2 -1 475 48 REAL SYM_VAR
24: $8 2 -1 156 56 REAL SYM_VAR
25: $9 2 -1 349 64 INTEGER SYM_VAR
26: $10 2 -1 357 72 INTEGER SYM_VAR
27: $11 2 -1 6 80 INTEGER SYM_VAR
28: $12 2 -1 167 88 INTEGER SYM_VAR
29: $13 2 -1 328 96 INTEGER SYM_VAR
30: $14 2 -1 489 104 INTEGER SYM_VAR
31: $15 2 -1 138 112 INTEGER SYM_VAR
32: $16 2 -1 299 120 REAL SYM_VAR
33: $17 2 -1 460 128 REAL SYM_VAR
34: $18 2 -1 109 136 INTEGER SYM_VAR
35: $19 1 -1 270 96 INTEGER SYM_VAR
36: $20 1 -1 38 104 INTEGER SYM_VAR
37: $21 1 -1 199 112 INTEGER SYM_VAR
38: $22 1 -1 360 120 INTEGER SYM_VAR
39: $23 1 -1 9 128 INTEGER SYM_VAR
40: $24 1 -1 170 136 INTEGER SYM_VAR
41: $25 1 -1 331 144 INTEGER SYM_VAR
42: $26 1 -1 492 152 INTEGER SYM_VAR
43: $27 1 -1 141 160 INTEGER SYM_VAR
44: $28 1 -1 302 168 INTEGER SYM_VAR
45: $29 1 -1 463 176 INTEGER SYM_VAR
46: $30 1 -1 231 184 REAL SYM_VAR
47: $31 1 -1 392 192 INTEGER SYM_VAR
48: $32 1 -1 41 200 INTEGER SYM_VAR
49: $33 1 -1 202 208 REAL SYM_VAR
50: $34 1 -1 363 216 INTEGER SYM_VAR
51: $35 1 -1 12 224 REAL SYM_VAR
52: $36 1 -1 173 232 INTEGER SYM_VAR
53: $37 1 -1 334 240 REAL SYM_VAR
54: $38 1 -1 495 248 REAL SYM_VAR
55: $39 1 -1 144 256 REAL SYM_VAR
56: $40 1 -1 424 264 INTEGER SYM_VAR
57: $41 1 -1 73 272 INTEGER SYM_VAR
58: $42 1 -1 234 280 INTEGER SYM_VAR
- 1
10785233
is the decimal representation of3.14
stored as a 64-bit (double-precision) float in IEEE format.