Lab 2 : Symbol Table¶
In this lab, you will write functions to administrate a symbol table. If you are unsure about scope-rules, you should read pages 394 - 396 (as far as section 7.2) in the course textbook.
Introduction¶
In the symbol table information is gathered about the names 1 that appear in the program. The symbol table primarily helps:
In checking the program’s semantic correctness.
Using the symbol table you can, for example, determine whether a variable has been declared and in that case if it is of an integer or floating-point type.
In generating code.
The symbol table keeps track, for example, of memory requirements for various variables.
Furthermore, the following features are desirable:
Support for block structures and scope rules, if the language requires that.
In block-structured languages the same identifier can occur several times, but in different blocks. The symbol table, therefore, ought to be organised so you have access to the variables which are visible at a given time.
Compact storage.
An implementation requiring little memory is to be aimed for.
Quick addition of, and searching for names.
The need to search fast collides (as usual) with memory space demands, as mentioned above.
Resident storage.
The symbol table may be needed later, if the compiler works in several runs, or for diagnosis of execution errors.
The information stored in the symbol table is a list of pairs (name, attribute), also often referred to as bindings:
- Name
The name forms a search key in the symbol table. A problem which must be dealt with is how to efficiently store all significant characters in the identifier.
- Attributes which are dependent on the language to be compiled
These include type information, array dimensions, the number of parameters in procedures and functions, as well as line numbers for declarations.
- Attributes which are dependent on the language we are compiling to
For example, if we are generating machine code, we will have problems with references to addresses in memory, whose values are unknown at code generation time. If we choose to generate assembler, we can instead refer to a so-called label (symbolic position) whose value is determined later. Assembler will then perform the translation to absolute addresses in memory.
The operations we would like to perform on the symbol tables are:
Look up a name
Add a new name
Add an attribute to a name
Retrieve the value of an attribute of a name
Remove a name and its associated information
Data structures for symbol tables¶
When judging various data structures, the following should be taken into consideration:
Memory space,
Time taken to add and search for a name,
How difficult it is to introduce scoping.
Linear lists¶
Searching is performed in linear fashion from beginning to end. The search halts as soon as the name you are looking for, is found.
Linear lists are easy to implement and require little memory space. Scoping is easily introduced using the LIFO principle. However, search times are unacceptable. Reports have shown that a compiler, which implements symbol tables using linear lists, can spend 25% of compilation time on searching.
Trees¶
The symbol table can be in the form of a tree; then each subprogram has a symbol table attached to its node in the abstract syntax tree. The main program has a similar table for globally declared objects. The advantage of this structure is that it is easy to represent scoping and is quicker than linear lists.
Hashing¶
When high demands are made on the search time, hashing is used. Entry to the hash table is calculated using a suitable key transformation. If the symbol is not in the entry, you have to follow the link field. By making the hash table large enough you can control the search time.
However, extra space is required if you do not want to risk filling the hash table 2. Introducing scoping is somewhat more difficult than for linear lists and trees.
The symbol table in DIESEL¶
The symbol table to be implemented, uses hashing with linking. In DIESEL
there are seven different types of symbols: constants, variables,
arrays, parameters (formal), procedures, functions and name types
(integer
, real
, and void
). Each symbol type is represented
by a separate class, and all symbols inherit from the abstract class
symbol
. Let us first examine what information is common to all
symbols (and is found in the symbol
class):
-
class
symbol
The symbol table consists of entries of subclasses to symbol. This class contains data that are common to all symbol types. This class is never used directly. Use the derived classes instead.
Subclassed by array_symbol, constant_symbol, function_symbol, nametype_symbol, parameter_symbol, procedure_symbol, variable_symbol
Public Members
-
pool_index
id
Index to the string_pool, ie, its name.
The object’s identifier is stored in the string table (i.e. the spelling, see previous lab). Index to string table (called
id
in the lab skeleton) is a value denoting where an identifier starts in the string table. Example:The scanner from lab1 has created a string table that look like this (the number denotes the length of the following identifier):
7GLOBAL.4VOID7INTEGER...
The id to
GLOBAL.
is 0 because it starts on the first position in the string table (the index to the first position in an array in C++ is 0).
-
sym_type
tag
This field is only used to give us a way to differentiate between the various symbol classes when we need to do a safe downcast from the superclass. The default value of tag is
SYM_UNDEF
.
-
sym_index
type
In practice type is used only by constants, variables, arrays, parameters and functions; all other symbol types are of type void. The type field contains an index to a name type which in DIESEL only can be
integer_type
,real_type
, orvoid_type
. These will be preinstalled in the symbol table.
-
sym_index
hash_link
If the object you are searching for not is found immediately after the key transformation, you follow the hash link backwards in the table.
-
hash_index
back_link
This link points back to the hash table from the symbol table and can be used to speed up certain types of lookups. It is possible to complete the lab without using it, though.
-
block_level
level
The lexical level states how deeply nested the object is in the program. For example, an object on the first level is global. Eight levels is the maximum in Diesel.
-
int
offset
Offset specifies which relative position the object has in the memory space which has been reserved for a given lexical level on the runtime stack. This is used during code generation.
-
pool_index
There are specialized symbols with additional attributes:
-
class
constant_symbol
: public symbol Derived symbol type, used for constants.
Public Members
-
constant_value
const_value
Value of a constant, can be int or float. The value is stored in a union; check the
#type
attribute to figure out if you should accessivar
orrvar
.
-
constant_value
-
class
variable_symbol
: public symbol Derived symbol type, used for variables.
-
class
array_symbol
: public symbol Derived symbol type, used for arrays.
Study the following declaration:
var arr : array[10] of real;
The type attribute (see above) in this case is an index to the name type
real
. We see that we also need an attribute,#array_cardinality
, which in this example would have the value10
. To be slightly more general, we also have an attribute,#index_type
, which at present can only point to the name type#integer_type
.Public Members
-
sym_index
index_type
Points to the index type in the symbol table.
-
int
array_cardinality
Note: cardinality = nr of elements,.
-
sym_index
-
class
parameter_symbol
: public symbol Parameters require particular checking, as the formal parameters are to be bound to the actual ones. Assume the following declaration:
procedure p(i : integer; r : real);
which is then called with the following actual parameters:
p(22/7, 355/113);
To perform a semantic check, we must find the formal parameters (
i
andr
) in the symbol table and compare their type attributes with the actual parameters. In this example a type conflict occurs when the value of the expression22/7
(which is areal
) is to be bound to theinteger
variablei
.For this reason the formal parameters are linked together in the symbol table. The
preceding
attribute points to the previous parameter in the link chain. The procedure (or function) points to thelast
parameter. It might seem counter-intuitive that the parameters are stored backwards like this, but it will actually make life easier later on.The handling of parameters is actually one of the trickiest parts to understand correctly in this entire lab course. Some work has already been done for you, but you will have to do a lot of it yourself. Make sure to read the code comments carefully when dealing with parameters, and be sure to have this lab material handy as well. The order of the parameters will be flipped around during compiling more than once. Once you have that sequence clear, getting them to work will be a lot easier.
Public Members
-
int
size
Nr of bytes parameter needs.
-
parameter_symbol *
preceding
Link to preceding parameter, if any.
-
int
Procedures and functions have identical members but different semantics:
-
class
procedure_symbol
: public symbol Derived symbol type, used for procedures.
Public Members
-
int
ar_size
Activation record size.
Specifies the memory space that is needed to store the local variables belonging to the procedure/function on the runtime stack. A procedure that declares an array of ten integers, two reals and three integers requires (10+2+3)*8 bytes.
-
int
label_nr
The label number which is assigned to the procedure or function in assembler code.
-
parameter_symbol *
last_parameter
List of parameters. We store them in reverse order to make type-checking easier later on.
-
int
-
class
function_symbol
: public symbol Derived symbol type, used for functions.
Public Members
-
int
ar_size
Activation record size.
Specifies the memory space that is needed to store the local variables belonging to the procedure/function on the runtime stack. A procedure that declares an array of ten integers, two reals and three integers requires (10+2+3)*8 bytes.
-
int
label_nr
The label number which is assigned to the procedure or function in assembler code.
-
parameter_symbol *
last_parameter
List of parameters. We store them in reverse order to make type-checking easier later on.
-
int
The symbol table structure¶
The symbol table itself is made up of an array of pointers to symbol
objects, which belong to one of the classes mentioned above. A symbol is
accessed via its index, a sym_index
, rather than by a pointer. Why?
To allow us to use hashing.
A variable called sym_pos
keeps track of the current position in the
table. It will always contain the index to the last symbol that was
entered into the table. Note that the first element in an array in C++
has index 0
(zero).
The symbol table also contains a string pool, where identifiers and string constants are stored.
A hash table is used for fast access to symbols.
A block table, or display, is used to keep track of the current nesting depth. Read up on displays in the course book. It isn’t really that complicated once you get the hang of it.
To try to make things clearer, let us study how the symbol table is constructed step by step.
At the beginning the table is empty:

In the diagram, we have chosen (for reasons of space) to show only the attributes we care about right now (id
and hash_link
).
We assume the following functions exist (see the top of symtab.hh
for some type definitions used here):
-
class
symbol_table
Public Functions
-
sym_index
install_symbol
(const pool_index, const sym_type tag) Installs a symbol in the symbol table and returns its index. This method installs a symbol in the symbol table and returns its index. If the symbol already existed at the same lexical level (i.e. two objects with the same identifi and value of the attribute level), the object will not be installed, and the index of the symbol that already exists is returned.
If a symbol is installed, its tag will be set to
SYM_UNDEF
(which is done by default in the symbol constructor). The appropriate tag will then be set in theenter_
method that calledsymbol_table::install_symbol
. In this way name collisions can be discovered. For instance: The variablem
is declared twice on the same lexical level, and that is of course not correct. When the firstm
is encountered the methodsymbol_table::enter_variable
will try to install the symbol in the symbol table. It callssymbol_table::install_symbol
which creates and install a symbol of the correct type. The symbol’s tag is now set toSYM_UNDEF
. Whensymbol_table::install_symbol
finishes, the execution continues insidesymbol_table::enter_variable
and the method checks if the recent installed symbol’s tag is set toSYM_UNDEF
, and if it’s true, it changes the tag toSYM_VAR
. When the secondm
is encountered the procedure is repeated, but this timesymbol_table::install_symbol
returns thesym_index
to the firstm
. When checking the tag of this symbol,symbol_table::enter_variable
gets the answerSYM_VAR
and notSYM_UNDEF
. Thereforesymbol_table::enter_variable
knows that a symbol with this name already exists in the symbol table at this lexical level and outputs an error message. The secondm
is never installed. Furthermore, the level attribute will be set to the current lexical level,symbol_table#current_level
(which is currently zero).Note that the first argument to
symbol_table::install_symbol
is an index to the string table (symbol_table#string_pool
). This is, of course, what the scanner outputs in theYYSTYPE::pool_p
field of theyylval
union. If you had problems understanding what it was for earlier, hopefully some of that has become more clear now!The second argument is the tag that the symbol should have, it denotes which symbol class should be used (
constant_symbol
,variable_symbol
, etc.).
-
sym_index
lookup_symbol
(const pool_index) Look for a symbol with the same identifier as the argument specifies in the string table, and returns its symbol table index. The search is performed according to the scoping rules, i.e. from the current lexical level and outwards. Make sure you understand what this means! If the object is not found,
NULL_SYM
is returned.
-
void
open_scope
() Opens a new lexical level.
The routine is called when the name of a procedure or function has been installed. Note that
proc
lies outside the lexical level thati
andloc
have, in this example:procedure proc(i : integer); var loc : integer; begin end;
The
#block_table
keeps track of the active blocks in the program. When we have treated the wholeproc
procedure above, we must make sure that theloc
variable is not “visible”. We shall soon see how this is achieved.
-
sym_index
close_scope
() Closes the current lexical level and returns a symbol table index to the new lexical level.
The routine is called when we have finished with a procedure or function. The local variables that were there will then become “invisible”, i.e., the following program code can not reference them.
Private Members
-
char *
string_pool
The actual string pool.
-
block_level
current_level
Current nesting depth.
-
sym_index *
block_table
Table of level sym_index pointers. They point at the start of a new scope/block.
-
sym_index
When creating a new symbol table object, the name types global
,
integer_type
, real
and void
as well as the standard
functions trunc
and read
, and the standard procedure write
are installed in the symbol table.
To save space in the examples below, we skip global
, void
,
trunc
, and integer
; and starts with the name type real
instead. The principle is the same.
How the figures should be read:
A dotted arrow means that the link is removed in this picture: |
|
A dashed arrow means that the link is added in this picture: |
|
A normal (filled) arrow means that the link was added in any previous picture: |
Inserting REAL into the empty symbol table:

The symbol’s real
pool_pos
value is hashed. A symbol of the type
real
is created, and it’s index to the string table is set to
real
’s pool_pos
. Since there is no other
symbol on this position in the hash table the hash link is set to
nothing, i.e.,
NULL_SYM
(this is symbolized by the
diagonal line in the square named hash link) . The index to the symbol
is saved in the hash table
.
The pointer
sym_pos
is increased by one .
Step 2:
The symbol write
is installed in the same manner as real
:

Step 3: Inserting READ into the symbol table and linking back to REAL:

The symbol’s read
pool_pos
is hashed and happens to have the
same hash value as read
, therefore there is a hash table collision
between read
and real
.
The symbol is installed and its id is set to the
pool_pos
value of
read
. The index
to the symbol is saved in the hash table
,
and the previous
index, that pointed to the first symbol (
real
), is lost .
Since we just can’t throw away
real
, we set the hash_link
of
read
to point to real
.
We can now find both
read
and real
in the symbol table by following the hash link. The
sym_pos
variable is increased for every object added, and always
points to the last installed object. The pool_pos
variable plays a
similar role for the string table.
We can now start to deal with the program, which is assumed to look like this:
program prog;
var a : integer; b : integer; c : integer;
procedure p1;
var a : integer;
begin
c := b + a;
end;
begin
c := b + a;
end.
The first identifier to be detected by the scanner is prog
. We
install it:

The method that called install_symbol()
(enter_procedure()
) sets,
among other things, the tag of PROG
to SYM_PROC
3. This is
immediately followed by a call to open_scope()
as a new lexical
level begins; the global level:

The current lexical level (or depth) is now 1
. The first element
of the block table now points to the last object inserted
(the current
sym_pos
is saved into the block table), which will be
the procedure/function we just entered. This is a convention which means
that we always know which procedure or function we are in, namely:
sym_table[block_table[current_level]]
Incidentally, this is identical to:
symtab->get_symbol(sym_tab->current_environment())
which is how you can reference the current environment from outside the symbol table. More about that in the next lab, however.
current_level
thus points to the current block level and therefore goes up and down.
When compilation has finished current_level
should be the same as when compilation started (zero).
Exercise
This exercise is not mandatory, but it helps you understand the implementation.
Assume that REAL
has been stored in the string table (pool table) in
position 9
. Work out what happens at the following call:
i := lookup_symbol(9);
The next identifiers to be installed are a
, b
and
c
. All three give rise to collisions during hashing. Therefore
they are all installed in the same manner as read
. This
description is valid for all the three pictures: A hash table collision
occurs .
The symbol is installed and its id is set to the
pool_pos
value of the symbol .
The index to the symbol is
saved in the hash table
and the previous index that pointed
to some other symbol is removed
.
The hash link of the installed symbol is set to point to the other symbol instead
, which would be lost otherwise.
Installing a
:

Installing b
:

Installing c
:

Take a look at how prog
can be found by following the hash links from
c
. Also, note
how write
, which is on another lexical level than the current one,
is still “visible” via the hash links.
The next identifier is p1
. There is no collision this time and
p1
is installed in the same manner as real
and write
.
Since p1
is a procedure, open_scope()
is called after the
installation. The current lexical level is now 2
and the current
sym_pos
has been saved in the block table .

We now have two active blocks. Compilation continues and we find
identifier a
. We install it while we start at the scope p1
:

Exercise
This exercise is not mandatory, but helpful for understanding the implementation.
There are now two variables called a
in the program. Why is it that
the a
we just found does not collide with the first one and generate
the error message “already declared”?
After the declaration we find the statement
c := b + a;
which results in three calls to lookup_symbol()
:
lookup_symbol(poolindex of c);
We hash c
and follow the index from the hash table to the symbol
table. Is c
there? Yes, return its index in the table (7
).
lookup_symbol(poolindex of b);
We hash b
and follow the index from the hash table to the symbol
table. Is b
there? No, c
was there so we follow the hash link
backwards. Is b
there? Yes, return its index (6
).
lookup_symbol(poolindex of a);
We hash a
and follow the index from the hash table to the symbol
table. Is a
there? Yes, return its index (9
).
We can see, then, how lookup_symbol()
always follows the scope rules.
If you didn’t see it, read about scope rules in the course book or the
lecture notes, then look through these figures again.
We now come to the end of procedure p1
and therefore call
close_scope()
.
Consider what will happen now. Which symbols in the symbol table will “disappear” and which will remain “visible”?
The answer is that p1
’s local variable a
will “disappear”.
p1
itself is still visible (the main program can, of course, call
it).
Now we have to reorganise the symbol table so that lookup_symbol()
does not find a
. The algorithm we use is (study with care; this is
important):
for each symbol from sym_pos down to block_table[current_level] + 1
if the hash table points to the symbol
let it instead point to what the symbol points to with its hash link.
There is only one symbol that is affected, the symbol a
, which was
recently installed. The symbol is marked with 'I'
in the following figure.
Follow the
back link from the symbol or rehash its id to find out which index in
the hash table that is to be changed .
Follow the hash link from
a
.
Save the index of the symbol that was found at
the end of the hash link in the hash table
on the position found in
.
The previous index is of course lost
.
The hash link of the affected symbol must be set to
NULL_SYM
. At last, close the scope
.

The symbol table look like this when the scope of p1
is closed:

The condition as to whether the hash table points to the object can
only be answered by searching the whole hash table or re-hashing. We can
avoid this by placing a backward pointer in the symbol table. If
install_symbol()
really does install, it sets the backward pointer
to the hash index which was obtained at the key transformation. Use this
feature if you want to. This lab can be solved without using it if
desired.
We have now reached the main program which consists of the line:
c := b + a;
Convince yourself that lookup_symbol()
follows the scope rules.
Finally we find end and call close_scope()
again.
The symbols which are affected this time are marked I-V
.


Symbol II
’s (p1
’s) hash link points to NULL_SYM
and therefore the arrow from the hash table is removed (i.e. points to
NULL_SYM
). Symbol III
’s (c
’s) hash link points to
symbol IV
(b
) which makes the arrow from c
, b
,
prog
point to symbol IV
instead. In the next step the
pointer from c
, b
, prog
is set to the symbol prog. Symbol
V
(a
) makes the pointer from a
, write
point to the
symbol write
.
The final result:

Observations¶
One detail has been left out on purpose. When we scanned the statement
c := b + a
new copies of c
, b
and a should have been
inserted in the string table (but only there!). It would be smarter to
check whether a was already in the string table before installation. If
it is, you can “share” its index. This is left as an extra exercise.
If you want to implement this, you will need to change your
scanner.l
file slightly (hint: The method pool_forget()
might be
useful if you do).
If you want to save even more space in the string table, you can use the
observation that names that have been defined in a block that has
already been parsed are no longer visible. Hence you can move
pool_pos
back to the position which applied before the block.
This is not appropriate if you want to save the symbol table (why would you want to do this?).
A relatively expensive operation is the comparison made between identifiers at collisions in the hash table. If you are prepared to only have six significant characters for identifiers, you can get rid of the string table completely and code the six characters as a 32-bit integer (as letters and digits can be coded as numbers less than 40 and 40⁶ < 2³²). Other options include using a tree instead of a linked list (making conflicts cheaper, but adding variables and closing a scope more expensive), or increasing the size of the hash table and the quality of the hashing algorithm (resulting in fewer conflicts).
Implementation¶
Before you start with the implementation read The files you need to change and Other files of interest; you might also want to skim through the other sections for some useful hints.
You are to do four things:
Write the methods open_scope()
and close_scope()
.
They are called upon entering and leaving an environment (i.e., a
procedure/function) respectively.
Write the method lookup_symbol()
. Given a pool_index
to a
symbol, you should search from the innermost (current) environment and
outwards, to see if such a symbol exists. If it does, return its
sym_index
. Otherwise, return NULL_SYM
. Note that if you choose
to implement shared strings (see the non-mandatory exercise above) the
scanner will have to call this method. When?
Write the method install_symbol()
. Given the pool_index
of a symbol, and a sym_type
tag, you should create and install the
correct symbol type in the symbol table. If a symbol with the same
identifier already was installed on this lexical level, you shouldn’t
re-install it but instead return its sym_index
. This method is the
biggest part of this lab. Remember that there is a superclass symbol
from which all the various concrete symbol classes are derived. You will
need to instantiate the correct symbol class and initialize the various
symbol attributes: back_link
(if you use it),
id
,
hash_link
,
level
,
offset
, and
tag
(tag
will be set
to SYM_UNDEF
in the symbol constructor when creating a new symbol).
The second argument to install_symbol()
is used to find out which
symbol class to install. If any of the tables becomes filled,
fatal()
must be called.
Write the method enter_procedure()
. You can get a lot of help from
looking at the pre-written method enter_function()
; they will be
very similar. One thing you will need to do in this method is a safe
down-casting from class symbol to subclass procedure_symbol
. Look at
enter_function()
to see how this is done. If it is still confusing,
ask your lab assistant. The technique of safe down-casting in this way
will be used in many more places throughout this lab course, so you
might as well spend the time to understand it right away.
The files you need to change¶
- symtab.cc
contains the symbol table implementation. This is the file you will need to edit to complete this lab.
- scanner.l
flex input file. Unless you choose to implement shared strings (not mandatory), you should not need to edit this file any further (assuming you got it right in the previous lab, of course).
Other files of interest¶
Note: Other than the Makefile, use the same files you were given in the first lab.
- symtab.hh
contains all definitions concerning symbols and the symbol table.
- symbol.cc
contains the symbol class implementations.
- error.hh error.cc
global error and debug routines.
- symtabtest.cc
used for testing. Edit this file to simulate various calls to the symbol table. See the comments in the file for more information.
- Makefile
this is not the same as in the first lab!
Declarations and routines available in symtab.hh, symtab.cc¶
-
const block_level
MAX_BLOCK
= 8 Max allowed nesting levels (size of block table).
-
const hash_index
MAX_HASH
= 512 Max size of hash table.
-
const sym_index
MAX_SYM
= 1024 Max size of symbol table.
-
const pool_index
BASE_POOL_SIZE
= 1024 Base size of string pool.
-
const sym_index
NULL_SYM
= -1 Signifies ‘no symbol’.
-
const int
ILLEGAL_ARRAY_CARD
= -1 Signifies a non-int array size. The code using this constant has already been written for you.
The types block_index
, hash_index
, pool_index
and
sym_index
index the various tables.
The union constant_value
is used in constant symbols for storing the
constant’s value.
The tag type of an object in the symbol table is one of the elements in
sym_type
. The install_symbol()
method
initiates the tag type to SYM_UNDEF
if the installation actually takes place.
The symbol table, sym_tab
, is a global variable pointing to an
instance of the symbol_table
class; its index sym_pos
is
private. You can never access the actual table from outside the
symbol_table
class. There are access methods for the data members
you will need to set or query.
The block table, block_table
, and its index,
current_level
, are
private data members of the symbol_table
class (the code comments
refer to the block table and associated methods as the “Display”,
which is what it really is).
The string pool, string_pool
, and its index,
pool_pos
, are
private data members of the symbol_table
class.
The types void_type
, integer_type
, and real_type
are just
global constants denoting the three types symbols can have in diesel.
void_type
means “no type”, equivalent to a Pascal procedure, for
example. Using these types, other phases of the compiler can check the
type of an object in a convenient way:
if (sym_tab->get_symbol_type(pos) == integer_type) {
// ...
}
All the symbol classes are declared in the file symtab.hh
, and their
implementations can be found in the file symbol.cc
. You will need to
use this information a lot in this lab.
String table methods:¶
-
class
symbol_table
Public Functions
-
pool_index
pool_install
(char*) Install a string (an identifier or a string constant) in the string pool. Returns the index to the installed string.
-
char *
pool_lookup
(const pool_index) Given a
pool_index
into the string pool, returns the string it points to.
-
bool
pool_compare
(const pool_index, const pool_index) Compare two strings taking their respective
pool_index
as arguments. Returns true if they are identical, and false otherwise. Note that it is not enough to just check that the indices are identical. Why not?
-
pool_index
pool_forget
(const pool_index) Remove last installed entry from the string pool. This will only be useful if you opt to implement shared strings.
-
char *
fix_string
(const char*) Remove double
''
in strings constants.Internalizes a string constant. You hopefully use this method in several labs.
-
char *
capitalize
(const char*) Capitalizes a string. You hopefully use this method in several labs.
-
pool_index
Hash table methods:¶
-
class
symbol_table
Public Functions
-
hash_index
hash
(const pool_index) Given a
pool_index
to a string, return its hash index.
-
hash_index
Display methods:¶
-
class
symbol_table
Public Functions
-
sym_index
current_environment
() Returns the symbol table index to the current environment, i.e., procedure/function.
-
sym_index
Symbol table methods:¶
-
class
symbol_table
Public Functions
-
void
print
(int) Very useful method for dumping the symbol table to stdout. The value of the argument determines what info will be printed:
- 1
Print one line of info per symbol table entry.
- 2
Print (only) the string table, showing the current pool_pos.
- 3
Print (only) the non-zero elements in the hash table.
- any other
Print detailed information about every symbol in the symbol table. Watch out, though: this gets very long if you have more than a few symbols installed.
-
pool_index
get_symbol_id
(const sym_index) Given a symbol table index, return its pool_index, or 0 (zero) if no such symbol existed.
-
sym_index
get_symbol_type
(const sym_index) Given a symbol table index, returns its type (
void_type
,integer_type
, orreal_type
). If the symbol didn’t exist, returnsvoid_type
.
-
sym_type
get_symbol_tag
(const sym_index) Given a symbol table index, returns its tag type (i.e., the tag marking which symbol class it is:
SYM_CONST
,SYM_VAR
).
-
sym_index
enter_constant
(position_information*, const pool_index, const sym_index, const long) Given position information, a
pool_index
to an identifier’s name, asym_index
to the desired type and an int representing the actual constant value, generate and install a constant of the desired type.
-
sym_index
enter_constant
(position_information*, const pool_index, const sym_index, const double) Given position information, a
pool_index
to an identifier’s name, asym_index
to the desired type and an int representing the actual constant value, generate and install a constant of the desired type.
-
sym_index
enter_variable
(position_information*, const pool_index, const sym_index) Given position information, a
pool_index
to an identifier’s name, and a sym_index to the desired type, generate and install a variable of the desired type.
-
sym_index
enter_variable
(const pool_index, const sym_index) This convenience method is used for installing temporary variables for which position information is not relevant. See quads.cc (lab6).
-
sym_index
enter_array
(position_information*, const pool_index, const sym_index, const int) Given position information, a
pool_index
to an identifier’s name, asym_index
to the desired type, and an array cardinality, generate and install an array of the desired type.
-
sym_index
enter_function
(position_information*, const pool_index) Given position information, and a
pool_index
to an identifier’s name, generate and install a function.The type is set later on since it will not be known at symbol install time.
-
sym_index
enter_parameter
(position_information*, const pool_index, const sym_index) Given position information, a
pool_index
to an identifier’s name, and a sym_index to the desired type, generate and install a parameter of the desired type.
-
sym_index
enter_nametype
(position_information*, const pool_index) Given position information, and a
pool_index
to an identifier’s name, generate and install a nametype of the desired type.Since DIESEL doesn’t allow user-defined types, you will never need to use this method. It is used from the symbol table class constructor to install the default types, though.
NOTE: Should perhaps be marked private?
-
void
All the enter_
methods call the internal method install_symbol()
with the appropriate parameters. The common attributes (described in )
are set in install_symbol()
, but the specific attributes are set in
the enter_
-method that called install_symbol()
. A newly
installed symbol will have its tag field set to SYM_UNDEF
, because
otherwise there will be a name collision. Take care to study all the
symbol classes and make sure you set the appropriate attributes. Don’t
overlook the constructors either; some work has already been done for
you.
As far as constants are concerned the values of all attributes at call
time to enter_constant()
are known, but this is not the case with,
for example, enter_procedure()
. The size of a procedure (the
attribute ar_size
) can obviously not be known before we know how
many local variables there are.
Study how enter_variable()
, for example, uses the
fact that current_environment()
always returns
the nearest surrounding procedure or function to update its ar_size
attribute.
Pay attention to how the safe down-casting is done; it’s not that hard
to understand once you get the idea, and it is a technique used
throughout this lab course.
Debugging help¶
All symbols can be sent directly to cout
. The entire symbol table
can be printed using the print()
method with various arguments (see
above). (For those who know C++ well, there are certain things called
io
manipulators which are used to regulate the level of detail
printed about a symbol, but you really shouldn’t need to use them. Their
implementation resides at the bottom of the file symbol.cc
for those
who are interested.)
How do I know it’s correct?¶
When you compile your program using make, a binary executable file
called symtab will be created. It will run the code residing in the
symtabtest.cc
file, simulating the parsing of a code fragment.
There are three tests you should do: 2a, 2b and 2c.
You run these by passing a flag to the executable, for example: ./symtab a
.
The tests should have different values depending on which test you are running.
These are described at the top of every test output (see Example execution).
Test 2a shows if
your symbol table works, 2b tests if your program complains about
re-declarations and 2c tests all the various enter_
-methods.
Reporting your work¶
For the demonstration, run the test program (symtab) for the three test cases 2a-c and discuss your code.
Hand in your code as described in Handing in Results.
Example execution¶
./symtab a
Starting symtabtest.cc...
Inside p2:
symbol:
id: c
type: INTEGER
level: 3
hash_link: 15
back_link: 99
offset: 0
tag: SYM_VAR
class: variable_symbol
symbol:
id: b
type: REAL
level: 2
hash_link: 11
back_link: 98
offset: 0
tag: SYM_VAR
class: variable_symbol
symbol:
id: a
type: INTEGER
level: 1
hash_link: -1
back_link: 97
offset: 0
tag: SYM_VAR
class: variable_symbol
Inside p1:
symbol:
id: c
type: REAL
level: 2
hash_link: 12
back_link: 99
offset: 8
tag: SYM_VAR
class: variable_symbol
symbol:
id: b
type: REAL
level: 2
hash_link: 11
back_link: 98
offset: 0
tag: SYM_VAR
class: variable_symbol
symbol:
id: a
type: INTEGER
level: 1
hash_link: -1
back_link: 97
offset: 0
tag: SYM_VAR
class: variable_symbol
7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG4prog1a1b1c2p11b1c2p21c
-----------------------------------------------------------------------------------^ (pool_pos = 83)
Symbol table (size = 17):
Pos Name Lev Hash Back Offs Type Tag
-----------------------------------------------
0: GLOBAL. 0 -1 159 0 GLOBAL. SYM_PROC lbl = -1 ar_size = 0
1: VOID 0 -1 82 0 VOID SYM_NAMETYPE
2: INTEGER 0 -1 462 0 VOID SYM_NAMETYPE
3: REAL 0 -1 324 0 VOID SYM_NAMETYPE
4: READ 0 -1 316 0 INTEGER SYM_FUNC lbl = 0 ar_size = 0
5: WRITE 0 -1 139 0 VOID SYM_PROC lbl = 1 ar_size = 0
6: INT-ARG 0 -1 210 0 INTEGER SYM_PARAM
7: TRUNC 0 -1 332 0 INTEGER SYM_FUNC lbl = 2 ar_size = 0
8: REAL-ARG 0 -1 427 0 REAL SYM_PARAM
9: prog 0 -1 24 0 VOID SYM_PROC lbl = 3 ar_size = 24
10: a 1 -1 97 0 INTEGER SYM_VAR
11: b 1 -1 98 8 INTEGER SYM_VAR
12: c 1 -1 99 16 INTEGER SYM_VAR
13: p1 1 -1 161 0 VOID SYM_PROC lbl = 4 ar_size = 16
14: b 2 -1 98 0 REAL SYM_VAR
15: c 2 -1 99 8 REAL SYM_VAR
16: p2 2 -1 162 0 VOID SYM_PROC lbl = 5 ar_size = 8
17: c 3 -1 99 0 INTEGER SYM_VAR
ENDING TEST PROGRAM RUN -----------------------------
./symtab b
Starting symtabtest.cc...
Type conflict, line 0, col 0: Redeclaration: symbol:
id: a
type: INTEGER
level: 1
hash_link: -1
back_link: 97
offset: 0
tag: SYM_VAR
class: variable_symbol
Inside p2:
symbol:
id: c
type: INTEGER
level: 3
hash_link: 15
back_link: 99
offset: 0
tag: SYM_VAR
class: variable_symbol
symbol:
id: b
type: REAL
level: 2
hash_link: 11
back_link: 98
offset: 0
tag: SYM_VAR
class: variable_symbol
symbol:
id: a
type: INTEGER
level: 1
hash_link: -1
back_link: 97
offset: 0
tag: SYM_VAR
class: variable_symbol
Inside p1:
symbol:
id: c
type: REAL
level: 2
hash_link: 12
back_link: 99
offset: 8
tag: SYM_VAR
class: variable_symbol
symbol:
id: b
type: REAL
level: 2
hash_link: 11
back_link: 98
offset: 0
tag: SYM_VAR
class: variable_symbol
symbol:
id: a
type: INTEGER
level: 1
hash_link: -1
back_link: 97
offset: 0
tag: SYM_VAR
class: variable_symbol
7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG4prog1a1a1b1c2p11b1c2p21c
-------------------------------------------------------------------------------------^ (pool_pos = 85)
Symbol table (size = 17):
Pos Name Lev Hash Back Offs Type Tag
-----------------------------------------------
0: GLOBAL. 0 -1 159 0 GLOBAL. SYM_PROC lbl = -1 ar_size = 0
1: VOID 0 -1 82 0 VOID SYM_NAMETYPE
2: INTEGER 0 -1 462 0 VOID SYM_NAMETYPE
3: REAL 0 -1 324 0 VOID SYM_NAMETYPE
4: READ 0 -1 316 0 INTEGER SYM_FUNC lbl = 0 ar_size = 0
5: WRITE 0 -1 139 0 VOID SYM_PROC lbl = 1 ar_size = 0
6: INT-ARG 0 -1 210 0 INTEGER SYM_PARAM
7: TRUNC 0 -1 332 0 INTEGER SYM_FUNC lbl = 2 ar_size = 0
8: REAL-ARG 0 -1 427 0 REAL SYM_PARAM
9: prog 0 -1 24 0 VOID SYM_PROC lbl = 3 ar_size = 24
10: a 1 -1 97 0 INTEGER SYM_VAR
11: b 1 -1 98 8 INTEGER SYM_VAR
12: c 1 -1 99 16 INTEGER SYM_VAR
13: p1 1 -1 161 0 VOID SYM_PROC lbl = 4 ar_size = 16
14: b 2 -1 98 0 REAL SYM_VAR
15: c 2 -1 99 8 REAL SYM_VAR
16: p2 2 -1 162 0 VOID SYM_PROC lbl = 5 ar_size = 8
17: c 3 -1 99 0 INTEGER SYM_VAR
ENDING TEST PROGRAM RUN -----------------------------
./symtab c
Starting symtabtest.cc...
Current environment: 0
7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG9test_proc
----------------------------------------------------------------------^ (pool_pos = 70)
Current environment: 9
Constant stored as:
symbol:
id: test_const1
type: REAL
level: 1
hash_link: -1
back_link: 183
offset: 0
tag: SYM_CONST
class: constant_symbol
const_value.rval:2.45
Current environment: 0
7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG9test_proc11test_param111test_param211test_const19test_var19test_func11test_array1
------------------------------------------------------------------------------------------------------------------------------------------^ (pool_pos = 138)
Symbol table (size = 15):
Pos Name Lev Hash Back Offs Type Tag
-----------------------------------------------
0: GLOBAL. 0 -1 159 0 GLOBAL. SYM_PROC lbl = -1 ar_size = 0
1: VOID 0 -1 82 0 VOID SYM_NAMETYPE
2: INTEGER 0 -1 462 0 VOID SYM_NAMETYPE
3: REAL 0 -1 324 0 VOID SYM_NAMETYPE
4: READ 0 -1 316 0 INTEGER SYM_FUNC lbl = 0 ar_size = 0
5: WRITE 0 -1 139 0 VOID SYM_PROC lbl = 1 ar_size = 0
6: INT-ARG 0 -1 210 0 INTEGER SYM_PARAM
7: TRUNC 0 -1 332 0 INTEGER SYM_FUNC lbl = 2 ar_size = 0
8: REAL-ARG 0 -1 427 0 REAL SYM_PARAM
9: test_proc 0 -1 211 0 VOID SYM_PROC lbl = 3 ar_size = 8
10: test_param1 1 -1 481 0 INTEGER SYM_PARAM
11: test_param2 1 -1 482 8 REAL SYM_PARAM prec = test_param1
12: test_const1 1 -1 183 0 REAL SYM_CONST value = 2.45
13: test_var1 1 -1 249 0 INTEGER SYM_VAR
14: test_func 1 -1 427 0 INTEGER SYM_FUNC lbl = 4 ar_size = 24
15: test_array1 2 -1 143 0 INTEGER SYM_ARRAY card = 3
ENDING TEST PROGRAM RUN -----------------------------