Lab 2 : Symbol Table

In this lab, you will write functions to administrate a symbol table. If you are unsure about scope-rules, you should read pages 394 - 396 (as far as section 7.2) in the course textbook.

Introduction

In the symbol table information is gathered about the names 1 that appear in the program. The symbol table primarily helps:

  • In checking the program’s semantic correctness.

    Using the symbol table you can, for example, determine whether a variable has been declared and in that case if it is of an integer or floating-point type.

  • In generating code.

    The symbol table keeps track, for example, of memory requirements for various variables.

Furthermore, the following features are desirable:

  • Support for block structures and scope rules, if the language requires that.

    In block-structured languages the same identifier can occur several times, but in different blocks. The symbol table, therefore, ought to be organised so you have access to the variables which are visible at a given time.

  • Compact storage.

    An implementation requiring little memory is to be aimed for.

  • Quick addition of, and searching for names.

    The need to search fast collides (as usual) with memory space demands, as mentioned above.

  • Resident storage.

    The symbol table may be needed later, if the compiler works in several runs, or for diagnosis of execution errors.

The information stored in the symbol table is a list of pairs (name, attribute), also often referred to as bindings:

Name

The name forms a search key in the symbol table. A problem which must be dealt with is how to efficiently store all significant characters in the identifier.

Attributes which are dependent on the language to be compiled

These include type information, array dimensions, the number of parameters in procedures and functions, as well as line numbers for declarations.

Attributes which are dependent on the language we are compiling to

For example, if we are generating machine code, we will have problems with references to addresses in memory, whose values are unknown at code generation time. If we choose to generate assembler, we can instead refer to a so-called label (symbolic position) whose value is determined later. Assembler will then perform the translation to absolute addresses in memory.

The operations we would like to perform on the symbol tables are:

  • Look up a name

  • Add a new name

  • Add an attribute to a name

  • Retrieve the value of an attribute of a name

  • Remove a name and its associated information

Data structures for symbol tables

When judging various data structures, the following should be taken into consideration:

  • Memory space,

  • Time taken to add and search for a name,

  • How difficult it is to introduce scoping.

Linear lists

Searching is performed in linear fashion from beginning to end. The search halts as soon as the name you are looking for, is found.

Linear lists are easy to implement and require little memory space. Scoping is easily introduced using the LIFO principle. However, search times are unacceptable. Reports have shown that a compiler, which implements symbol tables using linear lists, can spend 25% of compilation time on searching.

Trees

The symbol table can be in the form of a tree; then each subprogram has a symbol table attached to its node in the abstract syntax tree. The main program has a similar table for globally declared objects. The advantage of this structure is that it is easy to represent scoping and is quicker than linear lists.

Hashing

When high demands are made on the search time, hashing is used. Entry to the hash table is calculated using a suitable key transformation. If the symbol is not in the entry, you have to follow the link field. By making the hash table large enough you can control the search time.

However, extra space is required if you do not want to risk filling the hash table 2. Introducing scoping is somewhat more difficult than for linear lists and trees.

The symbol table in DIESEL

The symbol table to be implemented, uses hashing with linking. In DIESEL there are seven different types of symbols: constants, variables, arrays, parameters (formal), procedures, functions and name types (integer, real, and void). Each symbol type is represented by a separate class, and all symbols inherit from the abstract class symbol. Let us first examine what information is common to all symbols (and is found in the symbol class):

class symbol

The symbol table consists of entries of subclasses to symbol. This class contains data that are common to all symbol types. This class is never used directly. Use the derived classes instead.

Subclassed by array_symbol, constant_symbol, function_symbol, nametype_symbol, parameter_symbol, procedure_symbol, variable_symbol

Public Members

pool_index id

Index to the string_pool, ie, its name.

The object’s identifier is stored in the string table (i.e. the spelling, see previous lab). Index to string table (called id in the lab skeleton) is a value denoting where an identifier starts in the string table. Example:

The scanner from lab1 has created a string table that look like this (the number denotes the length of the following identifier):

7GLOBAL.4VOID7INTEGER...

The id to GLOBAL. is 0 because it starts on the first position in the string table (the index to the first position in an array in C++ is 0).

sym_type tag

This field is only used to give us a way to differentiate between the various symbol classes when we need to do a safe downcast from the superclass. The default value of tag is SYM_UNDEF.

sym_index type

In practice type is used only by constants, variables, arrays, parameters and functions; all other symbol types are of type void. The type field contains an index to a name type which in DIESEL only can be integer_type, real_type, or void_type. These will be preinstalled in the symbol table.

sym_index hash_link

If the object you are searching for not is found immediately after the key transformation, you follow the hash link backwards in the table.

hash_index back_link

This link points back to the hash table from the symbol table and can be used to speed up certain types of lookups. It is possible to complete the lab without using it, though.

block_level level

The lexical level states how deeply nested the object is in the program. For example, an object on the first level is global. Eight levels is the maximum in Diesel.

int offset

Offset specifies which relative position the object has in the memory space which has been reserved for a given lexical level on the runtime stack. This is used during code generation.

There are specialized symbols with additional attributes:

class constant_symbol : public symbol

Derived symbol type, used for constants.

Public Members

constant_value const_value

Value of a constant, can be int or float. The value is stored in a union; check the #type attribute to figure out if you should access ivar or rvar.

class variable_symbol : public symbol

Derived symbol type, used for variables.

class array_symbol : public symbol

Derived symbol type, used for arrays.

Study the following declaration:

var arr : array[10] of real;

The type attribute (see above) in this case is an index to the name type real. We see that we also need an attribute, #array_cardinality, which in this example would have the value 10. To be slightly more general, we also have an attribute, #index_type, which at present can only point to the name type #integer_type.

Public Members

sym_index index_type

Points to the index type in the symbol table.

int array_cardinality

Note: cardinality = nr of elements,.

class parameter_symbol : public symbol

Parameters require particular checking, as the formal parameters are to be bound to the actual ones. Assume the following declaration:

procedure p(i : integer; r : real);

which is then called with the following actual parameters:

p(22/7, 355/113);

To perform a semantic check, we must find the formal parameters (i and r) in the symbol table and compare their type attributes with the actual parameters. In this example a type conflict occurs when the value of the expression 22/7 (which is a real) is to be bound to the integer variable i.

For this reason the formal parameters are linked together in the symbol table. The preceding attribute points to the previous parameter in the link chain. The procedure (or function) points to the last parameter. It might seem counter-intuitive that the parameters are stored backwards like this, but it will actually make life easier later on.

The order of storing the actual paramerers.

The handling of parameters is actually one of the trickiest parts to understand correctly in this entire lab course. Some work has already been done for you, but you will have to do a lot of it yourself. Make sure to read the code comments carefully when dealing with parameters, and be sure to have this lab material handy as well. The order of the parameters will be flipped around during compiling more than once. Once you have that sequence clear, getting them to work will be a lot easier.

Public Members

int size

Nr of bytes parameter needs.

parameter_symbol *preceding

Link to preceding parameter, if any.

Procedures and functions have identical members but different semantics:

class procedure_symbol : public symbol

Derived symbol type, used for procedures.

Public Members

int ar_size

Activation record size.

Specifies the memory space that is needed to store the local variables belonging to the procedure/function on the runtime stack. A procedure that declares an array of ten integers, two reals and three integers requires (10+2+3)*8 bytes.

int label_nr

The label number which is assigned to the procedure or function in assembler code.

parameter_symbol *last_parameter

List of parameters. We store them in reverse order to make type-checking easier later on.

class function_symbol : public symbol

Derived symbol type, used for functions.

Public Members

int ar_size

Activation record size.

Specifies the memory space that is needed to store the local variables belonging to the procedure/function on the runtime stack. A procedure that declares an array of ten integers, two reals and three integers requires (10+2+3)*8 bytes.

int label_nr

The label number which is assigned to the procedure or function in assembler code.

parameter_symbol *last_parameter

List of parameters. We store them in reverse order to make type-checking easier later on.

The symbol table structure

The symbol table itself is made up of an array of pointers to symbol objects, which belong to one of the classes mentioned above. A symbol is accessed via its index, a sym_index, rather than by a pointer. Why? To allow us to use hashing.

A variable called sym_pos keeps track of the current position in the table. It will always contain the index to the last symbol that was entered into the table. Note that the first element in an array in C++ has index 0 (zero).

The symbol table also contains a string pool, where identifiers and string constants are stored.

A hash table is used for fast access to symbols.

A block table, or display, is used to keep track of the current nesting depth. Read up on displays in the course book. It isn’t really that complicated once you get the hang of it.

To try to make things clearer, let us study how the symbol table is constructed step by step.

At the beginning the table is empty:

The symbol table at compilation start.

In the diagram, we have chosen (for reasons of space) to show only the attributes we care about right now (id and hash_link). We assume the following functions exist (see the top of symtab.hh for some type definitions used here):

class symbol_table

Public Functions

sym_index install_symbol(const pool_index, const sym_type tag)

Installs a symbol in the symbol table and returns its index. This method installs a symbol in the symbol table and returns its index. If the symbol already existed at the same lexical level (i.e. two objects with the same identifi and value of the attribute level), the object will not be installed, and the index of the symbol that already exists is returned.

If a symbol is installed, its tag will be set to SYM_UNDEF (which is done by default in the symbol constructor). The appropriate tag will then be set in the enter_ method that called symbol_table::install_symbol. In this way name collisions can be discovered. For instance: The variable m is declared twice on the same lexical level, and that is of course not correct. When the first m is encountered the method symbol_table::enter_variable will try to install the symbol in the symbol table. It calls symbol_table::install_symbol which creates and install a symbol of the correct type. The symbol’s tag is now set to SYM_UNDEF. When symbol_table::install_symbol finishes, the execution continues inside symbol_table::enter_variable and the method checks if the recent installed symbol’s tag is set to SYM_UNDEF, and if it’s true, it changes the tag to SYM_VAR. When the second m is encountered the procedure is repeated, but this time symbol_table::install_symbol returns the sym_index to the first m. When checking the tag of this symbol, symbol_table::enter_variable gets the answer SYM_VAR and not SYM_UNDEF. Therefore symbol_table::enter_variable knows that a symbol with this name already exists in the symbol table at this lexical level and outputs an error message. The second m is never installed. Furthermore, the level attribute will be set to the current lexical level, symbol_table#current_level (which is currently zero).

Note that the first argument to symbol_table::install_symbol is an index to the string table (symbol_table#string_pool). This is, of course, what the scanner outputs in the YYSTYPE::pool_p field of the yylval union. If you had problems understanding what it was for earlier, hopefully some of that has become more clear now!

The second argument is the tag that the symbol should have, it denotes which symbol class should be used (constant_symbol, variable_symbol, etc.).

sym_index lookup_symbol(const pool_index)

Look for a symbol with the same identifier as the argument specifies in the string table, and returns its symbol table index. The search is performed according to the scoping rules, i.e. from the current lexical level and outwards. Make sure you understand what this means! If the object is not found, NULL_SYM is returned.

void open_scope()

Opens a new lexical level.

The routine is called when the name of a procedure or function has been installed. Note that proc lies outside the lexical level that i and loc have, in this example:

procedure proc(i : integer);
  var loc : integer;
begin
end;

The #block_tablekeeps track of the active blocks in the program. When we have treated the whole proc procedure above, we must make sure that the loc variable is not “visible”. We shall soon see how this is achieved.

sym_index close_scope()

Closes the current lexical level and returns a symbol table index to the new lexical level.

The routine is called when we have finished with a procedure or function. The local variables that were there will then become “invisible”, i.e., the following program code can not reference them.

Private Members

char *string_pool

The actual string pool.

block_level current_level

Current nesting depth.

sym_index *block_table

Table of level sym_index pointers. They point at the start of a new scope/block.

When creating a new symbol table object, the name types global, integer_type, real and void as well as the standard functions trunc and read, and the standard procedure write are installed in the symbol table.

To save space in the examples below, we skip global, void, trunc, and integer; and starts with the name type real instead. The principle is the same.

How the figures should be read:

A dotted arrow means that the link is removed in this picture:

dotted

A dashed arrow means that the link is added in this picture:

dashed

A normal (filled) arrow means that the link was added in any previous picture:

normal

Inserting REAL into the empty symbol table:

Inserting *REAL* into the symbol table.

The symbol’s real pool_pos value is hashed. A symbol of the type real is created, and it’s index to the string table is set to real’s pool_pos \textcircled{\raisebox{-0.9pt}{2}}. Since there is no other symbol on this position in the hash table the hash link is set to nothing, i.e., NULL_SYM \textcircled{\raisebox{-0.9pt}{3}} (this is symbolized by the diagonal line in the square named hash link) . The index to the symbol is saved in the hash table \textcircled{\raisebox{-0.9pt}{1}}. The pointer sym_pos is increased by one \textcircled{\raisebox{-0.9pt}{4}}.

Step 2: The symbol write is installed in the same manner as real \textcircled{\raisebox{-0.9pt}{1}}-\textcircled{\raisebox{-0.9pt}{3}}:

Inserting WRITE into the symbol table.

Step 3: Inserting READ into the symbol table and linking back to REAL:

Step 3. Inserting READ into the symbol table and linking back to REAL.

The symbol’s read pool_pos is hashed and happens to have the same hash value as read, therefore there is a hash table collision between read and real \textcircled{\raisebox{-0.9pt}{1}}. The symbol is installed and its id is set to the pool_pos value of read \textcircled{\raisebox{-0.9pt}{2}}. The index to the symbol is saved in the hash table \textcircled{\raisebox{-0.9pt}{3}}, and the previous index, that pointed to the first symbol (real), is lost \textcircled{\raisebox{-0.9pt}{4}}. Since we just can’t throw away real, we set the hash_link of read to point to real \textcircled{\raisebox{-0.9pt}{5}}. We can now find both read and real in the symbol table by following the hash link. The sym_pos variable is increased for every object added, and always points to the last installed object. The pool_pos variable plays a similar role for the string table.

We can now start to deal with the program, which is assumed to look like this:

program prog;
  var a : integer; b : integer; c : integer;
  procedure p1;
    var a : integer;
  begin
    c := b + a;
  end;
begin
  c := b + a;
end.

The first identifier to be detected by the scanner is prog. We install it:

Installing *prog*.

The method that called install_symbol() (enter_procedure()) sets, among other things, the tag of PROG to SYM_PROC 3. This is immediately followed by a call to open_scope() as a new lexical level begins; the global level:

prog is seen as a procedure, so we increase the lexical scope.

The current lexical level (or depth) is now 1. The first element of the block table now points to the last object inserted \textcircled{\raisebox{-0.9pt}{1}} (the current sym_pos is saved into the block table), which will be the procedure/function we just entered. This is a convention which means that we always know which procedure or function we are in, namely:

sym_table[block_table[current_level]]

Incidentally, this is identical to:

symtab->get_symbol(sym_tab->current_environment())

which is how you can reference the current environment from outside the symbol table. More about that in the next lab, however.

current_level thus points to the current block level and therefore goes up and down. When compilation has finished current_level should be the same as when compilation started (zero).

Exercise

This exercise is not mandatory, but it helps you understand the implementation.

Assume that REAL has been stored in the string table (pool table) in position 9. Work out what happens at the following call:

i := lookup_symbol(9);

The next identifiers to be installed are a, b and c. All three give rise to collisions during hashing. Therefore they are all installed in the same manner as read. This description is valid for all the three pictures: A hash table collision occurs \textcircled{\raisebox{-0.9pt}{1}}. The symbol is installed and its id is set to the pool_pos value of the symbol \textcircled{\raisebox{-0.9pt}{2}}. The index to the symbol is saved in the hash table \textcircled{\raisebox{-0.9pt}{3}} and the previous index that pointed to some other symbol is removed \textcircled{\raisebox{-0.9pt}{4}}. The hash link of the installed symbol is set to point to the other symbol instead \textcircled{\raisebox{-0.9pt}{5}}, which would be lost otherwise.

Installing a:

Installing *a*.

Installing b:

Installing *b*.

Installing c:

Installing *c*.

Take a look at how prog can be found by following the hash links from c. Also, note how write, which is on another lexical level than the current one, is still “visible” via the hash links.

The next identifier is p1. There is no collision this time and p1 is installed in the same manner as real and write. Since p1 is a procedure, open_scope() is called after the installation. The current lexical level is now 2 and the current sym_pos has been saved in the block table \textcircled{\raisebox{-0.9pt}{1}}.

Installing *p1*.

We now have two active blocks. Compilation continues and we find identifier a. We install it while we start at the scope p1:

Installing *a* from inside *p1*.

Exercise

This exercise is not mandatory, but helpful for understanding the implementation.

There are now two variables called a in the program. Why is it that the a we just found does not collide with the first one and generate the error message “already declared”?

After the declaration we find the statement

c := b + a;

which results in three calls to lookup_symbol():

lookup_symbol(poolindex of c);

We hash c and follow the index from the hash table to the symbol table. Is c there? Yes, return its index in the table (7).

lookup_symbol(poolindex of b);

We hash b and follow the index from the hash table to the symbol table. Is b there? No, c was there so we follow the hash link backwards. Is b there? Yes, return its index (6).

lookup_symbol(poolindex of a);

We hash a and follow the index from the hash table to the symbol table. Is a there? Yes, return its index (9).

We can see, then, how lookup_symbol() always follows the scope rules. If you didn’t see it, read about scope rules in the course book or the lecture notes, then look through these figures again.

We now come to the end of procedure p1 and therefore call close_scope().

Consider what will happen now. Which symbols in the symbol table will “disappear” and which will remain “visible”?

The answer is that p1’s local variable a will “disappear”. p1 itself is still visible (the main program can, of course, call it).

Now we have to reorganise the symbol table so that lookup_symbol() does not find a. The algorithm we use is (study with care; this is important):

for each symbol from sym_pos down to block_table[current_level] + 1
  if the hash table points to the symbol
    let it instead point to what the symbol points to with its hash link.

There is only one symbol that is affected, the symbol a, which was recently installed. The symbol is marked with 'I' in the following figure. Follow the back link from the symbol or rehash its id to find out which index in the hash table that is to be changed \textcircled{\raisebox{-0.9pt}{1}}. Follow the hash link from a \textcircled{\raisebox{-0.9pt}{2}}. Save the index of the symbol that was found at the end of the hash link in the hash table \textcircled{\raisebox{-0.9pt}{3}} on the position found in \textcircled{\raisebox{-0.9pt}{1}}. The previous index is of course lost \textcircled{\raisebox{-0.9pt}{4}}. The hash link of the affected symbol must be set to NULL_SYM \textcircled{\raisebox{-0.9pt}{5}}. At last, close the scope \textcircled{\raisebox{-0.9pt}{6}}.

Exit from `p1`, remove the `a` variable form the hash table.

The symbol table look like this when the scope of p1 is closed:

Symbol table after the scope of *p1* was closed.

The condition as to whether the hash table points to the object can only be answered by searching the whole hash table or re-hashing. We can avoid this by placing a backward pointer in the symbol table. If install_symbol() really does install, it sets the backward pointer to the hash index which was obtained at the key transformation. Use this feature if you want to. This lab can be solved without using it if desired.

We have now reached the main program which consists of the line:

c := b + a;

Convince yourself that lookup_symbol() follows the scope rules.

Finally we find end and call close_scope() again. The symbols which are affected this time are marked I-V.

The symbols which are affected this time are marked I-V.
Closing the scope.

Symbol II’s (p1’s) hash link points to NULL_SYM and therefore the arrow from the hash table is removed (i.e. points to NULL_SYM). Symbol III’s (c’s) hash link points to symbol IV (b) which makes the arrow from c, b, prog point to symbol IV instead. In the next step the pointer from c, b, prog is set to the symbol prog. Symbol V (a) makes the pointer from a, write point to the symbol write.

The final result:

The final result.

Observations

One detail has been left out on purpose. When we scanned the statement c := b + a new copies of c, b and a should have been inserted in the string table (but only there!). It would be smarter to check whether a was already in the string table before installation. If it is, you can “share” its index. This is left as an extra exercise. If you want to implement this, you will need to change your scanner.l file slightly (hint: The method pool_forget() might be useful if you do).

If you want to save even more space in the string table, you can use the observation that names that have been defined in a block that has already been parsed are no longer visible. Hence you can move pool_pos back to the position which applied before the block.

This is not appropriate if you want to save the symbol table (why would you want to do this?).

A relatively expensive operation is the comparison made between identifiers at collisions in the hash table. If you are prepared to only have six significant characters for identifiers, you can get rid of the string table completely and code the six characters as a 32-bit integer (as letters and digits can be coded as numbers less than 40 and 40⁶ < 2³²). Other options include using a tree instead of a linked list (making conflicts cheaper, but adding variables and closing a scope more expensive), or increasing the size of the hash table and the quality of the hashing algorithm (resulting in fewer conflicts).

Implementation

Before you start with the implementation read The files you need to change and Other files of interest; you might also want to skim through the other sections for some useful hints.

You are to do four things:

Write the methods open_scope() and close_scope(). They are called upon entering and leaving an environment (i.e., a procedure/function) respectively.

Write the method lookup_symbol(). Given a pool_index to a symbol, you should search from the innermost (current) environment and outwards, to see if such a symbol exists. If it does, return its sym_index. Otherwise, return NULL_SYM. Note that if you choose to implement shared strings (see the non-mandatory exercise above) the scanner will have to call this method. When?

Write the method install_symbol(). Given the pool_index of a symbol, and a sym_type tag, you should create and install the correct symbol type in the symbol table. If a symbol with the same identifier already was installed on this lexical level, you shouldn’t re-install it but instead return its sym_index. This method is the biggest part of this lab. Remember that there is a superclass symbol from which all the various concrete symbol classes are derived. You will need to instantiate the correct symbol class and initialize the various symbol attributes: back_link (if you use it), id, hash_link, level, offset, and tag (tag will be set to SYM_UNDEF in the symbol constructor when creating a new symbol). The second argument to install_symbol() is used to find out which symbol class to install. If any of the tables becomes filled, fatal() must be called.

Write the method enter_procedure(). You can get a lot of help from looking at the pre-written method enter_function(); they will be very similar. One thing you will need to do in this method is a safe down-casting from class symbol to subclass procedure_symbol. Look at enter_function() to see how this is done. If it is still confusing, ask your lab assistant. The technique of safe down-casting in this way will be used in many more places throughout this lab course, so you might as well spend the time to understand it right away.

The files you need to change

symtab.cc

contains the symbol table implementation. This is the file you will need to edit to complete this lab.

scanner.l

flex input file. Unless you choose to implement shared strings (not mandatory), you should not need to edit this file any further (assuming you got it right in the previous lab, of course).

Other files of interest

Note: Other than the Makefile, use the same files you were given in the first lab.

symtab.hh

contains all definitions concerning symbols and the symbol table.

symbol.cc

contains the symbol class implementations.

error.hh error.cc

global error and debug routines.

symtabtest.cc

used for testing. Edit this file to simulate various calls to the symbol table. See the comments in the file for more information.

Makefile

this is not the same as in the first lab!

Declarations and routines available in symtab.hh, symtab.cc

const block_level MAX_BLOCK = 8

Max allowed nesting levels (size of block table).

const hash_index MAX_HASH = 512

Max size of hash table.

const sym_index MAX_SYM = 1024

Max size of symbol table.

const pool_index BASE_POOL_SIZE = 1024

Base size of string pool.

const sym_index NULL_SYM = -1

Signifies ‘no symbol’.

const int ILLEGAL_ARRAY_CARD = -1

Signifies a non-int array size. The code using this constant has already been written for you.

The types block_index, hash_index, pool_index and sym_index index the various tables.

The union constant_value is used in constant symbols for storing the constant’s value.

The tag type of an object in the symbol table is one of the elements in sym_type. The install_symbol() method initiates the tag type to SYM_UNDEF if the installation actually takes place.

The symbol table, sym_tab, is a global variable pointing to an instance of the symbol_table class; its index sym_pos is private. You can never access the actual table from outside the symbol_table class. There are access methods for the data members you will need to set or query.

The block table, block_table, and its index, current_level, are private data members of the symbol_table class (the code comments refer to the block table and associated methods as the “Display”, which is what it really is).

The string pool, string_pool, and its index, pool_pos, are private data members of the symbol_table class.

The types void_type, integer_type, and real_type are just global constants denoting the three types symbols can have in diesel. void_type means “no type”, equivalent to a Pascal procedure, for example. Using these types, other phases of the compiler can check the type of an object in a convenient way:

if (sym_tab->get_symbol_type(pos) == integer_type) {
  // ...
}

All the symbol classes are declared in the file symtab.hh, and their implementations can be found in the file symbol.cc. You will need to use this information a lot in this lab.

String table methods:

class symbol_table

Public Functions

pool_index pool_install(char*)

Install a string (an identifier or a string constant) in the string pool. Returns the index to the installed string.

char *pool_lookup(const pool_index)

Given a pool_index into the string pool, returns the string it points to.

bool pool_compare(const pool_index, const pool_index)

Compare two strings taking their respective pool_index as arguments. Returns true if they are identical, and false otherwise. Note that it is not enough to just check that the indices are identical. Why not?

pool_index pool_forget(const pool_index)

Remove last installed entry from the string pool. This will only be useful if you opt to implement shared strings.

char *fix_string(const char*)

Remove double '' in strings constants.

Internalizes a string constant. You hopefully use this method in several labs.

char *capitalize(const char*)

Capitalizes a string. You hopefully use this method in several labs.

Hash table methods:

class symbol_table

Public Functions

hash_index hash(const pool_index)

Given a pool_index to a string, return its hash index.

Display methods:

class symbol_table

Public Functions

sym_index current_environment()

Returns the symbol table index to the current environment, i.e., procedure/function.

Symbol table methods:

class symbol_table

Public Functions

void print(int)

Very useful method for dumping the symbol table to stdout. The value of the argument determines what info will be printed:

1

Print one line of info per symbol table entry.

2

Print (only) the string table, showing the current pool_pos.

3

Print (only) the non-zero elements in the hash table.

any other

Print detailed information about every symbol in the symbol table. Watch out, though: this gets very long if you have more than a few symbols installed.

pool_index get_symbol_id(const sym_index)

Given a symbol table index, return its pool_index, or 0 (zero) if no such symbol existed.

sym_index get_symbol_type(const sym_index)

Given a symbol table index, returns its type (void_type, integer_type, or real_type). If the symbol didn’t exist, returns void_type.

sym_type get_symbol_tag(const sym_index)

Given a symbol table index, returns its tag type (i.e., the tag marking which symbol class it is: SYM_CONST, SYM_VAR).

sym_index enter_constant(position_information*, const pool_index, const sym_index, const long)

Given position information, a pool_index to an identifier’s name, a sym_index to the desired type and an int representing the actual constant value, generate and install a constant of the desired type.

sym_index enter_constant(position_information*, const pool_index, const sym_index, const double)

Given position information, a pool_index to an identifier’s name, a sym_index to the desired type and an int representing the actual constant value, generate and install a constant of the desired type.

sym_index enter_variable(position_information*, const pool_index, const sym_index)

Given position information, a pool_index to an identifier’s name, and a sym_index to the desired type, generate and install a variable of the desired type.

sym_index enter_variable(const pool_index, const sym_index)

This convenience method is used for installing temporary variables for which position information is not relevant. See quads.cc (lab6).

sym_index enter_array(position_information*, const pool_index, const sym_index, const int)

Given position information, a pool_index to an identifier’s name, a sym_index to the desired type, and an array cardinality, generate and install an array of the desired type.

sym_index enter_function(position_information*, const pool_index)

Given position information, and a pool_index to an identifier’s name, generate and install a function.

The type is set later on since it will not be known at symbol install time.

sym_index enter_parameter(position_information*, const pool_index, const sym_index)

Given position information, a pool_index to an identifier’s name, and a sym_index to the desired type, generate and install a parameter of the desired type.

sym_index enter_nametype(position_information*, const pool_index)

Given position information, and a pool_index to an identifier’s name, generate and install a nametype of the desired type.

Since DIESEL doesn’t allow user-defined types, you will never need to use this method. It is used from the symbol table class constructor to install the default types, though.

NOTE: Should perhaps be marked private?

All the enter_ methods call the internal method install_symbol() with the appropriate parameters. The common attributes (described in ) are set in install_symbol(), but the specific attributes are set in the enter_-method that called install_symbol(). A newly installed symbol will have its tag field set to SYM_UNDEF, because otherwise there will be a name collision. Take care to study all the symbol classes and make sure you set the appropriate attributes. Don’t overlook the constructors either; some work has already been done for you.

As far as constants are concerned the values of all attributes at call time to enter_constant() are known, but this is not the case with, for example, enter_procedure(). The size of a procedure (the attribute ar_size) can obviously not be known before we know how many local variables there are.

Study how enter_variable(), for example, uses the fact that current_environment() always returns the nearest surrounding procedure or function to update its ar_size attribute. Pay attention to how the safe down-casting is done; it’s not that hard to understand once you get the idea, and it is a technique used throughout this lab course.

Debugging help

All symbols can be sent directly to cout. The entire symbol table can be printed using the print() method with various arguments (see above). (For those who know C++ well, there are certain things called io manipulators which are used to regulate the level of detail printed about a symbol, but you really shouldn’t need to use them. Their implementation resides at the bottom of the file symbol.cc for those who are interested.)

How do I know it’s correct?

When you compile your program using make, a binary executable file called symtab will be created. It will run the code residing in the symtabtest.cc file, simulating the parsing of a code fragment.

There are three tests you should do: 2a, 2b and 2c. You run these by passing a flag to the executable, for example: ./symtab a. The tests should have different values depending on which test you are running. These are described at the top of every test output (see Example execution).

Test 2a shows if your symbol table works, 2b tests if your program complains about re-declarations and 2c tests all the various enter_-methods.

Reporting your work

  • For the demonstration, run the test program (symtab) for the three test cases 2a-c and discuss your code.

  • Hand in your code as described in Handing in Results.

Example execution

./symtab a
Listing 5 trace/symtab2a.trace
Starting symtabtest.cc...
Inside p2:
symbol:
  id:        c
  type:      INTEGER
  level:     3
  hash_link: 15
  back_link: 99
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

symbol:
  id:        b
  type:      REAL
  level:     2
  hash_link: 11
  back_link: 98
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

symbol:
  id:        a
  type:      INTEGER
  level:     1
  hash_link: -1
  back_link: 97
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

Inside p1:
symbol:
  id:        c
  type:      REAL
  level:     2
  hash_link: 12
  back_link: 99
  offset:    8
  tag:       SYM_VAR 
  class:     variable_symbol

symbol:
  id:        b
  type:      REAL
  level:     2
  hash_link: 11
  back_link: 98
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

symbol:
  id:        a
  type:      INTEGER
  level:     1
  hash_link: -1
  back_link: 97
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG4prog1a1b1c2p11b1c2p21c
-----------------------------------------------------------------------------------^ (pool_pos = 83)

Symbol table (size = 17):
Pos  Name      Lev Hash Back Offs Type      Tag
-----------------------------------------------
  0: GLOBAL.     0   -1  159    0 GLOBAL.   SYM_PROC      lbl = -1 ar_size = 0  
  1: VOID        0   -1   82    0 VOID      SYM_NAMETYPE  
  2: INTEGER     0   -1  462    0 VOID      SYM_NAMETYPE  
  3: REAL        0   -1  324    0 VOID      SYM_NAMETYPE  
  4: READ        0   -1  316    0 INTEGER   SYM_FUNC      lbl = 0  ar_size = 0  
  5: WRITE       0   -1  139    0 VOID      SYM_PROC      lbl = 1  ar_size = 0  
  6: INT-ARG     0   -1  210    0 INTEGER   SYM_PARAM     
  7: TRUNC       0   -1  332    0 INTEGER   SYM_FUNC      lbl = 2  ar_size = 0  
  8: REAL-ARG    0   -1  427    0 REAL      SYM_PARAM     
  9: prog        0   -1   24    0 VOID      SYM_PROC      lbl = 3  ar_size = 24 
 10: a           1   -1   97    0 INTEGER   SYM_VAR       
 11: b           1   -1   98    8 INTEGER   SYM_VAR       
 12: c           1   -1   99   16 INTEGER   SYM_VAR       
 13: p1          1   -1  161    0 VOID      SYM_PROC      lbl = 4  ar_size = 16 
 14: b           2   -1   98    0 REAL      SYM_VAR       
 15: c           2   -1   99    8 REAL      SYM_VAR       
 16: p2          2   -1  162    0 VOID      SYM_PROC      lbl = 5  ar_size = 8  
 17: c           3   -1   99    0 INTEGER   SYM_VAR       
ENDING TEST PROGRAM RUN -----------------------------



./symtab b
Listing 6 trace/symtab2b.trace
Starting symtabtest.cc...
Type conflict, line 0, col 0: Redeclaration: symbol:
  id:        a
  type:      INTEGER
  level:     1
  hash_link: -1
  back_link: 97
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

Inside p2:
symbol:
  id:        c
  type:      INTEGER
  level:     3
  hash_link: 15
  back_link: 99
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

symbol:
  id:        b
  type:      REAL
  level:     2
  hash_link: 11
  back_link: 98
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

symbol:
  id:        a
  type:      INTEGER
  level:     1
  hash_link: -1
  back_link: 97
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

Inside p1:
symbol:
  id:        c
  type:      REAL
  level:     2
  hash_link: 12
  back_link: 99
  offset:    8
  tag:       SYM_VAR 
  class:     variable_symbol

symbol:
  id:        b
  type:      REAL
  level:     2
  hash_link: 11
  back_link: 98
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

symbol:
  id:        a
  type:      INTEGER
  level:     1
  hash_link: -1
  back_link: 97
  offset:    0
  tag:       SYM_VAR 
  class:     variable_symbol

7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG4prog1a1a1b1c2p11b1c2p21c
-------------------------------------------------------------------------------------^ (pool_pos = 85)

Symbol table (size = 17):
Pos  Name      Lev Hash Back Offs Type      Tag
-----------------------------------------------
  0: GLOBAL.     0   -1  159    0 GLOBAL.   SYM_PROC      lbl = -1 ar_size = 0  
  1: VOID        0   -1   82    0 VOID      SYM_NAMETYPE  
  2: INTEGER     0   -1  462    0 VOID      SYM_NAMETYPE  
  3: REAL        0   -1  324    0 VOID      SYM_NAMETYPE  
  4: READ        0   -1  316    0 INTEGER   SYM_FUNC      lbl = 0  ar_size = 0  
  5: WRITE       0   -1  139    0 VOID      SYM_PROC      lbl = 1  ar_size = 0  
  6: INT-ARG     0   -1  210    0 INTEGER   SYM_PARAM     
  7: TRUNC       0   -1  332    0 INTEGER   SYM_FUNC      lbl = 2  ar_size = 0  
  8: REAL-ARG    0   -1  427    0 REAL      SYM_PARAM     
  9: prog        0   -1   24    0 VOID      SYM_PROC      lbl = 3  ar_size = 24 
 10: a           1   -1   97    0 INTEGER   SYM_VAR       
 11: b           1   -1   98    8 INTEGER   SYM_VAR       
 12: c           1   -1   99   16 INTEGER   SYM_VAR       
 13: p1          1   -1  161    0 VOID      SYM_PROC      lbl = 4  ar_size = 16 
 14: b           2   -1   98    0 REAL      SYM_VAR       
 15: c           2   -1   99    8 REAL      SYM_VAR       
 16: p2          2   -1  162    0 VOID      SYM_PROC      lbl = 5  ar_size = 8  
 17: c           3   -1   99    0 INTEGER   SYM_VAR       
ENDING TEST PROGRAM RUN -----------------------------



./symtab c
Listing 7 trace/symtab2c.trace
Starting symtabtest.cc...
Current environment: 0
7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG9test_proc
----------------------------------------------------------------------^ (pool_pos = 70)
Current environment: 9
Constant stored as:
symbol:
  id:        test_const1
  type:      REAL
  level:     1
  hash_link: -1
  back_link: 183
  offset:    0
  tag:       SYM_CONST 
  class:     constant_symbol
  const_value.rval:2.45
Current environment: 0
7GLOBAL.4VOID7INTEGER4REAL4READ5WRITE7INT-ARG5TRUNC8REAL-ARG9test_proc11test_param111test_param211test_const19test_var19test_func11test_array1
------------------------------------------------------------------------------------------------------------------------------------------^ (pool_pos = 138)

Symbol table (size = 15):
Pos  Name      Lev Hash Back Offs Type      Tag
-----------------------------------------------
  0: GLOBAL.     0   -1  159    0 GLOBAL.   SYM_PROC      lbl = -1 ar_size = 0  
  1: VOID        0   -1   82    0 VOID      SYM_NAMETYPE  
  2: INTEGER     0   -1  462    0 VOID      SYM_NAMETYPE  
  3: REAL        0   -1  324    0 VOID      SYM_NAMETYPE  
  4: READ        0   -1  316    0 INTEGER   SYM_FUNC      lbl = 0  ar_size = 0  
  5: WRITE       0   -1  139    0 VOID      SYM_PROC      lbl = 1  ar_size = 0  
  6: INT-ARG     0   -1  210    0 INTEGER   SYM_PARAM     
  7: TRUNC       0   -1  332    0 INTEGER   SYM_FUNC      lbl = 2  ar_size = 0  
  8: REAL-ARG    0   -1  427    0 REAL      SYM_PARAM     
  9: test_proc   0   -1  211    0 VOID      SYM_PROC      lbl = 3  ar_size = 8  
 10: test_param1 1   -1  481    0 INTEGER   SYM_PARAM     
 11: test_param2 1   -1  482    8 REAL      SYM_PARAM     prec = test_param1 
 12: test_const1 1   -1  183    0 REAL      SYM_CONST     value = 2.45
 13: test_var1   1   -1  249    0 INTEGER   SYM_VAR       
 14: test_func   1   -1  427    0 INTEGER   SYM_FUNC      lbl = 4  ar_size = 24 
 15: test_array1 2   -1  143    0 INTEGER   SYM_ARRAY     card = 3   
ENDING TEST PROGRAM RUN -----------------------------



1

A name denotes an object. An identifier is a string, e.g.“abc”.

2

An open hash table means longer search time.

3

The main program is viewed as a procedure.