Symbol Tables DragonBook Section 2.7 additions
CS 524 - Spring 2008

Return to class page
From your Computer Science background, Data Structures and Programming Languages, I can assume that you have already worked with Linked Lists, Binary Search Trees and Hash Tables, and this handout provides a focus on the precise materials from Chapter 8 that we will need for our project later on and provides a focus for preparing for our midterm next week.

Basic Implementation Techniques will review Ordered Lists, Unordered Lists, Binary Search Trees and Hash Tables in case you need a reminder. Also this helps set the terminology.

Block-Structured Symbol Tables

There are 2 main competing structures for implementing symbol tables,

But now we want to look at implementing possibly nested procedures. So let's focus first on the translation process.

We define the visibility rules for names in the presence of multiple scopes:

  1. at any point in the text of a program, only names declared in the current scope and in the open scopes containing the current scope are accessible
  2. if a name is declared in more than one open scope, the INNERMOST declaration, the one nearest the reference, is used to interpret a reference to that name
  3. new declarations can be made only in the current scope

The implication of these rules is that when a scope is closed, all declarations made within the scope become inaccessible.

Our text has Figure 8.5 Nested Scope Example fully worked out in

Lets use an example which I will built in lecture today and start the discussion of how this will interface with our project - Phase 3: Integer Declaration (with implications for much later Phase 7: Subroutines with local and global variables).

          package main is body                -- open level 0
               ma: integer;

               procedure a is           
                    aa,ab: integer;           -- open level 1

                    procedure b is       
                         ba: integer;         -- open level 2

                         procedure c is 
                              ca,cb: integer; -- open level 3
                         begin -- c
                              ...

                         end;  -- c           -- close level 3

                    begin -- b
                        ...
                    end;  -- b                -- close level 2

               begin -- a
                   ...
               end;  -- a                     -- close level 1

          begin -- main;
              ...
          end;  -- main;                      -- closel level 0

An Individual Table for Each Scope

With this idea, we build a separate symbol table as each scope is opened to take care of its variables. This table might be either a binary tree or a hash table. We would need to maintain a Scope_Stack to keep track of the "static nesting" (i.e. known at compile-time). When we parse the procedure a - we open a symbol table for a's vars and PUSH it onto the stack. When we parse procedure b - we open another table for b's vars and PUSH it onto the stack. And so forth for c. We have a global variable Scope_Number keeping track of the current scope being translated.

When translating the body of c, we have the following:

                                   ----------------
                                 / |  table with  |
               -----------      /  |    ca,cb     |
               | level 3 | ----/   ----------------
               -----------
               | level 2 | ----------------------------> ---------------
               -----------                               |  table with |
               | level 1 | -----------> ---------------  |    ba,c     |
               -----------              |  table with |  ---------------
               | level 0 | -------\     |   aa,ab,b   |
               -----------         \    ---------------
               |         |          \
               -----------          ----------------
               Scope_Stack          |  table with  |
                                    |   ma, a      |
                                    ----------------
When translating a statement within the body of c, we must search in the symbol tables of all open scope for that name. Supposed we had a statement (in the body of c):

ma := ba + aa + cb;

We first look up "ma" in the level 3 table and do not find it. Then the level 2 table, the level 1 table and finally the level 0 table, where "ma" is found. Therefore we know all references to ma will involve

(level 0, offset)

but at this point of the translation, we only need to build the stmt structure for an assignment which contains a pointer into the symbol table for the l.h.s. of the assignment - therefore a pointer into the "level 0 table".

"ba" will be referenced and begin the PUSHing onto the semantic stack as we build the information for an EXPRTREE. We first search the level 3 table, and find "ba" in the level 2 table, so the pointer in the EXPRTREE will be to the level-2 table.

It's easy to easy that the "visibility rules" are satisfied since we will find the INNERMOST declaration of "name" first.

When we reach the end of translating the body of procedure c, we can just POP the Scope_Stack, making the variables declared in c inaccessible.

NOTE - the "table" could be either a binary tree or a hash table.

A Single Symbol Table

Alternatively, we could have a single symbol table and keep track explicitly of the scope number of each name. In the example above, each nested scope had variables with different names, but try to imagine what would happen if each nested scope reused the same name in the following example.


Let's see what the competing ideas of binary tree vs. hash table will look like in this instance.

Assuming we using chaining to resolve the conflict in hash table, i.e. when 2 names hash to the same index in the hash table, we add the "newer name" to the FRONT of the linked list of names in the chain. For this example, let's assume:

                       name       hash location
                         ma             3
                          a             5
                         aa             4
                         ab             2
                          b             1
                         ba             3
                          c             5
                         ca             3
                         cb             4
Not a very good hash function but good for examples. Note, we must store both the "name" and the "scope-number".
          -----------
     1    |         | ------> b (1)  -> NULL
          -----------
     2    |         | ------> ab (1) -> NULL
          -----------
     3    |         | ------> ca (3) -> ba (2) -> ma (0) -> NULL
          -----------
     4    |         | ------> cb (3) -> aa (1) -> NULL
          -----------
     5    |         | ------> c (2)  -> a (0)  -> NULL
          -----------
          hash table
When a new scope is entered, we increment the global counter for current_scope_number and proceed with inserting new "name" in the hash table. Note - the higher scope numbered variables will always be the "first find" when we walk the CHAINed LIST of all "name"s that hashed to the same hash table location.

If we are translating a procedure at scope level 3 and are looking for a "name" that hashed to location 2 in the hash table for example, then we need to walk the entire chain to NULL before we can flag the variable as undeclared OR we find the "name" and its scope number.

When we close a scope, then we have to check every entry in the hash table (i=1,2,3,4,5 above) and walk the chains to remove names. Suppose we just closed the procedure c scope (i.e. level 3). We look at each hash table chain in turn. Hash-entry 2 begins with a variable at level 1, so we know we do not have to do anything. Hash-entry 3 has a level 3 name, so we delete it from the front of the list. But the next entry is at level 2, so we are done. Hash table entry 4 has a level 3 var, which we delete, but the next name is level 1, so we are finished with that chain. Since the higher scope-level names are always at the front of the hash-table-chain-list, it is very easy to close a scope. Similarly, we always find the INNERMOST declaration of any "name".

If we use a single binary search tree, then opening and closing scopes is not so simple. The level 3 names will all be at the leaves of the global binary search tree - since they were inserted last!

In summary, if we use "hash tables" then a single hash table will easily handle open/close scope rules.

If we use "binary trees", then a Scope_Stack will easily handle open/close scope rules.


NOTE: We may actually have more elaborate declarations. There could be a variable declared for procedure b appearing after the entire declaration of procedure c. This would mean that this variable would not be visible to procedure c!

                    procedure b is
                         ba: integer;

                         procedure c is
                              ca,cb: integer;
                         begin -- c
                              ...

                         end;  -- c

                         newone: integer; -- c CANNOT "see" newone!
                    begin -- b
                        ...
                    end;  -- b


It is useful to examine how the symbol table is handled in our current running project.