CS 524 - Compiler Construction
Kris Stewart
Phase2: Strings - Due: 05March08

25Feb08- Update

Part 1 Front End

Hold on to your socks. This will be a dive into the data structures and modules that are used by our compiler to implement strings. There are definitely many ways to accomplish upgrades to the compiler. There are may be other ways to implement strings and if you feel that you understand the project, you are definitely encouraged to pursue your own investigations.

Think about Forming Groups

You should talk with your classmates to form groups of two or three persons who will work together on the remaining phases of the project.

One student will be the Point of Contact (POC) and should send email to stewart@rohan.sdsu.edu listing the members of the group, with a cc: to other group members. Always using your class account. When you have Phase3 Integer Declarations (more info later) ready to be checked, the POC sends email to stewart@rohan.sdsu.edu with a Subject field indicating that CS 524 Phase 3 is ready to be checked and cc: to other group member class accounts.

Some Suggestions for Phase2: Strings

I have included everything here for the Front End (i.e. scanning, parsing and semantic analysis) with lots of details that will begin to make sense as you become familiar with the compiler project. I have tried to directly indicate where the changes go to help give you guidance, but there are several modules in our project and you want to approach things incrementally.

I always get the simple.lex working first and see some new token being picked up and get simple.lr partially working - without any semantic actions so I can just watch the semantic stackdumps have the correct tags. (This means you have to change some *.h files too). Then you take the big plunge and implement the actions in sem.c - basically in the order they will be invoked by your grammar and whatever test data you choose to run. KEEP THE TEST DATA VERY SIMPLE AT FIRST - one line in a file is plenty.

My first test file ( q1 ) consisted of the line: writeln ("hi");
My second test file ( q2 ) was : writeln ("hi",x,y);
My third test file ( q3 ) introduced printing ": writeln (" ""hi"" ");

As the semester progresses, the hints will not be this elaborate, but I want to be sure everyone gets started well.

This will be a good way to become familiar with the command line directives that you can use to control debugging output.

Implementing strings:

Create a new directory, Phase2, for this part of the project and copy your files that correctly implemented Phase1: Comments into this directory.
1. Modify simple.lex and simple.lr
Modify simple.lex to recognize strings and return a token (say, Strconst). See the Macro document [under Resources from class page for a definition of our strings. See the SPIM document to see how our target language works with strings (discussed in another handout: Strings: Part2 - Back End dealing with code generation).

** Updated 15Feb08 **
simple.lex for Phase2. This include the module strip quotes to assist you in learning the interface between your project, the Unix tool lex and the process of recognizing the new token of a String Constant.

Note, although you are being given a routine to handle the scanning of strings, you still need to create for yourself the grammar that brings the String Constants into our Ada-like language.

You also must change simple.lr to include Strconst in the list of tokens (at the beginning of this file) which the parser (generated by yacc) expects to have returned by the scanner (generated by lex).

2. Partially test project
Execute make and with the new simple, try

simple -d q1

and the code should be able to print out Token: Strconst, but then it dies.

3. Modify simple.lr to include new grammar rules
We must include grammar rules that describe how Strings can occur in our source language, which is limited to allowing the writestmt to have strings as parameters. These are the changes I used:
actparam        :     expr {createactparam();}
                      |
                      Strconst {pushstring(yytext);
                                createactparam();}
Reading section 3.4.2 on Lex, you see that the variable, yytext, is the buffer that lex maintains and has the most recently scanned input token.

You should also continue the corresponding call to lr_debug so that the simple -y option will give diagnostic output.

4. Partially test project
We want to see that the grammar rule makes sense, which requires editting sem.c to create a stub for pushstring. A stub is the outline of a new module, without any action in the body of the module. It will allow us to link up the project and examine how the grammar change affects it.

I used

pushstring()
{
}

5. Partially test again
You should be able to see the action pushstring getting invoked, but then the compiler dies.

6. Actually implementing strings [Front End]
We don't want an arbitrary limit on the length of strings. Also don't want to allocate some maximum amount for each string (say an array with 80 character elements) since most strings are short and we don't want to waste space. Therefore, we'll have a linked list.

The way I chose to implement strings used the global constant list Prog.const_listptr (declared in astdef.h and initialized InitSemantics in sem.c). Every time a string is encountered, it will be added to this constant list. The AST that is being built for a write statement (since that's the only place strings occur) has a paramlist for the items in the write. When the item is a string, then we need a pointer into the constant list indicating which string it is. You'll find several routines (discussed later in these notes) in codegen.c that manipulate information given a pointer to the constant list which are already written.

7. Modify structures for strings.
a-i) paramnode (in semform.h)
needs an additional tag String, in addition to reference and Value. Consider the source code writeln (x,3,"hi");. This gives an example of a reference parameter (x), a value parameter (3) and a string parameter ("hi") in the same list.
a-ii) the actual paramnode now has a field for a string,
the paramnode is a pointer into the global constant list that holds all strings (note, now only the

" " blank string

is there). The union defined for PARAMNODE in semform.h needs the addition of
                struct constlist *where
b) Semantic Stack refers to strings
the semantic stack needs to hold string constants - add to SEMSTACKTAG (in semstack.h) an entry for strconstentry (12). Add a field to the semstackentry to hold the character string, i.e.
                 char *strconst;  /* strconstentry */

8. Modify sem.c
Modify sem.c to accomodate new semantic actions from simple.lr
a) pushstring(s) char *s;

You want to take the input string s and place it on the top of the semantic stack. Look at how pushid takes the identifier name and moves it to the semstack. You must allocate space for the string in the semstack and then change the tag to be strconstentry. Watch out how you had lex handle the string. If you use the example from the book, then the first and last " has been stripped off. When you get to the point of generating SPIM code, it will have to be

S1: .asciiz "whatever"

We already have a routine (constorage in codegen.c) that will traverse the global constant list and write this out - assuming it is given just the string which is the case when you've strip_quotes. You'll need to tidy things up (once you get everything else running) and make sure your listing file (s.list) has the same input, but don't worry about this at first.

b) createactparam
there will be either a strconstentry or exprentry on the stack. need to check which one. In either case, change to parmentry on Top of stack. Put in a "switch" on the kind of entry that is on the stack. If an exprentry, you already have the code. If strconstentry, you want to AddStringConstant (located in sem.c - no changes needed) using the top of the stack. Then set the param type to String and param->pe.where to Prog.const_listptr, which now points to the new entry in the constant list. For both cases, you already have the code that allocates the new paramnode and links it into the front of the linked list. The difference is the kind of information you store in the paramnode.
c) print_stack
needs to have the capability to output a strconstentry. Just mimic the code that is already there for identry.

9. Partially test the project
You should be able to get a little further and the stack should contain a string that looks correct. Then your compiler will die.

Look at s.out - you already will have the new strings dumped out in the LABEL CONST section of the output SPIM code.

*** p.s. while you are in sem.c, I noticed a module (MergeAddOps) which is no longer invoked by the grammar and never used. So I would recommend removing this small module from sem.c

10. Make the changes needed in printree.c
printree.c produces the Abstract Syntx Tree, which will guide a lot of our developing understanding of grammers, so that you can get the AST which has the writeln list of parameters.

this will involve printparam

Additional hints will be available for the code generation portion [Back End]. Examine the spim document to see how strings are handled.

Return to Class Page