CS 524 - Compiler Construction
Kris Stewart
Phase2: Strings - Due: 05March08
25Feb08- Update
Part 1 Front End
Hold on to your socks. This will be a dive into the
data structures and modules that are used by our compiler
to implement strings. There are definitely many ways to accomplish
upgrades to the compiler. There are may be other ways to implement
strings and if you feel that you understand the project, you are definitely
encouraged to pursue your own investigations.
Think about Forming Groups
You should talk with your classmates to form groups of two or three persons
who will work together on the remaining phases of the project.
One student will be the Point of Contact (POC) and
should send email to stewart@rohan.sdsu.edu listing the members of the
group, with a cc: to other group members. Always using your class account.
When you have Phase3 Integer Declarations (more info later) ready to
be checked, the POC sends email to stewart@rohan.sdsu.edu with a Subject
field indicating that CS 524 Phase 3 is ready to be checked and cc: to
other group member class accounts.
Some Suggestions for Phase2: Strings
I have included everything here for the Front End (i.e.
scanning, parsing and semantic analysis) with lots of details that
will begin to make sense as you become familiar with the
compiler project. I have tried to directly indicate where the
changes go to help give you guidance, but there are several modules
in our project and you want to approach things incrementally.
I always get the simple.lex working first and see some new
token being picked up and get simple.lr partially working -
without any semantic actions so I can just watch the
semantic stackdumps have the correct tags. (This means you have to
change some *.h files too). Then you take the big plunge
and implement the actions in sem.c - basically in the order
they will be invoked by your grammar and whatever test data
you choose to run. KEEP THE TEST DATA VERY SIMPLE AT FIRST
- one line in a file is plenty.
My first test file (
q1
) consisted of the line: writeln ("hi");
My second test file (
q2
) was : writeln ("hi",x,y);
My third test file (
q3
) introduced printing ": writeln (" ""hi"" ");
As the semester progresses, the hints will not be this
elaborate, but I want to be sure everyone gets started
well.
This will be a good way to become familiar with the command
line directives that you can use to control debugging
output.
Implementing strings:
Create a new directory, Phase2, for this
part of the project and copy your files that correctly implemented
Phase1: Comments into this directory.
- 1. Modify simple.lex and simple.lr
-
Modify simple.lex to recognize strings and return a
token (say, Strconst). See the
Macro document [under
Resources from class page for a
definition of our strings. See the
SPIM document to see how our target language works
with strings (discussed in another handout: Strings: Part2 - Back End
dealing with code generation).
** Updated 15Feb08 **
simple.lex for Phase2. This include the module
strip quotes to assist you in learning the interface between your project,
the Unix tool lex and the process of recognizing the new token of a String Constant.
Note, although you are being given a routine to handle the scanning of strings,
you still need to create for yourself the grammar that brings the String Constants
into our Ada-like language.
You also must change simple.lr to include Strconst in the list
of tokens (at the beginning of this file) which the parser (generated by yacc)
expects to have returned by the scanner (generated by lex).
- 2. Partially test project
- Execute make and with the new simple, try
simple -d q1
and the code should be able to
print out Token: Strconst, but then it dies.
- 3. Modify simple.lr to include new grammar rules
-
We must include grammar rules that describe how Strings can occur in
our source language, which is limited to allowing the writestmt to have
strings as parameters. These are the changes I used:
actparam : expr {createactparam();}
|
Strconst {pushstring(yytext);
createactparam();}
Reading section 3.4.2 on Lex, you see that the variable, yytext, is
the buffer that lex maintains and has the most recently scanned
input token.
You should also continue the corresponding call to lr_debug
so that the simple -y option will give diagnostic output.
- 4. Partially test project
-
We want to see that the grammar rule makes sense, which requires
editting sem.c to create a stub for
pushstring. A stub is the outline of a new module, without
any action in the body of the module. It will allow us to link up
the project and examine how the grammar change affects it.
I used
pushstring()
{
}
- 5. Partially test again
-
You should be able to see the action pushstring getting invoked,
but then the compiler dies.
- 6. Actually implementing strings [Front End]
-
We don't want an arbitrary limit on the length of strings.
Also don't want to allocate some maximum amount for each
string (say an array with 80 character elements) since most
strings are short and we don't want to waste space.
Therefore, we'll have a linked list.
The way I chose to implement strings used the global
constant list Prog.const_listptr (declared in astdef.h and
initialized InitSemantics in sem.c). Every time a string is
encountered, it will be added to this constant list. The
AST that is being built for a write statement (since that's
the only place strings occur) has a paramlist for the items
in the write. When the item is a string, then we need a
pointer into the constant list indicating which string it
is. You'll find several routines (discussed later in these
notes) in codegen.c that manipulate information given a
pointer to the constant list which are already written.
-
7. Modify structures for strings.
-
a-i) paramnode (in semform.h)
- needs an additional tag
String, in addition to reference and Value. Consider the source
code writeln (x,3,"hi");. This gives an example of a
reference parameter (x), a value parameter (3) and a
string parameter ("hi") in the same list.
-
a-ii) the actual paramnode now has a field for a string,
-
the paramnode is a pointer into the global constant list
that holds all strings (note, now only the
" " blank string
is there).
The union defined for PARAMNODE in semform.h needs the
addition of
struct constlist *where
- b) Semantic Stack refers to strings
-
the semantic stack needs to hold string constants - add
to SEMSTACKTAG (in semstack.h) an entry for strconstentry
(12). Add a field to the semstackentry to hold the
character string, i.e.
char *strconst; /* strconstentry */
-
8. Modify sem.c
- Modify sem.c to accomodate new semantic actions from simple.lr
- a) pushstring(s)
char *s;
-
You want to take the input string s and place it on the top
of the semantic stack. Look at how pushid takes the
identifier name and moves it to the semstack. You must
allocate space for the string in the semstack and then change
the tag to be strconstentry. Watch out how you had lex
handle the string. If you use the example from the book,
then the first and last " has been stripped off. When you
get to the point of generating SPIM code, it will have to be
S1: .asciiz "whatever"
We already have a routine (constorage in codegen.c) that
will traverse the global constant list and write this out -
assuming it is given just the string which is
the case when you've strip_quotes. You'll need to tidy
things up (once you get everything else running) and make
sure your listing file (s.list) has the same input, but
don't worry about this at first.
-
b) createactparam
- there will be either a strconstentry or
exprentry on the stack. need to check which one. In either
case, change to parmentry on Top of stack. Put in a
"switch" on the kind of entry that is on the stack. If an
exprentry, you already have the code. If strconstentry, you
want to AddStringConstant (located in sem.c - no changes
needed) using the top of the stack. Then set the param type
to String and param->pe.where to Prog.const_listptr, which
now points to the new entry in the constant list. For both
cases, you already have the code that allocates the new
paramnode and links it into the front of the linked list.
The difference is the kind of information you store in the
paramnode.
-
c) print_stack
- needs to have the capability to output a
strconstentry. Just mimic the code that is already there
for identry.
- 9. Partially test the project
-
You should be able to get a little further and the stack should contain
a string that looks correct. Then your compiler will die.
Look at s.out - you already will have the new strings dumped out in the
LABEL CONST section of the output SPIM code.
*** p.s. while you are in sem.c, I noticed a module (MergeAddOps)
which is no longer invoked by the grammar and never used. So I
would recommend removing this small module from sem.c
- 10. Make the changes needed in printree.c
-
printree.c produces the Abstract Syntx Tree, which
will guide a lot of our developing understanding of grammers, so that
you can get the AST which has the writeln list of parameters.
this will involve printparam
Additional hints will be available for the code
generation portion [Back End]. Examine the spim
document to see how strings are handled.
Return to Class Page