CS 524 Compiler Construction
Kris Stewart
Strings Hints - Back End

Phase 2 due Friday 29Feb08

25Feb08 ** update **
NOTE: This is the second part of Hints for Implementing Strings; Part 1: Front End should be read first. I am trying to point out the changes needed to use SPIM labels to give each string a label name. Notice that your variables are handled with register/offset addressing, while strings will be handled differently.

The q0 sample (with Tokens, SemStack, AST, and SPIM output) may be useful as a guide for your own testing.

Test 1 uses listdoit, which is a script file you have in your Phase0 directory.
Test 1 with parse info. Update: next week we start Chapter 7: Semantic Processing, and the semstack will be important.
Test 2 with Tokens and SemStack
Two different ways to handle strings in simple.lex, simple.lr, and codegen.c are described here:
```
I want to point out a change to the "FrontEnd" that might help with
Scanning/Parsing.  There are several ways to handle strings and I want
to outline two of them now.  Our text (p. 67) gives two lex rules to
recognize strings, and I found \"([^\n\"]|\"\")*\" to be the most useful,
using the character class [^\n\"] to exclude both EOL and the " character.
----------------------------------------------------------------
a) 
simple.lex 
\"([^\n\"]|\"\")*\"             return token(StrConst);

along with the stripquotes routine, slightly modified to translate each
"" in the input buffer, yytext, to \"

and then in simple.lr
actparam        :       expr
		{lr_debug ("actparam  -> expr");
		CreateActParam(); }
		|
		StrConst
		{lr_debug ("actparam -> StrConst");
		strip_quotes(yytext);
		PushString(yytext);
		CreateActParam(); }

in order to keep the LISTING file valid
----------------------------------------------------------------
** OR **
----------------------------------------------------------------
b)
simple.lex
\"([^\n\"]|\"\")*\"          { stripquotes(); return token( StrConst );}

along with the stripquotes routine that only removes the front and end 
" of the input buffer

and then in simple.lr
actparam        :       expr
		{lr_debug ("actparam  -> expr");
		CreateActParam(); }
		|

		StrConst
		{lr_debug ("actparam  -> StrConst");
		PushString( yytext );
		CreateActParam(); }

and then in codegen.c, the routine ConstStorage can perform the
translation of "" to \" just in time for SPIM.
```
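The email text above describes the two front-end choices in prose, so here is a minimal C sketch of them, working on an ordinary char array in place of yytext. The names match_str_const, strip_outer_quotes, and strip_and_translate are hypothetical; they are not the course's stripquotes or strip_quotes routines. match_str_const is a hand-coded equivalent of the lex rule, included only to show what the pattern accepts: a doubled quote stays inside the constant, and a newline before the closing quote means no match.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical hand-coded equivalent of the lex rule  \"([^\n\"]|\"\")*\"
 * Returns the length of the longest string constant at the start of s,
 * or 0 if there is none (like lex, it prefers the longest match). */
static size_t match_str_const(const char *s)
{
    size_t best = 0, i = 1;
    if (s[0] != '"')
        return 0;
    while (s[i] != '\0' && s[i] != '\n') {   /* [^\n"] never crosses EOL */
        if (s[i] == '"') {
            best = i + 1;                    /* s[0..i] is a complete constant */
            if (s[i + 1] != '"')
                break;                       /* a lone quote cannot be extended */
            i += 2;                          /* "" is an embedded quote */
        } else {
            i++;
        }
    }
    return best;
}

/* Option (b): drop only the opening and closing quote, in place. */
static void strip_outer_quotes(char *buf)
{
    size_t n = strlen(buf);
    assert(n >= 2 && buf[0] == '"' && buf[n - 1] == '"');
    memmove(buf, buf + 1, n - 2);
    buf[n - 2] = '\0';
}

/* Option (a): drop the outer quotes and turn each "" into \" in place.
 * Assumes buf is exactly one constant accepted by the rule above.
 * The output is never longer than the input, so one buffer suffices. */
static void strip_and_translate(char *buf)
{
    size_t n = strlen(buf), in = 1, out = 0;
    assert(n >= 2 && buf[0] == '"' && buf[n - 1] == '"');
    while (in < n - 1) {                     /* stop before the closing quote */
        if (buf[in] == '"' && buf[in + 1] == '"') {
            buf[out++] = '\\';
            buf[out++] = '"';
            in += 2;
        } else {
            buf[out++] = buf[in++];
        }
    }
    buf[out] = '\0';
}
```

On the q3 source string " ""hi"" there", strip_and_translate leaves ` \"hi\" there` (the text shown for S2 in the sample output), while strip_outer_quotes leaves ` ""hi"" there` for ConstStorage to translate later.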
You should examine the WriteInstr routine in codegen.c to see how the Wrln instruction generates SPIM code to write the newline string.
11. codegen.c
a) GenCode provides a good model for strings

char *new_line = "\\n";
and the use of
AddStringConstant (new_line);
to add this string to the front of the list is a good model to look at. NOTE: the routine AddStringConstant has already been written for you in sem.c and is used in codegen.c to add the newline sample to the initial project. You will be using AddStringConstant much more now, in translating source code with strings, which is why it made sense to place it in sem.c from the beginning. This string will always be the first string in the string list, since it is added in codegen.c, after sem.c has added the new strings from the parse.
You don't need to change anything in GenCode.
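AddStringConstant is already written for you, so there is nothing to implement here; the sketch below only illustrates the front-of-list behavior described above. The struct str_node type, its fields, and add_string_constant are hypothetical stand-ins for whatever sem.c actually uses. Adding the two q3 strings in source order and then the newline string last leaves the newline at the front, which matches the S0, S1, S2 order in the q3 sample output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node for the global string list. */
struct str_node {
    char *text;              /* contents without the outer quotes */
    int label;               /* label number; -1 until labels are assigned */
    struct str_node *next;
};

static struct str_node *string_list = NULL;   /* head of the global list */

/* AddStringConstant-style helper: copy the text and push it on the
 * FRONT of the list. Returns the new node (NULL if out of memory) so a
 * caller can keep a pointer to it, e.g. in the AST node for a parameter. */
static struct str_node *add_string_constant(const char *text)
{
    assert(text != NULL);
    struct str_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->text = malloc(strlen(text) + 1);
    if (n->text == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->text, text);
    n->label = -1;
    n->next = string_list;   /* old head goes behind the new node */
    string_list = n;
    return n;
}
```

Because each call prepends, the list ends up in reverse order of insertion; the string added last (new_line, from codegen.c) is always at the head.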

b) CalcConstOffsets()
I recommend that you assign each string a unique label number. Notice that the beginning version of the compiler created
S0: .asciiz "\n"
to correspond to the new_line string above. Continue this model. This string is what lets writeln be translated differently from write. Construct a test problem, say
```
x:=2;
write(x);
writeln(x);
```
to see the use of the initial string constant S0 above. The statement write(x); is translated to a WriteProc, while writeln(x); becomes a WritelnProc; both are Standard Procedure Calls (StdProcCall) handled in the GenStmt routine in codegen.c.
CalcConstOffsets traverses the global list of strings that sem.c constructed while building the Abstract Syntax Tree for write and writeln statements. It computes a new label number for each string and sets the appropriate field in the string list.
Labels S1, S2, and so on will be used in the SPIM code your compiler generates. Note that since the new_line string is added to the front of the list last (in codegen.c, after the "FrontEnd" semantic processing by sem.c), it will be the first labelled string and therefore will have label S0. Examine GenStmt for the details of how S0 is used for writeln but not for write, as discussed in the paragraphs above.
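The numbering itself is a single pass over the list. Here is a minimal sketch, again with a hypothetical str_node type and a hypothetical calc_const_labels name rather than the course's CalcConstOffsets signature: label numbers are handed out in list order, so whatever sits at the front (the new_line string) becomes S0.

```c
#include <assert.h>
#include <stddef.h>

struct str_node {
    const char *text;
    int label;                /* set here: 0 for S0, 1 for S1, ... */
    struct str_node *next;
};

/* Give each string the next label number, front of the list first.
 * Returns how many strings were labelled. */
static int calc_const_labels(struct str_node *list)
{
    int next_label = 0;
    for (struct str_node *p = list; p != NULL; p = p->next)
        p->label = next_label++;
    return next_label;
}
```

With the q3 list ("\n", " x = ", " \"hi\" there"), this yields labels 0, 1, and 2, matching S0, S1, and S2 in the sample output.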
c) ConstStorage()
This routine "walks" the global string list and generates the appropriate assembler directives
label: .asciiz "whatever"
at the very end of your generated SPIM code. It needs no change, because the label numbers have already been assigned by CalcConstOffsets. Given a pointer into the string list, this routine will write the label address of that string (which is what you need in GenWrites).
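To compare against your own ConstStorage, here is a hedged sketch of the walk and of a "label for this pointer" helper. The names const_storage and string_label, the str_node type, and the exact spacing are assumptions modelled loosely on the q3 sample output. The text is written out unchanged, so it must already be in SPIM escape form (\n, \").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct str_node {
    const char *text;        /* already in SPIM form, e.g. \n or \" */
    int label;               /* assigned earlier by the labelling pass */
    struct str_node *next;
};

/* Format the label of one string ("S0", "S1", ...) into out; this is the
 * name GenWrites needs to load the string's address. Returns the
 * snprintf result (the length of the full label). */
static int string_label(const struct str_node *s, char *out, size_t size)
{
    return snprintf(out, size, "S%d", s->label);
}

/* Walk the list and emit one .data / .asciiz pair per string. */
static void const_storage(FILE *fp, const struct str_node *list)
{
    char label[16];
    for (const struct str_node *p = list; p != NULL; p = p->next) {
        string_label(p, label, sizeof label);
        fprintf(fp, ".data\n%s:\t.asciiz \"%s\"\n", label, p->text);
    }
}
```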
d) GenWrites(params)
Check the type of each param using a switch. If it is a string, then params->pe.where gives the information you will need. Again, use the Wrln portion of WriteInstr as a model.
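A sketch of the per-parameter switch follows. The param_kind enum, the write_param record, and gen_writes are hypothetical; in the real code the string case would get its label from params->pe.where. It uses the standard SPIM print_int (code 1) and print_string (code 4) syscalls and assumes each integer value has already been evaluated into the register named in int_reg. Printing the S0 newline after the parameters for writeln is an assumption about how GenStmt uses S0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum param_kind { PARAM_INT, PARAM_STRING };

struct write_param {
    enum param_kind kind;
    int label;                 /* PARAM_STRING: n in the Sn label */
    const char *int_reg;       /* PARAM_INT: register already holding the value */
    const struct write_param *next;
};

/* Emit SPIM for each parameter of a write/writeln. */
static void gen_writes(FILE *fp, const struct write_param *params, int is_writeln)
{
    for (const struct write_param *p = params; p != NULL; p = p->next) {
        switch (p->kind) {
        case PARAM_STRING:     /* load the string's address, print_string */
            fprintf(fp, "\tla $a0, S%d\n\tli $v0, 4\n\tsyscall\n", p->label);
            break;
        case PARAM_INT:        /* copy the value into $a0, print_int */
            fprintf(fp, "\tmove $a0, %s\n\tli $v0, 1\n\tsyscall\n", p->int_reg);
            break;
        }
    }
    if (is_writeln)            /* writeln ends with the S0 newline string */
        fputs("\tla $a0, S0\n\tli $v0, 4\n\tsyscall\n", fp);
}
```

For the q3 statement writeln(" ""hi"" there"," x = ",x), with the labels from the sample output, this emits the S2, S1, integer, and S0 sequences in that order.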
TEST IT
IT SHOULD BE WORKING FOR ALL CASES INVOLVING SIMPLE QUOTED STRINGS

A final consideration is the special case of the embedded double quote ("). q3_listdoit provides the sample source code:
```
-- turn on tokens and semantic stack
-- examine the existing "writeln(x);" and the new
-- phase 2 writeln with strings and expressions
%%t
%%ss
x:=2;
writeln(x);
writeln(" ""hi"" there"," x = ",x);
```
and sample target code, with its constant storage area:
```
#
# Finish up by writing out constants 
.word 0
CONST:          #Constant storage area
.data
S0:     .asciiz "\n"
.data
S1:     .asciiz " x = "
.data
S2:     .asciiz " \"hi\" there"
#
# Reserve space for global variables
.word 0
VARS:           # space for Global Variables
.data
_x:     .word 0 # Offset at 0
.data
```
Notice the \" in S2: SPIM represents an embedded double quote as \".
There are several ways to handle the embedded double quote. Care must be taken, since your compiler "Front End" must generate a valid listing file that reflects the user's input source code. One possible way is to have the routine ConstStorage in codegen.c check the contents of each string for the "" that the source language, MACRO, requires and replace it with the \" that the target language (SPIM) requires. Alternatively, you could have the "Front End" handle this during scanning by lex and parsing by yacc. An updated email will be sent to the class outlining these choices.
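If you choose the back-end route (option b in the email above), the string list keeps the MACRO spelling and the translation happens as each directive is written. A minimal sketch, assuming a hypothetical emit_asciiz helper that your ConstStorage would call once per string: every "" becomes \" and everything else, including the \n of the S0 string, is copied unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one .data/.asciiz pair, translating the
 * MACRO embedded quote "" into the SPIM escape \" on the fly. */
static void emit_asciiz(FILE *fp, int label, const char *text)
{
    assert(text != NULL);
    fprintf(fp, ".data\nS%d:\t.asciiz \"", label);
    for (const char *p = text; *p != '\0'; p++) {
        if (p[0] == '"' && p[1] == '"') {
            fputs("\\\"", fp);   /* MACRO "" becomes SPIM \" */
            p++;                 /* skip the second quote of the pair */
        } else {
            fputc(*p, fp);
        }
    }
    fputs("\"\n", fp);
}
```

Because the list itself is never modified, the scanner can leave yytext exactly as typed, which keeps the listing file faithful to the source.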