Upd: 26March08 * Project Deadlines
This is a long document. I have tried to collect in one place the information you will need as the semester goes along. Do not be intimidated by all the details. The project is set up to ease you into things. The first phase is a one-line change to the file simple.lex; each following phase is more involved, but tries to confine the change to a single file. By the end of the semester you should feel comfortable making big changes, but not until then. The code you are given at the start is a working compiler with many flags that govern its debugging and output capabilities. You should examine your copy of the early compiler to get comfortable with its capabilities. Its diagnostic output will be very helpful later on.
A compiler is an essential tool, but most students in computer science are familiar with it only as a tool; you have never seen the insides. I hope that after this course you will have a better understanding of, and perhaps some sympathy for, some of the compilers you use and their lack of meaningful diagnostic messages, especially at run time.
There are several distinct parts in a compiler. This description will give an overview of the code that comprises your project for CS 524.
We use the simulated machine: MIPS R2000. Dr. James Larus,
from the University of Wisconsin, developed the simulator, SPIM,
to be used in conjunction with the text, "Computer Organization
& Design: The Hardware/Software Interface" by Patterson and
Hennessy.
Move into this directory
and copy the files from stewart's directory
Do not forget the final period in the "cp" above. You may get a warning about some files that were not copied; do not worry about them.
generating the following text response, make-output.txt, and you now will have your own version of the initial compiler. You can work through some of the following examples using the compiler options.
and see what the options and compiler directives available are.
rohan% simple
in simple.c
No parameters specified
usage :
simple [-l list_name][-o code_file][-nc][-nl][-d][-a][-i][-s][-v] source_file
-o name : rename the assembly code file
-l name : rename the output list file
-nc : do not generate code
-nl : do not generate listing
-d : turn off Tokentrace/Semstack flag
-a : turn on assembler dump to screen
-i : dump Intermediate Representation
-s : dump Symbol Table after parse
-v : dump variables addresses
-p : turn on PtraceSem
-c : turn on PtraceCode
-b : turn on PtraceSymtab
-t : turn on PtraceTree
-y : view parser reductions (yacc)
Compiler directives (default values in parens):
%%dmp : Dump the Symbol Table now (off)
%%c : Toggle code generation (on)
%%t : Toggle token trace (off)
%%ss : Toggle semantic stack display (off)
%%pr : Toggle procedure trace (off)
Some interesting, informative samples that you just copied from stewart's directory are:
Using a shorter, simpler test file: texp0
NOTE:
If any source code example has a readln in it (which tfirst does),
then SPIM is going to patiently wait (forever!) for you to type
a number as input for the read.
So, if the system appears hung, the example you are
running might be waiting on a readln - try typing a number.
man lex
man yacc
For more details, the text discusses both of these. You have a
makefile which will properly invoke these tools as needed in our
project.
We will also be using the GNU gcc compiler. It provides some decent run-time debugging help, and the makefile is set up to generate the symbol table needed for the debugger to run. There is a lot of optional output available from the compiler project itself, but when all else fails you can use the symbolic debugger in the following way. Suppose you have just produced a new version of the compiler project (called "simple" by default) and when running the project (using, for example, simple tfirst) you encounter a run-time error.
dbx simple
run -d tfirst
should give you the line number and module in your source code where
the run-time error occurred.
exit
Type exit to leave dbx.
You may also be interested in RCS (Revision Control System) for handling the many updates your project will be going through over the semester. A brief introduction is available:
man rcsintro
man rcs
You are given a compiler for SIMPLE which compiles a "simple language" that includes arithmetic operations (multiplication, division, addition and subtraction), the If-then statement, Assignment statement and I/O statements. We will extend the SIMPLE compiler to implement Macro (subset of Ada/CS). A self-contained description of Macro is available under Documents of this handout and linked above.
The following extensions will be developed over the semester (you should refer to the Macro description for the definition of our language for more details on how these extensions are to look):
** 26March08Update**
GETTING TO HERE GIVES YOU A 'C' ON THE PROJECT
GETTING TO HERE GIVES YOU A 'B' ON THE PROJECT
Extra Credit - Can bring your code score to 10/10 - Due by the last day of finals week. You may request an extension of time to complete the project only for the extra credit phase.
Only groups who have Procedures in by its deadline can make this request. You must send email to stewart@rohan.sdsu.edu stating that you intend to pursue this.
It would be fun to see, but I think it asks too much, so I will just leave this as an indication of how incomplete the compiler is:
To receive an A on the project, your group must budget its time
to have things completed within the semester.
Once you have tested your project executable, simple, and feel it is running to your satisfaction, you should:
When you begin working as a group, you will have identified the members of the group to your instructor and the Point of Contact (POC). The POC submits the notice that the project is ready to be checked and must cc: the other members of the group on this email note.
Do not change the name of the executable. Do not change any of the protections on your file because the script is able to access it as Superuser and I do not want to worry about any unauthorized person trying to access your file.
Over the semester, we will extend the grammar of our language and you will extend the capabilities of your compiler. I recommend the following order.
%%t
as the first line of your new test file, first-d, and make sure all new tokens are being recognized.
simple first-d
simple -y tfirst
to be sure things are working properly with the grammar.
simple -i tfirst
beta :=2;
write (beta,3);
writeln;
readln(gamma);
writeln(gamma);
alpha := 2 + beta * gamma;
writeln(alpha);
if beta = 2 then
beta := beta + gamma * alpha + alpha;
writeln(beta);
end if;
You see we have:
reserved words:  readln write writeln if then end if
symbols:         := + * - / ; ( ) = ,
variable names:  any sequence of characters, beginning with an alpha character
For our project, this is defined in the file simple.lex
The first three phases are accomplished using the technique of syntax directed translation. Our compiler analyzes the input source and produces an "intermediate form", called an "abstract syntax tree" (AST).
For the program above, there is a compiler flag that you can use (-i) that will output this intermediate form. One way to represent the program above is through the sequence of statements (2 writeln's are omitted here for space):
    :=         write      readln        :=
   /  \          |          |          /  \
beta    2        \ list     \list  alpha    +
               beta 3      gamma           / \
                                          2   *
                                             / \
                                         beta   gamma

        ifstmt
        /    \
       /      \
      =      thenlist
     / \        |     writeln
  beta  2       |  := stmt
Once you set up your copy of the initial project by copying
from stewart's file space, you should try
simple -i tfirst
to get a feeling for how this intermediate representation (the
AST) is presented using crude output capabilities.
# Register Usage:
# $s0 for global variables
#
.text
.globl main
main:
la $s0, GVARS
#
# Start Code
#
# Generate Assignment Statement
li $t0,2
sw $t0,4($s0)
#
# Generate Write statement
li $v0, 1
lw $a0,4($s0)
syscall
li $v0, 1
li $a0,3
syscall
#
# Generate Writeln statement
la $a0, S0
li $v0, 4
syscall
#
# Generate Readln statement
li $v0, 5
syscall
sw $v0,8($s0)
#
# Generate Writeln statement
li $v0, 1
lw $a0,8($s0)
syscall
la $a0, S0
li $v0, 4
syscall
#
# Generate Assignment Statement
lw $t0,4($s0)
lw $t1,8($s0)
mul $t0,$t0,$t1
add $t0,$t0,2
sw $t0,0($s0)
#
# Generate Writeln statement
li $v0, 1
lw $a0,0($s0)
syscall
la $a0, S0
li $v0, 4
syscall
#
# Generate If-Then statement
lw $t0,4($s0)
seq $t0,$t0,2
beq $t0, 0,IF0
#
# Generate Assignment Statement
lw $t0,8($s0)
lw $t1,0($s0)
mul $t0,$t0,$t1
lw $t1,4($s0)
add $t0,$t0,$t1
lw $t1,0($s0)
add $t0,$t0,$t1
sw $t0,4($s0)
#
# Generate Writeln statement
li $v0, 1
lw $a0,4($s0)
syscall
la $a0, S0
li $v0, 4
syscall
#
# End If
IF0:
#
# Halt execution
li $v0 10
syscall
#
# Finish up by writing out constants
.word 0
CONST: #Constant storage area
.data
S0: .asciiz "\n"
#
# Reserve space for global variables
.word 0
GVARS: # space for Global Variables
.data
_alpha: .word 0 # Offset at 0
.data
_beta: .word 0 # Offset at 4
.data
_gamma: .word 0 # Offset at 8
.data
Temp_Wr:
.word 0 #just for alignment of write(exprtree)
q tfirst
will invoke the compiler, simple, using command line options
to rename the generated assembler code, tfirst.s, and listing
file, tfirst.list; then tfirst.s is run by SPIM to produce the
following:
execute the code in spim:
SPIM Version 6.2 of January 11, 1999
Copyright 1990-1998 by James R. Larus (larus@cs.wisc.edu).
All Rights Reserved.
See the file README for a full copyright notice.
Loaded: /opt/spim/bin/trap.handler
23 <-- appeared as output
7 <-- I typed the value 7 because the code is waiting on readln
7 <-- resulting output from the code above
16 <-- 2 + 2 (beta) * 7 (gamma)
130 <-- 2 (beta) + 7 (gamma) * 16 (alpha) + 16 (alpha)
There are several other script files available that produce
different amounts of output. In the script "q" above, q stands for
quick and provides a minimal level of output.
A more informative script file is "doit". One that produces much more output is "listdoit", which I recommend using in the following manner:
listdoit tfirst >& tfirst.out
You need the redirect >& in order to capture any error messages
that SPIM outputs. (There should not be any at this
time, but there will be later as the semester progresses.)
-p : turn on PtraceSem - code in sem.c
-c : turn on PtraceCode - code in codegen.c
-b : turn on PtraceSymtab - code in symtab.c
-t : turn on PtraceTree - code in printree.c
I employed the following naming conventions for all the header files:
You should NOT manipulate the following two header files: