-
Overview and History
-
Translator, transforming human-oriented programming languages
into computer-oriented machine languages. For most users,
it is a black-box.
Allows users to ignore machine-dependent details of machine language and
develop machine-independent skills and expertise. Particularly
valuable with the explosive growth in computing now.
Term compiler coined in the early 1950s by Grace Murray Hopper
(Admiral Hopper).
First compilers were Fortran in the late 1950s - had to perform
translations and optimizations to be competitive with assembly language
programmers in those days. The early days of computers were VERY
MACHINE ORIENTED and the PROGRAMMER TIME WAS CHEAP.
Thankfully this has changed and portability of programming skills and
availability of tools to assist the programmer are assumed to be offered.
-
What do Compilers do?
-
A User's View of a Compiler
-------------------
| |
Programming Language -----> | Compiler | ----> Machine
(source) | | (target)
-------------------
Compiler as translator - source languages produced are many and
they will not be used unless there is a compiler to assist the programmer.
The choice of target code can vary:
- Pure Machine Code
- Virtual Machine Code
- Assembly Language (Symbolic) Format
- Memory-Image (Load-and-Go) Format
Interpreters - program is merely data to be manipulated. Locus of control
during executioni resides in the interpreter and not the user program
Characteristics of Interpreter:
- Modification of or addition to user programs as execution proceeds.
- Language in which the type of object that a variable denotes may
change dynamically - user program continually reexamined as execution
proceeds and symbols need not have a fixed meaning.
- Better diagnostics, since source text analysis is intermixed with
execution of the program.
- Significant degree of machine independence
Large overheads possible
- As execution proceeds, program text must be continuously
reexamined, with identifier bindings, types and operations
potentially being reconsidered at each reference.
- Substantial space overhead may be involved. Interpreter and
all support routines must be avaialbe.
-
The Structure of a Compiler
-
Compiler must perform two major tasks: analysis of the
source program being compiled and synthesis of a
machiner-language program that when executed will correctly perform
the activities described by that source program.
Almost all modern compilers are syntax-directed.
That is, the compilation process is driven by the syntactic structure of
the source program, as recognized by the parser.
For this class, our parser will be created by the UNIX tool
.
The parser builds this syntactic structure out of tokens,
the lowest-level symbols used to define a programming language syntax.
For this class, our lexical analysis will be done by
defining regular expressions and using the UNIX tool
.
to create our scanner
Scanning is covered in Ch3: Lexical Analysis.
The parser-creator takes as input a formal syntax specification, typically as a
context-free grammar (CFG). The parser reads tokens and
groups them into units as specified by the productions of
the CFG. Grammars and parser are discussed in Chapter 4: Syntax Analysis.
The class goal is to become comfortable with Bottom-Up parsing.
The semantic routines actually supply the meaning
(semantics) of the program, based on the syntactic structure.
Lex checks our spelling
Yacc checks our grammar
If the semantics are correct, i.e. the source program makes sense, we
will generate an intermediate representation (IR) of the program. This
is the translation step.
The IR can then serve as the input to generate the Target Code.
Overview of our Project
[File/Print Preview/ 130% to see on screen well]
Color coded: Yellow for Semantic Routines; Green for Symbol Table routines;
Blue for Code geneation routines.
NOTE: we will start to work with this in lab on Friday. You will
receive your Rohan account for the class then.
- The Syntax and Semantics of Programming Languages
-
The complete definition of a program language must include specifications
of its syntax (structure) and its semantics (meaning). Given
the almost universal use of CFGs as a syntactic specification mechanism for
programming languages, syntax is usually divided into context-free
and contest-sensitive components. Context-free syntax defines
legal sequences of symbols, independent of what the symbol means.
A := B + C;
might specify the assignment statement for an arithmetic expression.
Similarly,
D := B < C;
might specify an assignment statement for a bolean variable.
These examples imply that the variable D will be a logical variable,
because the right hand side is a logical expression. This is meaning.
What are the data types for B and C?
How do we carefully specify this?
-
Compiler Design and Programming Language Design
-
Our primary interest is the design and implementation of compilers for
modern programming languages. An interesting aspect of this study is
how programming langauge design and compiler design
influence one another. A language without a compiler is not likely
to be used by many. Designing a language that is easy to compile is
not always the road taken.
Languages that are easy to compile have many advantages:
- They are often easier to learn, read and understand. If a feature
is hard to compile, it may well be hard to understand.
- They will have quality compilers on a wide variety of machines.
- Often, better code will be generated.
- Fewer compiler bugs will occur.
- The compiler will be smaller, cheaper, faster, more reliable,
and more widely used.
- Compiler diagnostics and program development features will often
be better.
What attributes must be found in a programming language to allow compilation?
Many considerations apply, but mostly the issue is what can be determined
and bound before execution begins and what must be deferred until after
execution begins.
- Can the scope and binding of each identifier reference be determined
before execution begins? In other words, can the data object, or perhaps
a pointer to the data obejct, denoted by an identifier, be determined at
compile-time? If not, it may be necessary to effectively look up an
identifier in a run-time symbol table each time it is used.
- Can the type of an object be determined before execution begins?
If not, the meaning of each operator may need to be continually reevaluated.
For example, A * B may mean something very different if A and
B are arrays rather than integer variables.
- Can existing program text be changed or added to during execution?
In general, compilable languages will be those that have structure, identifier
scoping, and type binding fixed at compile-time (before execution begins).
Knowing that these program components are statis is essential to a compiler
if it is to be able to completely translate operations to concrete
target machine instructions.
NOTE: we will be developing this vocabulary over the semester.
-
Influences on Computer Design
-
In this course we have chosen to focus on RISC (Reduced Instruction Set
Computers).