Chapter 0: Introduction
Text: Dragon Book (Compilers: Principles, Techniques & Tools)

Kris Stewart, GMCS 535
23Jan08

Overview and History

A compiler is a translator, transforming human-oriented programming languages into computer-oriented machine languages. For most users, it is a black box.

It allows users to ignore the machine-dependent details of machine language and to develop machine-independent skills and expertise. This is particularly valuable given the explosive growth in computing.

The term compiler was coined in the early 1950s by Grace Murray Hopper (Admiral Hopper). The first Fortran compilers appeared in the late 1950s; they had to perform aggressive translations and optimizations to be competitive with the assembly language programmers of the day. The early days of computers were VERY MACHINE ORIENTED, and PROGRAMMER TIME WAS CHEAP.

Thankfully this has changed: portability of programming skills and the availability of tools to assist the programmer are now taken for granted.

What do Compilers do?

A User's View of a Compiler

                            -------------------
                            |                 |
Programming Language -----> |   Compiler      | ----> Machine
   (source)                 |                 |        (target)
                            -------------------

Compiler as translator: many source languages have been produced, and they will not be used unless there is a compiler to assist the programmer.

The choice of target code can vary:

  1. Pure Machine Code
  2. Virtual Machine Code
  3. Assembly Language (Symbolic) Format
  4. Memory-Image (Load-and-Go) Format

Interpreters - to an interpreter, the program is merely data to be manipulated. The locus of control during execution resides in the interpreter, not in the user program.

Characteristics of Interpreter:

  1. Modification of or addition to user programs as execution proceeds.
  2. Language in which the type of object that a variable denotes may change dynamically - user program continually reexamined as execution proceeds and symbols need not have a fixed meaning.
  3. Better diagnostics, since source text analysis is intermixed with execution of the program.
  4. Significant degree of machine independence

But large overheads are possible:

  1. As execution proceeds, program text must be continuously reexamined, with identifier bindings, types and operations potentially being reconsidered at each reference.
  2. Substantial space overhead may be involved. The interpreter and all support routines must be available.
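The first overhead above can be made concrete with a sketch (not from the text): in the tiny expression interpreter below, every variable reference is looked up in an environment at run time, on every evaluation, rather than being bound once at compile time.

```python
def evaluate(node, env):
    """Evaluate a small expression tree against an environment dictionary."""
    kind = node[0]
    if kind == "num":                # ("num", 3) -- a literal
        return node[1]
    if kind == "var":                # ("var", "b") -- looked up EVERY time
        return env[node[1]]
    if kind == "add":                # ("add", left, right)
        return evaluate(node[1], env) + evaluate(node[2], env)
    raise ValueError(f"unknown node kind: {kind}")

# b + c, interpreted twice: the binding of b is reconsidered at each run
expr = ("add", ("var", "b"), ("var", "c"))
print(evaluate(expr, {"b": 2, "c": 3}))   # 5
print(evaluate(expr, {"b": 10, "c": 3}))  # 13
```

A compiler, by contrast, could resolve each identifier to a fixed machine address once, before execution begins.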

The Structure of a Compiler
A compiler must perform two major tasks: analysis of the source program being compiled, and synthesis of a machine-language program that, when executed, will correctly perform the activities described by that source program.

Almost all modern compilers are syntax-directed. That is, the compilation process is driven by the syntactic structure of the source program, as recognized by the parser.

For this class, our parser will be created by the UNIX tool yacc.

The parser builds this syntactic structure out of tokens, the lowest-level symbols used to define a programming language's syntax. For this class, our lexical analysis will be done by defining regular expressions and using the UNIX tool lex to create our scanner.

Scanning is covered in Ch3: Lexical Analysis.
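As a preview of what a lex-generated scanner does, here is a hand-rolled sketch: match regular expressions against the input and emit (token, lexeme) pairs. The token names below are illustrative, not the ones used in the course project.

```python
import re

# One regular expression per token class; order matters when patterns overlap.
TOKEN_SPEC = [
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM",    r"[0-9]+"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("SEMI",   r";"),
    ("SKIP",   r"[ \t]+"),           # whitespace: matched but discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Return the list of (token, lexeme) pairs for the input string."""
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(scan("A := B + C;"))
# [('ID', 'A'), ('ASSIGN', ':='), ('ID', 'B'), ('PLUS', '+'), ('ID', 'C'), ('SEMI', ';')]
```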

The parser generator takes as input a formal syntax specification, typically a context-free grammar (CFG). The parser reads tokens and groups them into units as specified by the productions of the CFG. Grammars and parsers are discussed in Chapter 4: Syntax Analysis. The class goal is to become comfortable with bottom-up parsing.
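To illustrate how a parser groups tokens into units named by CFG productions, here is a small sketch. Note the hedge: yacc generates a bottom-up (LALR) parser, while this hand-written version works top-down; it is only meant to show the grammar-to-structure idea, for the illustrative grammar

    stmt -> ID ':=' expr ';'
    expr -> expr '+' term | term
    term -> ID | NUM

```python
def parse_stmt(tokens):
    """Parse a list of (kind, lexeme) pairs into a nested tuple tree."""
    toks = list(tokens)

    def expect(kind):
        k, lexeme = toks.pop(0)
        if k != kind:
            raise SyntaxError(f"expected {kind}, got {k} ({lexeme!r})")
        return lexeme

    def term():
        k, lexeme = toks.pop(0)
        if k in ("ID", "NUM"):
            return ("term", lexeme)
        raise SyntaxError(f"expected ID or NUM, got {k}")

    target = expect("ID")
    expect("ASSIGN")
    node = term()
    while toks and toks[0][0] == "PLUS":   # expr -> expr '+' term
        toks.pop(0)
        node = ("add", node, term())
    expect("SEMI")
    return ("assign", target, node)

tokens = [("ID", "A"), ("ASSIGN", ":="), ("ID", "B"),
          ("PLUS", "+"), ("ID", "C"), ("SEMI", ";")]
print(parse_stmt(tokens))
# ('assign', 'A', ('add', ('term', 'B'), ('term', 'C')))
```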

The semantic routines actually supply the meaning (semantics) of the program, based on the syntactic structure.

Lex checks our spelling
Yacc checks our grammar

If the semantics are correct, i.e., the source program makes sense, we generate an intermediate representation (IR) of the program. This is the translation step. The IR then serves as the input for generating the Target Code.
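As a sketch of this translation step, the fragment below walks a syntax tree for A := B + C and emits one common kind of IR, three-address code. The IR format and tree shape here are illustrative, not the ones used in the course project.

```python
def gen_expr(node, code, counter):
    """Emit three-address code for an expression; return where its value lives."""
    kind = node[0]
    if kind == "term":                       # ("term", "B") -- a name or literal
        return node[1]
    if kind == "add":                        # ("add", left, right)
        left = gen_expr(node[1], code, counter)
        right = gen_expr(node[2], code, counter)
        temp = f"t{next(counter)}"           # fresh compiler temporary
        code.append(f"{temp} := {left} + {right}")
        return temp
    raise ValueError(f"unknown node kind: {kind}")

def gen_stmt(stmt):
    """Translate an ('assign', target, expr) tree into a list of IR lines."""
    code, counter = [], iter(range(1, 1000))
    result = gen_expr(stmt[2], code, counter)
    code.append(f"{stmt[1]} := {result}")
    return code

tree = ("assign", "A", ("add", ("term", "B"), ("term", "C")))
print(gen_stmt(tree))
# ['t1 := B + C', 'A := t1']
```

A code generator for the target machine can then map each IR line onto one or more machine instructions.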

Overview of our Project [File/Print Preview/ 130% to see on screen well]

Color coded: yellow for semantic routines; green for symbol table routines; blue for code generation routines.

NOTE: we will start to work with this in lab on Friday. You will receive your Rohan account for the class then.

The Syntax and Semantics of Programming Languages
The complete definition of a programming language must include specifications of its syntax (structure) and its semantics (meaning). Given the almost universal use of CFGs as a syntactic specification mechanism for programming languages, syntax is usually divided into context-free and context-sensitive components. Context-free syntax defines legal sequences of symbols, independent of what the symbols mean.

A := B + C;

might specify the assignment statement for an arithmetic expression.

Similarly,

D := B < C;

might specify an assignment statement for a boolean variable. These examples imply that the variable D must be a logical variable, because the right-hand side is a logical expression. This is meaning.

What are the data types for B and C?

How do we carefully specify this?
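One answer is a symbol table plus type rules, applied by the semantic routines. The sketch below (illustrative rules, assumed declarations for B and C) infers the type of each right-hand side, and hence the required type of the target variable.

```python
# Assumed declarations -- in a real compiler these come from declaration
# processing, not a hard-coded dictionary.
SYMBOLS = {"B": "integer", "C": "integer"}

def type_of(expr):
    """Infer the type of a binary expression (op, left_name, right_name)."""
    op, left, right = expr
    lt, rt = SYMBOLS[left], SYMBOLS[right]
    if lt != rt:
        raise TypeError(f"operand types differ: {lt} vs {rt}")
    if op == "+":
        return lt            # arithmetic keeps the operand type
    if op == "<":
        return "boolean"     # a comparison yields a logical value
    raise ValueError(f"unknown operator: {op}")

print(type_of(("+", "B", "C")))   # integer -> A must be an integer
print(type_of(("<", "B", "C")))   # boolean -> D must be a logical variable
```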

Compiler Design and Programming Language Design

Our primary interest is the design and implementation of compilers for modern programming languages. An interesting aspect of this study is how programming language design and compiler design influence one another. A language without a compiler is not likely to be used by many. Designing a language that is easy to compile is not always the road taken.

Languages that are easy to compile have many advantages:

  • They are often easier to learn, read and understand. If a feature is hard to compile, it may well be hard to understand.
  • They will have quality compilers on a wide variety of machines.
  • Often, better code will be generated.
  • Fewer compiler bugs will occur.
  • The compiler will be smaller, cheaper, faster, more reliable, and more widely used.
  • Compiler diagnostics and program development features will often be better.

What attributes must be found in a programming language to allow compilation? Many considerations apply, but mostly the issue is what can be determined and bound before execution begins and what must be deferred until after execution begins.

  • Can the scope and binding of each identifier reference be determined before execution begins? In other words, can the data object (or a pointer to the data object) denoted by an identifier be determined at compile time? If not, it may be necessary to look up an identifier in a run-time symbol table each time it is used.
  • Can the type of an object be determined before execution begins? If not, the meaning of each operator may need to be continually reevaluated. For example, A * B may mean something very different if A and B are arrays rather than integer variables.
  • Can existing program text be changed or added to during execution?
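The second question above can be seen directly in a dynamically typed language. In the sketch below, the meaning of * is resolved at run time, at each call, depending on the types bound to the operands at that moment; a compiler for a statically typed language could instead pick one concrete multiply operation ahead of time.

```python
def times(a, b):
    # The meaning of * is reconsidered at every call, based on the
    # run-time types of a and b.
    return a * b

print(times(3, 4))       # 12        -- integer multiplication
print(times("ab", 3))    # 'ababab'  -- string repetition
```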

In general, compilable languages are those that have structure, identifier scoping, and type binding fixed at compile time (before execution begins). Knowing that these program components are static is essential if a compiler is to completely translate operations into concrete target machine instructions.

NOTE: we will be developing this vocabulary over the semester.

Influences on Computer Design

In this course we have chosen to focus on RISC (Reduced Instruction Set Computers).
