CS 524 Spr08 students - keep in mind our focus will be on LR grammars, which are introduced in Sections 4.5 (Bottom-Up Parsing) and 4.6 (Intro to LR Parsing: Simple LR). Our text covers both LL grammars (top-down parsing) and LR grammars (bottom-up parsing).
A Context Free Grammer (CFG) is defined by the following four components:
Starting with S, nonterminals are rewritten using productions until only terminals remain (at which point the derivation is done). The set of strings derivable from S comprises the context-free grammar G, denoting L(G).
These components are often grouped into a "four-tuple"
(V_{t}, V_{n}, S, P), which is the formal definition
of the CFG. The vocabulary V of a CFG is the set of terminal
and nontemrinal symbols, i.e.
In describing CFGs and their parsers, it is sometimes important to distinguish whether a single symbol is required or whether a string of symbols is possible. Similarly, sometimes only a terminal or nonterminal symbol is appropriate, and at other times any vocabulary symbol may occur. To clarify exactly what sorts of symbol strings are expected, we will use the following notational conventions:
Using this notation, a production would be written as
A → α or A → X_{1} … X_{m}
Often more than one production shares the same right-hand-side. Rather than repeat the left-hand side, an "or notation" is used:
A → α | β | … | ζ
If A → γ is a production, then α A β ⇒ αγβ, where ⇒ denotes a one-step derivation (using production A → γ ).
We extend ⇒ to ⇒ ^{+}, derived in one or more step, and ⇒ ^{*}, derived in zero or more steps.
If A⇒^{*}β, then β is said to be a sentential form of the CFG.
SF(G) is the set of sentential forms of grammar G.
We can produce two different derivations for ID - ID - ID and this is not a good thing.
Grammars that allow different parse trees for the same terminal string are termed ambiguous. They are rarely used because a unique structure (that is, parse tree) cannot be guaranteeds for all inputs, and hense a unique translation guided by the parse tree may not be obtained.
We would like an algorithm that checks to see if a grammar is ambiguous. However, it is impossible to decide whether a given CFG is amgibuous (Hopcroft and Ullman 1969), so such an algorithm is impossible to create. Fortunately, for certain grammar classes, we can prove that constituent grammars are unambigous.
The most potentially serious flaw that a grammar might have is that it generates the "wrong language".
begin < Stmts > end $
We will draw the diagram for a Top-down parse and Bottom-up parse for the input stream
A bottom-up parse proceeds by discovering subtrees and linking them into increasingly larger trees.
The productions predicted by a top-down parser represent a leftmost derivation; hence such parsers are said to produce a leftmost parse. The sequence of producton recognized by a bottom-up parser is termed a rightmost parse; and is the exact reverse of the production sequence that represents a rightmost derivation.
Another intuitive example of how to disambiguate a grammar is provided by the Expression Grammar. intuitive-grm-spr08.html