LR-Grammars

LR-Grammars LR(0), LR(1), and LR(K)

Deterministic Context-Free Languages • DCFL • A family of languages that are accepted by a Deterministic Pushdown Automaton (DPDA) • Many programming languages can be described by means of DCFLs

Prefix and Proper Prefix • Prefix (of a string) • Any number of leading symbols of that string • Example: abc • Prefixes: ε, a, ab, abc • Proper Prefix (of a string) • A prefix of a string, but not the string itself • Example: abc • Proper prefixes: ε, a, ab
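The two definitions can be checked mechanically. The following is a minimal sketch in Python; the function names `prefixes` and `proper_prefixes` are illustrative and not part of the slides, and the empty string ε is represented as `''`.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def proper_prefixes(s):
    """All prefixes of s except s itself."""
    return prefixes(s)[:-1]


print(prefixes("abc"))         # ['', 'a', 'ab', 'abc']
print(proper_prefixes("abc"))  # ['', 'a', 'ab']
```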

Prefix Property • Context-Free Language (CFL) L is said to have the prefix property if, whenever w is in L, no proper prefix of w is in L • Not considered a severe restriction • Why? • Because we can easily convert a DCFL to a DCFL with the prefix property by introducing an endmarker
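To see why the endmarker helps, here is a small sketch on a finite sample of a language. It is only an illustration: real DCFLs are usually infinite, the helper names are hypothetical, and the marker `$` is assumed not to occur in the alphabet.

```python
def has_prefix_property(language):
    """True if no word of the finite set has a proper prefix in the set."""
    words = set(language)
    # w[:i] for i < len(w) ranges over the proper prefixes of w.
    return not any(w[:i] in words for w in words for i in range(len(w)))


def add_endmarker(language, marker="$"):
    """Append an endmarker (assumed to be a fresh symbol) to every word."""
    return {w + marker for w in language}


L = {"a", "ab", "abb"}
print(has_prefix_property(L))                 # False: "a" is a proper prefix of "ab"
print(has_prefix_property(add_endmarker(L)))  # True
```

With the endmarker, a proper prefix of w$ never ends in $, so it cannot be another word of the marked language.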

Suffix and Proper Suffix • Suffix (of a string) • Any number of trailing symbols • Proper Suffix • A suffix of a string, but not the string itself

Example Grammar • This is the grammar that will be used in many of the examples: • S’ → Sc • S → SA | A • A → aSb | ab

LR-Grammar • Left-to-right scan of the input producing a rightmost derivation • Simply: • L stands for Left-to-right • R stands for rightmost derivation

LR-Items • An item (for a given CFG) • A production with a dot anywhere in the right side (including the beginning and end) • In the event of an ε-production B → ε • B → · is an item

Example: Items • Given our example grammar: • S’ → Sc, S → SA | A, A → aSb | ab • The items for the grammar are: • S’ → ·Sc, S’ → S·c, S’ → Sc· • S → ·SA, S → S·A, S → SA·, S → ·A, S → A· • A → ·aSb, A → a·Sb, A → aS·b, A → aSb·, A → ·ab, A → a·b, A → ab·
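The item list can be generated directly from the productions. The sketch below assumes a representation chosen for illustration only: each production is a `(head, body)` pair and each item is a `(head, body, dot)` triple, where `dot` is the position of the dot in the body.

```python
# The example grammar: S' -> Sc, S -> SA | A, A -> aSb | ab
GRAMMAR = [
    ("S'", ("S", "c")),
    ("S", ("S", "A")),
    ("S", ("A",)),
    ("A", ("a", "S", "b")),
    ("A", ("a", "b")),
]


def items(grammar):
    """Yield (head, body, dot) for every dot position 0..len(body)."""
    for head, body in grammar:
        for dot in range(len(body) + 1):
            yield head, body, dot


def show(item):
    """Render an item in the usual dotted notation."""
    head, body, dot = item
    return f"{head} → {''.join(body[:dot])}·{''.join(body[dot:])}"


all_items = [show(it) for it in items(GRAMMAR)]
print(len(all_items))  # 15
print(all_items[:3])   # ["S' → ·Sc", "S' → S·c", "S' → Sc·"]
```

An ε-production has an empty body, so it contributes exactly one item with the dot at position 0.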

Some Notation • ⇒* = zero or more steps in a derivation • ⇒*rm = zero or more steps in a rightmost derivation • ⇒rm = single step in a rightmost derivation

Right-Sentential Form • A sentential form that can be derived by a rightmost derivation • A string of terminals and variables α is called a sentential form if S ⇒* α

More terms • Handle • A substring which matches the right-hand side of a production and represents 1 step in the derivation • Or more formally: • (of a right-sentential form γ for CFG G) • Is a substring β such that: • S ⇒*rm δAw ⇒rm δβw • δβw = γ • If the grammar is unambiguous and has no useless symbols: • The rightmost derivation of a given right-sentential form is unique, and so is its handle

Example • Given our example grammar: • S’ → Sc, S → SA | A, A → aSb | ab • An example rightmost derivation: • S’ ⇒ Sc ⇒ SAc ⇒ SaSbc • Therefore we can say that: SaSbc is a right-sentential form • The handle is aSb
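The derivation in this example can be replayed step by step. This is a sketch only: `rightmost_step` is a hypothetical helper that expands the rightmost variable of a sentential form and reports where the new right-hand side was placed, which is the position of the handle in the resulting form.

```python
VARIABLES = {"S'", "S", "A"}


def rightmost_step(form, head, body):
    """Expand the rightmost variable of `form` using head -> body.

    Returns the new form and the index where `body` starts in it.
    """
    for i in range(len(form) - 1, -1, -1):
        if form[i] in VARIABLES:
            if form[i] != head:
                raise ValueError(f"rightmost variable is {form[i]}, not {head}")
            return form[:i] + list(body) + form[i + 1:], i
    raise ValueError("no variable left to expand")


form = ["S'"]
for head, body in [("S'", "Sc"), ("S", "SA"), ("A", "aSb")]:
    form, start = rightmost_step(form, head, body)
    print("".join(form))  # prints Sc, then SAc, then SaSbc

# The handle of the last form is the right-hand side just introduced.
handle = "".join(form[start:start + len("aSb")])
print(handle)  # aSb
```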

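More terms • Viable Prefix • (of a right-sentential form γ) • Is any prefix of γ ending no farther right than the right end of a handle of γ • Valid item • Item A → α·β is valid for a viable prefix γ if there is a rightmost derivation S ⇒*rm δAw ⇒rm δαβw with δα = γ • Complete item • An item where the dot is the rightmost symbol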

Example • Given our example grammar: • S’ → Sc, S → SA | A, A → aSb | ab • The right-sentential form abc: • S’ ⇒*rm Ac ⇒ abc • Valid items: • A → ab· for viable prefix ab • A → a·b for viable prefix a • A → ·ab for viable prefix ε • A → ab· is a complete item, which suggests that Ac is the previous right-sentential form in the derivation of abc

LR(0) • Left-to-right scan of the input producing a rightmost derivation with a look-ahead (on the input) of 0 symbols • It is a restricted type of CFG • 1st in the family of LR-grammars • LR(0) grammars define exactly the DCFLs having the prefix property

Computing Sets of Valid Items • The definition of LR(0) and the method of accepting L(G) for LR(0) grammar G by a DPDA depend on: • Knowing the set of valid items for each viable prefix γ • For every CFG G, the set of viable prefixes is a regular set • This regular set is accepted by an NFA whose states are the items for G

Continued • Given an NFA (whose states are the items for G) that accepts the regular set • We can apply the subset construction to this NFA and yield a DFA • The state the DFA reaches after reading a viable prefix γ is the set of valid items for γ

NFA M • NFA M recognizes the viable prefixes for CFG G • M = (Q, V ∪ T, δ, q0, Q) • Q = set of items for G plus state q0 • G = (V, T, P, S) • Three Rules • δ(q0, ε) = {S → ·α | S → α is a production} • δ(A → α·Bβ, ε) = {B → ·γ | B → γ is a production} • Allows expansion of a variable B appearing immediately to the right of the dot • δ(A → α·Xβ, X) = {A → αX·β} • Permits moving the dot over any grammar symbol X if X is the next input symbol

Theorem 10.9 • The NFA M has the property that δ(q0, γ) contains A → α·β iff A → α·β is valid for γ • This theorem gives a method for computing the sets of valid items for any viable prefix • Note: It is an NFA. It can be converted to a DFA. Then by inspecting each state it can be determined whether G is an LR(0) grammar
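The theorem can be tried out on the example grammar S’ → Sc, S → SA | A, A → aSb | ab. The sketch below simulates the NFA on a prefix by taking ε-closures on the fly, which amounts to running the subset-construction DFA. The names `closure`, `goto` and `valid_items` and the `(head, body, dot)` item encoding are choices made for this illustration, not notation from the slides.

```python
GRAMMAR = [
    ("S'", ("S", "c")),
    ("S", ("S", "A")),
    ("S", ("A",)),
    ("A", ("a", "S", "b")),
    ("A", ("a", "b")),
]
VARIABLES = {head for head, _ in GRAMMAR}
START = "S'"


def closure(items):
    """Follow the ε-moves: expand each variable right after a dot."""
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in VARIABLES:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol`, then take the ε-closure."""
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})


def valid_items(prefix):
    """Set of items reached from q0 on `prefix` (empty if not viable)."""
    state = closure({(h, b, 0) for h, b in GRAMMAR if h == START})
    for symbol in prefix:
        state = goto(state, symbol)
    return state


print(valid_items("ab"))                       # frozenset({('A', ('a', 'b'), 2)})
print(("A", ("a", "b"), 1) in valid_items("a"))  # True
print(len(valid_items("")))                    # 5
```

For a string that is not a viable prefix, such as `"b"`, the result is the empty set.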

Definition of LR(0) Grammar • G is an LR(0) grammar if • The start symbol does not appear on the right side of any production • For every viable prefix γ of G, whenever A → α· is a complete item valid for γ, it is unique • i.e., there are no other complete items (and there are no items with a terminal to the right of the dot) that are valid for γ
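Both conditions can be tested by building every DFA state and inspecting it. The following is a sketch under stated assumptions: the grammar has no useless symbols (so every reachable state corresponds to some viable prefix), productions are `(head, body)` pairs, and `is_lr0` is a hypothetical name introduced here.

```python
def is_lr0(grammar, start):
    """Check the two LR(0) conditions on the DFA of item sets."""
    variables = {head for head, _ in grammar}
    symbols = {x for _, body in grammar for x in body}
    if start in symbols:  # start symbol on a right side
        return False

    def closure(items):
        result, work = set(items), list(items)
        while work:
            _, body, dot = work.pop()
            if dot < len(body) and body[dot] in variables:
                for h, b in grammar:
                    if h == body[dot] and (h, b, 0) not in result:
                        result.add((h, b, 0))
                        work.append((h, b, 0))
        return frozenset(result)

    def goto(items, x):
        return closure({(h, b, d + 1) for h, b, d in items
                        if d < len(b) and b[d] == x})

    initial = closure({(h, b, 0) for h, b in grammar if h == start})
    seen, work = {initial}, [initial]
    while work:
        state = work.pop()
        complete = [it for it in state if it[2] == len(it[1])]
        shifts = [it for it in state
                  if it[2] < len(it[1]) and it[1][it[2]] not in variables]
        # A complete item must be alone: no second complete item and
        # no item with a terminal to the right of the dot.
        if complete and (len(complete) > 1 or shifts):
            return False
        for x in symbols:
            nxt = goto(state, x)
            if nxt and nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return True


EXAMPLE = [("S'", ("S", "c")), ("S", ("S", "A")), ("S", ("A",)),
           ("A", ("a", "S", "b")), ("A", ("a", "b"))]
# Hypothetical counterexample: after reading "a" both S -> a· and S -> a·b are valid.
NOT_LR0 = [("S'", ("S",)), ("S", ("a",)), ("S", ("a", "b"))]

print(is_lr0(EXAMPLE, "S'"))  # True
print(is_lr0(NOT_LR0, "S'"))  # False
```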

Facts we now know: • Every LR(0) grammar generates a DCFL • Every DCFL with the prefix property has an LR(0) grammar • Every language with an LR(0) grammar has the prefix property • L is a DCFL iff L$ has an LR(0) grammar, where $ is an endmarker

DPDAs from LR(0) Grammars • We trace out the rightmost derivation in reverse • The stack holds a viable prefix of a right-sentential form, interleaved with the states of the DFA • Viable prefix: X1X2…Xk • States: s0, s1, …, sk, where si is the DFA state reached on X1…Xi • Stack: s0X1s1…Xksk

Reduction • If sk contains A → α· • Then A → α· is valid for X1X2…Xk • α is a suffix of X1X2…Xk • Let • α = Xi+1…Xk • w be such that X1…Xkw is a right-sentential form

Reduction Continued • There is a derivation: • S ⇒*rm X1…XiAw ⇒rm X1…Xkw • To obtain the right-sentential form previous to X1…Xkw in a rightmost derivation (namely X1…XiAw) we reduce α to A • Therefore, we pop Xi+1…Xk (together with their states) from the stack and push A, followed by the DFA state reached from si on A

Shift • If sk contains only incomplete items • Then the right-sentential form previous to X1…Xkw cannot be formed by reducing a suffix of X1…Xk • Instead we simply “shift” the next input symbol onto the stack
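The reduce and shift moves can be put together into a small recognizer for the example grammar S’ → Sc, S → SA | A, A → aSb | ab. This is a sketch of the idea rather than the DPDA construction itself: it keeps only the DFA states on the stack (the grammar symbols between them are implied by the states), and the names `closure`, `goto` and `parse` are assumptions made for this illustration.

```python
GRAMMAR = [("S'", ("S", "c")), ("S", ("S", "A")), ("S", ("A",)),
           ("A", ("a", "S", "b")), ("A", ("a", "b"))]
VARIABLES = {head for head, _ in GRAMMAR}
START = "S'"


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in VARIABLES:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)


def goto(items, symbol):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})


def parse(text):
    """Return the reductions used (a rightmost derivation in reverse),
    or None if `text` is rejected."""
    states = [closure({(h, b, 0) for h, b in GRAMMAR if h == START})]
    reductions, pos = [], 0
    while True:
        complete = [it for it in states[-1] if it[2] == len(it[1])]
        if complete:  # reduce: in an LR(0) grammar the complete item is alone
            head, body, _ = complete[0]
            reductions.append((head, body))
            if head == START:
                return reductions if pos == len(text) else None
            del states[len(states) - len(body):]      # pop one state per symbol
            states.append(goto(states[-1], head))     # push state reached on A
        else:  # shift the next input symbol
            if pos == len(text):
                return None
            nxt = goto(states[-1], text[pos])
            if not nxt:
                return None
            states.append(nxt)
            pos += 1


print(parse("abc"))
# [('A', ('a', 'b')), ('S', ('A',)), ("S'", ('S', 'c'))]
print(parse("ab"))  # None: the endmarker c is missing
```

Reading the returned list backwards gives the rightmost derivation S’ ⇒ Sc ⇒ Ac ⇒ abc.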

Theorem 10.10 • If L is L(G) for an LR(0) grammar G, then L is N(M) for a DPDA M • N(M) = the language accepted by empty stack or null stack

Proof • Construct from G the DFA D • Transition function: recognizes G’s viable prefixes • Stack Symbols of M are • Grammar Symbols of G • States of D • M has start state q and other states used to perform reduction

We know that: • If G is LR(0) then • A reduction is the only way to get the previous right-sentential form when the state of the DFA (on the top of the stack) contains a complete item • When M starts on input w it will construct a rightmost derivation for w in reverse order

What we need to prove: • When a shift is called for and the top DFA state on the stack has only incomplete items then there are no handles • (Note: if there was a handle, then some DFA state on the stack would have a complete item)

Suppose there exists a state containing A → α· (a complete item) • Each state is put onto the top of the stack when it is first pushed • The reduction of α to A would then be performed immediately • Therefore, a complete item cannot possibly become buried on the stack

Proof continued • Acceptance occurs when the top of the stack contains the start symbol • The start symbol by definition of LR(0) grammars cannot appear on the right side of a production • L(G) always has the prefix property if G is LR(0)

Conclusion of Proof • Thus, if w is in L(G), M finds the rightmost derivation of w, reduces w to S, and accepts • If M accepts w, then the sequence of right-sentential forms provides a derivation of w from S • N(M) = L(G)

Corollary of Theorem 10.10 • Every LR(0) grammar is unambiguous • Why? • The rightmost derivation of w is unique • (Given the construction we provided)

LR(1) Grammars • LR grammar with 1 look-ahead • All and only deterministic CFL’s have LR(1) grammars • Are greatly important to compiler design • Why? • Because they are broad enough to include the syntax of almost all programming languages • Restrictive enough to have efficient parsers (that are essentially DPDAs)

LR(1) Item • Consists of an LR(0) item followed by a look-ahead set consisting of terminals and/or the special symbol $ • $ = the right end of the string • General Form: • A → α·β, {a1, a2, …, an} • Sets of LR(1) items form the states of a DFA that recognizes viable prefixes, obtained by converting the NFA to a DFA

A grammar is LR(1) if • The start symbol does not appear on the right side of any production • Whenever the set of items, I, valid for some viable prefix includes some complete item A → α·, {a1,…,an} then • No ai appears immediately to the right of the dot in any item of I • If B → β·, {b1,…,bk} is another complete item in I, then ai ≠ bj for any 1 ≤ i ≤ n and 1 ≤ j ≤ k
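The two look-ahead conditions can be checked on a single set of items. In the sketch below an LR(1) item is encoded as `(head, body, dot, lookahead)`; this encoding, the function name `lr1_conflict_free`, and the expression-style sample items are all hypothetical and serve only to illustrate the test.

```python
def lr1_conflict_free(item_set, terminals):
    """Check the LR(1) look-ahead conditions for one set of items."""
    complete = [it for it in item_set if it[2] == len(it[1])]
    # Terminals that appear immediately to the right of a dot.
    after_dot = {body[dot] for _, body, dot, _ in item_set
                 if dot < len(body) and body[dot] in terminals}
    for i, (_, _, _, lookahead) in enumerate(complete):
        if lookahead & after_dot:        # some ai follows a dot
            return False
        for _, _, _, other in complete[i + 1:]:
            if lookahead & other:        # ai = bj for two complete items
                return False
    return True


TERMINALS = {"*", "a", "$"}
# E -> T·, {$}  together with  T -> T·*F, {$, *}
ok = [("E", ("T",), 1, frozenset({"$"})),
      ("T", ("T", "*", "F"), 1, frozenset({"$", "*"}))]
# Same set, but the complete item now has * in its look-ahead set.
bad = [("E", ("T",), 1, frozenset({"$", "*"})),
       ("T", ("T", "*", "F"), 1, frozenset({"$", "*"}))]

print(lr1_conflict_free(ok, TERMINALS))   # True
print(lr1_conflict_free(bad, TERMINALS))  # False
```

The first set would violate the LR(0) definition (a complete item next to an item with a terminal after the dot), but the look-ahead set {$} separates the two choices.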

Accepting LR(1) language: • Similar to the DPDA used with LR(0) grammars • However, it is allowed to use the next input symbol during its decision making • This is accomplished by appending a $ to the end of the input, and the DPDA keeps the next input symbol as part of the state

LR(1) Rules for Reduce/Shift • If the top set of items has a complete item A → α·, {a1, a2, …, an}, where A ≠ S, reduce by A → α if the current input symbol is in {a1, a2, …, an} • If the top set of items has an item S → α·, {$}, then reduce by S → α and accept if the current symbol is $ (i.e., the end of the input is reached) • If the top set of items has an item A → α·aβ, T, and a is the current input symbol, then shift

Regarding the Rules • The LR(1) conditions guarantee that at most one of the rules applies for any input symbol or $ • Often for practicality the information is summarized into a table • Rows: sets of items • Columns: terminals and $
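One row of such a table can be computed by applying the three reduce/shift rules to a set of items for each column symbol. The sketch below assumes the start symbol is named `S`, encodes an LR(1) item as `(head, body, dot, lookahead)`, and uses a made-up item set; none of these names come from the slides.

```python
END = "$"


def action(item_set, symbol, start="S"):
    """Apply the LR(1) reduce/accept/shift rules for one input symbol."""
    for head, body, dot, lookahead in item_set:
        if dot == len(body) and symbol in lookahead:
            if head == start:
                # Accept only at the right end of the input.
                return "accept" if symbol == END else "error"
            return f"reduce {head} → {' '.join(body)}"
    for head, body, dot, lookahead in item_set:
        if dot < len(body) and body[dot] == symbol:
            return "shift"
    return "error"


# Hypothetical item set: S -> E·, {$}  and  E -> E·+T, {$, +}
ITEMS = [("S", ("E",), 1, frozenset({"$"})),
         ("E", ("E", "+", "T"), 1, frozenset({"$", "+"}))]

row = {symbol: action(ITEMS, symbol) for symbol in ["+", "a", END]}
print(row)  # {'+': 'shift', 'a': 'error', '$': 'accept'}
```

Repeating this for every set of items gives the full table, with one row per set and one column per terminal or $.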
