
Bottom-Up Parsing






Presentation Transcript


  1. Bottom-Up Parsing

  2. Bottom-Up Parsing
• A bottom-up parser creates the parse tree of the given input starting from the leaves towards the root.
• S ⇒ ... ⇒ ω (the right-most derivation of ω); the bottom-up parser finds the right-most derivation in reverse order.
• Bottom-up parsing is also known as shift-reduce (SR) parsing because its two main actions are shift and reduce.
• At each shift action, the current symbol in the input string is pushed onto a stack.
• At each reduce action, the symbols at the top of the stack (this symbol sequence is the right side of a production) are replaced by the non-terminal on the left side of that production.
• There are also two more actions: accept and error.

  3. Shift-Reduce Parsing
• A shift-reduce parser tries to reduce the given input string to the starting symbol: a string ω is reduced to the starting symbol S.
• At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal on the left side of that production rule.
• Rightmost derivation: S ⇒*rm ω. The shift-reduce parser finds S ⇒rm ... ⇒rm ω, in reverse order.

  4. Shift-Reduce Parsing -- Example
Grammar:  S → aABb   A → aA | a   B → bB | b
Input string: aaabb
Reduction steps:  aaabb → aaAbb → aAbb → aABb → S
Right-most derivation:  S ⇒rm aABb ⇒rm aAbb ⇒rm aaAbb ⇒rm aaabb
• How do we know which substring is to be replaced at each reduction step?
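The reduction sequence above can be reproduced by a tiny brute-force sketch: it tries every production at every position and backtracks from dead ends. This is only an illustration of reducing a string to the start symbol; a real shift-reduce parser never searches like this, since it identifies the unique handle at each step.

```python
# Brute-force bottom-up reduction: repeatedly replace a substring that
# matches some production's right side by its left side, backtracking
# when a choice leads to a dead end. This only illustrates that "aaabb"
# can be reduced to S; a real parser picks the handle instead of searching.
RULES = [("S", "aABb"), ("A", "aA"), ("A", "a"), ("B", "bB"), ("B", "b")]

def reduce_to_start(s, start="S"):
    """Return the list of sentential forms from s down to the start symbol."""
    if s == start:
        return [s]
    for lhs, rhs in RULES:
        i = s.find(rhs)
        while i != -1:
            # replace one occurrence of the right side by the left side
            rest = reduce_to_start(s[:i] + lhs + s[i + len(rhs):], start)
            if rest is not None:
                return [s] + rest
            i = s.find(rhs, i + 1)
    return None  # no reduction sequence leads to the start symbol

print(reduce_to_start("aaabb"))
```

Every reduction strictly decreases the string length or its number of terminals, so the search terminates; for "aaabb" it finds exactly the sequence shown on the slide.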

  5. Handle
• Informally, a handle of a string is a substring that matches the right side of a production rule.
• But not every substring that matches the right side of a production rule is a handle.
• Formally, a handle of a right-sentential form is a production A → β together with a position where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation.
• If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.

  6. Handle Pruning
• A right-most derivation in reverse can be obtained by handle pruning.
S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string)
• Start from γn, find a handle An → βn in γn, and replace βn by An to get γn-1.
• Then find a handle An-1 → βn-1 in γn-1, and replace βn-1 by An-1 to get γn-2.
• Repeat this until we reach S.

  7. A Shift-Reduce Parser
Grammar:  E → E+T | T   T → T*F | F   F → (E) | id
Right-most derivation of id+id*id:
E ⇒rm E+T ⇒rm E+T*F ⇒rm E+T*id ⇒rm E+F*id ⇒rm E+id*id ⇒rm T+id*id ⇒rm F+id*id ⇒rm id+id*id

Right-Most Sentential Form   Reducing Production
id+id*id                     F → id
F+id*id                      T → F
T+id*id                      E → T
E+id*id                      F → id
E+F*id                       T → F
E+T*id                       F → id
E+T*F                        T → T*F
E+T                          E → E+T
E

(In the original slide, the handles are shown red and underlined in the right-sentential forms.)

  8. A Stack Implementation of a Shift-Reduce Parser
• There are four possible actions of a shift-reduce parser:
• Shift: the next input symbol is shifted onto the top of the stack.
• Reduce: replace the handle on the top of the stack by the corresponding non-terminal.
• Accept: successful completion of parsing.
• Error: the parser discovers a syntax error and calls an error recovery routine.
• Initially the stack contains only the end-marker $.
• The end of the input string is also marked by the end-marker $.

  9. A Stack Implementation of a Shift-Reduce Parser
Grammar:  E → E+T | T ;  T → T*F | F ;  F → (E) | id
Right-most derivation of id+id*id:
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Stack     Input       Action
$         id+id*id$   shift
$id       +id*id$     reduce by F → id
$F        +id*id$     reduce by T → F
$T        +id*id$     reduce by E → T
$E        +id*id$     shift
$E+       id*id$      shift
$E+id     *id$        reduce by F → id
$E+F      *id$        reduce by T → F
$E+T      *id$        shift
$E+T*     id$         shift
$E+T*id   $           reduce by F → id
$E+T*F    $           reduce by T → T*F
$E+T      $           reduce by E → E+T
$E        $           accept

(The original slide also shows the parse tree built by these reductions; the numbers 1-8 in it label the order in which the tree nodes are created.)
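The trace above can be mimicked by a short sketch. Note the hedge: a real parser reads its shift/reduce decisions from a parsing table; here those decisions are hardcoded for this one grammar (and inputs without parentheses), so the sketch only mirrors the stack discipline of the trace.

```python
# Shift-reduce trace for E -> E+T | T ; T -> T*F | F ; F -> id
# on tokens like ["id", "+", "id", "*", "id", "$"]. The shift/reduce
# decisions below are hardcoded for this grammar; a real parser takes
# them from its parsing table.
def parse(tokens):
    stack, i, trace = ["$"], 0, []
    while True:
        top, look = stack[-1], tokens[i]
        if top == "id":                          # handle id: F -> id
            stack[-1] = "F"; trace.append("reduce F->id")
        elif top == "F":
            if stack[-3:-1] == ["T", "*"]:       # handle T*F: T -> T*F
                del stack[-3:]; stack.append("T"); trace.append("reduce T->T*F")
            else:                                # handle F: T -> F
                stack[-1] = "T"; trace.append("reduce T->F")
        elif top == "T":
            if look == "*":                      # keep collecting the T*F handle
                stack.append(look); i += 1; trace.append("shift")
            elif stack[-3:-1] == ["E", "+"]:     # handle E+T: E -> E+T
                del stack[-3:]; stack.append("E"); trace.append("reduce E->E+T")
            else:                                # handle T: E -> T
                stack[-1] = "E"; trace.append("reduce E->T")
        elif top == "E" and look == "$":
            trace.append("accept"); return trace
        else:                                    # '$', '+', '*' on top: shift
            stack.append(look); i += 1; trace.append("shift")

print(parse(["id", "+", "id", "*", "id", "$"]))
```

Running it reproduces the action column of the table above: five shifts, the eight reductions in the same order, then accept.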

  10. Shift-Reduce Parsers
• There are two main categories of shift-reduce parsers:
• LR parsers
• cover a wide range of grammars.
• SLR -- simple LR parser
• LR -- the most general LR parser
• LALR -- intermediate LR parser (look-ahead LR parser)
• SLR, LR and LALR work the same way; only their parsing tables are different.
• Operator-precedence parsers
• simple, but handle only a small class of grammars.

  11. LR Parsers
• The most powerful (yet efficient) shift-reduce parsing method is LR parsing: L stands for left-to-right scanning of the input, R for constructing a right-most derivation (in reverse).
• LR parsing is attractive because:
• it is the most general non-backtracking shift-reduce parsing method, yet it is still efficient.
• an LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.

  12. LR Parsing Algorithm
(The original slide shows a diagram of the LR parsing algorithm: the input, the stack, and the output.)

  13. LR Parsing Algorithm
Step 1: The stack is initialized with [0]. The current state is always the state on top of the stack.
Step 2: Given the current state and the current terminal on the input stream, an action is looked up in the action table. There are four cases:
• a shift sn:
• the current terminal is removed from the input stream;
• the state n is pushed onto the stack and becomes the current state.
• a reduce rm:
• the number m is written to the output stream;
• for every symbol in the right-hand side of rule m, one state is removed from the stack;
• given the state now on top of the stack and the left-hand side non-terminal of rule m, a new state is looked up in the goto table and pushed onto the stack.
• an accept: the string is accepted.
• no action: a syntax error is reported.
Step 3: Step 2 is repeated until either the string is accepted or a syntax error is reported.
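The steps above can be sketched as a table-driven driver. The ACTION/GOTO tables below are the classic SLR(1) tables for the expression grammar E → E+T | T, T → T*F | F, F → (E) | id (rules numbered 1-6), with the state numbering commonly used in textbook presentations; any other grammar would just swap in different tables.

```python
# Table-driven LR driver for the algorithm above, using the classic
# SLR(1) tables of 1:E->E+T 2:E->T 3:T->T*F 4:T->F 5:F->(E) 6:F->id.
# ACTION entries: ("s", n) = shift to state n, ("r", m) = reduce by
# rule m, ("acc",) = accept; a missing entry is a syntax error.
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),   # (lhs, length of rhs)
         4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}
ACTION = {
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): ("acc",),
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    stack, i, output = [0], 0, []          # Step 1: the stack holds states
    while True:                            # Step 3: repeat Step 2
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:                    # no action: syntax error
            raise SyntaxError(f"unexpected {tokens[i]!r}")
        if act[0] == "s":                  # shift: consume token, push state n
            stack.append(act[1]); i += 1
        elif act[0] == "r":                # reduce: pop |rhs| states, then goto
            lhs, size = RULES[act[1]]
            del stack[-size:]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(act[1])          # rule number goes to the output
        else:                              # accept
            return output

print(lr_parse(["id", "+", "id", "*", "id", "$"]))
```

For id+id*id the output stream is the rule sequence 6, 4, 2, 6, 4, 6, 3, 1, i.e. exactly the reverse of the rightmost derivation shown on slide 7.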

  14. SLR parsing
• A problem with LL(1) parsing is that most grammars need extensive rewriting to get them into a form that allows a unique choice of production.
• A class of bottom-up parsing methods called LR parsing accepts a much larger class of grammars.
• The main advantage of LR parsing is that less rewriting is required to get a grammar into acceptable form; also, there are languages for which there exist LR-acceptable grammars but no LL(1) grammars.
• We start our discussion with SLR for the following reasons:
• it is simpler;
• in practice, LALR(1) handles only a few grammars that are not also handled by SLR;
• when a grammar is in the SLR class, the parse tables produced by SLR are identical to those produced by LALR(1);
• an understanding of SLR principles is sufficient to know how a grammar should be rewritten when an LALR(1) parser generator rejects it.

  15. SLR parsing
• The letters SLR stand for Simple LR: the input is read from left to right, and a right-most derivation is built (in reverse).
• LR parsers are table-driven bottom-up parsers and use two kinds of actions involving the input stream and a stack:
• shift: a symbol is read from the input and pushed on the stack.
• reduce: on the stack, a number of symbols that are identical to the right-hand side of a production are replaced by the left-hand side of that production.
• When all of the input is read, the stack will have a single element, which will be the start symbol of the grammar.
• Our aim is to make the choice of action depend only on the next input symbol and the symbol on top of the stack. To achieve this, we construct a DFA.

  16. SLR parsing
• Conceptually, this DFA reads the contents of the stack, starting from the bottom.
• If the DFA is in an accepting state when it reaches the top of the stack, it will cause a reduction by a production that is determined by the state and the next input symbol.
• If the DFA is not in an accepting state, it will cause a shift.
• Letting the DFA read the entire stack at every action is not very efficient, so instead we keep track of the DFA state every time we push an element on the stack, storing the state as part of the stack element.

  17. SLR parsing
• We represent the DFA as a table, where we cross-index a DFA state with a symbol (terminal or non-terminal) and find one of the following actions:
• shift n: read the next input symbol and push state n on the stack.
• go n: push state n on the stack.
• reduce p: reduce with the production numbered p.
• accept: parsing has completed successfully.
• error: a syntax error has been detected.
• Note that the current state is always found at the top of the stack. Shift and reduce actions are found when a state is cross-indexed with a terminal symbol.
• Go actions are found when a state is cross-indexed with a non-terminal.

  18. Constructing SLR parse tables
• An SLR parse table has a DFA at its core.
• We first construct an NFA and then use the subset construction to convert it into a DFA.
• We study the procedure by considering the grammar below.

  19. Constructing SLR parse tables
• The first step is to extend the grammar with a new starting production.
• Doing this for the grammar above yields the grammar below.

  20. Constructing SLR parse tables
• The next step is to make an NFA for each production. This is done by treating both terminals and non-terminals as alphabet symbols.
• The accepting state of each NFA is labelled with the number of the corresponding production.
(The original slide shows the NFAs for the productions of the grammar above.)

  21. Constructing SLR parse tables
• The NFAs in the figure above make transitions on both terminals and non-terminals.
• Transitions on terminals correspond to shift actions and transitions on non-terminals correspond to go actions.
• A go action happens after a reduction, whereby some elements of the stack (corresponding to the right-hand side of a production) are replaced by a non-terminal (corresponding to the left-hand side of that production).

  22. Constructing SLR parse tables
• To achieve this we must, whenever a transition on a non-terminal is possible, also allow transitions on the symbols on the right-hand side of a production for that non-terminal, so that these can eventually be reduced to the non-terminal.
• We do this by adding epsilon-transitions to the NFAs.
• Whenever there is a transition from state s to state t on a non-terminal N, we add epsilon-transitions from s to the initial states of all the NFAs for productions with N on the left-hand side.
• The epsilon-transitions are shown in the table below.

  23. Constructing SLR parse tables
• Together with the epsilon-transitions, the NFAs form a single, combined NFA.
• This NFA has the starting state A (the starting state of the NFA for the added start production) and an accepting state for each production in the grammar.

  24. Conversion of NFA to DFA
• We must now convert this NFA into a DFA using the subset construction.
ε-closure(A) = {A, C, E, I, J}
ε-closure(B) = {B}
ε-closure(C) = {C, I, J}
ε-closure(D) = {D}
ε-closure(E) = {E}
ε-closure(F) = {C, E, F, I, J}
ε-closure(G) = {G}
ε-closure(H) = {H}
ε-closure(I) = {I}
ε-closure(J) = {J}
ε-closure(K) = {K, I, J}
ε-closure(L) = {L}
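The closure computation is an ordinary reachability search. In the sketch below, the ε-transition map is inferred from the closure list itself (e.g. ε-closure(A) = {A, C, E, I, J} implies A has ε-edges to C and E, and C to I and J); the NFA figure itself is not in the transcript, so that map is an assumption.

```python
# epsilon-closure by reachability search over epsilon-edges only.
# EPS is inferred from the closure list above; states absent from EPS
# are assumed to have no outgoing epsilon-transitions.
EPS = {"A": {"C", "E"}, "C": {"I", "J"}, "F": {"C", "E"}, "K": {"I", "J"}}

def eps_closure(state):
    closure, work = {state}, [state]
    while work:                          # depth-first reachability
        for t in EPS.get(work.pop(), ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure

for s in "ABCDEFGHIJKL":
    print(s, sorted(eps_closure(s)))
```

The same loop, applied to move-sets instead of single states, is the core of the subset construction used on the next slides.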

  25. NFA to DFA conversion
(The original slide shows the subset construction as a figure: the NFA state sets and their transitions on T, S and ε.)

  26. DFA
(The original slide shows the resulting DFA as a figure.)

  27. Table representation
• Instead of showing the resulting DFA graphically, we construct a table where transitions on terminals are shown as shift actions and transitions on non-terminals as go actions.
• The table below shows the DFA constructed from the NFA made by adding epsilon-transitions.

  28. Constructing SLR parse tables
(The original slide shows the SLR DFA for the grammar above.)

  29. Operator-Precedence Parser
• Operator grammars are a small but important class of grammars.
• We may have an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.
• In an operator grammar, no production rule can have:
• ε at the right side;
• two adjacent non-terminals at the right side.
• Examples:
E → AB, A → a, B → b: not an operator grammar (AB has two adjacent non-terminals)
E → EOE, E → id, O → + | * | /: not an operator grammar (EOE has adjacent non-terminals)
E → E+E | E*E | E/E | id: an operator grammar

  30. Operator-Precedence Parser (OPP)
• Bottom-up parsers for a class of context-free grammars can be easily developed using operator grammars.
• Operator grammars have the property that no production right side is empty or has two adjacent non-terminals. These parsers rely on the following three precedence relations:
a <· b : a yields precedence to b
a = b : a has the same precedence as b
a ·> b : a takes precedence over b

  31. OPP
• There are two common ways of determining which precedence relation holds between a pair of terminals:
1. based on the associativity and precedence of the operators;
2. using an operator precedence relation table.
• For example, * has higher precedence than +, so we make + <· * and * ·> +.

  32. Operator-precedence parse of id+id*id (Opr = operator stack, Val = value stack; id2 = id*id and id3 = id+id2 denote reduced values):

Opr   Val         Input String   Action
$     $           id+id*id$      shift ($ <· id)
$     $id         +id*id$        shift ($ <· +)
$+    $id         id*id$         shift (+ <· id)
$+    $id id      *id$           shift (+ <· *)
$+*   $id id      id$            shift (* <· id)
$+*   $id id id   $              reduce (* ·> $)
$+    $id id2     $              reduce (+ ·> $)
$     $id3        $              accept
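A minimal sketch of this parse, using a single symbol stack and one generic non-terminal "E" instead of the separate operator/value stacks of the trace (the two presentations are equivalent for this input). The relation table encodes + <· * and * ·> + from the previous slide, and the handle-popping loop follows the standard operator-precedence algorithm; grammars with parentheses or unary operators would need a larger table.

```python
# Operator-precedence parsing of id+id*id. REL[a][b] is the relation
# between the topmost terminal a on the stack and the lookahead b.
# Every reduced handle is replaced by the generic non-terminal "E".
REL = {
    "+":  {"+": ">", "*": "<", "id": "<", "$": ">"},
    "*":  {"+": ">", "*": ">", "id": "<", "$": ">"},
    "id": {"+": ">", "*": ">", "$": ">"},
    "$":  {"+": "<", "*": "<", "id": "<"},
}

def top_terminal(stack):
    # non-terminals are transparent: find the topmost terminal
    return next(s for s in reversed(stack) if s in REL)

def parse(tokens):
    stack, i, actions = ["$"], 0, []
    while True:
        a, b = top_terminal(stack), tokens[i]
        if a == "$" and b == "$":
            actions.append("accept"); return actions
        if REL[a][b] in ("<", "="):           # shift
            stack.append(b); i += 1; actions.append("shift")
        else:                                 # a ·> b : reduce the handle
            while True:  # pop until the new top terminal <· the popped one
                popped = stack.pop()
                if popped in REL and REL[top_terminal(stack)][popped] == "<":
                    break
            if stack[-1] not in REL:          # handle may start with a non-terminal
                stack.pop()
            stack.append("E"); actions.append("reduce")

print(parse(["id", "+", "id", "*", "id", "$"]))
```

On id+id*id this performs five shifts and five reductions (id three times, then E*E, then E+E) before accepting, matching the relation annotations in the trace above.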

  33. Disadvantages of Operator Precedence Parsing
• Disadvantages:
• It is difficult to handle operators (like unary minus) whose precedence depends on context (the lexical analyzer should handle the unary minus).
• It handles only a small class of grammars.
• It is difficult to decide which language is recognized by the grammar.
• Advantages:
• simple and easy to implement;
• powerful enough for expressions in programming languages;
• can be constructed by hand after understanding the grammar;
• simple to debug.
