Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

# Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

## Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. MIT 6.035Top-Down Parsing Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

2. Orientation • Language specification • Lexical structure – regular expressions • Syntactic structure – grammar • This Lecture - recursive descent parsers • Code parser as set of mutually recursive procedures • Structure of program matches structure of grammar

3. Starting Point • Assume lexical analysis has produced a sequence of tokens • Each token has a type and value • Types correspond to terminals • Values to contents of token read in • Examples • Int 549 – integer token with value 549 read in • if - if keyword, no need for a value • AddOp + - add operator, value +

4. Basic Approach • Start with Start symbol • Build a leftmost derivation • If leftmost symbol is nonterminal, choose a production and apply it • If leftmost symbol is terminal, match against input • If all terminals match, have found a parse! • Key: find correct productions for nonterminals

5. Graphical Illustration of Leftmost Derivation Sentential Form NT1 T1 T2 T3 NT2 NT3 Apply Production Here Not Here

6. Grammar for Parsing Example StartExpr ExprExpr+Term ExprExpr-Term ExprTerm TermTerm* Int TermTerm/ Int Term Int • Set of tokens is { +, -, *, /, Int }, where Int = [0-9][0-9]* • For convenience, may represent each Int n token by n

7. Parsing Example Parse Tree Remaining Input Start 2-2*2 Sentential Form Start Current Position in Parse Tree

8. Parsing Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr Applied Production StartExpr Current Position in Parse Tree

9. Parsing Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr - Term Expr - Term Applied Production ExprExpr-Term

10. Parsing Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr - Term Term - Term Term Applied Production ExprTerm

11. Parsing Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr - Term Int - Term Term Applied Production Int Term Int

12. Parsing Example Parse Tree Remaining Input Match Input Token! Start 2-2*2 Expr Sentential Form Expr - Term 2 - Term Term Int 2

13. Parsing Example Parse Tree Remaining Input Match Input Token! Start -2*2 Expr Sentential Form Expr - Term 2 - Term Term Int 2

14. Parsing Example Parse Tree Remaining Input Match Input Token! Start 2*2 Expr Sentential Form Expr - Term 2 - Term Term Int 2

15. Parsing Example Parse Tree Remaining Input Start 2*2 Expr Sentential Form Expr - Term 2 - Term*Int Term Term * Int Applied Production Int 2 TermTerm* Int

16. Parsing Example Parse Tree Remaining Input Start 2*2 Expr Sentential Form Expr - Term 2 - Int * Int Term Term * Int Applied Production Int 2 Term Int Int

17. Parsing Example Parse Tree Remaining Input Match Input Token! Start 2*2 Expr Sentential Form Expr - Term 2 - 2* Int Term Term * Int Int 2 Int 2

18. Parsing Example Parse Tree Remaining Input Match Input Token! Start *2 Expr Sentential Form Expr - Term 2 - 2* Int Term Term * Int Int 2 Int 2

19. Parsing Example Parse Tree Remaining Input Match Input Token! Start 2 Expr Sentential Form Expr - Term 2 - 2* Int Term Term * Int Int 2 Int 2

20. Parsing Example Parse Tree Remaining Input Parse Complete! Start 2 Expr Sentential Form Expr - Term 2 - 2*2 Term Term * Int 2 Int 2 Int 2

21. Summary • Three Actions (Mechanisms) • Apply production to expand current nonterminal in parse tree • Match current terminal (consuming input) • Accept the parse as correct • Parser generates preorder traversal of parse tree • visit parents before children • visit siblings from left to right

22. Policy Problem • Which production to use for each nonterminal? • Classical Separation of Policy and Mechanism • One Approach: Backtracking • Treat it as a search problem • At each choice point, try next alternative • If it is clear that current try fails, go back to previous choice and try something different • General technique for searching • Used a lot in classical AI and natural language processing (parsing, speech recognition)

23. Backtracking Example Parse Tree Remaining Input Start 2-2*2 Sentential Form Start

24. Backtracking Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr Applied Production StartExpr

25. Backtracking Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr + Term Expr + Term Applied Production ExprExpr+Term

26. Backtracking Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr + Term Term + Term Term Applied Production ExprTerm

27. Backtracking Example Parse Tree Remaining Input Match Input Token! Start 2-2*2 Expr Sentential Form Expr + Term Int + Term Term Applied Production Int Term Int

28. Backtracking Example Parse Tree Remaining Input Can’t Match Input Token! Start -2*2 Expr Sentential Form Expr + Term 2 - Term Term Applied Production Int 2 Term Int

29. Backtracking Example Parse Tree Remaining Input So Backtrack! Start 2-2*2 Expr Sentential Form Expr Applied Production StartExpr

30. Backtracking Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr - Term Expr - Term Applied Production ExprExpr-Term

31. Backtracking Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr - Term Term - Term Term Applied Production ExprTerm

32. Backtracking Example Parse Tree Remaining Input Start 2-2*2 Expr Sentential Form Expr - Term Int - Term Term Applied Production Int Term Int

33. Backtracking Example Parse Tree Remaining Input Match Input Token! Start -2*2 Expr Sentential Form Expr - Term 2 - Term Term Int 2

34. Backtracking Example Parse Tree Remaining Input Match Input Token! Start 2*2 Expr Sentential Form Expr - Term 2 - Term Term Int 2

35. Left Recursion + Top-Down Parsing = Infinite Loop • Example Production: TermTerm*Num • Potential parsing steps: Term Term Term Num Term Num Term * * Term Num *

36. General Search Issues • Three components • Search space (parse trees) • Search algorithm (parsing algorithm) • Goal to find (parse tree for input program) • Would like to (but can’t always) ensure that • Find goal (hopefully quickly) if it exists • Search terminates if it does not • Handled in various ways in various contexts • Finite search space makes it easy • Exploration strategies for infinite search space • Sometimes one goal more important (model checking) • For parsing, hack grammar to remove left recursion

37. Eliminating Left Recursion • Start with productions of form • A A  • A   • ,  sequences of terminals and nonterminals that do not start with A • Repeated application of A A  builds parse tree like this: A  A A   

38. Eliminating Left Recursion • Replacement productions • A A  A  R R is a new nonterminal • A   R  R • R   New Parse Tree A Old Parse Tree A R  R  A  R    

39. Hacked Grammar Original Grammar Fragment TermTerm* Int TermTerm/ Int Term Int New Grammar Fragment Term Int Term’ Term’* Int Term’ Term’/ Int Term’ Term’ 

40. Parse Tree Comparisons Original Grammar New Grammar Term Term Int Term’ Int Term * Term’ * Int Int * Int Term’ * Int 

41. Eliminating Left Recursion • Changes search space exploration algorithm • Eliminates direct infinite recursion • But grammar less intuitive • Sets things up for predictive parsing

42. Predictive Parsing • Alternative to backtracking • Useful for programming languages, which can be designed to make parsing easier • Basic idea • Look ahead in input stream • Decide which production to apply based on next tokens in input stream • We will use one token of lookahead

43. Predictive Parsing Example Grammar StartExpr ExprTermExpr’ Expr’  + Expr’ Expr’  - Expr’ Expr’  Term Int Term’ Term’* Int Term’ Term’/ Int Term’ Term’ 

44. Choice Points • Assume Term’ is current position in parse tree • Have three possible productions to apply Term’* Int Term’ Term’/ Int Term’ Term’  • Use next token to decide • If next token is *, apply Term’* Int Term’ • If next token is /, apply Term’/ Int Term’ • Otherwise, apply Term’ 

45. Predictive Parsing + Hand Coding = Recursive Descent Parser • One procedure per nonterminal NT • Productions NT 1 , …,NT n • Procedure examines the current input symbol T to determine which production to apply • If TFirst(k) • Apply production k • Consume terminals in k (check for correct terminal) • Recursively call procedures for nonterminals in k • Current input symbol stored in global variable token • Procedures return • true if parse succeeds • false if parse fails

46. Example Boolean Term() if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) Boolean TermPrime() if (token = *) token = NextToken(); if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) else if (token = /) token = NextToken(); if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) else return(true) Term Int Term’ Term’* Int Term’ Term’/ Int Term’ Term’ 

47. Multiple Productions With Same Prefix in RHS • Example Grammar NT if then NT if then else • Assume NT is current position in parse tree, and if is the next token • Unclear which production to apply • Multiple k such that TFirst(k) • if  First(if then) • if  First(if then else)

48. Solution: Left Factor the Grammar • New Grammar Factors Common Prefix Into Single Production NT if then NT’ NT’ else NT’ • No choice when next token is if! • All choices have been unified in one production.

49. Nonterminals • What about productions with nonterminals? NTNT11 NTNT2  2 • Must choose based on possible first terminals that NT1 and NT2 can generate • What if NT1 or NT2 can generate ? • Must choose based on 1 and 2

50. NT derives  • Two rules • NT implies NT derives  • NTNT1 ... NTnandfor all 1i n NTi derives  implies NT derives 