Lecture 3 Syntactic Definition

Lecture 3: Syntactic Definition. KU | Fall 2018 | Drew Davidson. Announcements: Entry Surveys (E1) processed (lots of really useful info, thanks!); office hours now set on course website; lecture slides now posted on website; P1 is now released; L2 video / assignment is up.

Presentation Transcript


  1. Lecture 3: Syntactic Definition. KU | Fall 2018 | Drew Davidson

  2. Announcements
     • Entry Surveys (E1) processed
       • Lots of really useful info, thanks!
     • Office hours now set on course website
     • Lecture slides now posted on website
     • P1 is now released
     • L2 video / assignment is up
     Live Assignments: L2, P1, H1

  3. Last Time: Implementing Tokenizers (Review Lecture 2: Implementing Scanners)
     Live Assignments: P1, H2
     Pipeline: RegEx → ε-NFA → ε-free NFA → DFA → Tokenizer
     • RegEx → ε-NFA: Thompson's Construction Algorithm
     • ε-NFA → ε-free NFA: ε-elimination
     • ε-free NFA → DFA: Rabin-Scott Powerset Construction
     • DFA → Tokenizer: Transition Action Table

  8. From FSMs to Tokenizers… (Where we left off last time…)
     • Give our FSMs the ability to put chars back
     • Add an EOF (end of file) alphabet symbol
     [Figure: FSM with states S, S2, and S3, edges labeled "letter" and "letter, digit", and annotations "Amount to rewind" and "Token to return"]

  9. A Simple Tokenizer (DFA -> Tokenizer)
     • Consider a language with 2 statement types
       • Assignment: ID = expr
       • Increment: ID += expr
     • Where expr is of the form
       • ID + ID
       • ID < ID
       • ID <= ID
     • Identifiers follow C conventions

  10. A Simple Tokenizer (DFA -> Tokenizer)
      [Figure: DFA for the language on slide 9, with start state S and states A through I; edges are labeled '=', '<', '+', '_' | letter, and '_' | letter | digit, plus edges taken on any other character]

  12. Fill in the Transition Action table
      [Figure: the DFA from slide 10]

  14. Tokenization Implemented! (Lecture 2: Implementing Scanners)
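
The put-back idea from slides 8 and 9 can be sketched in a few lines of Python. This is a hand-coded illustration, not the table-driven tokenizer built in the lecture, and the token names (`ID`, `ASSIGN`, `PLUSEQ`, `PLUS`, `LT`, `LE`) are assumptions chosen for the example. "Putting a character back" shows up here as simply not advancing `pos` past the character that ended a token.

```python
import string

ID_START = string.ascii_letters + "_"   # C-style identifier start characters
ID_REST = ID_START + string.digits      # characters allowed after the first


def tokenize(src):
    """Return a list of (token_name, lexeme) pairs for the toy language."""
    tokens = []
    pos = 0
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
        elif ch in ID_START:
            start = pos
            while pos < len(src) and src[pos] in ID_REST:
                pos += 1
            # The character that ended the identifier is "put back":
            # pos still points at it, so the next iteration rereads it.
            tokens.append(("ID", src[start:pos]))
        elif ch in "+<":
            # One character of lookahead decides between + / += and < / <=.
            if pos + 1 < len(src) and src[pos + 1] == "=":
                tokens.append(("PLUSEQ" if ch == "+" else "LE", ch + "="))
                pos += 2
            else:
                tokens.append(("PLUS" if ch == "+" else "LT", ch))
                pos += 1
        elif ch == "=":
            tokens.append(("ASSIGN", "="))
            pos += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    return tokens
```

For instance, `tokenize("x += y<=z")` yields an `ID`, a `PLUSEQ`, an `ID`, an `LE`, and a final `ID`.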

  15. COMPILER
      [Figure: course overview of the compiler. Topics shown: Lexical Analysis, Syntactic Definition, Parsing, SDT, Semantics, Intermediate Representation, Optimization, Code Generation, Runtime Environment, Execution]

  16. This Time (Preview Lecture 3: Defining Syntax)
      How Language Syntax is Formally Defined
      • Check in on our compiler
      • Quick review of Context-Free Grammars
        • Why we need 'em
        • How we use 'em

  17. Building the Compiler (Progress Pics)
      Pipeline: source code (sequence of chars) → Scanner (lexical analysis) → Parser (syntactic analysis, in progress) → semantic analysis → IR (Intermediate Representation) code generation → IR optimization → code generation → machine code optimization → output code in T
      • Our Enhanced-RegEx scanner can emit a stream of tokens: + X Z Y =
      • … but doesn't enforce structure

  18. Building the Compiler (Progress Pics)
      • Our Enhanced-RegEx scanner can emit a stream of tokens: + X Z Y =
      • … but doesn't enforce structure
      Result: an unstructured, unordered soup of tokens

  19. Regular Languages: Lack Strength (CFGs: Why we need 'em)
      We cannot specify the source code constructs we need using RegExes
      • i.e. no DFA can recognize exactly the constructs we need
      [Image caption: Cute, but weak]

  20. Regular Languages: Matching Problem (CFGs: Why we need 'em)
      Consider the language of nested parentheses.
      Examples: ( )   (( ))   () ()

  21. Regular Languages: Matching Problem (CFGs: Why we need 'em)
      The language of nested parentheses cannot be matched by a regular expression (it is not a regular language)
      • Intuition: an FSM can only handle a finite depth of parentheses
      • Let's sketch the proof

  22. Nested Parens: Proof Sketch (CFGs: Why we need 'em)
      • Assume an FSM M can recognize the language.
      • Let N be the number of states in M.
      • Feed N + 1 left parens into M.
      • By the pigeonhole principle, M must have revisited some state s at two different input positions i and j.
      • There must be a path from s to a final state.
      • But this means that M accepts the same suffix of closing parens after both i and j left parens, and both cannot be correct.
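
The pigeonhole step of the proof sketch can be tried out on a concrete machine. The sketch below assumes a DFA given as a complete transition dictionary; the 4-state machine `DELTA` is a hypothetical example that tracks nesting depth up to 2, and every name in the block is illustrative. For any such DFA, `find_repeated_state` returns two counts i < j of left parens that leave the machine in the same state, so the machine must give the same verdict on i left parens followed by i right parens as on j left parens followed by i right parens.

```python
def state_after(delta, start, text):
    """Run the DFA on text and return the state it ends in."""
    state = start
    for ch in text:
        state = delta[(state, ch)]
    return state


def find_repeated_state(delta, start, num_states):
    """Feed num_states + 1 left parens; return (i, j) with i < j such that
    the DFA is in the same state after i and after j left parens."""
    seen = {start: 0}  # state -> number of '(' consumed when first reached
    state = start
    for count in range(1, num_states + 2):
        state = delta[(state, "(")]
        if state in seen:
            return seen[state], count
        seen[state] = count
    raise ValueError("delta uses more states than num_states")


# Hypothetical DFA: states 0..2 track nesting depth, state 3 is a dead state.
DEAD = 3
DELTA = {
    (0, "("): 1, (0, ")"): DEAD,
    (1, "("): 2, (1, ")"): 0,
    (2, "("): DEAD, (2, ")"): 1,
    (DEAD, "("): DEAD, (DEAD, ")"): DEAD,
}
FINALS = {0}

i, j = find_repeated_state(DELTA, 0, 4)
verdict_matched = state_after(DELTA, 0, "(" * i + ")" * i) in FINALS
verdict_unmatched = state_after(DELTA, 0, "(" * j + ")" * i) in FINALS
# The two verdicts always agree, yet only the first string is properly nested.
same_verdict = verdict_matched == verdict_unmatched
```

Adding more depth-tracking states only pushes the repeated state further out; it never removes it.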

  23. A Brief Reality Check (CFGs: Why we need 'em)
      Question 1: Given the previous, can we recognize the language of C-style comments with a regex?
      /* … */
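
One way to explore the question: standard C comments do not nest, so the first `*/` always closes the comment and no counting is required. Under that assumption a regular expression suffices. The sketch below uses Python's `re` module; the names `C_COMMENT` and `find_comments` are illustrative.

```python
import re

# "/*", then any run of non-stars or stars followed by a character that is
# neither "*" nor "/", then one or more stars and the closing "/".
C_COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")


def find_comments(src):
    """Return every C-style comment found in src, in order."""
    return [m.group(0) for m in C_COMMENT.finditer(src)]
```

On the input `/* a /* b */ c */` the match stops at the first `*/`, which is exactly the non-nesting behavior that keeps this language regular.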

  24. Need a More Powerful Language Class (CFGs: Why we need 'em)
      Chomsky Hierarchy, from most to least powerful:
      • Recursively enumerable
      • Context-sensitive
      • Context-free
      • Regular

  25. Why Not Max Out Power Level? (CFGs: Why we need 'em)
      Question: Why not use something more powerful for tokenization?
      Expressive power comes with a price
      • Less efficient matching
      • Fewer properties of the language

  26. Defining Languages with Grammars (CFGs: How we use 'em)
      • A set of (recursive) rewriting rules to rewrite a sequence of symbols
      • Any "completed" sequence represents a string in the language

  27. Defining Languages with Grammars (CFGs: How we use 'em)
      • A set of (recursive) rewriting rules to rewrite a sequence of symbols
      • Any "completed" sequence represents a string in the language
      CFG = (N, Σ, P, S) where:
      • N: set of nonterminal symbols
      • Σ: set of terminal symbols
      • P: set of productions (rules where the LHS is a single nonterminal symbol and the RHS is a sequence of any symbols)
      • S: start nonterminal in N

  28. Defining Languages with Grammars (CFGs: How we use 'em)
      CFG = (N, Σ, P, S) where:
      • N: set of nonterminal symbols
      • Σ: set of terminal symbols
      • P: set of productions
      • S: start nonterminal in N
      Example: N = { A }, Σ = { (, ) }, S = A, P = { A → ( A ), A → ε }
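
The 4-tuple can be written down directly as data. This is a minimal sketch: the class name `CFG`, its field names, and `is_well_formed` are illustrative, and the production set for the parenthesis example (including the ε alternative, written here as an empty tuple) is assumed from the shorthand on the later slides.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    nonterminals: frozenset  # N
    terminals: frozenset     # Σ
    productions: tuple       # P: (lhs, rhs) pairs; rhs is a tuple, () is ε
    start: str               # S


def is_well_formed(g):
    """Check the side conditions of the definition: S is in N, N and Σ are
    disjoint, each LHS is a single nonterminal, each RHS uses known symbols."""
    if g.start not in g.nonterminals:
        return False
    if g.nonterminals & g.terminals:
        return False
    symbols = g.nonterminals | g.terminals
    return all(
        lhs in g.nonterminals and all(sym in symbols for sym in rhs)
        for lhs, rhs in g.productions
    )


PARENS = CFG(
    nonterminals=frozenset({"A"}),
    terminals=frozenset({"(", ")"}),
    productions=(("A", ("(", "A", ")")), ("A", ())),
    start="A",
)
```

Keeping ε as the empty tuple means it never has to be added to Σ as a pseudo-terminal.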

  29. Defining Languages with Grammars (CFGs: How we use 'em)
      Example: N = { A }, Σ = { (, ) }, S = A, P = { A → ( A ), A → ε }
      Producing a string:
      • Begin the sequence with the start symbol A
      • Apply a production in P (a derivation step) to get a new sequence
      • Apply another production in P to get a new sequence
      • Apply another production in P to get a new sequence
      • When the sequence is all terminals, the string is in the language
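
The derivation procedure can be sketched as code. The block assumes the parenthesis grammar has the two productions A → ( A ) and A → ε (the second is written as an empty list), and it always rewrites the leftmost nonterminal, which is one possible order among several. `PRODUCTIONS`, `derive`, and `show` are illustrative names.

```python
# Productions for the assumed example grammar: A -> ( A ) | ε
PRODUCTIONS = {"A": [["(", "A", ")"], []]}


def derive(start, choices):
    """Apply the chosen productions, one per derivation step, to the leftmost
    nonterminal. Returns every sequence produced along the way."""
    form = [start]
    steps = [list(form)]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        idx = positions[0]
        rhs = PRODUCTIONS[form[idx]][choice]
        form = form[:idx] + rhs + form[idx + 1:]
        steps.append(list(form))
    return steps


def show(form):
    """Render a sequence of symbols; the empty sequence is shown as ε."""
    return " ".join(form) if form else "ε"
```

Calling `derive("A", [0, 0, 1])` walks through A, ( A ), ( ( A ) ), and finally ( ( ) ), which contains only terminals and is therefore a string in the language.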

  30. Simplifying Notation: Shorthand (CFGs: How we use 'em)
      Example: N = { A }, Σ = { (, ) }, S = A, P = { A → ( A ), A → ε }
      • Make N and Σ implicit: whatever symbols appear in the productions
      • Make S implicit: the LHS of the top production
      • Collapse rules with the same LHS using a bar ( | )

  31. Simplifying Notation: Shorthand (CFGs: How we use 'em)
      Example: N = { A }, Σ = { (, ) }, S = A, P = { A → ( A ), A → ε }
      • Make N and Σ implicit: whatever symbols appear in the productions
      • Make S implicit: the LHS of the top production
      • Collapse rules with the same LHS using a bar ( | )
      Denote the grammar as: A → ( A ) | ε

  32. Simplifying Notation: Shorthand (CFGs: How we use 'em)
      • BNF (Backus Normal Form)
      Denote the grammar as: A ::= ( A ) | ε
      Or equivalently as: A → ( A ) | ε

  33. Some languages denoted in BNF (CFGs: How we use 'em)
      • A ::= ( A ) | ε
      • F ::= l G o l | r o f l
        G ::= G o | ε
        Derivations: F ⇒ l G o l ⇒ l o l;  F ⇒ l G o l ⇒ l G o o l ⇒ l o o l;  F ⇒ r o f l
      • Y ::= a Y
        Z ::= w t f
        Derivation: Y ⇒ a Y ⇒ a a Y ⇒ …
        Accepts no strings (not even the empty string)

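  34. Parse Trees (CFGs: How we use 'em)
      Represent derivations
      • Nodes are symbols in a tree
      • Rooted at the start symbol
      • Children are a derivation step
      • Leaves are the final string (once all of them are terminals)
      Example: F ⇒ l G o l ⇒ l G o o l ⇒ l o o l
      [Figure: parse tree rooted at F with children l, G, o, l; the G node has children G and o]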
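
A parse tree is straightforward to model as a small data structure. The sketch below builds the tree for the derivation of `lool` from slide 33 and reads the final string off the leaves. The `Node` class and its helpers are illustrative, and the rule G ::= G o | ε is assumed, with an ε expansion represented as a nonterminal node that has no children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)
    is_terminal: bool = False


def leaf(symbol):
    """Build a terminal leaf."""
    return Node(symbol, is_terminal=True)


def frontier(node):
    """Collect the terminal leaves left to right. A nonterminal with no
    children (an ε expansion) contributes nothing."""
    if node.is_terminal:
        return [node.symbol]
    result = []
    for child in node.children:
        result.extend(frontier(child))
    return result


# F => l G o l => l G o o l => l o o l
tree = Node("F", [
    leaf("l"),
    Node("G", [Node("G"), leaf("o")]),  # G -> G o, then the inner G -> ε
    leaf("o"),
    leaf("l"),
])
```

Here `"".join(frontier(tree))` gives `lool`, the string the derivation ends with.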

  36. CFG use in the Compiler (CFGs: How we use 'em)
      CFG for PL Syntactic Structure: productions specify valid programs
      • Let the set of terminals be the tokens in the language
      • Let the nonterminals be the groupings of language constructs
        • (loops, statements, functions, calls, etc.)
      • The grammar will recognize (or reject) the stream of tokens from the Lexer
      Let's see an example with this grammar
      Productions:
        Prog ::= begin Stmts end
        Stmts ::= Stmts semi Stmt | Stmt
        Stmt ::= id assign Expr
        Expr ::= id | Expr cross id

  37. Parse Tree and Derivation Sequence
      Productions:
        Prog ::= begin Stmts end
        Stmts ::= Stmts semi Stmt | Stmt
        Stmt ::= id assign Expr
        Expr ::= id | Expr cross id
      Derivation Sequence:
        Prog
        ⇒ begin Stmts end
        ⇒ begin Stmts semi Stmt end
        ⇒ begin Stmt semi Stmt end
        ⇒ begin id assign Expr semi Stmt end
        ⇒ begin id assign Expr semi id assign Expr end
        ⇒ begin id assign id semi id assign Expr end
        ⇒ begin id assign id semi id assign Expr cross id end
        ⇒ begin id assign id semi id assign id cross id end
      [Figure: parse tree for this derivation, rooted at Prog, with each expansion labeled by the production applied (Prod. 1 through Prod. 6)]
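
The derivation sequence on this slide can be checked mechanically: each sequence must follow from the previous one by rewriting exactly one nonterminal with one production. The sketch below encodes the four rules from the slide; `GRAMMAR`, `is_step`, and `is_complete_derivation` are illustrative names, and sequences are written as whitespace-separated symbols.

```python
GRAMMAR = {
    "Prog": [["begin", "Stmts", "end"]],
    "Stmts": [["Stmts", "semi", "Stmt"], ["Stmt"]],
    "Stmt": [["id", "assign", "Expr"]],
    "Expr": [["id"], ["Expr", "cross", "id"]],
}


def is_step(before, after):
    """True if after results from rewriting one nonterminal in before."""
    for i, sym in enumerate(before):
        for rhs in GRAMMAR.get(sym, []):
            if before[:i] + rhs + before[i + 1:] == after:
                return True
    return False


def is_complete_derivation(lines, start="Prog"):
    """True if lines begin at the start symbol, every consecutive pair is a
    valid derivation step, and the last line contains only terminals."""
    forms = [line.split() for line in lines]
    if not forms or forms[0] != [start]:
        return False
    if not all(is_step(a, b) for a, b in zip(forms, forms[1:])):
        return False
    return all(sym not in GRAMMAR for sym in forms[-1])


SLIDE_DERIVATION = [
    "Prog",
    "begin Stmts end",
    "begin Stmts semi Stmt end",
    "begin Stmt semi Stmt end",
    "begin id assign Expr semi Stmt end",
    "begin id assign Expr semi id assign Expr end",
    "begin id assign id semi id assign Expr end",
    "begin id assign id semi id assign Expr cross id end",
    "begin id assign id semi id assign id cross id end",
]
```

The checker does not insist on a leftmost or rightmost order, which matters here because the slide's sequence expands the second Stmt before finishing the first Expr.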
