Lecture 3 Syntactic Definition

Lecture 3 Syntactic Definition | KU | Fall 2018 | Drew Davidson

Announcements
• Entry Surveys (E1) processed
  • Lots of really useful info – thanks!
• Office hours now set on course website
• Lecture slides now posted on website
• P1 is now released
• L2 video / assignment is up
Live Assignments: L2, P1, H1

Last Time: Implementing Tokenizers | Review Lecture 2 – Implementing Scanners
Live Assignments: P1, H2
• RegEx to ε-NFA: Thompson’s Construction Algorithm
• ε-NFA to ε-free NFA: ε-elimination
• ε-free NFA to DFA: Rabin-Scott Powerset Construction
• DFA to Tokenizer: Transition Action Table

From FSMs to Tokenizers… | Where we left off last time…
• Give our FSMs the ability to put chars back
• Add an EOF (end of file) alphabet symbol
[Figure: FSM fragment with states S, S2, S3 and edges labeled "letter" and "letter, digit"; each accepting transition is annotated with the token to return (A) and the amount to rewind]

A Simple Tokenizer | DFA -> Tokenizer
• Consider a language with 2 statement types
  • Assignment: ID = expr
  • Increment: ID += expr
• Where expr is of the form
  • ID + ID
  • ID < ID
  • ID <= ID
• Identifiers follow C conventions
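A minimal sketch of a hand-written tokenizer for the language on this slide. The token kind names (ID, ASSIGN, PLUSEQ, PLUS, LT, LEQ, EOF) and the function names are assumptions made for illustration, not names from the course code. The one-character lookahead plays the role of the "put chars back" ability: the scanner only consumes the second character once it knows it belongs to the token.

```python
def is_id_start(ch):
    """C identifiers start with an ASCII letter or underscore."""
    return ch.isascii() and (ch.isalpha() or ch == "_")


def is_id_part(ch):
    """After the first character, digits are allowed too."""
    return ch.isascii() and (ch.isalnum() or ch == "_")


def tokenize(text):
    """Return (kind, lexeme) pairs; the token kind names are hypothetical."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1 : i + 2]  # one-character lookahead, "" at end of input
        if ch.isspace():
            i += 1
        elif is_id_start(ch):
            start = i
            while i < len(text) and is_id_part(text[i]):
                i += 1
            tokens.append(("ID", text[start:i]))
        elif ch == "+" and nxt == "=":
            tokens.append(("PLUSEQ", "+="))
            i += 2
        elif ch == "<" and nxt == "=":
            tokens.append(("LEQ", "<="))
            i += 2
        elif ch in "+<=":
            kind = {"+": "PLUS", "<": "LT", "=": "ASSIGN"}[ch]
            tokens.append((kind, ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    tokens.append(("EOF", ""))  # explicit end-of-file token
    return tokens
```

For example, `tokenize("x += y <= z")` yields an ID, a PLUSEQ, an ID, a LEQ, an ID, and the EOF token, in that order.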

Tokenization Implemented! | Lecture 2 – Implementing Scanners

[Figure: course roadmap around "COMPILER"; topics shown: Code Generation, Execution, Runtime Environment, Optimization, Intermediate Representation, Parsing, SDT, Semantics, Lexical Analysis, Syntactic Definition]

This Time | Preview Lecture 3 – Defining Syntax
How Language Syntax is Formally Defined
• Check in on our compiler
• Quick review of Context-Free Grammars
  • Why we need ’em
  • How we use ’em

Building the Compiler | Progress Pics
Compiler pipeline: Source code (sequence of chars) → Scanner (lexical analysis) → Parser (syntactic analysis, in progress) → Semantic analysis → IR (Intermediate Representation) code generation → IR optimization → Code generation → Machine code optimization → Output code in T
• Our Enhanced-RegEx scanner can emit a stream of tokens: + X Z Y =
• … but doesn’t enforce structure

Building the Compiler | Progress Pics
• Our Enhanced-RegEx scanner can emit a stream of tokens: + X Z Y =
• … but doesn’t enforce structure
An unstructured, unordered soup of tokens

Regular Languages: Lack Strength | CFGs: Why we need ’em
We cannot specify the source code constructs we need using RegExes
• i.e., no DFA can recognize exactly the constructs we need
Cute, but weak

Regular Languages: Matching Problem | CFGs: Why we need ’em
Consider the language of nested parentheses
Examples: ( ) (( )) () ()

Regular Languages: Matching Problem | CFGs: Why we need ’em
The language of nested parentheses cannot be matched by a regular expression (it is not a regular language)
• Intuition: an FSM has finitely many states, so it can only track a finite depth of parentheses
• Let’s sketch the proof

Nested Parens: Proof Sketch | CFGs: Why we need ’em
• Assume an FSM M can recognize the language
• Let N be the number of states in M
• Feed N + 1 left-parens into M
• We must have revisited some state s on two input positions i and j (i ≠ j)
• There must be a path from s to a final state
• But this means that M accepts the same suffix of closing parens after input position i and after input position j, and both cannot be correct
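The pigeonhole argument can be run against any concrete DFA. The sketch below uses a hypothetical six-state machine (the state names, `DELTA`, and `ACCEPTING` are invented for illustration) that tries to track nesting depth but saturates at depth 2. The code finds two counts of left-parens that land in the same state, then shows the machine gives the same verdict to a correctly nested string and an incorrectly nested one.

```python
# Hypothetical DFA over { "(", ")" }: d0/d1/d2 count opens (saturating at 2),
# c1/c0 count the closes still owed, "dead" is a reject sink.
DELTA = {
    ("d0", "("): "d1", ("d0", ")"): "dead",
    ("d1", "("): "d2", ("d1", ")"): "c0",
    ("d2", "("): "d2", ("d2", ")"): "c1",
    ("c1", "("): "dead", ("c1", ")"): "c0",
    ("c0", "("): "dead", ("c0", ")"): "dead",
    ("dead", "("): "dead", ("dead", ")"): "dead",
}
START = "d0"
ACCEPTING = {"d0", "c0"}
NUM_STATES = 6


def run(text):
    """Return True if the DFA accepts `text`."""
    state = START
    for ch in text:
        state = DELTA[(state, ch)]
    return state in ACCEPTING


def find_collision():
    """Feed "(" repeatedly; return (i, j), i < j, that reach the same state."""
    seen = {START: 0}  # state -> number of "(" read when first visited
    state = START
    for count in range(1, NUM_STATES + 1):
        state = DELTA[(state, "(")]
        if state in seen:
            return seen[state], count
        seen[state] = count
    raise AssertionError("unreachable: pigeonhole guarantees a repeat")


i, j = find_collision()
balanced = "(" * i + ")" * i      # i opens, i closes: should be accepted
unbalanced = "(" * j + ")" * i    # j opens, i closes: should be rejected
same_verdict = run(balanced) == run(unbalanced)
```

Because the machine is in the same state after `i` and after `j` left-parens, `same_verdict` is always true, whichever DFA is plugged in. One of the two strings is therefore misclassified.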

A Brief Reality Check | CFGs: Why we need ’em
Question 1: Given the previous, can we recognize the language of C-style comments with a regex? /* … */
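One way to explore the question: C-style comments do not nest, so no unbounded counting is required, and a regular expression is enough. The pattern below is a common textbook formulation, offered as a sketch rather than as the course's answer; the names `C_COMMENT` and `is_c_comment` are made up for this example.

```python
import re

# "/*", then any run of characters that never completes "*/", then "*/".
#   [^*]        any character that is not a star
#   \*+[^*/]    a run of stars followed by something other than "*" or "/"
#   \*+/        the closing run of stars and the slash
C_COMMENT = re.compile(r"/\*([^*]|\*+[^*/])*\*+/")


def is_c_comment(text):
    """Return True if `text` is exactly one complete C-style comment."""
    return C_COMMENT.fullmatch(text) is not None
```

With this pattern, `is_c_comment("/* hi */")` holds, while `is_c_comment("/* a */ b */")` does not, because the first `*/` already closes the comment.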

Need a More Powerful Language Class | CFGs: Why we need ’em
Chomsky Hierarchy (most to least powerful):
• Recursively enumerable
• Context-sensitive
• Context-free
• Regular

Why Not Max Out Power Level? | CFGs: Why we need ’em
Question: Why not use something more powerful for tokenization?
Expressive power comes with a price
• Less efficient matching
• Fewer guaranteed properties of the language

Defining Languages with Grammars | CFGs: How we use ’em
• A set of (recursive) rewriting rules to rewrite a sequence of symbols
• Any “completed” sequence represents a string in the language

Defining Languages with Grammars | CFGs: How we use ’em
CFG = (N, Σ, P, S) where:
• N: set of nonterminal symbols
• Σ: set of terminal symbols
• P: set of productions
• S: start nonterminal in N
Productions are rules LHS → RHS where
• LHS: a single nonterminal symbol
• RHS: a sequence of any symbols (possibly empty)

Defining Languages with Grammars | CFGs: How we use ’em
Example:
• N = { A }
• Σ = { (, ) }
• S = A
• P = { A → ( A ), A → ε }
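A minimal sketch of the four-part definition as a data structure, assuming Python dataclasses. The class name `Grammar`, its field names, and the `check` method are illustrative choices, not part of the lecture. The empty tuple stands for an ε right-hand side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # Sigma
    productions: tuple       # P: pairs (lhs, rhs), rhs is a tuple of symbols
    start: str               # S

    def check(self):
        """Raise ValueError if the 4-tuple violates the CFG definition."""
        if self.nonterminals & self.terminals:
            raise ValueError("N and Sigma must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"LHS {lhs!r} is not a single nonterminal")
            for sym in rhs:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in RHS")


# The nested-parentheses example: A -> ( A ) and A -> epsilon.
PARENS = Grammar(
    nonterminals=frozenset({"A"}),
    terminals=frozenset({"(", ")"}),
    productions=(("A", ("(", "A", ")")), ("A", ())),
    start="A",
)
PARENS.check()  # passes silently for a well-formed grammar
```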

Defining Languages with Grammars | CFGs: How we use ’em
Producing a string (same example: N = { A }, Σ = { (, ) }, S = A)
• Begin the sequence with the start symbol A
• Apply a production in P (a derivation step) to get a new sequence
• Apply another production in P to get a new sequence
• Apply another production in P to get a new sequence
• Once the sequence is all terminals, that string is in the language
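The derivation steps can be sketched as list rewriting. This assumes the example productions are A → ( A ) and A → ε; the particular three-step derivation shown is one possible choice and may not be the one drawn on the slide. The names `PRODUCTIONS` and `apply_production` are hypothetical.

```python
# Productions for nonterminal A; the empty list is the epsilon alternative.
PRODUCTIONS = {"A": [["(", "A", ")"], []]}


def apply_production(sequence, index, rhs):
    """Rewrite the nonterminal at `index` with `rhs` (one derivation step)."""
    lhs = sequence[index]
    if rhs not in PRODUCTIONS.get(lhs, []):
        raise ValueError(f"no production {lhs} -> {' '.join(rhs) or 'epsilon'}")
    return sequence[:index] + rhs + sequence[index + 1:]


seq = ["A"]                                        # start symbol
seq = apply_production(seq, 0, ["(", "A", ")"])    # ( A )
seq = apply_production(seq, 1, ["(", "A", ")"])    # ( ( A ) )
seq = apply_production(seq, 2, [])                 # ( ( ) )

# No nonterminals remain, so the sequence is a string in the language.
all_terminals = all(sym not in PRODUCTIONS for sym in seq)
result = "".join(seq)
```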

Simplifying Notation: Shorthand | CFGs: How we use ’em
Shorthand for context-free grammar notation, using the example N = { A }, Σ = { (, ) }, S = A:
• No need to state N and Σ. Implicit: whatever symbols appear in the productions
• No need to state S. Implicit: LHS of the top production
• Collapse rules with the same LHS using a bar (|)

Simplifying Notation: Shorthand | CFGs: How we use ’em
With N, Σ, and S implicit and rules with the same LHS collapsed using a bar, denote the example grammar as
A → ( A ) | ε
Or equivalently as
A → ( A )
  | ε

Simplifying Notation: Shorthand | CFGs: How we use ’em
• BNF (Backus-Naur Form, originally Backus Normal Form): denote the grammar as
A ::= ( A ) | ε
Or equivalently, in arrow notation, as
A → ( A ) | ε
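A small sketch of how the shorthand can be expanded back into the full four-part definition. It assumes one rule per line, whitespace-separated symbols, and ε (or an empty alternative) for the empty right-hand side. The function name `parse_bnf` is made up for this example.

```python
def parse_bnf(text):
    """Expand 'LHS ::= alt | alt' lines into (start, N, Sigma, productions)."""
    productions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        lhs, _, body = line.partition("::=")
        for alternative in body.split("|"):
            # Each bar-separated alternative becomes its own production.
            rhs = [sym for sym in alternative.split() if sym != "ε"]
            productions.append((lhs.strip(), rhs))
    start = productions[0][0]                       # LHS of the top production
    nonterminals = {lhs for lhs, _ in productions}  # anything with a rule
    terminals = {sym for _, rhs in productions for sym in rhs} - nonterminals
    return start, nonterminals, terminals, productions


start, N, sigma, P = parse_bnf("A ::= ( A ) | ε")
```

Here the implicit parts are recovered exactly as the shorthand rules describe: the start symbol is the first LHS, and the terminals are whatever symbols never appear on a left-hand side.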

Some languages denoted in BNF | CFGs: How we use ’em
Grammar 1:
A ::= ( A ) | ε
Grammar 2:
F ::= l G o l | r o f l
G ::= G o | ε
Example derivations: F ⇒ lGol ⇒ lol; F ⇒ lGol ⇒ lGool ⇒ lool; F ⇒ rofl
Grammar 3:
Y ::= a Y
Z ::= w t f
Derivation: Y ⇒ aY ⇒ aaY ⇒ …
Accepts no strings (not even the empty string)
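Whether a grammar accepts any string at all can be checked mechanically. The sketch below uses the standard "productive nonterminals" fixpoint: a nonterminal is productive if some alternative consists only of terminals and already-productive nonterminals. The grammar encodings assume the reading Y ::= a Y, Z ::= w t f and G ::= G o | ε, and the function name is hypothetical.

```python
def generates_some_string(productions, start):
    """Return True if `start` can derive at least one all-terminal string."""
    nonterminals = {lhs for lhs, _ in productions}
    productive = set()
    changed = True
    while changed:  # iterate until no new productive nonterminal is found
        changed = False
        for lhs, rhs in productions:
            if lhs in productive:
                continue
            # Terminals always count; nonterminals must already be productive.
            if all(s not in nonterminals or s in productive for s in rhs):
                productive.add(lhs)
                changed = True
    return start in productive


PARENS = [("A", ["(", "A", ")"]), ("A", [])]
LOL = [("F", ["l", "G", "o", "l"]), ("F", ["r", "o", "f", "l"]),
       ("G", ["G", "o"]), ("G", [])]
STUCK = [("Y", ["a", "Y"]), ("Z", ["w", "t", "f"])]
```

For `STUCK`, Z is productive but Y never becomes productive, so with start symbol Y the check reports that no string is generated.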

Parse Trees | CFGs: How we use ’em
Represent derivations
• Nodes are symbols in a tree
• Rooted at the start symbol
• Children are the symbols produced by a derivation step
• Leaves are the final string (if all leaves are terminals)
Example: the derivation F ⇒ lGol ⇒ lGool ⇒ lool as a tree
F
  l
  G
    G
      ε
    o
  o
  l
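A parse tree can be sketched as nested tuples. The encoding here is an assumption chosen for brevity: an interior node is `(symbol, children)`, a terminal leaf is a plain string, and an ε expansion is a node with no children. Reading the leaves left to right recovers the derived string.

```python
def leaves(node):
    """Return the terminal leaves of a parse tree, left to right."""
    if isinstance(node, str):
        return [node]          # terminal leaf
    _symbol, children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


# Tree for F => l G o l => l G o o l => l o o l, assuming G ::= G o | epsilon.
TREE = ("F", ["l", ("G", [("G", []), "o"]), "o", "l"])

derived = "".join(leaves(TREE))
```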

CFG use in the Compiler | CFGs: How we use ’em
Compile Push Symbols

CFG use in the Compiler | CFGs: How we use ’em
CFG for PL Syntactic Structure
Productions specify valid programs
• Let the set of terminals be the tokens in the language
• Let the nonterminals be the groupings of language constructs
  • (loops, statements, functions, calls, etc.)
• The grammar will recognize (or reject) the stream of tokens from the Lexer
Let’s see an example with this grammar
Productions
Prog ::= begin Stmts end
Stmts ::= Stmts semi Stmt | Stmt
Stmt ::= id assign Expr
Expr ::= id | Expr cross id
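A minimal sketch of a recognizer for this grammar over a list of token names. It is hand-written for this one grammar and is not the parsing technique the course goes on to develop. The left-recursive rules for Stmts and Expr are rewritten as loops, which accept the same token sequences; the function names are invented for the example.

```python
def recognizes(tokens):
    """Return True if `tokens` (a list of token names) is a valid Prog."""
    pos = 0

    def expect(kind):
        """Consume one token of the given kind if it is next."""
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == kind:
            pos += 1
            return True
        return False

    def expr():
        # Expr ::= id | Expr cross id   is the same as   id (cross id)*
        if not expect("id"):
            return False
        while expect("cross"):
            if not expect("id"):
                return False
        return True

    def stmt():
        # Stmt ::= id assign Expr
        return expect("id") and expect("assign") and expr()

    def stmts():
        # Stmts ::= Stmts semi Stmt | Stmt   is the same as   Stmt (semi Stmt)*
        if not stmt():
            return False
        while expect("semi"):
            if not stmt():
                return False
        return True

    # Prog ::= begin Stmts end, with no tokens left over
    return expect("begin") and stmts() and expect("end") and pos == len(tokens)
```

For instance, `recognizes("begin id assign id semi id assign id cross id end".split())` holds, while `recognizes("begin end".split())` does not, since Stmts requires at least one Stmt.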

Productions
1. Prog ::= begin Stmts end
2. Stmts ::= Stmts semi Stmt
3. Stmts ::= Stmt
4. Stmt ::= id assign Expr
5. Expr ::= id
6. Expr ::= Expr cross id
Derivation Sequence
Prog
⇒ begin Stmts end (Prod. 1)
⇒ begin Stmts semi Stmt end (Prod. 2)
⇒ begin Stmt semi Stmt end (Prod. 3)
⇒ begin id assign Expr semi Stmt end (Prod. 4)
⇒ begin id assign Expr semi id assign Expr end (Prod. 4)
⇒ begin id assign id semi id assign Expr end (Prod. 5)
⇒ begin id assign id semi id assign Expr cross id end (Prod. 6)
⇒ begin id assign id semi id assign id cross id end (Prod. 5)
Parse Tree
Prog
  begin
  Stmts
    Stmts
      Stmt
        id
        assign
        Expr
          id
    semi
    Stmt
      id
      assign
      Expr
        Expr
          id
        cross
        id
  end
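A derivation sequence like this one can be checked step by step: each line must follow from the previous one by rewriting exactly one nonterminal with one production. The sketch below does that check for the Prog grammar; the names `PRODUCTIONS`, `is_step`, and `is_valid_derivation` are hypothetical.

```python
PRODUCTIONS = [
    ("Prog", ["begin", "Stmts", "end"]),
    ("Stmts", ["Stmts", "semi", "Stmt"]),
    ("Stmts", ["Stmt"]),
    ("Stmt", ["id", "assign", "Expr"]),
    ("Expr", ["id"]),
    ("Expr", ["Expr", "cross", "id"]),
]


def is_step(before, after):
    """True if `after` is `before` with one nonterminal rewritten once."""
    for i, sym in enumerate(before):
        for lhs, rhs in PRODUCTIONS:
            if sym == lhs and before[:i] + rhs + before[i + 1:] == after:
                return True
    return False


def is_valid_derivation(sequences):
    """True if every consecutive pair is a single derivation step."""
    return all(is_step(a, b) for a, b in zip(sequences, sequences[1:]))


DERIVATION = [line.split() for line in [
    "Prog",
    "begin Stmts end",
    "begin Stmts semi Stmt end",
    "begin Stmt semi Stmt end",
    "begin id assign Expr semi Stmt end",
    "begin id assign Expr semi id assign Expr end",
    "begin id assign id semi id assign Expr end",
    "begin id assign id semi id assign Expr cross id end",
    "begin id assign id semi id assign id cross id end",
]]

valid = is_valid_derivation(DERIVATION)
```

Note that the sequence is not a strictly leftmost derivation (the second Stmt is expanded before the first Expr), which the checker permits because any nonterminal may be rewritten at each step.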
