Programming Language Syntax: Regex vs CFG
Learn about the differences between regular expressions (regex) and context-free grammar (CFG) in programming language syntax. Understand how to specify token patterns and patterns of tokens using CFG, and how to build unambiguous expressions with associativity and precedence.
Presentation Transcript
Programming Language Syntax 2 http://flic.kr/p/zCyMp
Think-Pair-Share Activity Assuming the following INTEGER regex: Try to build a regex that matches arithmetic expressions, such as: • 55 • 2 + 5 • 4 * 5 / -3 • (9 * 4)/(2 - +4) • ((4 + 7) * 10) * (69 + 7) / (44 - (22 + 66) * +5)
Here’s one way • Except this isn’t a regex • Regexes cannot have recursive constructs • It’s actually a context-free grammar (CFG) • Like regexes with recursion • Expressed (more or less) in Backus-Naur Form (BNF)
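To make the limitation concrete, here is a minimal Python sketch. The slide's INTEGER regex is not reproduced in this transcript, so the pattern below (an optional sign followed by digits) is an assumption, and the names `matches_flat` and `matches_nested` are hypothetical. The regex handles a flat chain of integers and operators, while nested parentheses need a rule that refers to itself, which the recursive functions provide.

```python
import re

# Assumed INTEGER pattern: optional sign followed by one or more digits.
INTEGER = r"[+-]?\d+"

# A regex can describe a flat chain: INTEGER (operator INTEGER)*
FLAT_EXPR = re.compile(rf"\s*{INTEGER}(\s*[-+*/]\s*{INTEGER})*\s*")


def matches_flat(text):
    """Regex-only check: no parentheses, no nesting."""
    return FLAT_EXPR.fullmatch(text) is not None


def matches_nested(text):
    """Recursive check for:
         expr    : operand (OP operand)*
         operand : SIGN? DIGITS | '(' expr ')'
    """
    tokens = re.findall(r"\d+|[-+*/()]", text)
    if "".join(tokens) != "".join(text.split()):
        return False  # input contains characters that are not tokens
    pos = 0

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            if not expr():  # recursion: an operand may contain a whole expr
                return False
            if pos < len(tokens) and tokens[pos] == ")":
                pos += 1
                return True
            return False
        if pos < len(tokens) and tokens[pos] in "+-":
            pos += 1  # optional sign
        if pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
            return True
        return False

    def expr():
        nonlocal pos
        if not operand():
            return False
        while pos < len(tokens) and tokens[pos] in "+-*/":
            pos += 1
            if not operand():
                return False
        return True

    return expr() and pos == len(tokens)


print(matches_flat("4 * 5 / -3"))          # flat chain
print(matches_flat("(9 * 4)/(2 - +4)"))    # parentheses defeat the flat regex
print(matches_nested("(9 * 4)/(2 - +4)"))  # the recursive rule handles them
```

The recursive version is written by hand here only to show where the recursion lives; in the course setting, a parser generator produces this kind of code from the grammar.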
Recall from last time… ANTLR generates for you … But how do you tell ANTLR what your language is like?
You specify token patterns using Regular Expressions … You specify patterns of tokens using a Context-Free Grammar
Important distinctions between regexes and CFG rules in ANTLR • Naming: • Regex names start with an uppercase letter • CFG-rule names start with a lowercase letter • Character versus token handling: • Regexes process a stream of characters • CFG rules process a stream of tokens
Will this regex match these strings? • “4 4 4 4”? No • “4444”? Yes
Will this CFG production match these strings? • “4 4 4 4”? Yes • “4444”? Yes
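The two answers differ because the regex sees characters while the CFG rule sees tokens. The Python sketch below illustrates this under stated assumptions: the slide's actual regex and production are not reproduced in this transcript, so the INTEGER pattern (one or more digits), the whitespace-skipping lexer, and the rule `ints : INTEGER+ ;` are all hypothetical stand-ins.

```python
import re

INTEGER = re.compile(r"[0-9]+")  # assumed token pattern: one or more digits


def regex_matches(text):
    """Regex view: the whole character stream must be a single INTEGER."""
    return INTEGER.fullmatch(text) is not None


def tokenize(text):
    """Lexer view: turn characters into INTEGER tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token itself
            continue
        match = INTEGER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def rule_matches(text):
    """CFG view (hypothetical rule `ints : INTEGER+ ;`): one or more tokens."""
    return len(tokenize(text)) >= 1


for sample in ("4 4 4 4", "4444"):
    print(repr(sample), regex_matches(sample), rule_matches(sample), tokenize(sample))
```

With these assumptions, “4 4 4 4” fails the regex because a space is not a digit, yet it satisfies the token-level rule as four INTEGER tokens; “4444” satisfies both, as a single token.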
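Backus-Naur Form (BNF) [Figure: ANTLR grammar example with parts labeled “BNF”]
Extended BNF (EBNF) [Figure: ANTLR grammar example with parts labeled “BNF”, “Things not in BNF”, and “EBNF”]
CFG Terminology • Terminals • Non-terminals • Productions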
CFG Derivation Series of replacement operations that shows how to derive a string of terminals from the start symbol
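A derivation can be sketched as repeated list replacement. The grammar below (`expr -> expr OP expr | INT`) is a hypothetical example chosen for illustration, not the grammar from the slides, and `derive` is an assumed helper name. Each step replaces one non-terminal with the right-hand side of a chosen production.

```python
# Hypothetical grammar, for illustration only:
#   expr -> expr OP expr   (production 0)
#   expr -> INT            (production 1)
# INT and OP are terminals; expr is the only non-terminal and the start symbol.
GRAMMAR = {"expr": [["expr", "OP", "expr"], ["INT"]]}


def derive(start, choices, rightmost=True):
    """Apply one production per step and record each sentential form.

    `choices` lists which production to use at each step; the non-terminal
    replaced is the right-most one (or the left-most if rightmost=False).
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[-1] if rightmost else positions[0]
        form[i:i + 1] = GRAMMAR[form[i]][choice]  # the replacement operation
        steps.append(" ".join(form))
    return steps


for step in derive("expr", [0, 1, 0, 1, 1]):
    print(step)
# expr
# expr OP expr
# expr OP INT
# expr OP expr OP INT
# expr OP INT OP INT
# INT OP INT OP INT
```

The derivation ends when only terminals remain, which is the string being derived.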
Derivation Example CFG: String to derive:
CFG: Derivation: String to derive:
Parse Tree: Graphical Representation of Derivation Can you think of another possible derivation? Hint: This one is a “right-most” derivation
Here’s a “left-most” derivation • A grammar that allows more than one parse tree for the same string is ambiguous • Ambiguity makes generating a parser more difficult
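Ambiguity can be observed by enumerating parse trees. The sketch below assumes a hypothetical ambiguous grammar, `expr -> expr OP expr | INT` (not necessarily the one on the slide), and a helper named `parse_trees`; trees are represented as nested tuples.

```python
def parse_trees(tokens):
    """Return every parse tree for `tokens` under expr -> expr OP expr | INT."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    # Try each operator position as the top-level "expr OP expr" split.
    for i in range(1, len(tokens) - 1, 2):
        if tokens[i] in "+-*/":
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


for tree in parse_trees("10 - 4 - 3".split()):
    print(tree)
# ('10', '-', ('4', '-', '3'))
# (('10', '-', '4'), '-', '3')
```

Two distinct trees for one string is exactly what makes this grammar ambiguous: nothing in the grammar says which grouping is intended.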
Two concepts important to expressions • Associativity: Group operators of equal precedence based on left-to-right order • 10 - 4 - 3 means (10 - 4) - 3, not 10 - (4 - 3) • Precedence: Group based on operator • 3 + 4 * 5 means 3 + (4 * 5), not (3 + 4) * 5
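The groupings matter because they produce different values. A small Python check, using Python's own expression rules as the reference (subtraction is left-associative there, and multiplication binds tighter than addition):

```python
# Associativity: two ways to group 10 - 4 - 3
left_assoc = (10 - 4) - 3    # 3
right_assoc = 10 - (4 - 3)   # 9

# Precedence: two ways to group 3 + 4 * 5
mul_first = 3 + (4 * 5)      # 23
add_first = (3 + 4) * 5      # 35

# Python applies the conventional rules when no parentheses are given.
print(10 - 4 - 3 == left_assoc)   # True
print(3 + 4 * 5 == mul_first)     # True
```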
Think-Pair-Share Activity • Rewrite this CFG to be unambiguous • Left associative • Multiplication/division have higher precedence than addition/subtraction
Solution • Create parse tree for: • 3 + 4 * 5 • 10 - 4 - 3
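The slide's solution grammar is not reproduced in this transcript. One common way to meet both requirements, offered here as an assumption rather than as the slide's answer, is to layer the rules: `expr : term (('+'|'-') term)*`, `term : factor (('*'|'/') factor)*`, `factor : INTEGER | '(' expr ')'`. The Python sketch below is a hand-written recursive-descent parser for that layered grammar; the rule names and the nested-tuple tree format are hypothetical.

```python
import re


def tokenize(text):
    tokens = re.findall(r"\d+|[-+*/()]", text)
    if "".join(tokens) != "".join(text.split()):
        raise ValueError("unexpected character in input")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expr(self):
        # expr : term (('+'|'-') term)*   -- lowest precedence
        tree = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            tree = (tree, op, self.term())  # growing leftward: left-associative
        return tree

    def term(self):
        # term : factor (('*'|'/') factor)*   -- binds tighter than + and -
        tree = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            tree = (tree, op, self.factor())
        return tree

    def factor(self):
        # factor : INTEGER | '(' expr ')'
        token = self.take()
        if token == "(":
            tree = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("3 + 4 * 5"))   # (3, '+', (4, '*', 5))
print(parse("10 - 4 - 3"))  # ((10, '-', 4), '-', 3)
```

Precedence comes from the layering (a `term` is fully built before `expr` combines it), and left associativity comes from the loop that keeps wrapping the tree built so far as the left operand. The tuples are a simplified tree; a full parse tree would also show the intermediate `expr`, `term`, and `factor` nodes.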
What’s next? • Homework 1 assigned!