
CPSC 388 – Compiler Design and Construction

komala

Presentation Transcript


  1. CPSC 388 – Compiler Design and Construction Scanners – Regular Expressions

  2. Announcements • Last day to Add/Drop Sept 11 • Wipe Down Computer Keyboards and Mice • ACM programming contest • Read Chapter 3 • Homework 2 due this Friday • PROG1 due this Friday • FSA for Java string constants (Anybody figure it out?)

  3. Homework 1 returned • Summary – In addition to your summary please answer the following questions in your report: • Context – What is the context of the research presented in the paper? What new ideas or concepts do you think were presented in the paper? • Evaluation – How was the research evaluated? What evaluation techniques would you like to see to compare the research to the state of the art before the paper? • Significance – What is the significance of the research? Do you feel the work is a minor improvement or a major step in the field? • Grammar / spelling / clarity / support for statements made

  4. Scanner Generator • .Jlex file (containing regular expressions) → Scanner Generator → .java file (containing scanner code) • To understand regular expressions you need to understand finite-state automata

  5. FSA Formal Definition (5-tuple) • Q – a finite set of states • Σ – the alphabet of the automaton (a finite set of characters used to label edges) • δ – the state transition function, δ(state_i, character) → state_j • q – the start state • F – the set of final states
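
As a minimal sketch of the 5-tuple above, the following Python encodes a small deterministic FSA and runs it over an input string. The example machine (binary strings with an even number of 1s) and the names `DFA`, `accepts`, and `even_ones` are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset    # Q
    alphabet: frozenset  # Σ
    delta: dict          # δ: (state, character) -> state
    start: str           # q
    finals: frozenset    # F

    def accepts(self, text: str) -> bool:
        state = self.start
        for ch in text:
            # A missing entry in delta means the machine is stuck: reject.
            if (state, ch) not in self.delta:
                return False
            state = self.delta[(state, ch)]
        # Accept only if the input ends in a final state.
        return state in self.finals


# Hypothetical example: binary strings containing an even number of 1s.
even_ones = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset("01"),
    delta={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    start="even",
    finals=frozenset({"even"}),
)
```

Calling `even_ones.accepts("0110")` walks the states even, even, odd, even, even and ends in a final state, so the string is accepted.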

  6. Types of FSA • Deterministic (DFA) • No state has more than one outgoing edge with the same label • Non-deterministic (NFA) • States may have more than one outgoing edge with the same label • Edges may be labeled with ε, the empty string; the FSA can take an ε-transition without looking at the current input character
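
One way to run an NFA is to track the set of all states it could be in at once. The sketch below assumes ε edges are stored under the label `None`, and the small machine for a|a*b is a hypothetical example; the function names are illustrative.

```python
def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε edges (label None)."""
    stack = list(states)
    closure = set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(delta, start, finals, text):
    """Simulate the NFA by following every possible edge in parallel."""
    current = epsilon_closure({start}, delta)
    for ch in text:
        moved = set()
        for s in current:
            moved |= set(delta.get((s, ch), ()))
        current = epsilon_closure(moved, delta)
    return bool(current & finals)


# Hypothetical NFA for a|a*b: state 0 branches by ε to two sub-machines.
DELTA = {
    (0, None): {1, 3},  # ε edges out of the start state
    (1, "a"): {2},      # branch for the single string "a"
    (3, "a"): {3},      # branch for a*b: loop on a ...
    (3, "b"): {4},      # ... then finish on b
}
FINALS = {2, 4}
```

For example, `nfa_accepts(DELTA, 0, FINALS, "aab")` is true because the a*b branch survives both a's, while `"aa"` is rejected because no branch ends in a final state.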

  7. Terms to Know • Alphabet (Σ) – any finite set of symbols, e.g. binary, ASCII, Unicode • String – a finite sequence of symbols, e.g. 010001, banana, bãër • Language – any countable set of strings, e.g. the empty set, well-formed C programs, English words

  8. Regular Expressions • An easy way to express a language that is accepted by an FSA • Rules: • ε is a regular expression • Any symbol in Σ is a regular expression • If r and s are regular expressions, then so are: • r|s, which denotes union (“r or s”) • rs, which denotes r followed by s (concatenation) • (r)*, which denotes the concatenation of r with itself zero or more times (Kleene closure) • () are used to control the order of operations
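
The three building rules can be tried out with Python's `re` module, whose syntax for union, concatenation, and closure agrees with the notation above. This is only a sketch: the helper `in_language` and the sample patterns are assumptions for illustration, and ε is written as the empty pattern.

```python
import re


def in_language(pattern: str, text: str) -> bool:
    """True if the whole of `text` belongs to the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None


EPSILON = ""       # ε: matches only the empty string
UNION = "a|b"      # r|s
CONCAT = "ab"      # rs
CLOSURE = "(ab)*"  # (r)*
```

`re.fullmatch` is used instead of `re.search` so that the entire string must be in the language, which matches the FSA notion of acceptance.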

  9. Example Regular Expressions

  10. Precedence in Regular Expressions • * has highest precedence, left associative • Concatenation has second-highest precedence, left associative • | has lowest precedence, left associative
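
The effect of these precedence rules can be checked by enumerating every short string over {a, b} and comparing languages. The sketch below relies on Python's `re` following the same precedence; the helper `language` and the length bound of 4 are assumptions for illustration.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=4):
    """Set of all strings over `alphabet`, up to `max_len`, matched in full."""
    words = (
        "".join(p)
        for n in range(max_len + 1)
        for p in product(alphabet, repeat=n)
    )
    return {w for w in words if re.fullmatch(pattern, w)}


implicit = language("a|a*b")        # no parentheses: precedence decides
explicit = language("(a)|((a*)b)")  # grouping implied by the rules above
misread = language("(a|a*)b")       # what | binding tighter would mean
```

Here `implicit` and `explicit` are the same set, while `misread` lacks the string "a", which shows that | really does bind more loosely than concatenation.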

  11. More Regular Expression Examples

  12. You Try • What is the language described by each regular expression? • a* • (a|b)* • a|a*b • (a|b)(a|b) • aa|ab|ba|bb • (+|-|ε)(0|1|2|3|4|5|6|7|8|9)*

  13. Regular Definitions • If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form: d1 → r1, d2 → r2, …, dn → rn • Each di is a new symbol not in Σ and not the same as any other of the d’s • Each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}

  14. Regular Definitions Example • C identifiers, with Σ = ASCII: • letter_ → a|b|c|…|z|A|B|C|…|Z|_ • digit → 0|1|2|…|9 • id → letter_ (letter_|digit)*
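
A regular definition maps naturally onto named pattern strings, where each later name is built only from earlier ones. The following sketch mirrors the identifier definition in Python; writing the letter and digit alternations as character classes, and the helper name `is_identifier`, are assumptions made for brevity.

```python
import re

# Each name corresponds to one line of the regular definition.
LETTER_ = "[A-Za-z_]"                   # letter_ → a|b|…|z|A|B|…|Z|_
DIGIT = "[0-9]"                         # digit   → 0|1|…|9
IDENT = f"{LETTER_}({LETTER_}|{DIGIT})*"  # id → letter_ (letter_|digit)*


def is_identifier(text: str) -> bool:
    """True if `text` has the lexical shape of a C identifier."""
    return re.fullmatch(IDENT, text) is not None
```

Note that this checks only the shape of the string: a keyword such as `while` also matches, and telling keywords from identifiers is left to the scanner's rule ordering.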

  15. Regular Definitions Example • Unsigned numbers (integer or float), with Σ = ASCII: • digit → 0|1|2|…|9 • digits → digit digit* • optionalFraction → . digits | ε • optionalExponent → (E (+|-|ε) digits) | ε • number → digits optionalFraction optionalExponent
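
The unsigned-number definition can be transcribed the same way, one Python string per definition. In this sketch the "something | ε" alternatives are written with `?`, the literal dot is escaped because `.` is a special character in `re`, and the helper name `is_number` is an assumption.

```python
import re

DIGIT = "[0-9]"
DIGITS = f"{DIGIT}{DIGIT}*"                  # digits → digit digit*
OPTIONAL_FRACTION = rf"(\.{DIGITS})?"        # . digits | ε
OPTIONAL_EXPONENT = rf"(E[+-]?{DIGITS})?"    # (E (+|-|ε) digits) | ε
NUMBER = f"{DIGITS}{OPTIONAL_FRACTION}{OPTIONAL_EXPONENT}"


def is_number(text: str) -> bool:
    """True if `text` is an unsigned number under the definition above."""
    return re.fullmatch(NUMBER, text) is not None
```

Under this definition `6.02E23` and `1E-9` are numbers, while `.5` and `1.` are not, because digits are required on both sides of the dot. A lowercase `e` exponent is also rejected, since the definition names only `E`.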

  16. Special Characters in Reg. Exp. • What does each of the following mean? • * – Kleene closure • | – or • () – grouping • [] – creates a character class • + – positive closure • ? – zero or one instance • “” – anything in quotes means itself, e.g. “*” • . – matches any single character (except newline) • \ – used for escape characters (newline, tab, etc.) • ^ – matches the beginning of a line • $ – matches the end of a line

  17. Extensions to Regular Expressions • + means one or more occurrences (positive closure) • ? means zero or one occurrence • Character classes • a|r|t can be written [art] • a|b|…|z can be written [a-z], as long as there is a clear ordering of the characters • [^a-z] matches any character except a-z
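
These extensions add convenience, not power: each one can be rewritten with the basic operators. The sketch below checks the equivalences on a handful of sample strings using Python's `re`; the helpers `matches` and `same_language` and the sample list are assumptions, and agreement on samples is a spot check, not a proof.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


def same_language(p: str, q: str, words) -> bool:
    """True if the two patterns agree on every string in `words`."""
    return all(matches(p, w) == matches(q, w) for w in words)


# Hypothetical sample strings for the spot check.
SAMPLES = ["", "a", "aa", "r", "t", "art", "b", "Z", "5"]

PLUS_OK = same_language("a+", "aa*", SAMPLES)        # r+  is  rr*
OPT_OK = same_language("a?", "a|", SAMPLES)          # r?  is  r|ε
CLASS_OK = same_language("[art]", "a|r|t", SAMPLES)  # [art] is a|r|t
```

All three flags come out true, and `matches("[^a-z]", "Z")` holds while `matches("[^a-z]", "q")` does not.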

  18. Example Using Character Classes • ^[^aeiou]*$ • Matches any complete line that does not contain a lowercase vowel • How do you tell which meaning of ^ is intended?
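
The pattern can be tried line by line in Python, whose `re` module distinguishes the two uses of ^ by position: at the start of the pattern it anchors to the start of the line, and as the first character inside [] it negates the class. The function name `vowel_free_lines` and the sample text are assumptions for illustration.

```python
import re

# First ^ anchors at the line start; the ^ inside [] negates the class.
NO_VOWELS = re.compile(r"^[^aeiou]*$")


def vowel_free_lines(text: str):
    """Return the lines of `text` that contain no lowercase vowel."""
    return [line for line in text.splitlines() if NO_VOWELS.search(line)]


# Hypothetical input: five lines, one of which is empty.
SAMPLE = "rhythm\nbanana\nSKY 123\n\nmyth"
RESULT = vowel_free_lines(SAMPLE)
```

The empty line is kept as well, because `*` allows zero characters between the two anchors.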

  19. Try It • Create character classes for: • The first ten letters (up to “j”) • Lowercase consonants • Digits in hexadecimal • Create regular expressions for: • A case-insensitive keyword such as SELECT (or Select or SeLeCt) in SQL • Java string constants • Any string of whitespace characters

  20. Creating a Scanner • Create a set of regular expressions, one for each token to be recognized • Convert the regular expressions into one combined DFA • Run the DFA over the input character stream • The longest matching regular expression is selected • If there is a tie, the first matching regular expression is used • Attach code to run when a regular expression matches
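
The longest-match and first-rule-wins policies can be sketched in a few lines of Python. Note the substitution: instead of building one combined DFA as a scanner generator would, this sketch simply tries each regular expression at the current position. The token names, the rule list, and the function `scan` are hypothetical, and the action attached to each match is reduced to "record the token" or "skip whitespace".

```python
import re

# Hypothetical token rules in priority order (earlier rules win ties).
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("NUM", re.compile(r"[0-9]+")),
    ("SKIP", re.compile(r"[ \t\n]+")),
]


def scan(text: str):
    """Split `text` into (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in RULES:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two lengths tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "SKIP":  # the action for whitespace: discard it
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

On the input `if iffy 42`, both IF and ID match the first two characters, so the tie goes to IF because it is listed first; for `iffy`, ID matches four characters against IF's two, so the longer match wins.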
