
Fall 2014-2015 Compiler Principles Lecture 1: Lexical Analysis


Presentation Transcript


  1. Fall 2014-2015 Compiler Principles, Lecture 1: Lexical Analysis. Roman Manevich, Ben-Gurion University

  2. Agenda • Understand role of lexical analysis in a compiler • Lexical analysis theory • Implementing professional scanner via scanner generator

  3. JavaScript example
      var currOption = 0;
      // Choose content to display in lower pane.
      function choose ( id ) {
        var menu = ["about-me", "publications", "teaching", "software", "activities"];
        for (i = 0; i < menu.length; i++) {
          currOption = menu[i];
          var elt = document.getElementById(currOption);
          if (currOption == id && elt.style.display == "none") {
            elt.style.display = "block";
          } else {
            elt.style.display = "none";
          }
        }
      }
    Can you identify some basic units in this code?

  4. JavaScript example (same code as slide 3). One unit is labeled "keyword" and several others are marked "?". Can you identify some basic units in this code?

  5. JavaScript example (same code as slide 3), with its basic units labeled: keyword, identifier, operator, numeric literal, punctuation, comment, string literal, whitespace.

  6. Role of lexical analysis. [Figure: compiler pipeline: High-level Language (Scheme) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation → Executable Code] • First part of the compiler front-end • Converts a stream of characters into a stream of tokens • Splits the text into the most basic meaningful strings • Simplifies the input for syntax analysis

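  7. From scanning to parsing. Program text: 59 + (1257 * xPosition). The Lexical Analyzer either reports a lexical error or produces a valid token stream. The Parser, using the grammar E → id | num | E + E | E * E | ( E ), either reports a syntax error or produces an Abstract Syntax Tree. [Figure: AST with + at the root, over num and a * node whose children are num and x]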
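The lexical-analyzer step of this slide can be sketched as a small hand-written scanner. This is a minimal sketch, not the course's implementation. It assumes only the token kinds of the slide's grammar (num, id, +, *, parentheses). The function name `scan` and the error-message format are hypothetical.

```javascript
// Minimal scanner sketch for the slide's expression language:
// numbers (num), identifiers (id), '+', '*', '(' and ')'.
// Whitespace separates tokens and is not passed on to the parser.
function scan(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (/\s/.test(ch)) {                  // skip whitespace
      i++;
    } else if (/[0-9]/.test(ch)) {        // num: longest run of digits
      let j = i;
      while (j < text.length && /[0-9]/.test(text[j])) j++;
      tokens.push({ kind: "num", value: Number(text.slice(i, j)) });
      i = j;
    } else if (/[A-Za-z_]/.test(ch)) {    // id: letter or '_' then letters, digits, '_'
      let j = i;
      while (j < text.length && /[A-Za-z0-9_]/.test(text[j])) j++;
      tokens.push({ kind: "id", value: text.slice(i, j) });
      i = j;
    } else if ("+*()".includes(ch)) {     // single-character operators and parentheses
      tokens.push({ kind: ch });
      i++;
    } else {                              // anything else is a lexical error
      throw new Error(`lexical error at offset ${i}: '${ch}'`);
    }
  }
  return tokens;
}

console.log(scan("59 + (1257 * xPosition)").map(t => t.kind).join(" "));
// prints: num + ( num * id )
```

The parser then sees only the kinds `num + ( num * id )`. The concrete values 59, 1257 and xPosition travel along in the token records.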

  8. Scanner output for the code in slide 3. Stream of tokens, in the format LINE: KIND(value): 1: VAR, 1: ID(currOption), 1: EQ, 1: INT_LITERAL(0), 1: SEMI, 3: FUNCTION, 3: ID(choose), 3: LP, 3: ID(id), 3: RP, 3: LCB, ...
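A scanner that emits this line-numbered format for the first lines of the example can be sketched as follows. This is a minimal sketch under stated assumptions:

- It handles only a tiny JavaScript subset: two keywords, identifiers, integer literals, `//` comments and a few punctuation marks.
- `scanLines` is a hypothetical name.
- `RCB` (for `}`) is a hypothetical token name; the slide's list stops at LCB.

```javascript
// Sketch of a scanner printing tokens as "LINE: KIND(value)", in the style of the slide.
// Tiny JavaScript subset only; RCB for '}' is a hypothetical token name.
const KEYWORDS = new Map([["var", "VAR"], ["function", "FUNCTION"]]);
const PUNCT = new Map([["=", "EQ"], [";", "SEMI"], ["(", "LP"], [")", "RP"], ["{", "LCB"], ["}", "RCB"]]);

function scanLines(text) {
  const out = [];
  let line = 1;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === "\n") { line++; i++; continue; }   // newline: advance the line counter
    if (/\s/.test(ch)) { i++; continue; }          // other whitespace is consumed
    if (text.startsWith("//", i)) {                // line comment: skip to end of line
      while (i < text.length && text[i] !== "\n") i++;
      continue;
    }
    const word = /^[A-Za-z_][A-Za-z0-9_]*/.exec(text.slice(i));
    const num = /^[0-9]+/.exec(text.slice(i));
    if (word) {                                    // keyword or identifier
      const kw = KEYWORDS.get(word[0]);
      out.push(kw ? `${line}: ${kw}` : `${line}: ID(${word[0]})`);
      i += word[0].length;
    } else if (num) {                              // integer literal
      out.push(`${line}: INT_LITERAL(${num[0]})`);
      i += num[0].length;
    } else if (PUNCT.has(ch)) {                    // single-character punctuation
      out.push(`${line}: ${PUNCT.get(ch)}`);
      i++;
    } else {
      throw new Error(`line ${line}: unexpected character '${ch}'`);
    }
  }
  return out;
}

const src = "var currOption = 0;\n// Choose content to display in lower pane.\nfunction choose ( id ) {";
console.log(scanLines(src).join(", "));
```

The comment on line 2 produces no token, which is why the stream jumps from line 1 to line 3.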

  9. Tokens

  10. What is a token? • Lexeme – substring of original text constituting an identifiable unit • Identifiers, Values, reserved words, … • Record type storing: • Kind • Value (when applicable) • Start-position/end-position • Any information that is useful for the parser • Different for different languages
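The record type described above can be sketched as a small class. This is an illustrative sketch only. The class name `Token`, its field names and the `toString` format are assumptions, and the offsets below were computed by hand for the first line of the slide-3 example.

```javascript
// Sketch of a token record holding the fields listed on the slide:
// kind, value (when applicable), and start/end positions.
class Token {
  constructor(kind, value, start, end) {
    this.kind = kind;     // e.g. "ID", "INT_LITERAL", "SEMI"
    this.value = value;   // e.g. "currOption" or 0; null when not applicable
    this.start = start;   // offset of the lexeme's first character
    this.end = end;       // offset just past the lexeme's last character
  }
  toString() {
    return this.value === null ? this.kind : `${this.kind}(${this.value})`;
  }
}

// Tokens for "var currOption = 0;" with hand-computed offsets.
const src = "var currOption = 0;";
const toks = [
  new Token("VAR", null, 0, 3),
  new Token("ID", "currOption", 4, 14),
  new Token("EQ", null, 15, 16),
  new Token("INT_LITERAL", 0, 17, 18),
  new Token("SEMI", null, 18, 19),
];
console.log(toks.map(String).join(" "));
```

Keeping positions in the record lets later phases report errors against the original text, for example `src.slice(t.start, t.end)` recovers the lexeme.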

  11. Example tokens

  12. C++ example 1. Splitting text into tokens can be tricky. How should the code below be split? vector<vector<int>> myVector. Is >> a single operator token, or two tokens, > and >?

  13. C++ example 2. Splitting text into tokens can be tricky. How should the code below be split? vector<vector<int> > myVector. Here > and > are clearly two tokens.
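The difference between the two examples can be illustrated with the "longest match" (maximal munch) rule. This is a sketch of that rule on a small hypothetical subset of C++ operators; `splitOperators` is a hypothetical name. A scanner that always takes the longest match returns a single `>>` token for example 1. Reading it as two `>` tokens needs extra context; C++11, for instance, treats `>>` as two closing brackets inside a template argument list.

```javascript
// Maximal munch: at each position, take the longest operator that matches.
// The operator set is a small hypothetical subset of C++ operators.
const OPERATORS = [">", ">>", ">=", ">>=", "<", "<<", "="]
  .sort((a, b) => b.length - a.length);   // try longer operators first

function splitOperators(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    if (text[i] === " ") { i++; continue; }  // a space ends the current token
    const op = OPERATORS.find(o => text.startsWith(o, i));
    if (op === undefined) throw new Error(`no operator at offset ${i}`);
    tokens.push(op);
    i += op.length;
  }
  return tokens;
}

console.log(JSON.stringify(splitOperators(">>")));   // [">>"]
console.log(JSON.stringify(splitOperators("> >")));  // [">",">"]
```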

  14. Separating tokens • Some lexemes (e.g., whitespace and comments) are recognized but consumed, rather than transmitted to the parser • Example: ifi fi/*comment*/f
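One plausible reading of the slide's example can be sketched as a scanner in which whitespace and `/* */` comments only separate tokens. The sketch assumes a toy language whose only keyword is `if`. It also assumes the longest identifier is matched before the keyword check. `scanWords` is a hypothetical name.

```javascript
// Sketch: whitespace and /* */ comments are recognized but consumed,
// so they separate tokens without being passed to the parser.
// Assumes a toy language whose only keyword is "if".
function scanWords(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    if (/\s/.test(text[i])) { i++; continue; }          // whitespace
    if (text.startsWith("/*", i)) {                     // block comment
      const close = text.indexOf("*/", i + 2);
      if (close < 0) throw new Error("unterminated comment");
      i = close + 2;
      continue;
    }
    const m = /^[A-Za-z_][A-Za-z0-9_]*/.exec(text.slice(i));
    if (!m) throw new Error(`unexpected character '${text[i]}'`);
    tokens.push(m[0] === "if" ? "IF" : `ID(${m[0]})`);  // longest match, then keyword check
    i += m[0].length;
  }
  return tokens;
}

console.log(scanWords("ifi fi/*comment*/f").join(" "));
// prints: ID(ifi) ID(fi) ID(f)
```

Under these assumptions, `ifi` is one identifier rather than the keyword `if` followed by `i`. The comment splits `fi` and `f` into two identifiers.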

  15. Preprocessor directives in C

  16. First step of designing a scanner • Define each type of lexeme • Reserved words: var, if, for, while • Operators: < = ++ • Identifiers: myFunction • Literals: 123, "hello" • Annotations: @SuppressWarnings • How can we define lexemes of unbounded length?

  17. First step of designing a scanner • Define each type of lexeme • Reserved words: var, if, for, while • Operators: < = ++ • Identifiers: myFunction • Literals: 123, "hello" • Annotations: @SuppressWarnings • How can we define lexemes of unbounded length? • Regular expressions

  18. Agenda • Understand role of lexical analysis in a compiler • Convert text to stream of tokens • Lexical analysis theory • Implementing professional scanner via scanner generator

  19. Regular expressions

  20. Regular languages refresher • Formal languages • Alphabet = finite set of letters • Word = sequence of letters • Language = set of words • Regular languages are defined equivalently by • Regular expressions • Finite-state automata

  21. Regular expressions • Empty string: ε • Letter: a • Concatenation: R1 R2 • Union: R1 | R2 • Kleene star: R* • Shorthand: R+ stands for R R* • Scope: (R) • Example: (0* 1*) | (1* 0*) • What is this language?
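The slide's example can be tried out with JavaScript's regular expressions, which write union, concatenation and Kleene star the same way. This is a quick sketch, not a formal definition. The `^...$` anchors are an assumption needed to make JavaScript match the whole word. ε needs no symbol, since `*` already allows zero repetitions. The language is every word that is a block of 0s followed by a block of 1s, or a block of 1s followed by a block of 0s.

```javascript
// The slide's example (0* 1*) | (1* 0*) as an anchored JavaScript regular expression.
// ^...$ forces the whole word to match, as in the formal definition.
const R = /^(?:0*1*|1*0*)$/;

for (const w of ["", "0011", "111000", "010", "101"]) {
  console.log(JSON.stringify(w), R.test(w));
}
```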

  22. Exercise 1 - Question • Language of Java identifiers • Identifiers start with either an underscore ‘_’ or a letter • Continue with either underscore, letter, or digit

  23. Exercise 1 - Answer • Language of Java identifiers • Identifiers start with either an underscore ‘_’ or a letter • Continue with either underscore, letter, or digit • (_|a|b|…|z|A|…|Z)(_|a|b|…|z|A|…|Z|0|…|9)*

  24. Exercise 1 – Better answer • Language of Java identifiers • Identifiers start with either an underscore ‘_’ or a letter • Continue with either underscore, letter, or digit • (_|a|b|…|z|A|…|Z)(_|a|b|…|z|A|…|Z|0|…|9)* • Using shorthand macros: First = _|a|b|…|z|A|…|Z, Next = First|0|…|9, R = First Next*
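The shorthand macros can be mirrored by building a regular expression from named pieces. This is a sketch of the slide's simplified definition only. Real Java identifiers also allow `$` and Unicode letters and digits, and they exclude reserved words; none of that is modeled here.

```javascript
// The slide's macros written as regular-expression source strings.
const First = "[_a-zA-Z]";                    // _|a|b|…|z|A|…|Z
const Next = `(?:${First}|[0-9])`;            // First|0|…|9
const R = new RegExp(`^${First}${Next}*$`);   // R = First Next*, anchored to the whole word

for (const w of ["myFunction", "_x1", "1x", "a-b"]) {
  console.log(w, R.test(w));
}
```

Naming the pieces keeps the final expression readable, which is the same benefit the macros give on the slide.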

  25. Exercise 2 - Question • Language of rational numbers in decimal representation (no leading or trailing zeros) • Positive examples: • 0 • 123.757 • .933333 • 0.7 • Negative examples: • 007 • 0.30

  26. Exercise 2 - Answer • Language of rational numbers in decimal representation (no leading or trailing zeros) • Digit = 1|2|…|9 • Digit0 = 0|Digit • Num = Digit Digit0* • Frac = Digit0* Digit • Pos = Num | .Frac | 0.Frac | Num.Frac • PosOrNeg = (ε|-)Pos • R = 0 | PosOrNeg
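The answer can be checked against the slide's positive and negative examples by translating each macro into a JavaScript regex fragment. This is a sketch under one assumption: `-?` stands for (ε|-), and `\.` is the literal decimal point.

```javascript
// The slide's macros as regular-expression source strings.
const Digit = "[1-9]";                      // 1|2|…|9
const Digit0 = "[0-9]";                     // 0|Digit
const Num = `${Digit}${Digit0}*`;           // no leading zeros
const Frac = `${Digit0}*${Digit}`;          // no trailing zeros
const Pos = `(?:${Num}|\\.${Frac}|0\\.${Frac}|${Num}\\.${Frac})`;
const PosOrNeg = `-?${Pos}`;                // (ε|-)Pos
const R = new RegExp(`^(?:0|${PosOrNeg})$`); // R = 0 | PosOrNeg

for (const w of ["0", "123.757", ".933333", "0.7", "007", "0.30"]) {
  console.log(w, R.test(w));
}
```

Splitting the definition into Num and Frac is what enforces "no leading zeros" and "no trailing zeros" separately. Num must start with a nonzero digit, and Frac must end with one.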

  27. Exercise 3 - Question • Equal number of opening and closing brackets: [ⁿ]ⁿ (n ≥ 1) = [], [[]], [[[]]], …

  28. Exercise 3 - Answer • Equal number of opening and closing brackets: [ⁿ]ⁿ (n ≥ 1) = [], [[]], [[[]]], … • Not regular • Context-free grammar: S ::= [] | [S]
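Informally, the language is not regular because recognizing it requires tracking an unbounded nesting depth. A finite automaton, with its fixed number of states, cannot do this. The grammar, by contrast, translates directly into a recursive recognizer. This is a minimal sketch; `isS` is a hypothetical name.

```javascript
// Recursive recognizer for the grammar S ::= [] | [S].
function isS(w) {
  if (w === "[]") return true;                     // S ::= []
  return w.length > 2 && w[0] === "[" &&           // S ::= [S]
         w[w.length - 1] === "]" && isS(w.slice(1, -1));
}

for (const w of ["[]", "[[[]]]", "[][]", "[[]"]) {
  console.log(w, isS(w));
}
```

Each recursive call consumes one bracket pair, so the call stack plays the role of the unbounded counter.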

  29. Finite automata

  30. Finite automata • An automaton is defined by states and transitions [Figure: automaton with a start state, an accepting state, and transitions labeled a, b, c]

  31–34. Automaton running example (animation on the automaton of slide 30) • Words are read left-to-right • Slide 34: the word is accepted

  35–39. Words outside of the language (animation on the same automaton) • A missing transition means non-acceptance • A word that is read completely is also rejected if the final state is not an accepting state

  40. Exercise - Question • What is the language defined by the automaton of the running example?

  41. Exercise - Answer • What is the language defined by the automaton? • a b* c • Generally: the words spelled out by all paths from the start state to accepting states
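A deterministic automaton for a b* c, and the left-to-right run from the running example, can be sketched as a transition table plus a loop. The state names q0, q1, q2 and the table layout are hypothetical; the states are reconstructed from the answer, not copied from the slide's figure. The sketch rejects a word in the two ways slides 35–39 describe.

```javascript
// Sketch of a DFA for a b* c; q0 is the start state, q2 the accepting state.
const DFA = {
  start: "q0",
  accepting: new Set(["q2"]),
  delta: {                // delta[state][letter] = next state; missing entry = no transition
    q0: { a: "q1" },
    q1: { b: "q1", c: "q2" },
    q2: {},
  },
};

function accepts(dfa, word) {
  let state = dfa.start;
  for (const letter of word) {              // words are read left to right
    const next = dfa.delta[state][letter];
    if (next === undefined) return false;   // missing transition: reject
    state = next;
  }
  return dfa.accepting.has(state);          // reject if the final state is not accepting
}

for (const w of ["ac", "abbbc", "ab", "acb"]) {
  console.log(w, accepts(DFA, w));
}
```

Here "ab" is rejected because q1 is not accepting, and "acb" is rejected because q2 has no transition on b.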

  42. A little about me • Joined Ben-Gurion University two years ago • Research interests • Advanced compilation and synthesis techniques • Language-supported parallelism • Static analysis and verification

  43. I am here for • Teaching you the theory and practice of popular compiler algorithms • Hopefully making you think about solving problems through examples from the compilers world • Answering questions about the material • Contacting me • e-mail: romanm@cs.bgu.ac.il • Office hours: see course web-page • Announcements • Forums (per assignment)

  44. Tentative syllabus [Table not recoverable; it includes a mid-term exam]

  45. Nondeterministic Finite automata

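  46. Nondeterministic automata • Allow multiple transitions from a given state labeled by the same letter [Figure: automaton with a start state and transitions labeled a, b, c, including two transitions on the same letter from one state]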

  47–50. NFA run example (animation on the automaton of slide 46) • Maintain a set of current states • Accept the word if any of the states in the set is accepting
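The "maintain a set of states" idea can be sketched directly. The NFA below is hypothetical, since the slide's figure did not survive. It recognizes a b* c | a c* b, and its start state has two transitions on the same letter a. `nfaAccepts` is likewise a hypothetical name.

```javascript
// Sketch of NFA simulation by maintaining the set of current states.
// Hypothetical NFA for a b* c | a c* b: from state 0, 'a' leads to both 1 and 2.
const NFA = {
  start: 0,
  accepting: new Set([3]),
  delta: {                 // delta[state][letter] = array of possible next states
    0: { a: [1, 2] },
    1: { b: [1], c: [3] },
    2: { c: [2], b: [3] },
    3: {},
  },
};

function nfaAccepts(nfa, word) {
  let current = new Set([nfa.start]);
  for (const letter of word) {
    const next = new Set();
    for (const s of current) {
      for (const t of nfa.delta[s][letter] ?? []) next.add(t);
    }
    current = next;        // an empty set means every run has died
  }
  // accept if any state in the set is accepting
  return [...current].some(s => nfa.accepting.has(s));
}

for (const w of ["abbc", "accb", "a", "abcb"]) {
  console.log(w, nfaAccepts(NFA, w));
}
```

Because the set of states is finite, the same bookkeeping can be done once per possible set ahead of time. That observation is the idea behind converting an NFA into a DFA.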
