
LANGUAGE TRANSLATORS: WEEK 14


kirti

Presentation Transcript


  1. LANGUAGE TRANSLATORS: WEEK 14
     LECTURE: REGULAR EXPRESSIONS, FINITE STATE MACHINES, LEXICAL ANALYSERS, INTRO TO GRAMMAR THEORY
     TUTORIAL: CAPTURING LANGUAGES USING REGULAR EXPRESSIONS

  2. LEXICAL ANALYSIS
     • Is the first step in the translation/compilation process: input language ====> output language
     • Means grouping the raw characters of the input into TOKENS.

  3. LEXICAL ANALYSIS PHASE
     • The language of each kind of TOKEN (e.g. identifiers) is normally a regular language.
     • REGULAR EXPRESSIONS generate regular languages (as do regular grammars). The tokens of a language are often specified by regular expressions.
     • Finite State Machines recognise regular languages.

  4. REGULAR EXPRESSIONS
     • A one-line method of specifying a language
     • Equivalent to 'type 3' or regular grammars
     • Used to parameterize UNIX/LINUX file processing commands (e.g. grep)

  5. REGULAR EXPRESSIONS - DEFINITION

     EXAMPLE              DEFINITION
     a | b                '|' means choice
     a | b | c = [abc]    '[..]' is shorthand for multiple choice
     e                    'e' (epsilon) means the empty word
     (abc)*               '*' means repetition 0, 1 or more times
     (abcd)+              '+' means repetition 1 or more times

  6. REGULAR EXPRESSIONS - EXAMPLES
     • [a-zA-Z][a-zA-Z0-9]* defines the language of IDENTIFIERS in some programming languages
     • (xyz)* defines the language {e, xyz, xyzxyz, xyzxyzxyz, ..}
     • [abcd]+ defines the language {a, b, c, d, aa, ab, ac, ad, ba, bb, bc, bd, ca, ..}
     Putting choice and repetition together produces complicated regular languages.
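The three expressions above can be tried out with `java.util.regex`, which uses the same notation for character classes, `*` and `+`. This is only a minimal sketch: the class name `RegexExamples` and the helper `inLanguage` are illustrative and do not come from the lecture.

```java
import java.util.regex.Pattern;

// Illustrative helper (hypothetical names): tests whether a whole word
// belongs to the language defined by a regular expression.
final class RegexExamples {
    static final String IDENTIFIER = "[a-zA-Z][a-zA-Z0-9]*";
    static final String XYZ_STAR   = "(xyz)*";
    static final String ABCD_PLUS  = "[abcd]+";

    // Pattern.matches requires the ENTIRE input to match the expression.
    static boolean inLanguage(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        System.out.println(inLanguage(IDENTIFIER, "count1")); // true
        System.out.println(inLanguage(IDENTIFIER, "1count")); // false
        System.out.println(inLanguage(XYZ_STAR, ""));         // true (empty word)
        System.out.println(inLanguage(XYZ_STAR, "xyzxy"));    // false
        System.out.println(inLanguage(ABCD_PLUS, "bad"));     // true
        System.out.println(inLanguage(ABCD_PLUS, ""));        // false
    }
}
```

Note that `*` admits the empty word while `+` does not, which is the only difference between the two repetition operators.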

  7. Finite State Machines
     • Can be defined by annotated nodes and arcs.
     • Regular expressions can be translated into FSMs, but ERROR STATES must be added to the FSMs to handle input that does not match.
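As a rough illustration of an FSM with an error state, the sketch below hand-codes a deterministic machine for the identifier expression [a-zA-Z][a-zA-Z0-9]*. The state names and the class `IdentifierFsm` are assumptions made for this example, not part of the lecture.

```java
// A deterministic FSM for the identifier language [a-zA-Z][a-zA-Z0-9]*,
// with an explicit ERROR state. All names here are illustrative.
final class IdentifierFsm {
    enum State { START, IN_ID, ERROR }

    // One arc of the machine: (current state, input character) -> next state.
    static State step(State s, char c) {
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        boolean digit = c >= '0' && c <= '9';
        switch (s) {
            case START: return letter ? State.IN_ID : State.ERROR;
            case IN_ID: return (letter || digit) ? State.IN_ID : State.ERROR;
            default:    return State.ERROR; // ERROR is a trap state
        }
    }

    // Runs the machine over the whole word; IN_ID is the only accepting state.
    static boolean accepts(String word) {
        State s = State.START;
        for (int i = 0; i < word.length(); i++) {
            s = step(s, word.charAt(i));
        }
        return s == State.IN_ID;
    }

    public static void main(String[] args) {
        System.out.println(accepts("x27")); // true
        System.out.println(accepts("27x")); // false
        System.out.println(accepts(""));    // false
    }
}
```

Each input character is examined exactly once, so the running time is linear in the length of the word.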

  8. Regular Expression ==> NDFSM, then NDFSM ==> FSM
     [Diagrams: NDFSMs for the regular expressions ab, [ab] and a*]

  9. Example
     Specify a language over the alphabet {w, x, y, z} with the only restrictions being that
     1. no string contains both x and y, and
     2. if there is a y and a w in a string, then the first w ALWAYS occurs before the first y.
     SOLUTION:
     1. Write down examples and counter-examples.
     2. Decide on any ambiguities.
     3. Use case analysis to sub-divide the problem:
        language = (a) strings of {w, x, z} UNION (b) strings of {w, y, z} with restriction 2.
        - Part (a): [wxz]+
        - Part (b): can assume y is always in the string (strings without y fall under part (a)); after the first y, any of w, y, z may follow, but not x:
          [yz]+ | z* w [wz]* y [wyz]*
        - Put together, answer = [wxz]+ | [yz]+ | z* w [wz]* y [wyz]*
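A candidate answer to an exercise like this can be checked by brute force: enumerate every short string over the alphabet and compare the regular expression against a direct statement of the restrictions. The sketch below does that with `java.util.regex`; the class and method names are hypothetical. It follows the solution's use of '+' in treating the empty word as outside the language, and it takes the final factor to be [wyz]*, since after the first y an x is ruled out by restriction 1 while further w's are allowed.

```java
import java.util.ArrayDeque;
import java.util.regex.Pattern;

// Brute-force comparison of a regular expression with the two restrictions.
// Class and method names are illustrative only.
final class Slide9Check {
    static final Pattern ANSWER =
        Pattern.compile("[wxz]+|[yz]+|z*w[wz]*y[wyz]*");

    // Direct statement of the restrictions (non-empty strings only).
    static boolean satisfiesRestrictions(String s) {
        if (s.isEmpty()) return false;
        int x = s.indexOf('x'), y = s.indexOf('y'), w = s.indexOf('w');
        if (x >= 0 && y >= 0) return false;          // restriction 1
        if (y >= 0 && w >= 0 && w > y) return false; // restriction 2
        return true;
    }

    // Returns the first string of length 1..maxLen on which the expression
    // and the restrictions disagree, or null if they agree on all of them.
    static String firstMismatch(int maxLen) {
        char[] alphabet = {'w', 'x', 'y', 'z'};
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.add("");
        while (!queue.isEmpty()) {
            String s = queue.remove();
            if (!s.isEmpty()
                    && ANSWER.matcher(s).matches() != satisfiesRestrictions(s)) {
                return s;
            }
            if (s.length() < maxLen) {
                for (char c : alphabet) queue.add(s + c);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstMismatch(6)); // null: no disagreement found
    }
}
```

Agreement on all strings up to a fixed length is evidence rather than proof, but it catches most slips in the case analysis quickly.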

  10. A LEXICAL ANALYSER - GENERATOR (e.g. LEX, JLEX) - how they work
      • INPUT REGULAR EXPRESSIONS
      • TRANSLATE EACH REGULAR EXPRESSION INTO A NON-DETERMINISTIC FSM
      • TRANSLATE THE NON-DETERMINISTIC FSM INTO A DETERMINISTIC FSM (which is easily described as a simple program)
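The last step, turning a non-deterministic FSM into a deterministic one, is commonly done with the subset construction, in which each deterministic state stands for a set of non-deterministic states. The sketch below is a simplified version under stated assumptions: it has no empty-word (epsilon) transitions, the example machine for [ab]*ab is made up for illustration, and nothing here is claimed to be how LEX or JLEX implement the step internally.

```java
import java.util.*;

// Subset construction for an NFA without epsilon transitions (a sketch).
final class SubsetConstruction {
    // nfa.get(state).get(symbol) = set of possible next states.
    // Illustrative NFA for [ab]*ab: state 0 is the start, state 2 accepts.
    static Map<Integer, Map<Character, Set<Integer>>> exampleNfa() {
        Map<Integer, Map<Character, Set<Integer>>> nfa = new HashMap<>();
        nfa.put(0, Map.of('a', Set.of(0, 1), 'b', Set.of(0)));
        nfa.put(1, Map.of('b', Set.of(2)));
        nfa.put(2, Map.of());
        return nfa;
    }

    // Each DFA state is a set of NFA states; the empty set, if it is ever
    // reached, plays the role of the error state.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> toDfa(
            Map<Integer, Map<Character, Set<Integer>>> nfa,
            int start, char[] alphabet) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Deque<Set<Integer>> todo = new ArrayDeque<>();
        todo.add(new TreeSet<>(Set.of(start)));
        while (!todo.isEmpty()) {
            Set<Integer> current = todo.remove();
            if (dfa.containsKey(current)) continue; // already expanded
            Map<Character, Set<Integer>> row = new HashMap<>();
            for (char c : alphabet) {
                Set<Integer> next = new TreeSet<>();
                for (int s : current) {
                    next.addAll(nfa.get(s).getOrDefault(c, Set.of()));
                }
                row.put(c, next);
                todo.add(next);
            }
            dfa.put(current, row);
        }
        return dfa;
    }

    // Runs the DFA; a DFA state accepts if it contains an accepting NFA state.
    static boolean accepts(Map<Set<Integer>, Map<Character, Set<Integer>>> dfa,
                           int start, Set<Integer> nfaAccepting, String word) {
        Set<Integer> state = new TreeSet<>(Set.of(start));
        for (char c : word.toCharArray()) {
            state = dfa.get(state).get(c);
            if (state == null) return false; // symbol outside the alphabet
        }
        for (int s : state) {
            if (nfaAccepting.contains(s)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        var dfa = toDfa(exampleNfa(), 0, new char[] {'a', 'b'});
        System.out.println(dfa.keySet());                       // [[0], [0, 1], [0, 2]]
        System.out.println(accepts(dfa, 0, Set.of(2), "abab")); // true
        System.out.println(accepts(dfa, 0, Set.of(2), "aba"));  // false
    }
}
```

Once the table is built, running the deterministic machine needs only one table lookup per input character, which is why it is "easily described as a simple program".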

  11. EXAMPLE INPUT TO A LEXICAL ANALYSER - GENERATOR

      %%
      ";"          { return new Symbol(sym.SEMI); }
      "+"          { return new Symbol(sym.PLUS); }
      "*"          { return new Symbol(sym.TIMES); }
      "("          { return new Symbol(sym.LPAREN); }
      ")"          { return new Symbol(sym.RPAREN); }
      [0-9]+       { return new Symbol(sym.NUMBER, Integer.valueOf(yytext())); }
      [ \t\r\n\f]  { /* ignore white space. */ }
      .            { System.err.println("Illegal character: "+yytext()); }

      Example: if the string (231+3)*3 was input to the generated lexical analyser, the output would be:
      LPAREN (NUMBER,231) PLUS (NUMBER,3) RPAREN TIMES (NUMBER,3)
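To see what the generated analyser does with these rules, they can be imitated by hand with `java.util.regex`, using one named group per rule. This is a sketch only: `CalcLexer` and `tokenise` are hypothetical names, tokens are returned as plain strings rather than `Symbol` objects, and Java's alternation picks the first alternative that matches rather than the longest match a LEX-style tool would choose (the two coincide here because only the catch-all `.` overlaps with other rules, and it comes last).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written imitation of the token rules above (illustrative names).
final class CalcLexer {
    // One named group per rule, in the same order as the specification.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<SEMI>;)|(?<PLUS>\\+)|(?<TIMES>\\*)|(?<LPAREN>\\()|(?<RPAREN>\\))"
        + "|(?<NUMBER>[0-9]+)|(?<WS>[ \\t\\r\\n\\f])|(?<ILLEGAL>.)");

    private static final String[] SIMPLE =
        {"SEMI", "PLUS", "TIMES", "LPAREN", "RPAREN"};

    static List<String> tokenise(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group("NUMBER") != null) {
                tokens.add("(NUMBER," + Integer.parseInt(m.group("NUMBER")) + ")");
            } else if (m.group("ILLEGAL") != null) {
                System.err.println("Illegal character: " + m.group());
            } else if (m.group("WS") == null) {
                // One of the single-character tokens matched; find which.
                for (String name : SIMPLE) {
                    if (m.group(name) != null) tokens.add(name);
                }
            }
            // White space falls through and produces no token.
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenise("(231+3)*3"));
        // [LPAREN, (NUMBER,231), PLUS, (NUMBER,3), RPAREN, TIMES, (NUMBER,3)]
    }
}
```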

  12. Simple Lexical Analyser

      // Symbol is assumed to be the CUP runtime class; sym is the class of
      // token constants that CUP generates from the grammar.
      import java_cup.runtime.Symbol;

      public class scanner {
        // Single character of lookahead (-1 at end of input).
        protected static int next_char;

        // Read the next character from standard input.
        protected static void advance() throws java.io.IOException {
          next_char = System.in.read();
        }

        // Prime the lookahead before the first call to next_token().
        public static void init() throws java.io.IOException {
          advance();
        }

        // Return the next token; unrecognised characters are skipped.
        public static Symbol next_token() throws java.io.IOException {
          for (;;)
            switch (next_char) {
              case '0': case '1': case '2': case '3': case '4':
              case '5': case '6': case '7': case '8': case '9':
                /* parse a decimal integer */
                int i_val = 0;
                do {
                  i_val = i_val * 10 + (next_char - '0');
                  advance();
                } while (next_char >= '0' && next_char <= '9');
                return new Symbol(sym.INT, Integer.valueOf(i_val));
              case 'p': advance(); return new Symbol(sym.PRINT);
              case 'r': advance(); return new Symbol(sym.REPEAT);
              case 'u': advance(); return new Symbol(sym.UNTIL);
              case '=': advance(); return new Symbol(sym.ASSIGNS);
              case ';': advance(); return new Symbol(sym.SEMI);
              case '+': advance(); return new Symbol(sym.PLUS);
              case '-': advance(); return new Symbol(sym.MINUS);
              case '(': advance(); return new Symbol(sym.LPAREN);
              case ')': advance(); return new Symbol(sym.RPAREN);
              case 'x': advance(); return new Symbol(sym.ID, "x");
              case 'y': advance(); return new Symbol(sym.ID, "y");
              case 'z': advance(); return new Symbol(sym.ID, "z");
              case -1: return new Symbol(sym.EOF);
              default: advance(); break; /* skip any other character */
            }
        }
      }

  13. Introduction to Grammar Theory
      • Grammars can be used to generate the syntax of all formal languages; the structural complexity of a language is determined by the simplest grammar that can generate it.
      • In order to create parsers, we are interested in “properties of grammars”. For example, the “first set” of a string w of terminals and non-terminals is the set of TERMINAL symbols (tokens) that may be at the front of ANY string derived from w using the grammar rules.
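The "first set" idea can be made concrete with a small fixed-point computation. The sketch below computes the first set of each non-terminal for a made-up expression grammar. The grammar, the class `FirstSets` and its method names are assumptions for illustration, and the code handles only grammars with no empty right-hand sides, where the first set of a string is simply the first set of its leading symbol.

```java
import java.util.*;

// First sets for a grammar WITHOUT empty productions (a simplifying
// assumption). All names and the grammar itself are illustrative.
final class FirstSets {
    // Hypothetical grammar:
    //   E -> T PLUS E | T
    //   T -> NUMBER | LPAREN E RPAREN
    static Map<String, List<List<String>>> exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("E", List.of(List.of("T", "PLUS", "E"), List.of("T")));
        g.put("T", List.of(List.of("NUMBER"), List.of("LPAREN", "E", "RPAREN")));
        return g;
    }

    // Symbols that appear as a key are non-terminals; all others are terminals.
    static Map<String, Set<String>> firstSets(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String nt : g.keySet()) first.put(nt, new TreeSet<>());
        boolean changed = true;
        while (changed) { // repeat until no set grows any further
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : g.entrySet()) {
                Set<String> target = first.get(rule.getKey());
                for (List<String> rhs : rule.getValue()) {
                    String head = rhs.get(0); // leading symbol of the right-hand side
                    if (g.containsKey(head)) {
                        changed |= target.addAll(first.get(head));
                    } else {
                        changed |= target.add(head);
                    }
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(firstSets(exampleGrammar()));
        // {E=[LPAREN, NUMBER], T=[LPAREN, NUMBER]}
    }
}
```

A parser can use these sets to decide which rule to apply by looking only at the next token: here, any string derived from E must begin with NUMBER or LPAREN.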

  14. Summary:
      • Regular expressions are a quick and easy way to specify simple forms of language. They can be easily translated into FSMs (which have nice properties, e.g. they run in time linear in the length of the input).
      • There are tools (e.g. JLEX) which input regular expressions and output a lexical analyser which recognises the language they define.
