Lexical Analysis

thyra

Presentation Transcript


  1. Lexical Analysis
     Consider the program:

         #include <stdio.h>

         int main(void)
         {
             double value = 0.95;
             printf("value = %f\n", value);
             return 0;
         }

     How is this translated into meaningful machine instructions? First, each separate entity must be recognised: e.g. the 5th line is processed as

         <key_t, double> <id_t, value> <eq_t, => <real_t, 0.95> <semi_t, ;>

     This process is known as lexical analysis.
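The following C sketch is a rough illustration of this step. It splits a line such as the declaration above into the same kind of <token, lexeme> pairs. The token names key_t, id_t, eq_t, real_t and semi_t come from the slide. The names int_t and other_t, the three-word keyword list and the function name lex_line are assumptions made only for this example. key_t is also a POSIX type name in C, but here the names are only printed as strings.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed keyword set for this sketch; a real C lexer knows many more. */
static const char *const keywords[] = { "double", "int", "return" };

static int is_keyword(const char *s, size_t len)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len && strncmp(keywords[i], s, len) == 0)
            return 1;
    return 0;
}

/* Append "<token, lexeme>" to out, separated from earlier pairs by a space. */
static void emit(char *out, size_t outsz, const char *token,
                 const char *lexeme, size_t len)
{
    size_t used = strlen(out);
    snprintf(out + used, outsz - used, "%s<%s, %.*s>",
             used ? " " : "", token, (int)len, lexeme);
}

/* Split one line of C-like source into <token, lexeme> pairs (outsz > 0). */
void lex_line(const char *src, char *out, size_t outsz)
{
    const char *p = src;
    out[0] = '\0';
    while (*p) {
        const char *start = p;
        if (isspace((unsigned char)*p)) {            /* white space: skip */
            p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            size_t len = (size_t)(p - start);
            emit(out, outsz, is_keyword(start, len) ? "key_t" : "id_t", start, len);
        } else if (isdigit((unsigned char)*p)) {     /* integer or real */
            int is_real = 0;
            while (isdigit((unsigned char)*p))
                p++;
            if (*p == '.' && isdigit((unsigned char)p[1])) {
                is_real = 1;
                for (p++; isdigit((unsigned char)*p); p++)
                    ;
            }
            emit(out, outsz, is_real ? "real_t" : "int_t", start, (size_t)(p - start));
        } else {                                     /* single-character token */
            const char *token = *p == '=' ? "eq_t" : *p == ';' ? "semi_t" : "other_t";
            emit(out, outsz, token, p, 1);
            p++;
        }
    }
}
```

Calling lex_line("double value = 0.95;", buf, sizeof buf) fills buf with the five pairs shown on the slide. In this sketch, an identifier is any run of letters, digits and underscores that starts with a letter or underscore. Only the words in the list are reported as keywords.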

  2. Application: Lex
     lex is a program generator:

         series of regular expressions  -->  lex  -->  a lexical analyser

     Lex input file:

         ... definitions ...
         %%
         ... regular expression/action pairs ...
         %%
         ... user-defined functions ...

  3. Lex Regular Expressions
     Meta-characters (do not match themselves):

         ( ) [ ] { } < > + / , ^ * | . \ " $ ? - %

     Let c be a character, x and y regular expressions, s a string, m and n integers, and i an identifier. Regular expressions:

         c        any character except meta-characters
         [...]    any one of the enclosed characters (may be a range, e.g. a-z)
         [^...]   any character not enclosed
         .        any ASCII character except newline
         xy       concatenation of x and y
         x/y      x, only if followed by y (y not read)
         x{m,n}   m to n occurrences of x
         ^x       x, only at beginning of line
         x$       x, only at end of line
         "s"      exactly what is in the quotes (except for "\" and the following character)
         x*       zero or more occurrences of x
         x+       one or more occurrences of x
         x?       an optional x (zero or one occurrence)
         x|y      x or y
         {i}      definition of i

  4. Lex Regular Expressions (cont.)
     A meta-character is matched literally by preceding it with "\". Regular expressions are terminated by a space or tab. Backslash, tab and newline are represented by \\, \t and \n.
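Many of these operators have the same meaning in POSIX extended regular expressions, which C programs can use through regcomp and regexec in <regex.h>. The shared operators include bracket lists, [^...], ., *, +, ?, |, {m,n}, ^ and $. The following sketch uses that library to try some of the operators above. The wrapper name ere_search is hypothetical. Some Lex-only forms have no equivalent here: "s" quoting, x/y trailing context and {i} definitions. There is also one difference to keep in mind. Unless REG_NEWLINE is given, . in POSIX matches a newline, whereas in Lex it does not.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the POSIX extended regular expression `pattern` matches
 * somewhere in `s`, 0 if it does not, and -1 if the pattern is invalid.
 */
int ere_search(const char *pattern, const char *s)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);   /* no submatch positions needed */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Anchoring a pattern with ^ and $ makes ere_search test the whole string. For example, "^x{2,3}$" accepts "xx" but rejects "xxxx", and "^[^0-9]+$" rejects any string that contains a digit.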

  5. Definitions
     If the line "identifier string" appears in the definitions section, string replaces {identifier} wherever it is used in the rules. E.g.

         L   [a-zA-Z]
         %%
         {L}+   ;

     is the same as:

         %%
         [a-zA-Z]+   ;

     Anything enclosed between %{ and %} in this section is copied straight into lex.yy.c. Include and define statements, variables, function definitions and comments should be placed here, e.g.

         %{
         #include <stdio.h>
         /* an example program */
         %}
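The {identifier} substitution described above can be sketched in C as a plain text expansion over a rule's pattern. The helper expand_defs and struct lex_def are hypothetical. The sketch wraps each expansion in parentheses, which is an assumption made so that {L}+ repeats the whole definition. That choice matters when a definition is an alternation such as a|b. Braces that do not name a definition, such as the repetition x{2,3}, are left alone.

```c
#include <stdio.h>
#include <string.h>

/* One "name pattern" line from the definitions section. */
struct lex_def {
    const char *name;
    const char *pattern;
};

/*
 * Copy `rule` into `out` (outsz > 0), replacing each {name} found in `defs`
 * with "(pattern)". Returns 0 on success, -1 if `out` is too small.
 */
int expand_defs(const char *rule, const struct lex_def *defs, size_t ndefs,
                char *out, size_t outsz)
{
    size_t o = 0;
    const char *p = rule;
    while (*p) {
        const struct lex_def *d = NULL;
        const char *close = (*p == '{') ? strchr(p, '}') : NULL;
        if (close) {                       /* is the text inside {...} a name? */
            size_t len = (size_t)(close - p - 1);
            for (size_t i = 0; i < ndefs && !d; i++)
                if (strlen(defs[i].name) == len && strncmp(defs[i].name, p + 1, len) == 0)
                    d = &defs[i];
        }
        if (d) {                           /* {name}: substitute "(pattern)" */
            int n = snprintf(out + o, outsz - o, "(%s)", d->pattern);
            if (n < 0 || (size_t)n >= outsz - o)
                return -1;
            o += (size_t)n;
            p = close + 1;
        } else {                           /* anything else: copy one character */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *p++;
        }
    }
    out[o] = '\0';
    return 0;
}
```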

  6. Actions
     An action is a C-language statement followed by ;. Example:

         [0-9]+      printf("Integer\n");
         [a-zA-Z]+   printf("String\n");

     will output "Integer" after receiving a digit string, and "String" after receiving a character string. Characters that match no rule are copied to the output unchanged (the default action), so the input

         12+19=sum;

     will result in:

         Integer
         +Integer
         =String
         ;

     Note: the text matched by a regular expression is held in the string yytext, and its length in the integer yyleng.
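The hand-written C imitation below makes the longest-match and default-echo behaviour of these two rules concrete. It reproduces the output for 12+19=sum;. The function name run_actions is hypothetical. Real Lex builds a finite automaton from the patterns rather than hard-coding character tests. The sketch also writes to a buffer instead of stdout so its result can be checked.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Imitate the two rules
 *     [0-9]+     printf("Integer\n");
 *     [a-zA-Z]+  printf("String\n");
 * collecting the output in `out` (outsz > 0).
 */
void run_actions(const char *in, char *out, size_t outsz)
{
    size_t used = 0;
    const char *p = in;
    out[0] = '\0';
    while (*p && used < outsz - 1) {
        int n;
        if (isdigit((unsigned char)*p)) {            /* rule 1: longest digit run */
            while (isdigit((unsigned char)*p))
                p++;
            n = snprintf(out + used, outsz - used, "Integer\n");
        } else if (isalpha((unsigned char)*p)) {     /* rule 2: longest letter run */
            while (isalpha((unsigned char)*p))
                p++;
            n = snprintf(out + used, outsz - used, "String\n");
        } else {                                     /* no rule matches: echo it */
            n = snprintf(out + used, outsz - used, "%c", *p);
            p++;
        }
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```

The inner while loops keep consuming as long as the character class continues. This mirrors Lex taking the longest possible match, so 12 is reported once as a single Integer rather than twice. In the default "C" locale, isalpha accepts exactly [a-zA-Z].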

  7. Running Lex
     To run a lex program "example.l", type

         lex example.l
         cc lex.yy.c -ll
         ./a.out

     lex writes the analyser to lex.yy.c, and "-ll" links in the lex library. This library contains a "main" program, which calls yylex(). You can override this by defining your own "main".

  8. Example Lex Program

         %{
         /* simple word recognition */
         %}
         L   [a-zA-Z]
         %%
         [ \t]+        ;   /* ignore whitespace */
         is|are        printf("verb: %s; ", yytext);
         a|the         printf("determiner: %s; ", yytext);
         dog |
         cat |
         male |
         female        printf("noun: %s; ", yytext);
         {L}+          printf("unknown: %s; ", yytext);
         .|\n          ECHO;
         %%
         int main(void)
         {
             yylex();
             return 0;
         }
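The rule ordering in this program can be mimicked in C. The sketch takes the longest run of letters, checks the fixed word lists, and only then falls back to "unknown". This matches Lex's tie-breaking. The longest match wins, and among matches of equal length the earlier rule wins. So "is" matches both is|are and {L}+ with the same length and is reported as a verb, while "catdog" is matched in full only by {L}+. classify_words is a hypothetical helper that writes to a buffer rather than to stdout.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Word lists copied from the rules of the Lex program. */
static const char *verbs[] = { "is", "are" };
static const char *determiners[] = { "a", "the" };
static const char *nouns[] = { "dog", "cat", "male", "female" };

static int in_list(const char *w, size_t len, const char **list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(list[i]) == len && strncmp(list[i], w, len) == 0)
            return 1;
    return 0;
}

/* Classify each word of `in`, writing the result to `out` (outsz > 0). */
void classify_words(const char *in, char *out, size_t outsz)
{
    size_t used = 0;
    const char *p = in;
    out[0] = '\0';
    while (*p && used < outsz - 1) {
        int n;
        if (*p == ' ' || *p == '\t') {               /* [ \t]+  ignored */
            p++;
            continue;
        }
        if (isalpha((unsigned char)*p)) {            /* longest run of letters */
            const char *w = p;
            while (isalpha((unsigned char)*p))
                p++;
            size_t len = (size_t)(p - w);
            const char *kind =
                in_list(w, len, verbs, COUNT(verbs)) ? "verb"
                : in_list(w, len, determiners, COUNT(determiners)) ? "determiner"
                : in_list(w, len, nouns, COUNT(nouns)) ? "noun"
                : "unknown";                         /* only {L}+ matched */
            n = snprintf(out + used, outsz - used, "%s: %.*s; ", kind, (int)len, w);
        } else {                                     /* .|\n  ECHO */
            n = snprintf(out + used, outsz - used, "%c", *p);
            p++;
        }
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```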

  9. Example Session

         % word
         the dog is a male <cr>
         determiner: the; noun: dog; verb: is; determiner: a; noun: male;
         female cat dog is <cr>
         noun: female; noun: cat; noun: dog; verb: is;
         catdog is male <cr>
         unknown: catdog; verb: is; noun: male;
         <CTRL>-d
         %

  10. Practical 1: Lexical Analysis
      Aim: to write a lexical analyser in C using Lex, for the language L, defined below.

          identifiers:       sequence of one or more letters; must be declared before use, as int or real
          integers:          optional sign, one or more digits
          reals:             optional sign, one or more digits, decimal point, one or more digits
          expressions:       bracketed expressions using +, -, *, / and :=
          comments:          start with !, run to end of line
          print statements:  either printi or printr, for printing integers and reals respectively; one argument

  11. Example L Program

          ! example L program

          real a;
          real baboon;
          int x y;
          ! end of declarations
          x := 300;
          printi(x);
          y := 7 - x;
          a := -0.12 + +12.34 + 12 / 3 * 5 - 5;
          baboon := a * y;
          printi(5);
          printr(baboon);

  12. Required Structure
      Output should be in the form of <token, attribute> pairs. Every element of the program should be classified. Thus, the output for the 9th line (y := 7 - x;) should be:

          <ID_T,y> <BECOMES_T,:=> <INT_T,7> <MINUS_T,-> <ID_T,x> <SEMI_T,;>

      Numbers should be converted from strings to the appropriate form (int or real). The input must be described by regular expressions, and you must use Lex. A "tokens.h" file will be supplied, defining all the different tokens to be used. You should output the token names and not the associated numbers.
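The practical requires Lex, so the hand-written C sketch below is not a solution. It only illustrates the classification and number conversion that the Lex actions must perform for one line of L. The names ID_T, BECOMES_T, INT_T, MINUS_T and SEMI_T come from the slide. Every other token name is hypothetical, since tokens.h is not reproduced here: REAL_T, PLUS_T, TIMES_T, DIVIDE_T, LPAREN_T, RPAREN_T, INTDEC_T, REALDEC_T, PRINTI_T, PRINTR_T and ERROR_T. The function name l_tokens is hypothetical too. Each number is converted with strtol or strtod and then printed from the converted value.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append one "<NAME,value>" pair to out, space-separated. */
static void put(char *out, size_t outsz, const char *name, const char *value)
{
    size_t used = strlen(out);
    snprintf(out + used, outsz - used, "%s<%s,%s>", used ? " " : "", name, value);
}

/* Keyword token names here are hypothetical; tokens.h defines the real ones. */
static const char *word_token(const char *w)
{
    static const char *const kw[][2] = {
        { "int", "INTDEC_T" },    { "real", "REALDEC_T" },
        { "printi", "PRINTI_T" }, { "printr", "PRINTR_T" },
    };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strcmp(w, kw[i][0]) == 0)
            return kw[i][1];
    return "ID_T";
}

/* Write the <token,attribute> pairs for one line of L into out (outsz > 0). */
void l_tokens(const char *line, char *out, size_t outsz)
{
    static const char ops[] = "+-*/();";
    static const char *const op_names[] = { "PLUS_T", "MINUS_T", "TIMES_T", "DIVIDE_T",
                                            "LPAREN_T", "RPAREN_T", "SEMI_T" };
    char text[64], value[64];
    const char *p = line;
    out[0] = '\0';
    while (*p && *p != '!') {                   /* '!' starts a comment */
        const char *q = p;
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        if (*q == '+' || *q == '-')             /* a sign is part of the number */
            q++;                                /* only if a digit follows it   */
        if (isdigit((unsigned char)*q)) {
            int real = 0;
            while (isdigit((unsigned char)*q))
                q++;
            if (*q == '.' && isdigit((unsigned char)q[1])) {
                real = 1;
                for (q++; isdigit((unsigned char)*q); q++)
                    ;
            }
            size_t n = (size_t)(q - p);
            if (n >= sizeof text)
                n = sizeof text - 1;
            memcpy(text, p, n);
            text[n] = '\0';
            /* Convert the lexeme, then print the converted number. */
            if (real)
                snprintf(value, sizeof value, "%g", strtod(text, NULL));
            else
                snprintf(value, sizeof value, "%ld", strtol(text, NULL, 10));
            put(out, outsz, real ? "REAL_T" : "INT_T", value);
            p = q;
        } else if (isalpha((unsigned char)*p)) {
            size_t n = 0;
            while (isalpha((unsigned char)p[n]))
                n++;
            size_t k = n < sizeof text ? n : sizeof text - 1;
            memcpy(text, p, k);
            text[k] = '\0';
            put(out, outsz, word_token(text), text);
            p += n;
        } else if (p[0] == ':' && p[1] == '=') {
            put(out, outsz, "BECOMES_T", ":=");
            p += 2;
        } else {                                /* single-character operators */
            const char *hit = strchr(ops, *p);
            char one[2] = { *p, '\0' };
            put(out, outsz, hit ? op_names[hit - ops] : "ERROR_T", one);
            p++;
        }
    }
}
```

Under this longest-match treatment of signs, "7 -5" becomes <INT_T,7> <INT_T,-5>, not a subtraction. A Lex pattern such as [+-]?[0-9]+ has the same effect, so the example program keeps spaces around binary minus.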
