COMP313A Programming Languages

Lexical Analysis (2)

Lookahead
• Consider <=, <> and <: after reading <, the lexer cannot tell which token it has until it looks at the next character
• When we read a character that ends (delimits) a token, we need to make sure that character is still available afterwards
• It is the start of the next token!
• This is lookahead: deciding what to do based on a character we have ‘not yet read’
• Sometimes implemented by reading from a buffer and then pushing the character back into the buffer
• And then starting to recognise the next token
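As an illustration of lookahead with pushback, here is a minimal Python sketch for the relational operators <, <= and <>. The class `Lexer` and its methods `next_char`, `push_back` and `relop` are hypothetical names. The sketch assumes the input is a Python string with an index pointing at the next unread character, and that the empty string stands for end of input.

```python
class Lexer:
    """Hypothetical minimal lexer that only recognises <, <= and <>."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0  # index of the next unread character

    def next_char(self) -> str:
        # "" stands for end of input; pos still advances so push_back can undo it
        ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1
        return ch

    def push_back(self) -> None:
        # Undo the last read so that character is available to the next token
        self.pos -= 1

    def relop(self) -> str:
        if self.next_char() != "<":
            raise ValueError("expected '<'")
        ch = self.next_char()  # the lookahead character
        if ch == "=":
            return "LE"
        if ch == ">":
            return "NE"
        self.push_back()  # ch starts the next token, so put it back
        return "LT"
```

For example, on `"<x"` the method returns `"LT"` and leaves `pos` at 1, so the next token starts at `x`.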

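Classic Fortran example
• Fortran ignores blanks, so DO 99 I=1,10 (the start of a DO loop) becomes DO99I=1,10, whereas DO99I=1.10 is an assignment of 1.10 to a variable named DO99I
• When can the lexical analyzer assign a token to DO? Only after it has seen the , or . that follows the =
• Push the characters back into the input buffer
• or ‘backtracking’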
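The following Python sketch illustrates why the lexer must look far ahead before it can decide whether DO is a keyword. It is a deliberate simplification. The function `classify_statement` is hypothetical and only distinguishes these two statement kinds, not real Fortran. Blanks are removed, a `DO label variable =` prefix is matched tentatively, and the statement is treated as a loop only if a comma appears outside parentheses after the `=`. Otherwise the tentative reading is abandoned, which amounts to backtracking.

```python
import re


def classify_statement(stmt: str) -> str:
    """Hypothetical classifier: 'do-loop' or 'assignment' (simplified Fortran)."""
    s = stmt.replace(" ", "")  # Fortran ignores blanks
    # Tentatively read DO <label> <variable> =
    m = re.match(r"DO\d+[A-Z][A-Z0-9]*=", s)
    if not m:
        return "assignment"
    # Scan ahead: a comma at parenthesis depth 0 means this really is a DO loop
    depth = 0
    for ch in s[m.end():]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return "do-loop"
    # No such comma: give up the keyword reading (backtrack)
    return "assignment"
```

With this sketch, `classify_statement("DO 99 I=1,10")` gives `"do-loop"` and `classify_statement("DO 99 I=1.10")` gives `"assignment"`. The depth check keeps a comma inside an array subscript, as in `DO 99 I=A(1,2)`, from being mistaken for the loop comma.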

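Finite Automata
• A recogniser determines whether an input string is a sentence in a language (e.g. whether a sequence of characters is a valid identifier)
• The language is specified by a regular expression
• Turn the regular expression into a finite automaton
• The automaton could be deterministic (DFA) or non-deterministic (NFA)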

Transition diagram for identifiers
• RE: identifier -> letter (letter | digit)*
• [Figure: states 0 (start), 1 and 2 (accept); 0 -letter-> 1; 1 -letter-> 1; 1 -digit-> 1; 1 -other-> 2]
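A transition diagram like this can be coded directly as a loop over states. In the minimal Python sketch below, `char_class` and `scan_identifier` are hypothetical names, and letters and digits are assumed to be ASCII only. The "other" edge into the accepting state only examines a character without consuming it, so that character is left for the next token, just as in the lookahead discussion.

```python
def char_class(ch: str) -> str:
    """Map a character to an edge label of the diagram ("" means end of input)."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


def scan_identifier(text: str, pos: int = 0):
    """Run the diagram from text[pos]; return (lexeme, next_pos) or None."""
    state = 0
    i = pos
    while True:
        ch = text[i] if i < len(text) else ""
        cls = char_class(ch)
        if state == 0:
            if cls != "letter":
                return None  # no edge out of state 0: not an identifier
            state = 1
        elif state == 1:
            if cls == "other":
                # Edge to state 2 (accept): the 'other' character was only
                # looked at, so it is not part of the lexeme
                return text[pos:i], i
            # letter or digit: stay in state 1
        i += 1
```

For instance, `scan_identifier("count1 = 0")` returns `("count1", 6)`, leaving index 6 (the blank) as the start of the next token.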

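[Figure: Non-deterministic finite state automaton for (a | b)*abb: states 0 (start) to 3 (accept); 0 -a-> 0, 0 -b-> 0, 0 -a-> 1, 1 -b-> 2, 2 -b-> 3]
[Figure: Equivalent deterministic finite state automaton: states 0 (start) to 3 (accept); 0 -a-> 1, 0 -b-> 0, 1 -a-> 1, 1 -b-> 2, 2 -a-> 1, 2 -b-> 3, 3 -a-> 1, 3 -b-> 0]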

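Transition Table (NFA) for (a | b)*abb (state 0 is the start state, state 3 accepts)
State | Input symbol a | Input symbol b
0 | {0, 1} | {0}
1 | ∅ | {2}
2 | ∅ | {3}
3 | ∅ | ∅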
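Because each NFA table entry is a set of states, simulating an NFA means tracking every state it could currently be in. This minimal Python sketch assumes the four-state NFA for (a | b)*abb: state 0 is the start state and loops on a and b, the path 0 -a-> 1 -b-> 2 -b-> 3 follows, and state 3 accepts. `NFA_TABLE` and `nfa_accepts` are illustrative names only.

```python
# NFA for (a | b)*abb: NFA_TABLE[state][symbol] is a *set* of next states
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START, ACCEPTING = 0, {3}


def nfa_accepts(word: str) -> bool:
    current = {START}  # every state the NFA could be in so far
    for sym in word:
        # Union of the table entries for all current states (missing entry = empty set)
        current = {t for s in current for t in NFA_TABLE[s].get(sym, set())}
        if not current:  # no possible state left: reject early
            return False
    return bool(current & ACCEPTING)
```

For example, on `"abba"` the set of current states evolves as {0, 1}, {0, 2}, {0, 3}, {0, 1}. The string is rejected because it does not end in state 3.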

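Transition Table (DFA) for (a | b)*abb (state 0 is the start state, state 3 accepts)
State | Input symbol a | Input symbol b
0 | 1 | 0
1 | 1 | 2
2 | 1 | 3
3 | 1 | 0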
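A DFA table needs only a single current state, so recognition is one table lookup per input character. The sketch below assumes the four-state DFA for (a | b)*abb, with state 0 as start and state 3 accepting. `DFA_TABLE` and `dfa_accepts` are hypothetical names, and a symbol with no table entry is treated as rejection.

```python
# DFA for (a | b)*abb: DFA_TABLE[state][symbol] is exactly one next state
DFA_TABLE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START, ACCEPTING = 0, {3}


def dfa_accepts(word: str) -> bool:
    state = START
    for sym in word:
        if sym not in DFA_TABLE[state]:
            return False  # no transition: reject
        state = DFA_TABLE[state][sym]  # one lookup per character
    return state in ACCEPTING
```

This is the trade-off that makes converting an NFA to a DFA worthwhile. The DFA may have more states, but it never has to track a set of them at run time.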

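From a Regular Expression to an NFA: Thompson’s Construction
• Example: (a | b)*abb
• [Figure: NFA with states 0 (start) to 10 (accept); ε-edges 0->1, 0->7, 1->2, 1->4, 3->6, 5->6, 6->1, 6->7; symbol edges 2 -a-> 3, 4 -b-> 5, 7 -a-> 8, 8 -b-> 9, 9 -b-> 10]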
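Thompson’s construction builds an NFA bottom-up from small fragments, one for each symbol, union, concatenation and star. The minimal Python sketch below builds (a | b)*abb and checks it by simulation with ε-closures. The names `NFA`, `symbol`, `union`, `concat`, `star`, `eps_closure` and `accepts` are hypothetical. As a simplification, `concat` joins fragments with an ε-edge instead of merging states. The result therefore has 14 states, whereas the slide’s figure numbers its states 0 to 10.

```python
from itertools import count

EPS = None  # label used for ε-transitions


class NFA:
    def __init__(self):
        self.edges = []  # (from_state, label, to_state)
        self._ids = count()

    def new_state(self) -> int:
        return next(self._ids)


# A fragment is a (start, accept) pair of states inside one shared NFA.
def symbol(nfa, c):
    s, f = nfa.new_state(), nfa.new_state()
    nfa.edges.append((s, c, f))
    return s, f


def union(nfa, x, y):
    s, f = nfa.new_state(), nfa.new_state()
    nfa.edges += [(s, EPS, x[0]), (s, EPS, y[0]), (x[1], EPS, f), (y[1], EPS, f)]
    return s, f


def concat(nfa, x, y):
    # Simplification: an ε-edge instead of merging x's accept with y's start
    nfa.edges.append((x[1], EPS, y[0]))
    return x[0], y[1]


def star(nfa, x):
    s, f = nfa.new_state(), nfa.new_state()
    nfa.edges += [(s, EPS, x[0]), (s, EPS, f), (x[1], EPS, x[0]), (x[1], EPS, f)]
    return s, f


def eps_closure(nfa, states):
    """States reachable from `states` using zero or more ε-edges."""
    stack, seen = list(states), set(states)
    while stack:
        u = stack.pop()
        for a, label, b in nfa.edges:
            if a == u and label is EPS and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def accepts(nfa, frag, word):
    current = eps_closure(nfa, {frag[0]})
    for c in word:
        moved = {b for a, label, b in nfa.edges if a in current and label == c}
        current = eps_closure(nfa, moved)
    return frag[1] in current


# Build (a | b)*abb
nfa = NFA()
ab_star = star(nfa, union(nfa, symbol(nfa, "a"), symbol(nfa, "b")))
frag = concat(nfa, concat(nfa, concat(nfa, ab_star, symbol(nfa, "a")),
                          symbol(nfa, "b")), symbol(nfa, "b"))
```

Here `accepts(nfa, frag, "babb")` is true and `accepts(nfa, frag, "abba")` is false, matching the regular expression.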

Converting an NFA to a DFA
• Subset Construction
• In an NFA, each entry in the transition table is a set of states
• In the resulting DFA, each state corresponds to a set of NFA states
• A DFA state keeps track of all the states the NFA could be in after reading the input so far

Subset Construction
• Work out all the states reachable from the NFA’s start state using only ε-transitions (zero or more of them); this set is the ε-closure of the start state. Together these states form the start state of the DFA
• We’ll do the rest on the board in the lecture
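For reference, here is a minimal Python sketch of the whole subset construction. It starts from the ε-closure of the start state and repeatedly applies "move on a symbol, then take the ε-closure" until no new sets appear. The NFA is assumed to be the Thompson NFA for (a | b)*abb, with states 0 to 10, start state 0 and accepting state 10. `EPS_EDGES`, `SYM_EDGES`, `move` and `subset_construction` are hypothetical names.

```python
from collections import deque

# Thompson NFA for (a | b)*abb: states 0..10, start 0, accepting 10
EPS_EDGES = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
SYM_EDGES = {(2, "a"): {3}, (7, "a"): {8}, (4, "b"): {5}, (8, "b"): {9}, (9, "b"): {10}}
ALPHABET = ("a", "b")
NFA_START, NFA_ACCEPT = 0, 10


def eps_closure(states):
    """All NFA states reachable from `states` using zero or more ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in EPS_EDGES.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, sym):
    """NFA states reachable from `states` on one `sym` transition."""
    return {t for s in states for t in SYM_EDGES.get((s, sym), ())}


def subset_construction():
    start = eps_closure({NFA_START})
    dfa = {}  # DFA state (frozenset of NFA states) -> {symbol: DFA state}
    worklist = deque([start])
    while worklist:
        S = worklist.popleft()
        if S in dfa:
            continue  # already processed
        dfa[S] = {}
        for sym in ALPHABET:
            T = eps_closure(move(S, sym))
            if not T:
                continue  # no transition on sym
            dfa[S][sym] = T
            if T not in dfa:
                worklist.append(T)
    accepting = {S for S in dfa if NFA_ACCEPT in S}
    return start, dfa, accepting


start, dfa, accepting = subset_construction()
```

Running it gives five DFA states. The start state is {0, 1, 2, 4, 7}, and the only accepting state is the set that contains NFA state 10.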

LEX (FLEX)
• Tool for generating programs that recognise lexical patterns in text
• Takes regular expressions, each paired with an action, and turns them into a scanner program (C source code)
• You will learn the basics in a lab on Thursday
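The following Python toy imitates how a lex-generated scanner chooses between patterns. At each position it takes the longest match, and when two rules match the same length the earlier rule wins. It does not show flex’s input syntax or the C code flex generates. The rule list, `RULES` and `tokenize` are hypothetical.

```python
import re

# Hypothetical rules in the spirit of a lex specification: (token name, regex).
# Earlier rules win ties, so the keyword rule IF must come before ID.
RULES = [
    ("IF", r"if"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM", r"[0-9]+"),
    ("LE", r"<="),
    ("NE", r"<>"),
    ("LT", r"<"),
    ("WS", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(rx)) for name, rx in RULES]


def tokenize(text: str):
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Strictly longer only: on a tie the earlier rule is kept
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "WS":  # whitespace is matched but discarded
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With these rules, `tokenize("if iffy <= 42")` yields `IF`, `ID`, `LE` and `NUM`. "iffy" is a longer match for `ID` than for `IF`, and `<=` beats the shorter `<`.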
