COMP313A Programming Languages
This lecture covers lexical analysis, the phase of a compiler that groups input characters into tokens. It introduces lookahead, where the lexer inspects upcoming characters before deciding where a token ends, using the classic FORTRAN tokenization problem as an example. It then presents finite automata as recognisers for the languages described by regular expressions, with emphasis on converting a regular expression to an NFA (Thompson’s construction) and an NFA to a DFA (subset construction). It closes with LEX (FLEX), a tool for generating lexical analysers.
COMP313A Programming Languages Lexical Analysis (2)
Lookahead • <=, <>, < • When we read a delimiter character to establish the end of a token, we need to make sure that the character is still available afterwards • It is the start of the next token! • This is lookahead • Decide what to do based on the character we ‘haven’t read’ yet • Sometimes implemented by reading from a buffer and then pushing the character back into the buffer • Recognition of the next token then starts from that character
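A minimal sketch of lookahead with push-back for the three operators above. The class `CharBuffer`, the function `relop` and the token names `LE`, `NE`, `LT` are hypothetical names chosen for illustration, not part of the lecture material.

```python
class CharBuffer:
    """Input buffer that lets the lexer push one character back."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read(self):
        # Return the next character, or "" at end of input.
        ch = self.text[self.pos:self.pos + 1]
        self.pos += len(ch)
        return ch

    def push_back(self, ch):
        # Undo the last read so that ch becomes the start of the next token.
        self.pos -= len(ch)


def relop(buf):
    """Recognise one of <=, <>, < at the current position."""
    if buf.read() != "<":
        raise ValueError("expected '<'")
    ch = buf.read()          # the lookahead character
    if ch == "=":
        return "LE"
    if ch == ">":
        return "NE"
    buf.push_back(ch)        # ch belongs to the next token
    return "LT"


buf = CharBuffer("<x")
token = relop(buf)           # "<" on its own
next_char = buf.read()       # the pushed-back character is read again
```

After `relop` returns `LT`, the character `x` is still in the buffer, so the next token starts from it.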
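Classic Fortran example • Fortran ignores blanks, so DO 99 I=1,10 becomes DO99I=1,10 (a DO loop) versus DO99I=1.10 (an assignment to the variable DO99I) • When can the lexical analyzer assign a token? Not until it has seen the comma or the period • Push back into input buffer • or ‘backtracking’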
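The decision can be sketched as follows. This is a deliberately simplified illustration: the function `classify` is hypothetical, it handles only statements shaped like the two on the slide, and it ignores cases such as commas inside parentheses. The regular expression engine does the backtracking for us when no comma is found.

```python
import re


def classify(statement):
    """Classify a statement as a DO loop or an assignment (simplified)."""
    s = statement.replace(" ", "").upper()   # blanks are not significant
    # DO <label> <variable> = <first> , <rest> : only a DO loop has the comma.
    m = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)", s)
    if m:
        label, var, first, rest = m.groups()
        return ("DO", label, var, first, rest)
    # No comma: the whole prefix, including "DO", is a variable name.
    name, _, expr = s.partition("=")
    return ("ASSIGN", name, expr)


loop = classify("DO 99 I=1,10")
assignment = classify("DO 99 I=1.10")
```

Here `loop` is tagged `DO` with label `99` and variable `I`, while `assignment` is tagged `ASSIGN` with the variable name `DO99I`.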
Finite Automata • A recogniser determines if an input string is a sentence in a language • The language is described by a regular expression • Turn the regular expression into a finite automaton • The automaton could be deterministic (DFA) or non-deterministic (NFA)
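Transition diagram for identifiers • RE: identifier -> letter (letter | digit)* • [Transition diagram: start state 0; letter takes state 0 to state 1; letter or digit loops on state 1; other takes state 1 to accepting state 2]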
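A transition diagram like this one can be coded directly as a table. The sketch below assumes the three-state diagram for identifiers (states 0, 1, 2 with 2 accepting); `char_class`, `TRANSITIONS` and `scan_identifier` are hypothetical names. Note that Python's `isalpha` also accepts non-ASCII letters, which a real language definition may not.

```python
def char_class(ch):
    # Map a character to the label used on the diagram's edges.
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"           # includes "" (end of input)


# (state, character class) -> next state
TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
    (1, "other"): 2,
}
ACCEPT = 2


def scan_identifier(text, start=0):
    """Return (lexeme, next position), or None if no identifier starts here."""
    state, pos = 0, start
    while state != ACCEPT:
        ch = text[pos:pos + 1]
        nxt = TRANSITIONS.get((state, char_class(ch)))
        if nxt is None:
            return None      # no edge for this character: not an identifier
        state = nxt
        pos += 1
    # The 'other' character was only looked at, so it is given back.
    return text[start:pos - 1], pos - 1


result = scan_identifier("x1 = 5")
```

The `other` edge is where lookahead appears: the character that ends the identifier is not part of the lexeme, so the returned position points at it.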
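Non-deterministic finite state automaton • [Diagram: states 0 to 3, with a start state and an accepting state; edges labelled a and b] • Equivalent deterministic finite state automaton • [Diagram: states 0 to 3, with a start state and an accepting state; edges labelled a and b]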
Transition Table (NFA) Input Symbol
Transition Table (DFA) Input Symbol
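The table contents did not survive in this transcript. As a sketch, assume the two automata are the usual textbook NFA and DFA for (a | b)* abb, the expression used later in the lecture; the entries below are that assumption, not a copy of the slides. The point to notice is that an NFA table entry is a set of states, while a DFA table entry is a single state.

```python
from itertools import product

# NFA: each entry is a SET of next states (a missing entry is the empty set).
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}

# DFA: each entry is exactly ONE next state.
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START, ACCEPT = 0, 3


def nfa_accepts(s):
    # Track every state the NFA could be in after each symbol.
    current = {START}
    for ch in s:
        current = {t for q in current for t in NFA[q].get(ch, set())}
    return ACCEPT in current


def dfa_accepts(s):
    # Only one state to track; input must use the alphabet {a, b}.
    state = START
    for ch in s:
        state = DFA[state][ch]
    return state == ACCEPT


# Compare the two machines on every string over {a, b} up to length 6.
strings = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
agree = all(nfa_accepts(s) == dfa_accepts(s) for s in strings)
```

Under these assumed tables both machines accept exactly the strings ending in abb.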
From a Regular Expression to an NFA • Thompson’s Construction • (a | b)* abb • [Diagram: NFA with states 0 to 10, a start state and an accepting state; edges labelled a, b and ε]
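A minimal sketch of Thompson's construction, building one small NFA fragment per operator and combining them. All names (`Fragment`, `symbol`, `union`, `concat`, `star`) are hypothetical. One simplification is labelled in the code: concatenation links the two fragments with an ε edge instead of merging states, so the state numbering will not match the slide's diagram.

```python
from itertools import count, product

EPS = None                    # label used for epsilon edges
_ids = count()                # fresh state numbers


class Fragment:
    def __init__(self, start, accept, edges):
        self.start, self.accept = start, accept
        self.edges = edges    # list of (source, label, destination)


def symbol(c):
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, c, f)])


def union(n1, n2):
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, n1.edges + n2.edges + [
        (s, EPS, n1.start), (s, EPS, n2.start),
        (n1.accept, EPS, f), (n2.accept, EPS, f)])


def concat(n1, n2):
    # Simplification: join with an epsilon edge rather than merging states.
    return Fragment(n1.start, n2.accept,
                    n1.edges + n2.edges + [(n1.accept, EPS, n2.start)])


def star(n):
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, n.edges + [
        (s, EPS, n.start), (s, EPS, f),
        (n.accept, EPS, n.start), (n.accept, EPS, f)])


def closure(nfa, states):
    # All states reachable from `states` using epsilon edges only.
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    current = closure(nfa, {nfa.start})
    for ch in text:
        moved = {d for s, lab, d in nfa.edges if s in current and lab == ch}
        current = closure(nfa, moved)
    return nfa.accept in current


# (a | b)* abb
nfa = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                           symbol("a")), symbol("b")), symbol("b"))
ok = accepts(nfa, "aabb")
```

Each operator adds at most two new states, which is why the construction produces an NFA whose size is linear in the length of the regular expression.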
Converting an NFA to a DFA • Subset Construction • NFA – each entry in the transition table is a set of states • In the resulting DFA each state will correspond to a set of NFA states • A DFA state keeps track of all the states the NFA can be in after reading an input symbol
Subset Construction • Work out all the states reachable from the start state using only epsilon transitions, including the start state itself (ε-closure) • Combine these into the start state for the DFA • We’ll do the rest on the board in the lecture
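The remaining steps can be sketched in code. The NFA below is an assumption: it uses the usual textbook state numbering (0 to 10) for the Thompson NFA of (a | b)* abb, which may differ from the diagram drawn in the lecture. The names `e_closure`, `move` and `subset_construction` are hypothetical.

```python
EPS = "ε"                     # label for epsilon transitions

# Assumed transition table of the NFA for (a | b)* abb, states 0..10.
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {"b": {10}},
    10: {},
}
NFA_START, NFA_ACCEPT, ALPHABET = 0, 10, "ab"


def e_closure(states):
    # States reachable from `states` on epsilon transitions alone.
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for t in NFA[q].get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    # States reachable from `states` on one `symbol` transition.
    return {t for q in states for t in NFA[q].get(symbol, set())}


def subset_construction():
    start = e_closure({NFA_START})
    table, todo = {}, [start]
    while todo:
        current = todo.pop()
        if current in table:
            continue          # this set of NFA states is already a DFA state
        table[current] = {}
        for symbol in ALPHABET:
            target = e_closure(move(current, symbol))
            table[current][symbol] = target
            if target not in table:
                todo.append(target)
    # A DFA state accepts if it contains the NFA's accepting state.
    accepting = {s for s in table if NFA_ACCEPT in s}
    return start, table, accepting


dfa_start, dfa_table, dfa_accepting = subset_construction()
```

For this assumed NFA the DFA start state is the set {0, 1, 2, 4, 7}, and the construction produces five DFA states, one of which (the set containing state 10) is accepting.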
LEX (FLEX) • Tool for generating programs which recognise lexical patterns in text • Takes regular expressions and turns them into a program • You will learn the basics in a lab on Thursday
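To give a feel for the idea before the lab, here is a rough Python imitation of a rule-driven lexer. It is not FLEX input syntax and not how FLEX is implemented (FLEX compiles the rules into a single automaton); it only mimics the usual matching policy of preferring the longest match and, on a tie, the earliest rule. The rule set and the names `RULES` and `tokens` are hypothetical.

```python
import re

# (pattern, token kind); a kind of None means "match and discard".
RULES = [
    (r"[ \t\n]+", None),
    (r"if|then|else", "KEYWORD"),
    (r"[A-Za-z][A-Za-z0-9]*", "ID"),
    (r"[0-9]+", "NUM"),
    (r"<=|<>|<", "RELOP"),
]
COMPILED = [(re.compile(p), kind) for p, kind in RULES]


def tokens(text):
    pos = 0
    while pos < len(text):
        best_len, best_kind = 0, None
        for pattern, kind in COMPILED:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so earlier rules win ties.
            if m and m.end() - pos > best_len:
                best_len, best_kind = m.end() - pos, kind
        if best_len == 0:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_kind is not None:
            yield best_kind, text[pos:pos + best_len]
        pos += best_len


result = list(tokens("if x1 <= 10"))
```

With these rules `if` is reported as a keyword because the keyword rule comes first, while `ifx` would be reported as an identifier because the identifier rule matches more characters.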