ICS312

ICS312 Converting NFAs to DFAs How Lex is constructed

Converting an nfa to a dfa
Defn: The ε-closure of a state S is the set of all states, including S itself, that you can get to from S via ε-transitions. The ε-closure of state S is denoted:

Converting an nfa to a dfa (Cont.)
Example: The ε-closure of state 1 = { 1, 2, 4 }
The ε-closure of state 3 = { 3, 2, 4 }
Defn: The ε-closure of a set of states S1, ..., Sn is the union of the ε-closures of S1, S2, ..., Sn.
Example: The ε-closure for above states 1 and 3 is { 1, 2, 4 } ∪ { 3, 2, 4 } = { 1, 2, 3, 4 }
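The closure definitions above can be sketched in Python as follows. The original nfa diagram is not available, so the ε-edges in EPS are hypothetical, chosen only so that the closures match the example; the names epsilon_closure and EPS are likewise illustrative.

```python
def epsilon_closure(states, eps):
    """Return every state reachable from `states` using only epsilon-transitions.

    eps maps a state to the set of states it reaches by one epsilon-transition.
    """
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


# Hypothetical epsilon-edges (an assumption, not taken from the slide diagram).
EPS = {1: {2}, 2: {4}, 3: {2}}

print(sorted(epsilon_closure({1}, EPS)))     # [1, 2, 4]
print(sorted(epsilon_closure({3}, EPS)))     # [2, 3, 4]
print(sorted(epsilon_closure({1, 3}, EPS)))  # [1, 2, 3, 4]
```

The closure of a set is computed by seeding the search with all of its members, which gives the same result as taking the union of the individual closures.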

To construct a dfa from an nfa
Step 1: Let the start state of the dfa be formed from the ε-closure of the start state of the nfa.
Subsequent steps: If S is any state that you have previously constructed for the dfa, and it is formed from, say, states t1, ..., tr of the nfa, then for any symbol x for which at least one of the states t1, ..., tr has an x-successor, the x-successor of S is the ε-closure of the x-successors of t1, ..., tr.
Any state of the dfa that is formed from a set of nfa states including at least one accepting state becomes an accepting state.
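A minimal Python sketch of these steps, under stated assumptions: the nfa used here is a hypothetical one for an identifier (a letter followed by letters or digits), not the one in the slide diagrams, and the names nfa_to_dfa, DELTA and EPS are illustrative.

```python
from collections import deque


def epsilon_closure(states, eps):
    """All states reachable from `states` using only epsilon-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_to_dfa(start, accepting, delta, eps):
    """delta maps (nfa state, symbol) to a set of nfa states.

    Each dfa state is a frozenset of nfa states.
    """
    start_set = frozenset(epsilon_closure({start}, eps))
    dfa_delta = {}
    states = {start_set}
    queue = deque([start_set])
    while queue:
        S = queue.popleft()
        # Symbols for which at least one member of S has a successor.
        symbols = {x for (t, x) in delta if t in S}
        for x in symbols:
            targets = set()
            for t in S:
                targets |= delta.get((t, x), set())
            T = frozenset(epsilon_closure(targets, eps))
            dfa_delta[(S, x)] = T
            if T not in states:
                states.add(T)
                queue.append(T)
    # A dfa state accepts if it contains at least one accepting nfa state.
    dfa_accepting = {S for S in states if S & accepting}
    return start_set, states, dfa_accepting, dfa_delta


# Hypothetical nfa for an identifier (assumed, not from the slides).
DELTA = {(1, "letter"): {2}, (3, "letter"): {4}, (3, "digit"): {5}}
EPS = {2: {3}, 4: {3}, 5: {3}}

start, states, accepting, delta = nfa_to_dfa(1, {3}, DELTA, EPS)
print(len(states))                           # 4
print(sorted(sorted(s) for s in accepting))  # [[2, 3], [3, 4], [3, 5]]
```

Representing each dfa state as a frozenset makes it usable as a dictionary key, so a state that has already been constructed is recognized when it is reached again.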

To construct a dfa from an nfa (Cont.1)
Example 1: To convert the following nfa:
[nfa diagram not recovered]
we get:
[dfa diagram not recovered]
This constructs a dfa that has no ε-transitions and a single accepting state.

To construct a dfa from an nfa (Cont.2)
Example 2: To convert the nfa for an identifier to a dfa

To construct a dfa from a nfa (Cont.3) we get:

Minimizing the Number of States in a DFA
Step 1: Start with two sets of states: (a) all the accepting states, and (b) all the non-accepting states.
Subsequent steps: Given the sets of states S1, ..., Sr, consider each set S and each symbol x in turn. If any member of S has an x-successor, and this x-successor is in, say, S', then unless all the members of S have x-successors that are in S', split up S into those members whose x-successors are in S' and the others (which don't have x-successors in S').
Repeat until no set can be split any further; each remaining set then becomes a single state of the minimized dfa.
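The splitting procedure can be sketched in Python as below. This is only an illustration: both dfas are hypothetical (a four-state identifier dfa, and a three-state dfa in which only state 1 has an a-successor), and the name minimize is not from the slides.

```python
def minimize(states, accepting, delta, alphabet):
    """Return the final sets of states as a list of frozensets.

    delta maps (state, symbol) to a single state; missing entries mean the
    state has no successor for that symbol.
    """
    partition = [b for b in (frozenset(accepting),
                             frozenset(states - accepting)) if b]
    changed = True
    while changed:
        changed = False
        for block in partition:
            for x in alphabet:
                # Group members by the set that their x-successor falls in
                # (None when a member has no x-successor).
                groups = {}
                for s in block:
                    target = delta.get((s, x))
                    key = next((i for i, b in enumerate(partition)
                                if target in b), None)
                    groups.setdefault(key, set()).add(s)
                if len(groups) > 1:
                    partition.remove(block)
                    partition.extend(frozenset(g) for g in groups.values())
                    changed = True
                    break
            if changed:
                break  # restart the scan over the updated partition
    return partition


# Hypothetical identifier dfa: state 1 is the start, states 2 to 4 accept.
IDENT = {(1, "letter"): 2,
         (2, "letter"): 3, (2, "digit"): 4,
         (3, "letter"): 3, (3, "digit"): 4,
         (4, "letter"): 3, (4, "digit"): 4}
print(sorted(sorted(b) for b in
             minimize({1, 2, 3, 4}, {2, 3, 4}, IDENT, ["letter", "digit"])))
# [[1], [2, 3, 4]]

# Hypothetical dfa in which every state accepts but only state 1 has an
# a-successor.
ALL_ACC = {(1, "a"): 2, (2, "b"): 3, (3, "b"): 3}
print(sorted(sorted(b) for b in
             minimize({1, 2, 3}, {1, 2, 3}, ALL_ACC, ["a", "b"])))
# [[1], [2, 3]]
```

In the first call no set is ever split, so the three accepting states collapse into one; in the second, the single starting set is split on the symbol a.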

Minimizing the Number of States in a DFA(Cont.1) Example 1. Consider the dfa we constructed for an identifier (with renumbered states):

Minimizing the Number of States in a DFA (Cont.2)
The sets of states for this dfa are:
S1 (Nonaccepting states): 1
S2 (Accepting states): 2, 3, 4
All states in S2 have both a letter-successor and a digit-successor, and the successor states are all in the set of states S2. Combine all the states of S2 to get:

Minimizing the Number of States in a DFA(Cont.3) Example 2. Consider the dfa: All of the states (1, 2, and 3) are accepting states and all their successors are also accepting states, but state 1 has an a-successor whereas states 2 and 3 do not.

Minimizing the Number of States in a DFA(Cont.4) So, we split the set of accepting states into two sets S1 and S2 where: S1 consists of state 1, and S2 consists of states 2, 3 to get:

HOW LEX WORKS
Using the methods described above, Lex constructs a minimized finite automaton for each regular expression in the definition file.
Lex generates a C program, which we will refer to as lex.yy.c.
The finite automata are represented in lex.yy.c by a set of arrays.

For instance, a portion of a finite automaton such as a transition from state 4 to state 7 on “+” can be represented by entering, in the associated array, a 7 in the column for “+” at row 4.
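As an illustration of this array representation, here is a small Python sketch. The column set, the table size and the use of 0 for "no transition" are all assumptions made for the example; the arrays actually generated in lex.yy.c may be laid out or compressed differently.

```python
# One column per input symbol (hypothetical alphabet for this sketch).
SYMBOLS = ["+", "-", "digit"]
COL = {sym: i for i, sym in enumerate(SYMBOLS)}

NO_TRANSITION = 0   # assumption: states are numbered from 1, 0 means "none"
NUM_ROWS = 8        # rows 0..7, one per state number

# table[state][column] holds the successor state.
table = [[NO_TRANSITION] * len(SYMBOLS) for _ in range(NUM_ROWS)]
table[4][COL["+"]] = 7   # row 4, column "+": go to state 7


def next_state(state, symbol):
    """Look up the successor of `state` on `symbol` in the table."""
    return table[state][COL[symbol]]


print(next_state(4, "+"))  # 7
print(next_state(4, "-"))  # 0
```

With this layout, following a transition is a single indexed lookup, which is why a table-driven scanner can process each input character in constant time.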

lex.yy.c keeps track of the latest accepting state it has reached in any of the finite automata, plus the number of source characters it has read at that point. When it reaches a stage where no transition exists for the next source symbol from any of the states it has reached in any of the finite automata, it picks the regular expression corresponding to the finite automaton in which this last accepting state occurs, and it pushes back onto the remaining input any source characters read after reaching that state.

Consider, for example, a Lex defn. file containing:
    {digit}+("."{digit}+)?    {…return Number;}
    {digit}+("."{digit}+)?e{digit}+    {…return Float;}
Finite automata corresponding to the above re’s are:
[Diagrams not recovered: dfa for Number (states 1 to 4) and dfa for Float (states 1 to 6), with transitions labeled digit, “.” and e]

Example: let the remaining input be 36e8=X1…
On reading the “3”, lex.yy.c records that the latest accepting state encountered is state 2 in the dfa for Number, and the no. of source characters read is 1. (It has also reached state 2 in the dfa for Float.)
On reading the “6”, lex.yy.c records the above again, except that the no. of characters read is 2.
On reading the “e”, lex.yy.c moves to a non-accepting state in the dfa for Float, while the dfa for Number has no “e” successor, so the recorded accepting state is unchanged.
On reading the “8”, lex.yy.c records that the latest accepting state is state 6 in the dfa for Float, and the no. of characters read is 4.
On reading the “=”, lex.yy.c finds that state 6 has no “=” successor. This is the 5th character read. So the last accepting state (state 6) is in the dfa for Float, reached after 4 characters had been read. Hence Float is taken as matching the remaining input, and the 5th character read, i.e. the “=”, is pushed back onto the remaining input.
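The bookkeeping in this example can be sketched in Python as follows. The transition tables are reconstructed from the two regular expressions; the state numbers agree with those quoted in the text (state 2 after a digit, state 6 as the accepting state for Float), but numbering the state reached on “e” as 5 is an assumption, and the names classify, RULES and next_token are illustrative rather than anything found in lex.yy.c.

```python
def classify(ch):
    """Map a source character to the symbol used in the transition tables."""
    return "digit" if ch.isdigit() else ch


NUMBER = {
    "delta": {(1, "digit"): 2, (2, "digit"): 2, (2, "."): 3,
              (3, "digit"): 4, (4, "digit"): 4},
    "accepting": {2, 4},
}
FLOAT = {
    "delta": {(1, "digit"): 2, (2, "digit"): 2, (2, "."): 3,
              (3, "digit"): 4, (4, "digit"): 4,
              (2, "e"): 5, (4, "e"): 5,      # state 5 is an assumed number
              (5, "digit"): 6, (6, "digit"): 6},
    "accepting": {6},
}
RULES = [("Number", NUMBER), ("Float", FLOAT)]


def next_token(text):
    """Return (rule name, matched text, remaining input), or None."""
    current = {name: 1 for name, _ in RULES}  # state reached in each dfa
    last = None                               # (rule, chars read) at last accept
    for i, ch in enumerate(text):
        sym = classify(ch)
        nxt = {}
        for name, dfa in RULES:
            if name in current:
                t = dfa["delta"].get((current[name], sym))
                if t is not None:
                    nxt[name] = t
        if not nxt:
            break  # no dfa has a transition for this character
        current = nxt
        for name, dfa in RULES:
            if name in current and current[name] in dfa["accepting"]:
                last = (name, i + 1)
                break  # on a tie, the rule listed first is kept
    if last is None:
        return None
    name, n = last
    # Characters read after the last accepting state stay in the input.
    return name, text[:n], text[n:]


print(next_token("36e8=X1"))  # ('Float', '36e8', '=X1')
print(next_token("36e=X1"))   # ('Number', '36', 'e=X1')
```

The second call shows a case where more than one character is pushed back: the “e” is read in the hope of completing a Float, but no accepting state follows it, so the match falls back to Number after 2 characters.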
