Regular Languages and Expressions

Regular Languages and Expressions Surinder Kumar Jain, University of Sydney

Regular Languages & Expressions • Automaton • DFA • NFA • ε-NFA • CFG as a DFA • Equivalence • Minimal DFA • Expressions • Definition • Conversion from/to Automaton • Regular Languages • Pumping Lemma – proving that a language is not regular • Closures • Equivalence

Deterministic Finite Automaton • A system with many states • Can transition from one state to another • Usually caused by external input • Set of states is finite • System is in one state at any given time

DFA • Mathematical definition of a DFA • A = (Q, Σ, δ, q0, F) • Q : States. The DFA is in exactly one of these finitely many states at any time. • Σ : Input symbols. The DFA moves from one state to another on consuming an input symbol. • δ : Transition function. • Given a state and an input symbol, gives the next DFA state • Function Q × Σ -> Q. • q0 : Initial DFA state, a member of Q • F : Accepting states, a subset of Q. An input string is accepted if the DFA is in one of these states when the whole input has been consumed.

DFA Example Q = { waiting, pending, rejected, accepted, paid } Σ = { receive, reject, accept, pay } δ : (waiting -> receive -> pending), (pending -> reject -> rejected), (pending -> accept -> accepted), (accepted -> pay -> paid) q0 = waiting F = { rejected, paid }
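
The example can be run as a small table-driven simulator. This is a minimal sketch, not part of the original slides: the names DELTA, START, ACCEPTING and accepts are illustrative, and because the slide lists only four transitions, the sketch assumes any missing (state, symbol) pair leads to an implicit dead state.

```python
# Transition function of the example DFA, keyed by (state, symbol).
DELTA = {
    ("waiting", "receive"): "pending",
    ("pending", "reject"): "rejected",
    ("pending", "accept"): "accepted",
    ("accepted", "pay"): "paid",
}
START = "waiting"
ACCEPTING = {"rejected", "paid"}


def accepts(symbols):
    """Return True if the symbol sequence drives the DFA from START into ACCEPTING."""
    state = START
    for symbol in symbols:
        # Assumption: a transition missing from the slide means the input is rejected.
        if (state, symbol) not in DELTA:
            return False
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


print(accepts(["receive", "accept", "pay"]))  # True
print(accepts(["receive", "accept"]))         # False
```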

Transition Diagrams [Figure: transition diagram of the example DFA. start -> Waiting; Waiting -receive-> Pending; Pending -accept-> Accepted; Accepted -pay-> Paid; Pending -reject-> Rejected; Rejected and Paid are accepting states.] Q = { waiting, pending, rejected, accepted, paid } Σ = { receive, reject, accept, pay } δ : (waiting -> receive -> pending), (pending -> reject -> rejected), (pending -> accept -> accepted), (accepted -> pay -> paid) q0 = waiting F = { rejected, paid }

Language • Alphabet: a finite set of symbols • Concatenation (joining) • Strings: finite sequences of symbols • A set of strings over the alphabet is a language • A DFA defines a language • The alphabet is the DFA's set of input symbols • Concatenation: one symbol follows another • Acceptance: a sequence of symbols takes the DFA from the start state to one of the accepting states

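Non-deterministic Finite Automaton (NFA) • Five-tuple like a DFA, (Q, Σ, δ, q0, F) • Transition function returns a set of states, not a single state • A state may have several outgoing arcs with the same symbol • The NFA can be in several states at the same time • Language of an NFA: the strings for which at least one sequence of choices ends in an accepting state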
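
The idea of being "in several states at the same time" can be sketched by tracking the set of current states. The NFA below (strings over {0, 1} that end in 01) and the names NFA_DELTA and nfa_accepts are assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical NFA accepting strings over {0, 1} that end in "01".
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},  # two arcs leave q0 on the same symbol
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}


def nfa_accepts(word, delta=NFA_DELTA, start="q0", accepting=frozenset({"q2"})):
    """Simulate the NFA by keeping the set of all states it could be in."""
    current = {start}
    for symbol in word:
        # Union of the successor sets of every state in the current set.
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & accepting)


print(nfa_accepts("1101"))  # True
print(nfa_accepts("10"))    # False
```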

Equivalence of DFA & NFA • Any NFA language can be described by some DFA • Adding non-determinism does not add any expressive power • Why use NFAs then : • Easier to construct for some languages • May have fewer states and be less complex • There is an algorithm to convert an NFA to a DFA • For an n-state NFA, the DFA may have up to 2^n states • Inaccessible states can be thrown away • Observation : in practice the DFA often has about as many states as the NFA, though it usually has more transitions

NFA to DFA conversion • For an NFA N = (Q, Σ, δ, q0, F), • construct the DFA D = (Qd, Σ, δd, {q0}, Fd) • Qd = power set of Q • δd(S, a) = ∪ { δ(p, a) : p in S } for every S in Qd • Fd = { S : S is a subset of Q and S contains an accepting state of the NFA } • A DFA is in one state at a time; an NFA is in a set of states • Given a state and a symbol, the NFA gives a set of new states • Make every possible set of NFA states a DFA state • Transit from one set of states to the set of all states reachable from it on the symbol • Any set containing an NFA accepting state is an accepting state of the DFA

NFA to DFA conversion complexity • O(2^n) in the worst case (the number of subsets of an n-element set) • Efficient algorithm • Do not construct the entire power set • Start with the start state • Only construct subsets that are reachable from the start state • The number of states in the DFA is then often much less than 2^n • In practice the DFA often has about as many states as the NFA, though it usually has more transitions
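
The lazy version of the subset construction can be sketched as a breadth-first search over sets of NFA states. This is an illustrative sketch under stated assumptions: the function name subset_construction, the dictionary encoding of δ, and the example NFA (strings ending in 01) are not from the slides.

```python
from collections import deque


def subset_construction(delta, start, accepting, alphabet):
    """Build only the DFA states (sets of NFA states) reachable from {start}."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of delta(p, symbol) over every NFA state p in the current set.
            target = frozenset(
                r for q in current for r in delta.get((q, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return seen, dfa_delta, start_set, dfa_accepting


# Hypothetical 3-state NFA for strings over {0, 1} ending in "01".
NFA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
states, table, dfa_start, dfa_final = subset_construction(NFA, "q0", {"q2"}, "01")
print(len(states))  # 3 reachable subsets, not 2**3 = 8
```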

Epsilon-NFA (ε-NFA) • Includes ε (the empty string, not in the alphabet) as a transition label • ε is the identity for concatenation • a.ε = ε.a = a for all a • Spontaneous transition without consuming an input symbol

Equivalence to NFA • An ε-NFA language can be described by some NFA • Every NFA language can be described by some DFA • Adding ε transitions does not add any expressive power • Why use ε-NFAs then : • Easier to construct for some languages • Useful in proving equivalence of languages

Conversion to NFA • Conversion aims to remove ε transitions • Define a new set of states • ε arcs are contained inside each new state set • No ε arc leaves or enters the new set of states • Epsilon closure (eclose) • For a state, the set of all states reachable spontaneously, including the state itself • Follow the ε arcs recursively and include every reachable state in the epsilon closure
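
The epsilon closure can be sketched as a plain graph search over ε arcs. The function name eclose follows the slide; the dictionary encoding of ε arcs and the example EPS are assumptions for illustration.

```python
def eclose(state, eps):
    """Return every state reachable from `state` using only epsilon arcs."""
    closure = {state}          # a state is always in its own closure
    stack = [state]
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


# Hypothetical epsilon arcs: A -> B -> C -> A form a cycle, D has none.
EPS = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": set()}
print(sorted(eclose("A", EPS)))  # ['A', 'B', 'C']
```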

ε-NFA to DFA conversion • For an ε-NFA N = (Q, Σ, δ, q0, F), • construct the DFA D = (Qd, Σ, δd, eclose(q0), Fd) • Qd = { S : S is a subset of Q and S = eclose(S) } • δd(S, a) = eclose( ∪ { δ(p, a) : p in S } ) for every S in Qd • Fd = { S : S is in Qd and S contains an accepting state of the ε-NFA } • A DFA is in one state at a time; an ε-NFA is in a set of states with no ε transition leaving the set • Make the ε-closed sets the DFA states • Transit from one ε-closed set to the ε-closure of the states reached on the symbol • Any set containing an accepting state of the ε-NFA is an accepting state of the DFA
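
A sketch of this construction, building only the ε-closed sets reachable from eclose(q0). The helper names (eclose_set, enfa_to_dfa, dfa_accepts), the dictionary encodings, and the example ε-NFA for a*b* are assumptions made for illustration.

```python
def eclose_set(states, eps):
    """Epsilon closure of a set of states."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def enfa_to_dfa(delta, eps, start, accepting, alphabet):
    """DFA states are epsilon-closed sets; only reachable ones are built."""
    start_set = eclose_set({start}, eps)
    dfa_delta, seen, todo = {}, {start_set}, [start_set]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            moved = {r for q in current for r in delta.get((q, symbol), ())}
            target = eclose_set(moved, eps)   # close the result under epsilon
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen, dfa_delta, start_set, {s for s in seen if s & accepting}


def dfa_accepts(word, dfa_delta, start_set, dfa_accepting):
    state = start_set
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting


# Hypothetical epsilon-NFA for a*b*: q0 loops on a, q0 -eps-> q1, q1 loops on b.
DELTA = {("q0", "a"): {"q0"}, ("q1", "b"): {"q1"}}
EPS = {"q0": {"q1"}}
states, table, s0, final = enfa_to_dfa(DELTA, EPS, "q0", {"q1"}, "ab")
print(dfa_accepts("aabb", table, s0, final))  # True
print(dfa_accepts("ba", table, s0, final))    # False
```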

Programs as Automata • An imperative program can be represented as a Control Flow Graph (CFG) with • statements at nodes and • predicates at edges • It can be converted into a CFG with • both statements and predicates at edges • by pushing node statements up incoming edges • Such a CFG is a DFA • Program points are states • Statements are input symbols that move the program from one program point to another

Regular Expression • Algebraic expression to denote languages • Composed of symbols “ε”, “Ø”, “+”, “*”, “.”, “(“, “)” and alphabet symbols • The language is generated using rules : • L(ε) = { ε } • L(Ø) = empty set • L(a) = { a } for every alphabet symbol a • L(p+q) = L(p) U L(q) • L(p.q) = { p’.q’ | p’ in L(p) & q’ in L(q) } • L(p*) = { q1.q2…qn | each qi in L(p) and n >= 0 }, where n = 0 gives ε
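
The rules can be turned into a small evaluator that lists every string of L(r) up to a length bound (the full language may be infinite). This is a sketch: the nested-tuple encoding of expressions and the function name language are assumptions, not notation from the slides.

```python
def language(expr, max_len):
    """Strings of length <= max_len in L(expr); expr is a nested tuple."""
    kind = expr[0]
    if kind == "empty":                      # Ø
        return set()
    if kind == "eps":                        # ε
        return {""}
    if kind == "sym":                        # a single alphabet symbol
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "union":                      # p + q
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "concat":                     # p . q
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                       # p*
        base = language(expr[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                      # append one more factor per round
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")


# a.b.c*.d, written as nested tuples
EXPR = ("concat", ("sym", "a"),
        ("concat", ("sym", "b"),
         ("concat", ("star", ("sym", "c")), ("sym", "d"))))
print(sorted(language(EXPR, 5)))  # ['abccd', 'abcd', 'abd']
```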

Regular Expression Example a+b.c The language generated is : { a, b.c } a.b.c*.d the language generated is : { a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, … } A finite way to express an infinite language

Equality of Languages DEFINITION • Two regular expressions (or automata) • are EQUAL • if they both generate the same language • Thus (a.b)* + (b.a)* + a.(b.a)* + b.(a.b)* = (ε + b).(a.b)*.(ε + a)
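
An equality like this can be sanity-checked by brute force on all short strings. The sketch below uses Python's re module, so "+" is written "|" and "." is implicit; the left-hand side is taken as (a.b)* + (b.a)* + a.(b.a)* + b.(a.b)*. The helper name first_difference is hypothetical, and agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product


def first_difference(pattern_a, pattern_b, alphabet, max_len):
    """Return the first string (shortest first) on which the patterns disagree, else None."""
    a, b = re.compile(pattern_a), re.compile(pattern_b)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if bool(a.fullmatch(word)) != bool(b.fullmatch(word)):
                return word
    return None


LHS = r"(ab)*|(ba)*|a(ba)*|b(ab)*"
RHS = r"b?(ab)*a?"          # (ε + b).(a.b)*.(ε + a)
print(first_difference(LHS, RHS, "ab", 8))            # None
print(first_difference(r"(ab)*", r"(ba)*", "ab", 4))  # ab
```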

Algebraic laws of regular expressions • p + q = q + p • (p + q) + r = p + (q + r) • (p.q).r = p.(q.r) • Ø + p = p + Ø = p • ε.p = p.ε = p • Ø.p = p.Ø = Ø • p.(q + r) = p.q + p.r • (p + q).r = p.r + q.r • p + p = p • (p*)* = p* • Ø* = ε • ε* = ε • p.p* = p*.p • (p + q)* = (p*.q*)*

Finite Automaton and Regular Expressions • Every language • defined by a finite automaton is also defined by some regular expression • defined by a regular expression is also defined by some DFA

DFA to Regular expression • Hopcroft’s formula • R_ij^(k) = R_ij^(k-1) + R_ik^(k-1).(R_kk^(k-1))*.R_kj^(k-1) • States are sorted in some order and numbered 1 to n • R_ij^(k) is the regular expression for all paths from i to j whose intermediate states are all numbered k or lower • R_ij^(n) is the regular expression for all paths from i to j (n is the number of states) • Computed for all i, j for k = 0, then k = 1, …, k = n • R_s,f1^(n) + … + R_s,fm^(n) is the regular expression of the DFA • s is the start state, f1, …, fm are the accepting states, n is the number of states
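
The recurrence can be sketched directly as a table computation. This is an unoptimized illustration under stated assumptions: expressions are emitted in Python re syntax ("|" for "+"), None stands for Ø, the empty string stands for ε, and the helper names union, concat, star and dfa_to_regex are hypothetical. No simplification is attempted, so the output is long.

```python
import re
from itertools import product


def union(p, q):
    """p + q, where None stands for Ø and "" for ε."""
    if p is None:
        return q
    if q is None:
        return p
    if p == q:
        return p
    if p == "":
        return f"(?:{q})?"
    if q == "":
        return f"(?:{p})?"
    return f"(?:{p}|{q})"


def concat(p, q):
    return None if p is None or q is None else p + q


def star(p):
    return "" if p in (None, "") else f"(?:{p})*"


def dfa_to_regex(n, delta, start, accepting):
    """States are numbered 1..n; delta maps (state, symbol) -> state."""
    # k = 0: direct arcs only, plus ε on the diagonal.
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        R[i][i] = ""
    for (i, symbol), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(symbol))
    for k in range(1, n + 1):
        prev = R
        R = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                via_k = concat(concat(prev[i][k], star(prev[k][k])), prev[k][j])
                R[i][j] = union(prev[i][j], via_k)
    result = None
    for f in accepting:
        result = union(result, R[start][f])
    return result


# Hypothetical DFA: strings over {0, 1} containing at least one 0.
DELTA = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
REGEX = dfa_to_regex(2, DELTA, 1, [2])
WORDS = ["".join(w) for n in range(6) for w in product("01", repeat=n)]
print(all(bool(re.fullmatch(REGEX, w)) == ("0" in w) for w in WORDS))
```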

DFA to RE - complexity • Hopcroft’s formula is O(n^3 4^n) : • n^3 table entries to compute and • 4^n because the size of the regular expression can grow by a factor of 4 at every step • In practice it is close to O(n^3) • by simplifying the regular expression at every step and • using a judicious algorithm that avoids recomputing (R_kk^(k-1))* • Most DFAs have close to n, not 2^n, accessible states • A faster state-elimination method, close to O(n^2), is also available

RE to Automata conversion • A regular expression is converted to an ε-NFA • The ε-NFA can then be converted to an NFA and to a DFA • RE to ε-NFA conversion rules : • ε -> two-state automaton with a single ε edge • Ø -> two-state automaton with no edges • a -> two-state automaton with a single “a” transition • + -> a new start state and a new accept state joining the two arguments of + in parallel • . -> the accept state of the first is the start state of the second • * -> ε edges joining the start and accept states of the argument to a new start state and a new accept state • Convert the resulting ε-NFA to a DFA
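
The conversion rules can be sketched as a recursive builder in the style of Thompson's construction. Assumptions: expressions are nested tuples, the names thompson, closure and matches are hypothetical, and for concatenation this sketch links the two fragments with ε edges rather than merging the accept state of the first with the start state of the second, which accepts the same language.

```python
from itertools import count


def thompson(expr, fresh, eps, moves):
    """Add an ε-NFA fragment for expr to eps/moves; return its (start, accept)."""
    kind = expr[0]
    start, accept = next(fresh), next(fresh)
    if kind == "eps":
        eps.setdefault(start, set()).add(accept)
    elif kind == "empty":
        pass                                   # two states, no edges
    elif kind == "sym":
        moves.setdefault((start, expr[1]), set()).add(accept)
    elif kind == "union":                      # arguments in parallel
        for sub in expr[1:]:
            s, a = thompson(sub, fresh, eps, moves)
            eps.setdefault(start, set()).add(s)
            eps.setdefault(a, set()).add(accept)
    elif kind == "concat":                     # arguments in series
        s1, a1 = thompson(expr[1], fresh, eps, moves)
        s2, a2 = thompson(expr[2], fresh, eps, moves)
        eps.setdefault(start, set()).add(s1)
        eps.setdefault(a1, set()).add(s2)
        eps.setdefault(a2, set()).add(accept)
    elif kind == "star":                       # skip the argument or loop back
        s, a = thompson(expr[1], fresh, eps, moves)
        eps.setdefault(start, set()).update({s, accept})
        eps.setdefault(a, set()).update({s, accept})
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return start, accept


def closure(states, eps):
    result, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in result:
                result.add(r)
                stack.append(r)
    return result


def matches(expr, word):
    """Build the ε-NFA for expr and simulate it on word."""
    eps, moves = {}, {}
    start, accept = thompson(expr, count(), eps, moves)
    current = closure({start}, eps)
    for symbol in word:
        step = {r for q in current for r in moves.get((q, symbol), ())}
        current = closure(step, eps)
    return accept in current


# a.(b+c)*
EXPR = ("concat", ("sym", "a"), ("star", ("union", ("sym", "b"), ("sym", "c"))))
print(matches(EXPR, "abcb"))  # True
print(matches(EXPR, "ba"))    # False
```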

Direct conversion • Augment regular expression r to (r).# • Assign a position number to each occurrence of an alphabet symbol • Compute for each node of the syntax tree • nullable (is ε in the language of the node) • firstpos (set of positions that can match the first symbol) • lastpos (set of positions that can match the last symbol) • Compute for each position • followpos (set of positions that can match the next symbol after this position) • Construct the DFA, whose states are sets of positions
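
A compact sketch of these computations. Assumptions: expressions are nested tuples, positions are numbered from 0 in left-to-right order, "#" does not occur in the input, and instead of tabulating the whole DFA the sketch computes each DFA state (a set of positions) on the fly while reading the input. The names analyse and direct_match are hypothetical.

```python
def analyse(expr, symbols, followpos):
    """Return (nullable, firstpos, lastpos) of expr; fill symbols and followpos."""
    kind = expr[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "sym":
        position = len(symbols)              # next free position number
        symbols.append(expr[1])
        followpos.append(set())
        return False, {position}, {position}
    if kind == "union":
        n1, f1, l1 = analyse(expr[1], symbols, followpos)
        n2, f2, l2 = analyse(expr[2], symbols, followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "concat":
        n1, f1, l1 = analyse(expr[1], symbols, followpos)
        n2, f2, l2 = analyse(expr[2], symbols, followpos)
        for p in l1:                         # the right part can follow the left part
            followpos[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f, l = analyse(expr[1], symbols, followpos)
        for p in l:                          # the body can follow itself
            followpos[p] |= f
        return True, f, l
    raise ValueError(f"unknown node kind: {kind}")


def direct_match(expr, word):
    augmented = ("concat", expr, ("sym", "#"))
    symbols, followpos = [], []
    _, first, _ = analyse(augmented, symbols, followpos)
    end = len(symbols) - 1                   # position of the end marker '#'
    state = frozenset(first)                 # DFA start state = firstpos(root)
    for ch in word:
        state = frozenset(q for p in state if symbols[p] == ch
                          for q in followpos[p])
    return end in state                      # accepting if '#' can come next


# (a+b)*.a.b.b
AB_STAR = ("star", ("union", ("sym", "a"), ("sym", "b")))
EXPR = ("concat", ("concat", ("concat", AB_STAR, ("sym", "a")), ("sym", "b")), ("sym", "b"))
print(direct_match(EXPR, "aababb"))  # True
print(direct_match(EXPR, "ab"))      # False
```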

Applications • Unix text search, searching for matching patterns (grep) • Lexical analysis and parsing • Parse text against a regular expression • find the set of first tokens at this expression root • find the set of last tokens at this expression root • can the expression at this root match the empty string • find the set of next tokens after an alphabet position in a regular expression • Efficient search of patterns in a very large repository (web text search)

Regular Language DEFINITION • A language (a set of strings) • is defined to be a regular language if • it can be defined by a finite automaton • by a DFA or • by an NFA or • by an ε-NFA or • by a regular expression • Four different ways to describe a regular language

Pumping Lemma • If L is a regular language then there exists • an integer n such that • for every string w in L with |w| >= n • we can break w into x, y, z such that w = x.y.z • y ≠ ε • |x.y| <= n • x.y^k.z is in L (for all k >= 0) • Proof based on • For a DFA with n states • any string of length >= n • must revisit a state • Used to prove that a language is not regular
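
The way the lemma is used can be illustrated by searching for a valid split. The sketch below tries every split w = x.y.z with |x.y| <= n and y non-empty and pumps it for k = 0..max_k. If no split survives for some w in L with |w| >= n, then that n cannot be a pumping length; a full non-regularity proof repeats this argument for every n. A split that survives only shows consistency for the tested k. The names pumpable_split, is_anbn and is_astar_bstar are hypothetical.

```python
def pumpable_split(word, n, in_language, max_k=3):
    """Return a split (x, y, z) that survives pumping for k = 0..max_k, else None."""
    for j in range(1, min(n, len(word)) + 1):   # j = |x.y| <= n
        for i in range(j):                      # i = |x|, so y = word[i:j] is non-empty
            x, y, z = word[:i], word[i:j], word[j:]
            if all(in_language(x + y * k + z) for k in range(max_k + 1)):
                return x, y, z
    return None


def is_anbn(s):
    """Membership test for { a^m b^m | m >= 0 }, a non-regular language."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half


def is_astar_bstar(s):
    """Membership test for a*b*, a regular language."""
    return set(s) <= {"a", "b"} and "ba" not in s


print(pumpable_split("aaaabbbb", 4, is_anbn))         # None
print(pumpable_split("aaaabbbb", 4, is_astar_bstar))  # ('', 'a', 'aaabbbb')
```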

Closure property • A language is a set of strings over a finite alphabet • Language operators under which regular languages are closed : • Union of two languages L(A ∪ B) = L(A) ∪ L(B) - re • Intersection • Concatenation L(A.B) = { a.b | a in A, b in B } • Kleene Closure L(A*) = { a1.a2…an | each ai in A, n >= 0 } • n = 0 gives ε • Complement L(A’) = { a | a not in A } (with respect to some overall alphabet set) - dfa • Difference L(A-B) = L(A) – L(B) - dfa switch q0 F • Reversal L(A^R) = { ak.ak-1…a1 | a1…ak-1.ak in A } • Homomorphism – replace an alphabet symbol with another regular expression • Inverse homomorphism
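
Closure under complement, intersection and difference can be sketched on DFAs. Assumptions: a DFA is a tuple (states, alphabet, delta, start, accepting) with a total transition function, both DFAs share one alphabet, and intersection uses the standard product construction. The function names and the two example DFAs are illustrative only.

```python
def complement(dfa):
    """Swap accepting and non-accepting states of a total DFA."""
    states, alphabet, delta, start, accepting = dfa
    return states, alphabet, delta, start, states - accepting


def intersection(dfa_a, dfa_b):
    """Product construction: run both DFAs in lockstep on pairs of states."""
    states_a, alphabet, delta_a, start_a, acc_a = dfa_a
    states_b, _, delta_b, start_b, acc_b = dfa_b
    states = {(p, q) for p in states_a for q in states_b}
    delta = {((p, q), s): (delta_a[(p, s)], delta_b[(q, s)])
             for (p, q) in states for s in alphabet}
    accepting = {(p, q) for (p, q) in states if p in acc_a and q in acc_b}
    return states, alphabet, delta, (start_a, start_b), accepting


def difference(dfa_a, dfa_b):
    """L(A) - L(B) = L(A) intersected with the complement of L(B)."""
    return intersection(dfa_a, complement(dfa_b))


def accepts(dfa, word):
    _states, _alphabet, delta, state, accepting = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical DFAs over {a, b}: even number of a's; strings ending in b.
EVEN_A = ({"e", "o"}, {"a", "b"},
          {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"},
          "e", {"e"})
ENDS_B = ({"n", "y"}, {"a", "b"},
          {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
          "n", {"y"})
print(accepts(intersection(EVEN_A, ENDS_B), "aab"))  # True
print(accepts(difference(EVEN_A, ENDS_B), "aab"))    # False
```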

Decision properties • Is the described language empty? • Is a particular string in the described language? • Do two different descriptions actually describe the same language?
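
For a language given as a DFA, the emptiness question reduces to graph reachability: the language is empty exactly when no accepting state is reachable from the start state. A minimal sketch, with a hypothetical function name and example:

```python
def is_empty(delta, start, accepting):
    """True if no accepting state is reachable from start."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if state in accepting:
            return False
        for (source, _symbol), target in delta.items():
            if source == state and target not in seen:
                seen.add(target)
                stack.append(target)
    return True


# Hypothetical DFA: state u is accepting but cannot be reached from s.
DELTA = {("s", "a"): "t", ("t", "a"): "t", ("u", "a"): "u"}
print(is_empty(DELTA, "s", {"u"}))  # True
print(is_empty(DELTA, "s", {"t"}))  # False
```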

Conversions • Decision properties may require conversion between various forms. • Can the conversion be done in reasonable time?

Equivalence of automata • Equivalence of two states • States p and q in an automaton are defined to be equivalent if • for every input string w • processing w from p ends in an accepting state • if and only if • processing w from q ends in an accepting state • The accepting state reached from p does not have to be the same accepting state as that reached from q

Minimization of DFA • If two states p and q are equivalent • we can combine them into a single state • it won't affect the language accepted by the DFA • This process of combining states is called minimization • The table-filling algorithm can find whether two states are equivalent. Complexity O(n^2) • Non-equivalent pairs are called distinguishable
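
A simple version of the table-filling algorithm can be sketched as a fixed-point computation: mark pairs that differ on acceptance, then keep marking pairs whose successors on some symbol are already marked. This sketch repeats full passes until nothing changes, so it is not the O(n^2) variant mentioned above. The function name and the example DFA are assumptions.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, accepting):
    """Return the set of distinguishable state pairs of a total DFA."""
    pairs = [frozenset(pair) for pair in combinations(sorted(states), 2)]
    # Basis: exactly one state of the pair is accepting.
    marked = {pair for pair in pairs if len(pair & accepting) == 1}
    changed = True
    while changed:
        changed = False
        for pair in pairs:
            if pair in marked:
                continue
            p, q = tuple(pair)
            # Induction: some symbol leads to an already distinguishable pair.
            if any(frozenset((delta[(p, s)], delta[(q, s)])) in marked
                   for s in alphabet):
                marked.add(pair)
                changed = True
    return marked


# Hypothetical DFA over {0, 1}: A and C behave identically, B is accepting.
DELTA = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "B", ("B", "1"): "C",
         ("C", "0"): "B", ("C", "1"): "A"}
MARKED = distinguishable_pairs({"A", "B", "C"}, "01", DELTA, {"B"})
print(frozenset(("A", "C")) in MARKED)  # False: A and C are equivalent
```

Unmarked pairs, here only {A, C}, are the ones that minimization may merge.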

Minimum DFA • The minimum DFA is unique (up to renaming of states) • Eliminate all states not reachable from the start state • Determine which states are equivalent • Partition states into blocks of equivalent states • Equivalence is transitive • Thus no state is in two blocks • Equivalence of two Regular Languages • Convert them into their minimum DFAs • and check for isomorphism • Union method • Run the table-filling algorithm on the union of the two DFAs • The two DFAs are equivalent if and only if their start states are equivalent
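
The union method can be sketched by tagging the states of the two DFAs so they stay disjoint, running table filling on the combined automaton, and checking whether the two start states end up distinguishable. Assumptions: each DFA is a tuple (delta, start, accepting) with a total transition function over a shared alphabet, and the names equivalent, EVEN2, EVEN3 and ODD are hypothetical.

```python
from itertools import combinations


def equivalent(dfa_a, dfa_b, alphabet):
    """True if the two total DFAs accept the same language."""
    delta, accepting = {}, set()
    for tag, (d, _start, acc) in enumerate((dfa_a, dfa_b)):
        for (p, s), r in d.items():
            delta[((tag, p), s)] = (tag, r)      # tag keeps the state sets disjoint
        accepting |= {(tag, f) for f in acc}
    states = sorted({p for (p, _s) in delta} | set(delta.values()))
    pairs = [frozenset(pair) for pair in combinations(states, 2)]
    marked = {pair for pair in pairs if len(pair & accepting) == 1}
    changed = True
    while changed:
        changed = False
        for pair in pairs:
            if pair in marked:
                continue
            p, q = tuple(pair)
            if any(frozenset((delta[(p, s)], delta[(q, s)])) in marked
                   for s in alphabet):
                marked.add(pair)
                changed = True
    starts = frozenset(((0, dfa_a[1]), (1, dfa_b[1])))
    return starts not in marked


# Hypothetical DFAs over {0, 1}: two machines for "even number of 1s", one for "odd".
PARITY = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
EVEN2 = (PARITY, "e", {"e"})
EVEN3 = ({("x", "0"): "x", ("x", "1"): "y", ("y", "0"): "y",
          ("y", "1"): "z", ("z", "0"): "z", ("z", "1"): "y"}, "x", {"x", "z"})
ODD = (PARITY, "e", {"o"})
print(equivalent(EVEN2, EVEN3, "01"))  # True
print(equivalent(EVEN2, ODD, "01"))    # False
```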
