Regular Languages

Giorgi Japaridze Theory of Computability Regular Languages Chapter 1

1.1.a How a finite automaton works
[Figure: state diagram of an automaton with states q0, q1, q2 and transitions labeled 0 and 1, shown processing an input string.]

1.1.b The language of a machine
[Figure: the state diagram of automaton A.]
L(M), “the language of M”, or “the language recognized by M”, is the set of all strings that the machine M accepts.
What is the language recognized by our automaton A?
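To make L(M) concrete, one can list all strings up to some length that a given DFA accepts. The Python sketch below assumes a DFA is encoded as a dictionary `delta` keyed by (state, symbol) pairs; the helper `language_up_to` and the example machine `ENDS_IN_1` (which accepts the bit strings ending in 1) are hypothetical illustrations, not automaton A. Since L(M) is usually infinite, only a finite slice of it can be listed this way.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

def language_up_to(delta: Dict[Tuple[str, str], str], start: str,
                   accept: Set[str], alphabet: str, max_len: int) -> List[str]:
    """Members of L(M) of length <= max_len, shortest first (hypothetical helper)."""
    members = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            state = start
            for a in letters:            # follow delta one symbol at a time
                state = delta[(state, a)]
            if state in accept:          # M accepts this string
                members.append("".join(letters))
    return members

# Hypothetical example DFA: accepts the bit strings that end in 1.
ENDS_IN_1 = {("x", "0"): "x", ("x", "1"): "y",
             ("y", "0"): "x", ("y", "1"): "y"}

print(language_up_to(ENDS_IN_1, "x", {"y"}, "01", 2))   # ['1', '01', '11']
```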

1.1.c Formal definition of a finite automaton
A (deterministic) finite automaton (DFA) is a 5-tuple (Q, Σ, δ, s, F), where:
• Q is a finite set whose elements are called the states
• Σ is a finite set called the alphabet
• δ is a function of the type Q × Σ → Q called the transition function
• s is an element of Q called the start state
• F is a subset of Q called the set of accept states
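One way to hold the 5-tuple in a program is sketched below in Python. The encoding is an assumption of ours: δ is stored as a dictionary keyed by (state, symbol) pairs, and the hypothetical `validate` method checks the side conditions of the definition, in particular that δ is a total function of type Q × Σ → Q.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

@dataclass
class DFA:
    """A DFA as the 5-tuple (Q, Sigma, delta, s, F)."""
    states: FrozenSet[str]                    # Q
    alphabet: FrozenSet[str]                  # Sigma
    delta: Dict[Tuple[str, str], str]         # delta, keyed by (state, symbol)
    start: str                                # s
    accept: FrozenSet[str]                    # F

    def validate(self) -> None:
        """Raise ValueError unless the side conditions of the definition hold."""
        if self.start not in self.states:
            raise ValueError("s must be an element of Q")
        if not self.accept <= self.states:
            raise ValueError("F must be a subset of Q")
        for q in self.states:
            for a in self.alphabet:
                # delta must be defined on every pair in Q x Sigma ...
                if (q, a) not in self.delta:
                    raise ValueError(f"delta({q!r}, {a!r}) is undefined")
                # ... and must always return an element of Q
                if self.delta[(q, a)] not in self.states:
                    raise ValueError(f"delta({q!r}, {a!r}) is not in Q")

# Hypothetical example: a two-state DFA accepting the bit strings that end in 1.
ENDS_IN_1 = DFA(
    states=frozenset({"x", "y"}),
    alphabet=frozenset({"0", "1"}),
    delta={("x", "0"): "x", ("x", "1"): "y",
           ("y", "0"): "x", ("y", "1"): "y"},
    start="x",
    accept=frozenset({"y"}),
)
ENDS_IN_1.validate()   # no exception: all side conditions hold
```

A dictionary is only one choice; a transition table indexed by state and symbol numbers would work equally well.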

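1.1.d Our automaton formalized
A = (Q, Σ, δ, s, F), where Q = {q0, q1, q2}, Σ = {0, 1}, and s = q0.
[Figure and table: the state diagram of A next to the transition table of δ (rows q0, q1, q2; columns 0, 1) and the set F; the table entries were not recoverable.]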

1.1.e Formal definition of computation
Let M = (Q, Σ, δ, s, F). M accepts the string u₁ u₂ … uₙ iff there is a sequence r₁, r₂, …, rₙ, rₙ₊₁ of states such that:
• r₁ = s
• rᵢ₊₁ = δ(rᵢ, uᵢ), for each i with 1 ≤ i ≤ n
• rₙ₊₁ ∈ F
[Figure: automaton A reading the input u₁ u₂ … uₙ = 0 1 1 0 0 and passing through the states r₁, r₂, …, rₙ, rₙ₊₁ = q0, q2, q0, q0, q2, q1.]
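The definition can be followed literally: start with r₁ = s, compute each rᵢ₊₁ = δ(rᵢ, uᵢ), and check whether rₙ₊₁ ∈ F. The Python sketch below does exactly that; for a DFA the sequence r₁, …, rₙ₊₁ is uniquely determined by the input, so the function can simply return it. The function names and the example machine (a hypothetical DFA with states x and y accepting the bit strings that end in 1) are our own illustrations.

```python
from typing import Dict, List, Set, Tuple

Delta = Dict[Tuple[str, str], str]

def run(delta: Delta, start: str, word: str) -> List[str]:
    """The state sequence r1, ..., r_{n+1} of a DFA on input u1 ... un:
    r1 = s and r_{i+1} = delta(r_i, u_i) for each i with 1 <= i <= n."""
    states = [start]                          # r1 = s
    for symbol in word:                       # u1, ..., un in order
        states.append(delta[(states[-1], symbol)])
    return states

def accepts(delta: Delta, start: str, accept: Set[str], word: str) -> bool:
    """M accepts u1 ... un iff r_{n+1} is in F."""
    return run(delta, start, word)[-1] in accept

# Hypothetical example DFA: accepts the bit strings that end in 1.
ENDS_IN_1: Delta = {("x", "0"): "x", ("x", "1"): "y",
                    ("y", "0"): "x", ("y", "1"): "y"}
print(run(ENDS_IN_1, "x", "0110"))            # ['x', 'x', 'y', 'y', 'x']
```

Note that on the empty input (n = 0) the sequence is just r₁ = s, so M accepts the empty string iff s ∈ F.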

1.1.f Designing finite automata
Task: design an automaton that accepts a bit string iff it contains an even number of “1”s.
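One possible design (of many) stores in the state the parity of the number of 1s read so far: a 0 leaves the state unchanged and a 1 flips it. The Python sketch below encodes that two-state DFA, with hypothetical state names `even` (the start state and the only accept state) and `odd`, and checks it against the task's specification on all short bit strings.

```python
from itertools import product

# The state records the parity of the 1s read so far.
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
START = "even"
ACCEPT = {"even"}

def accepts_even_ones(word: str) -> bool:
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPT

def check(max_len: int = 8) -> bool:
    """Brute-force comparison with the specification on all bit strings of length <= max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if accepts_even_ones(w) != (w.count("1") % 2 == 0):
                return False
    return True

print(check())   # True
```

A finite check like this is a sanity test, not a proof; the proof is the invariant that after reading w the machine is in `even` iff w has an even number of 1s.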

1.2.a NFAs (Nondeterministic Finite Automata)
[Figure: an NFA with states q1, q2, q3 over the alphabet {0, 1}, and the tree of its possible computations on an input string.]

1.2.a NFAs (Nondeterministic Finite Automata), continued
[Figure: the same NFA with states q1, q2, q3.]
What language does this NFA recognize?

1.2.b Formal definition of a nondeterministic finite automaton
An NFA is a 5-tuple (Q, Σ, δ, s, F), where:
• Q is a finite set whose elements are called the states
• Σ is a finite set called the alphabet
• δ is a function of the type Q × Σ → P(Q) called the transition function, where P(Q) is the set of all subsets of Q
• s is an element of Q called the start state
• F is a subset of Q called the set of accept states
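The only change from the DFA definition is the type of δ: it now returns a set of states, possibly empty. The Python sketch below assumes δ is a dictionary whose values are frozensets, with missing pairs standing for the empty set; the helper names and the example NFA are hypothetical. It also shows that every DFA transition function can be read as an NFA one by wrapping each target in a one-element set.

```python
from typing import Dict, FrozenSet, Tuple

# delta : Q x Sigma -> P(Q), encoded as a dict whose values are frozensets of states.
NFADelta = Dict[Tuple[str, str], FrozenSet[str]]

def transitions(delta: NFADelta, q: str, a: str) -> FrozenSet[str]:
    """Return delta(q, a), a subset of Q; pairs missing from the dict map to the empty set."""
    return delta.get((q, a), frozenset())

def dfa_as_nfa(dfa_delta: Dict[Tuple[str, str], str]) -> NFADelta:
    """Turn a function of type Q x Sigma -> Q into one of type Q x Sigma -> P(Q)
    by wrapping each target state in a one-element set."""
    return {key: frozenset({target}) for key, target in dfa_delta.items()}

# Hypothetical example NFA over {0, 1} with states q1, q2, q3.
EXAMPLE_DELTA: NFADelta = {
    ("q1", "0"): frozenset({"q1"}),
    ("q1", "1"): frozenset({"q1", "q2"}),   # two possible next states
    ("q2", "0"): frozenset({"q3"}),
    ("q2", "1"): frozenset({"q3"}),
    # no entries for q3: delta(q3, a) is the empty set
}
```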

1.2.c Example 1
[Figure and table: an NFA A = (Q, Σ, δ, s, F) with states 1, 2, 3 over the alphabet {a, b}, together with its components Q, Σ, δ, s, F; the entries were not recoverable.]

1.2.d Formal definition of accepting
Let M = (Q, Σ, δ, s, F).
When M is a DFA: M accepts the string u₁ u₂ … uₙ iff there is a sequence r₁, r₂, …, rₙ, rₙ₊₁ of states such that:
• r₁ = s
• rᵢ₊₁ = δ(rᵢ, uᵢ), for each i with 1 ≤ i ≤ n
• rₙ₊₁ ∈ F
When M is an NFA: M accepts the string u₁ u₂ … uₙ iff there is a sequence r₁, r₂, …, rₙ, rₙ₊₁ of states such that:
• r₁ = s
• rᵢ₊₁ ∈ δ(rᵢ, uᵢ), for each i with 1 ≤ i ≤ n
• rₙ₊₁ ∈ F
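For an NFA the definition only asks that *some* sequence r₁, …, rₙ₊₁ exists. The Python sketch below searches for one directly, by depth-first search with backtracking over the nondeterministic choices. The encoding (δ as a dictionary of frozensets) and the example NFA are hypothetical; with q3 as its only accept state, that NFA accepts exactly the strings whose second-to-last symbol is 1.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

NFADelta = Dict[Tuple[str, str], FrozenSet[str]]

def find_accepting_run(delta: NFADelta, start: str, accept: Set[str],
                       word: str) -> Optional[List[str]]:
    """Look for states r1, ..., r_{n+1} with r1 = s, r_{i+1} in delta(r_i, u_i)
    for 1 <= i <= n, and r_{n+1} in F. Return one such sequence, or None.

    This is a plain depth-first search over the nondeterministic choices,
    so it may take time exponential in len(word)."""
    def extend(run: List[str]) -> Optional[List[str]]:
        i = len(run) - 1                         # symbols consumed so far
        if i == len(word):
            return run if run[-1] in accept else None
        for nxt in sorted(delta.get((run[-1], word[i]), frozenset())):
            found = extend(run + [nxt])
            if found is not None:
                return found
        return None                              # dead end: backtrack
    return extend([start])

# Hypothetical example NFA; q3 is its only accept state.
N_DELTA: NFADelta = {
    ("q1", "0"): frozenset({"q1"}), ("q1", "1"): frozenset({"q1", "q2"}),
    ("q2", "0"): frozenset({"q3"}), ("q2", "1"): frozenset({"q3"}),
}
print(find_accepting_run(N_DELTA, "q1", {"q3"}, "0110"))   # ['q1', 'q1', 'q1', 'q2', 'q3']
```

Returning the sequence itself, rather than just True, makes the existential "there is a sequence" in the definition visible.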

1.2.e What language does this NFA recognize?
[Figure: an NFA whose transitions are all labeled 0.]

1.2.f What language does this DFA recognize?
[Figure: a DFA with states 1, 2, 3, 4, 5 whose transitions are labeled 0.]

1.2.g Equivalence of NFAs and DFAs
Two machines are said to be equivalent if they recognize the same language.
Theorem 1.39 Every NFA has an equivalent DFA.
Proof. Consider an NFA N = (Q, Σ, δ, s, F). We need to construct an equivalent DFA D = (Q’, Σ, δ’, s’, F’), using a procedure called the subset construction, described on the next slide.

1.2.h The subset construction
Constructing the DFA D = (Q’, Σ, δ’, s’, F’) from the NFA N = (Q, Σ, δ, s, F):
• Q’ = P(Q)
• δ’(R, a) = {q | q ∈ δ(p, a) for some p ∈ R}
• s’ = {s}
• F’ = {R | R is a subset of Q containing an accept state of N}
D works correctly: after reading any initial part of the input, D is in the state R that is exactly the set of states N could be in at that point. So D ends in a state of F’ iff at least one computation of N ends in an accept state, that is, iff N accepts.
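The four rules translate almost line by line into code. The Python sketch below builds all of Q’ = P(Q) exactly as on the slide, so it is exponential in |Q| by design. The dictionary encoding of δ, the function names, and the example NFA (which accepts the strings whose second-to-last symbol is 1) are hypothetical.

```python
from itertools import chain, combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

NFADelta = Dict[Tuple[str, str], FrozenSet[str]]
Subset = FrozenSet[str]

def powerset(states: Iterable[str]) -> List[Subset]:
    """All subsets of the given states: Q' = P(Q)."""
    items = sorted(states)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def subset_construction(Q: Set[str], sigma: str, delta: NFADelta, s: str, F: Set[str]):
    """Build (Q', delta', s', F') of the DFA D from the NFA N = (Q, sigma, delta, s, F)."""
    Q_prime = powerset(Q)
    delta_prime: Dict[Tuple[Subset, str], Subset] = {}
    for R in Q_prime:
        for a in sigma:
            # delta'(R, a) = {q | q in delta(p, a) for some p in R}
            delta_prime[(R, a)] = frozenset(
                q for p in R for q in delta.get((p, a), frozenset()))
    s_prime = frozenset({s})                  # s' = {s}
    F_prime = {R for R in Q_prime if R & F}   # R contains an accept state of N
    return Q_prime, delta_prime, s_prime, F_prime

def dfa_accepts(delta_prime, s_prime, F_prime, word: str) -> bool:
    R = s_prime
    for a in word:
        R = delta_prime[(R, a)]
    return R in F_prime

# Hypothetical example NFA (accepts strings whose second-to-last symbol is 1).
N_DELTA: NFADelta = {
    ("q1", "0"): frozenset({"q1"}), ("q1", "1"): frozenset({"q1", "q2"}),
    ("q2", "0"): frozenset({"q3"}), ("q2", "1"): frozenset({"q3"}),
}
Q_PRIME, DELTA_PRIME, S_PRIME, F_PRIME = subset_construction(
    {"q1", "q2", "q3"}, "01", N_DELTA, "q1", {"q3"})
```

Using `frozenset` for the DFA states matters: D’s states are sets, and only immutable sets can serve as dictionary keys.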

1.2.i Example of applying the subset construction
[Figure and table: the NFA N = (Q, Σ, δ, s, F) from Example 1 (states 1, 2, 3; alphabet {a, b}) and the components Q’, Σ, δ’, s’, F’ of the resulting DFA. Q’ consists of the eight subsets ∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}; the entries of δ’ were not recoverable.]

1.2.j The resulting DFA
[Figure: state diagram of the DFA D, with one state for each of the subsets ∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3} and transitions labeled a and b.]

1.2.k Removing unreachable states
[Figure: D after removing the states that cannot be reached from its start state; the remaining states are ∅, {1}, {3}, {2,3}, {1,2,3}.]
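Finding the unreachable states is a graph search from the start state. The Python sketch below uses breadth-first search over δ and then restricts the DFA to the states it found; this does not change the language, since an unreachable state never takes part in any computation. The function names and the small example DFA (in which state `u` can never be entered) are hypothetical; the same code works when the states are frozensets, as in the output of the subset construction.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

def reachable(delta: Dict[Tuple[Hashable, str], Hashable], start: Hashable,
              alphabet: Iterable[str]) -> Set[Hashable]:
    """States reachable from start by reading some input string (breadth-first search)."""
    alphabet = list(alphabet)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for a in alphabet:
            nxt = delta[(state, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def remove_unreachable(delta, start, accept, alphabet):
    """Restrict the DFA to its reachable states; its language is unchanged."""
    keep = reachable(delta, start, alphabet)
    new_delta = {(q, a): t for (q, a), t in delta.items() if q in keep}
    return keep, new_delta, {q for q in accept if q in keep}

# Hypothetical DFA in which state "u" can never be entered from the start state "p".
DELTA = {("p", "0"): "p", ("p", "1"): "q",
         ("q", "0"): "p", ("q", "1"): "q",
         ("u", "0"): "p", ("u", "1"): "u"}
```

An alternative design is to never create unreachable states at all, by generating subsets lazily starting from s’ = {s}.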

1.2.l Testing it in action
[Figure: the pruned DFA D next to the NFA N.]
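Testing D against N can also be automated: run both machines on every string up to some length and report any disagreement. The Python sketch below is self-contained and makes several assumptions of its own: N is decided by the existential definition (depth-first search for an accepting sequence), D is built by a subset construction that only generates subsets reachable from {s}, and the example NFA is a hypothetical one, not the NFA from Example 1. An empty list is evidence of equivalence on short inputs, not a proof.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

NFADelta = Dict[Tuple[str, str], FrozenSet[str]]

def nfa_accepts(delta: NFADelta, start: str, accept: Set[str], word: str) -> bool:
    """Definition for NFAs: some sequence r1, ..., r_{n+1} with r1 = s,
    r_{i+1} in delta(r_i, u_i), and r_{n+1} in F (found by depth-first search)."""
    def go(state: str, i: int) -> bool:
        if i == len(word):
            return state in accept
        return any(go(nxt, i + 1) for nxt in delta.get((state, word[i]), frozenset()))
    return go(start, 0)

def subset_dfa(delta: NFADelta, start: str, accept: Set[str], alphabet: str):
    """Subset construction, generating only the subsets reachable from s' = {s}."""
    s_prime = frozenset({start})
    delta_prime = {}
    seen, todo = {s_prime}, [s_prime]
    while todo:
        R = todo.pop()
        for a in alphabet:
            T = frozenset(q for p in R for q in delta.get((p, a), frozenset()))
            delta_prime[(R, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    F_prime = {R for R in seen if R & accept}
    return delta_prime, s_prime, F_prime

def dfa_accepts(delta_prime, s_prime, F_prime, word: str) -> bool:
    R = s_prime
    for a in word:
        R = delta_prime[(R, a)]
    return R in F_prime

def disagreements(delta: NFADelta, start: str, accept: Set[str],
                  alphabet: str, max_len: int) -> List[str]:
    """Strings of length <= max_len on which N and its subset-construction DFA D differ."""
    d_delta, d_start, d_accept = subset_dfa(delta, start, accept, alphabet)
    return [w
            for n in range(max_len + 1)
            for w in map("".join, product(alphabet, repeat=n))
            if nfa_accepts(delta, start, accept, w) != dfa_accepts(d_delta, d_start, d_accept, w)]

# Hypothetical example NFA N with accept state q3.
N_DELTA: NFADelta = {
    ("q1", "0"): frozenset({"q1"}), ("q1", "1"): frozenset({"q1", "q2"}),
    ("q2", "0"): frozenset({"q3"}), ("q2", "1"): frozenset({"q3"}),
}
print(disagreements(N_DELTA, "q1", {"q3"}, "01", 10))   # []
```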

1.3.a Regular operations
Union: L₁ ∪ L₂ = {x | x ∈ L₁ or x ∈ L₂}
{Good, Bad} ∪ {Boy, Girl} = ?
{0, 00, 000, …} ∪ {1, 11, 111, …} = ?
L ∪ ∅ = ?
Concatenation: L₁ ∘ L₂ = {xy | x ∈ L₁ and y ∈ L₂}
{Good, Bad} ∘ {Boy, Girl} = ?
{0, 00, 000, …} ∘ {1, 11, 111, …} = ?
L ∘ ∅ = ?
Star: L* = {x₁…xₖ | k ≥ 0 and each xᵢ is in L}
{Boy, Girl}* = ?
{0, 00, 000, …}* = ?
∅* = ?
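For finite languages, union and concatenation can be computed directly from the set-builder definitions. Star is different: L* is infinite as soon as L contains a non-empty string, so the Python sketch below (with hypothetical function names) only computes the strings of L* up to a given length, adding one more piece xₖ per round. Languages are modeled as Python sets of strings, which is an assumption that only works for finite sets.

```python
from typing import Set

def union(L1: Set[str], L2: Set[str]) -> Set[str]:
    """L1 ∪ L2 = {x | x in L1 or x in L2}."""
    return L1 | L2

def concat(L1: Set[str], L2: Set[str]) -> Set[str]:
    """L1 ∘ L2 = {xy | x in L1 and y in L2}."""
    return {x + y for x in L1 for y in L2}

def star(L: Set[str], max_len: int) -> Set[str]:
    """The strings of L* of length <= max_len."""
    result = {""}                      # k = 0: the empty string
    frontier = {""}
    while frontier:
        # append one more piece from L to the strings found in the previous round
        frontier = {x + y for x in frontier for y in L if len(x + y) <= max_len} - result
        result |= frontier
    return result

print(sorted(concat({"Good", "Bad"}, {"Boy", "Girl"})))
# ['BadBoy', 'BadGirl', 'GoodBoy', 'GoodGirl']
```

The loop in `star` terminates because every new string is bounded by `max_len` and strings already found are discarded from the frontier.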

1.3.b Regular expressions
We say that R is a regular expression (RE) iff R is one of the following (each case is followed by the language the expression represents):
1. a, where a is a symbol of the alphabet. Represents {a}.
2. ε. Represents {ε}.
3. ∅. Represents ∅.
4. (R₁) ∪ (R₂), where R₁ and R₂ are RE. Represents the union of the languages represented by R₁ and R₂.
5. (R₁) ∘ (R₂), where R₁ and R₂ are RE. Represents the concatenation of the languages represented by R₁ and R₂.
6. (R₁)*, where R₁ is a RE. Represents the star of the language represented by R₁.
Conventions:
• The symbol ∘ is often omitted in RE.
• Some parentheses can be omitted.
• The precedence order for the operators is: * (highest), ∘ (medium), ∪ (lowest).
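The six clauses form a recursive definition, so the language of an RE can be computed by recursion on its structure. The Python sketch below uses a hypothetical abstract-syntax-tree class for each clause and, because languages are often infinite, returns only the strings of length at most n. Parsing (and hence the precedence conventions) is left out: the tree shape already fixes how the expression is grouped.

```python
from dataclasses import dataclass
from typing import Set, Union

# Hypothetical AST: one class per clause of the definition of RE.
@dataclass(frozen=True)
class Symbol:           # 1. a, where a is a symbol of the alphabet
    a: str

@dataclass(frozen=True)
class Epsilon:          # 2. ε
    pass

@dataclass(frozen=True)
class EmptySet:         # 3. ∅
    pass

@dataclass(frozen=True)
class Alt:              # 4. (R1) ∪ (R2)
    r1: "RE"
    r2: "RE"

@dataclass(frozen=True)
class Cat:              # 5. (R1) ∘ (R2)
    r1: "RE"
    r2: "RE"

@dataclass(frozen=True)
class Star:             # 6. (R1)*
    r1: "RE"

RE = Union[Symbol, Epsilon, EmptySet, Alt, Cat, Star]

def lang(r: RE, n: int) -> Set[str]:
    """The strings of length <= n in the language represented by r."""
    if isinstance(r, Symbol):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Epsilon):
        return {""}
    if isinstance(r, EmptySet):
        return set()
    if isinstance(r, Alt):                    # union
        return lang(r.r1, n) | lang(r.r2, n)
    if isinstance(r, Cat):                    # concatenation
        return {x + y for x in lang(r.r1, n) for y in lang(r.r2, n) if len(x + y) <= n}
    if isinstance(r, Star):                   # star: glue together k >= 0 non-empty pieces
        pieces = lang(r.r1, n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in pieces if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

# 0*10*, with the omitted ∘ written out and the grouping made explicit by nesting
ZERO_ONE_ZERO = Cat(Cat(Star(Symbol("0")), Symbol("1")), Star(Symbol("0")))
print(sorted(lang(ZERO_ONE_ZERO, 3), key=lambda w: (len(w), w)))
# ['1', '01', '10', '001', '010', '100']
```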

1.3.c Regular languages
A language is said to be regular iff it can be represented by a regular expression.
Language → Expression
• {11} → ?
• {Boy, Girl, Good, Bad} → ?
• {ε, 0, 00, 000, 0000, …} → ?
• {0, 00, 000, 0000, …} → ?
• {ε, 01, 0101, 010101, 01010101, …} → ?
• {x | x = 0ᵏ where k is a multiple of 2 or 3} → ?
• {x | x is divisible by 8} → ?
• {x | x MOD 4 = 3} → ?
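When writing an expression for a language described in words, a quick brute-force check can catch mistakes. The Python sketch below compares a candidate expression with a membership predicate on all strings up to a given length. It assumes the expression is translated into the syntax of Python's standard `re` module (`|` for ∪, juxtaposition for ∘, `*` for star) and uses `re.fullmatch`, which requires the whole string to match; the helper names are hypothetical. Passing the check is evidence, not a proof.

```python
import re
from itertools import product
from typing import Callable

def agrees(pattern: str, in_language: Callable[[str], bool],
           alphabet: str, max_len: int) -> bool:
    """True iff re.fullmatch(pattern, w) matches exactly when in_language(w) holds,
    for every string w over the alphabet with len(w) <= max_len."""
    compiled = re.compile(pattern)
    for n in range(max_len + 1):
        for w in map("".join, product(alphabet, repeat=n)):
            if (compiled.fullmatch(w) is not None) != in_language(w):
                return False
    return True

def zeros_multiple_of_2_or_3(w: str) -> bool:
    """w = 0^k where k is a multiple of 2 or 3 (k = 0 counts as a multiple of both)."""
    return set(w) <= {"0"} and (len(w) % 2 == 0 or len(w) % 3 == 0)

print(agrees("(00)*|(000)*", zeros_multiple_of_2_or_3, "0", 30))   # True
print(agrees("(00)*", zeros_multiple_of_2_or_3, "0", 30))          # False: first fails on 000
```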

1.3.d Exercises in reading regular expressions
Expression → Language
• (Good ∪ Bad)(Boy ∪ Girl) → ?
• (Tom ∪ Bob)_is_(good ∪ bad) → ?
• ? → {Name_is_adjective | Name is an uppercase letter followed by zero or more lowercase letters, and adjective is a lowercase letter followed by zero or more lowercase letters}
• 0*10* → ?
• (0 ∪ 1)*101(0 ∪ 1)* → ?
• ((0 ∪ 1)(0 ∪ 1))* → ?
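A practical way to start reading an unfamiliar expression is to list its shortest members and look for the pattern. The Python sketch below does this for expressions over a small alphabet, assuming they are first translated into the syntax of Python's standard `re` module (∪ becomes `|`, ∘ is juxtaposition, `*` stays `*`); the helper name `members` is hypothetical.

```python
import re
from itertools import product
from typing import List

def members(pattern: str, alphabet: str, max_len: int) -> List[str]:
    """All strings over the alphabet of length <= max_len that the pattern
    matches in full, shortest first."""
    compiled = re.compile(pattern)
    return [w
            for n in range(max_len + 1)
            for w in map("".join, product(alphabet, repeat=n))
            if compiled.fullmatch(w)]

# 0*10* in Python's re syntax
print(members("0*10*", "01", 3))   # ['1', '01', '10', '001', '010', '100']
# (0 ∪ 1)*101(0 ∪ 1)* in Python's re syntax
print(members("(0|1)*101(0|1)*", "01", 4))
```

Enumerating all strings over the alphabet only works for small alphabets; for expressions like (Good ∪ Bad)(Boy ∪ Girl), testing individual strings with `re.fullmatch` is the more practical check.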

1.3.e Regular languages and DFA-recognizable languages are the same
Theorem 1.54* A language is regular if and only if some NFA (DFA) recognizes it.
Proof: omitted (but given in the textbook). The textbook describes an algorithm for converting any given regular expression into an equivalent NFA, and an algorithm for converting any given NFA into an equivalent regular expression.
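To give a feel for the first direction (RE to NFA), here is a Python sketch of Thompson's construction, a well-known variant of this idea; it is not necessarily identical to the textbook's construction. Each RE clause becomes a small NFA fragment with one start and one accept state, glued together by ε-moves. Note that ε-moves are an extension of the NFA definition used in these slides (where δ has type Q × Σ → P(Q)); the tuple encoding of REs and all names below are hypothetical.

```python
import itertools
from typing import Dict, Set, Tuple

# Hypothetical tuple encoding of regular expressions:
#   ("sym", a)  ("eps",)  ("empty",)  ("alt", R1, R2)  ("cat", R1, R2)  ("star", R1)
EPS = ""  # label for ε-moves; ordinary input symbols are one character long

class NFA:
    """An NFA with ε-moves (an extension of the slides' definition)."""

    def __init__(self) -> None:
        self.moves: Dict[Tuple[int, str], Set[int]] = {}
        self._ids = itertools.count()

    def new_state(self) -> int:
        return next(self._ids)

    def add(self, p: int, label: str, q: int) -> None:
        self.moves.setdefault((p, label), set()).add(q)

def build(nfa: NFA, r: tuple) -> Tuple[int, int]:
    """Add a fragment for r to nfa and return its (start, accept) states."""
    s, f = nfa.new_state(), nfa.new_state()
    kind = r[0]
    if kind == "sym":
        nfa.add(s, r[1], f)
    elif kind == "eps":
        nfa.add(s, EPS, f)
    elif kind == "empty":
        pass                                   # no path from s to f at all
    elif kind == "alt":                        # go into either sub-fragment
        s1, f1 = build(nfa, r[1])
        s2, f2 = build(nfa, r[2])
        for p, q in ((s, s1), (s, s2), (f1, f), (f2, f)):
            nfa.add(p, EPS, q)
    elif kind == "cat":                        # R1's fragment, then R2's
        s1, f1 = build(nfa, r[1])
        s2, f2 = build(nfa, r[2])
        for p, q in ((s, s1), (f1, s2), (f2, f)):
            nfa.add(p, EPS, q)
    elif kind == "star":                       # skip, or repeat R1's fragment
        s1, f1 = build(nfa, r[1])
        for p, q in ((s, f), (s, s1), (f1, s1), (f1, f)):
            nfa.add(p, EPS, q)
    else:
        raise ValueError(f"not a regular expression: {r!r}")
    return s, f

def eps_closure(nfa: NFA, states: Set[int]) -> Set[int]:
    """All states reachable from `states` using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in nfa.moves.get((p, EPS), set()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def matches(r: tuple, word: str) -> bool:
    """Decide whether word is in the language of r by simulating the built NFA."""
    nfa = NFA()
    start, accept = build(nfa, r)
    current = eps_closure(nfa, {start})
    for a in word:
        step = {q for p in current for q in nfa.moves.get((p, a), set())}
        current = eps_closure(nfa, step)
    return accept in current

# 0*10* as a tuple tree
ZERO_ONE_ZERO = ("cat", ("cat", ("star", ("sym", "0")), ("sym", "1")), ("star", ("sym", "0")))
print(matches(ZERO_ONE_ZERO, "0010"), matches(ZERO_ONE_ZERO, "0110"))   # True False
```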

1.4.a The limitations of the power of DFAs
The computing power of finite automata is severely limited by the fact that their memory (the set of states) is of a fixed, finite size, while inputs can be arbitrarily long. The memories of real computers are also finite, but they are not fixed, in the sense that we assume one can always supply additional memory if needed. To summarize, DFAs are not as powerful as computers can generally be.
The next slide gives several examples of non-regular languages, i.e. languages that no DFA can recognize. The non-regularity of those languages can be rigorously proven using a tool called the pumping lemma. We omit the pumping lemma in this course (it is in the textbook) and instead rely on intuitive arguments.
Warning: generally one cannot safely rely on intuition when drawing important conclusions, because intuition can sometimes be deceptive. Only rigorous mathematical proofs can be trusted.

1.4.b Non-regular languages
Do the following languages look regular to you?
A = {ww | w ∈ {0,1}*} is not regular. Intuitively, this is because a DFA processing a long input will have forgotten much of the previously seen part of the input by the time it gets to the middle of the string. But without fully remembering the first half of the string, it is impossible to tell whether the second half coincides with it.
B = {0ⁿ1ⁿ | n ≥ 0} is not regular. Intuitively, this is because a DFA processing a long input 0ⁿ1ⁿ will be unable to remember exactly how many 0s it has seen by the time the 1s start. But without that information it is impossible to tell whether the remaining 1* part of the input has the same length as the already seen 0* part.
C = {w | w contains the same number of “0”s as “1”s} is not regular. The intuitive reason is similar to the one for B.
D = {w | w contains the same number of “01”s as “10”s} is regular. Intuitively, it may seem that if C is non-regular, then D should be non-regular “even more so”. But you have been warned about the deceptiveness of intuition: the next slide shows a DFA that recognizes D, so D is regular!
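The intuitive argument for B can be made concrete for any particular DFA. If the DFA has k states, then among the k + 1 inputs 0⁰, 0¹, …, 0ᵏ two, say 0ⁱ and 0ʲ with i < j, must lead to the same state (pigeonhole principle). Then 0ⁱ1ⁱ and 0ʲ1ⁱ also end in the same state, so the DFA accepts both or rejects both, although only the first is in B. The Python sketch below finds such a misclassified string for a given DFA over {0, 1}; it assumes δ is a total dictionary keyed by (state, symbol), and the function names and example DFA are hypothetical. It illustrates the intuition and is not a substitute for the pumping-lemma proof in the textbook.

```python
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], str]

def final_state(delta: Delta, start: str, word: str) -> str:
    state = start
    for a in word:
        state = delta[(state, a)]
    return state

def in_B(w: str) -> bool:
    """w = 0^n 1^n for some n >= 0."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def misclassified_string(delta: Delta, start: str, accept: Set[str]) -> str:
    """Return a string on which the given DFA over {0, 1} disagrees with B."""
    num_states = len({q for (q, _) in delta})  # delta is total, so every state is a key
    first_seen: Dict[str, int] = {}            # state -> smallest i with 0^i leading to it
    for j in range(num_states + 1):            # k + 1 prefixes, only k states
        q = final_state(delta, start, "0" * j)
        if q in first_seen:
            i = first_seen[q]                  # 0^i and 0^j (i < j) lead to the same state
            in_b, not_in_b = "0" * i + "1" * i, "0" * j + "1" * i
            # Both strings end in the same state, so the DFA treats them alike;
            # whichever verdict it gives, one of the two is wrong.
            return not_in_b if final_state(delta, start, in_b) in accept else in_b
        first_seen[q] = j
    raise AssertionError("unreachable: the pigeonhole principle guarantees a repeat")

# Hypothetical DFA accepting 0*1*, a superset of B.
ZEROS_THEN_ONES = {("z", "0"): "z", ("z", "1"): "o",
                   ("o", "0"): "dead", ("o", "1"): "o",
                   ("dead", "0"): "dead", ("dead", "1"): "dead"}
print(repr(misclassified_string(ZEROS_THEN_ONES, "z", {"z", "o"})))   # '0'
```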

1.4.c A DFA recognizing D
D = {w | w contains the same number of “01”s as “10”s}
[Figure: state diagram of a DFA recognizing D, with transitions labeled 0 and 1.]
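Why D is regular: every occurrence of “01” or “10” marks a place where the symbol changes, and the changes alternate between 0-to-1 and 1-to-0. So the two counts are equal iff the string is empty or its first and last symbols are the same, and a DFA only has to remember the first symbol and the most recent one. The Python sketch below encodes one such 5-state DFA (our own design, with hypothetical state names; not necessarily the machine pictured) and compares it with the definition of D on all short bit strings.

```python
from itertools import product

def count(w: str, pattern: str) -> int:
    """Number of positions at which pattern occurs in w."""
    return sum(1 for i in range(len(w) - len(pattern) + 1) if w.startswith(pattern, i))

# "0..." states remember that the input began with 0, "1..." states that it began
# with 1; "same"/"diff" say whether the last symbol read equals that first symbol.
DELTA = {
    ("start", "0"): "0same", ("start", "1"): "1same",
    ("0same", "0"): "0same", ("0same", "1"): "0diff",
    ("0diff", "0"): "0same", ("0diff", "1"): "0diff",
    ("1same", "1"): "1same", ("1same", "0"): "1diff",
    ("1diff", "1"): "1same", ("1diff", "0"): "1diff",
}
ACCEPT = {"start", "0same", "1same"}

def dfa_says_in_D(w: str) -> bool:
    state = "start"
    for a in w:
        state = DELTA[(state, a)]
    return state in ACCEPT

def definition_says_in_D(w: str) -> bool:
    return count(w, "01") == count(w, "10")

def check(max_len: int = 12) -> bool:
    """Compare the DFA with the definition of D on all bit strings up to max_len."""
    return all(dfa_says_in_D(w) == definition_says_in_D(w)
               for n in range(max_len + 1)
               for w in map("".join, product("01", repeat=n)))

print(check())   # True
```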
