2.5 Regular Expressions

q 0 q 3 q 1 q 2 0 0 1 0 0, 1 1 1 0 0 1 q 5 q 4 1 2.5 Regular Expressions A language L accepted by an FA M is a regular language. An FA M is a recognizer that can be used to determine whether a given string  is in L or not. Although an FA M can be used to represent a regular language L. But it is not easy to describe what the set L is according to the machine M. For example, can you describe the set L 1 accepted by the following FA M 1?

It is not easy to understand what a program does. In a sense an FA M is a program. It is still not easy to understand what an FA does. Therefore, an FA is not a good descriptor to represent the set accepted by an FA is. The language L 1 accepted by the machine M 1.can be described as the set of strings of 0’s and 1’s either ending with 00 or containing a substring 010. Regular expressions are as good and powerful as natural language to describe regular languages. Therefore, the regular expression is a pretty good descriptor for regular languages. Theorem 2 and theorem 3 will prove that both regular expressions and finite state automata define the same class of regular languages.

Definition 9 : The regular expressions over an alphabet  are defined recursively by the following: (1)  is a regular expression and represents the empty language. (2) is a regular expression and represents the language {}. (3) a   is a regular expression and represents the language {a}. (4) If r and s are regular expressions representing the sets R and S, then (r+s), (r  s) and (r*) are regular expressions represents the languages RS, RS and R*, respectively.

Definition 10 : If r is a regular expression, then the set represented by r is denoted by L( r ). Note : Regular expressions are strings over the alphabet { (, ), +, , *, , } . Left parenthesis ( and right parenthesis ) are parts of regular expression. Convention : As long as there is no ambiguity in a regular expression, we may omit parentheses according to the precedence rule that * is higher than concatenation  and +, and concatenation  is higher than +. And we may also omit concatenation  . Convention : In order to distinguish regular expressions from input strings, regular expressions are written in red.

Example 1 : Find a regular expression over an alphabet ={0, 1} representing the set L of strings of 0’s and 1’s either ending with 00 or containing a substring 010. Solution : First, consider the set L 1of strings ending with 00. A possible regular expression representing the set L 1is (0+1)*(00). Second, consider the set L 2of strings containing a substring 010. A possible regular expression representing the set L 2is (0+1)* (010) (0+1)*. Third, L = L 1  L 2. A regular expression representing the set Lis ( (0+1)*(00) + (0+1)* (010) (0+1)* ).

q 0 Theorem 2 : Let r be a regular expression. Then there is an NFA M that accepts L( r ). Proof : The proof is based on the definition of regular expressions. And we prove the theorem by induction on the number of operators in the regular expressions. First, there is no operator in a regular expression. (1) consider the regular expression  that denotes the set { }. Define an NFA M = (, Q, , q 0, F) as follows. Then we have that L( M ) = L(  )= .

q 0 a q 0 q f (2) consider the regular expression  that denotes the set { }. Define an NFA M = (, Q, , q 0, F) as follows. Then we have that L( M ) = L( )= { }. (3) consider the regular expression a   that denotes the set {a }. Define an NFA M = (, Q, , q 0, F) as follows. Then we have that L( M ) = L( a )= {a }.

q f1 q f2 q 02 q 01   M 1 q 0  q f  M 2 Second, assume that the theorem is true for regular expressions containing at most n operators. Third, consider the induction step. If regular expressions both r and s contain at most n operators, then there is an NFA M 1 = (, Q1 , 1, q 01, F 1={q f1}) such that L( M 1 )=L( r ) and an NFA M 2 = (, Q2 , 2, q 02, F 2 = {q f2}) such that L( M 2 ) = L( s ), where Q1 Q2 = . (1) consider the regular expression (r+s). Let NFA M = (, Q, , q 0, F) as follows, Q= {q 0, q f}  Q1 Q2. Then we have that L( M ) = L(M 1)  L(M 2) = L((r+s)).

   q 0 q 0 M 1 M 2 q f1 q 01 q f1 q 01 q f2 q 02    M 1  q f q f (2) consider the regular expression (rs). Let NFA M = (, Q, , q 0, F) as follows, Q= {q 0, q f}  Q1 Q2. Then we have that L( M ) = L(M 1)L(M 2) = L((rs)). (3) consider the regular expression (r*). Let NFA M = (, Q, , q 0, F) as follows, Q= {q 0, q f}  Q1. Then we have that L( M ) = L(M 1)* = L((r*)).

Theorem 3 : If L=L(M) for some DFA M= (, Q, , q 0, F), then there is a regular expression r such that L= L( r ). Proof : The proof is based on dynamic programming method to find an equivalent regular expression for a DFA. Given a DFA M = (, Q, , q 1, F), where Q={q 1, q 2, ..., q n}, i.e., |Q| = n. Define regular expressions R i, j k for the DFA M = (, Q, , q 0, F), where 1  i, j  n and 0  k  n, by R i, j k = {*|  (q i, ) = q j, and for any prefix  of ,    , and   , if  (q i, ) = q s, then s  k. }

R i, j k can also be defined recursively by: (1) for k = 0, (a) R i, j 0 = {a  | ( q i, a) = q j}, if i  j, or (b) R i, j 0 = {a  | ( q i, a) = q j} {}, if i = j . The set R i, j 0 is finite, and there is an equivalent regular expression r i, j 0 for R i, j 0. The regular expression r i, j 0 = a 1 + … +a t, or r i, j 0 = a 1 + … +a t +  , where ( q i, a s) = q j , 1  s  t. (2) for k > 0, R i, j k = R i, k k-1 (R k, k k-1 )*R k, j k-1  R i, j k-1, there is an equivalent regular expression r i, j k for R i, j k , where r i, j k = r i, k k-1 (r k, k k-1 )*r k, j k-1 + r i, j k-1.

By the definition of R i, j k, we know that R i, j kis regular for 1  i, j  n and 0  k  n, and there is a corresponding equivalent regular expression r i, j k. By the definition of R i, j k, we have that the set accepted by the machine M is L = L(M) = {*|  (q 1, ) = q jF} =  qjFR 1, j n. For each final state q jF = {qs1, qs2, …, qst}, there is an equivalent regular expressionr 1, j nfor R 1, j n. Therefore, there is an equivalent regular expression r 1, s1 n + r 1, s2 n + … + r 1, st nfor L.

Algorithm : The algorithm to compute a regular expression r for some DFA M= (, Q, , q 0, F) such that L( r ) = L(M) is as follows. Step 1 : k = 0, for i = 1 to n do for j = 1 to n do compute r i, j 0 Step 2 : for k = 1 to n do for i = 1 to n do for j = 1 to n do compute r i, j k = r i, k k-1 (r k, k k-1 )*r k, j k-1 + r i, j k-1 The time bound of the algorithm is O(n 3).

1 1 0 0 1 k=0 k=1 k=2 e e e r(1,1,k) 0(00)*0+ q 2 q 1 q 3 0 r(1,2,k) 0 0 0(00)* 1 1 r(1,3,k) 0(00)*(1+01)+1 r(2,1,k) 0 0 (00)*0 e r(2,2,k) (00)* 00+e r(2,3,k) 1 1+01 (00)*(1+01)   r(3,1,k) 1(00)*0 r(3,2,k) 1 1 1(00)* e e r(3,3,k) 1(00)*(1+01)+0+ e 0+ 0+ Example 2 : Find a regular expression representing the set L over an alphabet ={0, 1} accepted by the following DFA M. Solution : r 1, 3 3 = r 1, 3 2+r 1, 3 2(r 3, 3 2 )* r 3, 3 2 = (0(00)*(1+01)+1)+(0(00)* (1+01)+1)(1(00)*(1+01)+0+)*(1(00)*(1+01)+0+) = (0(00)*(1+01)+1) (1(00)* (1+01)+0)* = (0*1) (1(00)* (1+01)+0)* = (0*1) (10*1+0)*

2.5 Regular Expressions

2.5 Regular Expressions

Presentation Transcript

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular Expressions

Regular expressions

Regular Expressions

Regular Expressions