MA/CSSE 474

MA/CSSE 474 Theory of Computation Regular Expressions Intro

Your Questions? Still more language ambiguity! • Previous class days' material • Reading Assignments • HW5 problems • Anything else

Regular Languages Regular Language Describes Regular Expression Accepts Finite State Machine

Regular Expressions The regular expressions over an alphabet  are the strings that can be obtained as follows: 1.  is a regular expression. 2.  is a regular expression. 3. Every element of  is a regular expression. 4. If  ,  are regular expressions, then so is . 5. If  ,  are regular expressions, then so is . 6. If  is a regular expression, then so is *. 7.  is a regular expression, then so is +. 8. If  is a regular expression, then so is (). #7 is here for convenience only (syntactic sugar); many authors do not include + in the list of r.e. builders.

Regular Expression Examples If  = {a, b}, the following are regular expressions:   a (ab)* (abba )+(a  bab)

Regular Expressions Define Languages Define L, a semantic interpretation function for regular expressions (Let  and  be arbitrary regular expressions over alphabet ). 1. L() = . 2. L() = {}. 3. If c , L(c) = {c}. 4. L() = L() L(). 5. L() = L() L(). 6. L(*) = (L())*. 7. L(+) = L(*) = L() (L())*. If L() is equal to , then L(+) is also equal to . Otherwise L(+) is the language that is formed by concatenating together one or more strings drawn from L(). 8. L(()) = L().

The Role of the Rules • Rules 1, 3, 4, 5, and 6 give the language its power to define sets. • Rule 8 has as its only role grouping other operators. • Rules 2 and 7 appear to add functionality to the regular expression language, but they don’t. 2.  is a regular expression. 7.  is a regular expression, then so is +.

Operator Precedence in Regular Expressions Regular Arithmetic Expressions Expressions Highest Kleene star and + exponentiation concatenation multiplication Lowest union addition a b* c d* x y2 + i j2

Analyzing a Regular Expression L((ab)*b) = L((ab)*) L(b) = (L((ab)))* L(b) = (L(a) L(b))* L(b) = ({a}  {b})* {b} = {a, b}* {b}.

From English to reg exps L = {w {a, b}*: |w| is even} L = {w {0, 1}*: w is a binary representation of a multiple of 4} L = {w {a, b}*: w contains an odd number of a’s}

The Details Matter a* b*  (ab)* (ab)* a*b*

More Regular Expression Examples L ( (aa*)  ) = L ( (a)* ) = L = {w {a, b}*: there is no more than one b in w} L = {w {a, b}* : no two consecutive letters in w are the same}

The Details Matter L1 = {w {a, b}* : every a is immediately followed a b} A regular expression for L1: A FSM for L1: L2 = {w {a, b}* : every a has a matching b somewhere} A regular expression for L2: A FSM for L2:

Kleene’s Theorem Finite state machines and regular expressions define the same class of languages. To prove this, we must show: Theorem: Any language that can be defined by a regular expression can be accepted by some FSM and so is regular. Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression.

For Every Regular Expression There is a Corresponding FSM We’ll show this by construction. An FSM for: : A single element c of :  :

Union If  is the regular expression  and if both L() and L() are regular:

Concatenation If  is the regular expression  and if both L() and L() are regular:

Kleene Star If  is the regular expression * and if L() is regular:

An Example (b ab)* An FSM for b An FSM for a An FSM for b An FSM for ab:

An Example (b ab)* An FSM for (bab):

An Example (b ab)* An FSM for (bab)*:

For Every FSM There is a Corresponding Regular Expression • We’ll show this by construction. The construction is different than the textbook's. • Let M = ({q1, …, qn}, , , q1, A) be a DFSM.Define Rijk to be the set of all strings x  * such that • (qi,x) |-M (qj, ), and • if (qi,y) |-M (q𝓁, ), for any prefix y of x (except y=  and y=x), then 𝓁  k • That is, Rijk is the set of all strings that take us from qi to qj without passing through any intermediate states numbered higher than k. • In this case, "passing through" means both entering and leaving. • Note that either i or j (or both) may be greater than k. * *

Example: Rijk • Rijk is the set of all strings that take us from qi to qj without passing through any intermediate states numbered higher than k. • In this case, "passing through" means both entering and leaving. • Note that either i or j (or both) may be greater than k. R000 R010 R011 R021 R022 R232 R233

DFAReg. Exp. construction • Rijk is the set of all strings that take M from qi to qj without passing through any intermediate states numbered higher than k. • Examples: Rijn is • Also note that L(M) is the union of R1jn over all qj in A. • We will show that for all i,j{1, …, n} and all k {0, …, n}, Rijk is defined by a regular expression. • We already know that the union of languages defined by reg. exps. is defined by a reg. exp.

DFAReg. Exp. continued • Rijk is the set of all strings that take M from qi to qj without passing through any intermediate states numbered higher than k. It can be computed recursively: • Base cases (k = 0): • If i  j, Rij0 = {a : (qi, a) = qj} • If i = j, Rii0 = {a : (qi, a) = qi}  {} • Recursive case (k > 0): Rijk is Rijk-1  Rikk-1(Rkkk-1)*Rkjk-1 • We show by induction that each Rijk is defined by some regular expression rijk.

DFAReg. Exp. Proof pt. 1 • Base case definition (k = 0): • If i  j, Rij0 = {a : (qi, a) = qj} • If i = j, Rii0 = {a : (qi, a) = qi}  {} • Base case proof:Rij0 is a finite set of symbols, each of which is either  or a single symbol from . So Rij0 can be defined by the reg. exp. rij0 = a1a2…ap (or a1a2…ap if i=j),where {a1, a2, …,ap} is the set of all symbols a such that (qi, a) = qj. • Note that if M has no direct transitions from qi to qj, then rij0 is  (it is  if i=j and no "loop" on that state).

DFAReg. Exp. Proof pt. 2 • Recursive definition (k > 0): Rijk is Rijk-1  Rikk-1(Rkkk-1)*Rkjk-1 • Induction hypothesis: For each 𝓁 and 𝓂, there is a regular expression r𝓁𝓂k-1 such that L(r𝓁𝓂k-1 )= R𝓁𝓂k-1. • Induction step. By the recursive parts of the definition of regular expressions and the languages they define, and by the above recursive defintion of Rijk :Rijk = L(rijk-1  rikk-1(rkkk-1)*rkjk-1)

DFAReg. Exp. Proof pt. 3 • We showed by induction that each Rijk is defined by some regular expression rijk. • In particular, for all qjA, there is a regular expression r1jn that defines R1jn. • Then L(M) = L(r1j1n … r1jpn ), where A = {qj1, …, qjp}

An Example Start q1 q2 q3 0 1 0 0,1 1 k=0 k=1 k=2 r11k   (00)* r12k 00 0(00)* r13k 1 1 0*1 r21k 0 0 0(00)* r22k    00 (00)* r23k 1 1  01 0*1 r31k   (0  1)(00)*0 r32k 0  1 0  1 (0  1)(00)* r33k    (0  1)0*1

MA/CSSE 474

MA/CSSE 474

Presentation Transcript

MA/CSSE 473 Day 25

MA/CSSE 473 Days 35,37

MA/CSSE 474

MA/CSSE 473 Day 04

MA/CSSE 474 Theory of Computation

MA/CSSE 474

MA/CSSE 474 Theory of Computation

MA/CSSE 473/474

MA/CSSE 473 Day 19

MA/CSSE 473 Day 26

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation

MA/CSSE 473 Day 14

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation

MA/CSSE 474 Theory of Computation

MA/CSSE 474

MA/CSSE 474

MA/CSSE 473 Day 05