
Regular Expressions: Theory and Perl Implementation



  1. Regular Expressions: Theory and Perl Implementation
     Outline:
     1. Theoretical Definitions and Examples
     2. Acceptance by Finite Automata
     3. Perl’s Syntax
     4. Other pattern matching functionality in Perl
     5. Program Example
     CSE 341 S. Tanimoto Perl-Regular-Expressions

  2. Alphabets and Sets of Strings
     An alphabet Σ = {a1, a2, ..., an} is a set of characters.
     A string over Σ is a sequence of zero or more elements of Σ.
     Example. If Σ = {0, 1, 2} then 2201 is a string over Σ.
     No matter what Σ is, the empty string ε is a string over Σ.
     A set of strings over Σ is a set of zero or more strings, each of which is a string over Σ.
     Example. If Σ = {0, 1, 2} then {ε, 111, 121, 0} is a set of strings over Σ.
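The definition of a string over an alphabet can be checked mechanically. The following is a minimal sketch, assuming the alphabet {0, 1, 2} from the example; the helper name `is_string_over` is hypothetical and not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character of $s belongs to the alphabet.
sub is_string_over {
    my ($s, @alphabet) = @_;
    my %in_alphabet = map { $_ => 1 } @alphabet;
    for my $ch (split //, $s) {
        return 0 unless $in_alphabet{$ch};
    }
    return 1;    # the empty string passes: it has no characters to reject
}

my @sigma = ('0', '1', '2');
for my $s ('2201', '', '2301') {
    printf "'%s' is %sa string over {0, 1, 2}\n",
        $s, is_string_over($s, @sigma) ? '' : 'not ';
}
```

The empty string is accepted for any alphabet, which matches the statement in the slide.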

  3. A Recursive Definition for Regular Expressions
     A regular expression for an alphabet Σ is a certain kind of pattern that describes a set of strings over Σ.
     Any character c in Σ is a regular expression representing {c}.
     If E, E1 and E2 are regular expressions over Σ then so are
       E1 E2 -- representing the set concatenation of E1 and E2.
       E1 | E2 -- representing alternation of E1 and E2.
       ( E ) -- representing E grouped with parentheses.
       E+ -- representing one or more instances of E concatenated.
       E* -- representing zero or more instances of E concatenated.

  4. Regular Expression Examples
     Let Σ = {a, b, c, d}. Each expression is shown with the set of strings it represents.
       a = {a}
       ab = {ab}
       a | b = {a, b}
       a+ = {a, aa, aaa, ... }
       ab* represents the set of strings having a single a followed by zero or more occurrences of b. That is, it’s {a, ab, abb, abbb, ... }
       a (b | c) = {ab, ac}
       (a | b) (c | d) = {ac, ad, bc, bd}
       aa* = a+ = {a, aa, aaa, ... }
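These set memberships can be tried out in Perl. The sketch below assumes that membership in the set means the whole string matches, so it anchors each pattern with `\A` and `\z`; the helper name `in_language` is hypothetical. Spaces in the theoretical notation are dropped, because a space in a Perl pattern is a literal character.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole string $s is in the set described by $re.
# \A and \z anchor the match to the entire string, as the set notation requires.
sub in_language {
    my ($re, $s) = @_;
    return $s =~ /\A(?:$re)\z/ ? 1 : 0;
}

for my $s ('a', 'abbb', 'abab', '') {
    printf "%-6s ab*: %d  aa*: %d  a+: %d\n",
        "'$s'", in_language('ab*', $s), in_language('aa*', $s), in_language('a+', $s);
}
```

On every input the columns for aa* and a+ agree, which is what the identity aa* = a+ predicts.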

  5. Extended Regular Expressions
     Let letters = a | b | c | d
     Let digits = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
     Let identifiers = letters ( letters | digits )*
     Thus we can use a name to represent a set of strings and use that name in a regular expression.
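Perl supports the same naming idea: a pattern compiled with `qr//` can be stored in a variable and interpolated into a larger pattern. A minimal sketch, assuming the four-letter alphabet a to d used above (the variable names are illustrative):

```perl
use strict;
use warnings;

my $letter     = qr/[a-d]/;                       # letters = a | b | c | d
my $digit      = qr/[0-9]/;                       # digits  = 0 | 1 | ... | 9
my $identifier = qr/$letter(?:$letter|$digit)*/;  # letters ( letters | digits )*

for my $s ('ab12', 'd', '1abc', 'abe') {
    my $verdict = $s =~ /\A$identifier\z/ ? 'identifier' : 'not an identifier';
    print "$s: $verdict\n";
}
```

Here `abe` is rejected only because e is outside the small alphabet chosen for letters.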

  6. Finite Automaton
     [Figure: a finite automaton with a start state and an accepting state; transitions labeled a, b, a.]
     Corresponding regular expression: ab*a
     Example: process the string abba
     Now try abbb
     Finite number of states, but number of strings is not necessarily finite.
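The two exercises can be traced with a small simulation. This is a sketch of one automaton that accepts ab*a, assuming three states; the state names q0, q1, q2 are illustrative, since the transcript does not preserve the figure's labels.

```perl
use strict;
use warnings;

# Transition table for an automaton accepting ab*a.
# q0 is the start state, q2 the only accepting state.
my %delta = (
    q0 => { a => 'q1' },
    q1 => { b => 'q1', a => 'q2' },
    q2 => {},
);
my %accepting = (q2 => 1);

sub accepts {
    my ($input) = @_;
    my $state = 'q0';
    for my $ch (split //, $input) {
        $state = $delta{$state}{$ch};
        return 0 unless defined $state;   # no transition for this character: reject
    }
    return $accepting{$state} ? 1 : 0;
}

for my $s ('abba', 'abbb') {
    print "$s: ", (accepts($s) ? 'accepted' : 'rejected'), "\n";
}
```

The string abba ends in the accepting state, while abbb stops in q1 and is rejected. The loop on q1 is what lets three states accept infinitely many strings.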

  7. Equivalence of Finite Automata and Regular Expressions
     [Figure: finite automata corresponding to the regular expressions a, ab, a | b, and a*.]

  8. Regular Expressions in Perl
     In Perl, regular expressions are used to specify patterns for pattern matching.
       $sentence = "Spring weather has arrived.";
       if ($sentence =~ /weather/) {
         print "Never bet on the weather.";
       }
       # $string =~ /Pattern/
     The result of this kind of pattern matching is a true or false value.

  9. A Perl Regular Expression for Identifier
       $identifier = "[a-z][a-z0-9]*";
       $sentence = "012,cse341 341,ABC]*";
       if ($sentence =~ /$identifier/) {
         print "Seems to be an identifier here.";
       }
       $ident2 = "[a-zA-Z][a-zA-Z0-9]*";
       $reservedWord = "begin|end";
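The match above succeeds because some substring of the sentence is an identifier, not because the whole sentence is one. A short sketch of the difference, assuming the same `$identifier` pattern and adding `\A` and `\z` anchors for the whole-string test:

```perl
use strict;
use warnings;

my $identifier = "[a-z][a-z0-9]*";
my $sentence   = "012,cse341 341,ABC]*";

# Unanchored: succeeds if any substring matches; capture the first one found.
my ($first) = $sentence =~ /($identifier)/;
print "First identifier-like substring: $first\n";

# Anchored with \A and \z: the entire string must be an identifier.
my $whole = $sentence =~ /\A$identifier\z/ ? "yes" : "no";
print "Whole string is an identifier: $whole\n";
```

A pattern containing a top-level alternation, such as `begin|end`, would need to be wrapped in `(?:...)` before adding the anchors.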

  10. Specifying Patterns
      /Pattern/    # Literal text; true if it occurs anywhere in the string.
      /^Pattern/   # Must occur at the beginning.
      "Pattern recognition is alive" =~ /^Pattern/
      "The end" =~ /end$/
      \s  whitespace                    \S  non-whitespace
      \w  a word char. [a-zA-Z_0-9]     \W  a non-word char.
      \d  a digit                       \D  a non-digit
      \b  word boundary                 \B  not word boundary
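A few of these shorthand classes in action. This is a minimal sketch on a made-up sample string; in list context a match with the `g` modifier returns all matches.

```perl
use strict;
use warnings;

my $text = "Perl 5 regex";

my @words  = $text =~ /\b\w+\b/g;   # maximal runs of word characters
my @digits = $text =~ /\d/g;        # individual digits
my $spaces = () = $text =~ /\s/g;   # number of whitespace characters

print "words:  @words\n";
print "digits: @digits\n";
print "spaces: $spaces\n";
```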

  11. Specifying Patterns (Cont.)
      $test = "You have new mail -- 5-24-99";
      if ($test =~ /^You\s.+\d+-\d+-\d+/ ) {
        print "The mail has arrived.";
      }
      # The same pattern with the x modifier, which makes Perl ignore
      # whitespace in the pattern so it can be spaced out for readability.
      if ($test =~ m( ^ You \s .+ \d+ - \d+ - \d+ )x ) {
        print "The mail has arrived.";
      }

  12. Extracting Information
      $test = "You have new mail -- 5-24-99";
      if ($test =~ /^You\s.+(\d+)-(\d+)-(\d+)/ ) {
        print "The mail has arrived on ";
        print "day $2 of month $1 in year $3.\n";
      }
      # Parentheses in the pattern establish
      # variables $1, $2, $3, etc. to hold
      # corresponding matched fragments.
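The pattern works for the date 5-24-99 because the month has a single digit. With a two-digit month, the greedy `.+` keeps as many characters as it can and leaves only the last digit for the first group. A sketch of the effect, using a made-up date and the non-greedy `.+?` as one possible remedy:

```perl
use strict;
use warnings;

my $test = "You have new mail -- 12-24-99";

# Greedy .+ swallows the first digit of the month.
my ($m1) = $test =~ /^You\s.+(\d+)-(\d+)-(\d+)/;

# Non-greedy .+? stops as early as possible, so the month is captured whole.
my ($m2, $d2, $y2) = $test =~ /^You\s.+?(\d+)-(\d+)-(\d+)/;

print "greedy month:     $m1\n";
print "non-greedy month: $m2 (day $d2, year $y2)\n";
```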

  13. Search and Replace
      $sntc = "We surfed the waves the whole day.";
      $sntc =~ s/surfed/sailed/;
      print $sntc;
      # We sailed the waves the whole day.
      $sntc =~ s/the//g;
      print $sntc;
      # We sailed  waves  whole day.
      # (two spaces remain where each "the" was removed)
      # g makes the replacement “global”.
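Two refinements are often useful here. The substitution operator returns the number of replacements it made, and the pattern can take the following whitespace along with the word. A minimal sketch, assuming the same sentence; `\b` keeps the pattern from matching the end of a longer word such as "bathe".

```perl
use strict;
use warnings;

my $sntc = "We surfed the waves the whole day.";

# In scalar context s///g returns the number of substitutions made.
my $count = ($sntc =~ s/\bthe\s+//g);

print "$count replacements: $sntc\n";
```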

  14. Interpolation of Variables in Replacements
      $exclamation = "yeah";
      $sntc = "We had fun.";
      $sntc =~ s/\w+/$exclamation/g;
      print $sntc;
      # yeah yeah yeah.
      # The replacement, like a pattern, can contain a Perl variable.

  15. Example of (Crude) Lexical Analysis
      $ident = "[a-zA-Z][a-zA-Z0-9]*";
      $int = "[\-]?[0-9]+";
      $op = "[\-\+\*\/\=]|mod";
      $exp = "begin x = 5; print sqrt(x); end";
      $exp =~ s/$ident/ID/g;
      $exp =~ s/$int/N/g;
      $exp =~ s/$op/OP/g;
      print $exp;
      # ID ID OP N; ID ID(ID); ID
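One consequence of applying the three substitutions in sequence is that the word mod, which also matches the identifier pattern, would be rewritten as ID before the operator substitution runs. A single left-to-right pass that tries the token classes in priority order avoids this. The sketch below is one way to do it with `\G` and the `gc` modifiers; it is not the slides' method, and for simplicity it treats every minus sign as an operator rather than as part of an integer.

```perl
use strict;
use warnings;

my $exp = "begin x = 7 mod 5; end";
my @tokens;

# Try the alternatives in priority order at each position. \G resumes where
# the previous match ended, and /c keeps that position when a match fails.
while (1) {
    if    ($exp =~ /\G\s+/gc)                  { next }
    elsif ($exp =~ /\Gmod\b|\G[-+*\/=]/gc)     { push @tokens, 'OP' }
    elsif ($exp =~ /\G[a-zA-Z][a-zA-Z0-9]*/gc) { push @tokens, 'ID' }
    elsif ($exp =~ /\G[0-9]+/gc)               { push @tokens, 'N' }
    elsif ($exp =~ /\G(.)/gcs)                 { push @tokens, $1 }
    else                                       { last }
}

print "@tokens\n";
```

Because the operator alternative is tried before the identifier alternative, mod becomes OP, while a longer word such as model would still be classified as ID thanks to the `\b` after mod.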

  16. Processing Assignment Submissions Using Forms and Files
      1. Form file
      2. Perl script to process data from form.
      3. Perl script to “compile” data into an index page.

  17. The HTML Form
      <html><head>
      <title>Submission for CSE 341 Miniproject Topic Proposals</title>
      </head><body>
      <h1>CSE 341 Miniproject Topic Proposal Submission Form</h1>
      Write a topic-proposal web page, and then fill out this form and submit it by Thursday, February 24 at 5:00 PM.
      (The web page should follow these
      <a href="http://www.cs.washington.edu/education/courses/341/00wi/MP-topic-proposal-guidelines.html">
      guidelines</a>.)
      <br><form method=post action="http://cubist.cs.washington.edu/~tanimoto/341-student/process-topic-proposal.pl">

  18. The HTML Form (2 of 2)
      <br>Possible name of project:
      <input type=text name=projectname value="" size=40>
      <br>Name of Possible partner (optional):
      <input type=text name=partner value="">
      <br>URL of a web page that describes your proposal:
      <input type=text name=proposalurl value="" size=40>
      <br>If you plan to submit another topic proposal because you are very uncertain about whether to stick with this one, check this box:
      <input type=checkbox name=uncertain value="No">
      <br><input type=submit name=submit value="Submit">
      </form>
      </body></html>

  19. Perl Script to Process Data From Form
      #! /usr/bin/perl
      # Process the miniproject topic proposal form inputs
      # S. Tanimoto, 20 Feb 2000
      use CGI qw/:standard/;
      use strict;
      print header;
      my $projectname = param("projectname");
      my $uncertain = param("uncertain");
      my $partner = param("partner");
      my $proposal_url = param("proposalurl");
      my $student_username = $ENV{"REMOTE_USER"};
      my $now = localtime();
      $projectname =~ s/[^a-zA-Z0-9\-\~]//g;
      $partner =~ s/[^a-zA-Z0-9\-\~]//g;
      $proposal_url =~ s/[^a-zA-Z0-9\-\~]//g;
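The three substitutions delete every character that is not a letter, a digit, a hyphen, or a tilde. Applied to a URL, that also removes the dots, slashes, and colon. The sketch below shows the effect on a made-up URL and compares it with a hypothetical wider whitelist; neither the sample URL nor the wider character class comes from the slides.

```perl
use strict;
use warnings;

# Sample input, invented for illustration.
my $proposal_url = "http://www.cs.washington.edu/~user/proposal.html";

# The whitelist from the script: letters, digits, '-' and '~' only.
(my $filtered = $proposal_url) =~ s/[^a-zA-Z0-9\-\~]//g;

# Hypothetical wider whitelist that also keeps '.', '/', ':' and '_'.
(my $relaxed = $proposal_url) =~ s/[^a-zA-Z0-9\-\~.\/:_]//g;

print "filter from the script: $filtered\n";
print "wider filter:           $relaxed\n";
```

Whether a wider whitelist is appropriate depends on how the stored value is used later. The strict filter has the advantage that no semicolon or quote character can reach the data file.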

  20. Perl Script to Process the Data (2 of 2)
      my $output_line = "STUDENT_USERNAME=$student_username; " .
                        "PROPOSAL_URL=$proposal_url; " .
                        "PROJECT_NAME=$projectname; " .
                        "PARTNER=$partner; " .
                        "UNCERTAIN=$uncertain; " .
                        "DATE=$now; ";
      if (! (open(OUT, ">>MP-topic-proposal-data.txt"))) {
        print("Error: could not open topic file for output.");
        print("Please notify instructor and/or try again later.");
        print end_html;
        exit 0;
      }
      print OUT $output_line, "\n";
      close OUT;
      print h1("Your miniproject topic proposal has been received. Thanks!");
      print end_html;

  21. Perl Script to “Compile” the Data
      #!/usr/bin/perl
      # make-MP-index-of-proposed-topics.pl
      use strict;
      use CGI qw/:standard/;
      open(INFILE, "<MP-topic-proposal-data-sorted.txt") ||
        die("Could not open the file MP-topic-proposal-data-sorted.txt.\n");
      print<<"EOT";
      <html><head><title>CSE 341 MP Topic Proposal Index</title>
      </head><body>
      <h1>CSE 341 MP Topic Proposal Index</h1>
      EOT
      print "<table><tr><td>Student username</td><td>Proposal Page</td><td>Partner</td><td>Certainty</td><td>When</td></tr>\n";
      my $projectname;
      my $uncertain;
      my $partner;
      my $proposal_url;
      my $student_username;
      my $date;

  22. Perl Script to “Compile” the Data (2 of 3)
      while (<INFILE>) {
        if ( /STUDENT_USERNAME=([^\;]+);\s/) { $student_username = $1; } else { $student_username = ""; }
        if ( /PROJECT_NAME=([^\;]+);\s/)     { $projectname = $1; }      else { $projectname = ""; }
        if ( /PROPOSAL_URL=([^\;]+);\s/)     { $proposal_url = $1; }     else { $proposal_url = ""; }
        if ( /PARTNER=([^\;]+);\s/)          { $partner = $1; }          else { $partner = ""; }
        if ( /UNCERTAIN=([^\;]+);\s/)        { $uncertain = $1; }        else { $uncertain = ""; }
        if ( /DATE=([^\;]+);/)               { $date = $1; }             else { $date = ""; }
        if ($proposal_url =~ /http/ ) {} else { $proposal_url = "http://" . $proposal_url; }
        if ($uncertain eq "No") { $uncertain = ""; } else { $uncertain = "Uncertain"; }
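The six field extractions follow one template, so they can also be collected in a single pass. The sketch below is an alternative formulation, not the slides' code: in list context a match with the `g` modifier and two capture groups returns a flat list of key and value pairs, which can initialize a hash. The sample record is made up but follows the format written by the form-processing script.

```perl
use strict;
use warnings;

# Sample record; the field values are invented for illustration.
my $line = "STUDENT_USERNAME=jdoe; PROPOSAL_URL=example.org; "
         . "PROJECT_NAME=Lexer; PARTNER=; UNCERTAIN=No; DATE=Mon Feb 21 10:00:00 2000; ";

# Collect every KEY=value; pair in one pass. [^;]* also accepts empty values.
my %field = $line =~ /([A-Z_]+)=([^;]*);/g;

for my $key (sort keys %field) {
    print "$key => '$field{$key}'\n";
}
```

A field that is absent from the line simply has no hash entry, so a lookup can fall back to the empty string as the original `else` branches do.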

  23. Perl Script to “Compile” the Data (3 of 3)
        my $link = "<a href=\"$proposal_url\">$projectname</a>";
        print "<tr><td>$student_username</td><td>$link</td><td>$partner</td><td>$uncertain</td><td>$date</td></tr>\n";
      }
      print "</table>\n";
      print "</body></html>\n";
