
Regular Expressions

  1. Regular Expressions CISC/QCSE 810

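  2. Recognizing Matching Strings
    • ls *.exe
    • translates to "any set of characters, followed by the exact string .exe"
    • The "*.exe" is a matching pattern (strictly, a shell wildcard or "glob" rather than a full regular expression, but the idea is the same)
    • The shell compares the pattern against every file name in the directory and passes ls only the names that match "*.exe"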
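As a minimal Perl sketch of the same idea, the wildcard "*.exe" corresponds roughly to the regular expression /^.*\.exe$/, which grep can use to filter a list of names. The file names below are made up for illustration and stand in for a directory listing.

```perl
use strict;
use warnings;

# Hypothetical file names, standing in for a directory listing.
my @files = qw(setup.exe notes.txt game.EXE install.exe.bak tool.exe);

# Regex equivalent of the wildcard *.exe:
#   ^    start of the name
#   .*   any sequence of characters (the wildcard *)
#   \.   a literal dot (an unescaped . would match any character)
#   exe  the literal text "exe"
#   $    end of the name
my @matches = grep { /^.*\.exe$/ } @files;

print "$_\n" for @matches;    # prints setup.exe and tool.exe
```

Note that the match is case-sensitive, so "game.EXE" is not selected, and the $ anchor rejects "install.exe.bak".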

  3. In Perl
    • In Perl, you can test whether a string matches a pattern using the =~ operator:

        $s = "Cat In the Hat";
        if ($s =~ /Cat/)  { print "Matches Cat\n"; }    # true: "Cat" occurs in $s
        if ($s =~ /Chat/) { print "Matches Chat\n"; }   # false: $s contains no "Chat"

  4. Common references

  5. Exercise 1
    • Write a regexp that matches only Canadian postal codes
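One possible sketch, assuming the usual "A1A 1A1" layout (letter, digit, letter, an optional space, digit, letter, digit). It does not attempt to exclude the letters that are never used in real codes, and the helper name is_postal_code is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole string looks like "A1A 1A1".
# The /i flag accepts lower-case letters; " ?" makes the space optional.
sub is_postal_code {
    my ($s) = @_;
    return $s =~ /^[A-Z]\d[A-Z] ?\d[A-Z]\d$/i ? 1 : 0;
}

for my $candidate ("K7L 3N6", "k7l3n6", "K7L-3N6", "12345", "K7L 3N6 extra") {
    printf "%-15s %s\n", $candidate, is_postal_code($candidate) ? "yes" : "no";
}
```

The ^ and $ anchors are what make it match "only" postal codes: without them, "K7L 3N6 extra" would also be accepted.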

  6. Exercise 2 • Write a regexp that matches typical intermediate files (.o, .dvi, .tmp) • helpful if you want a systematic way to delete them
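A minimal sketch of one answer, using made-up file names: escape the dot, list the extensions as alternatives, and anchor at the end of the name. It only prints what it would remove; the real deletion is left commented out.

```perl
use strict;
use warnings;

# Hypothetical file names; in practice these might come from glob("*").
my @files = qw(main.c main.o paper.tex paper.dvi scratch.tmp notes.tmpl foo.obj);

# Names that END in .o, .dvi or .tmp. The escaped dot is literal, and the
# $ anchor stops "notes.tmpl" and "foo.obj" from matching.
my @intermediate = grep { /\.(?:o|dvi|tmp)$/ } @files;

print "would delete: $_\n" for @intermediate;
# To actually delete them: unlink @intermediate;
```

The (?: ... ) group only bundles the alternatives; it does not capture anything into $1.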

  7. String Substitution
    • Given each input file (*.dat), look for a matching output file (<same name>.out):

        @input_files = <*.dat>;
        foreach $input_file (@input_files) {
            # Copy to output name
            $output_file = $input_file;
            # Replace the .dat extension with .out
            # (\. is a literal dot; $ anchors the match at the end of the name)
            $output_file =~ s/\.dat$/.out/;
            if (! -f $output_file) {
                print "Need to create output for $output_file\n";
            }
        }
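Why the pattern needs care: in s/.dat/.out/ an unescaped "." matches any character, and nothing ties the match to the end of the name. A small check on a made-up file name, my_data.dat, shows the difference.

```perl
use strict;
use warnings;

my $name = "my_data.dat";    # hypothetical file name

# Unescaped and unanchored: "." matches ANY character, and the first place
# ".dat" can match is "_dat", so the wrong part of the name is replaced.
(my $loose = $name) =~ s/.dat/.out/;

# Escaped and anchored: only a literal ".dat" at the end is replaced.
(my $strict = $name) =~ s/\.dat$/.out/;

print "loose:  $loose\n";     # my.outa.dat
print "strict: $strict\n";    # my_data.out
```

The (my $copy = $original) =~ s/// idiom edits a copy and leaves the original string unchanged.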

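  8. Translating
    • $s = "Alternate Ending";
    • $s =~ tr/a-z/A-Z/;    # gives "ALTERNATE ENDING"
    • tr/// takes lists and ranges of characters, not regex classes, so no [ ] are needed
    • Can also use uc and lc, which apply Unicode case rules to text, so they also handle accented and other non-English letters; tr/a-z/A-Z/ only covers the 26 ASCII letters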
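A short sketch comparing tr/// with uc. For plain ASCII text they agree; with an accented letter (here the sample word "café", written as UTF-8 source text), tr/a-z/A-Z/ leaves the é unchanged while uc converts it.

```perl
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8 text
binmode STDOUT, ':encoding(UTF-8)';     # print non-ASCII characters correctly

my $s = "Alternate Ending";
(my $t = $s) =~ tr/a-z/A-Z/;   # ASCII letters only
my $u = uc $s;                 # same result for plain ASCII text

# With an accented letter, tr/a-z/A-Z/ leaves "é" alone,
# while uc applies Unicode case rules.
my $word = "café";
(my $tr_word = $word) =~ tr/a-z/A-Z/;
my $uc_word = uc $word;

# Prints four lines: ALTERNATE ENDING, ALTERNATE ENDING, CAFé, CAFÉ
print "$t\n$u\n$tr_word\n$uc_word\n";
```

lc works the same way in the other direction.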

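  9. Grabbing Substrings
    • Get the site (host name) part of a URL; this pattern assumes the host name starts with "www":

        $url = "http://www.mast.queensu.ca/~math224/Slides/Week_09/driven_spring2.m";
        # Capture "www" followed by any run of word characters and dots;
        # the match stops at the first "/"
        if ($url =~ /(www[\w.]*)/) {
            $short_url = $1;    # $1 holds the text matched by the first (...) group
            print "Full URL: $url\n";
            print "Site URL: $short_url\n";
        }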
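Several groups can be captured at once. This sketch splits the same URL into scheme, host and path with three groups; it is a simplified pattern for illustration, not a full URL parser (it ignores ports, user names, query strings and so on).

```perl
use strict;
use warnings;

my $url = "http://www.mast.queensu.ca/~math224/Slides/Week_09/driven_spring2.m";

# m{...} uses braces as delimiters, so the slashes need no escaping.
# In list context the match returns ($1, $2, $3); on failure it returns
# an empty list, so the assignment is false and we die.
my ($scheme, $host, $path) = $url =~ m{^(\w+)://([^/]+)(/.*)?$}
    or die "not a URL: $url\n";

print "Scheme: $scheme\n";                     # http
print "Host:   $host\n";                       # www.mast.queensu.ca
print "Path:   ", $path // "(none)", "\n";     # /~math224/Slides/Week_09/driven_spring2.m
```

Unlike /(www[\w.]*)/, the [^/]+ group does not depend on the host name starting with "www".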

  10. End options
    • s/a/A/g – global; swap all matches
        • changes "aaaba" to "AAAbA"
    • Compare with s/a/A/
        • changes "aaaba" to "Aaaba" (only the first match is replaced)
    • /tmp/i – case insensitive
        • recognizes "tmp", "Tmp", "tMP", "TMP"…
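A quick sketch that checks the examples above and shows one extra detail: in scalar context, s///g returns the number of substitutions it made.

```perl
use strict;
use warnings;

(my $global = "aaaba") =~ s/a/A/g;   # every match replaced
(my $first  = "aaaba") =~ s/a/A/;    # only the first match replaced

# In scalar context, s///g returns how many substitutions were made.
my $count = (my $copy = "aaaba") =~ s/a/A/g;

# /i makes the match case-insensitive.
my @names = qw(tmp Tmp tMP TMP temp);
my @hits  = grep { /^tmp$/i } @names;

print "$global $first $count @hits\n";   # AAAbA Aaaba 4 tmp Tmp tMP TMP
```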

  11. Exercise • Write a regexp line that returns all the integers in the text • Can it be extended to handle floating point values?
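One possible answer, as a sketch on a made-up sentence: with /g in list context, a match returns every matching substring. The integer pattern splits "15.25" into two pieces, which is why the floating-point version adds an optional fractional part. Neither pattern handles forms such as ".5" or "1e-3".

```perl
use strict;
use warnings;

my $text = "Ran 3 trials: scores 12, -7 and 40; mean 15.25.";

# Integers: an optional minus sign followed by one or more digits.
my @ints = $text =~ /-?\d+/g;

# Floating point: digits, optionally followed by a dot and more digits.
my @nums = $text =~ /-?\d+(?:\.\d+)?/g;

print "@ints\n";   # 3 12 -7 40 15 25
print "@nums\n";   # 3 12 -7 40 15.25
```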

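  12. Functions with Regex
    • split – break a string into a list wherever the pattern matches
        split /\s+/, $line;    # on runs of whitespace
        split /,/, $line;      # on commas
        split /\t/, $line;     # on tabs
        split //, $line;       # into individual characters
    • grep – keep the list elements that match the pattern
        @v = qw( aaa bba bbc);
        @matches = grep /bb/, @v;    # ("bba", "bbc")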
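A runnable sketch of the calls above on small sample strings. Two details worth knowing: empty fields in the middle are kept (trailing empty fields are dropped by default), and if a line starts with whitespace, split /\s+/ returns a leading empty field, which the special form split ' ', $line avoids.

```perl
use strict;
use warnings;

my @words  = split /\s+/, "alpha  beta\tgamma";   # any run of whitespace
my @fields = split /,/,   "x,y,,z";               # empty middle field kept
my @cols   = split /\t/,  "id\tname\tscore";      # tab-separated columns
my @chars  = split //,    "perl";                 # empty pattern: single characters

my @v       = qw(aaa bba bbc);
my @matches = grep /bb/, @v;                      # elements containing "bb"

print join("|", @words),  "\n";    # alpha|beta|gamma
print join("|", @fields), "\n";    # x|y||z
print join("|", @cols),   "\n";    # id|name|score
print join("|", @chars),  "\n";    # p|e|r|l
print "@matches\n";                # bba bbc
```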

  13. Longer example – Log files
    • Parsing log files

        195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/new.gif HTTP/1.1" 200 926
        195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/update.gif HTTP/1.1" 200 971
        proxy.skynet.be - - [25/Mar/2003:02:40:54 -0800] "GET /gcs/gc1hint.html HTTP/1.1" 200 16358
        j3194.inktomisearch.com - - [25/Mar/2003:03:13:12 -0800] "GET /~gcs/K-12.html HTTP/1.0" 200 3235
        kittyhawk.hhmi.org - - [25/Mar/2003:03:17:20 -0800] "HEAD /gcs/ HTTP/1.0" 200 0
        j3104.inktomisearch.com - - [25/Mar/2003:03:54:43 -0800] "GET /gcs/pa.html HTTP/1.0" 200 5614
        crawl11-public.alexa.com - - [25/Mar/2003:04:51:41 -0800] "GET /gcs/clinical.html HTTP/1.0" 200 20132
        …
        livebot-65-55-208-64.search.live.com - - [24/Jul/2007:22:16:58 -0700] "GET /gcs/webstats/usage_200602.html HTTP/1.0" 200 128720
        203.129.234.42 - - [24/Jul/2007:22:22:39 -0700] "GET /gcs/status/statuscheck.html HTTP/1.1" 200 1522624
        livebot-65-55-208-65.search.live.com - - [24/Jul/2007:22:47:32 -0700] "GET /gcs/webstats/usage_200610.html HTTP/1.0" 200 132580
        …
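The sample lines appear to follow the Apache "common log format": host, identity, user, [timestamp], "request", status code, bytes sent. Assuming that layout, one regex can pull out every field. In this minimal sketch the two input lines are copied from the excerpt, and %bytes_by_host is an illustrative name for a per-host byte total.

```perl
use strict;
use warnings;

# Two sample lines copied from the log excerpt.
my @log = (
    '195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/new.gif HTTP/1.1" 200 926',
    'kittyhawk.hhmi.org - - [25/Mar/2003:03:17:20 -0800] "HEAD /gcs/ HTTP/1.0" 200 0',
);

my %bytes_by_host;
for my $line (@log) {
    # host ident user [timestamp] "METHOD path protocol" status bytes
    # (assumes the common log format; bytes may be "-" when nothing was sent)
    next unless $line =~ m{^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)$};
    my ($host, $when, $method, $path, $status, $bytes) = ($1, $2, $3, $4, $5, $6);
    $bytes = 0 if $bytes eq '-';
    $bytes_by_host{$host} += $bytes;
    print "$host $method $path -> $status ($bytes bytes) at $when\n";
}
```

Lines that do not fit the pattern are skipped by next, so a malformed entry does not stop the whole run.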

  14. Alternate uses
    • If you write your own program with many print statements, you can
        • make the print statements meaningful, e.g. "Time spent on loading: 23.5s"
        • parse the output afterwards to process/store the values:

        if ($line =~ m/: ([\d.]+)s/) {
            $time = $1;    # the whole number, e.g. 23.5
        }
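A sketch of the parsing step, assuming a hypothetical program that prints lines of the form "Time spent on <label>: <seconds>s". The + goes inside the capture group so that $2 holds the whole number rather than only its last character.

```perl
use strict;
use warnings;

# Hypothetical program output with timing lines mixed in.
my @output = (
    "Starting run",
    "Time spent on loading: 23.5s",
    "Time spent on solving: 102s",
    "Done",
);

my %seconds;
for my $line (@output) {
    # $1 = label, $2 = number of seconds (the + is INSIDE the group)
    if ($line =~ /^Time spent on (\w+): ([\d.]+)s$/) {
        $seconds{$1} = $2;
    }
}

printf "%-8s %6.1f\n", $_, $seconds{$_} for sort keys %seconds;
```

Storing the values in a hash keyed by label makes it easy to add new timed sections without changing the parsing code.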

  15. Resources
    • Any web search for "perl regular expression tutorial"
    • Perl reg exp by example
        • http://www.somacon.com/p127.php
    • Reference card
        • http://www.erudil.com/preqr.pdf
    • Perl site reference
        • http://perldoc.perl.org/perlre.html