Perl Chapter 7


  1. Perl Chapter 7 Pattern Matching

  2. Introduction
     • Scanning strings for substrings is useful in many applications
       • grep, finding files, compilers, …
     • Perl's pattern matching derives from UNIX (egrep) and awk
     • The basis is regular expressions (from the theory of computation)
     • Patterns are boolean expressions: they evaluate to true or false
     • Patterns remember the parts they matched (as a list)

  3. Syntax
     • m dl pattern dl [modifiers]
       • m is the match operator, dl is the delimiter
       • using / .. / as the delimiters makes m optional
     • Examples
       m~pattern~     # use ~ as the delimiter if the pattern contains /
       /pattern/      # m omitted
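
A minimal sketch of the two delimiter forms. The sample path and the variable names are illustrative and not from the slides.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";    # illustrative string that contains slashes

# Default / delimiters: each slash inside the pattern must be escaped.
my $with_slashes = ($path =~ /\/local\//) ? 1 : 0;

# Explicit m with ~ as the delimiter: no escaping needed.
my $with_tildes = ($path =~ m~/local/~) ? 1 : 0;

print "slashes: $with_slashes, tildes: $with_tildes\n";
```

Both forms are the same match; the alternate delimiter only makes the pattern easier to read.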

  4. Simple Patterns
     • Match individual chars or character classes
     • 3 categories
       • normal chars, which match themselves
       • metachars, which have special meanings in patterns (such as \, $, ?, +)
         • a backslash turns a metachar into a normal char: \?
       • the period
     • Escape sequences such as \t can appear in a pattern; they match the character they stand for (\t matches a tab)

  5. Default string to match is $_
       if (/snow/) { print "snow in \$_ \n"; }
     • /snow/ returns T/F
     • The period matches any char except a newline
       • /a../ matches an a followed by 2 non-newline chars
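
A small sketch of matching against $_ inside a loop, plus the period's newline rule. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my @lines = ("heavy snow tonight", "clear skies", "snowfall ends");
my @hits;
for (@lines) {                   # each element is aliased to $_
    push @hits, $_ if /snow/;    # same as: $_ =~ /snow/
}
print "$_\n" for @hits;

# The period matches any single character except a newline.
my $plain_ok   = ("abc"  =~ /a../) ? 1 : 0;   # b and c both satisfy the periods
my $newline_ok = ("a\nc" =~ /a../) ? 1 : 0;   # the newline does not
print "plain: $plain_ok, with newline: $newline_ok\n";
```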

  6. Matching character classes
     • defined by placing chars in [ ]
       • [A-Za-z]
       • [0-7] octal digit
       • [aeiou]
       • [^A-Za-z] chars NOT in the char class

  7. Common character classes
     • \d  [0-9]              a digit
     • \D  [^0-9]
     • \w  [A-Za-z0-9_]       a word char
     • \W  [^A-Za-z0-9_]
     • \s  [ \r\t\n\f]        white space
     • \S  [^ \r\t\n\f]
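
A quick check of which shorthand classes a few sample characters fall into. Note that \w accepts digits and the underscore as well as letters. The sample characters and the %in_class hash are illustrative.

```perl
use strict;
use warnings;

my %in_class;
for my $ch ("a", "7", "_", "-", " ") {
    my @classes;
    push @classes, '\d' if $ch =~ /\d/;
    push @classes, '\w' if $ch =~ /\w/;
    push @classes, '\s' if $ch =~ /\s/;
    $in_class{$ch} = "@classes";          # space-separated list, empty if none
    printf "'%s' => %s\n", $ch, @classes ? "@classes" : "none";
}
```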

  8. • /[A-Z]"\s/ matches an uppercase letter, a double quote, and a whitespace char
     • /[\dA-Fa-f]/ matches one hex digit
     • A variable in a pattern is interpolated
       $pattern = "slkdjfsdf";
       if (/$pattern/) { … }

  9. Quantifiers
     • {n}   exactly n reps
     • {m,}  at least m reps
     • {m,n} at least m, but not more than n
       /a{1,3}b/     matches ab, aab, aaab
       /(cats){3}/   matches catscatscats
       /[abc]{1,2}/  matches a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc
     • *  0 or more, including the empty string
     • +  1 or more
     • ?  0 or 1
     • .  exactly 1 (any char except newline)
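
A sketch that checks the counted quantifiers against candidate strings. The ^ and $ anchors (match at start and end of string) are not covered in these slides; they are added here as an assumption so that only whole-string matches count.

```perl
use strict;
use warnings;

# One to three a's followed by b, and nothing else.
my @ab = grep { /^a{1,3}b$/ } qw(b ab aab aaab aaaab);
print "a{1,3}b: @ab\n";

# Exactly three repetitions of the group.
my $three_cats = ("catscatscats" =~ /(cats){3}/) ? 1 : 0;
my $two_cats   = ("catscats"     =~ /(cats){3}/) ? 1 : 0;

# One or two characters drawn from the class, in any combination.
my @candidates = qw(a b c aa ab ac ba bb bc ca cb cc abc d);
my @short = grep { /^[abc]{1,2}$/ } @candidates;
print scalar(@short), " strings match [abc]{1,2}\n";
```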

  10. • /\w+/ matches 1 or more word chars
     • /\d+\.\d+/ matches 1 or more digits, a decimal point, 1 or more digits (i.e., a real decimal number)
       Note: \. matches a literal decimal point!
     • /\$?\d+\.\d\d/ matches a price with or without $
     • /ba(ll)*/ matches ba followed by 0 or more occurrences of the string ll
     • /\d{3}-\d{2}-\d{4}/ matches an SSN
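
A brief sketch applying the price and SSN patterns to a few made-up strings. The patterns are unanchored, so they succeed if the pattern occurs anywhere in the string.

```perl
use strict;
use warnings;

# Single-quoted strings keep the $ literal.
my @prices = grep { /\$?\d+\.\d\d/ } ('$4.99', '12.50', '7', 'free');
print "prices: @prices\n";

# Digits grouped 3-2-4; the second sample has the wrong grouping.
my @ssns = grep { /\d{3}-\d{2}-\d{4}/ } ('123-45-6789', '12-345-6789');
print "ssns: @ssns\n";
```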

  11. Questions
     Assume $_ = "Tommie";
     • Which m in Tommie does /m/ match?  The leftmost
     • What do these match?
       • /m*/   matches the empty string at the beginning
       • /m+/   matches mm
       • /m*i/  matches mmi
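
The answers can be checked with a short sketch. It uses Perl's built-in @- and @+ arrays (start and end offsets of the last match), which these slides do not cover; the %found hash is illustrative.

```perl
use strict;
use warnings;

my $name = "Tommie";
my %found;
for my $pat ('m', 'm*', 'm+', 'm*i') {
    my $re = qr/$pat/;                    # compile the pattern text
    if ($name =~ $re) {
        my $start = $-[0];                # offset where the match begins
        my $text  = substr($name, $start, $+[0] - $start);
        $found{$pat} = [$start, $text];
        printf "/%s/ matched '%s' at offset %d\n", $pat, $text, $start;
    }
}
```

/m*/ succeeds at offset 0 with an empty match because zero repetitions are allowed and the leftmost match wins.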

  12. Matching
     • .* is greedy by default: it matches the maximum possible number of non-newline chars
       $_ = "Bob Bobcat Bobolink";
       /.*Bob/ matches through the Bob in Bobolink
     • Actually, .* first matches the whole string, then backs up one character at a time until the rest of the pattern ("Bob") matches, which finds the rightmost occurrence
     • It works that way for all quantified patterns

  13. Matching
     $_ = "Freddie's hot dogs are really hot!";
     • /Fred+/    → Fredd
     • /Fred+?/   → Fred (the ? after a quantifier selects minimal mode)
     • /.*hot/    → matches through the last hot
     • /.*?hot/   → matches through the first hot
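
A sketch comparing greedy and minimal matching on the two sample strings. It wraps each pattern in parentheses to capture the matched text; capturing is covered on a later slide.

```perl
use strict;
use warnings;

my $birds = "Bob Bobcat Bobolink";
my ($greedy_bob) = $birds =~ /(.*Bob)/;     # runs through the last Bob

my $s = "Freddie's hot dogs are really hot!";
my ($fredd)   = $s =~ /(Fred+)/;            # greedy: takes both d's
my ($fred)    = $s =~ /(Fred+?)/;           # minimal: takes one d
my ($to_last) = $s =~ /(.*hot)/;            # greedy: through the last hot
my ($to_1st)  = $s =~ /(.*?hot)/;           # minimal: through the first hot

print "$_\n" for $greedy_bob, $fredd, $fred, $to_last, $to_1st;
```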

  14. Alternation
     • /a|e|i|o|u/ is equivalent to /[aeiou]/
     • /Fred|Mike|Dracula/
     • Alternatives are tried left to right
       • /Tom|Tommie/ never matches Tommie, because the leftmost alternative (Tom) matches first
       • /to|too|two/ never matches too
     • Can use ( ) to group
       • /t(oo?|wo)/ → to, too, or two
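
A sketch showing how the order of alternatives decides what is matched. The outer parentheses capture the matched text so it can be inspected.

```perl
use strict;
use warnings;

my ($short_first) = "Tommie" =~ /(Tom|Tommie)/;   # Tom wins, it is tried first
my ($long_first)  = "Tommie" =~ /(Tommie|Tom)/;   # reordered: Tommie wins

my ($plain)   = "too" =~ /(to|too|two)/;          # stops at to
my ($grouped) = "too" =~ /(t(oo?|wo))/;           # greedy o? picks up the second o

print "$short_first $long_first $plain $grouped\n";
```

Putting the longer alternative first, or factoring the common prefix as in /t(oo?|wo)/, avoids the shadowing.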

  15. Precedence
     • From highest to lowest
       • ()
       • Quantifiers
       • Char sequence
       • Alternation
     • Be careful mixing alternation with a char class
       • [belly|belts|bells] is equivalent to [belyts|]: inside [ ], the | is a literal character, not alternation
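
A sketch of the pitfall: inside brackets the words collapse into a set of single characters, and the | becomes one of them. The ^ and $ anchors are an addition here (not from the slides) so that only whole-string matches count.

```perl
use strict;
use warnings;

# Character class: matches exactly one character from the set.
my @one_char = grep { /^[belly|belts|bells]$/ } qw(b e l y t s | x belly);
print "class matches: @one_char\n";

# Grouped alternation: matches one of the three whole words.
my @words = grep { /^(belly|belts|bells)$/ } qw(belly belts bells b);
print "alternation matches: @words\n";
```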

  16. Binding operators
     • A pattern can be matched against any string, not just $_
     • The binding operators connect a string to a pattern
       • $stringvar =~ /[,;:]/;   true if the pattern is found in $stringvar
       • $string !~ /[,;:]/;      inverted logic: true if the pattern is NOT found in $string
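
A minimal sketch of both binding operators. The sample strings and variable names are illustrative.

```perl
use strict;
use warnings;

my $stringvar = "red, green; blue";
my $plain     = "no punctuation here";

my $has_punct   = ($stringvar =~ /[,;:]/) ? "yes" : "no";   # pattern found
my $lacks_punct = ($plain     !~ /[,;:]/) ? "yes" : "no";   # pattern absent

print "has: $has_punct, lacks: $lacks_punct\n";
```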

  17. Remembering matches
     $s = "TD ran for 305 yards today";
     $s =~ /(\d+) (\w+) (\w+)/;
     print "$1 $2 $3 \n";
     • prints 305 yards today
     • Nested parentheses are numbered by their opening parenthesis, left to right
       $s =~ /((\d+) (\w+) (\w+))/;
       • $1  305 yards today
       • $2  305
       • $3  yards
       • $4  today
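
A sketch of capture numbering. The patterns here put a literal space between the groups, since the number and the words in the string are separated by spaces. The last line shows what happens when the groups are written with nothing between them.

```perl
use strict;
use warnings;

my $s = "TD ran for 305 yards today";

# In list context a match returns its captures ($1, $2, ...).
my @flat   = $s =~ /(\d+) (\w+) (\w+)/;
my @nested = $s =~ /((\d+) (\w+) (\w+))/;   # $1 is the outermost group

# Adjacent groups must match adjacent text, so the digits get split up.
my @adjacent = $s =~ /(\d+)(\w+)(\w+)/;

print "flat:     @flat\n";
print "nested:   ", join(" | ", @nested), "\n";
print "adjacent: @adjacent\n";
```

In real code, check that the match succeeded before using $1 and friends, because a failed match leaves the old values in place.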

  18. Split with a pattern
     $s = "Betty, Bert, Bart, Bartholomew";
     @names = split /, /, $s;
     $s = "Betty:778:Bert:222:Bart:43297:Bartholomew";
     @names = split /:\d+:/, $s;
     • $names[0] = Betty, $names[1] = Bert, $names[2] = Bart, $names[3] = Bartholomew
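
A runnable sketch of both splits. The pattern is the separator, so what split returns is the text between the matches.

```perl
use strict;
use warnings;

my @by_comma = split /, /, "Betty, Bert, Bart, Bartholomew";

# The separator is a colon, one or more digits, and another colon.
my @by_number = split /:\d+:/, "Betty:778:Bert:222:Bart:43297:Bartholomew";

print "by comma:  @by_comma\n";
print "by number: @by_number\n";
```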

  19. Substitutions
     $x = "no more apples!";
     $x =~ s/apples/applets/;   → $x changed to "no more applets!"
     $x = "12034005";
     $x =~ s/0//g;              → $x changed to "12345"
     • the g modifier changes every occurrence, not just the first
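
A sketch contrasting substitution with and without the g modifier. It copies the string before substituting so both results can be compared; the variable names are illustrative.

```perl
use strict;
use warnings;

my $x = "no more apples!";
$x =~ s/apples/applets/;

my $digits = "12034005";
(my $first_only = $digits) =~ s/0//;     # copy, then remove the first 0 only
(my $all_gone   = $digits) =~ s/0//g;    # copy, then remove every 0

# s///g returns the number of substitutions made.
my $count = ($digits =~ s/0//g);

print "$x\n$first_only\n$all_gone\n$count\n";
```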

  20. Translating characters
     • tr/search-list/replacement-list/
     • tr/a-z/A-Z/; replaces all lowercase letters with uppercase and returns the number replaced
     • tr/\./\./; replaces every . with ., so the string is unchanged, but it returns the number of replacements (in effect, it counts the periods)
       $s = "Hello";
       $s =~ tr/a-z/A-Z/;   changes $s to HELLO, returns 4 (or true)
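
A sketch of tr used both to translate and to count. The $text sample is made up for illustration.

```perl
use strict;
use warnings;

my $s = "Hello";
my $changed = ($s =~ tr/a-z/A-Z/);    # H is already uppercase, so 4 chars change

my $text = "a.b.c.";
my $dots = ($text =~ tr/\./\./);      # maps . to itself, so this only counts

print "$s ($changed changed), $dots periods in $text\n";
```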
