
Perl


amora

Presentation Transcript


  1. Perl Regular expression: string manipulation

  2. substr function
     • string = substr(string2, start pos (starts with 0), length)
     • returns the substring of string2 that begins at the start position and is length characters long
     • string2 is not changed
     • $str2 = "Hi There";
     • $str = substr($str2, 3, 2);
     • # $str is "Th": positions 3 and 4, i.e. the 4th and 5th characters
     • substr(string, start pos, length) = string2
     • replaces length characters of string, beginning at the start position, with string2
     • $str2 = "Hi There"; $str = "hi";
     • substr($str2, 3, 3) = $str; # insert and replace
     • # $str2 is now "Hi hire"
     • substr($str2, 3, 0) = $str; # insert only
     • # $str2 is now "Hi hihire"
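The lvalue assignment on this slide can also be written with the four-argument form of substr, which takes the replacement string as its last argument. A minimal sketch of both forms follows; the variable name $piece is illustrative and not from the slides.

```perl
use strict;
use warnings;

my $str2 = "Hi There";

# Extract: 2 characters starting at index 3 (indices start at 0).
my $piece = substr($str2, 3, 2);      # "Th"

# Replace 3 characters starting at index 3, using substr as an lvalue.
substr($str2, 3, 3) = "hi";           # $str2 is now "Hi hire"

# Four-argument form: same effect as an lvalue assignment.
# A length of 0 means pure insertion; nothing is removed.
substr($str2, 3, 0, "hi");            # $str2 is now "Hi hihire"

print "$piece\n$str2\n";
```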

  3. index and rindex
     • index string, substring [, offset]
     • returns the position (counted from 0) of the first occurrence of substring in string, else -1
     • with offset, the search starts at that position; returns -1 if there is no occurrence at or after it
     • rindex string, substring [, offset]
     • returns the position of the last occurrence of the substring, else -1
     • with offset, the rightmost position that may be returned
     • $pos = index $str, $str2;
     • returns the position where $str2 is found in $str
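The offset argument is easiest to see on a string that contains the substring twice. A small sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $str = "There there Jim";

my $first = index($str, "here");        # 1: first occurrence, counted from 0
my $next  = index($str, "here", 2);     # 7: search starts at position 2
my $last  = rindex($str, "here");       # 7: last occurrence
my $prev  = rindex($str, "here", 6);    # 1: last occurrence starting at or before 6
my $none  = index($str, "Fred");        # -1: not found

print join(", ", $first, $next, $last, $prev, $none), "\n";
```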

  4. example of substr and index
     • $str = "There there Jim";
     • $sstr = "Jim";
     • $replace = "Fred";
     • substr($str, (index $str, $sstr), 3) = $replace;
     • replaces Jim with Fred in $str
     • # $str is now "There there Fred"
     • The substitution operator is an easier way to do this.
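One caveat with combining substr and index: when the substring is absent, index returns -1, and substr treats a negative start as counting from the end of the string, so the wrong characters would be replaced. A hedged sketch that guards against this; the search string "Bob" is a made-up value chosen because it is absent.

```perl
use strict;
use warnings;

my $str     = "There there Jim";
my $sstr    = "Bob";               # deliberately absent from $str
my $replace = "Fred";

my $pos = index($str, $sstr);
if ($pos >= 0) {
    # Use the real length of the search string rather than a hard-coded 3.
    substr($str, $pos, length $sstr) = $replace;
}

print "$str\n";    # unchanged, because "Bob" was not found
```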

  5. grep
     • LIST = grep EXPR, LIST
     • LIST = grep BLOCK LIST
     • like map, each element is assigned to $_ in turn and the BLOCK or EXPR is evaluated; the elements for which the result is true are put into the returned list
     • @new = grep /[a-zA-Z]/, @lines;
     • NOTE: altering $_ will alter the original list
     • @list = qw(barney fred dino wilma);
     • @greplist = grep {s/^[bfd]//} @list;
     • # @greplist is ("arney", "red", "ino")
     • # @list is ("arney", "red", "ino", "wilma")
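To filter without touching the original list, match inside grep and do any editing on copies. A minimal sketch under that assumption; the names @matched, @stripped and $copy are illustrative.

```perl
use strict;
use warnings;

my @list = qw(barney fred dino wilma);

# Select without modifying: a plain match leaves $_ (and @list) alone.
my @matched = grep { /^[bfd]/ } @list;

# Edit copies, so the aliased originals are never changed.
my @stripped = map { my $copy = $_; $copy =~ s/^[bfd]//; $copy } @matched;

# In scalar context grep returns the number of matching elements.
my $count = grep { /^[bfd]/ } @list;

print "@matched\n@stripped\n@list\n$count\n";
```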

  6. s/// Operator (Substitution)
     • $str =~ s/pattern to match/replacement/;
     • finds the first match and replaces it
     • $str =~ s/pattern to match/replacement/g;
     • finds all matches and replaces each of them
     • Simple substitution
     • $str = "3 dogs bit 1 dog";
     • $str =~ s/dog/cat/;
     • # $str is now "3 cats bit 1 dog"
     • $str =~ s/dog/cat/g;
     • # $str is now "3 cats bit 1 cat"
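s/// also returns a value: the number of substitutions made, or a false value when nothing matched. A short sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $str = "3 dogs bit 1 dog";

# With /g, s/// returns the number of substitutions made.
my $n = ($str =~ s/dog/cat/g);

# When nothing matches, the string is untouched and the result is false.
my $changed = ($str =~ s/bird/fish/) ? 1 : 0;

print "$n $changed $str\n";
```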

  7. s/// Operator (Substitution) (2)
     • s/pattern//;
     • removes the pattern found
     • $str = "abad";
     • $str =~ s/a//g;
     • # $str is now "bd"
     • From the substr and index slide:
     • $str =~ s/$sstr/$replace/;
     • OR $str =~ s/Jim/Fred/;

  8. case insensitive substitution
     • /i ignore case
     • $str = "Dog, dog, dOg";
     • $str =~ s/DOG/cat/ig;
     • # $str is now "cat, cat, cat"
     • $str = "Dog, dog, dOg";
     • $str =~ s/DOG/cAt/ig;
     • # $str is now "cAt, cAt, cAt"
     • The replacement string is inserted exactly as written; /i affects only the match.

  9. examples (each substitution is applied to the original string)
     • $str = "fred xxx barney";
     • $str =~ s/x/boom/;
     • # $str is now "fred boomxx barney"
     • $str =~ s/x/boom/g;
     • # $str is now "fred boomboomboom barney"
     • $str =~ s/x+/boom/;
     • # $str is now "fred boom barney"

  10. alternation and group matching
     • | allows an or'd matching
     • $str = "Wilma Flintstone";
     • $str =~ s/Fred|Wilma|Pebbles/Dino/g;
     • # $str is now "Dino Flintstone"
     • Replaces all instances of Fred or Wilma or Pebbles with Dino.
     • $str = "1st time winner";
     • $str =~ s/(1st|2nd|3rd) time/Last place/;
     • $1 is the group match, "1st"; the entire match is "1st time"
     • # $str is now "Last place winner"

  11. single character substitution
     • Using []
     • $str =~ s/[abc]/d/; # sub the first a, b, or c with d
     • $str =~ s/[Fred]/x/g;
     • If $str was "Fred", afterwards it would be "xxxx"
     • $str =~ s/[^aeiouAEIOU]/_/g;
     • replaces every character that is not a vowel with an _
     • Common mistake:
     • $str =~ s/[a-z]/[A-Z]/g;
     • Intended to replace every lower case letter with its upper case letter, but the replacement side is a literal string (not a pattern)
     • if $str = "hi", then it would become "[A-Z][A-Z]"
     • NOTE: $str = uc $str; # upper cases a string.
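For the upper-casing task that the common mistake was aiming at, there are several working alternatives. The sketch below shows three that should give the same result on ASCII text; the variable names are illustrative.

```perl
use strict;
use warnings;

my $str = "hi there";

# Built-in function, as noted on the slide.
my $with_uc = uc $str;

# Transliteration: copy first, then map a-z onto A-Z in the copy.
(my $with_tr = $str) =~ tr/a-z/A-Z/;

# Substitution: capture each letter and upper-case it with \U.
(my $with_s = $str) =~ s/([a-z])/\U$1/g;

print "$with_uc\n$with_tr\n$with_s\n";
```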

  12. matching quantifiers
     • $str =~ s/a{3}/b/;
     • the first instance of aaa is replaced with b
     • $str = "aaaaa"; # use this as the starting value for each example on this slide
     • $str =~ s/a{3,}/b/; # max matching
     • # $str is now "b"
     • $str =~ s/a{3,}?/b/; # min matching
     • # $str is now "baa"; only 3 are substituted to make a min match
     • $str =~ s/(a{3,}?)(a*)/b/;
     • # $str is now "b"; $1 = "aaa"; $2 = "aa"
     • $str =~ s/(a{3,})(a*)/b/;
     • # $str is now "b"; $1 = "aaaaa"; $2 = ""
     • $str =~ s/(a{3,}?)(a*?)/b/; # min match on both
     • # $str is now "baa"; $1 = "aaa"; $2 = ""

  13. matching quantifiers (2)
     • $str = "aaaaab"; # use this as the starting value for each example on this slide
     • $str =~ s/a{3,}?b/c/;
     • # $str is now "c". Why? To make the match succeed, the min-matching a{3,}? had to take all the a's so that the b could match.
     • + means 1 or more times and ? means 0 or 1 time (both max match)
     • $str =~ s/(a+)(b?)/c/;
     • # $str is now "c"; $1 = "aaaaa"; $2 = "b"
     • $str =~ s/(a+?)(b??)/c/; # min match
     • # $str is now "caaaab"; $1 = "a"; $2 = ""
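The captures can be inspected directly by matching in list context, which leaves the string unchanged. A small sketch of the three cases from this slide; the variable names are illustrative.

```perl
use strict;
use warnings;

my $str = "aaaaab";

# Greedy: a+ takes every "a", b? takes the "b".
my ($g1, $g2) = $str =~ /(a+)(b?)/;

# Lazy: a+? stops after one "a", b?? prefers to match nothing.
my ($l1, $l2) = $str =~ /(a+?)(b??)/;

# Lazy but forced: the literal b must match, so a{3,}? grows to five.
my ($f1) = $str =~ /(a{3,}?)b/;

print "[$g1][$g2] [$l1][$l2] [$f1]\n";
```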

  14. matching quantifiers (3)
     • Example: perl doesn't always do what you think.
     • $str = "ddogg";
     • $str =~ s/d.*g/cat/;
     • # $str is now "cat"; max match, makes sense
     • $str = "ddogg";
     • $str =~ s/d.*?g/cat/;
     • # $str is now "catg"; a min match, but not the shortest possible one: the match starts at the leftmost d, so "ddog" is replaced rather than "dog"
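If the goal is to replace only the shortest "d...g" stretch, one option is to forbid the delimiting characters inside the match with a negated character class. A sketch of that idea, offered as one possible approach rather than the only one:

```perl
use strict;
use warnings;

my $str = "ddogg";

# Lazy quantifier: still starts at the leftmost d, so "ddog" is replaced.
(my $lazy = $str) =~ s/d.*?g/cat/;

# Negated class: no d or g allowed between the ends, so only "dog" matches.
(my $tight = $str) =~ s/d[^dg]*g/cat/;

print "$lazy\n$tight\n";
```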

  15. matching quantifiers (4)
     • More examples (with the $_ variable); each is applied to the original value
     • $_ = "a xxx c xxxxx c xxx d";
     • s/x{1,}/d/g; produces "a d c d c d d"
     • s/x{1,}?/d/g; produces "a ddd c ddddd c ddd d"
     • s/x{1,2}/d/g; produces "a dd c ddd c dd d"
     • s/x{1,3}/d/g; produces "a d c dd c d d"
     • s/x{2,2}/d/g; produces "a dx c ddx c dx d"
     • or s/x{2}/d/g;

  16. Anchoring
     • $str = "Fred Flintstone Fred";
     • $str =~ s/Fred/Wilma/g;
     • Replaces all instances of Fred with Wilma
     • $str =~ s/Fred$/Wilma/g;
     • Only the last instance, "Fred Flintstone Wilma", even with the /g flag
     • $str =~ s/^Fred/Wilma/g;
     • Only the first instance, "Wilma Flintstone Fred", even with the /g flag
     • $str = "abcd";
     • $str =~ s/^[abc]+/d/;
     • # $str is now "dd"

  17. Parentheses as memory
     • s/a(.)b(.)c\2d\1/a mess/;
     • "adbecedd" is converted to "a mess"
     • "adbecdde" is not converted.
     • s/a(.*)b\1c/a mess/;
     • "addbddc" changes to "a mess"
     • "adddbddc" is not changed
     • To keep the text a group matched, use $1 .. $9 in the replacement (\1 .. \9 are also accepted there, but $1 is the preferred form)
     • s/a(.*)b\1c/What is this: $1/;
     • "addbddc" is converted to "What is this: dd"
     • again $1 = "dd"
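The two roles of a group can be seen side by side: \1 inside the pattern means "the same text again", while $1 in the replacement is the captured text. A minimal sketch using the slide's strings; the names $out, $other and $hit are illustrative.

```perl
use strict;
use warnings;

my $str = "addbddc";

# \1 in the pattern must repeat what group 1 matched;
# $1 in the replacement inserts that captured text.
(my $out = $str) =~ s/a(.*)b\1c/What is this: $1/;

# The same pattern fails when the two halves differ ("ddd" vs "dd").
my $other = "adddbddc";
my $hit = ($other =~ /a(.*)b\1c/) ? "match" : "no match";

print "$out\n$hit\n";
```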

  18. metasymbols
     • a very common substitution
     • s/\s+/ /g; # replace each run of whitespace with a single space
     • " a  b\t c" changes to " a b c"
     • remove word character duplicates
     • $str = "11aabbdccaa";
     • $str =~ s/(\w)\1/$1/g;
     • # $str is now "1abdca"
     • Remove any duplicates
     • $str = "11 ,,aa";
     • $str =~ s/(.)\1/$1/g;
     • # $str is now "1 ,a"
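Note that (\w)\1 removes one character from each pair, so a run of three leaves two behind. Adding + to the backreference collapses a whole run. A small sketch of the difference, with illustrative variable names:

```perl
use strict;
use warnings;

my $str = "aaab";

# Pairs only: "aa" becomes "a", the third "a" is left alone.
(my $pairs = $str) =~ s/(\w)\1/$1/g;

# Whole runs: \1+ consumes every repeat of the captured character.
(my $runs = $str) =~ s/(\w)\1+/$1/g;

print "$pairs\n$runs\n";
```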

  19. Metasymbols (2)
     • \U upper case until \E and \L lower case until \E
     • Example
     • s/a(.*)b\1c/What is this: \U$1\E/;
     • "addbddc" is converted to "What is this: DD"
     • s/a(.*)b\1c/What is this: \L$1\E/;
     • "addbddc" is converted to "What is this: dd"
     • \Q ... \E quotes (escapes) the regex metacharacters in between, so they match literally
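\Q matters most when the pattern comes from a variable whose contents may include metacharacters. The sketch below contrasts the two behaviours; the strings '1+1' and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $str = "addbddc";
(my $upper = $str) =~ s/a(.*)b\1c/\U$1\E was found/;

my $needle = '1+1';
my $text   = '1+1=2 and 11=2';

# \Q...\E: the + is matched literally.
(my $literal = $text) =~ s/\Q$needle\E/two/;

# Without \Q the + is a quantifier, so the pattern matches "11" instead.
(my $as_regex = $text) =~ s/$needle/two/;

print "$upper\n$literal\n$as_regex\n";
```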

  20. Exercise 10
     • What is the outcome of the following substitutions? Use $_ = "ad dog cd"
     • s/dog//;
     • while (/  /) { s/  / /g; } # two spaces in each pattern, one in the replacement
     • s/(\w+)\s+(\w+)/$2 $1/g;
     • s/(.+)d/Dd/g;
     • s/(.+?)d/Dd/g;
     • s/(\S+)/=$1=/g;
     • Write a substitution to change each vowel to an X.

  21. s/// flags
     • like the match operator
     • /m let ^ and $ match next to an embedded \n
     • /s let . match newline
     • /x ignore whitespace in the pattern and permit comments
     • /g replace globally, i.e. all occurrences (the match operator also accepts /g, where it means find all matches)
     • s/// flag only
     • /e evaluate the right side as an expression
     • in other words, perl runs the right side as perl code and uses its return value as the replacement
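The effect of /m, /s and /x shows up on a string with an embedded newline. A hedged sketch follows; the two-line sample text and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $text = "Fred one\nFred two\n";

# Without /m, ^ matches only at the start of the string.
(my $plain = $text) =~ s/^Fred/Wilma/g;

# With /m, ^ also matches right after each embedded newline.
(my $multi = $text) =~ s/^Fred/Wilma/gm;

# Without /s, . stops at a newline; with /s it crosses it.
my ($short) = $text =~ /(Fred.*)/;
my ($long)  = $text =~ /(Fred.*)/s;

# /x lets the pattern carry whitespace and comments.
(my $swapped = $text) =~ s/
    (\w+)   # first word
    \s+     # separator
    (\w+)   # second word
/$2 $1/x;

print $plain, $multi, "$short\n", $long, $swapped;
```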

  22. /e flag
     • s/(\d+)/sprintf("%#x",$1)/ge;
     • converts all numbers to hex
     • "2851" would be converted to "0xb23"
     • return to the leap year example, with a ternary operator
     • s/(\d+)/ $1 % 4 ? "$1 (not a leap year)" : $1 % 100 ? "$1 (a leap year)" : $1 % 400 ? "$1 (not a leap year)" : "$1 (a leap year)" /gxe
     • "2000" is changed to "2000 (a leap year)"
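Because the right side is ordinary Perl under /e, longer logic can be moved into a subroutine and called from the replacement. A minimal sketch; the helper name leap_label is hypothetical and not from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: label a year using the usual 4/100/400 rule.
sub leap_label {
    my ($y) = @_;
    my $leap = ($y % 4 == 0) && ($y % 100 != 0 || $y % 400 == 0);
    return $leap ? "$y (a leap year)" : "$y (not a leap year)";
}

# Arithmetic in the replacement.
(my $doubled = "3 apples and 12 pears") =~ s/(\d+)/$1 * 2/ge;

# Formatting in the replacement, as on the slide.
(my $hex = "2851") =~ s/(\d+)/sprintf("%#x", $1)/ge;

# Calling a subroutine from the replacement.
(my $years = "1900 2000 2024") =~ s/(\d+)/leap_label($1)/ge;

print "$doubled\n$hex\n$years\n";
```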

  23. tr/// Operator (Transliteration)
     • same as in sed; y/// can also be used instead of tr///
     • DOES NOT use pattern matching; instead it scans character by character and replaces each occurrence of a character with its replacement
     • tr/SEARCHLIST/REPLACEMENTLIST/cds;
     • Example:
     • $str = "AABBCCDDEE";
     • $str =~ tr/ABC/XYZ/;
     • # $str is now "XXYYZZDDEE"
     • $str =~ tr/DE/!/; # if the replacement list is too short, its last character is used as many times as needed
     • # $str is now "XXYYZZ!!!!"

  24. tr/// Operator (Transliteration) (2)
     • Duplicates in the Searchlist are ignored (only the first occurrence of a character is used)
     • $str = "AABBCCDDEE";
     • $str =~ tr/AAB/xyz/;
     • # $str is now "xxzzCCDDEE"
     • /c means complement: the characters not in the Searchlist
     • $str = "AABBCCDDEE";
     • $str =~ tr/ABC/x/c;
     • # $str is now "AABBCCxxxx"

  25. tr/// Operator (Transliteration) (3)
     • /d deletes characters that are found but have no replacement
     • This changes how tr handles a short replacement list: instead of reusing its last character, the leftover search characters are removed
     • $str = "AABBCCDDEE";
     • $str =~ tr/ABC/xy/d;
     • # $str is now "xxyyDDEE"
     • $str =~ tr/DE//d;
     • # $str is now "xxyy"

  26. tr/// Operator (Transliteration) (4)
     • /s squeezes each run of the same replaced character down to one
     • $str = "AABBCCDDEE";
     • $str =~ tr/ABC/xyz/s;
     • # $str is now "xyzDDEE"
     • tr/// returns the number of characters found/replaced (each example below starts from "AABBCCDDEE")
     • $count = ($str =~ tr/ABC/xyz/);
     • # $count = 6; $str is now "xxyyzzDDEE"
     • $count = ($str =~ tr/ABC//);
     • # $count = 6; $str is still "AABBCCDDEE"
     • With no replacement list, it just counts the characters and makes no replacements. Note that s/// with an empty replacement would have removed them.
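Counting with an empty replacement list combines naturally with ranges and with /c. A short sketch on a made-up sample string, with illustrative variable names:

```perl
use strict;
use warnings;

my $str = "Hello, World!";

# Count vowels; the empty replacement list means nothing is changed.
my $vowels = ($str =~ tr/aeiouAEIOU//);

# /c counts the complement: everything that is not a letter.
my $non_letters = ($str =~ tr/a-zA-Z//c);

# Adding /d to the complement deletes those characters in a copy.
(my $letters = $str) =~ tr/a-zA-Z//cd;

print "$vowels $non_letters $letters\n";
```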

  27. More tr/// Examples
     • $str = "AABBCCDDEE";
     • $str =~ tr/D//d; # delete found characters
     • # $str is now "AABBCCEE"
     • $str = "AABBCCDDEE";
     • $str =~ tr/ABD/xy/ds; # delete D, sub x for A and y for B, and squeeze duplicate replacements
     • # $str is now "xyCCEE"
     • $str =~ tr/a-zA-Z//dc;
     • removes any non-letters from $str
     • $str =~ tr/A-Za-z/N-ZA-Mn-za-m/;
     • rotates the letters by 13 places for simple encryption
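Rotating by 13 is its own inverse, so applying the same transliteration twice restores the text. A minimal sketch with a made-up sample string:

```perl
use strict;
use warnings;

my $plain = "Hello, World!";

# Rotate every ASCII letter by 13 places; other characters are untouched.
(my $cipher = $plain) =~ tr/A-Za-z/N-ZA-Mn-za-m/;

# Applying the same transliteration again undoes it.
(my $back = $cipher) =~ tr/A-Za-z/N-ZA-Mn-za-m/;

print "$cipher\n$back\n";
```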

  28. Exercise 11
     • What is the outcome of the following transliterations? Use $_ = "fred and barney"
     • tr/abcde/ABCDE/;
     • tr/a-z/ABCDE/d;
     • $count = tr/a-z/A-Z/;
     • tr/a-z/_/c;
     • tr/a-m/X/s;
     • tr/aeiou/X/cs;
     • $count = tr/aeiou//c;
     • Change the letters bdr to X and count the number of changes.

  29. Q & A
