Perl

Presentation Transcript


  1. Perl

  2. Perl
     • Perl: Practical Extraction and Report Language
     • for text files
     • system management
     • combines C, sed, awk, and sh
     • interpreted
     • dynamic

  3. Data Structures
     • scalars $num
     • arrays @num
     • associative arrays (hashes) %num
     • $num[50]: the element at index 50 of the array @num (the 51st element, since indexing starts at 0)
     • $#num: the last index of @num
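As a small illustration of the three data types and of `$#num`, here is a sketch with made-up values. The helper name `last_index_and_count` is hypothetical and does not come from the slides.

```perl
use strict;
use warnings;

# Made-up sample data, for illustration only.
my $num = 42;                        # scalar
my @num = (10, 20, 30, 40);          # array (a separate variable from $num)
my %num = (one => 1, two => 2);      # associative array (hash)

# Return the last index and the element count of a list.
sub last_index_and_count {
    my @list = @_;
    return ($#list, scalar @list);
}

my ($last, $count) = last_index_and_count(@num);
print "scalar:     $num\n";
print "element 0:  $num[0]\n";       # indexing starts at 0
print "last index: $last\n";         # same as $#num
print "count:      $count\n";        # always $#num + 1
print "hash value: $num{two}\n";
```

Because the sigil selects the kind of variable, `$num`, `@num`, and `%num` can coexist without clashing.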

  4. Examples
         #!/usr/local/bin/perl -w
         # find the sum of a list of numbers from STDIN
         # one number per line
         $sum = 0;
         while (<STDIN>) {
             $sum += int $_;
         }
         print "the sum is $sum\n";

  5. Examples
         #!/usr/bin/perl -w
         # find the sum of a list of numbers from STDIN
         # several numbers per line
         $sum = 0;
         while (<STDIN>) {
             @nums = split;
             foreach (@nums) {
                 $sum += int $_;
             }
         }
         print "the sum is $sum\n";

  6. Average
         #!/usr/bin/perl -w
         # find the average of a list of
         # numbers from STDIN
         # several numbers per line
         $sum = 0;
         $count = 0;
         while (<STDIN>) {
             @nums = split;
             foreach (@nums) {
                 $sum += int $_;
                 $count++;
             }
         }
         print "the average is ", $sum/$count, "\n";

  7. median
         #!/usr/bin/perl -w
         # find the median of a list of numbers
         # from STDIN
         # several numbers per line
         @nums = ();
         while (<STDIN>) {
             push @nums, split;
         }
         @nums = sort { $a <=> $b } @nums;   # numeric sort; a plain sort compares as strings
         if ($#nums % 2) {
             # even count: average the two middle values
             $median = ($nums[($#nums - 1)/2] + $nums[($#nums + 1)/2])/2;
         } else {
             # odd count: take the middle value
             $median = $nums[$#nums/2];
         }
         print "the median is $median\n";

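  8. Output?
         #!/usr/bin/perl -w
         @stuff = ("one", "two", "three");
         print @stuff, "\n";                  # list printed with no separator
         $stuff = ("one", "two", "three");    # comma operator in scalar context: last value
         print $stuff, "\n";
         $stuff = @stuff;                     # array in scalar context: element count
         print $stuff, "\n";
     Output:
         onetwothree
         three
         3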

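  9. Pattern Matching
     • m// (match)
     • s/// (substitute)
     • Modifiers
       • i: case-insensitive
       • m: multiple lines (^ and $ match at line boundaries)
       • s: single line (. also matches newline)
       • x: extended (whitespace and comments are allowed in the pattern)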
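The four modifiers can be seen side by side in a short sketch. The helper `modifier_demo` and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one pattern per modifier to $text and
# return the results in a hash reference.
sub modifier_demo {
    my ($text) = @_;
    my %r;

    # /i: ignore case
    $r{i} = $text =~ /second line/i ? 1 : 0;

    # /m: ^ matches at the start of every line, so /g collects one word per line
    $r{m} = [ $text =~ /^(\w+)/mg ];

    # /s: . also matches "\n", so .* can cross the line break
    $r{s}       = $text =~ /first.*LINE/s ? 1 : 0;
    $r{without} = $text =~ /first.*LINE/  ? 1 : 0;

    # /x: whitespace and comments inside the pattern are ignored
    ($r{x}) = $text =~ / ^ (\w+)    # first word of the string
                         \s /x;
    return \%r;
}

my $r = modifier_demo("first line\nSecond LINE\n");
print "i: $r->{i}, m: @{ $r->{m} }, s: $r->{s}, without s: $r->{without}, x: $r->{x}\n";
```

Modifiers can be combined, for example `/mgi`, and each one changes only the behavior described in its comment.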

  10. Regular Expressions
      Code     Meaning
      \w       Alphanumeric characters (and underscore)
      \W       Non-alphanumeric characters
      \s       White space
      \S       Non-white space
      \d       Digits
      \D       Non-digits
      \b       Word boundary
      \B       Non-word boundary
      \A  ^    At the beginning of a string
      \Z  $    At the end of a string
      .        Match any single character (except newline)

  11. Regular Expressions
      *                    Zero or more occurrences
      ?                    Zero or one occurrence
      +                    One or more occurrences
      {N}                  Exactly N occurrences
      {N,M}                Between N and M occurrences
      .*<thingy>           Greedy match, up to the last thingy
      .*?<thingy>          Non-greedy match, up to the first thingy
      [set_of_things]      Match any item in the set
      [^set_of_things]     Match any item not in the set
      (some_expression)    Tag an expression
      $1..$N               Tagged expressions, used in substitutions
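A brief sketch of greedy versus non-greedy matching, a counted quantifier, a negated set, and tagged expressions in a substitution. The helpers `first_capture` and `swap_pair` and all input strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture of $pattern in $string.
sub first_capture {
    my ($string, $pattern) = @_;
    my ($hit) = $string =~ $pattern;
    return $hit;
}

# Hypothetical helper: swap the two sides of a key=value pair.
# The tagged expressions are available as $1 and $2 in the replacement.
sub swap_pair {
    my ($s) = @_;
    $s =~ s/(\w+)=(\w+)/$2=$1/;
    return $s;
}

my $html = "<b>bold</b> and <i>italic</i>";   # made-up input

print first_capture($html, qr/<(.*)>/),  "\n";            # greedy: runs to the last ">"
print first_capture($html, qr/<(.*?)>/), "\n";            # non-greedy: stops at the first ">"
print first_capture("id=12345;", qr/(\d{2,3})/), "\n";    # at most three digits
print first_capture("id=12345;", qr/=([^;]*)/),  "\n";    # everything up to the ";"
print swap_pair("key=value"), "\n";
```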

  12. Rules
      • Rule 1
        • The engine tries to match as far left as it can
      • Rule 2
        • The regular expression is regarded as a set of alternatives. The engine tries them left to right. (see page 61)
      • Rule 3
        • Items that have choices match from left to right: /x*y*/
      • Rule 4
        • Assertions
        • ^ $ \b \B \A \Z \G (?=…) (?!…)
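The first four rules can be observed by asking where a match starts and what text it covers. This is a sketch only; `where_matched` is a hypothetical helper and the test strings are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start offset, matched text] for the first
# match of $pattern in $string, or undef if there is no match.
sub where_matched {
    my ($string, $pattern) = @_;
    return unless $string =~ $pattern;
    return [ $-[0], substr($string, $-[0], $+[0] - $-[0]) ];
}

for my $case (
    [ "foobar", qr/bar|foo/ ],            # Rule 1: "foo" wins because it starts further left
    [ "foobar", qr/foo|foobar/ ],         # Rule 2: first alternative that fits, not the longest
    [ "zzxxyy", qr/x*y*/ ],               # Rule 3: zero x's and zero y's already match at offset 0
    [ "concat cat", qr/\bcat\b/ ],        # Rule 4: \b rejects the "cat" inside "concat"
    [ "100 USD, 200 EUR", qr/\d+(?= EUR)/ ],  # Rule 4: lookahead checks without consuming
) {
    my ($string, $pattern) = @$case;
    my $m = where_matched($string, $pattern);
    print "'$string': offset $m->[0], text '$m->[1]'\n";
}
```

The third case is the one that most often surprises: a pattern whose parts can all match nothing succeeds immediately with an empty match.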

  13. Rules
      • Rule 5
        • A quantified atom matches only if the atom itself matches some number of times allowed by the quantifier
      Maximal    Minimal    Allowed range
      {n,m}      {n,m}?     At least n, at most m
      {n,}       {n,}?      At least n
      {n}        {n}?       Exactly n
      *          *?         0 or more
      +          +?         1 or more
      ?          ??         0 or 1

  14. Rules
      • Rule 6
        • Each atom matches according to its type
        • (…) ==> grouping + storage in $1, $2
        • . matches any char except \n
        • […] character class (matches one char from the set)
        • Special characters \a \n \r …
        • \1 \2 ... backreference to (…)
        • \033 octal char
        • \xf7 hex char
        • \cD control char
        • any other backslashed char matches the char itself
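Several of the atom types in Rule 6 can be checked directly. In this sketch the helpers `matches` and `doubled_word` are hypothetical names, and the inputs are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $string matches $pattern, else 0.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# \1 refers back to whatever the first (...) group captured,
# so this finds a word that is immediately repeated.
sub doubled_word {
    my ($s) = @_;
    return $s =~ /\b(\w+)\s+\1\b/ ? $1 : undef;
}

print doubled_word("this is is a test"), "\n";   # the repeated word
print matches("a\nb", qr/a.b/),  "\n";           # . does not match "\n"
print matches("a\nb", qr/a.b/s), "\n";           # with /s it does
print matches("\e",   qr/^\033$/), "\n";         # octal escape for ESC
print matches("\xf7", qr/^\xf7$/), "\n";         # hex escape
print matches(chr(4), qr/^\cD$/),  "\n";         # control-D
print matches("+",    qr/^\+$/),   "\n";         # backslashed + is a literal +
```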

  15. precedence (highest to lowest)
      • () (?: )
      • Repetition
      • Sequence
      • | alternation
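The practical effect of this ordering shows up when anchors or quantifiers meet alternation and sequence. A small sketch, with the hypothetical helper `matches` and made-up strings:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $string matches $pattern, else 0.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# Alternation binds loosest: /^cat|dog$/ means (^cat) or (dog$).
print matches("catalog", qr/^cat|dog$/),     "\n";   # matches via ^cat
print matches("catalog", qr/^(?:cat|dog)$/), "\n";   # grouping anchors both alternatives

# Repetition binds tighter than sequence: in /ab+/ the + applies to b only.
print matches("abbb", qr/^ab+$/),     "\n";
print matches("abab", qr/^ab+$/),     "\n";
print matches("abab", qr/^(?:ab)+$/), "\n";          # grouping repeats the whole "ab"
```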

  16. How do you fix it?
          /('[^']'*')/
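The slide leaves the answer open. Assuming the pattern is meant to match a single-quoted string, one possible fix is to move the star so that it repeats the character class rather than a quote. The helper name `first_quoted` is hypothetical.

```perl
use strict;
use warnings;

# One possible reading of the fix (an assumption, not stated on the slide):
# an opening quote, any number of non-quote characters, a closing quote.
sub first_quoted {
    my ($s) = @_;
    return $s =~ /('[^']*')/ ? $1 : undef;
}

print first_quoted("say 'hello world' now"), "\n";
print first_quoted("empty '' is fine"), "\n";
```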

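  17. Examples
          s/^([^ ]+) +([^ ]+)/$2 $1/       # swap the first two words
          /(\w+)\s*=\s*\1/                 # the same text on both sides of =
          /.{40,}/                         # at least 40 characters
          /^(\d+\.?\d*|\.\d+)$/            # a decimal number
          if (/Time: (..):(..):(..)/) {
              $hours = $1;
              $minutes = $2;
              $seconds = $3;
          }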

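  18. Default arguments
      • $_, @_, @ARGV, STDIN
          sub foo {
              my $x = shift;    # inside a sub, shift defaults to @_
          }
      • in the main program, shift defaults to @ARGV
          while (defined($_ = shift)) {    # defined: an argument of "0" must not end the loop
              if (/^-(.*)/) {
                  process_option($1);
              } else {
                  process_file($_);
              }
          }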

  19. Reading a stream
          open FIN, "myfile" or die;
          while (<FIN>) {        # reads one line at a time
              # do something with $_
          }
          foreach (<FIN>) {      # reads all lines into a list first
              # do something with $_
          }
          print sort <FIN>;      # list context: all lines at once

  20. Reading a stream
          # print a window
          @f = <FIN>;
          foreach (0 .. $#f) {
              if ($f[$_] =~ /\bShazam\b/) {
                  $lo = ($_ > 0)   ? $_ - 1 : $_;
                  $hi = ($_ < $#f) ? $_ + 1 : $_;
                  print map { "$_: $f[$_]" } $lo .. $hi;
              }
          }

  21. Sorting
      • sort numerically
          sub numerically { $a <=> $b }
          @list = sort numerically (16, 1, 8, 2, 4, 32);
          # or
          @list = sort { $a <=> $b } (16, 1, 8, 2, 4, 32);
          @list = sort { uc($a) cmp uc($b) } qw(this is a test);
          # reverse
          @list = sort { $b <=> $a } (16, 1, 8, 2, 4, 32);
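A comparison block can also combine several keys. The sketch below orders hash keys by descending count and breaks ties alphabetically; the helper `by_count_then_name` and the counts are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: order the keys of a count hash by descending count,
# breaking ties alphabetically. "or" falls through to the second
# comparison only when the first one returns 0 (a tie).
sub by_count_then_name {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

my %count = (perl => 3, awk => 1, sed => 3, sh => 1);   # made-up counts
print join(" ", by_count_then_name(\%count)), "\n";      # perl sed awk sh
```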

  22. example
          #!/usr/bin/perl -w
          # This script will count the frequency of distinct words
          # in the file that is given as an argument.
          # Warning: Error checking is minimal!
          die "usage: $0 file\n" unless @ARGV;
          while (<>) {
              tr/A-Z/a-z/;                # translate to lowercase
              @w = split(/\W+/, $_);      # split into words
              foreach (@w) {
                  next unless length;     # skip the empty field left by a leading non-word char
                  $list{$_}++;            # increment the counter
              }
          }
          foreach $key (sort { $list{$b} <=> $list{$a} } keys %list) {
              print $key, ' = ', $list{$key}, "\n";
          }

  23. Tokenizing
          # tokenize an arithmetic expression
          while (length) {    # length, not truth: a remaining "0" must still be tokenized
              if (/^(\d+)/) {
                  push @tok, 'num', $1;
              } elsif (/^([+\-\/*()])/) {
                  push @tok, 'punct', $1;
              } elsif (/^([\d\D])/) {
                  die "invalid char $1 in input";
              }
              $_ = substr($_, length $1);
          }
      • substr slows things down
      • cut start of string

  24. Tokenizing 2
          while (/ (\d+) | ([+\-\/*()]) | ([\d\D]) /gx) {
              if (defined $1) {              # groups that did not take part are undef
                  push @tok, 'num', $1;
              } elsif (defined $2) {
                  push @tok, 'punct', $2;
              } else {
                  die "invalid char $3 in input";
              }
          }

  25. Tokenizing 3
          {
              if (/\G(\d+)/gc) {
                  push @tok, 'num', $1;
              } elsif (/\G([+\-\/*()])/gc) {
                  push @tok, 'punct', $1;
              } elsif (/\G([\d\D])/gc) {
                  die "invalid char $1 in input";
              } else {
                  last;
              }
              redo;
          }

  26. Use split for clarity
          ($a, $b, $c) = /^(\S+)\s+(\S+)\s+(\S+)/;
          ($a, $b, $c) = split /\s+/, $_;
          ($a, $b, $c) = split;
      Get the fifth field:
          ($a) = /[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;
          # or
          ($a) = /(?:[^:]*:){4}([^:]*)/;
          # or
          ($a) = (split /:/)[4];

  27. unpack
      ps l
            F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
          100  1216 30562 30561   7   0  2804 1768 rt_sig S    pts/2      0:00 -tcsh
          000  1216 30658 30562  10   0  2780 1080 -      R    pts/2      0:00 ps l

          chomp(@ps = `ps l`);
          shift @ps;                 # drop the header line
          for (@ps) {
              # @N jumps to absolute column N; An reads an n-character, space-padded field
              ($uid, $pid, $sz, $tt) = unpack '@3 A6 @9 A7 @30 A5 @52 A7', $_;
              print "$uid, $pid, $sz, $tt\n";
          }
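Because the offsets in an unpack template depend on the exact column layout, a self-contained sketch with a made-up fixed-width record may be easier to follow. The record layout and the helper `parse_record` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical fixed-width layout:
#   columns 0-5   name (space padded)
#   columns 7-10  numeric id
#   column  12-   role
sub parse_record {
    my ($line) = @_;
    # A6 reads six characters and strips trailing spaces,
    # @7 and @12 jump to absolute offsets, A* reads the rest.
    my ($name, $id, $role) = unpack 'A6 @7 A4 @12 A*', $line;
    return ($name, $id, $role);
}

for my $line ("alice  1001 admin", "bob    1002 guest") {
    my ($name, $id, $role) = parse_record($line);
    print "$name, $id, $role\n";
}
```

Compared with a regular expression, the template states the column positions directly, which suits output whose fields may themselves contain spaces.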

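  28. Avoid regex for simple strings
          do_it() if $answer eq 'yes';         # exact comparison
          do_it() if $answer =~ /^yes$/;       # also matches "yes\n"
          do_it() if $answer =~ /yes/;         # matches "yes" anywhere in the string
          do_it() if lc($answer) eq 'yes';     # case-insensitive comparison
          do_it() if $answer =~ /^yes$/i;      # case-insensitive regex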

  29.
          #!/usr/bin/perl
          # remove the comments from a C program
          $filename = shift or die "usage: $0 filename\n";
          open FIN, $filename or die "can't open file";
          while (<FIN>) {
              chomp;    # the loop prints one newline per input line itself
              # split on string literals and comment delimiters, keeping them as fields
              for (split m!("(?:\\\W|.)*?"|/\*|\*/)!) {
                  if ($in_comment) {
                      $in_comment = 0 if $_ eq "*/";
                  } else {
                      if ($_ eq "/*") {
                          $in_comment = 1;
                          print " ";
                      } else {
                          print;
                      }
                  }
              }
              print "\n";
          }

  30. References
          $a = 3.1416;
          $scalar_ref   = \$a;
          $array_ref    = \@a;
          $hash_ref     = \%a;
          $array_el_ref = \$a[3];
          $hash_el_ref  = \$a{'John'};
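Taking a reference is only half of the picture; the sketch below shows how such references might be dereferenced and how a write through a reference reaches the original variable. All variable names, values, and the helper `bump_all` are made up for illustration.

```perl
use strict;
use warnings;

my $pi   = 3.1416;
my @list = (10, 20, 30, 40);    # made-up data
my %age  = (John => 30);

my $scalar_ref   = \$pi;
my $array_ref    = \@list;
my $hash_ref     = \%age;
my $array_el_ref = \$list[3];

print $$scalar_ref, "\n";           # the scalar behind the reference
print scalar(@$array_ref), "\n";    # whole array, here in scalar context (its size)
print $array_ref->[1], "\n";        # one element via the arrow
print $hash_ref->{John}, "\n";      # one hash value via the arrow

# Writing through a reference changes the original variable.
$$array_el_ref = 99;
print "$list[3]\n";

# Hypothetical helper: increment every element of the caller's array in place.
sub bump_all {
    my ($aref) = @_;
    $_++ for @$aref;
    return;
}

bump_all(\@list);
print "@list\n";
```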

  31. Lists of Lists
          @LoL = (
              [ "fred", "barney" ],
              [ "george", "jane", "elroy" ],
              [ "homer", "marge", "bart" ],
          );
          print $LoL[2][2];              # prints "bart"
          $ref_to_LoL = [
              [ "fred", "barney" ],
              [ "george", "jane", "elroy" ],
              [ "homer", "marge", "bart" ],
          ];
          print $ref_to_LoL->[2][2];
      • Note: $LoL[2][2] implies $LoL[2]->[2]

  32. Grow your own
          while (<>) {
              @tmp = split;
              push @LoL, [ @tmp ];
          }

  33. Hashes of Arrays
          %HoL = (
              flinstones => [ "fred", "barney" ],
              jetsons    => [ "george", "jane", "elroy" ],
              simpsons   => [ "homer", "marge", "bart" ],
          );
      • generation
          # reading from a file with format:
          # flinstones: fred barney ..
          while (<>) {
              next unless s/^(.*?):\s*//;
              $HoL{$1} = [ split ];
          }
      • or
          while ($line = <>) {
              ($who, $rest) = split /:\s*/, $line, 2;
              @fields = split ' ', $rest;
              $HoL{$who} = [ @fields ];
          }

  34. Hashes of Arrays
          # calling a function
          for $group (qw(flinstones jetsons simpsons)) {
              $HoL{$group} = [ get_family($group) ];
          }
          # append members to an existing family
          push @{ $HoL{flinstones} }, "wilma", "betty";
      • access
          $HoL{flinstones}[0] = "fred";
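Once a hash of arrays is populated, walking it usually means looping over the keys and dereferencing each value. A minimal sketch, reusing the family data from the slides; the helper `family_lines` is a hypothetical name.

```perl
use strict;
use warnings;

my %HoL = (
    flinstones => [ "fred", "barney" ],
    jetsons    => [ "george", "jane", "elroy" ],
    simpsons   => [ "homer", "marge", "bart" ],
);

# Hypothetical helper: build one "family: members" line per key,
# sorted by family name.
sub family_lines {
    my ($hol) = @_;
    my @lines;
    for my $family (sort keys %$hol) {
        push @lines, "$family: " . join(", ", @{ $hol->{$family} });
    }
    return @lines;
}

push @{ $HoL{flinstones} }, "wilma";     # append to one family's array
print "$_\n" for family_lines(\%HoL);
print "jetsons has ", scalar @{ $HoL{jetsons} }, " members\n";
```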

  35. Packages, Modules, and Object Classes
