
A Stroll through Perl

A Stroll through Perl (R L Schwartz & T Christiansen, O’Reilly). Perl = Practical Extraction and Report Language. A major strength of Perl is the recognition and substitution of text sequences described by regular expressions.


Presentation Transcript


  1. A Stroll through Perl • (R L Schwartz & T Christiansen, O’Reilly) • PERL = Practical Extraction and Report Language. • A major strength of Perl is the recognition and substitution of text sequences called regular expressions. • This is useful for: • Web searching - are the query keywords in this web page? • Computation of frequencies in a document collection, e.g. to produce a stoplist, or mid-frequency terms for automatic indexing. • Making finite state transducers e.g. pluraliser, stemmer, americanizer. • Dialogue systems, e.g. ELIZA.

  2. “Hello World” Program • #!/usr/bin/perl -w • print "Hello, world!\n"; • The first line means “this is a Perl program”; -w tells Perl to generate warning messages. • Apart from the first line, every Perl statement ends with a semicolon ; • To run a Perl program from UNIX: • perl programname.pl • Comments: • # anything from the hash sign to the end of the line is a comment

  3. Scalar Variables • Now get the “Hello, world” program to call you by your name. To do this, we need a place to hold the name, a way to ask for the name, and a way to get a response. • One place to hold values (like a name) is as a scalar variable. Here we will use the scalar variable $name to hold your name. A scalar variable starts with $ and can hold either a single number or a string (sequence of characters).

  4. print, <STDIN>, chomp • The program needs to ask for the name (prompt): use the print function. • The way to get a line from the terminal is with the <STDIN> construct, which grabs one line of input. We assign this input to the $name variable. This gives us the program: • print "What is your name?"; • $name = <STDIN>; • The value of $name has a terminating newline \n. To get rid of that, we use the chomp function: • chomp ($name); • Now we can reply with: • print "Hello, $name!\n"; • (what does this do?)

  5. Putting it all together we get: • #!/usr/bin/perl -w • print "What is your name?"; • $name = <STDIN>; • chomp ($name); • print "Hello, $name!\n";

  6. Adding Choices • Let’s say we have a special greeting for Randal, but we want an ordinary greeting for anyone else. To do this, we need to compare the name that was entered with the string Randal, and if it’s the same, do something special. Let’s add a C-like if-then-else branch and a comparison to the program: • #!/usr/bin/perl -w • print "What is your name?"; • $name = <STDIN>; • chomp ($name); • if ($name eq "Randal"){ • print "Hello Sir Randal!\n"; • } • else { • print "Hello, $name!\n"; • }

  7. Guessing the Secret Password • What does this code do? • #!/usr/bin/perl -w • $secretword = "llama"; # the secret word • print "What is the secret password?"; • $guess = <STDIN>; • chomp($guess); • while ($guess ne $secretword) { • print "Wrong, try again:\n"; • $guess = <STDIN>; • chomp($guess); • } • First, we define the secret word by putting it into another scalar variable, $secretword. The person is asked (using print) for a guess, which goes into $guess. The guess is compared with the secret word using the ne operator, which returns true if the strings are not equal (this is the logical opposite of the eq operator). The result of the comparison controls a while loop, which executes the block as long as the ne comparison remains true.

  8. Arrays • We can store several secret words in a sort of list, a data structure called an array. Each element of the array is a separate scalar variable that can be independently set or accessed. The entire array can also be given a value in one fell swoop. We can assign a value to the entire array named @words so that it contains three possible good passwords. • @words = ("camel","llama","alpaca"); • or • @words = qw(camel llama alpaca); • Note arrays begin with @, while scalar variables begin with $. • Once the array is assigned, we can access each element using a subscript reference. So $words[0] is camel, $words[1] is llama, and $words[2] is alpaca. The subscript can be an expression as well, so if we set $i = 2 then $words[$i] is alpaca. • Note: array elements start with $ rather than @ because they refer to a single element of an array rather than the whole array.

  9. More than one Secret Word • #!/usr/bin/perl -w • @words = qw(camel llama alpaca); • print "What is the secret password?"; • $guess = <STDIN>; • chomp($guess); • $i = 0; • $correct = "maybe"; • while($correct eq "maybe"){ • if($words[$i] eq $guess){ • $correct = "yes"; • } • elsif ($i < 2){ • $i = $i + 1; • } • else { • print "Wrong, try again:"; • $guess = <STDIN>; • chomp ($guess); • $i = 0; • } • } • This program also shows the elsif block of the if-then-else statement. Perl doesn’t have C’s switch statement, so in Perl we tend to compare a set of conditions in an if-elsif-elsif-elsif-else type chain.

  10. Hashes • Giving each person a different secret word: • The easiest way to store such a table in Perl is with a hash. • Each element of the hash holds a separate scalar value (just like an array), but the elements are referenced by a key, which can be any scalar value (string or number). • To create a hash called %words (notice the % rather than @) we can write: • %words = qw( • fred camel • barney llama • betty alpaca • wilma alpaca • ); • To find the secret word for Betty, we need to use betty as the key in a reference to the hash %words, via some expression such as • $words{"betty"} which will return alpaca • or • $person = "betty"; • $words{$person} which will also return alpaca.

  11. Trying to look up a word not in the hash • When we look up someone’s secret word, if their name is not one of the hash keys, the value of $secretword will be undef, which compares equal to the empty string (with a warning under -w), e.g: • { instantiate %words, get $name first, then:} • $secretword = $words{$name}; • if($secretword eq ""){ • print "secret word not found\n"; • } • else { • print "your secret word is $secretword\n"; • }
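
A minimal sketch of the same lookup using exists, which tests for the key itself and so avoids comparing an undefined value. The helper name lookup_word is an assumption made for this example; it is not part of the slides' program.

```perl
use strict;
use warnings;

# Same table as on the slides.
my %words = qw(
    fred   camel
    barney llama
    betty  alpaca
    wilma  alpaca
);

# lookup_word is a hypothetical helper, not part of the original program.
sub lookup_word {
    my ($name) = @_;
    # exists checks whether the key is present, so undef is never compared
    return exists $words{$name}
        ? "your secret word is $words{$name}"
        : "secret word not found";
}

print lookup_word("betty"), "\n";
print lookup_word("dino"),  "\n";
```

Testing the key rather than the value also keeps a name whose stored word happens to be the empty string distinct from a missing name.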

  12. Handling Varying Input Formats • How do we make our password checker accept Randal, randal, or Randal L. Schwartz? • if ($name =~ /^Randal\b/i) { • # yes, it matches • } • else { • # no, it doesn’t • } • Notes: eq is for exact equality, =~ for pattern matching. • The regular expression is delimited by forward slashes. • /^Randal/ means any string starting with Randal. • /^Randal\b/ means there must be a word boundary after Randal (the end of the string, or a non-word character such as a space), so Randall is excluded. • /^Randal\b/i means that we ignore case, so randal is accepted.
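
The pattern can be tried against a handful of names. This is only a sketch: the helper is_randal and the list of test names are assumptions made for illustration.

```perl
use strict;
use warnings;

# is_randal is a hypothetical helper wrapping the pattern from the slide.
sub is_randal {
    my ($name) = @_;
    # ^ anchors at the start, \b demands a word boundary, /i ignores case
    return $name =~ /^Randal\b/i ? 1 : 0;
}

for my $name ("Randal", "randal", "Randal L. Schwartz", "Randall", "Sir Randal") {
    printf "%-20s %s\n", $name, is_randal($name) ? "matches" : "does not match";
}
```

The first three names match; Randall fails because there is no word boundary after the first l, and Sir Randal fails because of the ^ anchor.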

  13. Two Text Converters • We can write a case converter by using the translate operator. • $name =~ tr/A-Z/a-z/; • The slashes delimit the searched-for and replacement character lists. The hyphen stands for all the characters between A and Z (or a and z), so the two lists are the same length (26 characters). • We can replace the word Eurasia with Eastasia using the substitution operator. • $temp =~ s/Eastasia/XXXX/; • $enemy =~ s/Eurasia/Eastasia/; • $ally =~ s/XXXX/Eurasia/;
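
A short sketch of both operators applied to copies of a string, so the original is left untouched. The helper names lowercase, rename_first and rename_all, and the sample sentence, are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers: each works on its own copy of the argument.
sub lowercase {
    my ($s) = @_;
    $s =~ tr/A-Z/a-z/;            # translate character by character
    return $s;
}

sub rename_first {
    my ($s) = @_;
    $s =~ s/Eurasia/Eastasia/;    # first occurrence only
    return $s;
}

sub rename_all {
    my ($s) = @_;
    $s =~ s/Eurasia/Eastasia/g;   # g: every occurrence
    return $s;
}

my $text = "We are at war with Eurasia. Eurasia is the enemy.";
print lowercase("Randal L. Schwartz"), "\n";
print rename_first($text), "\n";
print rename_all($text), "\n";
```

Note that tr/// works on single characters while s/// works on patterns, which is why only the latter takes the g modifier.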

  14. Making it Modular • Perl provides subroutines that have parameters and return values. A subroutine is defined once in a program, and can be used repeatedly by being invoked from any expression. • Let’s create a subroutine called good_word that takes a name and a guessed word, and returns true if the word is correct and false if not: • sub good_word { • my($somename, $someguess) = @_; • # name the parameters • if ($words{$somename} eq $someguess) { • return 1; # true • } • else { • return 0; # false • } • }

  15. Subroutines • First, the definition of a subroutine consists of the reserved word sub followed by the subroutine name followed by a block of code { delimited by curly braces }. The definition can go anywhere in the program file, though most people put it at the end. • The first line within this particular definition is an assignment that copies the values of the two parameters of this subroutine into two local variables named $somename and $someguess. • The my() declares the two variables as private to the enclosing block (in this case the whole subroutine), and the parameters are initially in a special local array called @_. • A return statement can be used to make the subroutine immediately return to its caller with the supplied value. • Note that the subroutine assumes that the value of the %words hash is set by the main program.

  16. Let’s Integrate this with the Rest of the Program • #!/usr/bin/perl • %words = qw{ • fred camel • barney llama • betty alpaca • wilma alpaca • }; • print "What is your name? "; • $name = <STDIN>; • chomp($name); • print "What is the secret word? "; • $guess = <STDIN>; • chomp($guess); • while (! good_word($name, $guess)) { • print("Wrong, try again: "); • $guess = <STDIN>; • chomp($guess); • } • # insert definition of good_word here …

  17. While, ! • The condition of the while loop calls the subroutine good_word. Here we see an invocation of the subroutine, passing it two parameters, $name and $guess. Inside the subroutine, the value of $somename is set from the first parameter, $name, and the value of $someguess is set from the second parameter, $guess. • The value returned by the subroutine (either 1 or 0) is logically inverted with the prefix ! (logical not) operator. This expression returns true if the expression following is false, and returns false if the expression following is true. The overall meaning is “while it’s not a good word …”
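
The subroutine and the while loop can be exercised without a terminal. In this sketch the guesses come from a list instead of <STDIN>; the helper count_tries and the extra exists check for unknown names are assumptions added for the example, not part of the slides' program.

```perl
use strict;
use warnings;

my %words = qw(
    fred   camel
    barney llama
    betty  alpaca
    wilma  alpaca
);

sub good_word {
    my ($somename, $someguess) = @_;
    return 0 unless exists $words{$somename};   # unknown name: never a good word
    return $words{$somename} eq $someguess ? 1 : 0;
}

# count_tries is a hypothetical stand-in for the interactive loop.
# It returns how many guesses were needed, or 0 if none was right.
sub count_tries {
    my ($name, @guesses) = @_;
    my $tries = 1;
    my $guess = shift @guesses;
    while (defined $guess && !good_word($name, $guess)) {
        $guess = shift @guesses;    # "Wrong, try again"
        $tries++;
    }
    return defined $guess ? $tries : 0;
}

print count_tries("barney", qw(camel alpaca llama)), "\n";
```

The loop condition reads the same way as on the slide: keep going while the current guess is not a good word.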

  18. Moving the Secret Word List into a separate file • Suppose we wanted to share the secret word list among three programs, e.g. for simultaneous updating. We can put the word list into a file and then read the file to get the word list into the program. To do this, we need to create an I/O channel called a filehandle. Your Perl program automatically gets three filehandles called STDIN, STDOUT and STDERR. Now we want another handle attached to a file of our own choice. • sub init_words { • open (WORDSLIST, "wordslist") || die "can't open wordslist: $!"; • while ( defined ($name = <WORDSLIST>)) { • chomp ($name); • $word = <WORDSLIST>; • chomp ($word); • $words{$name} = $word; • } • close (WORDSLIST) || die "couldn't close wordslist: $!"; • }

  19. The (arbitrary) form of the word list • fred • camel • barney • llama • betty • alpaca • wilma • alpaca • The open function initialises a filehandle named WORDSLIST by associating it with a file named wordslist in the current directory. • while ( defined ($name = <WORDSLIST>) ) { • i.e. while there are still lines in the data file to read • The die function is frequently used to exit the program with an error message in case something goes wrong, e.g. the word list file is not found. $! contains the system error message explaining what went wrong.
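
The same reading loop can be tried without creating a file by opening a filehandle on a string. This sketch departs from the slides in a few labelled ways: it uses a lexical filehandle and the three-argument form of open, and the helper read_words (a hypothetical variant of init_words) returns the hash instead of filling a global one.

```perl
use strict;
use warnings;

# read_words is a hypothetical variant of init_words: it reads
# name/word line pairs from any open filehandle.
sub read_words {
    my ($fh) = @_;
    my %table;
    while (defined(my $name = <$fh>)) {
        chomp $name;
        my $word = <$fh>;
        last unless defined $word;   # odd number of lines: drop the last name
        chomp $word;
        $table{$name} = $word;
    }
    return %table;
}

# In-memory stand-in for the wordslist file (an assumption for this sketch).
my $data = "fred\ncamel\nbarney\nllama\nbetty\nalpaca\nwilma\nalpaca\n";
open(my $fh, '<', \$data) or die "can't open in-memory word list: $!";
my %words = read_words($fh);
close($fh) or die "couldn't close word list: $!";

print "$_ => $words{$_}\n" for sort keys %words;
```

To read the real file, the open line would name it instead, for example open(my $fh, '<', "wordslist").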

  20. Three More Loops • 1. To print out scalar variables: • This example prints the numbers 1 to 10, each followed by a space: • for ($i = 1; $i <= 10; $i++){ • print "$i "; • } • The above code is very similar to C++. • 2. To print out the contents of an array: • foreach $item (@somelist) { • print "$item\n"; • } • The foreach statement takes a list of values and assigns them one at a time to a scalar variable, executing a block of code with each successive value. • 3. To print out the contents of a hash: • foreach $key (keys(%freqhash)) { • print "$key $freqhash{$key}\n"; • }
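
A point that is easy to trip over: foreach hands the block each element, not its index. The sketch below shows both styles side by side; the helper numbered and the sample list are assumptions made for this example.

```perl
use strict;
use warnings;

# numbered is a hypothetical helper: it pairs each element with its index.
sub numbered {
    my @list = @_;
    return map { sprintf("%d: %s", $_, $list[$_]) } 0 .. $#list;
}

my @somelist = qw(camel llama alpaca);

# Loop over the values: $item is each element in turn.
foreach my $item (@somelist) {
    print "$item\n";
}

# Loop over the indices 0 .. $#somelist when the position matters.
print "$_\n" for numbered(@somelist);
```

$#somelist is the index of the last element, so 0 .. $#somelist covers every valid subscript.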

  21. Regular Expressions • See Chapter 7 of “Learning Perl”, by R L Schwartz & T Christiansen, O’Reilly, 1993. • A regular expression is a pattern to be matched against a string. • e.g. Is put found in computer? Succeeds. • Is michael found in computer? Fails. • Sometimes match success or failure is all you are concerned about. Other times you want to match and replace. • e.g. Find put in computer and replace with pil. If the match is unsuccessful, nothing happens. • $_ is Perl’s default variable – we don’t have to declare it.

  22. Search, Substitution • Print out every line in the file specified on the command line which contains abc: • while (<>) { • if(/abc/){ • print $_; • } • } • Substitution. If abc is found in $_, replace it with def (g means every time). • s/abc/def/g;

  23. Patterns • A regular expression is a pattern. Some parts of the pattern match single characters, others match multiple characters. • . stands for any single character except \n (newline). • /a./ any two-character sequence that starts with a but is not a\n • /[abcde]/ matches a, b, c, d, or e. (“character class”) • /[a-zA-Z0-9_]/ matches a Perl “word” character. • /[^0-9]/ any NON-digit (“negated character class”) • character class abbreviations: • \d digit • \D non-digit • \w Perl “word” character • \W not a Perl “word” character • \s space character (\r \t \n \f or " ") • All of the above match one character. We now look at “grouping patterns”: • * zero or more of the immediately previous character or character class. • + one or more of the immediately previous character or character class. • ? zero or one of the immediately previous character or character class.

  24. Patterns are greedy by default • $_ = "fred xxxxxx barney"; • s/x+/boom/; • now $_ = "fred boom barney", because x+ matched as many x characters as it could (all six). • /x{3}/ would mean match against exactly xxx.
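
The effect of greediness shows up most clearly next to a non-greedy version of the same substitution. In this sketch the +? form (one or more, but as few as possible) is standard Perl that the slide does not mention, and the three helper names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helpers: each applies one substitution to a copy.
sub replace_greedy {
    my ($s) = @_;
    $s =~ s/x+/boom/;      # as many x as possible
    return $s;
}

sub replace_lazy {
    my ($s) = @_;
    $s =~ s/x+?/boom/;     # as few x as possible (one)
    return $s;
}

sub replace_three {
    my ($s) = @_;
    $s =~ s/x{3}/boom/;    # exactly three x
    return $s;
}

my $line = "fred xxxxxx barney";
print replace_greedy($line), "\n";
print replace_lazy($line),   "\n";
print replace_three($line),  "\n";
```

Only the greedy form consumes the whole run of x characters; the other two leave the unmatched x characters in place after boom.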

  25. Parentheses as memory, anchoring patterns, alternation • Parentheses as memory (and for grouping): • abc* matches ab, abc, abcc, abccc, abcccc etc. • (abc)* matches "", abc, abcabc, abcabcabc etc. • Anchoring patterns: • /fred\b/; matches fred and alfred but not frederick • /\bfred/; matches fred and frederick but not alfred • /\bfred\b/; matches fred but neither frederick nor alfred. • Alternation: • (song|blue)bird matches songbird or bluebird
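
The claims on this slide can be checked mechanically. The sketch below runs each pattern against a few strings; the helper matches and the table of cases are assumptions for illustration, and the ^ and $ anchors are added to the first two patterns so that the whole string has to match.

```perl
use strict;
use warnings;

# matches is a hypothetical helper that turns a match into 1 or 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

my @cases = (
    [ qr/^abc*$/,           "abccc"     ],
    [ qr/^(abc)*$/,         "abcabc"    ],
    [ qr/^(abc)*$/,         "abccc"     ],
    [ qr/fred\b/,           "alfred"    ],
    [ qr/fred\b/,           "frederick" ],
    [ qr/\bfred/,           "frederick" ],
    [ qr/\bfred/,           "alfred"    ],
    [ qr/\bfred\b/,         "fred"      ],
    [ qr/(song|blue)bird/,  "bluebird"  ],
    [ qr/(song|blue)bird/,  "blackbird" ],
);

for my $case (@cases) {
    my ($pattern, $string) = @$case;
    printf "%-24s [%s] %s\n", $pattern, $string,
        matches($pattern, $string) ? "match" : "no match";
}
```

qr// builds a pattern object that can be stored in a list and used later on the right-hand side of =~.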

  26. Selecting a different target (the =~ operator) • $a = "hello world"; • if($a =~ /he/) { • # do something … • } • $a =~ s/hello/goodbye/; • Special read-only variables • $_ = "this is a sample string"; • /sam.le/; # matches "sample" within the string • # $` is now "this is a " • # $& is now "sample" • # $' is now " string" • More substitutions • $_ = "this is a test"; • $new = "quiz"; • s/test/$new/; # now $_ = "this is a quiz"
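
A small sketch that captures the three match variables so the surrounding spaces become visible. The helper around_match is an assumption made for this example.

```perl
use strict;
use warnings;

# around_match is a hypothetical helper: it returns the text before
# the match, the match itself, and the text after it.
sub around_match {
    my ($string) = @_;
    return unless $string =~ /sam.le/;
    return ($`, $&, $');
}

my ($before, $matched, $after) = around_match("this is a sample string");
# Brackets make the leading and trailing spaces easy to see.
printf "[%s] [%s] [%s]\n", $before, $matched, $after;
```

The variables are only meaningful after a successful match, which is why the helper returns early when the pattern fails.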

  27. Basic Data Structures • $scalar - single number or string • @array - list e.g. • @flintstones = qw(fred barney betty wilma); • $flintstones[2] is "betty" • foreach $member (@flintstones){ • print "$member\n"; • } • %hash, e.g. frequency list %freq built up by: • $freq{"the"} = 100; • $freq{"chandelier"} = 1; • $freq{$string} = 5; • foreach $key (keys (%freq)) { # once for each key of %freq • print "$key was found $freq{$key} times\n"; # show key and value • }
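
In practice a frequency list is usually built by counting rather than by assigning totals by hand. A minimal sketch, assuming that a "word" is a run of \w characters and that case should be ignored; the helper count_words and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# count_words is a hypothetical helper: it returns a hash mapping
# each lower-cased word to the number of times it occurs.
sub count_words {
    my ($text) = @_;
    my %freq;
    for my $word ($text =~ /\w+/g) {   # every run of word characters
        $freq{ lc($word) }++;          # a missing key starts from zero
    }
    return %freq;
}

my %freq = count_words("The cat saw the other cat");
foreach my $key (sort keys %freq) {
    print "$key was found $freq{$key} times\n";
}
```

Incrementing a key that does not exist yet creates it, so no initialisation step is needed.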

  28. Sorting: arrays • @x = qw(small medium large); • @y = sort @x; • Now @y is (large medium small). • @x = (15, 27, 9, 49, 14); • @y = sort @x; • Now @y is (14, 15, 27, 49, 9). • @x = (15, 27, 9, 49, 14); • @y = sort { $a <=> $b } @x; • Now @y is (9, 14, 15, 27, 49).

  29. Sorting: hashes • Sort by alphabetic order of keys, or numeric order of values • @sortedkeys = sort by_names keys(%freqhash); • sub by_names { • return $a cmp $b; • } • foreach (@sortedkeys) { • print "$_ is found $freqhash{$_} times\n"; • } • @sortedkeys = sort by_number keys(%freqhash); • sub by_number { • return $freqhash{$a} <=> $freqhash{$b}; • } • foreach (@sortedkeys) { • print "$_ is found $freqhash{$_} times\n"; • }
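
For a stoplist one usually wants the most frequent words first. The sketch below sorts keys by descending count and breaks ties alphabetically; the helper keys_by_frequency and the sample counts are assumptions made for this example.

```perl
use strict;
use warnings;

# keys_by_frequency is a hypothetical helper: highest count first,
# ties broken by alphabetical order of the key.
sub keys_by_frequency {
    my (%freq) = @_;
    return sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;
}

my %freqhash = (the => 100, of => 50, a => 50, chandelier => 1);
foreach my $key (keys_by_frequency(%freqhash)) {
    print "$key is found $freqhash{$key} times\n";
}
```

Swapping $a and $b in the numeric comparison reverses the order; the or only falls through to cmp when the two counts are equal.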

  30. Array of arrays (2D arrays) • @AoA = ( • [ "fred", "barney" ], • [ "george", "jane", "elroy" ], • [ "homer", "marge", "bart" ], • ); • print $AoA[2][1]; # prints "marge" • for $x (0 .. 9) { • for $y (0 .. 9) { • $AoA[$x][$y] = $x * $y; • } • } • while (<>) { # read in a line of text • @tmp = split; # split elements into a 1D array • push @AoA, [@tmp]; # add 1D array as the next row of a 2D array • } • for $i (0 .. $#AoA) { # for each row in AoA • $row = $AoA[$i]; # $row is a reference to the row's 1D array, • # hence the $ rather than @ • for $j (0 .. $#{$row}) { # for each element of that 1D array • print "element $i $j is $AoA[$i][$j]\n"; • } • }
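
A runnable sketch of the row-by-row traversal, with rows of different lengths. The helper describe_rows is an assumption made for this example; it collects the lines instead of printing them directly.

```perl
use strict;
use warnings;

# describe_rows is a hypothetical helper: it walks a list of array
# references and returns one line of text per element.
sub describe_rows {
    my @rows = @_;
    my @lines;
    for my $i (0 .. $#rows) {
        my $row = $rows[$i];              # reference to the row's array
        for my $j (0 .. $#{$row}) {       # last index of that row
            push @lines, "element $i $j is $row->[$j]";
        }
    }
    return @lines;
}

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane",  "elroy" ],
    [ "homer",  "marge", "bart" ],
);

print "$_\n" for describe_rows(@AoA);
```

$row->[$j] and $AoA[$i][$j] name the same element; the arrow form is used here because the helper only has the reference to hand.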

  31. Hashes of Hashes • %HoH = ( • flintstones => { • husband => "fred", • pal => "barney", • }, • jetsons => { • husband => "george", • wife => "jane", • "his boy" => "elroy", • }, • simpsons => { • husband => "homer", • wife => "marge", • kid => "bart", • }, • ); • To add another hash to the hash of hashes, you can simply say: • $HoH{ mash } = { • captain => "pierce", • major => "burns", • corporal => "radar", • };

  32. Populating a Hash of Hashes • Here is one technique for populating a hash of hashes. To read from a file with the following format: • flintstones: husband=fred pal=barney wife=wilma pet=dino • while ( <> ) { • next unless s/^(.*?):\s*//; # remove the family name and colon from the start of the line • $who = $1; • # $1 is the first parenthesised part of the regular expression • for $field (split) { • # for each whitespace-separated field left on the line • ($key, $value) = split /=/, $field; • # cut each key=value pair at = • $HoH{$who}{$key} = $value; • } • }

  33. To set a key/value pair, and print out a hash of hashes • You can set a key/value pair of a hash of hashes as follows: • $HoH{flintstones}{wife} = "wilma"; • To print out the families, loop through all the keys of the outer hash and then loop through the keys of the inner hash: • for $family ( keys %HoH) { • print "$family: "; • for $role (keys %{ $HoH{$family} } ) { • print "$role=$HoH{$family}{$role} "; • } • print "\n"; • }
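
Populating and printing can be combined into one self-contained sketch. Here the input lines come from a list rather than from <>, the keys are sorted so the output order is stable, and the helper parse_families is an assumption made for this example.

```perl
use strict;
use warnings;

# parse_families is a hypothetical helper: it turns lines of the form
# "family: key=value key=value" into a hash of hashes.
sub parse_families {
    my @lines = @_;
    my %HoH;
    for my $line (@lines) {
        next unless $line =~ s/^(.*?):\s*//;   # strip "family:" and keep the name
        my $who = $1;
        for my $field (split ' ', $line) {     # whitespace-separated fields
            my ($key, $value) = split /=/, $field;
            $HoH{$who}{$key} = $value;
        }
    }
    return %HoH;
}

my %HoH = parse_families(
    "flintstones: husband=fred pal=barney wife=wilma pet=dino",
    "jetsons: husband=george wife=jane",
);

for my $family (sort keys %HoH) {
    print "$family: ";
    for my $role (sort keys %{ $HoH{$family} }) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}
```

Assigning to $HoH{$who}{$key} creates the inner hash on first use, so no family has to be set up in advance.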

  34. More advanced data structures • Also possible: Arrays of hashes, hashes of arrays, hashes of functions and more elaborate records. See chapter 9 of “Programming Perl” by Larry Wall, Tom Christiansen & Jon Orwant, O’Reilly, 3rd edition.

  35. ELIZA (1) • Substitutions may use memory • e.g. /the (.*)er they were, the \1er they will be/ • will match the bigger they were, the bigger they will be but not the bigger they were, the faster they will be. • Substitutions using memory are very useful in implementing a simple natural-language understanding program like ELIZA (Weizenbaum, 1966), which could carry on conversations like the following:

  36. ELIZA (2) • User: Men are all alike. • ELIZA: IN WHAT WAY • User: They’re always bugging us about something or other. • ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE • User: Well, my boyfriend made me come here. • ELIZA: YOUR BOYFRIEND MADE YOU COME HERE • User: He says I’m depressed much of the time. • ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED. • ELIZA works by searching the user’s sentence for regular expressions and substituting them, e.g. s/my/YOUR/ and s/I’m/YOU ARE/, and then: • s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/ • s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
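
A toy sketch of the substitute-then-match idea, covering only the rules quoted on the slide. It is not Weizenbaum's program: the helper eliza_reply, the \b word boundaries, the /i modifiers and the fallback reply PLEASE GO ON are all assumptions added for this example, and the replacement text uses $1 where the slide writes \1. It also assumes the input uses a plain ASCII apostrophe.

```perl
use strict;
use warnings;

# eliza_reply is a hypothetical helper covering only the slide's rules.
sub eliza_reply {
    my ($input) = @_;
    my $s = $input;

    # Step 1: swap the speaker's point of view.
    $s =~ s/\bmy\b/YOUR/gi;
    $s =~ s/\bI'm\b/YOU ARE/gi;

    # Step 2: the first rule that matches produces the reply.
    return "I AM SORRY TO HEAR YOU ARE " . uc($1)
        if $s =~ /YOU ARE (depressed|sad)\b/;
    return "CAN YOU THINK OF A SPECIFIC EXAMPLE"
        if $s =~ /\balways\b/;

    return "PLEASE GO ON";    # assumed fallback, not from the slides
}

print eliza_reply("He says I'm depressed much of the time."), "\n";
print eliza_reply("They're always bugging us about something or other."), "\n";
```

The order of the rules matters: a sentence containing both "always" and "I'm sad" is answered by the first rule listed.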

