LING 388: Language and Computers

Sandiway Fong. Lecture 4: 8/30.

Presentation Transcript


  1. LING 388: Language and Computers
     Sandiway Fong
     Lecture 4: 8/30

  2. Today’s Lecture
     • Recap
     • More on Perl and regexps
     • Homework 1
       • due next Thursday
       • in my mailbox by midnight

  3. Perl: recap
     • Variables: always prefixed by $, e.g. $count, $i
     • Assignment and arithmetic expressions, e.g.
         $count = 0;
         $count = $count + 1;
         $count++;    (auto-increment)
     • Arithmetic operators:
         +    addition
         -    subtraction
         *    multiplication
         **   exponentiation
         /    division
     • Variables and strings:
         $i = "this";
         $i = $i . " moment";
       . is the string concatenation operator
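The recap on slide 3 can be tried out with a short script. This is a minimal sketch rather than code from the lecture: `use strict`, `use warnings`, the `my` declarations, and the variable names `$power` and `$quotient` are assumptions added here for illustration.

```perl
use strict;
use warnings;

# Assignment and arithmetic on a scalar variable.
my $count = 0;
$count = $count + 1;   # $count is now 1
$count++;              # auto-increment: $count is now 2

# ** is exponentiation; / is ordinary (non-integer) division.
my $power    = 2 ** 10;
my $quotient = 7 / 2;

# . is the string concatenation operator.
my $i = "this";
$i = $i . " moment";

print "count=$count power=$power quotient=$quotient i=$i\n";
# prints: count=2 power=1024 quotient=3.5 i=this moment
```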

  4. Perl: recap
     • Example:
         $i = 99;
         $j = 100;
         if ($j > $i) {
             print "$j greater than $i\n"
         } else {
             print "$j less than $i\n"
         }
     • Substitute gt for > and a surprising result obtains: the else branch runs and prints "100 less than 99".
     • Reason: gt compares its operands as strings, character by character from left to right, and the ASCII code of 1 (49) is less than that of 9 (57), so "100" sorts before "99".
     • Numeric comparisons:
         ==   equality
         !=   inequality
         <    less than
         >    greater than
         <=   less than or equal
         >=   greater than or equal
     • String comparisons:
         eq   equality
         ne   inequality
         lt   less than
         gt   greater than
         le   less than or equal
         ge   greater than or equal
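To see the difference between > and gt from slide 4 side by side, something like the following could be run. It is a sketch, not the lecture's own code: the `my` declarations and the names `$numeric` and `$string` are assumptions for illustration, and `ord` is the Perl built-in that returns a character's code.

```perl
use strict;
use warnings;

my $i = 99;
my $j = 100;

# Numeric comparison: 100 > 99 is true.
my $numeric = ($j > $i) ? "greater" : "not greater";

# String comparison: "100" vs "99" is decided by the first characters,
# and "1" (code 49) sorts before "9" (code 57), so gt is false.
my $string = ($j gt $i) ? "greater" : "not greater";

print "with >  : $j is $numeric than $i\n";
print "with gt : $j is $string than $i\n";
print "ord('1') = ", ord("1"), ", ord('9') = ", ord("9"), "\n";
```

The usual rule of thumb is to use the symbolic operators when the values are meant as numbers and the word operators when they are meant as strings.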

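  5. More Perl
     • Iteration (while loop):
         $i = 10;
         while ($i > 0) { $i-- }
       counts $i down: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, then the loop exits with $i equal to 0
     • Iteration (for loop):
         $max = 7;
         for ($i = 0; $i <= $max; $i++) { ... }
       counts $i up: the body runs for 0, 1, 2, 3, 4, 5, 6, 7, then the loop exits with $i equal to 8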
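The two loops on slide 5 can be instrumented to show which values the loop body actually sees and what the counter holds afterwards. This is only a sketch: the arrays `@down` and `@up`, the `push` calls, and the second counter `$k` are additions for illustration and are not part of the slides.

```perl
use strict;
use warnings;

# while loop: record $i each time the body runs.
my @down;
my $i = 10;
while ($i > 0) {
    push @down, $i;
    $i--;
}
print "while body saw: @down (final value $i)\n";

# for loop: same idea, counting up to and including $max.
my @up;
my $max = 7;
my $k;
for ($k = 0; $k <= $max; $k++) {
    push @up, $k;
}
print "for body saw: @up (final value $k)\n";
```

The final values (0 and 8) are the first ones that fail the loop test, which is why they appear at the end of the counts even though the body never runs for them.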

  6. Perl and regexps
     • We have already seen how to incorporate regexp matching in a Perl program:
         open (F, $ARGV[0]) or die "$ARGV[0] not found!\n";
         while (<F>) {
             print $_ if (/regexp/);
         }
     • By default, /regexp/ matches against the value of the variable $_ (filled by <F>)
     • We can also match against a variable of our own choosing using the =~ operator:
         $x = "this string";
         if ($x =~ /^this/) { print "ok" }
     • Matching is case sensitive by default; this can be changed using the modifier i:
         /regexp/i
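The points on slide 6 about =~, $_, and the i modifier can be checked without a data file. The sketch below is an illustration under assumptions: the test strings, the names `$sensitive`, `$insensitive`, and `@hits`, and the `for` loop standing in for `while (<F>)` are all made up here.

```perl
use strict;
use warnings;

my $x = "This string";

# Case sensitive by default: lowercase "this" does not match "This".
my $sensitive   = ($x =~ /^this/)  ? "match" : "no match";
# The i modifier makes the match case insensitive.
my $insensitive = ($x =~ /^this/i) ? "match" : "no match";

print "/^this/  : $sensitive\n";
print "/^this/i : $insensitive\n";

# Without =~, a bare /regexp/ is matched against $_.
# Here a for loop places each string in $_, much as <F> does per line.
my @hits;
for ("this one", "not this one", "This too") {
    push @hits, $_ if /^this/i;
}
print "matched: $_\n" for @hits;
```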

  7. Perl and regexps
     • Multiple matches within a string can be made using the g modifier with a loop:
         $x = "the cat sat on the mat";
         while ( $x =~ /the/ ) { print "match!\n" }
       goes into an infinite loop and keeps printing match!, because every test restarts the search from the beginning of $x
     • whereas:
         while ( $x =~ /the/g ) { print "match!\n" }
       prints match! twice, because with g each test resumes where the previous match ended
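A small variation on the slide 7 loop counts the matches instead of only printing them. This is a sketch: the counter `$count` and the call to `pos`, the Perl built-in that reports where the last g match on a string ended, are additions for illustration.

```perl
use strict;
use warnings;

my $x = "the cat sat on the mat";

# With g, each test of the condition continues from the end of the
# previous match, and the loop stops when no further match is found.
my $count = 0;
while ($x =~ /the/g) {
    $count++;
    print "match! (search will resume at offset ", pos($x), ")\n";
}
print "total matches: $count\n";
```

Dropping the g here would reproduce the infinite loop described on the slide, so that variant is best left untested.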

  8. Perl and regexps
     • Grouping uses the metacharacters ( and ) to delimit a group
       • inside a regexp, each group can be referenced using \1, \2, and so on
       • outside a regexp, each group is stored in a variable $1, $2, and so on
     • Example: doubled vowel
         ([aeiou])\1
       matches heed and book but not head
     • cf. [aeiou][aeiou], which matches any two vowels in a row and so also matches head
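The doubled-vowel example on slide 8 can be run against the three words it mentions. The following is a sketch under assumptions: the hash `%doubled`, the loop variable `$word`, and `$any_two` are names invented here for illustration.

```perl
use strict;
use warnings;

# \1 inside the regexp must repeat whatever group 1 matched;
# after a successful match, the same text is available as $1.
my %doubled;
for my $word ("heed", "book", "head") {
    if ($word =~ /([aeiou])\1/) {
        $doubled{$word} = $1;
        print "$word: doubled vowel '$1'\n";
    } else {
        print "$word: no doubled vowel\n";
    }
}

# For contrast, [aeiou][aeiou] accepts any two vowels in a row.
my $any_two = ("head" =~ /[aeiou][aeiou]/) ? "match" : "no match";
print "head against [aeiou][aeiou]: $any_two\n";
```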

  9. Homework 1
     • out today
     • due next Thursday
     • in my mailbox by midnight

  10. Homework 1
      • Data:
        • text file wsj500.txt
        • download from course webpage
        • make sure the newlines are correct for your platform
        • 500 sentences from the Wall Street Journal (WSJ) part of the Penn Treebank
        • one sentence per line
        • words, and also punctuation marks, are separated by spaces

  11. Homework 1
      • Question 1: Write a Perl program to count the number of lines in a file and print the result
        • Submit your program
        • Demonstrate that it works on the test file (copy the output of the cmd interpreter)

  12. Homework 1
      • Question 2: Write a Perl program to count the number of words in wsj500.txt that satisfy the following criteria:
        • there are two identical vowels in a row within the word, and
        • the word also ends in (lowercase) s
      • Question 3: Modify your Perl program from Question 2 to print out what those words are
