
Regular Expressions in Perl Part I

Learn about the simplest regular expressions, word matching, metacharacters, escape sequences, anchors, character classes, and alternation in Perl. Enhance your text processing skills!

joanc

Presentation Transcript


  1. Regular Expressions in Perl Part I by Ayush Gupta

  2. Simplest Regular Expression
     • The simplest regexp is simply a word, or more generally, a string of characters.
     • A regexp consisting of a word matches any string that contains that word:
         "Hello World" =~ /World/; # matches
     • The operator =~ associates the string with the regexp match and produces a true value if the regexp matched, or false if it did not.

  3. Simple Word Matching
     • In the example on the previous slide, "World" matches the second word in "Hello World", so the expression is true. Expressions like this are useful in conditionals:
         if ("Hello World" =~ /World/) { print "It matches\n"; }
         else { print "It doesn't match\n"; }
     • The literal string in the regexp can be replaced by a variable, for example:
         $greeting = "World";
     • The if statement would then be:
         if ("Hello World" =~ /$greeting/)
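
The conditional and the interpolated variable above can be combined in a small script. This is a minimal sketch; the helper name contains_word and the second test string are assumptions made for illustration and do not appear in the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches the regexp held in $word,
# 0 otherwise. The variable is interpolated into the pattern, as in the slide.
sub contains_word {
    my ($text, $word) = @_;
    return $text =~ /$word/ ? 1 : 0;
}

my $greeting = "World";
for my $text ("Hello World", "Hello Perl") {
    if (contains_word($text, $greeting)) {
        print $text, ": It matches\n";
    } else {
        print $text, ": It doesn't match\n";
    }
}
```

Note that the variable's contents are treated as a regexp, not as plain text, so any metacharacters it contains keep their special meaning.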

  4. Simple Word Matching (cont'd)
     • If you're matching against the special default variable $_, the $_ =~ part can be omitted:
         $_ = "Hello World";
         if (/World/) { print "It matches\n"; }
         else { print "It doesn't match\n"; }
     • The // default delimiters for a match can be changed to arbitrary delimiters by putting an 'm' out front:
         "Hello World" =~ m!World!; # matches, delimited by '!'
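
Both points on this slide are often used together when looping over a list. The sketch below is an illustration under assumptions: the sample paths and the helper name count_usr_bin are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data: a few file paths.
my @paths = ("/usr/bin/perl", "/home/user/notes.txt", "/usr/bin/env");

# Count the arguments that contain "/usr/bin". Inside the loop each element
# is aliased to $_, so the match needs no explicit "$_ =~". Using m{...} as
# the delimiter avoids having to escape the slashes in the pattern.
sub count_usr_bin {
    my $count = 0;
    for (@_) {
        $count++ if m{/usr/bin};
    }
    return $count;
}

print count_usr_bin(@paths), " paths contain /usr/bin\n";
```

With the default delimiters the same pattern would have to be written /\/usr\/bin/, which is harder to read.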

  5. Match or No Match?
         "Hello World" =~ /world/;  # doesn't match, regexps are case-sensitive
         "Hello World" =~ /o W/;    # matches
         "Hello World" =~ /oW/;     # doesn't match, the space character is missing
         "Hello World" =~ /World /; # doesn't match, the regexp ends with a space but the string does not
     • Regular expressions must match a part of the string exactly in order for the statement to be true.

  6. Metacharacters
     • These characters are reserved for use in regexp notation.
     • The metacharacters are {}[]()^$.|*+?\
     • A metacharacter can be matched literally by putting a backslash before it:
         "2+2=4" =~ /2+2/;  # doesn't match, + is a metacharacter
         "2+2=4" =~ /2\+2/; # matches, \+ is treated like an ordinary +
     • The backslash character '\' is itself a metacharacter and needs to be escaped with another backslash:
         'C:\WIN32' =~ /C:\\WIN/; # matches
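
The effect of escaping can be checked directly. The sketch below repeats the slide's examples and then, going slightly beyond the slides, uses Perl's built-in quotemeta function to escape a string automatically; the helper name contains_literal is an assumption made for illustration.

```perl
use strict;
use warnings;

my $sum = "2+2=4";

# Unescaped: "+" means "one or more of the preceding 2", so /2+2/ needs "22".
print "unescaped: ", ($sum =~ /2+2/  ? "match" : "no match"), "\n";

# Escaped: "\+" is a literal plus sign.
print "escaped:   ", ($sum =~ /2\+2/ ? "match" : "no match"), "\n";

# Hypothetical helper: match $literal as plain text inside $text.
# quotemeta() is a Perl built-in that backslashes every non-word character.
sub contains_literal {
    my ($text, $literal) = @_;
    my $escaped = quotemeta($literal);
    return $text =~ /$escaped/ ? 1 : 0;
}

print "quotemeta: ",
    (contains_literal('C:\WIN32', 'C:\WIN') ? "match" : "no match"), "\n";
```

Escaping by hand is fine for fixed patterns; a helper like this is mainly useful when the text to match comes from a variable.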

  7. Escape Sequences
     • In addition to the metacharacters, there are some ASCII characters which don't have printable character equivalents and are instead represented by escape sequences.
     • Common examples are \t for a tab, \n for a newline, \r for a carriage return and \a for a bell.
         "1000\t2000" =~ m(0\t2);    # matches
         "1000\n2000" =~ /0\n20/;    # matches
         "1000\t2000" =~ /\000\t2/;  # doesn't match, "0" ne "\000"
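
As a rough illustration of escape sequences inside a pattern, the sketch below reports which separator sits between the two numbers of a record. The helper name separator_kind and the third test record are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: name the separator found between a 0 and a 2.
sub separator_kind {
    my ($record) = @_;
    return "tab"     if $record =~ /0\t2/;   # \t matches a tab character
    return "newline" if $record =~ /0\n2/;   # \n matches a newline character
    return "other";
}

print separator_kind("1000\t2000"), "\n";
print separator_kind("1000\n2000"), "\n";
print separator_kind("1000 2000"), "\n";
```

The test strings are double-quoted so that Perl turns \t and \n into the actual control characters before the match is attempted.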

  8. Anchor Metacharacters
     • ^ and $ are the anchor metacharacters.
     • The anchor ^ means match at the beginning of the string.
     • The anchor $ means match at the end of the string, or before a newline at the end of the string.
         "housekeeper" =~ /keeper/;    # matches
         "housekeeper" =~ /^keeper/;   # doesn't match
         "housekeeper" =~ /keeper$/;   # matches
         "housekeeper\n" =~ /keeper$/; # matches
     • When both ^ and $ are used at the same time, the regexp has to match both the beginning and the end of the string, i.e., the regexp matches the whole string (apart from an optional trailing newline).
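
To compare the anchored and unanchored forms side by side, the following sketch lists which of four patterns match a given string. The helper name anchor_report and the extra test strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: report which forms of the "keeper" regexp match $text.
sub anchor_report {
    my ($text) = @_;
    my @hits;
    push @hits, "anywhere" if $text =~ /keeper/;    # no anchors
    push @hits, "start"    if $text =~ /^keeper/;   # must begin the string
    push @hits, "end"      if $text =~ /keeper$/;   # must end the string
    push @hits, "whole"    if $text =~ /^keeper$/;  # must be the whole string
    return join(",", @hits);
}

for my $text ("housekeeper", "keeper", "keepers") {
    print $text, " => ", anchor_report($text), "\n";
}
```

Only "keeper" satisfies all four patterns; "housekeeper" fails the ^ forms and "keepers" fails the $ forms.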

  9. Character Class
     • A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regexp.
     • Character classes are denoted by brackets [...], with the set of characters to be possibly matched inside.
         /[bcr]at/; # matches 'bat', 'cat', or 'rat'
     • The special character '-' acts as a range operator within character classes, so [0123456789] becomes [0-9].
     • The special character ^ in the first position of a character class denotes a negated character class, which matches any character but those in the brackets.
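
The slide gives no example for ranges or negated classes, so here is a minimal sketch covering all three forms. The helper name class_matches and the "item" test strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: list which of three character-class regexps match $text.
sub class_matches {
    my ($text) = @_;
    my @labels;
    push @labels, '[bcr]at'    if $text =~ /[bcr]at/;     # b, c or r, then "at"
    push @labels, 'item[0-9]'  if $text =~ /item[0-9]/;   # "item" plus a digit
    push @labels, 'item[^0-9]' if $text =~ /item[^0-9]/;  # "item" plus a non-digit
    return join(" ", @labels);
}

for my $text ("cat", "hat", "item7", "itemX", "item") {
    printf "%-6s %s\n", $text, class_matches($text);
}
```

A negated class still has to match one character, so the bare string "item" matches neither item[0-9] nor item[^0-9].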

  10. Alternation Metacharacter "|"
     • Alternation lets a regexp match any one of several possible words or character strings.
     • To match dog or cat, we form the regexp dog|cat (with no spaces around the |, since a space in a regexp is matched literally).
         "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
         "cats and dogs" =~ /dog|cat|bird/; # matches "cat"
     • Even though dog is the first alternative in the second regexp, cat is able to match earlier in the string.
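
To see which alternative actually matched, the sketch below wraps the alternation in grouping parentheses and reads the captured text from $1. Capturing is not covered on these slides, and the helper name first_alternative is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by the alternation in
# $pattern, or undef if nothing matched. The parentheses capture the
# matched alternative into $1.
sub first_alternative {
    my ($text, $pattern) = @_;
    return $text =~ /($pattern)/ ? $1 : undef;
}

for my $pattern ("cat|dog|bird", "dog|cat|bird", "dog | cat") {
    my $found = first_alternative("cats and dogs", $pattern);
    print "/$pattern/ => ", (defined $found ? $found : "no match"), "\n";
}
```

The first two patterns both report cat, because the earliest position in the string wins regardless of the order of the alternatives. The third pattern finds nothing, since its alternatives are "dog " and " cat" with the spaces included.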
