Using regular expressions

Presentation Transcript

  1. Using regular expressions • Search for a single occurrence of a specific string. • Search for all occurrences of a string. • Approximate string matching.

  2. Forming RegExps • Strings • Variables • Patterns

  3. Strings and Variables • /Joey Ramone/ - match a specific string. • /$name/, where $name = "Joey Ramone" - match the string stored in a variable. • /Joey $name/ - match a pattern defined by a mixture of strings and variables.
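A minimal Perl sketch of the three forms above. The sample text and the second variable `$family` are made up for illustration; they are not from the slides.

```perl
use strict;
use warnings;

my $text   = "Joey Ramone sang lead for the Ramones.";   # sample text (assumption)
my $name   = "Joey Ramone";
my $family = "Ramone";                                   # hypothetical second variable

# A literal string, an interpolated variable, and a mix of both.
my $literal_hit  = ($text =~ /Joey Ramone/)  ? 1 : 0;
my $variable_hit = ($text =~ /$name/)        ? 1 : 0;
my $mixed_hit    = ($text =~ /Joey $family/) ? 1 : 0;

print "literal=$literal_hit variable=$variable_hit mixed=$mixed_hit\n";
```

Note that a variable is interpolated as a pattern, so any special characters it contains keep their special meaning.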

  4. Character classes • abc – match “abc” • . – match any single character except a newline (e.g. a.b matches “a”, then any one character, then “b”). • [abc] – match “a” or “b” or “c” • [0123456789] – match “0” or “1” or … or “9” • [0-9] – same as previous • [a-z] – match “a” or “b” or … or “z” • [A-Z] – same as previous, but for uppercase letters • [] – match a single occurrence of any one of the characters listed inside the brackets. • [0-9a-zA-Z-] – match any alphanumeric character or the minus sign
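The following sketch tries a few of these patterns against short test strings. The helper `hits` and the test strings are assumptions introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $string contains a match for $pattern, else 0.
sub hits {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

my @results = (
    hits(qr/a.b/,          "axb"),        # "." stands in for "x"
    hits(qr/a.b/,          "ab"),         # "." needs exactly one character
    hits(qr/[abc]/,        "zebra"),      # contains "b" and "a"
    hits(qr/[0-9]/,        "no digits"),  # no digit anywhere
    hits(qr/[0-9a-zA-Z-]/, "-"),          # a trailing "-" in a class is literal
);

print "@results\n";   # prints "1 0 1 0 1"
```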

  5. Negated character classes • [^0-9] – match any single character that is not a numeric digit • [^aeiouAEIOU] – match any single character that is not a vowel • Works only for single characters • We’ll discuss matching negated strings of characters later.
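A short sketch of negated classes in Perl. It uses the `/g` modifier in list context to collect every single-character match; `/g` is not covered on this slide, and the sample strings are made up.

```perl
use strict;
use warnings;

# Collect each character that is NOT a vowel.
my @non_vowels = "Ramones" =~ /[^aeiouAEIOU]/g;

# Collect each character that is NOT a digit.
my @non_digits = "a1b22c" =~ /[^0-9]/g;

print join("", @non_vowels), "\n";   # prints "Rmns"
print join("", @non_digits), "\n";   # prints "abc"
```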

  6. Escape characters • \ - use the backslash to match any special character as the character itself. • /\$name/ - match the literal string “$name”. • /a\.b/ - match the literal string “a.b” rather than “a” followed by any character, followed by “b”.
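A small sketch of the difference escaping makes. The sample strings are assumptions; the single-quoted string keeps `$name` as literal text rather than interpolating a variable.

```perl
use strict;
use warnings;

my $template = 'Dear $name, see file a.b';   # single quotes: "$name" stays literal

my $dollar_hit  = ($template =~ /\$name/) ? 1 : 0;   # matches the text "$name"
my $escaped_dot = ("axb" =~ /a\.b/)       ? 1 : 0;   # "\." only matches a real dot
my $plain_dot   = ("axb" =~ /a.b/)        ? 1 : 0;   # "." matches the "x"

print "dollar=$dollar_hit escaped_dot=$escaped_dot plain_dot=$plain_dot\n";
```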

  7. Convenience character classes • \d (a digit) - [0-9] • \D (digits, not!) - [^0-9] • \w (word char) - [a-zA-Z0-9_] • \W (words, not!) - [^a-zA-Z0-9_] • \s (space char) - [ \r\t\n\f] • \S (space, not!) - [^ \r\t\n\f]
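As a quick illustration, the sketch below counts the characters in a made-up line that fall into three of these classes. The sample line is an assumption, and `/g` in list context is used to collect all matches.

```perl
use strict;
use warnings;

my $line = "Track 12:\tBlitzkrieg Bop";   # sample line (assumption)

my @digits   = $line =~ /\d/g;   # "1", "2"
my @spaces   = $line =~ /\s/g;   # space, tab, space
my @non_word = $line =~ /\W/g;   # space, ":", tab, space

printf "%d digits, %d whitespace, %d non-word\n",
    scalar @digits, scalar @spaces, scalar @non_word;
```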

  8. Sequences • + - one or more of preceding pattern • /[a-zA-Z]+/ (match a string of alpha characters such as a name). • ? (match zero or one instance of preceding pattern). • /[a-zA-Z]+-?[a-zA-Z]+/ (Now we can match hyphenated names).
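A sketch that tries the hyphenated-name pattern on a few candidates. The anchors `^` and `$` are an addition not covered in the slides; they force the whole string to match instead of any substring. The candidate names are made up.

```perl
use strict;
use warnings;

# Letters, an optional single hyphen, then more letters; anchored to the whole string.
my $name_re = qr/^[a-zA-Z]+-?[a-zA-Z]+$/;

my %ok;
for my $candidate ("Starkey", "Smith-Jones", "Smith--Jones", "X") {
    $ok{$candidate} = ($candidate =~ $name_re) ? 1 : 0;
}

print "$_: $ok{$_}\n" for sort keys %ok;
```

The single letter "X" is rejected because each `+` needs at least one character, so the pattern requires two letters in total.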

  9. Sequences • * (match zero or more of preceding pattern) • Example – list of names: • George Harrison • Paul McCartney • Richard “Ringo” Starkey • John Winston Lennon • /[a-zA-Z]+ [a-zA-Z]+/ (match first and last name) • /[a-zA-Z]+ [a-zA-Z"]* ?[a-zA-Z]+/ (match first name, middle name, if it exists, and last name; the space after the middle name is optional so that two-part names still match)
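One way to check a name pattern with an optional middle part against the four names listed above. This is a sketch: the anchors `^` and `$` are an addition so that each whole name must match, and the variable names are made up.

```perl
use strict;
use warnings;

my @names = (
    'George Harrison',
    'Paul McCartney',
    'Richard "Ringo" Starkey',
    'John Winston Lennon',
);

# First and last name only.
my $two_part = qr/^[a-zA-Z]+ [a-zA-Z]+$/;

# Optional middle part (letters or double quotes), followed by an optional space.
my $with_middle = qr/^[a-zA-Z]+ [a-zA-Z"]* ?[a-zA-Z]+$/;

my @two_hits = grep { $_ =~ $two_part }    @names;
my @all_hits = grep { $_ =~ $with_middle } @names;

printf "two-part pattern: %d of %d names\n", scalar @two_hits, scalar @names;
printf "with middle part: %d of %d names\n", scalar @all_hits, scalar @names;
```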

  10. Sequences • {k} – match k instances of preceding pattern. • Example: floating point numbers to 2 decimal places • /[0-9]+\.[0-9]{2}/ • {k,j} – match at least k instances of preceding pattern, but no more than j. • Example: floating point numbers that may or may not have a decimal component. • /[0-9]+\.?[0-9]{0,2}/
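A sketch that exercises both counted patterns. The anchors `^` and `$` are added here (they are not in the slides) so that extra digits cause a failure instead of being ignored; the helper `check` is hypothetical.

```perl
use strict;
use warnings;

my $two_places = qr/^[0-9]+\.[0-9]{2}$/;       # exactly two decimal places
my $optional   = qr/^[0-9]+\.?[0-9]{0,2}$/;    # zero to two decimal places

# Hypothetical helper: 1 if the whole string matches, else 0.
sub check {
    my ($re, $string) = @_;
    return $string =~ $re ? 1 : 0;
}

my @strict = map { check($two_places, $_) } ("3.14", "3.1", "3.141");
my @loose  = map { check($optional,   $_) } ("42", "42.", "42.5", "42.123");

print "strict: @strict\n";   # prints "strict: 1 0 0"
print "loose:  @loose\n";    # prints "loose:  1 1 1 0"
```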

  11. Grouping • /(John|Paul|George|Ringo)/ – matches any one of “John”, “Paul”, “George”, or “Ringo” • /((John|Paul|George|Ringo) )+/ • Matches the Beatles’ names listed in any order, each followed by a space. • John Paul George Ringo • Paul George John Ringo • Ringo Paul George John • Actually, this will also match: • Paul Paul Paul Paul Paul Paul Paul Paul Paul • Be careful about what assumptions you make.
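The sketch below shows the grouped pattern accepting repeated names. The anchors are an addition so the whole string is tested, and the test strings carry a trailing space because the pattern requires a space after every name.

```perl
use strict;
use warnings;

my $beatles = qr/^((John|Paul|George|Ringo) )+$/;

my $any_order = ("Ringo Paul George John " =~ $beatles) ? 1 : 0;
my $repeated  = ("Paul Paul Paul "         =~ $beatles) ? 1 : 0;   # also accepted
my $outsider  = ("John Yoko "              =~ $beatles) ? 1 : 0;
my $no_space  = ("John Paul"               =~ $beatles) ? 1 : 0;   # last name lacks its space

print "any_order=$any_order repeated=$repeated outsider=$outsider no_space=$no_space\n";
```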

  12. Problem • Write a regular expression that will match a Social Security number. • Format: 555-55-5555

  13. A solution • /[0-9]{3}-[0-9]{2}-[0-9]{4}/
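A sketch wrapping this solution in a hypothetical helper, `is_ssn`. The anchors `^` and `$` are an assumption added here; without them the pattern would also match inside a longer run of digits.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $candidate has the form ddd-dd-dddd, else 0.
sub is_ssn {
    my ($candidate) = @_;
    return $candidate =~ /^[0-9]{3}-[0-9]{2}-[0-9]{4}$/ ? 1 : 0;
}

print is_ssn("555-55-5555"),  "\n";   # prints 1
print is_ssn("5555-55-5555"), "\n";   # prints 0: too many leading digits
print is_ssn("555-555-555"),  "\n";   # prints 0: wrong grouping

# The unanchored pattern from the slide still finds a match inside a longer string.
my $unanchored = ("95555-55-55559" =~ /[0-9]{3}-[0-9]{2}-[0-9]{4}/) ? 1 : 0;
print "$unanchored\n";                # prints 1
```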

  14. Problem • Write a regular expression that will match a phone number. • Formats • 319-337-3663 • 319.337.3663

  15. A solution • /[0-9]{3}[\.-][0-9]{3}[\.-][0-9]{4}/

  16. Add another format • 3193373663

  17. A solution • /[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}/
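A sketch that tests the three formats. It also shows that this pattern accepts mixed separators, and offers one possible stricter variant using a backreference (`\1`), which the slides do not cover. The anchors and variable names are assumptions.

```perl
use strict;
use warnings;

# The slide's pattern, anchored to the whole string.
my $loose = qr/^[0-9]{3}[.-]?[0-9]{3}[.-]?[0-9]{4}$/;

# Stricter variant: capture the first separator (possibly empty) and
# require the same one again with \1.
my $same_sep = qr/^[0-9]{3}([.-]?)[0-9]{3}\1[0-9]{4}$/;

my @numbers = ("319-337-3663", "319.337.3663", "3193373663", "319-337.3663");

my @loose_hits  = map { $_ =~ $loose    ? 1 : 0 } @numbers;
my @strict_hits = map { $_ =~ $same_sep ? 1 : 0 } @numbers;

print "loose:  @loose_hits\n";    # prints "loose:  1 1 1 1"
print "strict: @strict_hits\n";   # prints "strict: 1 1 1 0"
```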

  18. Problem • Write a regular expression that will match an email address. • Legal characters for names are: • Letters, numbers, “-”, and “_” • Legal characters for domain names are: • Letters only • Assume form: username@machine.domain.suffix

  19. A solution • /[a-z0-9_-]+\@[a-z]+(\.[a-z]+){2}/ • More general version: /[a-z0-9_-]+\@[a-z]+(\.[a-z]+)+/
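A sketch comparing the two versions on made-up addresses at example.com. It follows the simplified rules stated in the problem and is not a general email validator. The anchors are an addition, and the addresses are single-quoted so that `@` is not interpolated.

```perl
use strict;
use warnings;

my $three_part = qr/^[a-z0-9_-]+\@[a-z]+(\.[a-z]+){2}$/;   # machine.domain.suffix
my $general    = qr/^[a-z0-9_-]+\@[a-z]+(\.[a-z]+)+$/;     # one or more dotted parts

my @addresses = (
    'joey_r-1@mail.example.com',   # username@machine.domain.suffix
    'joey@example.com',            # only two parts after the @
    'joey@example',                # no dotted part at all
);

my @strict_hits  = map { $_ =~ $three_part ? 1 : 0 } @addresses;
my @general_hits = map { $_ =~ $general    ? 1 : 0 } @addresses;

print "three-part: @strict_hits\n";    # prints "three-part: 1 0 0"
print "general:    @general_hits\n";   # prints "general:    1 1 0"
```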

  20. Problem • Write a regular expression that will match an HTML anchor start tag. • Assume anchor tag is of the form: • <a href="some url">some anchor text</a>

  21. A solution • /<a href="[^"]+">/ • Actually, quotes are not required • So it should be: • /<a href="?[^">]+"?>/ • How would we assign the url to a variable?

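  22. A solution • ($url) = ($htmlText =~ m/<a href="?([^">]+)"?>/); • The parentheses around [^">]+ capture the URL; in list context the match returns the captured text.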
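A runnable sketch of capturing the URL. Parentheses around the part of the pattern to keep make the match return that text in list context. The HTML snippet and the names `$link_re` and `@urls` are assumptions, and the `/g` form for collecting every link is an extension of the slide.

```perl
use strict;
use warnings;

my $htmlText = '<p><a href="http://www.example.com/">quoted</a> and '
             . '<a href=page.html>unquoted</a></p>';   # sample HTML (assumption)

# The parentheses capture the URL; the surrounding quotes are optional.
my $link_re = qr/<a href="?([^">]+)"?>/;

my ($url) = $htmlText =~ $link_re;       # first URL only
my @urls  = $htmlText =~ /$link_re/g;    # every URL in the text

print "first: $url\n";
print "all:   @urls\n";
```

If nothing matches, `$url` is left undefined, so real code would check it before use.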

  23. Take Away • There is almost always a pattern that will match what you want it to match. • The best way to learn is to simply jump in and start writing your own patterns. • If you have a question about how to construct one, feel free to ask me. • One typically learns Perl by asking people with more experience.