
Regular expressions step by step

Regular expressions step by step. Tamás Váradi varadi@nytud.hu. What are they?. Regular expressions (regexp) define a pattern, which may match a whole series of strings Powerful, compact, fast Useful for all sorts of text processing tasks. Where can I use them?.


Presentation Transcript


  1. Regular expressions step by step Tamás Váradi varadi@nytud.hu BTANT129 w6

  2. What are they?
     • Regular expressions (regexps) define a pattern, which may match a whole series of strings
     • Powerful, compact, fast
     • Useful for all sorts of text processing tasks

  3. Where can I use them?
     • In text editors and word processors (even in MS Word, to some extent), such as:
       • TextPad, EditPad Pro (to name but two)
     • In special programs that search a set of files:
       • grep, egrep, sed (free)
       • PowerGREP
       • Visual REGEXP
     • In programming languages:
       • Perl, Python and other so-called scripting languages

  4. What about INTEX?
     • Yes, INTEX has a built-in regexp facility
     • But it is a little limited and peculiar (INTEX offers graphs as an alternative)
     • In this lecture, we are going to cover regular expressions as used in the text processing tools mentioned above

  5. Is there a standard variety?
     • More or less
     • There are variants that differ in:
       • notation
       • features (expressive power, elegance, etc.)
     • Here we'll concentrate on what you can expect regular expressions to do

  6. First things first
     • Any character will match itself
     • Except characters with a special meaning (metacharacters): \ | ( ) [ { ^ $ * + ? . < >
     • The pattern is applied to the text from top to bottom and from left to right, like a sliding window moving over the text
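A minimal sketch of literal matching and the left-to-right scan, using Python's `re` module. Python is an assumption here (the slides do not prescribe a tool), and the sample text and variable names are illustrative only. Note that Python's `re` does not treat `<` and `>` as metacharacters, unlike some of the tools listed earlier.

```python
import re

text = "concatenate"

# Ordinary characters match themselves: "cat" is found inside the word.
literal = re.search(r"cat", text)
literal_span = literal.span()   # start and end offsets of the match

# "at" occurs twice in the text; the scan goes left to right,
# so search() reports the leftmost occurrence.
first_at = re.search(r"at", text).start()

print(literal_span, first_at)
```

`re.search` stops at the first position where the whole pattern fits; `re.findall` or `re.finditer` would keep sliding the window to collect later matches as well.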

  7. Special characters
     • . will match any one character
     • ? will match the preceding character zero times or once (at most once)
     • + will match the preceding character one or more times (at least once)
     • * will match the preceding character zero or more times
     • {n,m} will match the preceding character at least n and at most m times
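The quantifiers above can be compared side by side with a small sketch in Python's `re` module (an assumed tool, not one named on this slide). The helper `matches_whole` and the candidate strings are hypothetical; `re.fullmatch` is used so that a pattern only counts when it covers the entire string.

```python
import re

def matches_whole(pattern, s):
    """Return True if the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

candidates = ["", "a", "aa", "aaa", "aaaa"]
patterns = [r"a?", r"a+", r"a*", r"a{2,3}"]

# For each quantifier, list the candidate strings it accepts in full.
results = {p: [s for s in candidates if matches_whole(p, s)] for p in patterns}

for pattern, accepted in results.items():
    print(pattern, accepted)
```

Only `a?` and `a*` accept the empty string, and `a{2,3}` accepts exactly the strings of two or three `a` characters.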

  8. Examples
     • .at matches bat, cat, fat, pat, rat
     • c*at matches at, cat, ccat, cccat, etc.
     • Guess what c* will match, and why?
     • c+at matches cat, ccat, cccat, etc., but not at
     • c?at matches at and cat
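These examples can be checked with a short Python sketch (Python's `re` module is an assumption; the word list and the helper `whole_matches` are made up for illustration). It tests whole words with `re.fullmatch`; a search inside a longer text would also find, say, `cat` inside `ccat`.

```python
import re

words = ["at", "bat", "cat", "ccat", "cccat"]

def whole_matches(pattern):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

dot_at = whole_matches(r".at")    # exactly one character, then "at"
star_at = whole_matches(r"c*at")  # zero or more "c", then "at"
plus_at = whole_matches(r"c+at")  # at least one "c", then "at"
opt_at = whole_matches(r"c?at")   # at most one "c", then "at"

# c* on its own is satisfied by zero occurrences of "c",
# so it succeeds with an empty match even where there is no "c".
empty_match = re.match(r"c*", "at").group()

print(dot_at, star_at, plus_at, opt_at, repr(empty_match))
```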

  9. Anchor points
     • A regexp is tried against the text at every point where the first character of the regexp matches a character in the target text, like a sliding window
     • Matching is done line by line by default
     • ^ : match at the beginning of the line
     • $ : match at the end of the line
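A sketch of the two anchors in Python (an assumed tool). Line-by-line matching is the default in tools such as grep; in Python's `re` module it has to be requested, either by splitting the text into lines or by passing `re.MULTILINE`. The sample text and variable names are illustrative.

```python
import re

text = "cat and dog\ndog and cat"

# Emulate line-by-line tools: test each line separately.
lines = text.splitlines()
starts_with_cat = [ln for ln in lines if re.search(r"^cat", ln)]
ends_with_cat = [ln for ln in lines if re.search(r"cat$", ln)]

# Same idea on the whole text: with re.MULTILINE, ^ and $ also
# match at the start and end of every line, not just of the string.
anchored_hits = re.findall(r"^cat", text, flags=re.MULTILINE)
unanchored_hits = re.findall(r"cat", text)

print(starts_with_cat, ends_with_cat, anchored_hits, unanchored_hits)
```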

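  10. Groups and alternations
     • (bla)* : parentheses group characters so that a quantifier applies to the whole group (zero or more repetitions of bla)
     • Sir|Madam : the vertical bar separates alternatives (either Sir or Madam)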
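A brief sketch of grouping and alternation in Python's `re` module (an assumption; the test strings are invented). It contrasts `(bla)*` with the ungrouped `bla*`, where the star applies only to the final `a`.

```python
import re

# With parentheses, * repeats the whole group "bla".
grouped = [s for s in ["", "bla", "blabla", "blab"]
           if re.fullmatch(r"(bla)*", s)]

# Without parentheses, * repeats only the preceding character "a".
ungrouped = [s for s in ["bl", "bla", "blaaa", "blabla"]
             if re.fullmatch(r"bla*", s)]

# | offers a choice between whole alternatives.
salutations = re.findall(r"Sir|Madam", "Dear Sir or Madam")

# Parentheses also limit how far an alternation reaches.
limited = re.fullmatch(r"Dear (Sir|Madam)", "Dear Madam") is not None

print(grouped, ungrouped, salutations, limited)
```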

  11. Character classes
     • [aeiou] matches one character of the set
     • [^aeiou] matches any character except those in the set
     • [a-zA-Z0-9] : consecutive characters can be referred to with a range
     • Note: whatever the length of the set, it always represents a single character in the pattern, so it is a single-character alternation (an 'or' relation between characters)
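The three kinds of class can be tried out with a minimal Python sketch (Python's `re` module is an assumption, and the sample string is arbitrary). The last line illustrates the note above: a class stands for exactly one character.

```python
import re

sample = "Regexp 101"

vowels = re.findall(r"[aeiou]", sample)          # characters in the set
non_vowels = re.findall(r"[^aeiou]", sample)     # everything else
alphanumeric = re.findall(r"[a-zA-Z0-9]", sample)  # ranges

# A class matches a single character, so it cannot cover "ae" alone.
two_chars = re.fullmatch(r"[aeiou]", "ae")

print(vowels, non_vowels, alphanumeric, two_chars)
```

Note that the negated class also picks up the space and the digits, since it accepts anything that is not a lowercase vowel.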

  12. Extended features
     • \d a digit
     • \D a non-digit
     • \s a whitespace character (space, tab, carriage return, newline)
     • \S a non-whitespace character
     • \w a word character
     • \W a non-word character
     • \b a word boundary
     • \n a newline
     • \t a tab
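A small sketch of some of these shorthands in Python's `re` module (an assumption; other tools may define the classes slightly differently, and the sample strings are invented).

```python
import re

text = "Room 42\tis free"

digits = re.findall(r"\d", text)       # individual digits
tokens = re.findall(r"\w+", text)      # runs of word characters
whitespace = re.findall(r"\s", text)   # spaces and the tab

# \b restricts a match to whole words: the "is" inside "this" is skipped.
whole_word = re.findall(r"\bis\b", "this is it")
any_is = re.findall(r"is", "this is it")

print(digits, tokens, whitespace, whole_word, any_is)
```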

  13. Longest vs. shortest match
     • When using quantifiers with non-literal characters (".", "\w", "\S", etc.) one can easily get unintended matches
     • .+ longest match (default)
     • .+? shortest match
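The difference is easy to see on text with repeated delimiters. The following is a minimal Python sketch (Python's `re` module and the sample markup are assumptions made for illustration).

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Longest match: .+ runs to the last ">" in the text.
longest = re.findall(r"<.+>", text)

# Shortest match: .+? stops at the first ">" it can reach.
shortest = re.findall(r"<.+?>", text)

print(longest)
print(shortest)
```

The longest match swallows the whole line as a single hit, while the shortest match returns the four tags separately.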

  14. The escape character
     • Problem: what if we want to find characters that are themselves regexp metacharacters ( \ | ( ) [ { ^ $ * + ? . < > )?
     • Solution: they have to be preceded by "\" to strip them of their special value, e.g.:
       • \( \$ \[ \? etc.
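A short sketch of escaping in Python (an assumed tool; the sample strings are invented). Besides writing the backslashes by hand, Python's `re` module offers `re.escape`, which builds a pattern that matches a given string literally.

```python
import re

price = "Cost: $5.00 (approx.)"

# Escaped by hand: \$ and \. are literal characters here.
amount = re.search(r"\$5\.00", price).group()

# An unescaped "." still means "any one character".
too_loose = re.search(r"5.00", "5x00") is not None
strict = re.search(r"5\.00", "5x00") is not None

# re.escape adds the backslashes for us.
literal_pattern = re.escape("(approx.)")
found = re.search(literal_pattern, price).group()

print(amount, too_loose, strict, found)
```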

  15. Things to do
     • Look up the tutorial at http://www.zvon.org/other/PerlTutorial/Output/contents.html
     • Download one of the tools (Visual REGEXP, PowerGREP, EditPad Pro) and experiment with texts
     • Follow the tutorial of EditPad Pro, which you can find in its Help
