
Regular Expressions

Regular Expressions. The ultimate tool for textual analysis. What is a regular expression? Regular expressions are strings that encode patterns according to a set of rules.

berne

Presentation Transcript


  1. Regular Expressions: The ultimate tool for textual analysis

  2. What is a regular expression? • Regular expressions are strings that encode patterns according to a set of rules • If you have a program that parses regular expressions and searches with them (Python provides one), you can find every string within a text that matches a pattern
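
As a minimal sketch of the "parsing and searching program" mentioned above, Python's standard `re` module can compile a pattern and list every match in a text. The sample sentence and the pattern are assumptions chosen for illustration; they are not part of the slides.

```python
import re

# Sample text chosen for illustration (an assumption, not from the slides).
text = "Call me Ishmael. Some years ago, never mind how long precisely."

# Compile the pattern once, then search the text with it.
pattern = re.compile(r"me")

# findall returns every non-overlapping substring that matches the pattern.
matches = pattern.findall(text)

# finditer also reports where each match starts in the text.
positions = [m.start() for m in pattern.finditer(text)]
```

Here `matches` holds one entry per occurrence of the pattern, including the occurrence inside "Some".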

  3. Rule 1 • All characters that are not metacharacters match themselves • Find "Ishmael" in Moby Dick • Click on "show context"
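
Rule 1 can be tried outside the search tool as well. The sketch below assumes a short sample string in place of the full book, and `show_context` is a hypothetical helper that imitates the tool's "show context" option by returning a few characters on either side of each match.

```python
import re

# Short sample standing in for the full text (an assumption for illustration).
text = "Call me Ishmael. Some years ago, never mind how long precisely."


def show_context(pattern, text, width=8):
    """Return each match with up to `width` characters of context on both sides."""
    results = []
    for m in re.finditer(pattern, text):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        results.append(text[start:end])
    return results


# "Ishmael" contains no metacharacters, so every character matches itself.
contexts = show_context(r"Ishmael", text)
```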

  4. Rule 2 • [] or | allows you to match any one of several alternatives: [abc] or a|b|c matches a, or b, or c • Try to find all occurrences of "this". It may appear at the beginning of a sentence, where it is capitalized.
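
One possible answer to the "this" exercise, sketched in Python. The sample sentence is made up for illustration; the two patterns show the bracket form and the | form side by side.

```python
import re

# Made-up sample text (an assumption, not quoted from the book).
text = "This is the ship. Is this the whale? This, then, is it."

# [Tt] matches either an upper-case or a lower-case t.
with_class = re.findall(r"[Tt]his", text)

# The same search written as an alternation between two whole words.
with_alternation = re.findall(r"This|this", text)
```

Both lists contain the same three matches, in the order they appear in the text.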

  5. Rule 3 • \w matches any alphanumeric character (and the underscore) • \W matches any character that \w does not match • * matches zero or more occurrences of the preceding character: ab* matches a, or ab, or abb, or abbb, or … • Try to find approximately all adverbs (words ending with -ly) in Moby Dick. Note that you should not find "flying".
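
A possible approach to the adverb exercise, sketched in Python and using only the constructs introduced so far (\w, \W and *). The sample sentence is made up for illustration, and the result is only approximate, as the slide says: any word ending in "ly" is reported, adverb or not.

```python
import re

# Made-up sample sentence (an assumption for illustration, not from the book).
text = "The whale was slowly flying, or nearly so. He swam quickly."

# \w*ly : any run of word characters followed by "ly"
# \W    : the next character must not be a word character, so the word ends here
raw = re.findall(r"\w*ly\W", text)

# Each match includes the trailing non-word character; drop it.
adverbs = [m[:-1] for m in raw]
```

"flying" is skipped because the character after its "ly" is "i", which \W cannot match. A word at the very end of the text with nothing after it would also be missed by this pattern.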

  6. Rule 4 • The dot (.) matches any character except a newline (\n): a.b matches aab, or abb, or acb, or adb, or a-b, or a#b, or … • You are tasked with solving a crossword puzzle, and you've come to the following: C A ? T H E A • What is the missing letter?
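
The crossword idea can be sketched in Python as follows. `crossword_candidates` is a hypothetical helper and the word list is a toy example; for the puzzle on the slide, the clue would be "ca?thea" and the word list would have to come from a real text or dictionary. The sketch assumes the clue contains only letters and "?".

```python
import re


def crossword_candidates(clue, words):
    """Return the words that fit a clue in which '?' marks an unknown letter."""
    # Each unknown square becomes a dot, which matches any single character.
    pattern = clue.replace("?", ".")
    # fullmatch requires the whole word to fit, not just part of it.
    return [w for w in words if re.fullmatch(pattern, w, flags=re.IGNORECASE)]


# Toy word list (an assumption for illustration).
words = ["cat", "cot", "cut", "coat", "cart"]
fits = crossword_candidates("c?t", words)
```

"coat" and "cart" are rejected because a single dot stands for exactly one character.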

  7. Rule 5 • * matches zero or more occurrences of the preceding character • + matches one or more occurrences of the preceding character • ? matches zero or one occurrence of the preceding character
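
The difference between the three quantifiers is easiest to see by testing the same candidate strings against each one. This is a small Python sketch; the candidate list and the helper name `full_matches` are chosen for illustration.

```python
import re

# Candidate strings to test against each pattern.
candidates = ["a", "ab", "abb", "abbb"]


def full_matches(pattern):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


star = full_matches(r"ab*")      # a followed by zero or more b
plus = full_matches(r"ab+")      # a followed by one or more b
optional = full_matches(r"ab?")  # a followed by at most one b
```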

  8. Rule 6 • () groups characters together so that they act as one unit when combined with *, +, ? • In the search result, you can also ask to show a chosen group • Find the words modified by your adverbs (i.e., the word just after each adverb). Put the adverb and any following spaces or punctuation in one group and the modified word in another. Turn the "show groups" checkbox on and off to see what it does.
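
A sketch of both uses of groups in Python. The sample sentence is made up for illustration, and the adverb pattern is the same approximation as before: any word ending in "ly" counts as an adverb.

```python
import re

# Grouping makes a quantifier apply to the whole group, not just one character.
repeated_group = re.fullmatch(r"(ab)+", "ababab")  # matches: "ab" three times
repeated_char = re.fullmatch(r"ab+", "ababab")     # no match: only b may repeat

# Made-up sample sentence (an assumption, not quoted from the book).
text = "The ship sailed slowly away. He spoke softly, then laughed."

# Group 1: the adverb plus any spaces or punctuation after it.
# Group 2: the word that follows.
pattern = re.compile(r"(\w*ly\W+)(\w+)")

# With two groups in the pattern, findall returns (group 1, group 2) tuples.
pairs = pattern.findall(text)

# Picking out one group is the counterpart of the tool's "show groups" option.
modified_words = [m.group(2) for m in pattern.finditer(text)]
```

Because matches do not overlap, a word that is both the target of one adverb and an adverb itself would be consumed by the first match and not reported a second time.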
