Regular Expressions: The ultimate tool for textual analysis
What is a regular expression? • Regular expressions are strings that encode patterns according to a set of rules • If you have a program that can parse regular expressions and search with them (Python provides one, the re module), you can find every string in a text that matches a given pattern
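A minimal sketch of such a search with Python's re module. The sample sentence is adapted from the opening of Moby Dick and stands in for the full text, which is not loaded here. search() returns the first match (or None), and findall() returns all of them.

```python
import re

# Short sample adapted from the opening of Moby Dick (an assumption; the
# exercises use the full text, which is not loaded here).
sample = "Call me Ishmael. Some years ago, never mind how long precisely."

# Compile the pattern once so it can be reused for many searches.
pattern = re.compile(r"years")

# search() returns a match object for the first match, or None if there is none.
first = pattern.search(sample)
if first is not None:
    # The matched text and its (start, end) character offsets in the sample.
    print(first.group(), first.span())

# findall() returns every non-overlapping match as a list of strings.
print(pattern.findall(sample))
```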
Rule 1 • Every character that is not a metacharacter matches itself • In Python, the metacharacters are . ^ $ * + ? { } [ ] \ | ( and ) • Find "Ishmael" in Moby Dick • Click on "show context"
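The "show context" view can be imitated in a few lines. This is a sketch, not the search tool used in the exercises: show_context and its width parameter are hypothetical names, and the sample sentence stands in for the full novel. re.escape is only a safeguard here, since "Ishmael" contains no metacharacters.

```python
import re

def show_context(word, text, width=15):
    """Return each literal occurrence of word with up to `width` characters on each side.

    Hypothetical helper that imitates a 'show context' view.
    """
    results = []
    # re.escape makes any metacharacters in `word` match literally.
    for m in re.finditer(re.escape(word), text):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        results.append(text[start:end])
    return results

# Invented stand-in for the full text of Moby Dick.
sample = "Call me Ishmael. Some years ago, never mind how long precisely."
for snippet in show_context("Ishmael", sample):
    print(snippet)
```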
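Rule 2 • [] matches any one of the characters listed inside it, and | matches either the expression on its left or the one on its right: [abc] and a|b|c both match a, or b, or c • Try to find all occurrences of "this". Remember that it may appear at the beginning of a sentence, where it is capitalized.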
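A short sketch comparing the two forms from Rule 2 on an invented sentence. Both patterns find "this" whether or not it is capitalized; note that neither checks word boundaries, so either would also match inside a longer word such as "thistle".

```python
import re

# Invented sample text containing "this" both capitalized and in lower case.
sample = "This is the whale. Is this the one? Yes, this."

# Character class: [Tt] matches exactly one character, either 'T' or 't'.
class_matches = re.findall(r"[Tt]his", sample)

# Alternation: This|this matches either the whole left or the whole right alternative.
alt_matches = re.findall(r"This|this", sample)

print(class_matches)
print(alt_matches)

# Inside [], | is an ordinary character, so [a|b] would also match a literal '|'.
```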
Rule 3 • \w matches any word character (a letter, a digit, or the underscore) • \W matches any non-word character • * matches zero or more occurrences of the preceding character: ab* matches a, or ab, or abb, or abbb, or … • Try to find (approximately) all adverbs, that is, words ending in -ly, in Moby Dick. Note that your pattern should not match flying.
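One possible answer, sketched on an invented sentence rather than the novel. Requiring a non-word character right after "ly" is what rules out "flying". The result is only approximate, as the exercise says: the pattern would also catch non-adverbs such as "fly" or "early", and it misses an -ly word at the very end of the text with nothing after it.

```python
import re

# Invented sample sentence; "flying" contains "ly" but is not an adverb.
sample = "He swiftly turned, flying slowly over the sea, and spoke quietly."

# \w*  : a run of word characters
# ly   : the literal letters "ly"
# \W   : one non-word character (space, punctuation) that ends the word.
# In "flying" the "ly" is followed by "i", a word character, so it is skipped.
raw = re.findall(r"\w*ly\W", sample)

# Each match includes the trailing non-word character, so drop it.
adverbs = [m[:-1] for m in raw]
print(adverbs)
```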
Rule 4 • The dot (.) matches any character except a newline (\n): a.b matches aab, or abb, or acb, or adb, or a-b, or a#b, or … • You are trying to solve a crossword puzzle, and you have come to the following: C A ? T H E A • What is the missing letter?
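A sketch of the dot in Python. The candidate strings and the small word list are invented; for the actual puzzle you would search a real word list (or the novel) with a pattern built the same way, one dot per unknown letter. fullmatch() is used so that the pattern must cover the whole string.

```python
import re

candidates = ["aab", "abb", "acb", "adb", "a-b", "a#b", "a\nb", "ab", "abcb"]

# fullmatch() succeeds only if the pattern matches the whole string,
# so a.b accepts exactly three characters: 'a', any one character, 'b'.
dot = re.compile(r"a.b")
matched = [s for s in candidates if dot.fullmatch(s)]
print(matched)

# By default the dot does not match a newline; the re.DOTALL flag changes that.
print(dot.fullmatch("a\nb") is None)                        # no match without the flag
print(re.fullmatch(r"a.b", "a\nb", re.DOTALL) is not None)  # match with the flag

# Crossword-style lookup against a hypothetical word list: c.t means
# "c, then one unknown letter, then t".
words = ["cat", "cot", "cart", "cut", "coat"]
print([w for w in words if re.fullmatch(r"c.t", w)])
```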
Rule 5 • * matches zero or more occurrences of the preceding character • + matches one or more occurrences of the preceding character • ? matches zero or one occurrence of the preceding character
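The three quantifiers can be compared side by side on the same strings. In this sketch the test strings are invented and full_matches is a hypothetical helper that keeps only the strings a pattern matches in their entirety.

```python
import re

strings = ["a", "ab", "abb", "abbb", "b"]

def full_matches(pattern):
    """Return the strings from `strings` that the pattern matches in their entirety."""
    return [s for s in strings if re.fullmatch(pattern, s)]

star = full_matches(r"ab*")      # 'a' followed by zero or more 'b'
plus = full_matches(r"ab+")      # 'a' followed by one or more 'b'
optional = full_matches(r"ab?")  # 'a' followed by zero or one 'b'
print(star, plus, optional)

# A common use of ?: an optional letter, as in American vs. British spelling.
print(re.findall(r"colou?r", "color or colour"))
```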
Rule 6 • () groups characters together so that they act as a single unit with *, +, and ? • In the search results, you can also choose to show a particular group • Find the words modified by your adverbs (i.e., the word just after each adverb). Put the adverb and any following spaces or punctuation in one group, and the modified word in another. Turn the 'show groups' checkbox on and off to see what it does.
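A sketch of Rule 6 in Python, on an invented sentence rather than the novel. finditer() yields one match object per hit, and group(1) and group(2) return what each pair of parentheses captured, which is roughly what a "show groups" option displays. The pattern reuses the approximate adverb idea, so it inherits the same blind spots; an adverb with no word after it (like "quietly." here) produces no match.

```python
import re

# Grouping lets a quantifier apply to a whole sequence: (ab)+ repeats "ab".
print(re.fullmatch(r"(ab)+", "ababab") is not None)  # True
print(re.fullmatch(r"ab+", "ababab") is not None)    # False: + applies only to 'b'

# Invented sample sentence (not from Moby Dick).
sample = "He swiftly turned, flying slowly over the sea, and spoke quietly."

# Group 1: the adverb plus the spaces/punctuation after it.
# Group 2: the word that follows, i.e. the word the adverb (roughly) modifies.
pattern = re.compile(r"(\w*ly\W+)(\w+)")

pairs = []
for m in pattern.finditer(sample):
    # group(0) is the whole match; group(1) and group(2) are the two groups.
    pairs.append((m.group(1), m.group(2)))
    print(repr(m.group(0)), "->", repr(m.group(1)), repr(m.group(2)))
```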