Learn the ins and outs of regular expressions, from simple strings to special characters and modifiers. Understand components like literals, delimiters, and how to match patterns efficiently. Explore advanced techniques in pattern matching with examples and best practices.
Appendix A: Regular Expressions • It’s All Greek to Me
Regular Expressions • A pattern that matches a set of one or more strings • May be a simple string, or contain wildcard characters or modifiers • Used by programs such as vim, grep, awk, and sed • Not the same as shell expansion
Components • Characters • Literals • Special Characters • Delimiters • Mark the beginning and end of a regular expression • Usually / • ’ (but not really: the single quotes used with grep are shell quoting, not part of the expression)
Simple Strings • Contain no special characters • Match only the string itself, wherever it appears in a line • Ex: /foo/ matches: • foo • tomfoolery • bar.foo.com
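The /foo/ example can be tried out with a short sketch. It assumes Python's re module as a stand-in for grep or vim (the slides do not use Python), and the extra string "bar" is a hypothetical non-matching case added for contrast.

```python
import re

# Sample strings from the slide, plus "bar" as a hypothetical non-matching case.
candidates = ["foo", "tomfoolery", "bar.foo.com", "bar"]

# re.search scans the whole string, like an unanchored /foo/ in grep or vim.
matched = [s for s in candidates if re.search(r"foo", s)]
print(matched)
```

Only "bar" is left out, because a simple string matches wherever it occurs inside a longer string.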
Special Characters • Can match multiple strings • Represent zero or more characters • Always match the longest possible string (we’ll see examples in a bit)
Periods • Match any single character • Ex: /.ing/ • I was talking • bling • he called ingred (the space before ing counts as the character) • Ex: /spar.ing/ • sparring • sparking
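A minimal sketch of the period examples, again assuming Python's re module rather than grep or vim. It shows which text each pattern actually matched; the word "sparing" is a hypothetical extra case that should not match.

```python
import re

dot_ing = re.compile(r".ing")

# Map each sample line to the text that /.ing/ actually matched.
hits = {text: dot_ing.search(text).group(0)
        for text in ["I was talking", "bling", "he called ingred"]}
print(hits)

# "ing" alone has no character in front of it, so the period has nothing to match.
bare = dot_ing.search("ing")
print(bare)

# /spar.ing/ needs exactly one character between "spar" and "ing".
spar = re.compile(r"spar.ing")
spar_hits = [w for w in ["sparring", "sparking", "sparing"] if spar.search(w)]
print(spar_hits)
```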
Brackets • Define a character class • Match any one character in the class • If a caret (^) is the first character in the class, the class matches any character not in the class • Other special characters in the class lose their meaning
Brackets cont’d • Ex. /[jJ]ustin/ matches justin and Justin • Ex. /[A-Za-z]/ matches any single letter • Ex. /[0-9]/ matches any single digit • Ex. /[^a-z]/ matches any single character that is not a lowercase letter
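The character-class rules can be checked with a small sketch. It assumes Python's re module, whose bracket syntax behaves like the slides describe for these cases; the sample strings are made up for illustration.

```python
import re

# Either case of the first letter is accepted.
names = re.findall(r"[jJ]ustin", "justin met Justin")

# A range matches one character at a time, so each digit is a separate match.
digits = re.findall(r"[0-9]", "room 42b")

# A leading caret negates the class: everything except lowercase letters.
not_lower = re.findall(r"[^a-z]", "abC1 d")

# Inside a class, . and * are ordinary characters.
literal = re.findall(r"[.*]", "a.b*c")

print(names, digits, not_lower, literal)
```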
Asterisks • Match zero or more occurrences of the previous character • So /.*/ matches any number of characters • Ex. /t.*ing/ • thing • this is really annoying
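As a rough illustration of the asterisk, the sketch below assumes Python's re module. The /ab*c/ pattern and its sample strings are hypothetical additions that show the "zero occurrences" case.

```python
import re

star = re.compile(r"t.*ing")

# .* stretches as far as it can, so the match runs to the last "ing".
short_match = star.search("thing").group(0)
long_match = star.search("this is really annoying").group(0)
print(short_match)
print(long_match)

# b* allows zero b's, so "ac" matches; "adc" does not.
zero_or_more = [s for s in ["ac", "abc", "abbbc", "adc"] if re.search(r"ab*c", s)]
print(zero_or_more)
```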
Plus Signs and Question Marks • Very similar to asterisks; both depend on the previous character • + matches one or more occurrences (not zero) • ? matches zero or one occurrence (no more) • Ex. /2+4?/ matches one or more 2’s followed by either zero or one 4 • 22224 and 2 match in full • 4 does not match • 244 matches only as far as 24 • Part of the class of extended R.E.
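The /2+4?/ cases are easy to misjudge, so here is a small sketch that reports the matched text for each one. It assumes Python's re module, where + and ? work without any extended-mode switch; matched_text is a hypothetical helper, not part of any library.

```python
import re

def matched_text(pattern, text):
    """Return the text matched by pattern in text, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

results = {s: matched_text(r"2+4?", s) for s in ["22224", "2", "4", "244"]}
print(results)

# An unanchored search finds "24" inside "244"; the string as a whole does not fit.
whole_244 = re.fullmatch(r"2+4?", "244")
print(whole_244)
```

The last check shows the difference between "contains a match" and "matches in full", which is what anchors are for.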
Carets & Dollar Signs • If a regular expression starts with a ^, the string must be at the beginning of a line • If a regular expression ends with a $, the string must be at the end of a line • ^ and $ are referred to as anchors • Ex. /^T.*T$/ matches any line that starts and ends with T
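A brief sketch of the two anchors, used separately and together. It assumes Python's re module and single-line strings; the sample lines are made up for illustration.

```python
import re

lines = ["TOAST", "Tim", "ACT", "TT", "T"]

starts_with_t = [l for l in lines if re.search(r"^T", l)]
ends_with_t = [l for l in lines if re.search(r"T$", l)]

# ^T.*T$ needs two separate T's, so the one-character line "T" is excluded.
both = [l for l in lines if re.search(r"^T.*T$", l)]

print(starts_with_t, ends_with_t, both)
```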
Quoting Special Characters • If you want to use a special character literally, put a backslash in front of it • Ex. /and\/or/ matches and/or • Ex. /\\/ matches \ • Ex. /\**/ matches zero or more asterisks
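The quoting examples translate to the sketch below, which assumes Python's re module. One difference to keep in mind: Python has no / delimiter, so the slash in and/or needs no backslash there. The sample strings are hypothetical.

```python
import re

# No delimiter in Python, so "/" is an ordinary character.
slash = re.search(r"and/or", "cash and/or credit").group(0)

# A quoted backslash matches one literal backslash.
backslash = re.search(r"\\", r"C:\temp").group(0)

# \* is a literal asterisk; the second * repeats it zero or more times.
stars = re.search(r"\**", "***bold").group(0)
no_stars = re.search(r"\**", "bold").group(0)

print(slash, backslash, stars, repr(no_stars))
```

The empty result for "bold" shows that zero asterisks still counts as a match.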
Longest Match • Regular expressions match the longest string possible in a line • Ex. I (Justin) like coffee (lots). • /(.*)/ (the parentheses are literal characters here) • Matches (Justin) like coffee (lots) • /([^)]*)/ • Matches (Justin)
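The longest-match example can be reproduced with a short sketch. It assumes Python's re module, where literal parentheses must be written as \( and \), unlike the basic regular expressions on the slide.

```python
import re

text = "I (Justin) like coffee (lots)."

# .* runs to the last closing parenthesis on the line.
longest = re.search(r"\(.*\)", text).group(0)

# [^)]* cannot cross a closing parenthesis, so the match stops at the first one.
bounded = re.search(r"\([^)]*\)", text).group(0)

print(longest)
print(bounded)
```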
Boolean OR • You can pattern match for two distinct strings using OR (the pipe) • Ex. /CAT|DOG/ • Matches either CAT or DOG • Simpler expressions can be written using just a character class • e.g. /a[bc]/ instead of /ab|ac/ • Also part of extended R.E.
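A small sketch of the pipe and its character-class equivalent, assuming Python's re module (where | needs no extended-mode flag). The sample strings are hypothetical.

```python
import re

found = re.findall(r"CAT|DOG", "CAT and DOG, not COW")
print(found)

# /a[bc]/ and /ab|ac/ should accept exactly the same strings.
samples = ["ab", "ac", "ad", "xac"]
with_class = [s for s in samples if re.search(r"a[bc]", s)]
with_or = [s for s in samples if re.search(r"ab|ac", s)]
print(with_class == with_or, with_class)
```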
Grouping • You can apply special characters to groups of characters in parentheses • Also called bracketing • A group matches the same strings as the unbracketed expression • But modifiers can be applied to the whole group • Ex. /\(duck\)*|\(goose\)/ (in extended R.E., written /(duck)*|(goose)/)
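To see what grouping changes, the sketch below compares a grouped and an ungrouped pattern. It assumes Python's re module, which uses the extended-style (duck)* spelling rather than \(duck\)*, and it uses re.fullmatch so that each sample must fit the pattern as a whole. The sample strings are hypothetical.

```python
import re

samples = ["", "duck", "duckduck", "goose", "duckgoose", "duckkk"]

# With grouping, * repeats the whole word "duck" (including zero times).
grouped = [s for s in samples if re.fullmatch(r"(duck)*|(goose)", s)]

# Without grouping, * applies only to the final "k".
ungrouped = [s for s in samples if re.fullmatch(r"duck*", s)]

print(grouped)
print(ungrouped)
```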
Using with vim • Use regular expressions for searching and substituting • Searching: • /string or ?string • Substituting: • :[g][address]s/string/replace[/g] • g : global; substitute all lines • string and replace can be R.E. • /g: global; replace all occurrences in the line
Using with vim cont’d • [address] • n: line number • n[+/-]x: line number plus x lines before or after • n1,n2 : from line n1 to n2 • . : alias for current line • $ : alias for last line in work buffer • % : alias for entire work buffer
vim examples • /^if( • /end\.$ • :%s/[Jj]ustin/Mr\. Awesome/g
Using with vim cont’d • Ampersand (&) • Alias for the matched string when substituting • Ex: :s/[A-Z][0-9]/_&_/ • Quoted digit (\n) • Refers to the nth quoted part of an R.E. with multiple quoted parts • Can be used to rearrange columns • Ex: :s/\([^,]*\), \(.*\)/\2 \1/
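The two substitution features can be approximated outside vim with a short sketch. It assumes Python's re.sub, which differs from vim in two ways: the whole match is written \g<0> instead of &, and all occurrences are replaced unless count is given. The input strings are hypothetical.

```python
import re

text = "part A7 and B2"

# count=1 mimics a vim substitute without the trailing /g.
wrapped_first = re.sub(r"[A-Z][0-9]", r"_\g<0>_", text, count=1)

# Without count, every occurrence is replaced, like /g in vim.
wrapped_all = re.sub(r"[A-Z][0-9]", r"_\g<0>_", text)

# \1 and \2 refer to the first and second group, which swaps the two columns.
swapped = re.sub(r"([^,]*), (.*)", r"\2 \1", "Doe, Jane")

print(wrapped_first)
print(wrapped_all)
print(swapped)
```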
Using with grep • To take advantage of extended regular expressions, use egrep or grep -E instead • Use single quotes as the delimiter • Ex: • egrep '^T.*T$' myfile • Lists all lines in myfile that begin and end with T
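For readers without grep at hand, the line-filtering idea can be sketched in a few lines. This assumes Python's re module and is only a rough stand-in for egrep; grep_lines is a hypothetical helper, and the sample text takes the place of myfile.

```python
import re

def grep_lines(pattern, text):
    """Return the lines of text that contain a match for pattern."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]

# Hypothetical file contents, one entry per line.
sample = "TOAST\nTim\nTHAT\nACT\n"
selected = grep_lines(r"^T.*T$", sample)
print(selected)
```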