Lecture 6 / Lecture 7: Regular Expressions, grep
Why Regular Expressions?
• Regular expressions are used to describe text patterns/filters
• Unix commands/utilities that support regular expressions:
  • grep (fgrep, egrep) - search a file for a string or regular expression
  • sed - stream editor
  • awk (nawk) - pattern scanning and processing language
• There are some minor differences between the regular expressions supported by these programs
• We will cover the general matching operators first.
Character Class
• [] matches any one of the enclosed characters
• [abc] matches a single a, b, or c
• [a-z] matches any one of abcdef…xyz
• [^A-Za-z] matches a single character as long as it is not a letter
• Example: [Dd][Aa][Vv][Ee]
  • Matches "Dave" or "dave" or "dAVE"
  • Does not match "ave" or "da"
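The character-class rules above can be tried directly with grep. This is a minimal sketch: the candidate strings come from the slide, while the variable names and the three test words fed to the second command are made up for illustration. LC_ALL=C is set so that ranges such as [a-z] mean plain ASCII ranges.

```shell
# Candidate strings from the slide, one per line.
candidates='Dave
dave
dAVE
ave
da'

# Keep only the lines that contain D/d, A/a, V/v, E/e in that order.
dave_hits=$(printf '%s\n' "$candidates" | LC_ALL=C grep '[Dd][Aa][Vv][Ee]')
printf '%s\n' "$dave_hits"

# Count the lines that contain at least one non-letter character.
# (abc, a1c and "x y" are hypothetical inputs.)
non_letter_count=$(printf '%s\n' 'abc' 'a1c' 'x y' | LC_ALL=C grep -c '[^A-Za-z]')
printf 'lines with a non-letter: %s\n' "$non_letter_count"
```

The first command prints Dave, dave and dAVE; the second counts 2, because "a1c" contains a digit and "x y" contains a space.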
Regular Expression Operators
• Any character (except a metacharacter!) matches itself.
• . Matches any single character except newline.
• * Matches 0 or more instances of the immediately preceding R.E.
• ? Matches 0 or 1 instances of the immediately preceding R.E.
• + Matches 1 or more instances of the immediately preceding R.E.
• ^ Matches the following R.E. at the beginning of the line
• $ Matches the preceding R.E. at the end of the line
• | Matches the R.E. specified before or after this symbol
• \ Turns off the special meaning of the character that follows
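Examples of R.E.
• x[abc]?x matches "xax" or "xx"
• [abc]* matches "aaaaa" or "acbca"
• 0*10 matches "010" or "0000010" or "10"
• ^(dog)$ matches a line that consists of exactly "dog"
• [\t ]* matches zero or more tabs or spaces (in tools that read \t as a tab)
• (A|a)+b*c? matches one or more A or a, then zero or more b, then an optional c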
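The examples above can be checked with a small helper. This is a sketch, not part of the lecture: full_match is a hypothetical function name, and it assumes a grep that accepts the POSIX options -E (extended syntax, needed for ?, + and |), -x (the whole line must match) and -q (quiet).

```shell
# full_match PATTERN STRING prints "yes" if the whole STRING matches PATTERN.
full_match() {
    if printf '%s\n' "$2" | LC_ALL=C grep -E -x -q -- "$1"; then
        echo yes
    else
        echo no
    fi
}

full_match 'x[abc]?x'   'xax'      # optional character present
full_match 'x[abc]?x'   'xx'       # optional character absent
full_match 'x[abc]?x'   'xabx'     # two characters between the x's
full_match '0*10'       '0000010'  # any number of leading zeros
full_match '0*10'       '10'       # zero leading zeros
full_match '^(dog)$'    'dog'
full_match '^(dog)$'    'dogs'
full_match '(A|a)+b*c?' 'Aabbc'
```

Run in order, the calls print yes, yes, no, yes, yes, yes, no, yes.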
Grouping with parens
• If you put a subpattern inside parens you can apply +, * and ? to the entire subpattern.
• a(bc)*d matches "ad" and "abcbcd"
• a(bc)*d does not match "abcxd" or "bcbcd"
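A quick way to see grouping at work is to loop over the four strings from the slide. This sketch assumes grep -E, since unescaped parens only group in the extended syntax; the function name group_check is made up for illustration.

```shell
# group_check STRING prints "match" or "no match" for the pattern a(bc)*d.
group_check() {
    if printf '%s\n' "$1" | LC_ALL=C grep -E -q 'a(bc)*d'; then
        echo "match"
    else
        echo "no match"
    fi
}

for candidate in ad abcbcd abcxd bcbcd; do
    printf '%-8s %s\n' "$candidate" "$(group_check "$candidate")"
done
```

The first two strings match and the last two do not: "abcxd" has an x where another "bc" or the closing d is required, and "bcbcd" contains no a at all.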
Example
Sample text:
• Christian Scott lives here and will put on a Christmas party
• There are around 30 to 35 people invited.
• They are:
• Tom
• Dan
• Rhonda Savage
• Nicky and Kimberly.
• Steve, Suzanne, Ginger and Larry
Patterns to try against the text:
• ^[A-Z]..$
• ^[A-Z][a-z]*3[0-5]
• [a-z]*\.
• ^ *[A-Z][a-z][a-z]$
• ^[A-Z][a-z]*[^,][A-Za-z]*$
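One way to work through this exercise is to count how many lines each pattern selects. The sketch below assumes the sample text is stored one entry per line with the slide bullets dropped, and that plain grep (basic regular expressions) is the intended tool; count_matches is a hypothetical helper name.

```shell
# Sample text from the slide, one entry per line.
sample='Christian Scott lives here and will put on a Christmas party
There are around 30 to 35 people invited.
They are:
Tom
Dan
Rhonda Savage
Nicky and Kimberly.
Steve, Suzanne, Ginger and Larry'

# count_matches PATTERN prints the number of sample lines that match.
# "|| true" hides grep's non-zero exit status when the count is 0.
count_matches() {
    printf '%s\n' "$sample" | LC_ALL=C grep -c -- "$1" || true
}

for pattern in '^[A-Z]..$' '^[A-Z][a-z]*3[0-5]' '[a-z]*\.' \
               '^ *[A-Z][a-z][a-z]$' '^[A-Z][a-z]*[^,][A-Za-z]*$'; do
    printf '%-30s %s line(s)\n' "$pattern" "$(count_matches "$pattern")"
done
```

Under those assumptions the counts are 2, 0, 2, 2 and 3. The second pattern selects nothing because [a-z]* cannot cross the space after "There", so it never reaches "30". The last pattern selects Tom, Dan and Rhonda Savage.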
Review: Metacharacters for filename abbreviation
• * Matches anything: ls Test*.doc
• ? Matches any single character: ls Test?.doc
• [abc…] Matches any of the enclosed characters: ls T[eE][sS][tT].doc
• [a-z] Matches any character in a range: ls [a-zA-Z]*
• [!abc…] Matches any character except those listed: ls [!0-9]*
Difference !!
• Although there are similarities to the metacharacters used in filename expansion, we are talking about something different!
• Filename expansion is done by the shell.
• Regular expressions are used by commands (programs).
• However, because of this overlap, be careful when specifying an R.E. on the command line
• Good idea to always quote an R.E. that contains special characters ('' or "") on the command line
• Example: % grep '[a-z]*' chap[12]*
• Note: the unquoted filename mask chap[12]* is expanded by the shell; the quoted R.E. is passed to grep unchanged
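The split between shell expansion and grep's own matching can be observed in a scratch directory. This is a sketch under stated assumptions: the file names chap1, chap2 and notes and their contents are invented, mktemp -d is assumed to be available, and the pattern [a-z][a-z]* is used instead of the slide's [a-z]* because the latter also matches the empty string and therefore selects every line.

```shell
# Work in a throwaway directory with three hypothetical files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\n123\n' > chap1
printf 'beta\n'       > chap2
printf 'gamma\n'      > notes

# Unquoted: the shell replaces the mask with matching filenames.
set -- chap[12]*
expanded_count=$#            # number of names the shell produced
expanded_names="$*"

# Quoted: the same characters are passed through untouched.
set -- 'chap[12]*'
quoted_arg=$1

# Quoted R.E. for grep, unquoted mask for the shell.
per_file_counts=$(LC_ALL=C grep -c '[a-z][a-z]*' chap[12]*)

printf 'shell expanded to %s names: %s\n' "$expanded_count" "$expanded_names"
printf 'quoted argument stays: %s\n' "$quoted_arg"
printf '%s\n' "$per_file_counts"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

grep never sees the mask chap[12]*; it receives chap1 and chap2 as two separate arguments and reports one matching line in each.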
grep - search for a string
• grep [-bchilnsvw] PATTERN [filename...]
• Reads files or standard/redirected input
• Searches for the specified pattern in each line
• Sends results to the standard output
• Examples:
  % grep '^X11' * - search all files for lines starting with the string "X11"
  % grep -v text file - print lines that do not match "text"
Regular expressions for grep
• c any non-special character
• \c turn off any special meaning of character c
• ^ beginning of line
• $ end of line
• . any single character
• [...] any of the characters in range ...
• [^...] any single character not in range ...
• r* zero or more occurrences of r
Regular Expressions for grep
• \< beginning of word anchor: \<abc matches "abcd" but not "dabc"
• \> end of word anchor: abc\> matches "dabc" but not "abcd"
• \(…\) stores the pattern …: \(abc\)def matches "abcdef" and stores abc in \1, so \(abc\)def\1 matches "abcdefabc"
• Can store up to 9 matches
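The word anchors and the stored pattern can be exercised with the strings from the slide. This sketch assumes a grep that supports \< and \> (GNU grep does; they are not part of the POSIX basic syntax), and has_match is a hypothetical helper name. The string "abcdefxyz" is an added counterexample, not from the slide.

```shell
# has_match PATTERN STRING prints "yes" if STRING contains a match.
has_match() {
    if printf '%s\n' "$2" | LC_ALL=C grep -q -- "$1"; then
        echo yes
    else
        echo no
    fi
}

has_match '\<abc' 'abcd'                 # abc starts a word
has_match '\<abc' 'dabc'                 # abc is in the middle of a word
has_match 'abc\>' 'dabc'                 # abc ends a word
has_match 'abc\>' 'abcd'                 # abc is followed by d
has_match '\(abc\)def\1' 'abcdefabc'     # \1 repeats the stored abc
has_match '\(abc\)def\1' 'abcdefxyz'     # nothing repeats after def
```

The calls print yes, no, yes, no, yes, no in that order.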
grep - options
• Some useful options
  • -c count number of lines
  • -h do not display filename
  • -l list only the files with matching lines
  • -v display lines that do not match
  • -n print line numbers
File db
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Heme 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
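The grep options listed earlier can be tried on the db data. In this sketch the data is held in a shell variable and piped to grep instead of being read from a file, which is why the file-oriented options -h and -l are not shown; run_grep is a hypothetical wrapper name, and -w (match whole words only) is taken from the option letters in the synopsis.

```shell
# Contents of db, one record per line.
db='northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Heme 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13'

# run_grep passes its arguments to grep with db on standard input.
run_grep() {
    printf '%s\n' "$db" | LC_ALL=C grep "$@" || true
}

north_count=$(run_grep -c 'north')        # -c: count matching lines
not_main_count=$(run_grep -v -c 'Main')   # -v: invert the match
savage_line=$(run_grep -n 'Savage')       # -n: prefix the line number
north_word=$(run_grep -w 'north')         # -w: whole-word match only

printf '%s\n' "$north_count" "$not_main_count" "$savage_line" "$north_word"
```

Three lines contain "north", seven lines do not contain "Main", "Savage" is found on line 6, and only the record that begins with the standalone word "north" survives -w.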
grep with pipes
• Remember, we can use pipes when a file is expected
• ls -l | grep '\<Feb.*3\>'
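To see what the pipe example selects without depending on a real directory, the sketch below feeds grep a made-up listing in the style of ls -l output. The owner, sizes, dates and file names are all hypothetical, and the word anchors assume a grep that supports \< and \>.

```shell
# Hypothetical listing in the style of "ls -l" output.
listing='-rw-r--r-- 1 dave staff 120 Feb 3 10:15 notes.txt
-rw-r--r-- 1 dave staff 300 Feb 13 09:00 draft.txt
-rw-r--r-- 1 dave staff 88 Mar 3 08:00 todo.txt
-rw-r--r-- 1 dave staff 88 Feb 21 08:30 log.txt'

# Same filter as on the slide, reading from a pipe instead of a file.
feb_hits=$(printf '%s\n' "$listing" | LC_ALL=C grep '\<Feb.*3\>')
feb_count=$(printf '%s\n' "$listing" | LC_ALL=C grep -c '\<Feb.*3\>')

printf '%s\n' "$feb_hits"
printf 'matching lines: %s\n' "$feb_count"
```

Both the Feb 3 and the Feb 13 entries are selected, because the pattern only requires a word that starts with "Feb" and, later on the line, a word that ends in 3. A file name ending in 3 on any February line would match as well.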
egrep
• Extended grep
• Allows for more kinds of regular expressions
• Unfortunately, egrep regular expressions are not a superset of grep regular expressions
• Some of grep's regular expressions are not available in egrep
grep vs. egrep
• New to egrep
  • f+ matches one or more occurrences of f
  • f? matches zero or one occurrence of f
  • f|g matches f or g
  • (ab) groups characters a and b together
• Only in grep
  • \( … \), \<, \>
• Final Note: Different versions of grep/egrep may support different expressions. Make sure to check the man pages.
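The difference between the two syntaxes shows up when the same pattern is given to both. This sketch uses grep -E, the POSIX spelling of egrep, on four made-up input lines; the variable names are illustrative only.

```shell
# Four hypothetical input lines.
lines='ff
f+
g
f|g'

# Basic syntax: + and | are ordinary characters.
basic_plus=$(printf '%s\n' "$lines" | LC_ALL=C grep -c 'f+')
basic_alt=$(printf '%s\n' "$lines" | LC_ALL=C grep -c 'f|g')

# Extended syntax: + means "one or more", | means "or".
extended_plus=$(printf '%s\n' "$lines" | LC_ALL=C grep -E -c 'f+')
extended_alt=$(printf '%s\n' "$lines" | LC_ALL=C grep -E -c 'f|g')

printf 'f+   basic: %s  extended: %s\n' "$basic_plus" "$extended_plus"
printf 'f|g  basic: %s  extended: %s\n' "$basic_alt" "$extended_alt"
```

In the basic syntax each pattern matches only the one line that contains it literally. In the extended syntax f+ matches the three lines containing an f, and f|g matches all four lines. As the final note says, individual versions may add extensions, so the man page remains the reference.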
Recommended Reading • Chapter 3 • Chapter 4, sections 4.1 – 4.5