Regular Expression 1. What is regular expression?

argus

Presentation Transcript


  1. Regular Expression
     1. What is a regular expression? A description of a pattern in a string, written with special characters and literal text.
     2. When and where do we use it? Regular expressions are used to parse the output of software such as BLAST, or to extract the information you need from a text file. When a string or line matches the pattern, it can be extracted, which makes regular expressions extremely useful for text processing.
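As a rough illustration of extracting information from program output, the sketch below pulls the first three fields out of one tab-separated hit line. The line itself is made up for this example (it only imitates a tabular BLAST report), and the variable names are assumptions, not part of the original slides.

```perl
use strict;
use warnings;

# Hypothetical tab-separated hit line: query, subject, % identity, then other columns.
my $hit = "query1\tsubjectA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t360";

my ($query, $subject, $identity);
# Capture the first three tab-separated fields.
if ($hit =~ /^(\S+)\t(\S+)\t([\d.]+)\t/) {
    ($query, $subject, $identity) = ($1, $2, $3);
    print "$query hits $subject at $identity% identity\n";
}
```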

  2. String match: two different formats

         while (<>) {
             chomp;
             my $line = $_;
             # Format 1: check whether the current line, $_, contains "drought"
             if (/drought/) {
                 print "$_\n";
             }
             # Format 2: check whether a variable string contains "drought"
             if ($line =~ m/drought/) {
                 print "$line\n";
             }
         }
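A related form that the slide does not show is the negated binding operator `!~`, which is true when the pattern does not match. A minimal sketch, using made-up input lines:

```perl
use strict;
use warnings;

# Made-up lines for illustration.
my @lines = ("drought tolerance", "salt stress", "Drought response");

# =~ keeps lines that contain "drought"; !~ keeps lines that do not.
my @with    = grep { $_ =~ /drought/ } @lines;
my @without = grep { $_ !~ /drought/ } @lines;

print "with: @with\n";        # with: drought tolerance
print "without: @without\n";  # without: salt stress Drought response
```

Matching is case-sensitive by default, which is why "Drought response" lands in the second list.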

  3. #!/usr/bin/perl
     use strict;
     use warnings;

     # The input file name is the first command-line argument
     my $infile = shift;
     open(my $in, '<', $infile) or die "Cannot open input file -- $infile: $!\n";
     while (<$in>) {
         chomp;
         if (/drought/ && /salt/) {    # line mentions both terms
             print "drought_salt\t$_\n";
         } elsif (/calcium/) {
             print "calcium:\t$_\n";
         } elsif (/cold/) {
             print "cold:\t$_\n";
         }
     }
     close($in);

  4. .    Match any character (except newline, unless the /s modifier is used)
     \w   Match "word" character (alphanumeric plus "_")
     \W   Match non-word character
     \s   Match whitespace character
     \S   Match non-whitespace character
     \d   Match digit character
     \D   Match non-digit character
     \t   Match tab
     \n   Match newline
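To see several of these classes working together, here is a small sketch that splits one record into its parts. The record and the variable names are invented for the example.

```perl
use strict;
use warnings;

# Made-up record: a name, a tab, a length, a space, and a unit.
my $record = "gene42\t1500 bp";

my ($name, $length, $unit);
# \w+ = the name, \t = the tab, \d+ = the digits, \s = the space, \S+ = the unit
if ($record =~ /^(\w+)\t(\d+)\s(\S+)$/) {
    ($name, $length, $unit) = ($1, $2, $3);
    print "name=$name length=$length unit=$unit\n";
}
```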

  5. if (/\d+/) {
         print "match_digit: $_\n";
     } elsif (/\w+/) {
         print "match_character:$_\n";
     }

     +: one or more
     *: zero or more

  6. *       Match 0 or more times
     +       Match 1 or more times
     ?       Match 1 or 0 times
     {n}     Match exactly n times
     {n,}    Match at least n times
     {n,m}   Match at least n but not more than m times
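The counted quantifiers are easiest to compare side by side. The sketch below filters a few made-up strings with `{n}`, `{n,m}`, and `?`; the candidate values are assumptions chosen only to show the difference.

```perl
use strict;
use warnings;

# Made-up date-like strings.
my @candidates = ("2024-01-15", "24-1-15", "2024-001-15");

# {n}: exactly 4, 2, and 2 digits.
my @strict = grep { /^\d{4}-\d{2}-\d{2}$/ } @candidates;
# {n,m}: between 2 and 4 digits, then 1 or 2, then 1 or 2.
my @loose  = grep { /^\d{2,4}-\d{1,2}-\d{1,2}$/ } @candidates;
# ?: the "u" may appear once or not at all.
my @colour = grep { /^colou?r$/ } ("color", "colour", "colouur");

print "strict: @strict\n";   # strict: 2024-01-15
print "loose: @loose\n";     # loose: 2024-01-15 24-1-15
print "colour: @colour\n";   # colour: color colour
```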

  7. if ($str =~ m/(A|E|I|O|U|a|e|i|o|u)/) {
         print "String contains a vowel!\n";
     }
     # [^...] matches any single character NOT in the list (here Y and y are excluded too)
     if ($string =~ /[^AEIOUYaeiouy]/) {
         print "String contains a non-vowel\n";
     }
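The alternation of single letters can also be written as a character class, which is shorter and selects the same strings. A small sketch with made-up words:

```perl
use strict;
use warnings;

my @words = ("rhythm", "sky", "Apple", "tree");

# Three equivalent ways to ask "does this word contain a vowel?"
my @by_alternation = grep { /(A|E|I|O|U|a|e|i|o|u)/ } @words;
my @by_class       = grep { /[AEIOUaeiou]/ } @words;
my @by_class_i     = grep { /[aeiou]/i } @words;   # /i ignores case

print "@by_alternation\n";   # Apple tree
print "@by_class\n";         # Apple tree
print "@by_class_i\n";       # Apple tree
```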

  8.  Volume in drive D has no label
      Volume Serial Number is 4547-15E0
      Directory of D:\polo\marco

     .              <DIR>        12-18-97 11:14a .
     ..             <DIR>        12-18-97 11:14a ..
     INDEX    HTM         3,237  02-06-98  3:12p index.htm
     APPDEV   HTM         6,388  12-24-97  5:13p appdev.htm
     NORM     HTM         5,297  12-24-97  5:13p norm.htm
     IMAGES         <DIR>        12-18-97 11:14a images
     TCBK     GIF           532  06-02-97  3:14p tcbk.gif
     LSQL     HTM         5,027  12-24-97  5:13p lsql.htm
     CRASHPRF HTM        11,403  12-24-97  5:13p crashprf.htm
     WS_FTP   LOG         5,416  12-24-97  5:24p WS_FTP.LOG
     FIBB     HTM        10,234  12-24-97  5:13p fibb.htm
     MEMLEAK  HTM        19,736  12-24-97  5:13p memleak.htm
     LITTPERL       <DIR>        02-06-98  1:58p littperl
              9 file(s)         67,270 bytes
              4 dir(s)     132,464,640 bytes free
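The slide shows only the directory listing and does not say what to do with it. One plausible exercise, sketched here as an assumption, is to pick out the file lines with a single pattern and add up their sizes. The pattern and variable names are illustrative, not taken from the slides.

```perl
use strict;
use warnings;

# A copy of the listing; <<'END' keeps the backslashes literal.
my $listing = <<'END';
 Volume in drive D has no label
 Volume Serial Number is 4547-15E0
 Directory of D:\polo\marco

.              <DIR>        12-18-97 11:14a .
..             <DIR>        12-18-97 11:14a ..
INDEX    HTM         3,237  02-06-98  3:12p index.htm
APPDEV   HTM         6,388  12-24-97  5:13p appdev.htm
NORM     HTM         5,297  12-24-97  5:13p norm.htm
IMAGES         <DIR>        12-18-97 11:14a images
TCBK     GIF           532  06-02-97  3:14p tcbk.gif
LSQL     HTM         5,027  12-24-97  5:13p lsql.htm
CRASHPRF HTM        11,403  12-24-97  5:13p crashprf.htm
WS_FTP   LOG         5,416  12-24-97  5:24p WS_FTP.LOG
FIBB     HTM        10,234  12-24-97  5:13p fibb.htm
MEMLEAK  HTM        19,736  12-24-97  5:13p memleak.htm
LITTPERL       <DIR>        02-06-98  1:58p littperl
         9 file(s)         67,270 bytes
         4 dir(s)     132,464,640 bytes free
END

my ($count, $total) = (0, 0);
for my $line (split /\n/, $listing) {
    # Fields: name, extension, size with commas, date, time, long file name.
    # Directory and summary lines do not fit this shape and are skipped.
    next unless $line =~ /^(\S+)\s+(\S+)\s+([\d,]+)\s+(\d\d-\d\d-\d\d)\s+(\d{1,2}:\d\d[ap])\s+(\S+)$/;
    my ($long_name, $size) = ($6, $3);
    $size =~ tr/,//d;            # "3,237" becomes "3237"
    $count++;
    $total += $size;
    print "$long_name\t$size\n";
}
print "$count files, $total bytes\n";   # 9 files, 67270 bytes
```

The totals agree with the summary line of the listing itself (9 files, 67,270 bytes), which is a handy check that the pattern skipped exactly the directory entries.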

  9. What are \w and \W?
     \w   Match a "word" character (alphanumeric plus "_"): any single character in [a-zA-Z0-9_]
     \W   Match a non-word character: any single character in [^a-zA-Z0-9_]
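A short sketch of what the two classes mean in practice: the underscore counts as a word character, the hyphen does not, and `\W+` makes a convenient separator for `split`. The sample strings are made up.

```perl
use strict;
use warnings;

# The underscore is a word character; the hyphen is not.
my $underscore_ok = ("gene_42" =~ /^\w+$/) ? 1 : 0;   # 1
my $hyphen_ok     = ("gene-42" =~ /^\w+$/) ? 1 : 0;   # 0

# Split a sentence on runs of non-word characters (spaces and punctuation).
my @words = split /\W+/, "Forests are important, to Human!";

print "$underscore_ok $hyphen_ok\n";   # 1 0
print join("|", @words), "\n";         # Forests|are|important|to|Human
```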

  10. Grouping: Perl stores whatever matches inside each pair of parentheses, in order, in the variables $1, $2, ...

          $_ = "Forests are important to Human";
          if (/(\w+)\W+(\w+)/) {
              print "$1\n$2\n";    # prints "Forests" then "are"
          }

      Or you can capture directly into your own variables:

          my ($first, $second) = /(\w+)\W+(\w+)/;
          print "$first\n$second\n";
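Grouping combines well with the /g modifier: in list context a global match returns every captured group, so pairs of groups can fill a hash directly. A minimal sketch, assuming a made-up header line with key=value fields:

```perl
use strict;
use warnings;

# Hypothetical sequence header for illustration.
my $header = ">seq1 length=1200 organism=Arabidopsis";

# One group in list context: the identifier after ">".
my ($id) = $header =~ /^>(\S+)/;

# Two groups with /g: returns (key, value, key, value, ...), which fills a hash.
my %attr = $header =~ /(\w+)=(\w+)/g;

print "$id\n";                                     # seq1
print "$_ => $attr{$_}\n" for sort keys %attr;     # length => 1200, organism => Arabidopsis
```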

  11. Translation: Translations are like substitutions, except that they work character by character instead of replacing one phrase with another. Unlike a pattern, tr takes plain character lists and ranges, so square brackets and commas are not needed. For instance:

          # Make all vowels upper case:
          $string =~ tr/aeiou/AEIOU/;
          # Change a DNA sequence from lower case to upper case:
          $string =~ tr/atcg/ATCG/;
          # Change everything to upper case:
          $string =~ tr/a-z/A-Z/;
          # Change everything to lower case:
          $string =~ tr/A-Z/a-z/;
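Two further uses of tr are common with sequence data: it returns the number of characters it translated, so it can count bases, and it can complement a strand in one pass. The sketch below assumes a short made-up sequence.

```perl
use strict;
use warnings;

my $dna = "atgcggta";   # made-up sequence

# Copy, then upper-case the copy (the original is left untouched).
(my $upper = $dna) =~ tr/atcg/ATCG/;

# With an empty replacement list and no /d, tr only counts the listed characters.
my $gc = ($upper =~ tr/GC//);

# Reverse the string, then swap each base for its complement.
(my $revcomp = scalar reverse $upper) =~ tr/ATCG/TAGC/;

print "$upper\n";     # ATGCGGTA
print "$gc\n";        # 4
print "$revcomp\n";   # TACCGCAT
```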

  12. Perl regular expressions normally match the longest string possible (quantifiers are "greedy"). For instance:

          my $text = "mississippi";
          $text =~ m/(i.*s)/;
          print $1 . "\n";

      Run the preceding code, and here's what you get:

          ississ

      It matches the first i, the last s, and everything in between them. But what if you want to match the first i to the s most closely following it? Add ? after the quantifier to make it non-greedy:

          my $text = "mississippi";
          $text =~ m/(i.*?s)/;    # .*? matches as few characters as possible
          print $1 . "\n";

      Now look what the code produces:

          is
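The same difference shows up when extracting quoted values. The sketch below compares a greedy match, a non-greedy match, and a negated character class, which is a common alternative to the non-greedy form. The input line is made up.

```perl
use strict;
use warnings;

my $line = 'gene="abc1" product="kinase"';   # made-up annotation line

my ($greedy)  = $line =~ /"(.*)"/;       # runs to the LAST double quote
my ($lazy)    = $line =~ /"(.*?)"/;      # stops at the first closing quote
my ($negated) = $line =~ /"([^"]*)"/;    # same result: anything except a quote

print "$greedy\n";    # abc1" product="kinase
print "$lazy\n";      # abc1
print "$negated\n";   # abc1
```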

  13. \b   Match a word boundary
      \B   Match a non-(word boundary)
      \A   Match only at beginning of string
      \Z   Match only at end of string, or before newline at the end
      \z   Match only at end of string
      \G   Match only where previous m//g left off (works only with /g)

      For example:
      /Fred\b/     matches Fred, but not Frederick
      /\bTech\b/   matches Tech, but not MichiganTech or Technological
      \B requires that there is not a word boundary:
      /\bFred\B/   matches Frederick, but not Fred Christopher
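These anchors can be checked quickly with grep over a few test strings. The sketch below reuses the Tech example and also shows the difference between \Z and \z on a string that ends in a newline; the string list is an assumption made for the demonstration.

```perl
use strict;
use warnings;

my @names = ("Tech", "MichiganTech", "Technological", "Michigan Tech");

my @whole_word = grep { /\bTech\b/ } @names;   # boundary on both sides
my @word_end   = grep { /Tech\b/ }   @names;   # boundary after only
my @word_start = grep { /\bTech\B/ } @names;   # starts a longer word

print join("|", @whole_word), "\n";   # Tech|Michigan Tech
print join("|", @word_end), "\n";     # Tech|MichiganTech|Michigan Tech
print join("|", @word_start), "\n";   # Technological

# \Z tolerates one trailing newline; \z does not.
my $with_newline = "Tech\n";
my $big_z   = ($with_newline =~ /Tech\Z/) ? 1 : 0;   # 1
my $small_z = ($with_newline =~ /Tech\z/) ? 1 : 0;   # 0
print "$big_z $small_z\n";            # 1 0
```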

  14. \A and \Z are just like "^" and "$", except that they won't match multiple times when the /m modifier is used.

      Pattern match modifiers:
          m/PATTERN/cgimosx
          /PATTERN/cgimosx

      Options are:
      c   Do not reset search position on a failed match when /g is in effect.
      g   Match globally, i.e., find all occurrences.
      i   Do case-insensitive pattern matching.
      m   Treat text as multiple lines, allowing ^ and $ to match after and before embedded newlines.
      o   Compile pattern once.
      s   Treat text as single line, allowing . to match newlines.
      x   Use extended regular expressions (whitespace and comments are allowed inside the pattern).
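A small sketch of how /m, /i, and /x change a match, and of how \A differs from ^ under /m. The three-line text is made up for the example.

```perl
use strict;
use warnings;

my $text = "drought stress\nSalt stress\ncold stress\n";

my @first_only = $text =~ /^(\w+)/g;     # no /m: ^ matches only at the start of the string
my @each_line  = $text =~ /^(\w+)/mg;    # /m: ^ matches at the start of every line
my @string_top = $text =~ /\A(\w+)/mg;   # \A ignores /m: start of the string only
my @salt       = $text =~ /(salt)/ig;    # /i: matches "Salt" as well

# /x: whitespace and comments inside the pattern are ignored.
my ($first_word) = $text =~ /
    \A (\w+)      # first word of the string
    \s+ stress    # followed by whitespace and "stress"
/x;

print "@first_only\n";   # drought
print "@each_line\n";    # drought Salt cold
print "@string_top\n";   # drought
print "@salt\n";         # Salt
print "$first_word\n";   # drought
```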

  15. Substitute

          $str = "foot fool buffoon";
          $str =~ s/foo/bar/g;     # $str is now "bart barl bufbarn"

      g (global) tells Perl to replace all matches, not just the first.

          $str = "foot Fool buffoon";
          $str =~ s/foo/bar/gi;    # $str is now "bart barl bufbarn"

      i makes the match case-insensitive, so "Foo" is replaced as well.
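A few details of s/// that the slide does not mention, sketched under the same example string: without /g only the first match is replaced, the operator returns the number of replacements, and captured groups can be reused in the replacement. The variable names are illustrative.

```perl
use strict;
use warnings;

my $str = "foot fool buffoon";

# Without /g only the first match changes; working on a copy keeps $str intact.
(my $first_only = $str) =~ s/foo/bar/;

# s///g returns how many replacements were made.
my $replaced = ($str =~ s/foo/bar/g);

# $1 and $2 in the replacement refer to the captured groups.
(my $swapped = "drought salt") =~ s/(\w+) (\w+)/$2 $1/;

print "$first_only\n";   # bart fool buffoon
print "$str\n";          # bart barl bufbarn
print "$replaced\n";     # 3
print "$swapped\n";      # salt drought
```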

  16. #!/usr/bin/perl
      use warnings;

      # Print the word that contains every third "e", counted across the whole string
      $_ = "my test string goes here";
      while (/(\w+)/gi) {
          $word = $&;    # $& contains the string matched by the last pattern match
          while ($word =~ /e/gi) {
              $count++;
              if ($count == 3) {
                  print "$word\n";    # prints "here"
                  $count = 0;
              }
          }
      }
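When only the number of matches is needed, a global match can be counted directly by assigning it through an empty list. The sketch below applies that idiom to the same sentence; it is an alternative way to count, not a rewrite of the loop in the slide.

```perl
use strict;
use warnings;

my $text = "my test string goes here";

# "() =" forces list context, and the outer scalar assignment counts the matches.
my $e_total = () = $text =~ /e/gi;

# Count per word, keeping only the words that contain at least one "e".
my %e_per_word;
for my $word ($text =~ /(\w+)/g) {
    my $n = () = $word =~ /e/gi;
    $e_per_word{$word} = $n if $n;
}

print "$e_total\n";                                            # 4
print "$_: $e_per_word{$_}\n" for sort keys %e_per_word;       # goes: 1, here: 2, test: 1
```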
