part 4

mari

Presentation Transcript


  1. part 4: Forging ahead on Perl

     Arrays:
       • Stacks
       • foreach command
     Regular expressions:
       • String structure analysis, substring extraction and substitution
     Command line arguments:
       • @ARGV array
     Functions/Subroutines:
       • Repetitive use of functional blocks
     Modules in Perl:
       • How to use/share libraries of functions
     Error messages:
       • How to interrupt a program on a mistake
       • die statement

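  2. Arrays as "first-come, last-served" storage

     (Figure: a jar holding the 5 numbers of the array @a = (7,-1,2,4,5). push drops a number on top of the jar, pop takes the top number off. From top to bottom the jar holds 5, 4, 2, -1, 7.)

         # empty the array
         @a = ();

         # store numbers
         push @a, 7;
         push @a, -1;
         push @a, 2;
         push @a, 4;
         push @a, 5;

         # take the last stored number back out
         $lastNumber = pop @a;
         print "last number stored in \@a was $lastNumber\n";

     The @ sign is escaped in the print statement. An unescaped @a inside double quotes would be replaced by the contents of the array.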
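The last-in, first-out behaviour described above can be checked with a short sketch. It assumes nothing beyond the built-in push and pop; the variable names @stack and @popped are chosen here for illustration and do not come from the slides.

```perl
use strict;
use warnings;

# Push the five numbers from the slide, then pop until the array is empty.
my @stack = ();
push @stack, $_ for (7, -1, 2, 4, 5);

my @popped;
while (@stack) {
    # pop removes and returns the most recently pushed element
    push @popped, pop @stack;
}

print "Popped order: @popped\n";   # Popped order: 5 4 2 -1 7
```

The numbers come back in the reverse of the order in which they were stored, which is what makes the array behave like a stack.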

  3. When are the push/pop commands useful?

     Storing the lines of a file. The file data.txt may hold numbers (1, 18, 23, 2) or text:

         Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number of

         #!/usr/local/bin/perl

         # storing file data
         @fileLines = ();
         open(INP, "<", "data.txt") or die "Cannot open data.txt: $!\n";
         while ($line = <INP>) {
             chomp($line);
             push @fileLines, $line;
         }
         close(INP);

         # calculating number of lines in the file
         $nLines = $#fileLines + 1;
         print "There are $nLines lines in data.txt file\n";

         # printing out data.txt file content
         foreach $line (@fileLines) {
             print "$line\n";
         }

     The foreach command visits every element of an array in order:

         @a = (1..6);
         foreach $d (@a) {
             print "$d ";
         }
         print "\n";

     Output:

         1 2 3 4 5 6
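As a hedged variant of the file-reading loop above, the sketch below wraps it in a subroutine and reads from an in-memory string, so it runs without data.txt on disk. The helper name read_lines and the sample text are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from an open filehandle into an array.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;              # strip the trailing newline
        push @lines, $line;
    }
    return @lines;
}

# In-memory stand-in for data.txt (sample numbers from the slide).
my $text = "1\n18\n23\n2\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!\n";
my @file_lines = read_lines($in);
close($in);

# An array in scalar context gives its length, the same value as $#file_lines + 1.
my $n_lines = scalar @file_lines;
print "There are $n_lines lines\n";
```

For a real file, the open call would take the file name in place of the reference to $text.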

  4. Command line arguments

     printFile.pl is a program that prints out the contents of a file:

         printFile.pl numbers.txt
         printFile.pl words.txt

     (Figure: the two sample files. numbers.txt holds a column of numbers beginning 1, 18, 23, 2. words.txt holds the text "Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number of".)

     @ARGV is the array of arguments following the program name. For the first call above, @ARGV = ("numbers.txt");

         #!/usr/local/bin/perl

         # determine file name
         $fName = $ARGV[0];

         # open, read and print out file
         open(INP, "<", $fName) or die "Cannot open $fName: $!\n";
         while ($line = <INP>) {
             print $line;
         }
         close(INP);

  5. Example: print out the N-th line of a file

     words.txt holds the text "Finding potential regulatory elements in noncoding regions of the human genome is a challenging problem. Analyzing novel sequences for the presence of known transcription factor binding sites or their weight matrices produces a huge number of", split over several lines.

         printFile.pl words.txt 3

     Output (the third line of words.txt):

         a challenging problem. Analyzing novel

         #!/usr/local/bin/perl

         # determine file name and line index
         $fName  = $ARGV[0];
         $lineNo = $ARGV[1];

         # open and read file
         open(INP, "<", $fName) or die "Cannot open $fName: $!\n";
         while ($line = <INP>) {
             push @fileLines, $line;
         }
         close(INP);

         # print out N-th line (array indices start at 0)
         print $fileLines[$lineNo - 1];
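The program above stores the whole file before printing one line. A possible alternative, sketched here as an assumption rather than as the slide's method, stops reading as soon as line N is reached, using Perl's built-in line counter $. (the helper name nth_line and the sample text are hypothetical).

```perl
use strict;
use warnings;

# Hypothetical helper: return line number $n (1-based) from a filehandle,
# or nothing if the file has fewer lines.
sub nth_line {
    my ($fh, $n) = @_;
    while (my $line = <$fh>) {
        # $. holds the number of the line just read
        return $line if $. == $n;
    }
    return;
}

# In-memory stand-in for a text file.
my $text = "first\nsecond\nthird\nfourth\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!\n";
my $third = nth_line($in, 3);
close($in);

print $third;   # third
```

For large files this avoids holding every line in memory, at the cost of being able to answer only one line request per pass.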

  6. Error messages

     How do you stop a program cleanly and report what went wrong?

     Example problem. The program should be executed with 2 arguments:

         printFile.pl words.txt 3

     but the user specifies only 1:

         printFile.pl 3

     The program should stop and report the error. The die statement prints out a message and stops the program:

         #!/usr/local/bin/perl

         # check whether we've got 2 arguments or not
         # ($#ARGV is the index of the last argument, so 2 arguments give 1)
         if ($#ARGV != 1) {
             die "Error. Incorrect number of arguments\n";
         }
         ...

     Stop on an incorrect line number:

         ...
         if ($ARGV[1] <= 0) {
             die "Error. Incorrect line number: $ARGV[1]\n";
         }
         ...
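The two checks above can be gathered into one place. The sketch below is one possible arrangement, not the slide's code: the helper name parse_args, the usage message, and the extra test that the line number consists of digits are assumptions added for illustration. eval is used only so that several failing cases can be shown in a single run.

```perl
use strict;
use warnings;

# Hypothetical helper: validate (file name, line number) arguments.
# Dies with a message on bad input, returns the pair otherwise.
sub parse_args {
    my @args = @_;
    die "Usage: printFile.pl FILE LINE_NUMBER\n" if @args != 2;
    my ($f_name, $line_no) = @args;
    die "Error. Incorrect line number: $line_no\n"
        unless $line_no =~ /^\d+$/ && $line_no > 0;
    return ($f_name, $line_no);
}

# eval catches each die so all three cases run in one script.
for my $case (['words.txt', 3], ['3'], ['words.txt', 'abc']) {
    my @ok  = eval { parse_args(@$case) };
    my $msg = $@ ? "rejected: $@" : "accepted: @ok\n";
    print $msg;
}
```

In a real program the call would be `my ($fName, $lineNo) = parse_args(@ARGV);` without eval, so that a bad command line stops the program.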

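  7. Defining new functions and commands

     A function is a "mini computer" inside a program: it gets input data and produces output results.

     (Figure: a FUNCTION box that filters out numbers. INPUT: 2, Hello, 3, 4, 7, Everybody, 33, 57. OUTPUT: Hello, Everybody.)

     Defining a min function, which returns the minimum of 2 numbers:

         $x = min(5, 3);
         print "Smallest of 5 and 3 is: $x\n";

         # Function min
         sub min {
             # INPUT parameters arrive in the array @_
             my ($first, $second) = @_;
             my $small;
             if ($first < $second) {
                 $small = $first;
             }
             else {
                 $small = $second;
             }
             return $small;
         }

     The variables inside the function are declared with my, so they do not clash with variables of the same name elsewhere in the program. The names $a and $b are avoided because Perl reserves them for the sort command.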
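The "filtering out numbers" function pictured on this slide is not given as code. A minimal sketch of what it might look like follows; the name filter_out_numbers is hypothetical, and the test for "a number" (a string made only of digits, using a regular expression of the kind introduced on the next slides) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical function: return the arguments that are not whole numbers.
sub filter_out_numbers {
    my @input = @_;
    my @words;
    foreach my $item (@input) {
        # keep the item unless it consists only of digits
        push @words, $item if $item !~ /^\d+$/;
    }
    return @words;
}

my @output = filter_out_numbers(2, 'Hello', 3, 4, 7, 'Everybody', 33, 57);
print "@output\n";   # Hello Everybody
```

Like min, the function receives its input in @_ and hands its result back with return; here the result is a list rather than a single value.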

  8. Regular expressions

         $string1 = "Total: 576 genes, 2763 exons, some introns";

     How to extract the 2 numbers?

         $string2 = "human -G-ACT---TTGC------AA----A---A----";

     How to extract just the DNA sequence?

     Special symbols that stand for groups of characters of a common type (called patterns):

         \s   Match a whitespace character
         \S   Match a non-whitespace character
         \d   Match a digit character
         \D   Match a non-digit character
         ^    Match the beginning of the line
         .    Match any character (except newline)
         $    Match the end of the line
         \t   Tab character (HT, TAB)
         \n   Newline (LF, NL)

  9. Repetition and grouping options:

         *    Match 0 or more times
         +    Match 1 or more times
         []   Character class

     Patterns management. Each substitution below is applied to the original string:

         $string = "Total: 576 genes, 2763 exons, some introns";

         $string =~ s/\d+/some/g;   -->  "Total: some genes, some exons, some introns"
         $string =~ s/\s+/#/g;      -->  "Total:#576#genes,#2763#exons,#some#introns"
         $string =~ s/\D+/*/g;      -->  "*576*2763*"

     In the last case every run of non-digit characters, spaces included, collapses into a single *.
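The two questions posed with $string1 and $string2 can be answered with the symbols listed so far. The sketch below is one way to do it, not necessarily the one intended by the slides: it relies on a /g match in list context returning all matches, and on parentheses capturing part of a match (both standard Perl); the variable names are chosen here for illustration.

```perl
use strict;
use warnings;

my $string1 = "Total: 576 genes, 2763 exons, some introns";
my $string2 = "human -G-ACT---TTGC------AA----A---A----";

# In list context, a /g match returns every match of the pattern.
my @numbers = $string1 =~ /\d+/g;          # runs of digits

# Capture the aligned sequence after the label, then delete the gap characters.
my ($aligned) = $string2 =~ /^human\s+(\S+)$/;
(my $dna = $aligned) =~ s/-//g;

print "Numbers: @numbers\n";   # Numbers: 576 2763
print "DNA: $dna\n";           # DNA: GACTTTGCAAAA
```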

  10. Localizing substrings:

     The file alignment.blast:

                                  10                                 20
         human -G-ACT---TTGC------AA----A---A-----CG-----G-AT-------TGGG---
                | |||    |||      ||    |   |     ||     | ||       ||||
         mouse TGAACTCAAGTGCTATTTTAATTCCATTCATTCTCCGTGGCTGCATCAGGGCCTGGGGCT
                       10        20        30        40        50        60

                                                       30
         human ---------------C----GG------GA-------TG-AG--AGG-------------
                              |    ||      ||       || ||  |||
         mouse CTACCTCCTGACAAACATTTGGTCTCTAGAAGGCTTCTGAAGTTAGGCAAGTCTGAAAAT
                       70        80        90       100       110       120

     How to extract only the lines starting with 'mouse'?

         while ($line = <INP>) {
             if ($line =~ /^mouse/) {
                 print $line;
             }
         }

     Output:

         mouse TGAACTCAAGTGCTATTTTAATTCCATTCATTCTCCGTGGCTGCATCAGGGCCTGGGGCT
         mouse CTACCTCCTGACAAACATTTGGTCTCTAGAAGGCTTCTGAAGTTAGGCAAGTCTGAAAAT

  11. Obtaining substrings after localization:

     The input is the same alignment.blast file as on slide 10. How to extract the human and mouse sequences?

     /...(xxx)...(xxx).../ -- substrings enclosed in parentheses are available after a successful match as the variables $1, $2, ...

         $humanSeq = "";
         $mouseSeq = "";

         while ($line = <INP>) {
             if ($line =~ /^mouse (\S+)$/) {
                 $mouseSeq .= $1;
             }
             elsif ($line =~ /^human (\S+)$/) {
                 $humanSeq .= $1;
             }
         }

         print "Human sequence: $humanSeq\n";
         print "Mouse sequence: $mouseSeq\n";

     The human sequence collected this way still contains the gap characters (-) of the alignment.
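A compact variant of the extraction loop is sketched below under a few assumptions that go beyond the slides: the first two sequence lines of the alignment are held in memory instead of being read from alignment.blast, a hash (%seq) keyed by species replaces the two scalar variables, the alternation (human|mouse) matches either label, and the gap characters are removed at the end.

```perl
use strict;
use warnings;

# Two alignment lines from the slide, held in memory instead of alignment.blast.
my $alignment = <<'END';
human -G-ACT---TTGC------AA----A---A-----CG-----G-AT-------TGGG---
mouse TGAACTCAAGTGCTATTTTAATTCCATTCATTCTCCGTGGCTGCATCAGGGCCTGGGGCT
END

my %seq = (human => '', mouse => '');
open(my $in, '<', \$alignment) or die "Cannot open in-memory file: $!\n";
while (my $line = <$in>) {
    # $1 is the species label, $2 the aligned sequence
    if ($line =~ /^(human|mouse)\s+(\S+)/) {
        $seq{$1} .= $2;
    }
}
close($in);

$seq{human} =~ s/-//g;    # drop gap characters from the human sequence
print "Human: $seq{human}\n";
print "Mouse: $seq{mouse}\n";
```

Using \s+ instead of a single space makes the pattern tolerant of files where the label and the sequence are separated by more than one space or by a tab.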

  12. Modules:

     Perl does not have built-in functions for every case, but most of the functions you might need have already been programmed by other people, who share them as libraries of functions called modules.

     The command use X; indicates that functions from module X should be used.

       • Perl does not know how to create pictures. use GD; -- now it knows
       • How to communicate with databases? use DBI;
       • How to do DNA sequence analysis? Use the modules of the BioPerl project, for example use Bio::SeqIO;
       • How to extract command line options? use Getopt::Long; (or use Getopt::Std;)

     http://cpan.org/ -- storage of Perl modules
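As an illustration of using a module, the sketch below parses named options with Getopt::Long, which ships with Perl. The option names --file and --line are hypothetical, chosen to mirror the printFile.pl example, and the array @args stands in for @ARGV so that the sketch runs without command-line input.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for @ARGV (assumed option names, not from the slides).
my @args = ('--file', 'words.txt', '--line', '3');

my ($f_name, $line_no);
GetOptionsFromArray(
    \@args,
    'file=s' => \$f_name,    # --file takes a string value
    'line=i' => \$line_no,   # --line takes an integer value
) or die "Error. Could not parse options\n";

print "File: $f_name, line: $line_no\n";   # File: words.txt, line: 3
```

In a real program, `GetOptions('file=s' => \$f_name, 'line=i' => \$line_no)` reads the options directly from @ARGV.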