1 / 34

Perl for Bioinformatics

Perl for Bioinformatics. Lecture 4. Variables - review. A variable name starts with a $ It contains a number or a text string Use my to define a variable Use = to assign a value Use to stop the variable being interpolated

kylar
Télécharger la présentation

Perl for Bioinformatics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Perl for Bioinformatics Lecture 4

  2. Variables - review • A variable name starts with a $ • It contains a number or a text string • Usemyto define a variable • Use = to assign a value • Use \ to stop the variable being interpolated • Take care with variable names and with changing the contents of variables

  3. Conditional Blocks, review • Anif test can be used to control a command in a conditional block, according to the outcome of a decision made by comparing variables. • It’s important to keep track of whether variables are strings or numbers. Numbers are compared with ==, strings with eq. • It’s usual to indent the block to make it easier to read the code

  4. Arrays • An array can store multiple pieces of data. • They are essential for the most useful functions of Perl. They can store data such as: • the lines of a text file (e.g. primer sequences) • a list of numbers (e.g. BLAST e values) • Arrays are designated with the symbol @ my @bases = (“A”, “C”, “G”, “T”);

  5. Converting a variable to an array split splits a variable into parts and puts them in an array. my $dnastring = "ACGTGCTA"; my @dnaarray = split //, $dnastring; @dnaarray is now (A, C, G, T, G, C, T, A) @dnaarray = split /T/, $dnastring; @dnaarray is now (ACG, GC, A)

  6. Converting an array to a variable • joincombines the elements of an array into a single scalar variable (a string) $dnastring = join('', @dnaarray); spacer (empty here) which array

  7. Loops • A loop repeats a bunch of functions until it is done. The functions are placed in a BLOCK – some code delimited with curly brackets {} • Loops are really useful with arrays. • The “foreach” loop is probably the most useful of all: foreach my $base (@dnaarray) { print "$base “; }

  8. Comparing strings • String comparison (is the text the same?) • eq (equal ) • ne (not equal ) There are others but beware of them!

  9. Getting part of a string • substrtakes characters out of a string $letter = substr($dnastring, $position, 1) where in the string how many letters to take which string

  10. Combining strings • Strings can be concatenated (joined). • Use the dot . operator $seq1= “ACTG”; $seq2= “GGCTA”; $seq3= $seq1 . $seq2; print $seq3;ACTGGGCTA

  11. Making Decisions - review • The if operator is generally used together with numerical or string comparison operators, inside an (expression). numerical: ==, !=, >, <, ≥, ≤ strings: eq, ne • You can make decisions on each member of an array using a loop which puts each part of the array through the test, one at a time

  12. More healthy exercise • Write a program that asks the user for a DNA restriction site, and then tells them whether that particular sequence matches the site for the restriction enzyme EcoRI, or Bam HI, or Hind III. • Site for EcoR1: GAATTC • Bam H1: GGATCC • Hind III: AAGCTT

  13. Arrays and loops - review • An array starts with @. It contains multiple bits of data in a list-like format. • @bases = (“A”, “C”, “G”, “T”); • You can make decisions on each member of an array using a foreachloop which puts each part of the array through the test, one at a time

  14. Test time, again • Remember – keep track of what’s in a variable don’t over-write a variable with another value, unless you intend to syntax and case are critical lines end with a semicolon brackets and quotes must match.

  15. Opening and closing files • So we can input large amounts of data, Perl has to read data out of files, and write results into output files • This is done in two steps • First, you must give the file a name within the script - this is known as a filehandle • Use the open command: open MYFILE, ‘exampleprotein.txt’;

  16. Reading a file • Once the file is open, you can read from it, line by line, using the readline <> operator again • (put the filehandle between the angle brackets) • Perl reads files one line at a time, each time you input data from the file, the next line is read: open FILE1,’exampleprotein.txt’; $line1 = <FILE1>; chomp $line1; $line2 = <FILE1>;

  17. Using loops to read in a file • The while loop just keeps doing an expression while it’s true. So it will keep reading lines from the file until it runs out. • The special variable $_ keeps track of the line of the file we’re on. my $longsequence; open FILE, ‘exampleprotein.txt’; while (<FILE>){ $longsequence = $longsequence . $_; chomp $longsequence; } close FILE; • This reads the whole file, and puts each line into the variable $longsequence one at a time.

  18. Now More Fun Excercises • Read a DNA sequence from a fasta format file • Calculate the GC content. • What about the non-DNA characters in the file? >header lines with the name of the sequence carriage returns !! You know this one. blank spaces N’s or X’s or unexpected letters

  19. Writing to a File • Writing to a file is similar to reading from it • Use the > operator to open a file for writing: open OUTPUT,‘>/home/class30/output.txt’; • This creates a new file with that name, or overwrites an existing file • Use >> to append text to an existing file • print to the file using the filehandle: print OUTPUT $myoutputdata;

  20. else Some more stuff you need to know • Instead of just letting the script go on if it fails an if test, you can get it to execute a second block of code if the statement in brackets isn’t true.

  21. elsif • You can string a lot of “if”s together using elsif if ($site eq “GAATTC” { print “EcoR1 site\n”; } elsif ($site eq “CCATGG” { print “BamHI site\n”; } elsif ($site eq “AAGCTT”) { print “HindIII site\n”; } else { #only happens if none of the preceeding are true die “I can’t find any of the sites I know\n”; }

  22. Subscripts • Bioinformatics data often can be made into array format: • multi-line sequence files • Microarray or statistics data in “tab delimited” format • You can address part of the array as if it was a variable using a subscript @numbers = (8, 8, 8, 23984092, 8); print “$numbers[3]\n”; Please note – the first element is number zero! Second is 1!

  23. Regular Expressions • Sounds odd, doesn’t it? It means a pattern that the computer can match, in a standard format. • Very useful in bioinformatics work • DNA patterns • restriction sites • promoters/transcription factor binding sites • intron splice site • Protein patterns • conserved domains (motifs) • active sites • structural motifs (membrane spanning, signal peptide, etc.)

  24. The Binding and Match Operators:=~ / / • The =~ operator bindsfunctions together • The // operator matches things to patterns It can be translated as “contains” • The forward slashes contain the pattern to be matched, like this: if ($dnaseq=~ /GAATTC/) {print “EcoRI site found\n”}

  25. A regular expression.. is a joy forever. And a pattern to match: can be just a text string, such as: /GATC/ it can have alternative characters: /G[AT]TC/ or contain a wildcard that matches any character: /G.TC/ Or be something bizzare:/\/[^\/]*\/\.\./

  26. Perl Regular Expressions • It never ceases to amaze me what people can do with regular expressions, but you can match pretty much anything you can think of and a lot you can’t: #man perlrequick

  27. Alternative Characters • Square brackets within the match expression allow for alternative characters: if ($dna =~ /CAG[AT]CAG/) • This will match an DNA string that starts with CAG; has A or T in the 4th position, followed by another CAG. • A vertical line within the /expression/ means “or”; it allows you to look for either of two completely different patterns: if ($dna=~/GAAT|ATTC/)

  28. Special characters • Perl has a large set of special characters to use in regular expressions: • the dot (.) matches any character • \d matches any digit (a number from 0-9) • \w matches any “word” character (a letter or a number, not punctuation or space) • \s matches white space (any amount) • \t matches a tab (useful for tab delimited files) • ^ matches the beginning of a line • $ matches the end of a line • Knowing this makes you lots of fun at parties.

  29. “Special” characters • What if you need to match text that contains a special character? • Aren’t there dots at the end of sentences? • Now you have to use a backslash (\) to “escape” the special meaning of that character: if $onewordsentence =~ /\w+\ ./ -This would match any text that has one or more text characters, followed by a dot. Finished.

  30. Bringing it together • So now, when you think about it, you can: Open a file Check whether each line of the file contains a particular pattern Recover part of that line Write it out to another file So.. isn’t that what you wanted to know? But really, it’s very useful combined with the UNIX command line.

  31. A last exercise?... • Now we’re getting up to speed with Perl, lets try something more fun: • Open up a BLAST output file • Spit out the name of the query sequence, the top hit, and how many hits there were.

  32. Only the beginning • Sadly, there is much, much more than this to the Power of Perl. • You can make, create and download other people’s websites • Make Linux and Windows graphical programs • Do almost anything on the internet • Interact with databases • And much much more

  33. Why won’t I teach you more stuff? • Whoa! • Programming takes time to learn properly • You’ve got the tools now to get started on a programming project • We will go through some more Perl functions in the later classes, especially modules such as Bioperl.

  34. Practice makes perfect • You can now practice your Perl skills and understand a lot of the books and help files, which are probably more useful. #man perlintro #man perlrequick #perldoc bioperl • Also, check out Lincoln Stein’s course at: http://stein.cshl.org/genome_informatics/index.html

More Related