
Welcome to lecture 3: An introduction to programming in PERL

Welcome to lecture 3: An introduction to programming in PERL. IGERT – Sponsored Bioinformatics Workshop Series Michael Janis and Max Kopelevich, Ph.D. Dept. of Chemistry & Biochemistry, UCLA. Last time…. We covered a bit of material… Try to keep up with the reading – it’s all in there!

Presentation Transcript


  1. Welcome to lecture 3: An introduction to programming in PERL IGERT – Sponsored Bioinformatics Workshop Series Michael Janis and Max Kopelevich, Ph.D. Dept. of Chemistry & Biochemistry, UCLA

  2. Last time… • We covered a bit of material… • Try to keep up with the reading – it’s all in there! • How’s it coming along? • regex examples? (TATA box, palindrome)… • > grep -E --color 'TA(TAAA|TAAT|TATT|ATAA|ATAT)' *.fsa • > grep -E --color '(.)(.).\2\1' • Using emacs? • Let’s ignore the long version of the prosite match for now… we’ll deal with that soon…
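The two grep patterns above translate directly into Perl regular expressions, which is where this lecture is heading. Below is a minimal sketch, assuming the sequence has already been read into a single string. The sequence is made up for illustration. $& holds the text of the last successful match, and $-[0] holds the offset where that match started.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same alternation as the grep -E TATA-box pattern above.
my $tata = qr/TA(TAAA|TAAT|TATT|ATAA|ATAT)/;

# Same 5-character palindrome pattern: \2 and \1 refer back to the
# first two captured characters, in reverse order.
my $palindrome = qr/(.)(.).\2\1/;

# A made-up example sequence (hypothetical, for illustration only).
my $seq = "GGCTATAAATGCAGTCCTGACG";

if ($seq =~ $tata) {
    print "TATA-like box '$&' at offset ", $-[0], "\n";
}

if ($seq =~ $palindrome) {
    print "Palindrome '$&' at offset ", $-[0], "\n";
}
```

Unlike grep, a Perl program can join a multi-line FASTA sequence into one string before matching. This means a motif that spans a line break can still be found.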

  3. Shell scripting is useful, but… It does not port or scale well; complex data structures may be somewhat challenging. Having said that, shell scripting skills have many applications, including: • Ability to automate tasks, such as • Backups • Administration tasks • Periodic operations on a database via cron • Any repetitive operations on files • Increase your general knowledge of UNIX • Use of environment • Use of UNIX utilities • Use of features such as pipes and I/O redirection

  4. For bioinformatics, we need a fully featured programming language There’s a problem with our search of fasta files – can you guess what? We’ll be dealing with this using a programming language with arbitrarily complex data structures. Perl is a scriptable, portable language that is both compiled and interpreted: • Scriptable and portable, and networks well • The code remains in text format • At runtime, the code is compiled to an internal form and then interpreted • The interpreter has been written for use on every (?) platform • Can control a vast number of other devices (files, programs, either local or remote) • Drawbacks of the language • Since it’s compiled and interpreted at runtime rather than compiled to native machine code, it will generally run slower than equivalent C code • There’s a double-edged sword called TMTOWTDI (There’s More Than One Way To Do It) • Not truly OO; not the most elegant language for algorithm implementation (arguable!)

  5. PERL: starting point for bioinformatics • Easy to learn (a bit forgiving) • Easy to process text files; good language for pattern searching • Most biological file formats are text files • Most sequence analysis tasks deal with pattern finding at some point • Easy to run other programs and process their results • Similar to shell programming in this regard!

  6. Extending the shell: Creating Our Own Commands • Use programming language to create the new command • We will use perl • TASK: write a PERL program that • A.) reads a fasta sequence file • B.) reverse complements the sequence • C.) prints the output to STDOUT • D.) Then modify program to write to a file • 1. Using command line REDIRECTION • 2. Using PERL to open and write to OUTPUT FILE
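One way to approach parts A to C is sketched below. It assumes a single-record FASTA file whose name is given as the first command-line argument ($ARGV[0]). With no argument, it falls back to a small made-up record so it can be run as-is. The tr/// translation is one common idiom for complementing bases, not the only one (TMTOWTDI). The sub names revcomp and read_fasta are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string: reverse it, then swap A<->T and C<->G.
# Lowercase bases are handled too; other characters (e.g. N) pass through unchanged.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement each base in place
    return $rc;
}

# Read one FASTA record from a filehandle: keep the '>' header line
# and join all sequence lines into a single string.
sub read_fasta {
    my ($fh) = @_;
    my ($header, $seq) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            $header = $line;
        }
        else {
            $seq .= $line;
        }
    }
    return ($header, $seq);
}

my $fh;
if (@ARGV) {
    open($fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
}
else {
    # Made-up demo record, read through an in-memory filehandle.
    my $demo = ">demo_seq\nATGCGT\nTTAGC\n";
    open($fh, '<', \$demo) or die "Cannot open demo string: $!\n";
}

my ($header, $seq) = read_fasta($fh);
close($fh);

print "$header (reverse complement)\n";
print revcomp($seq), "\n";
```

Part D.1 then needs no code changes. Redirect STDOUT in the shell, for example `revcomp.pl seq.fsa > seq_rc.fsa` (both file names are hypothetical).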

  7. PERL vocabulary – similar to bash functionality • print • chomp • while • open • close • $ARGV[0], $ARGV[1] • $_ • if. . .else • =~ • /^>/

  8. PERL vocabulary. . .EXPLAINED • print works like the echo command • chomp removes the trailing ‘newline character’ • while repeats a loop until a breaking condition is met • open is used to open a file • close is used to close a file • $ARGV[0], $ARGV[1] are the command line arguments • $_ is the variable that holds the current line from the input file • if. . .else [if true perform a, else perform b] • =~ is the binding operator (compares text with a regular expression) • /^>/ matches “>” at the very beginning of a line ONLY
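The sketch below puts several of these words together for part D.2 of the task. It reads a FASTA file with while and $_, recognizes the header with /^>/, and writes the reverse complement to an output file using open and close instead of shell redirection. It assumes a single-record input file, with file names taken from $ARGV[0] (input) and $ARGV[1] (output). The sub name write_revcomp is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy one FASTA record from $in to $out: the header line unchanged,
# then the whole sequence as its reverse complement on one line.
sub write_revcomp {
    my ($in, $out) = @_;
    my $seq = '';
    while (<$in>) {             # each line is placed in $_
        chomp;                  # chomp with no argument acts on $_
        if (/^>/) {             # same as $_ =~ /^>/ : the header line
            print $out "$_\n";
        }
        else {
            $seq .= $_;
        }
    }
    my $rc = reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    print $out "$rc\n";
}

if (@ARGV == 2) {
    open(my $in,  '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!\n";
    open(my $out, '>', $ARGV[1]) or die "Cannot write $ARGV[1]: $!\n";
    write_revcomp($in, $out);
    close($in);
    close($out);
}
else {
    print "usage: $0 input.fsa output.fsa\n";
}
```

Opening the output with '>' truncates any existing file of that name, just as the shell's > redirection does. Use '>>' to append instead.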

  9. Running a perl script • Create a file • Specify location of perl • Write program • Make it executable • Run it!

  10. Example: “Hello world!” • Write the program (the first line gives the location of PERL; print is a PERL command): #!/usr/bin/perl print("Hello, world!\n"); • Make it executable: chmod 744 tells the computer to allow the user to read, write AND execute the file. Others can only read it. >chmod 744 hello.pl • Run it: >./hello.pl Hello, world! (the output)

  11. Data • Data is stored in variables. • A variable is like a box. • We put values in it. • There are three ways of storing data: • Scalar variables • Arrays • Hashes • A single variable (a ‘scalar variable’) can be called anything, but must start with a ‘$’

  12. Scalar variables: example #!/usr/bin/perl $dna = "TGACT"; print("$dna\n"); Defining a variable Using it >printVariable.pl TGACT >

  13. Scalar variables (cont.) • PERL doesn’t differentiate between strings (e.g. “Fred”), integers (e.g. “13”) or floating point numbers (e.g. “16.9”). • If there’s one piece of information, it’s a scalar variable. • PERL understands the context you’re working in.

  14. Scalar variables (cont.) #!/usr/bin/perl $dna = "TGACT"; print("$dna\n"); $dna = 11; print($dna + 2 . "\n"); Defining a variable (here it’s a string) Using it Redefine variable Use it in an integer context >printVariable.pl TGACT 13 > Perl worked out what to do

  15. Limitations of scalar variables Imagine we want to find the average of a list of numbers • we could do it like this: program 1 $number1 = 5.4; $number2 = 7.3; $number3 = 4.1; $average = ( $number1 + $number2 + $number3 ) / 3; but this is obviously extremely limited

  16. Lists Of course there is a way to make lists in Perl. You can always enclose a list of items in parentheses... ( 5.6, 8.22, 14.9 ); # list of floating point numbers ( "hello", "Canada" ); # list of strings ( "hello", $country ); # mixed list ( "blah", 18, 22, 'x', 3.14 ); # mixed list ( 0 .. 5 ); # list of integers between 0 and 5 ( 'a' .. 'z' ); # list of strings a,b,c,d......

  17. Array variables There is a special type of variable in perl which can hold lists - The array • Perl knows a variable is an array when we use a special character @ • Remember, scalars (single valued variables) start with a dollar ($) sign, arrays start with an @ sign. • Arrays can have as many elements as you need (up to the limits of your available memory, anyway) @numbers = (5.6, 8.22, 14.9); # list of floating point numbers
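With an array, the average from slide 15 no longer needs one variable per number. Below is a minimal sketch using foreach (covered under control structures) and scalar(@array) to get the element count. The sub name average is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average of any number of values passed in as a list.
sub average {
    my @numbers = @_;
    return unless @numbers;           # empty list: no average (undef in scalar context)
    my $sum = 0;
    foreach my $n (@numbers) {
        $sum += $n;
    }
    return $sum / scalar(@numbers);   # scalar(@numbers) is the element count
}

my @numbers = (5.4, 7.3, 4.1);
print "average = ", average(@numbers), "\n";
```

Adding a fourth number now means changing only the list, not the formula.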

  18. Printing arrays @words = ("Hello", "Canada!"); print "@words"; # prints Hello Canada! print @words; # prints HelloCanada! • Inside double quotes, an array prints its elements with spaces in between them. • Without quotes, the elements print all smashed together.

  19. Accessing array elements An array wouldn't be very useful if we couldn't look at the individual members of the list. print "Enter an index number between 0 and 25\n"; $index = <STDIN>; chomp $index; @letters = ('A'..'Z'); print "letter index $index = $letters[$index] \n"; What does it mean?

  20. Accessing array elements • Arrays are stored in perl's memory in order. • Each position (element) in the array has a number • This number is called the index • Each element in an array is a single (scalar) value • There is magic syntax for addressing individual array elements. • This syntax can be a bit bewildering. • To access an element we type: • $array_name[element_number] • Elements are numbered starting at zero, not one!!

  21. Setting the values in an array Remember ‘ls -1’? We’ll use that here… @files = `ls -1 *.CEL`; # BACKQUOTES here • the output is split into a \n-separated list, one line per element, and each element keeps its trailing \n (chomp(@files) removes it) • Any delimiter is ok • Any element can be accessed as a scalar, and any function that acts upon a scalar can be applied to it ($file = $files[2];)

  22. Indexing arrays with negative numbers You can index from the end of an array backwards by using negative numbers: @letters = ('A'..'Z'); print "last letter = $letters[-1] \n"; print "penultimate letter = $letters[-2] \n";

  23. Getting the length of an array • You can use the function scalar to turn an array into a single valued scalar variable; • the value of this variable will be the number of elements in the array. @numbers = (0..100); print scalar(@numbers); # prints 101

  24. Functions that act on arrays push Adds a value (or values) to the end of an array @numbers = (1, 2, 3); push(@numbers, 4, 5); print "@numbers \n"; # prints 1 2 3 4 5

  25. Functions that act on arrays pop Removes a single value from the end of an array @words = ('the', 'quick', 'brown', 'fox'); print pop(@words); # fox print pop(@words); # brown print pop(@words); # quick

  26. Functions that act on arrays shift Removes a single value from the beginning of an array @words = ('the', 'quick', 'brown', 'fox'); print shift(@words); # the print shift(@words); # quick

  27. Functions that act on arrays unshift Pushes a value (or values) onto the front of an array
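Slide 27 has no example, so here is a small sketch of unshift alongside push, pop and shift. It treats the array as a queue of made-up sequence IDs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @queue = ('seq2', 'seq3');

unshift(@queue, 'seq1');        # add to the front:  seq1 seq2 seq3
push(@queue, 'seq4', 'seq5');   # add to the end:    seq1 seq2 seq3 seq4 seq5

my $first = shift(@queue);      # remove from front: $first is 'seq1'
my $last  = pop(@queue);        # remove from end:   $last is 'seq5'

print "removed $first and $last; left: @queue\n";
```

unshift and push return the new number of elements, while shift and pop return the element they removed.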

  28. Functions that act on arrays reverse @words = ('the', 'quick', 'brown', 'fox'); print reverse(@words), "\n"; # foxbrownquickthe

  29. Functions that act on arrays sort sort does what you think it does. You give it a list (or array), and it returns a list that is sorted in some way. @words = ('The', 'quick', 'brown', 'fox', 'jumped'); @sorted = sort(@words); print "sorted words = @sorted\n"; # The brown fox jumped quick
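By default, sort compares items as strings, character by character. This is why ‘The’ comes before ‘brown’ above: uppercase letters sort before lowercase ones. The same rule surprises people with numbers. The short sketch below compares the default with a sort block that uses the numeric comparison operator <=>. The list of lengths is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lengths = (100, 9, 25, 1000);

# Default: string comparison, giving 100 1000 25 9.
my @as_strings = sort @lengths;

# Numeric comparison: $a and $b are the two items being compared.
my @as_numbers = sort { $a <=> $b } @lengths;

print "string sort:  @as_strings\n";
print "numeric sort: @as_numbers\n";
```

Swapping the operands, as in { $b <=> $a }, sorts in descending order.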

  30. Functions that act on arrays join @words = ('The', 'quick', 'brown', 'fox', 'jumped'); print join("+", @words), "\n"; # The+quick+brown+fox+jumped You specify what string you want to join with as the first argument. You can use anything.

  31. Array summary • An array is a variable that has multiple values simultaneously. • We refer to the different values using a number called the index.

  32. Array example Note square brackets enclose index #!/usr/bin/perl $dna[0] = "TATA"; $dna[1] = "ATG"; print("$dna[0]\n"); print("$dna[1]\n"); Defining different entries of an array Print them both >arrayExample.pl TATA ATG >

  33. What is a hash? Hashes are similar to arrays in many respects. Remember, arrays are simple lists stored as a series of elements, and each element has a number (index). The elements are stored in numeric order. It is a bit like a shopping list. Arrays are limited, in that you need to know which index position contains your value of interest. It might be nice if we could give these index positions names of our choice.

  34. What is a hash? Perl has a way to do this: it is called a hash. Perl denotes a hash with a % (percent) sign. If arrays are shopping lists, hashes are telephone directories. You look up phone numbers by a person's name, not a unique number. They look something like this:
      %astronomy
      value        key        to get the value:
      'string'     'word'     $astronomy{'word'}

  35. Making a hash %re_lookup = ( 'Eco47III'=> 'AGCGCT', 'EcoNI' => 'CCTNNNNNAGG', 'EcoRI' => 'GAATTC', 'EcoRII' => 'CCWGG', 'HincII' => 'GTYRAC', 'HindII' => 'GTYRAC', 'HindIII' => 'AAGCTT', 'HinfI' => 'GANTC' );

  36. Accessing a hash print "Enter restriction enzyme name\n"; $re=<STDIN>; chomp $re; $seq = $re_lookup{$re}; if (defined($seq)) { print "RE sequence for $re is: $seq\n"; } else { print "Sorry, I don't know about \"$re\"\n"; }
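The lookup on slide 36 reads the enzyme name from STDIN. Below, the same idea is wrapped in a small sub so it can be tried without typing input. The sub name describe_enzyme and the reduced enzyme table are illustrative. The sequences are copied from slide 35.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few entries from the %re_lookup hash on slide 35.
my %re_lookup = (
    'EcoRI'   => 'GAATTC',
    'HindIII' => 'AAGCTT',
    'HinfI'   => 'GANTC',
);

# Return a message for one enzyme name, as in the STDIN example.
sub describe_enzyme {
    my ($re) = @_;
    my $seq = $re_lookup{$re};      # undef if the name is not a key
    if (defined($seq)) {
        return "RE sequence for $re is: $seq";
    }
    else {
        return "Sorry, I don't know about \"$re\"";
    }
}

foreach my $name ('EcoRI', 'HinfI', 'BamHI') {
    print describe_enzyme($name), "\n";
}
```

Keys are case-sensitive, so "ecori" is not found even though "EcoRI" is.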

  37. Changing values in a hash Just like we can change individual elements in an array by referring to them by number, we can change values in a hash by referring to them by their key. $space{'moon'} = 'Titan'; # change "Luna" to "Titan"

  38. Useful Hash Functions The keys function takes a hash as argument and returns a list of keys in that hash The values function takes a hash as argument and returns a list of values in that hash

  39. Useful Hash Functions KEYS %accession_hash = ( "BACR01A01" => "AC005555", "BACR48E02" => "AC005577", "BACR24K17" => "AC005101", ); # get all the keys in the hash @clones = keys %accession_hash; print "Clone IDs: @clones\n"; # prints BACR01A01 BACR48E02 BACR24K17 (in no particular order)

  40. Useful Hash Functions VALUES # get all the values in the hash (hash is a lookup for accessions): @accs = values %accession_hash; print "GenBank Accessions: @accs\n"; # prints AC005555 AC005577 AC005101 (in no particular order, but in the same order as keys returned them)
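A hash keeps no order, so keys and values return their lists in an unpredictable order. In recent versions of Perl, that order can even change between runs. The two lists do match each other as long as the hash is not modified in between. When you need a stable order, one option is to sort the keys and look up each value, as sketched below. The hash slice @hash{LIST} used at the end looks up several keys at once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %accession_hash = (
    "BACR01A01" => "AC005555",
    "BACR48E02" => "AC005577",
    "BACR24K17" => "AC005101",
);

# Sorting the keys gives the same order on every run.
my @clones = sort keys %accession_hash;

foreach my $clone (@clones) {
    print "$clone => $accession_hash{$clone}\n";
}

# A hash slice returns the values for @clones, in the order of @clones.
my @accs = @accession_hash{@clones};
print "GenBank Accessions: @accs\n";
```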

  41. Removing elements from a hash To remove a key value pair from a hash, you can use the delete function: delete $re_lookup{"EcoRI"}; If you just want to delete a value, but keep the key, you could do this: $re_lookup{"EcoRI"} = ""; # set value to the empty string
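The two approaches on this slide leave the hash in different states. exists tells you whether a key is present at all, while defined only looks at its value. A small sketch, using a trimmed-down copy of %re_lookup:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %re_lookup = ('EcoRI' => 'GAATTC', 'HindIII' => 'AAGCTT');

$re_lookup{'HindIII'} = "";    # key kept, value is now the empty string
delete $re_lookup{'EcoRI'};    # key and value both removed

foreach my $name ('EcoRI', 'HindIII') {
    my $present = exists $re_lookup{$name} ? 'yes' : 'no';
    print "$name: key exists? $present\n";
}
print "keys left: ", scalar(keys %re_lookup), "\n";
```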

  42. Counting things with a hash One of the most popular things to do with a hash is to count the number of times something has been seen.

  43. Counting things with a hash @things = qw(YOR382W YML383W YML280W); # a list of accession numbers %counting = (); # initialize a hash foreach $item (@things){ $counting{$item}++; # increment the value associated with the key } foreach $key (keys %counting) { print "$key is found $counting{$key} times \n";}
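In the example above, every accession appears only once, so each count is 1. Counting becomes more interesting when items repeat. Below is a minimal sketch that counts the bases in a made-up DNA string. It uses split with an empty pattern to break the string into single characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGCGATTAG";          # made-up sequence
my %count = ();

# split with an empty pattern breaks the string into single characters
foreach my $base (split //, $dna) {
    $count{$base}++;             # a missing key starts as undef, which ++ treats as 0
}

# keys come back in no particular order, so sort them for stable output
foreach my $base (sort keys %count) {
    print "$base is found $count{$base} times\n";
}
```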

  44. Hashes summary • Hashes are like arrays except instead of a numerical index, we use keys. • A key can be a string, an integer, almost anything; Perl stores every key as a string. • Until you learn to use hashes, you aren’t really using Perl!

  45. Hashes: example Note curly braces enclose key Defining different entries of the hash #!/usr/bin/perl $wife{"Fred"} = "Hannah"; $wife{"Bill"} = "Josephine"; print($wife{"Bill"}."\n"); print($wife{"Fred"}."\n"); >testHash.pl Josephine Hannah >

  46. More stuff on variables • We’ve used the ‘$’ to talk about individual entries for hashes or arrays. • But referring to the whole array, we use ‘@’. • Referring to the whole hash, we use ‘%’.

  47. More stuff on variables • This becomes useful when looking at properties of an entire array or hash • For example, the length of an array: #!/usr/bin/perl $names[0] = "Bill"; $names[1] = "Fred"; $names[2] = "Bartholomew"; print(scalar(@names)."\n"); ‘@’ means we’re referring to the whole array >testScalar.pl 3 >

  48. Control structures • All our programs so far have run from start to finish. Each line has been executed in turn. • What if we only want to run some lines some of the time? • This is where control structures come in.

  49. Control structures • PERL has a number of control structures. • I’ll talk about four: • if • while • for & foreach • There are others (e.g. unless)

  50. ‘if’ control structure #!/usr/bin/perl $name = "Bill"; if ($name eq "Bill") { print("The name is Bill!\n"); } else { print("The name isn't Bill!\n"); } >testIf.pl The name is Bill! >
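Slide 49 also lists while, for and foreach, but only if is shown. Below is a brief sketch of all three loops over a made-up DNA string. Each loop prints the same codons, so they can be compared side by side. substr($string, $offset, $length) extracts part of a string, and a regex with /g in list context returns every match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGGCCTAA";           # made-up sequence, 3 codons

# foreach: visit each element of a list
my @codons = ($dna =~ /(...)/g); # every 3-character chunk, via a global match
foreach my $codon (@codons) {
    print "foreach: $codon\n";
}

# for: C-style loop with a counter stepping 3 positions at a time
for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
    print "for:     ", substr($dna, $i, 3), "\n";
}

# while: keep looping until the breaking condition is met
my $pos = 0;
while ($pos < length($dna)) {
    print "while:   ", substr($dna, $pos, 3), "\n";
    $pos += 3;
}
```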
