
Perl Programming




  1. Perl Programming Paul Tymann Computer Science Department Rochester Institute of Technology ptt@cs.rit.edu

  2. Strings • A collection of characters • This slide consists of a sequence of strings • CS folk have been working with strings for years • Many tools and algorithms have been developed to work with strings

  3. Sequences • Ask a biologist what a sequence is: • ATGCCTATGCCCCTTGAGAGA • Show that to a CS type and ask “what is this?” • It is a string!! • In a way, bioinformatics is all about manipulating strings • CS types are really good at manipulating strings!!

  4. What the heck is Perl? • Perl is a computer language designed to scan arbitrary text files, extract information from those text files, and print reports based on that information • “Perl” == “Practical Extraction and Report Language” • What makes Perl powerful? • It has sophisticated pattern-matching capabilities • Its I/O is straightforward • It was created, written, developed, and maintained by Larry Wall (lwall@netlabs.com)

  5. Where does Perl stand? • Perl is an interpreted language • Which usually means it runs slower than a compiled language • BUT it is much easier, and quicker, to develop programs • Some people would call Perl a scripting language • The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal) • It is a useful tool that can get the job done

  6. Lots of People Are Using Perl • Lots of people use Perl, and as a result there are lots of libraries (modules) that you can get for free, most of them collected on CPAN, the Comprehensive Perl Archive Network • If you can think of an application, chances are you can find Perl code that does it • This means writing Perl programs that do sophisticated things is often easy and does not take long to do.

  7. BioPerl • BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications • BioPerl provides a means by which large quantities of sequence data can be analyzed in ways that are typically difficult or impossible with web-based systems • BioPerl is open source software that is still under active development

  8. BioPerl Modules • Sequence Object • Sequence flat-file format I/O • Sequence alignment objects • BLAST similarity search • Sequence database access • Sequence file indexing • Common Base Object

  9. Is Perl THE tool? • Probably not • Perl is great for munging text data into a different form • For example: get BLAST search results off the web, extract information from them, and place that information in your database • Perl is great if you want it done fast • What about more complicated programming? • You might want to get a bigger hammer!! • There are many BIO.* packages out there.

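  10. Your First Perl Program • # Say Hello • print "Hello World\n"; • The # line is a comment, ignored by the interpreter • print is the print statement • "Hello World\n" is a string, a collection of characters • \n is an escape character that stands for a newline • Statements are executed in order, from top to bottom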

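  11. Perl - Unix Style • #!/usr/local/bin/perl -w • # Say Hello • print "Hello World\n"; • The first line is a special comment that Unix uses to find the Perl interpreter and run the script with it • The -w option turns on warnings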

  12. How To Make It Run • Create a text file that contains a Perl program (script)

  13. How To Make It Run • Invoke the interpreter to run the program

  14. Sometimes we make misteaks • Create the Perl script (callout: this should be “print”)

  15. Sometimes we make misteaks • Run the interpreter

  16. Sometimes we make misteaks • Fix the mistake • Try again

  17. Your Turn!! • Write a Perl program that prints out your name and the name of your workshop partner on separate lines • Sample Output: • Paul Tymann • Rhys Price Jones

  18. Your Second Perl Program • # Convert DNA string to RNA string • $DNA = "AGGGGAGGCCTTACT"; • $RNA = $DNA; • $RNA =~ s/T/U/g; • print "$RNA\n"; • $DNA is a scalar variable; it holds the characters in the string • = is assignment: evaluate the right side and place the result in the variable on the left • =~ applies the operation on the right to the contents of the variable on the left • s/T/U/g substitutes U for every occurrence of T (the g means “global”: all occurrences, not just the first)
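The same T-to-U conversion can also be written with Perl's transliteration operator tr///, which swaps characters one for one instead of running a regular-expression substitution. This is a minimal sketch of that alternative, not the slide's method; the subroutine name dna_to_rna is a hypothetical name chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): DNA to RNA with tr/// instead of s///g.
sub dna_to_rna {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/T/U/;   # copy the DNA, then change every T to U in the copy
    return $rna;
}

print dna_to_rna("AGGGGAGGCCTTACT"), "\n";   # prints AGGGGAGGCCUUACU
```

For single-character swaps like this one, tr/// is usually the simpler choice; s/// is needed when the thing being replaced is a pattern rather than a fixed character.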

  19. Reading from the Keyboard • You can read information from the keyboard by using <STDIN> • For example, to read a line from the keyboard and place it in the scalar variable $str: • $str = <STDIN>; • The line termination character (the newline) is read too and stays at the end of the string • chomp($str); removes it
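Perl's built-in chomp function removes the trailing newline that <STDIN> keeps. A small sketch of what chomp does; a string literal stands in for real keyboard input here, which is an assumption made so the example runs without typing anything.

```perl
use strict;
use warnings;

# A string literal stands in for what <STDIN> returns when the user types ACGT and presses Enter.
my $line = "ACGT\n";
print length($line), "\n";   # 5: the four bases plus the newline
chomp($line);                # remove the trailing newline (does nothing if there is none)
print length($line), "\n";   # 4
print "[$line]\n";           # [ACGT]
```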

  20. Modified Program • # Convert DNA string to RNA string • print "Enter DNA string: "; • $DNA = <STDIN>; • chomp($DNA); # remove the newline that <STDIN> keeps • $RNA = $DNA; • $RNA =~ s/T/U/g; • print "$RNA\n";

  21. Arithmetic and Logic Operators
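The operator table on this slide did not survive in the transcript. As a hedged substitute, this sketch exercises common Perl arithmetic, comparison, and logical operators; it is not necessarily the exact set the original slide listed. One Perl-specific point worth seeing in action: numbers are compared with ==, <, and friends, while strings are compared with eq, ne, lt, and gt.

```perl
use strict;
use warnings;

my ($x, $y) = (17, 5);

# Arithmetic operators
my $sum  = $x + $y;    # 22
my $diff = $x - $y;    # 12
my $prod = $x * $y;    # 85
my $quot = $x / $y;    # 3.4 (division does not truncate to an integer)
my $rem  = $x % $y;    # 2 (remainder)
my $pow  = $x ** 2;    # 289 (exponentiation)
print "$sum $diff $prod $quot $rem $pow\n";   # 22 12 85 3.4 2 289

# Numeric comparison uses == != < <= > >=; strings use eq ne lt le gt ge
print "x is bigger\n"       if $x > $y;
print "numerically equal\n" if "10" == 10.0;     # true: compared as numbers
print "not string-equal\n"  if "10" ne "10.0";   # true: compared as strings

# Logical operators: && (and), || (or), ! (not)
print "both positive\n"     if $x > 0 && $y > 0;
print "at least one\n"      if $x > 100 || $y > 1;
print "y is not zero\n"     if !($y == 0);
```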

  22. Flow of Control • Conditional • if ( expression ) { statements } • if ( expression ) { statements } else { statements } • if ( expression ) { statements } elsif … • Loops • while ( expression ) { statements } • for ( init; test; increment ) { statements } • The braces are required, even around a single statement
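A short sketch that uses the if / elsif / else form and a while loop from the list above. The subroutine base_kind and the sample string are hypothetical, chosen only for illustration; purines (A, G) and pyrimidines (C, T) are the standard grouping of DNA bases.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): classify one DNA base with if / elsif / else.
sub base_kind {
    my ($base) = @_;
    if ( $base eq 'A' || $base eq 'G' ) {
        return 'purine';
    }
    elsif ( $base eq 'C' || $base eq 'T' ) {
        return 'pyrimidine';
    }
    else {
        return 'unknown';
    }
}

# Walk the string one base at a time with a while loop.
my $dna = "GATN";
my $pos = 0;
while ( $pos < length($dna) ) {
    my $base = substr( $dna, $pos, 1 );
    print "$base: ", base_kind($base), "\n";
    $pos = $pos + 1;
}
# prints G: purine, A: purine, T: pyrimidine, N: unknown, one per line
```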

  23. Examples • # Print 1 through 100 twice • $i = 1; • while ( $i <= 100 ) { • print $i, "\n"; • $i = $i + 1; • } • for ( $i = 1; $i <= 100; $i = $i + 1 ) { • print $i, "\n"; • }

  24. Reverse Complement • # Calculate the reverse complement • $dna = <STDIN>; • $revcomm = ""; • for ( $pos = 0; $pos < length($dna) - 1; $pos = $pos + 1 ) { • $base = substr( $dna, $pos, 1 ); • if ( $base eq 'A' ) { $base = 'T'; } • elsif ( $base eq 'T' ) { $base = 'A'; } • elsif ( $base eq 'C' ) { $base = 'G'; } • else { $base = 'C'; } • $revcomm = $base . $revcomm; • } • print $revcomm, "\n"; • length($dna) - 1: don't include the newline at the end of the input line • . is string concatenation; putting each complemented base in front of the bases already collected reverses their order
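For comparison, a shorter way to get the reverse complement in Perl: reverse the whole string, then complement every base at once with tr///. This is an alternative technique, not the slide's; the subroutine name revcomp is hypothetical. Unlike the slide's loop, which turns any unexpected character into C, tr/ACGT/TGCA/ leaves characters other than A, C, G, and T unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): reverse complement with reverse() and tr///.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;   # scalar context: reverse the characters of the string
    $rc =~ tr/ACGT/TGCA/;    # complement each base in one pass
    return $rc;
}

print revcomp("ATGCCTATG"), "\n";   # prints CATAGGCAT
```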

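  25. Perl IS Different • while ( <> ) { • print if /blue/; • } • Treat each argument on the command line as a file name, open the files one at a time, and step through them a line at a time (with no arguments, read standard input instead) • Print the current line if it contains the string “blue” • Both print and /blue/ work on the current line, which Perl keeps in the default variable $_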
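The loop above leans on Perl's default variable $_: each line read by <> lands in $_, print with no arguments prints $_, and a bare /blue/ matches against $_. The sketch below applies the same defaults to a list of strings instead of files, so it can run without input files; lines_matching is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): keep the lines that contain a pattern.
# foreach aliases each line to $_, and a bare /.../ matches against $_.
sub lines_matching {
    my ( $pattern, @lines ) = @_;
    my @hits;
    for (@lines) {
        push @hits, $_ if /$pattern/;
    }
    return @hits;
}

my @found = lines_matching( 'blue', "red sky\n", "blue sea\n", "navy blue\n" );
print @found;   # prints "blue sea" and "navy blue", each on its own line
```

Perl's built-in grep { /blue/ } @lines does the same filtering in one expression.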

  26. Your Turn!! • Change the reverse complement program so that • It reads the DNA strings from a file whose name is supplied on the command line. You may assume that each DNA string is on a separate line • Instead of calculating the reverse complement starting at the beginning of the string, your program must start at the end of the DNA and work towards the front

  27. Lists • A list is an object consisting of a sequence of values • 1, 2, 3, 5, 7, 11, 13, 17, 19, 23 • 1..10 • 'a'..'z' • A list that has been given a name is called an array • @small_primes = (2, 3, 5, 7, 11, 13, 17, 19); • The individual elements of a list must be scalars
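A minimal sketch of arrays in action, using the primes from 2 through 19. It shows indexing (starting at 0, with negative indexes counting from the end), the range operator, and the difference between scalar(@array), the number of elements, and $#array, the index of the last element.

```perl
use strict;
use warnings;

my @small_primes = (2, 3, 5, 7, 11, 13, 17, 19);
my @digits  = (0 .. 9);       # the range operator builds the list 0, 1, ..., 9
my @letters = ('a' .. 'e');   # ranges also work on letters: a, b, c, d, e

print "first: $small_primes[0]\n";              # first: 2 (indexes start at 0)
print "last: $small_primes[-1]\n";              # last: 19 (negative indexes count from the end)
print "count: ", scalar(@small_primes), "\n";   # count: 8
print "last index: ", $#small_primes, "\n";     # last index: 7
print "letters: @letters\n";                    # letters: a b c d e
print "digits: ", scalar(@digits), "\n";        # digits: 10
```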

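  28. Fibonacci • @fibs = ( 1, 1 ); • for ( $i = 2; $i <= 10; $i = $i + 1 ) { • $fibs[ $i ] = $fibs[ $i - 1 ] + $fibs[ $i - 2 ]; • } • print "I calculated ", scalar(@fibs), " fibs\n"; • print "@fibs\n"; • @fibs starts as a list with the first two Fibonacci numbers • Add the previous two numbers to get the next one • Assigning to $fibs[ $i ] extends the list and puts the next number there • scalar(@fibs) is the number of items in the list ($#fibs is the index of the last item, which is one less) • Putting @fibs inside double quotes prints the items separated by spaces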

  29. Regular Expressions • Provide a way of writing a compact description of a set of strings • Sort of like wildcards • Single character patterns • A single character matches itself • A “.” matches any single character except newline • [characters] matches any one of the characters • A ^ as the first character inside the brackets, as in [^characters], means “does not match”: any single character that is not one of the characters

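  30. Examples • G: the letter G • [0123456789]: any digit • [0-9]: any digit, written as a range • [a-zA-Z]: any letter • [^0-9]: any character that is not a digit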
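One way to check what each example pattern matches is to try it against a few one-character samples. This sketch is an illustration, not from the slides: single_char_hits is a hypothetical helper, and \A and \z anchor each pattern so a sample counts only if the whole sample is a single matching character. The list also includes the easy-to-make typo [a-zA-z]; because the range A-z runs through the punctuation characters that sit between Z and a in ASCII, it matches _ as well as letters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): which one-character samples match a pattern?
# \A and \z anchor the match, so the whole sample must be one matching character.
sub single_char_hits {
    my ( $pattern, @samples ) = @_;
    return grep { /\A$pattern\z/ } @samples;
}

my @samples = ( 'G', '7', 'x', '%', '_' );
for my $pattern ( 'G', '[0123456789]', '[0-9]', '[a-zA-Z]', '[a-zA-z]', '[^0-9]' ) {
    my @hits = single_char_hits( $pattern, @samples );
    print "$pattern matches: @hits\n";
}
```

With these samples, [a-zA-Z] matches G and x, while [^0-9] matches every sample except 7.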

  31. Character Class Abbreviations
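The abbreviation table on this slide is missing from the transcript. As a hedged stand-in, this sketch shows the standard Perl abbreviations \d (digit), \w (word character), and \s (whitespace), plus the uppercase \S, which matches the opposite set; \D and \W negate \d and \w the same way. The sample record is taken from the roster shown later on the “My Problem” slide.

```perl
use strict;
use warnings;

# One roster line from the "My Problem" slide.
my $record = "XXXXXXX, EDWARD 4637";

my @digits = $record =~ /\d/g;    # \d: a digit, same as [0-9] for plain ASCII text
my @words  = $record =~ /\w+/g;   # \w: a word character, i.e. [a-zA-Z0-9_]
my @spaces = $record =~ /\s/g;    # \s: a whitespace character (space, tab, newline, ...)
my @tokens = $record =~ /\S+/g;   # \S: any non-whitespace character

print scalar(@digits), " digits\n";   # 4 digits
print "words: @words\n";              # words: XXXXXXX EDWARD 4637
print scalar(@spaces), " spaces\n";   # 2 spaces
print "tokens: @tokens\n";            # tokens: XXXXXXX, EDWARD 4637
```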

  32. Grouping Patterns • Sequence • abc • Multipliers • * - zero or more of the previous character • a*b matches b, ab, aab, aaab, aaaab, … • + - one or more of the previous character • a+b matches ab, aab, aaab, …
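A quick way to see the difference between * and + is to test both patterns against a handful of strings. In this sketch (whole_match is a hypothetical helper), \A and \z force the pattern to match the entire string; the last line shows why that matters, since an unanchored a*b matches any string that merely contains a b.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): does the pattern match the WHOLE string?
sub whole_match {
    my ( $pattern, $string ) = @_;
    return $string =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

for my $s ( 'b', 'ab', 'aab', 'aaab', 'ba' ) {
    printf "%-5s a*b: %d  a+b: %d\n", $s, whole_match( 'a*b', $s ), whole_match( 'a+b', $s );
}

# Without anchors the pattern only has to match somewhere inside the string.
print "unanchored a*b matches 'xyzb'\n" if 'xyzb' =~ /a*b/;
```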

  33. My Problem
      XXXX, ROBERT 4653 N VCSG-4 rma9999
      XXXXXX, ADAM 3976 N VCSG-4 716-555-4281 alb9999
      XXXXXXX, EDWARD 4637 N VCSG-2 716-555-4780 esb9999
      XXXXXXX, JOHN 1906 N VCSG-4 716-555-4780
      XXXX, DERRICK 6432 N VCSG-2 716-555-3161 dxc9999
      XXXXXXXXX, JOHN 5034 N VCSG-2 716-555-3894 jak9999
      XXX, JASON 9020 N VCSG-2 716-555-3145 jsl9999
      XXXXXXX, SARAH 7610 N VCSG-2 716-555-3147 sem9999
      XXXXXXXX, CHRISTOPHER 6309 N VCSG-2 716-555-3427 cco9999
      XXXXXXX, MICHAEL 8195 N VCSG-2 716-555-3166 mpp9999
      XXXXXX, SHAUN 9925 N VCSG-2 716-555-3145 sls9999
      XXXXXX, WILLIAM 2568 N VCSG-2 716-555-3144 wjw9999
      XXXXXX, PATRICK 2335 N EECC-2 716-555-3144 psw9999

  34. Roster to CSV • Sample input line: XXXXXXX, EDWARD 4637 N VCSG-2 716-555-4780 esb9999 • while(<>) { • ($last,$first,$id,$ntid,$gradeType,$program,$phone,$email) = /([^,]+), (\S+) (\d{4}) (\S*) (\S*) (\S+) (\S*) (\S*).*/; • print "\"$last,$first\",$id,$program,$email\@cs.rit.edu\n"; • } • ([^,]+) matches 1 or more non-comma characters • (\S+) matches 1 or more non-whitespace characters • (\d{4}) matches 4 digits • (\S*) matches 0 or more non-whitespace characters (the field may not be in the input) • .* matches anything
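A slightly more defensive version of the same conversion, offered as a sketch rather than the author's code: it declares its variables, skips lines that do not match instead of printing a line of empty fields, and drops the trailing .*, which an unanchored match does not need. The helper name roster_to_csv is hypothetical. The transcript lost the roster's original column spacing, so the sample line assumes an empty NTID field appears as a second space after the ID number.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): convert one roster line to a CSV line,
# or return undef when the line does not have the expected shape.
sub roster_to_csv {
    my ($line) = @_;
    my ( $last, $first, $id, $ntid, $grade_type, $program, $phone, $email ) =
        $line =~ /([^,]+), (\S+) (\d{4}) (\S*) (\S*) (\S+) (\S*) (\S*)/
        or return;    # list assignment is false (0 elements) when the match fails
    return qq{"$last,$first",$id,$program,$email\@cs.rit.edu};
}

# Assumed spacing: the empty NTID column shows up as two spaces after the ID.
print roster_to_csv("XXXXXXX, EDWARD 4637  N VCSG-2 716-555-4780 esb9999"), "\n";
# prints "XXXXXXX,EDWARD",4637,VCSG-2,esb9999@cs.rit.edu
```

To process a whole file the way the slide does, call it inside while ( my $line = <> ) { ... } and print the result only when it is defined.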

  35. The Result
      "XXXX,ROBERT",4653,VCSG-4,rma9999@cs.rit.edu
      "XXXXXX,ADAM",3976,VCSG-4,alb9999@cs.rit.edu
      "XXXXXXX,EDWARD",4637,VCSG-2,esb9999@cs.rit.edu
      "XXXXXXX,JOHN",1906,VCSG-4,@cs.rit.edu
      "XXXX,DERRICK",6432,VCSG-2,dxc9999@cs.rit.edu
      "XXXXXXXXX,JOHN",5034,VCSG-2,jak9999@cs.rit.edu
      "XXX,JASON",9020,VCSG-2,jsl9999@cs.rit.edu
      "XXXXXXX,SARAH",7610,VCSG-2,sem9999@cs.rit.edu
      "XXXXXXXX,CHRISTOPHER",6309,VCSG-2,cco9999@cs.rit.edu
      "XXXXXXX,MICHAEL",8195,VCSG-2,mpp9999@cs.rit.edu
      "XXXXXX,SHAUN",9925,VCSG-2,sls9999@cs.rit.edu
      "XXXXXX,WILLIAM",2568,VCSG-2,wjw9999@cs.rit.edu
      "XXXXXX,PATRICK",2335,EECC-2,psw9999@cs.rit.edu
