
CRN: 84250----FW4500 Bioinformatics Programming & Skills Time: MWF 1:05 pm – 1:55 pm


Presentation Transcript


  1. CRN: 84250----FW4500 Bioinformatics Programming & Skills
     Time: MWF 1:05 pm – 1:55 pm
     Where: School of Forest Resource, Rm 143
     Instructor: Hairong Wei, Assistant Professor of Plant Bioinformatics, Molecular Biology, and Genetics

  2. Why do I need to take this course?
     1. To learn the most efficient language for manipulating text and data files.
     2. To learn the most efficient language for extracting information from various data resources and from the outputs of any tools and software.
     3. To add an expertise to your CV.
     4. It is a must-have skill for doing bioinformatics and biology research.
     5. To open an avenue to your career.
     6. Perl is a pipeline-development language: it is easy to call other languages from Perl, and intermediate results can be processed, stored, and retrieved easily.
     7. It is rarely taught in most universities.

  3. The most useful skills for a bioinformatician:
     (1) working knowledge of biology and its applications;
     (2) proficiency in computer languages;
     (3) skills in data mining;
     (4) skills in data visualization;
     (5) experience with systems biology tools;
     (6) experience in using bioinformatics resources.
     In this course, we aim to give students the opportunity to learn skills (2), (6), and (1).

  4. Upon completion of this course, students will be able to:
     • Prepare large-scale expression and sequence data for bioinformatics analyses
     • Manipulate files and directories
     • Use arrays and array functions to solve a wide variety of problems
     • Use hashes to solve commonly encountered problems
     • Use the powerful regular expression capabilities of Perl
     • Extract useful information from various outputs
     • Manipulate gene annotations
     • Take advantage of Perl's powerful system interface
     • Use modules from the standard Perl distribution
     • Use Perl references
     • Write pipelines to do bioinformatics tasks

  5. Requirements:
     Homework (5 assignments)     25%
     Projects (2 projects)        30%
     Mid-term exam                15%
     Final exam                   25%
     Participation                 5%
     Late assignments:
     One day late                 10% off
     Two days late                30% off
     Three days late              60% off
     More than three days late   100% off

  6. Unix/Linux essentials (http://www.computerhope.com/unix.htm)
     • Text editors: emacs, vi, vim, pico
     • Open a file in emacs: emacs my_perl.pl (exit: ctrl-x ctrl-c)
     • Look at the first ten rows of a file: head -10 data.txt
     • Look at the last ten rows of a file: tail -10 data.txt
     • Search for a word: in emacs, press ctrl-s and type the word; from the shell, grep "word4search" file.txt
     • Move to the end or the beginning of a file in emacs: press esc, then hold shift and press ">" (end) or "<" (beginning)
     • Count how many rows you have in a file: cat file.txt | wc
     • Count how many unique rows you have in a file: cat file.txt | sort | uniq | wc
     • wc counts lines, words, and characters (use wc -l for the line count only)
     • Look at memory usage: top (type q to quit)
     • Content of the current directory: ls -l or ls -la
     • Run a program without interruption when you log out or close your terminal: screen, or nohup command & (nohup keeps a job running after logout)

  7. • Download a file: wget http://…./filename.gz
     • tar -cvf folder.tar folder (create an archive) or tar -tvzf foo.tar.gz (list the contents of a compressed archive)
     • gzip folder.tar
     • gunzip folder.tar.gz
     • tar -xvf folder.tar or tar -xvzf foo.tar.gz (extract an archive)
     • screen and nohup
     • pwd: find the current path
     • sort: sort a file
     • How do you sort a file according to field x? For example, by field 3: cat file.txt | sort -k 3,3
     • quota -v: find out your available disk space
     • scp file.txt hairong@pandora.ffr.mtu.edu:
     • df -k: summarize free disk space
     • du: summarize disk space used
     • env: display the environment variables
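The row-counting pipelines from the Unix essentials (`wc` and `sort | uniq | wc`) can also be written in Perl. The following is a minimal sketch, not part of the original slides; the subroutine name `count_lines` and the sample gene names are illustrative assumptions, and the data is read from an in-memory string so that the example runs without any file on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count total and distinct lines, similar to `wc -l` and `sort | uniq | wc -l`.
# count_lines is a hypothetical helper name used only for this sketch.
sub count_lines {
    my ($fh) = @_;
    my $total = 0;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        $total++;
        $seen{$line}++;          # each distinct line becomes one hash key
    }
    return ($total, scalar keys %seen);
}

# Illustrative data held in a string and opened as an in-memory file.
my $text = "geneA\ngeneB\ngeneA\ngeneC\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my ($total, $unique) = count_lines($fh);
close($fh);

print "rows: $total, unique rows: $unique\n";   # rows: 4, unique rows: 3
```

To use it on a real file, open the file name instead of `\$text`.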

  8. Bioinformatics resources:
     • NCBI: Blast, EST, mRNA sequences, genome sequences
     • UCSC: Blat
     • Ensembl: Biomart, Sahha
     • EBI: protein domain analysis, ProSCAN
     • TIGR: fungi and microbe genomes
     • TAIR, NSGA: Arabidopsis
     • Maize Genome Project
     • Rice
     • Poplar

  9. What is "Perl"?
     Practical Extraction and Report Language, created by Larry Wall.
     Check whether perl is installed on your machine:
     perl -V
     perl -V:startperl
     Features of Perl:
     • Flexible syntax
     • Hard to read, partly because of modules
     • Clever
     • Slower than C
     • Many modules: CPAN

  10. A simple Perl program
      #!/usr/bin/perl -w
      use warnings;
      use strict;
      print "I love bioinformatics\n";

  11. A simple Perl program
      #!/usr/bin/perl -w
      # the -w switch above enables warnings
      use warnings;
      # load the strict module for strict syntax checks
      use strict;
      print "I love bioinformatics\n";
      How to make it executable?
      $ chmod +x simple_perl.pl
      $ ./simple_perl.pl
      $ mv simple_perl.pl simple
      $ ./simple

  12. 2. Global variables: $
      $a, $b, $var, $A1, $signal, $exp, $tmp_1, $_ etc. are all legal variables.
      Names are alphanumeric, up to a total of 251 characters in length.
      Illegal variables: $5dollars, $big-var
      Local variables are not subject to these rules.
      3. Array: @
      @first_array = (1, 2, 3, 4, 5);
      @sec_arr = ('Tom', 89, 'little-foot', 95);
      @third_arr = qw(bar jar car ear far var mar);   # qw: quote words
      $array[index] = $element_2_store;
      4. Hash: %
      %fun_figures = (Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike');
      %mid_term = ('Jim Carr', '85', 'Tim Hall', '98', 'Simpson', '71');
      $hash{$a_key} = $element_4_store;
      What happens if you store two different items with the same key?
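The question about storing two items under the same hash key can be answered with a short experiment. This is a minimal sketch rather than the instructor's own example; the replacement value 'Sylvester' and the `%multi` hash are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %fun_figures = (Mouse => 'Jerry', Cat => 'Tom', Dog => 'Spike');

# Storing a second value under an existing key replaces the first one.
$fun_figures{Cat} = 'Sylvester';      # 'Sylvester' is an illustrative value

my $n_keys = scalar keys %fun_figures;
print "Cat => $fun_figures{Cat}\n";   # Cat => Sylvester
print "number of keys: $n_keys\n";    # number of keys: 3

# To keep several values per key, store an array reference instead.
my %multi;
push @{ $multi{Cat} }, 'Tom', 'Sylvester';
print "Cat => @{ $multi{Cat} }\n";    # Cat => Tom Sylvester
```

The hash still has three keys after the assignment: the old value 'Tom' is simply overwritten.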

  13. 5. Subroutine: & (we will discuss this later)
      6. How to use variables, arrays, and hashes? Some examples of Perl scripts will be shown in class.
      7. References
      my $scalar_ref = \$variable;
      my $array_ref = \@array;
      my $hash_ref = \%hash;
      my $subroutine_ref = \&a_subroutine;
      Dereferencing
      $variable = $$scalar_ref;
      @array = @$array_ref;
      %hash = %$hash_ref;
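A runnable version of the reference and dereference forms above may help. This is a minimal sketch with made-up data; the subroutine body and the stored values are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (10, 20, 30);
my %hash  = (Cat => 'Tom');
sub a_subroutine { return "called with @_" }

my $array_ref      = \@array;
my $hash_ref       = \%hash;
my $subroutine_ref = \&a_subroutine;

# A reference points at the original data, so changes made through it
# are visible in the original variable.
push @$array_ref, 40;
$hash_ref->{Mouse} = 'Jerry';

my $count  = scalar @array;              # 4
my $mouse  = $hash{Mouse};               # Jerry
my $result = $subroutine_ref->(1, 2);    # called with 1 2

print "$count $mouse '$result'\n";
```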

  14. Week 1: Lecture 2: Operators
      1. String concatenation:
      $str = "fred" . "\t" . "barney";
      Now $str contains the string "fred", a TAB character, and "barney".
      $str = "fred" . "|" . "barney";
      Now $str contains the string "fred|barney".

  15. 2. Comparison:
      Numeric   String   Return
      ------------------------------------------
      !=        ne       not equal
      >         gt       greater than
      ==        eq       equal
      >=        ge       greater than or equal
      <         lt       less than
      <=        le       less than or equal
      <=>       cmp      compare (returns -1, 0, or 1)
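The difference between the numeric and the string column of the comparison table shows up most clearly when sorting. The sketch below uses made-up values and is only an illustration of the two operator families.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = (10, 9, 100, 2);

# <=> compares as numbers, cmp compares as strings (character by character).
my @numeric = sort { $a <=> $b } @values;   # 2 9 10 100
my @string  = sort { $a cmp $b } @values;   # 10 100 2 9

# The same two values can be numerically equal but different as strings.
my $num_equal = (10 == 10.0)     ? 'yes' : 'no';   # yes
my $str_equal = ("10" eq "10.0") ? 'yes' : 'no';   # no

print "numeric order: @numeric\n";
print "string order:  @string\n";
print "== says $num_equal, eq says $str_equal\n";
```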

  16. 3. Logical and bitwise
      Logical operators   Bitwise operators   Meaning
      ------------------------------------------------
      &&                  &                   AND
      ||                  |                   OR
      xor                 ^                   exclusive OR
      !                   ~                   NOT
      For example, if ($file1 && $file2) is TRUE only if both $file1 and $file2 hold true values.
      3 = 011, 6 = 110
      3 & 6 = 010 = 2
      3 | 6 = 111 = 7
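The bitwise arithmetic on 3 and 6 can be checked directly in Perl. This is a minimal sketch; the file-name values are hypothetical and serve only to show a true and a false operand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bitwise operators work on the binary digits of the numbers.
my $and = 3 & 6;    # 011 & 110 = 010 = 2
my $or  = 3 | 6;    # 011 | 110 = 111 = 7
my $xor = 3 ^ 6;    # 011 ^ 110 = 101 = 5
printf "%03b & %03b = %03b (%d)\n", 3, 6, $and, $and;   # 011 & 110 = 010 (2)

# Logical operators work on truth values; an empty string is false.
my ($file1, $file2) = ('a.txt', '');     # hypothetical values
my $both   = ($file1 && $file2)  ? 'TRUE' : 'FALSE';   # FALSE
my $either = ($file1 || $file2)  ? 'TRUE' : 'FALSE';   # TRUE
my $one    = ($file1 xor $file2) ? 'TRUE' : 'FALSE';   # TRUE
print "both: $both, either: $either, exactly one: $one\n";
```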

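  17. 4. Arithmetic
      $x = 4**0.5;   # 2 (power)
      $x = 4**2;     # 16 (power)
      $x = 9 % 2;    # 1 (modulus: remainder upon dividing)
      5. How to use the arrow?
      A. Look up a value in a hash through a hash reference:
      $value = $hash_ref->{key};
      B. Take a slice of an array through an array reference (the arrow reaches a single element, $array_ref->[5]; a slice needs the @{...} form):
      @slice = @{$array_ref}[5..10];
      C. Get the first element from a subroutine that returns an array reference:
      $result = returned_array_ref()->[0];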
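The arithmetic and arrow forms above can be tried on small made-up data. This is a minimal sketch; the contents of the hash, the array, and the body of `returned_array_ref` are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $hash_ref  = { key => 'value' };
my $array_ref = [ 0 .. 12 ];
sub returned_array_ref { return [ 'first', 'second' ] }

my $value   = $hash_ref->{key};             # value
my $element = $array_ref->[5];              # 5 (a single element)
my @slice   = @{$array_ref}[5 .. 10];       # 5 6 7 8 9 10 (a slice)
my $result  = returned_array_ref()->[0];    # first

my $root      = 4 ** 0.5;                   # 2
my $square    = 4 ** 2;                     # 16
my $remainder = 9 % 2;                      # 1

print "$value $element (@slice) $result\n";
print "$root $square $remainder\n";
```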

  18. 6. String manipulation
      a. length $str: returns the number of characters in $str.
      b. substr($str, offset, len): returns up to LEN characters of the string, starting at OFFSET (counted from 0 at the start of the string). If LEN is omitted, everything from OFFSET to the end of the string is returned.
      c. substr($str1, offset, len, $str2): replaces the LEN characters of $str1 beginning at OFFSET with the replacement string $str2.

  19. #!/usr/bin/perl -w
      $temp = substr("okay", 2);
      print "Substring value is $temp\n";    # Substring value is ay
      $temp = substr("okay", 1, 2);
      print "Substring value is $temp\n";    # Substring value is ka
      $sentence = "The quick brown fox jumps over the lazy dog.";
      $chunk = substr($sentence, 4, 5);      # quick
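The four-argument form of substr, which replaces part of a string, is not covered by the examples above. The following minimal sketch reuses the same sentence; the replacement word "slow" is an illustrative assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "The quick brown fox jumps over the lazy dog.";

# Three arguments: extract 5 characters starting at offset 4.
my $chunk = substr($sentence, 4, 5);              # quick

# Four arguments: replace those 5 characters in place.
# The return value is the text that was replaced.
my $removed = substr($sentence, 4, 5, "slow");    # quick

# A negative offset counts from the end of the string.
my $tail = substr($sentence, -4, 3);              # dog

print "$sentence\n";   # The slow brown fox jumps over the lazy dog.
```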

  20. d. Find substrings with index and rindex
      $_ = "It's a Perl Perl Perl Perl World";
      $left = index $_, 'Perl';     # 7
      $right = rindex $_, 'Perl';   # 22
      $str = "It's a Perl word";
      $substr = substr($str, index($str, 'Perl'), 4);   # Perl
      e. uc $str and lc $str: get the upper-case or lower-case version of the string
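index also accepts a third argument, the position at which to start searching, which makes it possible to find every occurrence of a substring. This is a minimal sketch, assuming the words in the sample string are separated by single spaces (which is what the positions 7 and 22 imply).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "It's a Perl Perl Perl Perl World";

# Collect the start position of every occurrence of 'Perl'.
my @positions;
my $pos = index($text, 'Perl');
while ($pos != -1) {                        # index returns -1 when nothing is found
    push @positions, $pos;
    $pos = index($text, 'Perl', $pos + 1);  # continue searching after this hit
}
print "@positions\n";                       # 7 12 17 22

my $first = index($text, 'Perl');           # 7
my $last  = rindex($text, 'Perl');          # 22
my $upper = uc 'Perl';                      # PERL
my $lower = lc 'Perl';                      # perl
print "$first $last $upper $lower\n";
```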

  21. f. split
      Split a row or line into multiple fields according to the delimiter you specify, usually TAB or space.
      Example data (fields separated by TAB):
      Ptp.3328.1.S1_s_at        bZIP family transcription factor;       221.3727   168.3524   96.88159
      PtpAffx.1578.1.S1_at      similar to zinc finger protein (PMZ)    100.5228   123.7725   85.54334
      PtpAffx.200456.1.S1_at    DRE binding protein (DREB1A);           121.7867   142.2339   16.21638
      PtpAffx.202271.1.S1_s_at  bHLH protein;                           118.736    146.9658   48.37343
      PtpAffx.215817.1.S1_at    putative protein;                       46.99999   77.77928   19.88638
      PtpAffx.22673.2.A1_at     bHLH protein;                           147.6356   163.8369   73.30122
      @field = split(/\t/, $_);   # split the current line $_ by TAB
      $field[0] contains Ptp.3328.1.S1_s_at
      $field[1] contains bZIP family transcription factor;
      $field[2] contains 221.3727
      ….
      @field = split;             # split $_ by whitespace (TAB and space)
      $field[0] contains Ptp.3328.1.S1_s_at
      $field[1] contains "bZIP"
      $field[2] contains "family"
      $field[3] contains "transcription"
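Once a row has been split into fields, the expression values can be processed as numbers. The sketch below uses the first data row from the slide and averages its three values; computing a mean is an illustrative choice and not something the slides prescribe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First data row from the slide, with TAB-separated fields.
my $line = "Ptp.3328.1.S1_s_at\tbZIP family transcription factor;\t221.3727\t168.3524\t96.88159\n";
chomp $line;

# Split on TAB: probe ID, annotation, then the expression values.
my ($probe, $annotation, @expr) = split /\t/, $line;

my $sum = 0;
$sum += $_ for @expr;
my $mean = $sum / @expr;           # @expr in numeric context is its length (3)

printf "%s\t%.2f\n", $probe, $mean;    # Ptp.3328.1.S1_s_at	162.20

# Splitting on whitespace instead breaks the annotation into separate words.
my @words = split ' ', $line;
print "second whitespace-separated field: $words[1]\n";   # bZIP
```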

  22. 7. Array manipulation: shift, unshift, pop, push
      push(@array, $new_element);             # add an element at the right end
      $element_at_rightmost = pop(@array);    # remove the rightmost element
      unshift(@array, $new_element);          # add an element at the left end
      $element_at_leftmost = shift(@array);   # remove the leftmost element
      @array = (1, 2, 3);
      @reverse_array = reverse(@array);
      @sorted_array = sort(@array);
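The effect of each array function is easiest to follow by tracking the array contents step by step. This is a minimal sketch with made-up values; note that sort compares as strings by default, so a numeric sort needs an explicit `<=>` block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (1, 2, 3);

push @array, 4;                   # (1, 2, 3, 4)
my $rightmost = pop @array;       # 4, array is (1, 2, 3)
unshift @array, 0;                # (0, 1, 2, 3)
my $leftmost = shift @array;      # 0, array is (1, 2, 3)

my @reversed = reverse @array;    # (3, 2, 1)

# Default sort is a string sort; use <=> for numeric order.
my @as_strings = sort (10, 9, 100);                # (10, 100, 9)
my @as_numbers = sort { $a <=> $b } (10, 9, 100);  # (9, 10, 100)

print "@array | @reversed | @as_strings | @as_numbers\n";
```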

  23. Special array @ARGV
      Running your code from the command line:
      $ perl my_perl.pl input_file1.txt input_file2.txt
      Inside your program, you can get the input file names from @ARGV:
      $ARGV[0] contains input_file1.txt
      $ARGV[1] contains input_file2.txt

  24. Lecture 3: Input and output with filehandles
      1. How to open a file?
      open(MHD, "myfile") or die "Cannot open the file: $!";      # explicit filename
      Or
      open(MHD, $filename) or die "Cannot open the file: $!";     # filename in a variable
      while (<MHD>) {
          print $_;    # each line read into $_ still ends with its newline
      }
      When the open fails, the reason is stored in the special variable $!, so printing $! will help you learn why it failed.
      Some examples follow.

  25. 2. Open a file using shift
      #!/usr/bin/perl -w
      use warnings;
      use strict;
      my $infile = shift;     # first command-line argument
      my @fields;
      open(IN, "$infile") || die "Cannot open input file -- $infile \n";
      while (<IN>) {
          chomp;
          @fields = split(/\t/, $_);    # or: @fields = split;
          print "@fields\n";
      }
      close(IN);
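The same read loop can be written with a lexical filehandle and the three-argument form of open, which is the style current Perl documentation recommends. This is a minimal sketch: to stay self-contained it writes a small temporary file with File::Temp and sets @ARGV by hand, both of which are assumptions for the demo; in real use the file name comes from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small TAB-delimited demo file (contents are illustrative).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "geneA\t1.5\ngeneB\t2.5\n";
close($tmp_fh);

# Simulate `perl my_perl.pl <file>` for this demo.
@ARGV = ($tmp_name);

my $infile = shift @ARGV;
open(my $in, '<', $infile) or die "Cannot open input file $infile: $!";

my @rows;
while (my $line = <$in>) {
    chomp $line;
    my @fields = split /\t/, $line;
    push @rows, \@fields;            # keep each row as an array reference
    print "@fields\n";
}
close($in);

print scalar(@rows), " rows read\n";    # 2 rows read
```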

  26. 3. Open a file with the Getopt module
      #!/usr/bin/perl -w
      use Getopt::Std;
      my %opt = ();
      getopts("hm:n:o:", \%opt);    # -h is a flag; -m, -n, and -o take a value
      open(FH4M, "$opt{m}") or die "Cannot open the input file: $!";
      while (<FH4M>) {
          chomp();
          print "$_\n";             # $_ is the current row
      }
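What getopts stores in the options hash can be inspected without opening any file. The following minimal sketch sets @ARGV by hand to simulate a command line; the option values and file names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Simulate: perl script.pl -m input.txt -n 5 extra.txt   (hypothetical arguments)
@ARGV = ('-m', 'input.txt', '-n', '5', 'extra.txt');

my %opt;
getopts('hm:n:o:', \%opt) or die "Usage: $0 [-h] [-m file] [-n number] [-o file]\n";

# Options followed by ':' in the specification carry a value.
print "m = $opt{m}\n";                                   # m = input.txt
print "n = $opt{n}\n";                                   # n = 5
print "h given? ", (exists $opt{h} ? 'yes' : 'no'), "\n";   # h given? no

# Arguments that are not options remain in @ARGV.
print "remaining: @ARGV\n";                              # remaining: extra.txt
```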

  27. 4. Open a file using the IO::File module
      #!/usr/bin/perl -w
      use warnings;
      use strict;
      use IO::File;
      my $FH = IO::File->new;     # create a filehandle object
      $FH->open("> myfile") or die "Unable to open: $!";
      Or
      my $FH = IO::File->new("> myfile") or die "Unable to open: $!";
      $FH->close();
      Or
      $FH->open($anotherfile, ">");
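A complete write-then-read round trip with IO::File might look like the following. This is a minimal sketch; the temporary directory from File::Temp and the text written are assumptions made so that the example does not touch existing files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempdir);

# Work in a temporary directory that is removed when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/myfile";

# Open for writing, write one line, and close.
my $out = IO::File->new($path, '>') or die "Unable to open $path: $!";
$out->print("I love bioinformatics\n");
$out->close;

# Open the same file for reading and fetch the first line.
my $in = IO::File->new($path, '<') or die "Unable to open $path: $!";
my $line = $in->getline;
$in->close;

chomp $line;
print "read back: $line\n";    # read back: I love bioinformatics
```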

  28. Open modes:
      Perl mode   Equivalent   Meaning
      <           r            read only
      >           w            write only; create and truncate
      >>          a            write, append, and create
      +<          r+           read and write
      +>          w+           read and write; create and truncate
      +>>         a+           read and write; append and create

  29. open(FH, ">$file");    # open a file for writing
      ">": open for write access. Creates the file if it does not exist; otherwise destroys the current contents.
      ">>": open for append access. Creates the file if it does not exist; otherwise opens it for appending.
      "<": open for read access.
      "+<": open for read and write access. If the file does not exist, the open fails. If it exists, the current contents are preserved, and both reading and writing start from the beginning of the file. Use this only when you want to open a file and write over its existing contents.
      "+>": open for read and write access. If the file does not exist, it is created. If it exists, the current contents are truncated and lost. Use this when you want to create a new file that will first be written to and later read from.
      "+>>": open for read and append access. Creates the file if it does not exist. If it exists, both reading and writing start from the end of the file; on some platforms, reading may start anywhere in the file.
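The difference between the truncating, appending, and read-write modes can be observed on a scratch file. This is a minimal sketch; the helper names `write_with` and `slurp`, the file name, and the text written are all illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $file = tempdir(CLEANUP => 1) . "/modes.txt";

# Hypothetical helper: open $file in the given mode and write some text.
sub write_with {
    my ($mode, $text) = @_;
    open(my $fh, $mode, $file) or die "Cannot open $file with '$mode': $!";
    print {$fh} $text;
    close($fh);
}

# Hypothetical helper: return the whole file as one string.
sub slurp {
    open(my $fh, '<', $file) or die "Cannot read $file: $!";
    local $/;                       # read everything at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

write_with('>',  "first\n");        # creates the file
write_with('>>', "second\n");       # appends to it
my $appended = slurp();             # "first\nsecond\n"

write_with('>',  "third\n");        # truncates, old contents are lost
my $truncated = slurp();            # "third\n"

# '+<' does not create files, so opening a missing file fails.
my $opened_missing = open(my $fh, '+<', "$file.missing") ? 'yes' : 'no';   # no

print $appended, $truncated, "opened missing file: $opened_missing\n";
```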

  30. How to judge the type of a file?
      chomp($filename = <STDIN>);    # read a file name and remove the trailing newline
      if (-e $filename) {
          print "The file or directory exists";
      } else {
          print "The file or directory DOES not exist";
      }
      -d  is a directory
      -f  is a plain file
      -B  file is binary
      -x  file or directory is executable
      -r  file or directory is readable
      -w  file or directory is writable
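The file test operators can be combined into a small classifier. This is a minimal sketch; the subroutine name `describe` and the temporary file and directory are assumptions made so that the example runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch directory holding one small file (illustrative content).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";
open(my $fh, '>', $file) or die "Cannot create $file: $!";
print {$fh} "ACGT\n";
close($fh);

# Hypothetical helper: classify a path using the file test operators.
sub describe {
    my ($path) = @_;
    return 'missing'    unless -e $path;
    return 'directory'  if -d $path;
    return 'plain file' if -f $path;
    return 'other';
}

my $dir_type     = describe($dir);            # directory
my $file_type    = describe($file);           # plain file
my $missing_type = describe("$dir/nope");     # missing

print "$dir_type, $file_type, $missing_type\n";
```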
