1 / 35

Computer Programming for Biologists

Computer Programming for Biologists. Class 9 Dec 5 th , 2013 Karsten Hokamp http://bioinf.gen.tcd.ie/GE3M25. Computer Programming for Biologists. Overview. revision variable scope course project file handles. Computer Programming for Biologists. Revision - Pragmas.

vonda
Télécharger la présentation

Computer Programming for Biologists

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computer Programming for Biologists Class 9 Dec 5th, 2013 Karsten Hokamp http://bioinf.gen.tcd.ie/GE3M25

  2. Computer Programming for Biologists Overview • revision • variable scope • course project • file handles

  3. Computer Programming for Biologists Revision - Pragmas • Start of Perl scripts: • use warnings; • use strict; • Changes behaviour of Perl • Leads to better programming practice

  4. Computer Programming for Biologists Revision - Hashes my %seq = (); # initialisation $freq{$char} = 0; # storing a value $freq{$char}++; # changing a value my $aa = $code{$codon}; # extracting foreach my $header (sort keys %seq) { my $seq = $seq{$header}; … }

  5. 0 0 0 0 my $A = 0; my $C = 0; my $G = 0; my $T = 0; • A C G T Hash Variables Scalars vs Hash Initialisation of values: my %frequency = ();

  6. 1 C Hash Variables Scalars vs Hash Increment: %frequency C if ($char eq 'A') { $A++; } elsif ($char eq 'C') { $C++ } elsif ($char eq 'G') { $G++; } elsif ($char eq 'T') { $T++; } 1 $frequency{$char}++;

  7. 5 9 7 5 • A C G T Hash Variables Scalars vs Hash %frequency T G 9 C A

  8. 5 7 9 5 • A T G C Hash Variables Scalars vs Hash %frequency Output: T G print "Frequency of A: $A"\n; print "Frequency of C: $C"\n; print "Frequency of G: $G"\n; print "Frequency of T: $T"\n; 9 C A foreach my $char (keys %frequency) { print "Frequency of $char: $frequency{$char}++; }

  9. Computer Programming for Biologists Revision - Subroutines my $prot = &translate($seq); # call sub translate {# definition my $seq = shift @_; # parameters … return $prot; # return value(s) }

  10. Computer Programming for Biologists scope • The area of the script in which a variable is visible. • Different blocks defined by: • main program • subroutines • loops • branches •  different namespaces

  11. Computer Programming for Biologists scope my $global_1; … while (my $input = <>) { statement a; } if (condition) { my $local_1 = 'xxx'; } sub subroutine { my $local_1; foreach my $nuc (@bases) { statement a; } } main part

  12. Computer Programming for Biologists scope my $global_1; … while (my $input = <>) { statement a; } if (condition) { my $local_1 = 'xxx'; } sub subroutine { my $local_1; foreach my $nuc (@bases) { statement a; } } Blocks

  13. Computer Programming for Biologists scope Tip: Keep local variables within subroutines  explicitly pass content between main part and subs, e.g.: $num = &split($seq); value returned from subroutine value passed into subroutine  avoid accidentally changing global variables

  14. Computer Programming for Biologists scope # extract header while ($input = <>) { if ($input =~ /^>(.+)/) { my $header = $1; } } print "sequence ID: $header\n"; Wrong: Global symbol "$header" requires explicit package name

  15. Computer Programming for Biologists scope # initialize global variable my $header = ''; # extract header while ($input = <>) { if ($input =~ /^>(.+)/) { $header = $1; } } print "sequence ID: $header\n"; Correct:

  16. Computer Programming for Biologists course project common errors: (scope) my $dna = ''; # read input while (my $input = <>) { # remove line ending chomp $input; # append to sequence string my $dna .= $input; }

  17. Computer Programming for Biologists course project common errors: (scope) my $dna = ''; # read input while (my $input = <>) { # remove line ending chomp $input; # append to sequence string my $dna .= $input; } different variables

  18. Computer Programming for Biologists course project common errors: (scope) my $dna = ''; # read input while (my $input = <>) { # remove line ending chomp $input; # append to sequence string $dna .= $input; } same variable

  19. Computer Programming for Biologists course project common errors: (case) Use of uninitialized value $count{'G'} my $dna = ''; # read input while (my $input = <>) { # remove line ending chomp $input; # append to sequence string $dna .= uc $input; } change input to upper case

  20. Computer Programming for Biologists course project common errors: (arrangement) # print output in chunks of 60 bp width while ($dna) { $out = substr $dna, 0, 60, ''; print "$i $out\n"; $i += length($out); } # change string to array: my @chars = split //, $dna; empties $dna

  21. Computer Programming for Biologists course project considerations: # form the reverse complement $dna = reverse($dna); $dna =~ tr/ACTG/TGAC/; # translate my $protein = &translate($dna); order is important # translate my $protein = &translate($dna); # form the reverse complement $dna = reverse($dna); $dna =~ tr/ACTG/TGAC/;

  22. Computer Programming for Biologists course project considerations: # define variables: my $do_revcomp = ''; my $do_gc_content = ''; my $do_translate = ''; # read sequence my $dna = ''; … # calculate GC content if ($do_gc_content) { &gc_content($dna); } # form the reverse complement if ($do_revcomp) { $dna = reverse($dna); $dna =~ tr/ACTG/TGAC/; } # translate the protein if ($do_translate) { &translate($dna); } make actions optional

  23. Computer Programming for Biologists course project Work on your course project (sequanto.pl): correct errors add "choice" variables at the top move code blocks into subroutines (GC-content, translation)

  24. Data Input/Output Redirect output Program prints output to screen: $ translate.pl seq.fa MGSAILSALLSRRSQRATTIIYHYARITTQRAHGLCDII… Redirect into file: $ translate.pl seq.fa > seq.aa Append to file: $ translate.pl seq.fa >> seq.aa

  25. Data Input/Output Filehandles Reading from STDIN, default input stream: my $in = <>; Use filehandle to read input from a specific file: open (IN, 'input.txt'); # open file for reading while (my $in = <IN>) { … } # read content line by line close IN; # close filehandle when finished filehandle

  26. Data Input/Output Filehandles Syntax: open (FH, filename); # open file for reading open (FH, "< filename"); # open file for reading open (FH, "> filename"); # open file for writing open (FH, ">> filename"); # append to file close FH; # empties buffer Write and append mode will create files if necessary Write mode will empty file first

  27. Data Input/Output Writing to files $file_name = 'results.txt'; if ($write_modus eq 'append') { # append to file (creates file if necessary) open (OUT, ">>$file_name"); } else { # normal write (erases content if file exists) open (OUT, ">$file_name"); } print OUT 'some text'; close OUT; # output might not appear until FH is closed!

  28. Data Input/Output Error check! Always test if an important operation worked out: open (IN, $file_name) or die "Can't read from $file_name: $!"; open (OUT, ">>$file_out") or die "Can't append to $file_out: $!"; # Note: special variable $! contains error message

  29. Data Input/Output Reading from Filehandle One or more file names are specified after the program, loop over each argument: foreach my $file (@ARGV) { # special variable @ARGV open (IN, $file) or die; # open filehandle while (my $in = <IN>) { # read file line by line # do something } close IN; # close filehandle }

  30. Computer Programming for Biologists Reading sequence from a file # get pattern my $pattern = shift @ARGV; my $sequence = ''; while (<>) { next if (/^>/); chomp; $sequence .= $_; } Note: two command line arguments! $ perl split.pl gcctg test.fa # read pattern and sequence my ($pattern, $file) = @ARGV; open (IN, $file) or die "$!"; my $sequence = ''; while (<IN>) { next if (/^>/); chomp; $sequence .= $_; } close IN;

  31. Computer Programming for Biologists course project Work on your course project (sequanto.pl): Add explicit opening of file-handle Store translated sequence into a new file

  32. Computer Programming for Biologists Control through options Perl module Getopt::Long allows processing command line options.

  33. Computer Programming for Biologists Control through options $man Getopt::Long NAME Getopt::Long - Extended processing of command line options SYNOPSIS use Getopt::Long; my $data = "file.dat"; my $length = 24; my $verbose; $result = GetOptions ("length=i" => \$length, # numeric "file=s" => \$data, # string "verbose" => \$verbose); # flag

  34. Computer Programming for Biologists Control through options Try this in your script: use Getopt::Long; my $do_translation = ''; my $do_gc_content = ''; &GetOptions("translate" => \$do_translation, "gc" => \$do_gc_content, );

  35. Computer Programming for Biologists Exam Reminder Exam 2: Thu, Dec 12th, 11 -1 pm

More Related