150 likes | 279 Vues
This document explores the fundamental data types in Perl, specifically scalar values, arrays (lists), and associative arrays (hashes). It delves into the use of hashes, how to check for keys, retrieve keys and values, and common functions related to them. Moreover, it introduces subroutines, their syntax, and practical examples to illustrate their use in organizing code. Finally, we cover regular expressions in Perl, useful for text and sequence manipulation, including pattern matching required in bioinformatics.
E N D
Basic Data Types • Perl has three basic data types: • scalar • array (list) • associative array (hash)
Associative Arrays/Hashes • List of scalar values (like array) • Elements referred to by key, not index number • Elements stored as a list of key-value pairs %threeletter = ('A','ALA','V','VAL','L','LEU'); key value key value key value print $threeletter{'A'};# “ALA” print $threeletter{'L'};? • exists checks if a specific hash key exists if ($threeletter{'E'}) print ($threeletter{'E'}); ? print "Exists\n" if exists $array{$key}; print "Defined\n" if defined $array{$key}; print "True\n" if $array{$key};
Getting all keys and values in a hash %threeletter = ('A','ALA','V','VAL','L','LEU'); • keys returns a list of all keys • values returns a list of all values • each returns one key-value pair each time it’s called ($key, $val) = each %threeletter; • Unlike array, not an ordered list (order of key-value pairs determined by the Perl interpreter) foreach $k ( keys %threeletter ) { print $k;} # Might return, for instance, “A L V”, # not “A V L” (need not to be sorted) foreach $v ( values %threeletter ) { print $v;} ?
Associative Arrays • Some common functions: • keys(%hash) #returns a list of all the keys • values(%hash) #returns a list of all the values • each(%hash) #each time this is called, it will #return a 2 element list #consisting of the next #key/value pair in the array • delete($hash{[key]}) #remove the pair associated #with key
More on Perl • Subroutines and Functions • A way to organize a program • Wrap up a block of code • Have a name • Provide a way to pass values to the block and report back the results • Regular expression
Basics about Subroutines • # define a subroutine sub myblock { my ($arg1, $arg2, $arg3, …, $argN) = @_; # @_ is special variable containing args print "Please enter something: "; } • # function call myblock($arg1, $arg2, …, $argN); • Example sub add8A { my ($rna) = @_; $rna .= "AAAAAAAA"; return $rna; } #the original rna $rna = "CGAAUCUAGGAU"; $longer_rna = add8A($rna); print "I added 8 As to $rna to get $longer_rna.\n";
More example sub denaturizing { my (@products) = @_; my @strands = (); foreach $pairs (@products) { ($A,$B) = split /\s/, $pairs; @strands = (@strands, $A, $B); } return @strands; } #templates are in the form "A B". Ex. “ACGT TGCA” @Denatured = denaturizing(@PCRproducts);
Variables Scope • A variable $a is used both in the subroutine and in the main part program of the program. $a = 0; print "$a\n"; sub changeA { $a = 1; } print "$a\n"; changeA(); print "$a\n"; • The value of $a is printed three times. Can you guess what values are printed? • $a is a global variable use strict; my $a = 0; print "$a\n"; sub changeA { my $a = 1; } print "$a\n"; changeA(); print "$a\n";
Ex: What would be the output? #!/usr/bin/perl -w $dna = 'AAAAA'; $result = A_to_T($dna); print "I changed all the A's in $dna to T's and got $result\n\n"; ############################################# # Subroutines sub A_to_T { my($input) = @_; $dna = $input; $dna =~ s/A/T/g; return $dna; } Output?
Regular Expressions • Regular Expressions: Language for specifying text strings • Regular Expressions is a mechanism for specifying character patterns • Useful for • Finding files by name • Finding text in a file • Finding (or not finding) interesting text in a string • Text based search and replace • Finding and extracting text
Pattern Finding Problem: find an ORF in nucleotide sequence • Look for start (ATG) and stop codons (TAA, TAG, TGA) • Pattern search operator: m// or // • $string =~ /<pattern>/returns true if the pattern matches somewhere in $string, false otherwise • Example: $dna = "GATGCCATGACACTGTTCA"; if ($dna =~ /ATG/){ print "starting codon is there"; } else { print "no starting codon!\n"; }
*+ Stephen Cole Kleene Regular Expressions • Optional characters ? ,* and + • /colou?r/ colororcolour • ? (0 or 1) • /oo*h!/ oh!orooh!orooooh! • * (0 or more) • /o+h!/ oh!orooh!orooooh! • + (1 or more) • Wild cards . • /beg.n/ beginorbeganorbegun
Common Regular Expressions White-space characters \t (tab), \n (newline), \r (return) \s : match a whitespace character x : character 'x' . : any character except newline ^r : match at beginning of line r$ : match at end of line r|s : match either or (r) : group characters (to be saved in $1, $2, etc) [xyz] : character class, in this case, matches either an 'x', a 'y', or a 'z' [abj-oZ] : character class with a range in it; matches 'a', 'b', any letter from 'j' through 'o', or 'Z' r* : zero or more r's, where r is any regular expression r+ : one or more r's r? : zero or one r's (i.e., an optional r) {name} : expansion of the "name" definition rs : RE r followed by RE s (e.g., concatenation)
Exercise Ex1: $dna = AGGCTCGTACGACG; if( $dna =~ /CT[CGT]ACG/ ) { print "I found the motif!!\n"; #? } Ex2: Find an ORF in nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA)) $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga"; ?