This guide provides an overview of the three basic data types in Perl: scalars, arrays (lists), and associative arrays (hashes). It covers the intricacies of associative arrays, including how to define key-value pairs, check for key existence, and retrieve keys and values. Additionally, the document introduces subroutines and functions, illustrating how to organize code effectively. Regular expressions are also detailed, showcasing their utility in pattern matching and text manipulation—key skills for bioinformatics applications. Perfect for beginners seeking to enhance their Perl programming skills.
Basic Data Types
• Perl has three basic data types:
  • scalar
  • array (list)
  • associative array (hash)
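A minimal sketch of the three types side by side. The variable names are illustrative. The hash is written with => (the "fat comma"), which behaves like the plain comma used on the slides but makes each key/value pair easier to read.

```perl
use strict;
use warnings;

# A scalar ($) holds a single value: a number or a string
my $codon = "ATG";

# An array (@) holds an ordered list of scalars, indexed from 0
my @stops = ("TAA", "TAG", "TGA");

# A hash (%) holds key-value pairs, looked up by key
my %threeletter = ('A' => 'ALA', 'V' => 'VAL', 'L' => 'LEU');

print "$codon\n";              # ATG
print "$stops[1]\n";           # TAG
print "$threeletter{'V'}\n";   # VAL
print scalar(@stops), "\n";    # 3 (an array in scalar context gives its length)
```

Note that a single element is accessed with the $ sigil ($stops[1], $threeletter{'V'}) because the element itself is a scalar.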
Associative Arrays/Hashes
• List of scalar values (like an array)
• Elements referred to by key, not by index number
• Elements stored as a list of key-value pairs:
    %threeletter = ('A','ALA','V','VAL','L','LEU');
    # key 'A' -> value 'ALA', key 'V' -> value 'VAL', key 'L' -> value 'LEU'
    print $threeletter{'A'};   # "ALA"
    print $threeletter{'L'};   # ?
• exists checks whether a specific hash key exists; defined checks that its value is not undef; a plain truth test also fails for values such as 0 or "":
    if ($threeletter{'E'}) { print $threeletter{'E'}; }   # ?
    print "Exists\n"  if exists  $hash{$key};
    print "Defined\n" if defined $hash{$key};
    print "True\n"    if $hash{$key};
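To see why the three tests differ, here is a small sketch using a hypothetical hash that holds one true value, one defined-but-false value (0), and one key whose value is undef, plus a key that is absent altogether.

```perl
use strict;
use warnings;

# Hypothetical example data chosen to separate the three tests
my %hash = ('A' => 'ALA', 'zero' => 0, 'nothing' => undef);

for my $key ('A', 'zero', 'nothing', 'E') {
    my @tests;
    push @tests, "exists"  if exists  $hash{$key};
    push @tests, "defined" if defined $hash{$key};
    push @tests, "true"    if $hash{$key};
    print "$key: @tests\n";
}
# Prints:
# A: exists defined true
# zero: exists defined
# nothing: exists
# E:
```

Only exists answers "is this key in the hash?"; the other two tests look at the stored value.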
Getting all keys and values in a hash
    %threeletter = ('A','ALA','V','VAL','L','LEU');
• keys returns a list of all keys
• values returns a list of all values
• each returns one key-value pair each time it's called:
    ($key, $val) = each %threeletter;
• Unlike an array, a hash is not an ordered list (the order of the key-value pairs is determined by the Perl interpreter):
    foreach $k ( keys %threeletter ) { print "$k "; }
    # Might print, for instance, "A L V" rather than
    # the insertion order "A V L" (keys need not be sorted)
    foreach $v ( values %threeletter ) { print "$v "; }   # ?
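When a predictable order is needed, a common approach is to sort the keys explicitly. The sketch below also shows the usual while loop around each, which stops when each returns an empty list after the last pair.

```perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# keys come back in hash order, so sort them when order matters
my @sorted_keys = sort keys %threeletter;
print "@sorted_keys\n";   # A L V

# each returns one (key, value) pair per call, then an empty list
while (my ($key, $val) = each %threeletter) {
    print "$key => $val\n";   # pairs appear in hash order, not sorted
}
```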
Associative Arrays
• Some common functions:
    keys(%hash)           # returns a list of all the keys
    values(%hash)         # returns a list of all the values
    each(%hash)           # each time this is called, it returns a
                          # 2-element list consisting of the next
                          # key/value pair in the hash
    delete($hash{$key})   # removes the pair associated with $key
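A short sketch of delete in use. delete removes the whole key-value pair (so exists becomes false afterwards) and returns the value that was stored.

```perl
use strict;
use warnings;

my %threeletter = ('A', 'ALA', 'V', 'VAL', 'L', 'LEU');

# delete removes the pair and returns the removed value
my $removed = delete $threeletter{'V'};
print "Removed $removed\n";                                             # Removed VAL
print exists($threeletter{'V'}) ? "V still there\n" : "V is gone\n";    # V is gone
print scalar(keys %threeletter), " pairs left\n";                       # 2 pairs left
```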
More on Perl
• Subroutines and functions
  • A way to organize a program
  • Wrap up a block of code
  • Have a name
  • Provide a way to pass values to the block and report back the results
• Regular expressions
Basics about Subroutines
• Define a subroutine:
    sub myblock {
        my ($arg1, $arg2, $arg3, …, $argN) = @_;   # @_ is the special array holding the arguments
        print "Please enter something: ";
    }
• Call it:
    myblock($arg1, $arg2, …, $argN);
• Example:
    sub add8A {
        my ($rna) = @_;
        $rna .= "AAAAAAAA";
        return $rna;
    }
    # the original RNA
    $rna = "CGAAUCUAGGAU";
    $longer_rna = add8A($rna);
    print "I added 8 As to $rna to get $longer_rna.\n";
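A sketch of the same idea with two arguments. The subroutine name add_poly_a and its $count parameter are hypothetical, introduced only for illustration. Because my ($rna, $count) = @_ copies the arguments, the caller's $rna is left unchanged, which is why the print line above can show both the original and the longer RNA.

```perl
use strict;
use warnings;

# Hypothetical generalisation of add8A: the tail length is a second argument
sub add_poly_a {
    my ($rna, $count) = @_;          # copy the arguments out of @_
    return $rna . ("A" x $count);    # x repeats the string $count times
}

my $rna = "CGAAUCUAGGAU";
my $longer_rna = add_poly_a($rna, 8);
print "I added 8 As to $rna to get $longer_rna.\n";
# I added 8 As to CGAAUCUAGGAU to get CGAAUCUAGGAUAAAAAAAA.
```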
Another example
    sub denaturizing {
        my (@products) = @_;
        my @strands = ();
        foreach my $pair (@products) {
            # templates are in the form "A B", e.g. "ACGT TGCA"
            my ($A, $B) = split /\s+/, $pair;
            push @strands, $A, $B;
        }
        return @strands;
    }
    my @PCRproducts = ("ACGT TGCA");   # illustrative input
    my @Denatured = denaturizing(@PCRproducts);
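denaturizing can take a whole array because Perl flattens every argument into the single list @_. The sketch below, with a hypothetical helper count_args and made-up product strings, shows that passing two arrays gives one combined list, so the subroutine cannot tell where the first array ended.

```perl
use strict;
use warnings;

# Hypothetical helper that reports how many arguments arrived in @_
sub count_args {
    my @args = @_;
    return scalar(@args);
}

my @run1 = ("ACGT TGCA", "GGCC CCGG");   # illustrative data
my @run2 = ("AATT TTAA");

# Both arrays are flattened into one @_ list of three strings
my $n = count_args(@run1, @run2);
print "$n\n";   # 3
```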
Variable Scope
• A variable $a is used both in the subroutine and in the main part of the program:
    $a = 0;
    print "$a\n";
    sub changeA { $a = 1; }
    print "$a\n";
    changeA();
    print "$a\n";
• The value of $a is printed three times. Can you guess what values are printed?
• $a is a global variable
• With my, each $a below is a separate lexical variable, visible only in its own block:
    use strict;
    my $a = 0;
    print "$a\n";
    sub changeA { my $a = 1; }
    print "$a\n";
    changeA();
    print "$a\n";
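A common way to avoid surprises from shared variables is to pass values in and return the result, letting the caller decide what to update. This sketch uses a variable named $count rather than $a, because $a and $b are special package variables used by sort; the subroutine name incremented is hypothetical.

```perl
use strict;
use warnings;

my $count = 0;

# The subroutine works on its own copy and returns a new value
sub incremented {
    my ($value) = @_;
    return $value + 1;
}

print "$count\n";            # 0
$count = incremented($count);
print "$count\n";            # 1
```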
Ex: What would be the output?
    #!/usr/bin/perl -w
    $dna = 'AAAAA';
    $result = A_to_T($dna);
    print "I changed all the A's in $dna to T's and got $result\n\n";
    #############################################
    # Subroutines
    sub A_to_T {
        my($input) = @_;
        $dna = $input;
        $dna =~ s/A/T/g;
        return $dna;
    }
Output?
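The exercise hinges on A_to_T assigning to the global $dna, so the substitution also changes the value the print line later interpolates. One way to remove that side effect, sketched under use strict, is to do the substitution on a lexical copy inside the subroutine.

```perl
use strict;
use warnings;

my $dna = 'AAAAA';
my $result = A_to_T($dna);
print "I changed all the A's in $dna to T's and got $result\n\n";
# I changed all the A's in AAAAA to T's and got TTTTT

sub A_to_T {
    my ($input) = @_;
    my $copy = $input;   # lexical copy: the caller's $dna is not touched
    $copy =~ s/A/T/g;
    return $copy;
}
```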
Regular Expressions
• Regular expressions: a language for specifying text strings
• A regular expression is a mechanism for specifying character patterns
• Useful for
  • Finding files by name
  • Finding text in a file
  • Finding (or not finding) interesting text in a string
  • Text-based search and replace
  • Finding and extracting text
Pattern Finding
• Problem: find an ORF in a nucleotide sequence
  • Look for start (ATG) and stop codons (TAA, TAG, TGA)
• Pattern search operator: m// or //
  • $string =~ /<pattern>/ returns true if the pattern matches somewhere in $string, false otherwise
• Example:
    $dna = "GATGCCATGACACTGTTCA";
    if ($dna =~ /ATG/) {
        print "starting codon is there\n";
    } else {
        print "no starting codon!\n";
    }
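A true/false match says only that ATG occurs somewhere. To locate every occurrence, one option (a sketch, not the only way) is to loop with the /g modifier and read $-[0], the 0-based offset where the current match started.

```perl
use strict;
use warnings;

my $dna = "GATGCCATGACACTGTTCA";
my @positions;

# With /g in a while loop, each match resumes after the previous one
while ($dna =~ /ATG/g) {
    push @positions, $-[0];   # start offset of this match
}
print "ATG found at: @positions\n";   # ATG found at: 1 6
```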
Regular Expressions: Kleene * and + (named after Stephen Cole Kleene)
• Optional characters ?, * and +
  • ? (0 or 1)
    /colou?r/   matches color or colour
  • * (0 or more)
    /oo*h!/     matches oh! or ooh! or ooooh!
  • + (1 or more)
    /o+h!/      matches oh! or ooh! or ooooh!
• Wildcard .
    /beg.n/     matches begin or began or begun
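The patterns above can be tried directly. In this sketch each pattern is wrapped in ^...$ so it must match the whole word rather than just part of it; the word list is made up to include a few non-matches.

```perl
use strict;
use warnings;

my @words = ("color", "colour", "colouur", "oh!", "ooh!", "h!", "begin", "began", "begn");
my %matched;

for my $w (@words) {
    my @hits;
    push @hits, 'colou?r' if $w =~ /^colou?r$/;
    push @hits, 'oo*h!'   if $w =~ /^oo*h!$/;
    push @hits, 'o+h!'    if $w =~ /^o+h!$/;
    push @hits, 'beg.n'   if $w =~ /^beg.n$/;
    $matched{$w} = "@hits";
    print "$w: $matched{$w}\n";
}
```

"h!" matches neither /oo*h!/ nor /o+h!/, since both require at least one o; "begn" fails /beg.n/ because . must match exactly one character.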
Common Regular Expressions
• White-space characters: \t (tab), \n (newline), \r (return)
    \s       : match a whitespace character
    x        : the character 'x'
    .        : any character except newline
    ^r       : r at the beginning of the string (of each line with /m)
    r$       : r at the end of the string (of each line with /m)
    r|s      : match either r or s
    (r)      : group characters (saved in $1, $2, etc.)
    [xyz]    : character class; here it matches either an 'x', a 'y', or a 'z'
    [abj-oZ] : character class with a range; matches 'a', 'b', any letter from 'j' through 'o', or 'Z'
    r*       : zero or more r's, where r is any regular expression
    r+       : one or more r's
    r?       : zero or one r (i.e., an optional r)
    r{n,m}   : between n and m r's (r{n} means exactly n)
    rs       : r followed by s (i.e., concatenation)
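A sketch that combines several entries from the table: groups saved in $1 and $2, \s+, character classes with ranges, alternation, and {n,m} counts. The input line "ORF1 starts at 120" is made-up example text.

```perl
use strict;
use warnings;

my $line = "ORF1 starts at 120";
my ($name, $pos);

# (r) saves what r matched in $1, $2, ...; \s+ is one or more whitespace
# characters; [A-Z0-9] and [0-9] are character classes with ranges
if ($line =~ /^([A-Z0-9]+)\s+starts at\s+([0-9]+)$/) {
    ($name, $pos) = ($1, $2);
    print "name=$name position=$pos\n";   # name=ORF1 position=120
}

# r|s alternation inside a group, and r{n,m} repetition counts
my $is_stop   = ("TGA" =~ /^(TAA|TAG|TGA)$/) ? 1 : 0;   # 1
my $is_6mer   = ("GAATTC" =~ /^[ACGT]{6}$/) ? 1 : 0;    # 1
my $too_short = ("GAAT" =~ /^[ACGT]{6,9}$/) ? 1 : 0;    # 0
```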
Exercise
Ex1:
    $dna = "AGGCTCGTACGACG";
    if ( $dna =~ /CT[CGT]ACG/ ) {
        print "I found the motif!!\n";   # ?
    }
Ex2: Find an ORF in a nucleotide sequence (look for start (ATG) and stop codons (TAA, TAG, TGA))
    $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
    ?
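One possible approach to Ex2, sketched under simplifying assumptions: only the forward strand is searched, the first ATG with an in-frame stop is reported, and the sequence is upper-cased first. It uses a non-capturing group (?:...), which is not in the table above: it groups like (r) but does not save into $1. The lazy *? takes as few whole codons as possible, so the match ends at the first in-frame stop codon.

```perl
use strict;
use warnings;

my $dna = "tatggagcctcctgaggctacagccacacctgagccactctaaga";
my $orf;

# ATG, then the fewest whole codons, then the first in-frame stop codon
if (uc($dna) =~ /(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA))/) {
    $orf = $1;
    print "ORF: $orf\n";
} else {
    print "No ORF found\n";
}
```

A fuller ORF finder would also scan the other reading frames and the reverse complement; this sketch deliberately leaves those out.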