This lecture covers important string manipulation functions in Perl, focusing on how to split strings into arrays and lists, and how to join arrays back into strings. Learn about the `split` function for dividing strings, the `join` function for concatenating arrays, and the `substr` function for extracting substrings. Practical examples include reading lines from files, extracting specific sequences from DNA strings, and using the unpacking technique for groups of characters. Enhance your programming skills with these essential Perl functions for efficient data handling.
Lecture 6: Converting between strings ($var) and arrays (@array), and extracting substrings
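Split strings • The split function divides a string into pieces: • A string into an array. • A string into a list of scalars ($variables). • With an empty pattern (// or '') as the delimiter, it splits a string into its individual characters. • Example input line: >192a8, the lactose gene, e. coli, cambridge university, january 1981 • chomp($line = <>); # read the line into $line and remove the trailing newline • @fields = split ',', $line; # splits a string into an array • ($clone, $laboratory, $left_oligo, $right_oligo) = split ',', $line; # splits into named scalars; any extra fields are discarded • Note: splitting on ',' alone leaves the space after each comma at the start of the following field. • See SplitExample.pl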
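The three uses of split listed above can be sketched as follows. This is a minimal illustration, not the contents of SplitExample.pl: the record is hard-coded rather than read from `<>`, the scalar names are made up for the example, and the pattern `/,\s*/` is an assumption chosen so that the space after each comma is dropped.

```perl
use strict;
use warnings;

# Sample record from the slide, hard-coded here instead of being read with <>.
my $line = ">192a8, the lactose gene, e. coli, cambridge university, january 1981";

# Split on a comma plus any following whitespace, so no field starts with a space.
my @fields = split /,\s*/, $line;

# Split straight into named scalars (names are illustrative only).
my ($id, $gene, $organism, $institution, $date) = split /,\s*/, $line;

# An empty pattern splits a string into its individual characters.
my @bases = split //, "ACGT";

print scalar(@fields), " fields; organism = $organism\n";
print "bases: @bases\n";
```

If the plain `','` delimiter from the slide is used instead, every field after the first keeps its leading space.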
Join: elements of an array into a string • The join function is the reverse of split: it converts an array (list) into a string, placing a separator between the elements. • # initialize an array • @seq = ("aaaaaa", "tttttt", "cccccc", "ggggggg"); • $CombinedSeq = join '', @seq; # empty separator • Result of the join is: • aaaaaattttttccccccggggggg • See JoinExample.pl
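A short sketch of join, including a round trip through split to show that the two functions undo each other. The variable names `$csv` and `@round_trip` are illustrative and do not come from JoinExample.pl.

```perl
use strict;
use warnings;

# Initialize an array of sequence fragments (same data as the slide).
my @seq = ("aaaaaa", "tttttt", "cccccc", "ggggggg");

# Join with an empty separator: the fragments run together.
my $combined = join '', @seq;

# Join with a visible separator, then split on it to get the array back.
my $csv        = join ',', @seq;
my @round_trip = split /,/, $csv;

print "$combined\n";
print "$csv\n";
print scalar(@round_trip), " elements after the round trip\n";
```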
Concatenation • To concatenate two strings, use the . operator; to append to an existing string, use the .= operator. • Start with an empty string: $seq = ""; • Add (concatenate) a sequence to it with: • $seq .= $input_seq2; • This can be used to read in sequences line by line and join them together so they form one string.
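The append pattern can be sketched as below. To keep the example self-contained, the array `@lines` stands in for lines read from a file; that substitution, and the names `@lines` and `$chunk`, are assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for lines read from a file; each ends in a newline like real input.
my @lines = ("AAAAGGGG\n", "CCCCTTTT\n");

my $seq = "";                      # start with an empty string
for my $input (@lines) {
    chomp(my $chunk = $input);     # copy the line, then strip its newline
    $seq .= $chunk;                # append the chunk to the growing sequence
}

print "$seq\n";
```

With a real file, the `for` loop would typically become `while (my $input = <$fh>)` on an opened filehandle.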
Extracting substrings • substr: a function for extracting a substring from a string. • Assume the string is: AAAAGGGGCCCCTTTT • To extract the sequence AGG (a codon) from the string we need to: • Move to the 4th character of the string (offset 3). • Extract 3 characters, i.e. a 3-character substring. • The syntax of the Perl substr (substring) function: • $sub = substr($string, $offset, $length); # $offset = position to begin extraction, $length = size of the substring • The offset is zero-based, so the first character is at offset 0. • # more details on substrings can be found at: • # http://perlmeme.org/howtos/perlfunc/substr.html • Extract words from a sentence: Substring.pl • Extract a codon from a DNA sequence: substring.pl
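The AGG extraction described above works out as follows. This is a small sketch, not the course's substring.pl; the extra calls with a negative offset and with the length omitted are additions that show two other standard forms of substr.

```perl
use strict;
use warnings;

my $dna = "AAAAGGGGCCCCTTTT";

# Offset 3 is the 4th character (zero-based); take 3 characters from there.
my $codon = substr($dna, 3, 3);

# A negative offset counts from the end of the string.
my $last4 = substr($dna, -4);

# Omitting the length returns everything from the offset to the end.
my $tail = substr($dna, 12);

print "codon: $codon\n";
print "last four bases: $last4\n";
print "from offset 12: $tail\n";
```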
Perl functions for determining the ORFs of DNA sequences • The unpack function: a built-in Perl function that extracts sets of characters from a string according to a template and returns them as a list, which can be assigned to an array. • It can therefore be used to extract groups of 3 bases from a DNA sequence, e.g. for open reading frames, and assign each group to an element of an array. • @triplets = unpack("a3" x (length($line)/3), $line); • To determine the open reading frames (ORFs) in all three reading frames of a DNA sequence (reading frame 1, reading frame 2 and reading frame 3), shift one base to the right when going from reading frame 1 to reading frame 2, and one more base when going from reading frame 2 to reading frame 3. • Frame shift (1 position to the right): • @triplets = unpack('a1' . "a3" x (length($line)/3), $line); # the first element holds the single skipped base • Remember that if fewer than 3 characters remain for a group at the end or beginning of a sequence, unpack will still assign them to an element of the array. If you look codons up in a hash table, do not forget that an exists check may be required. • See Unpack_codons.pl (Run to show the output)
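One way to sketch the three reading frames is shown below. It is not the contents of Unpack_codons.pl. The helper `codons_in_frame` is hypothetical, and it drops the leading bases with substr before unpacking rather than using the `'a1'` template prefix from the slide, so the skipped bases never appear in the result. The codon table is deliberately partial to show why an exists check is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the complete codons of $dna after skipping
# $skip leading bases (0, 1 or 2, i.e. reading frames 1, 2 and 3).
sub codons_in_frame {
    my ($dna, $skip) = @_;
    my $rest = substr($dna, $skip);        # drop the leading bases
    my $n    = int(length($rest) / 3);     # number of complete codons only
    return unpack("a3" x $n, $rest);       # one array element per codon
}

# Deliberately partial codon table, for illustration only.
my %codon_table = (ATG => 'M', GCC => 'A', TAA => '*');

my $dna = "ATGGCCTAAGG";
for my $skip (0 .. 2) {
    my @triplets = codons_in_frame($dna, $skip);
    # Guard each lookup with exists so unknown codons become '?'.
    my @aa = map { exists $codon_table{$_} ? $codon_table{$_} : '?' } @triplets;
    print "frame ", $skip + 1, ": @triplets -> @aa\n";
}
```

Because the repeat count is truncated with int, trailing bases that do not fill a codon are simply left out instead of becoming a short final element.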
Sample Exercise • Write a script to read in the contents of a FASTA file (without the descriptor line) and print it out as a single string containing all the DNA bases / amino acids. • Modify the unpack script to use substr instead of unpack.