String ($var) and array (@array) conversion and substring extraction

samira

Presentation Transcript


  1. String ($var) and array (@array) conversion and substring extraction. Lecture 6

  2. Split strings
     • The split function can be used to split (divide) string data into:
       • an array, or
       • a list of scalars ($variables).
     • The delimiter is a pattern, so ',' and /,/ behave the same. Using an empty pattern ('' or //) as the delimiter splits a string into its individual characters.
     • Example input line: >192a8, the lactose gene, e. coli, cambridge university, january 1981
     • chomp(my $line = <>);   # read one line into $line and remove the trailing newline
     • my @fields = split ',', $line;   # split the string into an array
     • my ($clone, $laboratory, $left_oligo, $right_oligo) = split ',', $line;   # split the string into a list of scalars
     • See SplitExample.pl
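The split calls on this slide can be tried as a small self-contained script. This is a minimal sketch: the sample line from the slide is held in a variable instead of being read with <>, and the \s*,\s* pattern is an addition (not from the slide) showing how to drop the space that follows each comma.

```perl
use strict;
use warnings;

# Sample line from the slide; a real script would read it with <>.
my $line = ">192a8, the lactose gene, e. coli, cambridge university, january 1981";

# Split into an array: the first argument is a pattern, so ',' and /,/ are equivalent.
my @fields = split /,/, $line;

# Split into a list of scalars; the fifth field has no variable and is discarded.
my ($clone, $laboratory, $left_oligo, $right_oligo) = split /,/, $line;

# Splitting on ',' keeps the space after each comma; \s*,\s* removes it as well.
my @trimmed = split /\s*,\s*/, $line;

# An empty pattern splits a string into its individual characters.
my @bases = split //, "ATGC";

print scalar(@fields), " fields\n";   # 5 fields
print "[$laboratory]\n";              # [ the lactose gene]
print join("|", @trimmed), "\n";
print join(" ", @bases), "\n";        # A T G C
```

Note that the names of the four scalars are only labels: split assigns fields by position, so $laboratory simply receives the second comma-separated field.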

  3. Join: converting the elements of an array into a string
     • The join function is the reverse of split: it converts an array (list) into a single string.
     • # initialize an array
     • my @seq = ("aaaaaa", "tttttt", "cccccc", "ggggggg");
     • my $CombinedSeq = join '', @seq;   # '' means no separator between the elements
     • Result of the join:
     • aaaaaattttttccccccggggggg
     • See JoinExample.pl
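The first argument of join is the string placed between consecutive elements. A minimal sketch using the slide's array; the comma-separated version and the split round trip are additions for illustration.

```perl
use strict;
use warnings;

my @seq = ("aaaaaa", "tttttt", "cccccc", "ggggggg");

# join EXPR, LIST: EXPR is inserted between consecutive elements of LIST.
my $CombinedSeq = join '', @seq;    # no separator
my $csv         = join ',', @seq;   # comma separator

# split with the same separator undoes the join (as long as no element contains a comma).
my @back = split /,/, $csv;

print "$CombinedSeq\n";    # aaaaaattttttccccccggggggg
print "$csv\n";            # aaaaaa,tttttt,cccccc,ggggggg
print scalar(@back), "\n"; # 4
```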

  4. Concatenation
     • To concatenate two strings, use the . (dot) operator; to append a string to the end of an existing variable, use the .= (dot-equals) operator.
     • $seq starts as a null (empty) string: $seq = "";
     • We can add (concatenate) a sequence to it with: $seq .= $input_seq2;
     • Repeating this for every line read in joins the sequences together so that they form one string.
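Appending each line with .= is a common way to rebuild a sequence that is split over several lines. A minimal sketch, assuming three hypothetical sequence lines; they are held in a string and opened as an in-memory filehandle (supported since Perl 5.8) so the example runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical input: three lines of sequence, read from an in-memory filehandle.
my $data = "ATGGCC\nTTAGGC\nCCA\n";
open(my $fh, '<', \$data) or die "cannot open in-memory data: $!";

my $seq = "";                  # start with a null (empty) string
while (my $input_seq = <$fh>) {
    chomp $input_seq;          # remove the newline so the lines join seamlessly
    $seq .= $input_seq;        # append this line to the end of $seq
}
close $fh;

my $joined = "ATG" . "TAA";    # the . operator builds a new string from two operands

print "$seq\n";                # ATGGCCTTAGGCCCA
print length($seq), "\n";      # 15
print "$joined\n";             # ATGTAA
```

Without the chomp, each newline would be appended too, and the result would not be one continuous sequence.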

  5. Extracting substrings
     • substr: a function for extracting a substring from a string.
     • Assume the string is: AAAAGGGGCCCCTTTT
     • To extract the sequence AGG (a codon) from the string we need to:
       • move to the 4th character of the string, then
       • extract 3 characters, i.e. a 3-character substring.
     • The syntax for Perl's substr (substring function):
     • $sub = substr($string, OFFSET, LENGTH);   # OFFSET = position to begin extraction, LENGTH = size of the substring
     • The offset is zero-based: the first character is at offset 0, so the 4th character is at offset 3 and substr($string, 3, 3) returns "AGG".
     • # more details on substrings can be found at:
     • # http://perlmeme.org/howtos/perlfunc/substr.html
     • Extract words from a sentence: Substring.pl
     • Extract a codon from a DNA sequence: substring.pl
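A minimal sketch of substr on the slide's string. The negative-offset call and the sentence "the lactose gene" (taken from the split example) are illustrative additions.

```perl
use strict;
use warnings;

my $string = "AAAAGGGGCCCCTTTT";

# substr(EXPR, OFFSET, LENGTH): OFFSET is zero-based, so offset 3 is the 4th character.
my $codon = substr($string, 3, 3);

# A negative offset counts from the end; omitting LENGTH takes the rest of the string.
my $last_four = substr($string, -4);

# Extract a word from a sentence by position.
my $sentence = "the lactose gene";
my $word = substr($sentence, 4, 7);

print "$codon\n";      # AGG
print "$last_four\n";  # TTTT
print "$word\n";       # lactose
```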

  6. Perl functions for determining the ORFs of DNA sequences
     • The unpack function: a built-in Perl function that extracts sets of characters from a string, according to a template, and returns them as a list that can be assigned to an array.
     • It can therefore be used to extract groups of 3 bases (codons) from a DNA sequence, e.g. when examining open reading frames, and assign each group to an element of an array:
     • my @triplets = unpack("a3" x (length($line)/3), $line);   # "a3" repeated once per complete codon
     • To determine all possible open reading frames (ORFs) of a DNA sequence, it must be read in all three reading frames (1, 2 and 3): shift one base to the right to go from reading frame 1 to reading frame 2, and one more base to go from reading frame 2 to reading frame 3.
     • Frame shift (1 position to the right):
     • my @triplets = unpack('a1' . "a3" x (length($line)/3), $line);
     • Remember: with the frame-shift template the first element holds the skipped base, and the last element can be a partial codon of only 2 bases; unpack still assigns these to elements of the array. If you look the codons up in a hash table, you may need the exists function to check that a key is present.
     • See Unpack_codons.pl (run it to show the output)
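The sketch below contrasts the slide's template with an alternative that uses an unpack group template, (a3)*, which repeats while input remains (Perl 5.8 and later), plus the x code, which skips bytes, to start each reading frame. The 8-base sequence and the small codon table are hypothetical; the table is deliberately incomplete so that exists has something to guard against.

```perl
use strict;
use warnings;

# Hypothetical 8-base sequence, chosen so that the frames end in partial codons.
my $line = "ATGGCCTA";

# Slide approach, frame 1: "a3" x N builds "a3a3..."; 8/3 is truncated to 2.
my @frame1_fixed = unpack("a3" x (length($line) / 3), $line);

# Alternative: skip $shift bases with "x", then repeat "a3" while input remains.
my %frames;
for my $shift (0 .. 2) {
    my $template = ($shift ? "x$shift " : "") . "(a3)*";
    $frames{$shift + 1} = [ unpack($template, $line) ];
}

# Small, incomplete codon table (hypothetical subset of the genetic code).
my %codon_table = (ATG => 'M', GCC => 'A', TGG => 'W',
                   CCT => 'P', GGC => 'G', CTA => 'L');

# exists skips codons missing from the table (such as the partial codon "TA")
# instead of joining undef values, which would trigger warnings.
my %protein;
for my $f (1 .. 3) {
    my @codons = @{ $frames{$f} };
    $protein{$f} = join '', map { exists $codon_table{$_} ? $codon_table{$_} : '' } @codons;
    print "frame $f: @codons -> $protein{$f}\n";
}
```

The group template avoids computing the repeat count by hand and keeps a trailing partial codon, while the slide's template for frame 1 silently drops the leftover bases ("TA" here).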

  7. Sample Exercise
     • Write a script that reads in the contents of a FASTA file (skipping the descriptor line) and prints it out as a single string containing all the DNA bases / amino acids.
     • Modify the unpack example to use substr instead of unpack.
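One possible sketch of the exercise, not a model answer. The FASTA content is hypothetical and is read from an in-memory filehandle so the example runs as-is (a real script would open a file, e.g. the one named in $ARGV[0]); codons_by_substr is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical FASTA content; a real script would open(my $fh, '<', $ARGV[0]).
my $fasta = ">seq1 hypothetical example\nATGGCC\nTTAGGC\nCCA\n";
open(my $fh, '<', \$fasta) or die "cannot open FASTA data: $!";

my $seq = "";
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^>/;   # skip the descriptor line
    $seq .= $line;           # concatenate the sequence lines
}
close $fh;
print "$seq\n";

# Hypothetical helper: substr instead of unpack, stepping 3 characters at a time.
# $frame is 0, 1 or 2 for reading frame 1, 2 or 3.
sub codons_by_substr {
    my ($dna, $frame) = @_;
    my @codons;
    for (my $i = $frame; $i < length($dna); $i += 3) {
        push @codons, substr($dna, $i, 3);   # the last codon may have only 1 or 2 bases
    }
    return @codons;
}

for my $frame (0 .. 2) {
    my @codons = codons_by_substr($seq, $frame);
    print "frame ", $frame + 1, ": @codons\n";
}
```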
