
Dot Plot


nora-franco

Presentation Transcript


  1. Dot Plot

  2. Dot Plot

  3. Goal • We will take two nucleotide base strings and look for common patterns – stretches where the bases match. • GAATTCATACCAGATCACCGAAAACTGTCCTCCAAATGTGTCCCCCTCACACTCCCAAAT • TCGCGGGCTTCTGCTCTTAGACCACTCTACCCTATTCCCCACACTCACCGGAGCCAAAGC

  4. Start by entering the two sequences in question in Excel

  5. Use the LEN Function to determine the length of the string

  6. Set up a grid – mine was 60-by-60 since the lengths were 60

  7. Enter the length of match one is seeking – start with 1

  8. Enter the formula to look for matches

  9. Anatomy of the formula (Part 1) • =IF(MID($B$1,E$3,$B$4)=MID($B$2,$D4,$B$4),1,0) • Recall that MID takes a string as its first argument: $B$1 is the first base sequence and $B$2 is the second base sequence. • MID then extracts part of that string, beginning at the position given by the second argument.

  10. Anatomy of the formula (Part 2) • =IF(MID($B$1,E$3,$B$4)=MID($B$2,$D4,$B$4),1,0) • The starting point varies. • E$3 stays in the third row as the formula is copied and uses the various numbers 1 through 60 set up in row 3. • $D4 stays in column D and uses the various numbers 1 through 60 set up in column D.

  11. Anatomy of the formula (Part 3) • The third argument, $B$4, is the length of the match we seek. Both MID calls use the same length. • If the two “substrings” (base mini sequences) match, the formula outputs 1; otherwise it outputs 0. • Then copy the formula throughout the grid.
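The same comparison can be sketched outside Excel. The Python below is a minimal sketch, not the spreadsheet itself: the function name `dot_plot` and the parameter `k` (standing in for the match length in $B$4) are assumptions made for illustration. Python slices are 0-based, whereas MID counts from 1, and like MID a slice near the end of the string simply comes back shorter.

```python
def dot_plot(seq1, seq2, k=1):
    """Return a grid of 0/1 values: rows follow seq2, columns follow seq1."""
    grid = []
    for r in range(len(seq2)):        # plays the role of $D4 (start in sequence 2)
        row = []
        for c in range(len(seq1)):    # plays the role of E$3 (start in sequence 1)
            # Compare two substrings of length k, as the IF/MID formula does.
            same = seq1[c:c + k] == seq2[r:r + k]
            row.append(1 if same else 0)
        grid.append(row)
    return grid


seq1 = "GAATTCATACCAGATCACCGAAAACTGTCCTCCAAATGTGTCCCCCTCACACTCCCAAAT"
seq2 = "TCGCGGGCTTCTGCTCTTAGACCACTCTACCCTATTCCCCACACTCACCGGAGCCAAAGC"
grid = dot_plot(seq1, seq2, k=1)
```

Each inner list corresponds to one row of the spreadsheet grid, so `grid[r][c]` holds what the copied formula would show for start position `c + 1` in the first sequence and `r + 1` in the second.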

  12. With formula copied

  13. Next add some conditional formatting rules

  14. Result of Conditional Formatting
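As a rough text-only stand-in for the colored cells, the grid can be drawn with one character per cell. This is a sketch under assumptions: the function name `render` and the choice of `#` for a match and `.` for a non-match are illustrative, not part of the slides.

```python
def render(seq1, seq2, k=1, dot="#", blank="."):
    """Draw the dot plot as text: one line per position in seq2."""
    lines = []
    for r in range(len(seq2)):
        line = "".join(
            dot if seq1[c:c + k] == seq2[r:r + k] else blank
            for c in range(len(seq1))
        )
        lines.append(line)
    return "\n".join(lines)


print(render("GATTACA", "TTAC"))
```

In the short example, the run of `#` characters stepping down and to the right is the kind of diagonal the colored cells make visible in the spreadsheet.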

  15. What are we looking for? • In dot plots, one looks for dots (for us, colored cells) along diagonals. • A “long” diagonal means that a stretch of consecutive bases in one sequence matches a stretch in the other.

  16. Change the length to eliminate some of the “noise”

  17. Increasing the length of the substring match

  18. Question • What is the longest match between these two sequences?
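One way to check an answer found by eye is to follow every diagonal and record the longest unbroken run of matches. The sketch below does that with a standard dynamic-programming pass; it is a swapped-in technique for illustration, not the spreadsheet method, and the name `longest_match` is an assumption.

```python
def longest_match(seq1, seq2):
    """Return (length, start1, start2) of the longest common run, 0-based starts."""
    best_len, best_i, best_j = 0, 0, 0
    prev = [0] * (len(seq2) + 1)
    for i in range(1, len(seq1) + 1):
        curr = [0] * (len(seq2) + 1)
        for j in range(1, len(seq2) + 1):
            if seq1[i - 1] == seq2[j - 1]:
                # Extend the run that ended one step up the same diagonal.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_i = i - curr[j]
                    best_j = j - curr[j]
        prev = curr
    return best_len, best_i, best_j


length, start1, start2 = longest_match("GATTACA", "TTAC")
print(length, "GATTACA"[start1:start1 + length])
```

Passing the two 60-base sequences from the Goal slide in place of the short strings would give a value to compare against the longest diagonal seen in the grid.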

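  19. Problem • We are looking for diagonal matches; however, increasing the length of the match allows only one of the two diagonal types to survive. Both substrings are read left to right, so a match that runs in the opposite direction in one sequence no longer registers.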
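A tiny example makes the effect concrete. The sketch below counts dots for a string compared against its own reversal; the name `count_dots` is an assumption, and unlike MID it only counts full-length windows, so the shortened substrings at the end of each string are left out.

```python
def count_dots(seq1, seq2, k):
    """Count grid cells where the length-k substrings match (full windows only)."""
    n1 = len(seq1) - k + 1
    n2 = len(seq2) - k + 1
    return sum(
        seq1[c:c + k] == seq2[r:r + k]
        for r in range(n2)
        for c in range(n1)
    )


forward = "ACGT"
backward = forward[::-1]                  # "TGCA"

print(count_dots(forward, backward, 1))   # 4: one dot per base, on the anti-diagonal
print(count_dots(forward, backward, 2))   # 0: no two-base window reads the same
print(count_dots(forward, forward, 2))    # 3: the main diagonal survives
```

With a match length of 1 the reversed copy still shows a full diagonal running the other way; at length 2 those dots vanish, while a same-direction match keeps its diagonal.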

  20. New Sheet: Enter one string and also make a column of descending numbers

  21. Enter formula that takes one letter at designated position

  22. Use the concatenate formula to create the reversed string

  23. Use Copy/Paste Special/Values to enter reversed string
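The reversal steps in slides 20 to 23 can be sketched in a few lines. This is a minimal sketch: the function name `reverse_by_positions` is an assumption, and the single-letter lookup is presumed to behave like MID with a length of 1.

```python
def reverse_by_positions(seq):
    """Rebuild seq backwards from a descending list of 1-based positions."""
    positions = range(len(seq), 0, -1)           # the column of descending numbers
    letters = [seq[p - 1:p] for p in positions]  # one letter per position
    return "".join(letters)                      # the concatenate step


print(reverse_by_positions("GAATTC"))  # CTTAAG
```

The result is an ordinary string, which corresponds to pasting values rather than formulas before running the matching analysis again.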

  24. Repeat the analysis looking for matches between one original and one reversed string

  25. Question • What is the longest match between one of the original sequences and one of the reversed sequences?
