
Suffix Trees and Suffix Arrays


bishop


Presentation Transcript


  1. Suffix Trees and Suffix Arrays

  2. OUTLINE • Suffix trees • Suffix arrays

  3. Suffix trees • Indexing techniques are used to locate highest-scoring alignments. • One method of indexing uses the suffix tree. • A suffix is a substring that starts at some position of the string and runs to its end.

  4. Suffix trees • Problems: • Given a pattern P (a substring), find all occurrences of P in a text S. • Given two strings, find their longest common substring.

  5. Suffix trees • Problems in Bioinformatics: • Multiple genome alignment • Identification of sequence repeats

  6. Suffix trees • Suffix tree: • For example: • S = abdfrg (length 6) • S has 6 suffixes: g, rg, frg, dfrg, bdfrg, abdfrg
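A minimal sketch of the suffix enumeration on this slide, assuming Python 3; the helper name `suffixes` is hypothetical. The suffix starting at 0-based position i is simply `s[i:]`, and the loop walks i from the end so the output follows the slide's shortest-first order.

```python
def suffixes(s):
    """Return all suffixes of s, shortest first (the order used on the slide)."""
    # s[i:] is the suffix that starts at 0-based position i.
    return [s[i:] for i in range(len(s) - 1, -1, -1)]

print(suffixes("abdfrg"))
# ['g', 'rg', 'frg', 'dfrg', 'bdfrg', 'abdfrg']
```

A string of length n always has exactly n non-empty suffixes, which is why the suffix tree on the next slides has n leaves.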

  7. Suffix trees • The suffixes can be stored in a suffix tree, and this tree can be constructed in O(n) time for a fixed alphabet (n: length of the string) • A pattern of length m can then be searched for in O(m) time

  8. Suffix trees • Suffix tree: • S = S[1…n] is a string of length n • A suffix tree of S is a rooted tree with exactly n leaves • The n leaves represent the n suffixes of the string • Example: ababc$

  9. Suffix trees • If a suffix is a prefix of another suffix, we cannot construct a tree in which every suffix ends at a leaf • In xabxa, the suffixes xa and a are prefixes of xabxa and abxa, so they end inside the tree and are not leaf nodes.

  10. Suffix trees • Append a special character (for example $) that does not occur anywhere else in the string to solve the problem • xabxa$
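The problem and its fix can be checked mechanically. This is a small sketch, assuming Python 3; the function name `suffixes_hidden_as_prefixes` is hypothetical. It lists every suffix that is also the prefix of a longer suffix, that is, every suffix that would end in the middle of a path instead of at a leaf.

```python
def suffixes_hidden_as_prefixes(s):
    """Return the suffixes of s that are a proper prefix of some other suffix.

    Such suffixes end inside the tree, so they cannot be leaves.
    """
    sufs = [s[i:] for i in range(len(s))]
    return [a for a in sufs
            if any(b != a and b.startswith(a) for b in sufs)]

print(suffixes_hidden_as_prefixes("xabxa"))   # ['xa', 'a']
print(suffixes_hidden_as_prefixes("xabxa$"))  # []
```

With the terminator appended, every suffix ends in the unique character $, so no suffix can be a prefix of a different one.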

  11. Suffix trees • How to construct suffix tree: • Assume we have a string S[1…n] • Start with the first suffix, S[1…n] (the whole string) • For example, consider vbacxad$

  12. Suffix trees • How to construct suffix tree (cont.): • Enter the next suffix S[2…n] • Which is bacxad$

  13. Suffix trees • How to construct suffix tree (cont.): • Enter the next suffix S[3…n] • Which is acxad$

  14. Suffix trees • How to construct suffix tree (cont.): • Enter the next suffix S[4…n] • Which is cxad$

  15. Suffix trees • How to construct suffix tree (cont.): • Enter the next suffix S[5…n] • Which is xad$

  16-18. Suffix trees • How to construct suffix tree (cont.): • Enter the next suffix S[6…n] • Which is ad$. Its first character a matches the edge leading to the leaf for acxad$, but the next characters differ (d vs. c). So split that edge after a: the new internal node gets one child edge cxad$ (the old leaf) and one child edge d$ (the new leaf)

  19. Suffix trees • How to construct suffix tree (cont.): • Enter the next suffix S[7…n] • Which is d$

  20. Suffix trees • How to construct suffix tree (cont.): • Enter the next suffix S[8…n] • Which is $

  21. Suffix trees • Suffix tree of vbacxad$:
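The insertion steps on slides 11-20 can be sketched in Python as a naive construction: insert S[1…n], S[2…n], … one at a time, walk down from the root, and split an edge when the new suffix diverges inside it. This is an illustration under stated assumptions, not the O(n) algorithm mentioned on slide 7: it takes O(n²) time, stores edge labels as substrings rather than index pairs, and the names `Node`, `build_suffix_tree` and `show` are hypothetical. It assumes the string ends with a unique terminator such as $.

```python
class Node:
    def __init__(self, start=None):
        # children: first character of an edge label -> [edge label, child node]
        self.children = {}
        # start: 1-based start position of the suffix (set on leaves only)
        self.start = start

def build_suffix_tree(s):
    """Naive O(n^2) construction by inserting the suffixes one at a time."""
    assert s and s.count(s[-1]) == 1, "s must end with a unique terminator"
    root = Node()
    for i in range(len(s)):
        node, rest = root, s[i:]
        while True:
            c = rest[0]
            if c not in node.children:
                # No edge starts with this character: hang a new leaf here.
                node.children[c] = [rest, Node(start=i + 1)]
                break
            label, child = node.children[c]
            # Length of the common prefix of the edge label and the suffix.
            k = 0
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                # Whole edge matched: continue below the child node.
                node, rest = child, rest[k:]
                continue
            # Mismatch inside the edge: split it with a new internal node.
            mid = Node()
            mid.children[label[k]] = [label[k:], child]
            mid.children[rest[k]] = [rest[k:], Node(start=i + 1)]
            node.children[c] = [label[:k], mid]
            break
    return root

def show(node, depth=0):
    """Print one edge per line, indented by depth; leaves show their suffix."""
    for c in sorted(node.children):
        label, child = node.children[c]
        leaf = f"  (suffix {child.start})" if child.start is not None else ""
        print("  " * depth + label + leaf)
        show(child, depth + 1)

show(build_suffix_tree("vbacxad$"))
```

For vbacxad$ the root gets seven children. The only internal node is the one created by the split on slides 16-18: its incoming edge is labeled a, and its children are cxad$ (suffix 3) and d$ (suffix 6). Every other suffix hangs directly from the root as a leaf.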

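  22. Suffix trees • Pattern matching using suffix trees: • Try to match the pattern along a path, starting from the root. There are three outcomes: • The pattern does not match: P does not occur in S. • The match ends at a node u of the tree: every leaf below u marks an occurrence of P. • The match ends inside an edge: every leaf below that edge marks an occurrence of P.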

  23. Suffix trees • Example: (consider vbacxad$) • Suffixes: • vbacxad$ • bacxad$ • acxad$ • cxad$ • xad$ • ad$ • d$ • $

  24. Suffix trees • Example: (consider vbacxad$) • Suffixes: • vbacxad$ • bacxad$ • acxad$ • cxad$ • xad$ • ad$ • d$ • $ • Search for: • cxa • a • xdb
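A sketch of the three-case search from slide 22, run on the three queries above. It assumes Python 3 and repeats a compact version of the naive construction so it runs on its own; `Node`, `build_suffix_tree`, `leaves` and `find` are hypothetical names. The walk compares each pattern character at most once, which is where the O(m) search bound comes from. Collecting the leaves then adds time proportional to the number of occurrences.

```python
class Node:
    def __init__(self, start=None):
        self.children = {}   # first char of edge label -> [edge label, child]
        self.start = start   # 1-based suffix start, set on leaves only

def build_suffix_tree(s):
    """Naive O(n^2) suffix insertion; s must end with a unique terminator."""
    root = Node()
    for i in range(len(s)):
        node, rest = root, s[i:]
        while True:
            if rest[0] not in node.children:
                node.children[rest[0]] = [rest, Node(i + 1)]
                break
            label, child = node.children[rest[0]]
            k = 0
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                node, rest = child, rest[k:]
                continue
            mid = Node()  # split the edge at the first mismatch
            mid.children[label[k]] = [label[k:], child]
            mid.children[rest[k]] = [rest[k:], Node(i + 1)]
            node.children[rest[0]] = [label[:k], mid]
            break
    return root

def leaves(node):
    """Start positions of all suffixes (leaves) below node."""
    if node.start is not None:
        return [node.start]
    return [p for _, child in node.children.values() for p in leaves(child)]

def find(root, p):
    """Return the sorted 1-based positions where p occurs."""
    node, j = root, 0                      # j = pattern characters matched
    while j < len(p):
        edge = node.children.get(p[j])
        if edge is None:
            return []                      # case 1: the pattern does not match
        label, child = edge
        k = 0
        while k < len(label) and j + k < len(p) and label[k] == p[j + k]:
            k += 1
        if k < len(label) and j + k < len(p):
            return []                      # case 1: mismatch inside an edge
        # k == len(label): the match reaches node `child` (case 2);
        # otherwise the pattern ran out inside this edge (case 3).
        node, j = child, j + k
    return sorted(leaves(node))

tree = build_suffix_tree("vbacxad$")
for p in ["cxa", "a", "xdb"]:
    print(p, find(tree, p))
```

With this sketch, cxa ends inside the edge cxad$ and is found at position 4. The pattern a ends exactly at the internal node created by the split, so both leaves below it are reported (positions 3 and 6). xdb diverges inside the edge xad$ and has no occurrence.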

  25. Suffix arrays • Consider the string: • The suffix array (the starting positions of all suffixes, listed in lexicographic order of the suffixes):
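The string on this slide did not survive extraction. The sketch below assumes it is mississippi$, the string used on the next slide. It builds the suffix array naively by sorting the 1-based start positions by the suffix each one starts. This is a simple O(n² log n) illustration, not a linear-time construction, and the name `suffix_array` is hypothetical.

```python
def suffix_array(s):
    """Naive construction: sort 1-based start positions by their suffixes."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])

s = "mississippi$"
sa = suffix_array(s)
print(sa)  # [12, 11, 8, 5, 2, 1, 10, 9, 7, 4, 6, 3]
for i in sa:
    print(i, s[i - 1:])
```

The $ suffix comes first because Python compares strings by code point and $ sorts before every lowercase letter. Unlike the suffix tree, the array stores only n integers, and it still supports pattern search because all suffixes that share a prefix sit next to each other.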

  26. Suffix arrays • Search for the pattern is in mississippi$:
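Because the suffixes beginning with a pattern P form one contiguous block of the suffix array, two binary searches are enough to find that block. This is a minimal sketch assuming Python 3. `suffix_array` and `search` are hypothetical names, and the naive construction is repeated so the block runs on its own. Each comparison looks at up to m characters, so the search costs O(m log n).

```python
def suffix_array(s):
    """Naive construction: sort 1-based start positions by their suffixes."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])

def search(s, sa, p):
    """Return the sorted 1-based positions where p occurs in s."""
    m = len(p)

    def head(idx):
        # First m characters of the suffix stored at sa[idx].
        start = sa[idx] - 1
        return s[start:start + m]

    # Lower bound: first suffix whose first m characters are >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if head(mid) < p:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose first m characters are > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if head(mid) <= p:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])

s = "mississippi$"
sa = suffix_array(s)
print(search(s, sa, "is"))  # [2, 5]
```

Truncating sorted suffixes to their first m characters keeps them in sorted order, which is what makes the two binary searches valid. For is, the block consists of the suffixes issippi$ (position 5) and ississippi$ (position 2).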

