
Genome

Genome. Nucleus. Tissue. Cell. The chromosomes contain the set of instructions for living beings. The chromosomes are the volumes of an encyclopedia called the Genome. Chromosome. >human chromosome

rforan

Presentation Transcript


  1. Genome: Nucleus, Tissue, Cell • The chromosomes contain the set of instructions for living beings • The chromosomes are the volumes of an encyclopedia called the Genome

  2. Chromosome >human chromosome TACGTATACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACG
ATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGATCGTCGATCGTCAGCTCGATACGTTACGATCTACGATTACGATCATCTATACTATACTATACGATATATCTAGATATCGATCTA.ACTCCATTCTTTAAACCGTACTACACACACTACTGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAACTACATCATAGACAACATCACGGATCGTCTAAGGCCGAGTTAGGTACGATTAACGTACGACTACCTATCGTATATACATCACGGATATAACCTATCTACTACGATTAACACGATCTATCGTACGGCATATGCATCGTATAGCATCGATTAGAATACGTATACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTAC
GTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGATGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGT
ACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTGTCACGTAGCATGCTGACGTACGATCGATTCGATCGATCGTACGATCGTAGCTAGCTAGTCGTAGCGACGTAGGATTCACGTAGCGATGCGTAGCGTAGCATGCTGACGATGCATCGATCGATGCATCATGCTAGCGTAGCTAGCTAGCATGACTGATCGATTAACGGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGATCGTATGCTAGCTAGCATGCATGCATGCATGCAT ………..

  3. Information retrieval • Bioinformatics: Sequence and Genome Analysis, David W. Mount • Flexible Pattern Matching in Strings (2002), Gonzalo Navarro and Mathieu Raffinot • Algorithms on Strings (2001), M. Crochemore, C. Hancart and T. Lecroq • http://www-igm.univ-mlv.fr/~lecroq/string/index.html

  4. String Matching String matching: the definition of the problem (text, pattern) depends on what is known in advance: the text or the patterns • Exact matching: • The patterns ---> Data structures for the patterns • 1 pattern ---> The algorithm depends on |p| and |Σ| • k patterns ---> The algorithm depends on k, |p| and |Σ| • Extensions • Regular Expressions • The text ---> Data structure for the text (suffix tree, ...) • Approximate matching: • Dynamic programming • Sequence alignment (pairwise and multiple) • Sequence assembly: hash algorithm • Probabilistic search: Hidden Markov Models

  5. Exact string matching: one pattern How do string matching algorithms perform the search? For instance, given the sequence CTACTACTACGTCTATACTGATCGTAGCTACTACATGC, search for the pattern ACTGA, and for the pattern TACTACGGTATGACTAA

  6. Exact string matching: Brute force algorithm Example: Given the pattern ATGTA, the search over the text G T A C T A G A G G A C G T A T G T A C T G ... places the window A T G T A at each successive text position (six alignments shown)

  7. Exact string matching: Brute force algorithm • How is the comparison made? From left to right: prefix search • What is the next position of the window? The window is shifted by only one cell
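
The two answers on this slide (left-to-right comparison, shift of one cell) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `brute_force_search` is hypothetical, and positions are 0-based indices into the text, which is an assumption not made by the slides.

```python
def brute_force_search(text, pattern):
    """Return every 0-based start index where pattern occurs in text."""
    n, m = len(text), len(pattern)
    hits = []
    for pos in range(n - m + 1):      # the window is shifted one cell at a time
        k = 0
        # compare window and pattern from left to right
        while k < m and text[pos + k] == pattern[k]:
            k += 1
        if k == m:                    # all m characters matched
            hits.append(pos)
    return hits


# Text and pattern taken from the brute force example slide.
text = "GTACTAGAGGACGTATGTACTG"
print(brute_force_search(text, "ATGTA"))  # [14]
```

Every text position is tried as a window start, so the number of windows does not depend on the pattern; the algorithms on the following slides differ precisely in how far they shift.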

  8. Exact string matching: one pattern How do matching algorithms perform the search? A window slides along the text and the pattern is compared against it: at each step the comparison is made and the window is shifted to the right. What differentiates the algorithms? • How the comparison is made. • The length of the shift.

  9. Exact string matching: one pattern (text on-line) Experimental efficiency (Navarro & Raffinot) [Figure: regions labelled Horspool, BOM and BNDM plotted against alphabet size |Σ| (axis from 2 to 64) and pattern length (axis from 2 to 256, with a mark at w)] BNDM: Backward Nondeterministic Dawg Matching • BOM: Backward Oracle Matching

  10. Horspool algorithm • How is the comparison made? Suffix search: the window is compared with the pattern from right to left • What is the next position of the window? Let “a” be the text character under the last position of the window: shift until the next occurrence of “a” in the pattern is aligned with it. We need a preprocessing phase to construct the shift table.

  11. Horspool algorithm: example Given the pattern ATGTA • The shift table is: A, C, G, T (entries still to be filled in)

  12. Horspool algorithm: example Given the pattern ATGTA • The shift table is: A = 4 (C, G, T still to be filled in)

  13. Horspool algorithm: example Given the pattern ATGTA • The shift table is: A = 4, C = 5 (G, T still to be filled in)

  14. Horspool algorithm: example Given the pattern ATGTA • The shift table is: A = 4, C = 5, G = 2 (T still to be filled in)

  15. Horspool algorithm: example Given the pattern ATGTA • The shift table is: A = 4, C = 5, G = 2, T = 1
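
The preprocessing phase that produces this table can be sketched as follows. The function name `horspool_shift_table` and the default alphabet `"ACGT"` are assumptions made for illustration; the rule itself (distance from the rightmost occurrence of each character, excluding the last pattern position, to the end of the pattern, and the pattern length for characters that do not occur) reproduces the values on the slides.

```python
def horspool_shift_table(pattern, alphabet="ACGT"):
    """Build the Horspool shift table for pattern over the given alphabet."""
    m = len(pattern)
    # Characters that do not occur in pattern[:-1] shift the window by m.
    table = {c: m for c in alphabet}
    # Later (more rightward) occurrences overwrite earlier ones.
    for i, c in enumerate(pattern[:-1]):
        table[c] = m - 1 - i
    return table


print(horspool_shift_table("ATGTA"))  # {'A': 4, 'C': 5, 'G': 2, 'T': 1}
```

The last pattern character is deliberately skipped: if it were included, a character equal to it would get shift 0 and the window would never move.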

  16. Horspool algorithm: example Given the pattern ATGTA • The shift table is: A = 4, C = 5, G = 2, T = 1 • The searching phase: on the text G T A C T A G A G G A C G T A T G T A C T G ... the window A T G T A is shown at six successive positions

  17. Horspool algorithm: example Given the pattern ATGTA • The shift table is: A = 4, C = 5, G = 2, T = 1 • The searching phase: on the text G T A C T A G A G G A C G T A T G T A C T G ... the window A T G T A is shown at seven successive positions
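
The searching phase can be traced with a short Python sketch. The function name `horspool_search`, the 0-based positions and the returned list of visited window positions are assumptions added for illustration. The text is truncated where the slide shows "...", so the sketch stops after the sixth window.

```python
def horspool_search(text, pattern, alphabet="ACGT"):
    """Return (hits, windows): match positions and every window start visited."""
    m = len(pattern)
    # Preprocessing: shift table as on the previous slides.
    shift = {c: m for c in alphabet}
    for i, c in enumerate(pattern[:-1]):
        shift[c] = m - 1 - i

    hits, windows = [], []
    pos = 0
    while pos <= len(text) - m:
        windows.append(pos)
        # Suffix search: compare from the last pattern character backwards.
        k = m - 1
        while k >= 0 and text[pos + k] == pattern[k]:
            k -= 1
        if k < 0:
            hits.append(pos)
        # Shift according to the text character under the window's last cell.
        pos += shift.get(text[pos + m - 1], m)
    return hits, windows


hits, windows = horspool_search("GTACTAGAGGACGTATGTACTG", "ATGTA")
print(windows)  # [0, 1, 5, 7, 12, 14]
print(hits)     # [14]
```

On this text the window visits six positions instead of the eighteen a one-cell shift would need, and the occurrence is found in the sixth window.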

  18. Questions about the Horspool algorithm Given the pattern ATGTA, the shift table is A = 4, C = 5, G = 2, T = 1. Given a random text over an equally likely probability distribution (EPD): 1.- Determine the expected shift of the window. What if the probability distribution is not equally likely? 2.- Determine the expected number of shifts assuming a text of length n. 3.- Determine the expected number of comparisons in the suffix search phase.
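
One possible way to approach the first question is sketched below. It assumes that the text characters are independent, so the character under the last window cell follows the given distribution and the expected shift is the probability-weighted sum of the table entries. The function name `expected_shift` and the skewed distribution are hypothetical, chosen only for illustration.

```python
def expected_shift(shift_table, probabilities):
    """Expected window shift: sum over characters of P(c) * shift(c)."""
    return sum(probabilities[c] * shift_table[c] for c in shift_table)


table = {"A": 4, "C": 5, "G": 2, "T": 1}

# Equally likely distribution: every character has probability 1/4.
uniform = {c: 0.25 for c in table}
print(expected_shift(table, uniform))  # 3.0

# A made-up non-uniform distribution, to show the general formula.
skewed = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
print(expected_shift(table, skewed))   # 3.625
```

Under the same assumptions, an expected shift of 3 suggests roughly n / 3 window positions for a text of length n, which is a starting point for the second question rather than a full answer.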

  19. Exact string matching: one pattern (text on-line) Experimental efficiency (Navarro & Raffinot) [Figure: regions labelled Horspool, BOM and BNDM plotted against alphabet size |Σ| (axis from 2 to 64) and pattern length (axis from 2 to 256, with a mark at w)] BNDM: Backward Nondeterministic Dawg Matching • BOM: Backward Oracle Matching

  20. BNDM algorithm • How is the comparison made? Search for suffixes of the text window T that are factors of the pattern P. The current state is kept as a bit vector, for instance D2 = (1 0 0 0 1 0 0). Once the next character x is read, D3 = (D2 << 1) & B(x), where B(x) is the mask of x in the pattern P. For instance, if B(x) = (0 0 1 1 0 0 0), then D3 = (0 0 0 1 0 0 0) & (0 0 1 1 0 0 0) = (0 0 0 1 0 0 0) • What is the next position of the window? It depends on the value of the leftmost bit of D
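
The update rule on this slide can be checked with Python integers used as bit vectors. This is a sketch under one assumption: the vector is stored in an integer of m bits whose most significant bit is the leftmost bit on the slide, so the left shift has to be truncated to m bits. The function name `bndm_step` is hypothetical.

```python
def bndm_step(d, mask, m):
    """One BNDM update: shift the m-bit state left and AND it with the mask B(x)."""
    width = (1 << m) - 1            # m ones: keeps the state within m bits
    return (d << 1) & width & mask


d2 = 0b1000100                      # D2 from the slide
b_x = 0b0011000                     # B(x) from the slide
d3 = bndm_step(d2, b_x, 7)
print(format(d3, "07b"))            # 0001000
```

The leftmost 1 of D2 falls off during the shift, which is why the intermediate value on the slide is (0 0 0 1 0 0 0).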

  21. BNDM algorithm: example Given the pattern ATGTA • The character masks are: B(A) = ( 1 0 0 0 1 ), B(C) = ( 0 0 0 0 0 ), B(G) = ( 0 0 1 0 0 ), B(T) = ( 0 1 0 1 0 ) • The searching phase on the text G T A C T A G A G G A C G T A T G T A C T G ... (the window A T G T A is shown at four positions): First window: D1 = ( 0 1 0 1 0 ), D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 ). Second window: D1 = ( 0 0 1 0 0 ), D2 = ( 0 1 0 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 0 0 0 ). Third window: D1 = ( 1 0 0 0 1 ), D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 ), D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 ), D4 = ( 0 1 0 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )

  22. BNDM algorithm: example Given the pattern ATGTA • The character masks are: B(A) = ( 1 0 0 0 1 ), B(C) = ( 0 0 0 0 0 ), B(G) = ( 0 0 1 0 0 ), B(T) = ( 0 1 0 1 0 ) • The searching phase on the text G T A C T A G A G G A C G T A T G T A C T G ... (the window A T G T A is shown at two positions): D1 = ( 1 0 0 0 1 ), D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 ), D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 ), D4 = ( 0 1 0 0 0 ) & ( 0 1 0 1 0 ) = ( 0 1 0 0 0 ), D5 = ( 1 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 ), D6 = ( 0 0 0 0 0 ) & ( * * * * * ) = ( 0 0 0 0 0 ). Found!

  23. BNDM algorithm: example Given the pattern ATGTA • The character masks are: B(A) = ( 1 0 0 0 1 ), B(C) = ( 0 0 0 0 0 ), B(G) = ( 0 0 1 0 0 ), B(T) = ( 0 1 0 1 0 ) • The searching phase on the text G T A C T A G A A T A C G T A T G T A C T G ... (the window A T G T A is shown at three positions). How is the shift determined? First window: D1 = ( 0 1 0 1 0 ), D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 ). Second window: D1 = ( 0 1 0 1 0 ), D2 = ( 1 0 1 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 ), D3 = ( 0 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 0 0 0 0 0 )
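
The BNDM examples can be reproduced with the following Python sketch. It follows the textbook formulation of the algorithm rather than any code from the slides; the function name `bndm_search`, the 0-based positions and the returned list of visited windows are assumptions for illustration. As on the slides, the leftmost bit of each vector corresponds to the first pattern position, and the shift is given by the last step at which the leftmost bit of D was set before the whole window had been read.

```python
def bndm_search(text, pattern):
    """Return (hits, windows): match positions and every window start visited."""
    m = len(pattern)
    high = 1 << (m - 1)               # leftmost bit of the m-bit state
    width = (1 << m) - 1              # m ones
    # B(c): bit i (counted from the left) is set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    hits, windows = [], []
    pos = 0
    while pos <= len(text) - m:
        windows.append(pos)
        j, last, d = m, m, width      # read the window from right to left
        while d != 0:
            d &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if d & high:              # the suffix read so far is a pattern prefix
                if j > 0:
                    last = j          # remember it: it bounds the next shift
                else:
                    hits.append(pos)  # whole window read: occurrence found
            d = (d << 1) & width
        pos += last
    return hits, windows


# Text of the first two BNDM example slides.
print(bndm_search("GTACTAGAGGACGTATGTACTG", "ATGTA"))
# ([14], [0, 5, 10, 14])

# Text of the last example slide: the second window shifts by 3, not 5.
print(bndm_search("GTACTAGAATACGTATGTACTG", "ATGTA")[1][:3])
# [0, 5, 8]
```

In the second text, the leftmost bit of D becomes 1 after two characters of the second window have been read, so a prefix of the pattern may start three cells to the right, and the window moves from position 5 to position 8.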
