
A New Model to Solve the Swap Matching Problem and Efficient Algorithms for Short Patterns

Costas Iliopoulos, M. Sohel Rahman. Classic pattern matching takes as input a string T of length n (the text) and a string P of length m (the pattern), and outputs whether P occurs in T.


Presentation Transcript


  1. A New Model to Solve the Swap Matching Problem and Efficient Algorithms for Short Patterns • Costas Iliopoulos, M. Sohel Rahman • SOFSEM 2008

  2. Classic Pattern Matching • Input: a string T of length n (the text) and a string P of length m (the pattern), both over an alphabet Σ • Output (existence query): whether P occurs in T • Output (computation of the occurrence set): Occ = {i | P = T[i..i + m − 1]}

  3. Example • P = GAC • We have GAC at positions 3 and 12 • Occ = {3, 12}. Occ = {5, 14}.
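
The occurrence set from the classic problem can be computed with a direct scan. The following is a minimal sketch, not the authors' code. The text used on the slide is not part of this transcript, so `T` below is a hypothetical string chosen so that GAC occurs at positions 3 and 12; the function name `occurrences` is also an assumption.

```python
def occurrences(text: str, pattern: str) -> list[int]:
    """Return the 1-based positions i with pattern == text[i..i+m-1]."""
    n, m = len(text), len(pattern)
    return [i + 1 for i in range(n - m + 1) if text[i:i + m] == pattern]


# Hypothetical text (not the one from the slide).
T = "TTGACTTTTTTGAC"
P = "GAC"

occ = occurrences(T, P)
print(occ)        # [3, 12]
print(bool(occ))  # existence query: True
```

This naive scan takes O(nm) time in the worst case; it is only meant to make the definition of Occ concrete.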

  4. Swap Matching • P = ACGCT (positions 1 to 5) • T = AGCTCACGTCCTT (positions 1 to 13)

  5. Swap Matching • P = ACGCT • T = AGCTCACGTCCTT (positions 1 to 13) • Occ = {1, 5, 6} • [Figure: P aligned under T at positions 1, 5 and 6]
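
The occurrence set on this slide can be reproduced with a brute-force checker. This is a sketch under the usual definition of swap matching, which is assumed here: adjacent pattern characters may be exchanged, and each character takes part in at most one swap. It is not the algorithm of the talk, and the function names are illustrative.

```python
def swap_matches_at(text: str, pattern: str, start: int) -> bool:
    """Check whether pattern swap-matches text at 0-based offset start."""
    m = len(pattern)
    j = 0
    while j < m:
        if pattern[j] == text[start + j]:
            j += 1  # character matches in place
        elif (j + 1 < m
              and pattern[j] == text[start + j + 1]
              and pattern[j + 1] == text[start + j]):
            j += 2  # characters j and j+1 of the pattern are swapped
        else:
            return False
    return True


def swap_occurrences(text: str, pattern: str) -> list[int]:
    """Return the 1-based positions where pattern swap-matches text."""
    n, m = len(text), len(pattern)
    return [i + 1 for i in range(n - m + 1)
            if swap_matches_at(text, pattern, i)]


print(swap_occurrences("AGCTCACGTCCTT", "ACGCT"))  # [1, 5, 6]
```

The left-to-right greedy choice is safe: if a character matches in place and a swap would also fit, the two pattern characters are equal, so both choices lead to the same result.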

  6. Motivation • Swap errors are common in typing. • Swaps also occur in gene mutations and duplications.

  7. Existing results • 1998: Amir, Landau, Lewenstein, Lewenstein: O(n log² m) (some very special cases) • 2000: Amir, Aumann, Landau, Lewenstein, Lewenstein: O(n m^(1/3) log m log σ) • 2003: Amir, Cole, Hariharan, Lewenstein, Porat: O(n log m log σ) • All results use FFT • σ = min(m, |Σ|)

  8. Existing results • Some related variants have also been investigated in the literature • Approximate version: Amir, Lewenstein, Porat (2002) • Weighted version: Zhang, Guo, Iliopoulos (2004)

  9. Our Contribution • A new graph-theoretic model • O((m/w) n log m) time, where w is the machine word size • For word-size patterns: O(n log m) • The first efficient non-FFT algorithm for swap matching

  10. The New Model

  11. T-Graph • T = acacbaccbacacba (positions 1 to 15) • [Figure: T-graph of T]

  12. P-Graph • P = acbab (positions 1 to 5) • [Figure: P-graph of P]

  13. P-Graph • P = accab (positions 1 to 5) • [Figure: P-graph of P]

  14. So… • P swap matches T • P-Graph swap matches T-Graph

  15. An Efficient Algorithm

  16. Degenerate strings • Let Σ = {A, C, G, T} • Then we can get 2^4 − 1 = 15 non-empty sets of letters. • At each position of a degenerate string we have one of those sets.

  17. Degenerate strings… • The 15 non-empty subsets of Σ: {A}, {C}, {G}, {T}, {A,C}, {A,G}, {A,T}, {C,G}, {C,T}, {G,T}, {A,C,G}, {A,C,T}, {A,G,T}, {C,G,T}, {A,C,G,T}

  18. Degenerate strings… • [Figure: an example degenerate string X of length 7 over Σ, with more than one letter at some positions]

  19. Degenerate strings: Equality/Match • X[3] =d Y[1]. Why? Because X[3] ∩ Y[1] = A ≠ ∅ • Y =d X[1..3] • Y =d X[3..5] • Y =d X[4..6] • [Figure: degenerate strings X (positions 1 to 7) and Y]
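
Degenerate equality as described above (two positions match when their letter sets intersect) can be sketched as follows. The strings `X` and `Y` are hypothetical, since the example strings of the slide are not recoverable from this transcript; representing a degenerate string as a list of Python sets is also an assumption of the sketch.

```python
def deg_equal(x: list[set[str]], y: list[set[str]]) -> bool:
    """x =d y: same length and a non-empty intersection at every position."""
    return len(x) == len(y) and all(a & b for a, b in zip(x, y))


def deg_occurrences(x: list[set[str]], y: list[set[str]]) -> list[int]:
    """Return the 1-based positions i with y =d x[i..i+len(y)-1]."""
    n, m = len(x), len(y)
    return [i + 1 for i in range(n - m + 1) if deg_equal(x[i:i + m], y)]


# Hypothetical degenerate strings over {A, C, G, T}.
X = [{"T"}, {"C"}, {"A", "T"}, {"C"}, {"A"}]
Y = [{"A"}, {"C", "T"}]

print(bool(X[2] & Y[0]))      # X[3] =d Y[1], since the sets share A: True
print(deg_occurrences(X, Y))  # [3]
```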

  20. P-Graph => Degenerate String • P = acbab • Degenerate string: [ac][abc][abc][ab][ab] • [Figure: P-graph of P and the resulting degenerate string]
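
One way to read this construction is that position i of the degenerate string collects the letters P[i−1], P[i] and P[i+1], that is, every letter that can land on position i after a swap. That reading is an assumption inferred from the example, not a statement from the slides; the sketch below applies it and gives the same sets as the D-Mask slide for P = accab.

```python
def degenerate_from_pattern(pattern: str) -> list[set[str]]:
    """Position i gets the letters of pattern[i-1..i+1] (clipped at the ends)."""
    m = len(pattern)
    return [set(pattern[max(0, i - 1):min(m, i + 2)]) for i in range(m)]


def show(deg: list[set[str]]) -> str:
    """Render a degenerate string as bracketed, sorted letter sets."""
    return "".join("[" + "".join(sorted(s)) + "]" for s in deg)


print(show(degenerate_from_pattern("acbab")))  # [ac][abc][abc][ab][ab]
print(show(degenerate_from_pattern("accab")))  # [ac][ac][ac][abc][ab]
```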

  21. Swap Match vs Deg. Match • P => [ac][abc][abc][ab][ab] • T = bcbaaabcba (positions 1 to 10) • According to degenerate matching: OK! • According to swap matching: NOT OK! • [Figure: the degenerate string aligned with T]
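
The gap between the two notions can be checked by brute force on this text. The sketch assumes P = acbab (the pattern whose degenerate string is shown) and assumes that position i of the degenerate string holds the letters P[i−1], P[i], P[i+1]; both checkers are naive illustrations rather than the algorithm of the talk.

```python
def degenerate_occurrences(text: str, pattern: str) -> list[int]:
    """1-based positions where text matches the degenerate string of pattern."""
    m, n = len(pattern), len(text)
    sets = [set(pattern[max(0, i - 1):min(m, i + 2)]) for i in range(m)]
    return [i + 1 for i in range(n - m + 1)
            if all(text[i + j] in sets[j] for j in range(m))]


def swap_occurrences(text: str, pattern: str) -> list[int]:
    """1-based positions where pattern swap-matches text (disjoint adjacent swaps)."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        j, ok = 0, True
        while j < m and ok:
            if pattern[j] == text[i + j]:
                j += 1
            elif (j + 1 < m and pattern[j] == text[i + j + 1]
                  and pattern[j + 1] == text[i + j]):
                j += 2
            else:
                ok = False
        if ok:
            result.append(i + 1)
    return result


T, P = "bcbaaabcba", "acbab"
DEG = degenerate_occurrences(T, P)
SWAP = swap_occurrences(T, P)
print(DEG)   # [2, 6]
print(SWAP)  # [6]
```

Position 2 (text window cbaaa) passes the degenerate test but is not a swap match, so degenerate matching alone over-reports.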

  22. Why Doesn't It Work? • T = bcbaaabcba (positions 1 to 10) • [Figure: P-graph and degenerate string shown against T]

  23. Forbidden Graph • [Figure: forbidden graph]

  24. Our Algorithm • Shift-Or algorithm • The concept of the forbidden graph

  25. D-Mask • P = accab => degenerate string [ac][ac][ac][abc][ab] • D-mask (X is a character that does not occur in P):
       Position  Set   a  b  c  X
       1         ac    0  1  0  1
       2         ac    0  1  0  1
       3         ac    0  1  0  1
       4         abc   0  0  0  1
       5         ab    0  0  1  1
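
The D-mask values for P = accab can be reproduced with a few lines. This sketch assumes the usual Shift-Or convention (a 0 bit means the letter is allowed at that position) and assumes that position i of the degenerate string holds P[i−1], P[i], P[i+1]; the name `d_masks` and the packing of each column into an integer with bit i−1 for position i are illustrative choices.

```python
def d_masks(pattern: str, alphabet: str) -> dict[str, int]:
    """Map each letter to a bit mask; bit i is 0 iff the letter is allowed at position i+1."""
    m = len(pattern)
    sets = [set(pattern[max(0, i - 1):min(m, i + 2)]) for i in range(m)]
    masks = {}
    for ch in alphabet:
        mask = 0
        for i, allowed in enumerate(sets):
            if ch not in allowed:
                mask |= 1 << i  # 1 marks a mismatch at position i+1
        masks[ch] = mask
    return masks


P = "accab"
masks = d_masks(P, "abcX")
for i in range(len(P)):
    print(i + 1, [(masks[ch] >> i) & 1 for ch in "abcX"])
# 1 [0, 1, 0, 1]
# 2 [0, 1, 0, 1]
# 3 [0, 1, 0, 1]
# 4 [0, 0, 0, 1]
# 5 [0, 0, 1, 1]
```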

  26. F-Mask • P = accab • Columns: (a,a), (a,b), (b,b), (c,c), (c,a), (X,X); rows: pattern positions 1 to 5 • [Table: F-mask bit values] • [Figure: forbidden graph of P]

  27. Computing R matrix • Columns: text positions 0 to 15 (X at position 0, then T = acacbaccbacacba); rows: P = accab, positions 1 to 5 • Column 1 (text character a): Shift-Or with Da and F(X,a) • [Table: R matrix filled in up to this step]

  28. Computing R matrix • Columns: text positions 0 to 15 (X at position 0, then T = acacbaccbacacba); rows: P = accab, positions 1 to 5 • Column 2 (text character c): Shift-Or with Dc and F(a,c) • [Table: R matrix filled in up to this step]

  29. Computing R matrix • Columns: text positions 0 to 15 (X at position 0, then T = acacbaccbacacba); rows: P = accab, positions 1 to 5 • Column 3 (text character a): Shift-Or with Da and F(c,a) • [Table: R matrix filled in up to this step]

  30. Computing R matrix • Columns: text positions 0 to 15 (X at position 0, then T = acacbaccbacacba); rows: P = accab, positions 1 to 5 • Column 5 (text character b): Shift-Or with Db and F(c,b) • [Table: R matrix filled in up to this step]

  31. Computing R matrix • Final matrix (columns: text positions 0 to 15, with X at position 0; rows: pattern positions 1 to 5):
            0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
            X  a  c  a  c  b  a  c  c  b  a  c  a  c  b  a
       1 a  1  0  0  0  0  1  0  0  0  1  0  0  0  0  1  0
       2 c  1  1  0  0  0  1  1  0  1  1  1  0  0  0  1  1
       3 c  1  1  1  0  0  1  1  1  0  1  1  1  0  0  1  1
       4 a  1  1  1  1  0  0  1  1  1  0  1  1  1  0  0  1
       5 b  1  1  1  1  1  0  0  1  1  1  0  1  1  1  0  0
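
The Shift-Or step behind the R matrix can be sketched with the D-masks alone. This sketch deliberately omits the F-mask (forbidden graph) step, whose exact values are not recoverable from this transcript, so it performs degenerate matching only and can over-report on other inputs. It assumes the pattern has at least one character and that position i of the degenerate string holds P[i−1], P[i], P[i+1]. On this example the reported end positions coincide with the zeros in row 5 of the matrix above.

```python
def shift_or_degenerate(text: str, pattern: str) -> list[int]:
    """Return 1-based end positions of degenerate matches, using Shift-Or."""
    m = len(pattern)
    full = (1 << m) - 1
    sets = [set(pattern[max(0, i - 1):min(m, i + 2)]) for i in range(m)]

    def d_mask(ch: str) -> int:
        # Bit i is 1 iff ch is NOT allowed at pattern position i+1.
        return sum(1 << i for i, allowed in enumerate(sets) if ch not in allowed)

    masks = {ch: d_mask(ch) for ch in set(text)}
    state = full  # all ones: no prefix matched yet
    ends = []
    for j, ch in enumerate(text, start=1):
        # Shift in a 0 at bit 0, then OR in the mismatches for this character.
        state = ((state << 1) | masks[ch]) & full
        if ((state >> (m - 1)) & 1) == 0:
            ends.append(j)  # the whole pattern matches, ending at position j
    return ends


print(shift_or_degenerate("acacbaccbacacba", "accab"))  # [5, 6, 10, 14, 15]
```

Each text character costs one shift and one OR when the pattern fits in a machine word, which is where the word-size speed-up of this family of algorithms comes from.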

  32. Running Time • Computing D-masks: O((m/w)(m + |Σ|)) • Computing F-masks: O((m/w) m log m) • Computing R values: O((m/w) n log m) • Total: O((m/w) n log m) • Short patterns (m ~ w): O(n log m)

  33. Future Work • Explore the possibility of using graph pattern matching • Experimental work • A forthcoming paper contains experimental work using biological examples.

  34. The End • Thank you very much
