
RNA



Presentation Transcript


  1. Local Exact Pattern Matching for Non-fixed RNA Structures. Mika Amit, Rolf Backofen, Steffen Heyne, Gad M. Landau, Mathias Mohl, Christina Schmiedl, Sebastian Will

  2. RNA: An RNA R is an ordered pair (S, B) where S is a sequence defined over 𝚺 = {A, C, G, U} and B is a set of base pairs C-G, G-C, A-U, or U-A. [Figure: RNA secondary structure with labels: base pair, single base, backbone connection.] CPM 2012, Helsinki

  3. RNA: An RNA R is an ordered pair (S, B) where S represents the primary structure of R and B represents the secondary structure of R.
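
The definition R = (S, B) maps directly onto a small data structure. The following is a minimal sketch, not code from the talk: the class name `RNA`, the field names `seq` and `pairs`, and the 0-based (i, j) index convention are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Base pairs named on slide 2: C-G, G-C, A-U, U-A.
ALLOWED_PAIRS = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}


@dataclass(frozen=True)
class RNA:
    """Hypothetical container for R = (S, B); names are illustrative only."""
    seq: str          # S: primary structure over {A, C, G, U}
    pairs: frozenset  # B: secondary structure, 0-based (i, j) with i < j

    def __post_init__(self):
        # Reject characters outside the alphabet.
        if set(self.seq) - set("ACGU"):
            raise ValueError("sequence must be over {A, C, G, U}")
        for i, j in self.pairs:
            # Every pair must index into the sequence, left end first.
            if not (0 <= i < j < len(self.seq)):
                raise ValueError(f"bad pair indices: {(i, j)}")
            # Every pair must be one of the four allowed base pairs.
            if (self.seq[i], self.seq[j]) not in ALLOWED_PAIRS:
                raise ValueError(f"non-complementary pair: {(i, j)}")


# A small hairpin: three G-C pairs closing an AAA loop.
r = RNA("GGGAAACCC", frozenset({(0, 8), (1, 7), (2, 6)}))
```

Keeping `pairs` as a set of index tuples leaves the sequence untouched when the structure changes, which is convenient for the non-fixed structures discussed later in the talk.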

  4. RNA Representations: Tree; Arc annotated string. [Figure: an RNA shown as a tree and as an arc-annotated string.]

  5. RNA Secondary Structure • Determines the activity and functionality of the RNA • Usually more preserved during evolution. The secondary structure of RNA is highly researched.

  6. RNA Structure • Predicting the secondary structure of an RNA molecule is a difficult task • The structure is sometimes given in a non-fixed form, where each base pair has a probability ≤ 1 to exist in the RNA

  7. Nested Structure: In all of these examples, the structure of R is Nested: each base can be connected by a bond to at most one other base, and there are no crossing arcs.
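
The two conditions of a Nested structure can be checked mechanically. The sketch below is an illustration under assumptions, not the authors' code: the function name `is_nested` is hypothetical, and arcs are assumed to be (i, j) index pairs with i < j. It uses a simple quadratic pairwise test for clarity.

```python
from itertools import combinations


def is_nested(pairs):
    """Return True if no base has two partners and no two arcs cross.

    `pairs` is an iterable of (i, j) tuples with i < j (assumed convention).
    """
    arcs = sorted(pairs)  # sorted by left endpoint, so i <= k below
    for (i, j), (k, l) in combinations(arcs, 2):
        # A shared endpoint means some base has more than one partner.
        if len({i, j, k, l}) < 4:
            return False
        # Arcs cross when the second starts inside the first and ends outside.
        if i < k < j < l:
            return False
    return True
```

For example, `is_nested({(0, 8), (1, 7), (2, 6)})` holds for a simple hairpin, while `{(0, 5), (3, 8)}` contains a crossing and `{(0, 3), (3, 7)}` gives base 3 two partners, so both are rejected.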

  8. Unlimited Structure: Arc annotated substrings can represent Unlimited structures as well.

  9. Bounded-Unlimited Structure: Arc annotated substrings can represent Bounded-Unlimited structures: each base can be connected to a constant number of other bases, and crossing arcs are allowed.

  10. RNA Similarity Algorithms: Many algorithms for finding similarity between RNA molecules use tree similarity algorithms • Tree Edit Distance: • Tai (’79) O(n^6) • Zhang & Shasha (’89) O(n^4) • Klein (’98) O(n^3 log n) • Ma et al. (’99) O(n^3 log n) • Demaine et al. (’07) O(n^3)

  11. RNA Similarity Algorithms: Many algorithms for finding similarity between RNA molecules use tree similarity algorithms • Tree Alignment: • Jiang et al. (’95) • Schirmer & Giegerich (’11) • Backofen et al. (’07) • Mohl et al. (’09)

  12. RNA Similarity Algorithms: Many algorithms for finding similarity between RNA molecules use tree similarity algorithms • Longest Arc Preserving Common Subsequence: • Evans (’99) • Lin et al. (’02) • Alber et al. (’04) • Jiang et al. (’04)

  13. RNA Similarity Algorithms: Many algorithms for finding similarity between RNA molecules use tree similarity algorithms • Similar Subforests: • Jansson & Peng (’11)

  14. Exact Pattern Matching Problem: In this work, we search for local common sequence-structure regions (patterns) between two given RNA molecules. [Figure label: Pattern.]

  15. Patterns in RNAs: In this work, we search for local common sequence-structure regions (patterns) between two given RNA molecules.

  16. Exact Pattern Matching Problem: Finding all maximal common structure-sequence regions between two RNAs. Solved by Backofen & Siebert in O(n^2) for fixed Nested x Nested structures. [Figure: two arc-annotated strings with labels: single base match, left endpoint match, type mismatch.]

  17. Exact Pattern Matching Problem: In this work, we solve the problem for non-fixed Nested x Nested structures. [Figure: two arc-annotated strings with label: arc breaking.]

  18. Arc Breaking Operation • We support the operation of arc-breaking, in which a base pair can be deleted with no penalty. [Figure labels: base pair, single bases.]

  19. Arc Breaking Operation • We support the operation of arc-breaking, in which a base pair can be deleted with no penalty. [Figure labels: base pair, single bases.]

  20. Arc Breaking • We support the operation of arc-breaking, in which a base pair can be deleted with no penalty. [Figure: arc breaking shown on the tree representation.]
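
In terms of the pair (S, B), arc breaking only touches B: the bond is removed and its two endpoints stay in S as single bases. A minimal sketch of that reading follows; the function name `break_arc` and the 0-based (i, j) tuples are assumptions for illustration, not part of the talk.

```python
def break_arc(seq, pairs, arc):
    """Remove one base pair from the structure, leaving the sequence intact.

    Hypothetical helper: `pairs` is a set of (i, j) tuples, `arc` one of them.
    Both endpoints of `arc` remain in `seq` and are now single bases.
    """
    if arc not in pairs:
        raise KeyError(f"{arc} is not a base pair of this structure")
    return seq, frozenset(pairs) - {arc}


seq, pairs = "GGGAAACCC", frozenset({(0, 8), (1, 7), (2, 6)})
seq2, pairs2 = break_arc(seq, pairs, (1, 7))
# seq2 is unchanged; positions 1 and 7 are no longer bonded.
```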

  21. Arc Breaking: Patterns are now less restrictive.

  22. Exact Pattern Matching Algorithms: We describe three algorithms for finding the local exact pattern matching between two RNAs: • A simple O(n^4) algorithm (using ideas from Zhang & Shasha (’89)) • An improved O(n^3 log n) algorithm (using ideas from Klein (’98)) • An O(n^3) algorithm (using ideas from Demaine, Weimann et al. (’07))

  23. Exact Pattern Matching Algorithm: Input: R1 = (S1, B1) and R2 = (S2, B2), |R1| = n, |R2| = m, n > m. Output: Local exact pattern matching between R1 and R2.

  24. Exact Pattern Matching Algorithm: We compare each base pair from R1 with each base pair from R2, in increasing order of their sizes.

  25. Exact Pattern Matching Algorithm: For each pair of base pairs, one from R1 and one from R2, we compute the matching inside the base pairs and the extensions to their outsides.

  26. Matching Inside the Base Pairs • Dynamic programming algorithm • Similar to the LCS / edit distance algorithms for strings
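
For readers who want the string analogue in front of them, here is the textbook longest-common-subsequence recurrence over prefixes. This is the plain string algorithm the slide compares against, not the talk's base-pair algorithm, which adds a case for matching whole base pairs; the function name `lcs_length` is illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # Match the last characters of both prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of a or of b.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]
```

Each table cell depends on a constant number of earlier cells, which is the same property the talk relies on when it counts O(1) work per comparison.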

  27. Matching Inside the Base Pairs: On each comparison we compute only prefixes of the substrings and select the maximal score over 4 expressions. Match base pairs. [Figure: prefix 1..i of bp1 against prefix 1..j of bp2, with the test S1(i) == S2(j).]

  28. Matching Inside the Base Pairs: Match single bases. [Figure: prefix 1..i of bp1 against prefix 1..j of bp2, with the test S1(i) == S2(j).]

  29. Matching Inside the Base Pairs: Delete from R1; Delete from R2. [Figure: positions i-1 and i in bp1 against position j in bp2.]

  30. Matching Inside the Base Pairs: On each comparison we compute the maximal match from left-to-right.

  31. Matching Inside the Base Pairs: On each comparison we compute the maximal match from right-to-left.

  32. Matching Inside the Base Pairs • There are two tricky parts here: • What happens when a mismatch occurs?

  33. Matching Inside the Base Pairs • There are two tricky parts here: • What happens when the matchings overlap?

  34. Matching Inside the Base Pairs: The solution: on each comparison we compute the best score going both right-to-left and left-to-right.

  35. Time Complexity • We only compare prefixes of the base pairs • There are O(n^2) prefixes for each RNA • Each comparison is computed in O(1) time • The total time is O(n^4)
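
The last bullet follows by multiplying the three counts before it. Written out, with n used as a common bound on both lengths since m < n (slide 23):

```latex
\[
\underbrace{O(n^2)}_{\text{prefixes in } R_1}
\cdot
\underbrace{O(n^2)}_{\text{prefixes in } R_2}
\cdot
\underbrace{O(1)}_{\text{per comparison}}
= O(n^4)
\]
```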

  36. Extending the Match: We compute the maximal pattern extension for all bases in R1 and all bases in R2 in one run. The time complexity: O(n^2).

  37. Total Time Complexity: Computing the pattern match inside all base pairs is done in O(n^4). Computing the pattern match extensions to the right and to the left is done in O(n^2). The total time complexity is O(n^4) + O(n^2) = O(n^4).

  38. An O(n^3 log n) Algorithm: We use ideas from Klein’s Tree Edit Distance (’98): we decompose the larger RNA into heavy paths. The root base pair is marked light, and we continue recursively: select the maximal child base pair and mark it as heavy, and mark the rest of the children as light.
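
The marking rule on this slide can be sketched on a generic rooted tree. The code below is an illustration under stated assumptions, not the authors' implementation: the tree is given as a hypothetical `children` dictionary, the "maximal" child is taken to be the one with the most nodes in its subtree, and ties go to the first child listed.

```python
def heavy_light_marks(children, root):
    """Mark every node 'heavy' or 'light' using the rule from slide 38.

    `children` maps a node to the list of its child nodes (assumed input
    format). Size is measured as the number of nodes in the subtree.
    """
    # Visit parents before children, without recursion.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    # Subtree sizes: process children before parents.
    size = {}
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children.get(v, []))

    mark = {root: "light"}  # the root is marked light
    for v in order:
        kids = children.get(v, [])
        if kids:
            heavy = max(kids, key=size.__getitem__)  # maximal child
            for c in kids:
                mark[c] = "heavy" if c == heavy else "light"
    return mark


tree = {"r": ["a", "b"], "a": ["c", "d"], "c": ["e"]}
marks = heavy_light_marks(tree, "r")
```

In this example the heavy path starting at the root is r, a, c, e; nodes b and d are light and each starts a new path of its own.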

  39. Special Substrings: For each base pair we define its special substrings. The number of special substrings of a base pair is |bp| - |hp| + 1. Lemma (Sleator & Tarjan ’83): There are O(n log n) special substrings in R of size n. [Figure: a base pair bp containing hp, with flanking bases a, x, y, b, and the list of its special substrings.]

  40. An O(n^3 log n) Algorithm: We compare all O(n^2) substrings of R2 with the O(n log n) special substrings of R1.

  41. An O(n^3 log n) Algorithm: The comparisons are made between the rightmost or leftmost bases, according to the special substring.

  42. An O(n^3 log n) Algorithm: The total number of compared substrings is O(n^3 log n), each one computed in O(1) time, which gives a total of O(n^3 log n) running time. This algorithm also works for Nested x Bounded-Unlimited structures.

  43. An O(n^3) Algorithm: Based on the algorithm of Demaine et al. (’07), we decompose both RNAs into heavy paths. The special substrings are decided on each comparison of base pairs: the base pair that has the largest root light base pair is the dominant one. [Figure: heavy-path decompositions of R1 (base pairs numbered 1 to 9) and R2 (base pairs labeled A to F).]

  44. An O(n^3) Algorithm: The number of compared substrings is O(n^3). This algorithm can work with Nested x Nested structures only.

  45. More Algorithms • Find the local approximate pattern matching between Nested x Nested structures in O(n^3 k^2) for k allowed mismatches • Find the local approximate pattern matching between Nested x Bounded-Unlimited structures in O(n^3 k^2 log n) for k allowed mismatches • Find the most similar sibling substructures between Nested x Nested structures in O(n^3)

  46. Thank you!
