Mira Abraham-Cohen and Haim J. Wolfson • Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
Why RNA? [Figure: The Central Dogma of Molecular Biology, relating DNA, RNA and Protein]
Why RNA? RNA (ribonucleic acid) is: • not solely a carrier of genetic information (non-coding RNAs) • a key player in essential cellular processes (e.g. protein synthesis and transport, gene silencing) • involved in pathological processes (e.g. cancerous tumors, AIDS) • a potential drug or drug target (e.g. RNAi, bacterial ribosomes as antibiotic targets)
RNA structure • 1D: sequence • 2D: secondary structure • 3D: tertiary structure
Why RNA secondary structure? • “RNA structure” usually refers to 2D structure • Easier to obtain (2D structures are more widely available than 3D structures) • Secondary structure elements • Helix • Loop
Secondary structure elements • Helix • Internal loop • Bulge • Multi-branch loop • Hairpin
GUCUGUCCCCACACGACAGAUAAUCGGGUGCAACUCCCGCCCCUUUUCCGAGGGUCAUCGGAACCA
.((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))...
Pseudoknot structural motif • Important for the function of many RNAs • Two helices, helix1 (i1, j1) and helix2 (i2, j2), form a pseudoknot when i1 < i2 < j1 < j2 • RNA 2D structure alignment • Disregarding pseudoknots: O(n^4) [Zhang and Shasha 1989] • Including pseudoknots: NP-hard [Zhang et al. 1999]
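The crossing condition i1 < i2 < j1 < j2 can be checked directly on a structure written in extended dot-bracket notation, where round brackets and square brackets mark two families of base pairs. The following is a minimal sketch, not part of HARP: the function names `parse_dot_bracket` and `crossing_pairs` are hypothetical, indices are 0-based, and only the two bracket types `()` and `[]` are assumed to occur.

```python
def parse_dot_bracket(structure):
    """Return the sorted base pairs (i, j), 0-based, of an extended dot-bracket string."""
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}          # one stack of open positions per bracket type
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return every ordered pair of base pairs (p, q) with p.i < q.i < p.j < q.j."""
    return [(p, q) for p in pairs for q in pairs if p[0] < q[0] < p[1] < q[1]]


# Example structure in extended dot-bracket notation; [[[ ... ]]] crosses (((((( ... )))))).
structure = ".((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))..."
pairs = parse_dot_bracket(structure)
crossings = crossing_pairs(pairs)
has_pseudoknot = bool(crossings)
```

The quadratic scan in `crossing_pairs` is chosen for readability; it only detects a pseudoknot and says nothing about the cost of aligning structures that contain one.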
Why do pseudoknots make a difference? Are they common? Over 30% of the functional groups Less than 70% 2D similarity
Previous work – RNA 2D alignment • Methods disregarding pseudoknots • RNAforester [Hofacker et al. 2004] • Migals [Allali and Sagot 2005] • MARNA [Siebert and Backofen 2005] • Methods that deal with limited cases • rna_align (DP) [Jiang et al. 2001] • pkalign (DP) [Mohl et al. 2009]
Previous work – RNA 2D alignment • A method that deals with the general problem • LARA (ILP) [Bauer et al. 2007] • All current methods dealing with pseudoknots • High time and memory complexity • Impractical for big structures • rna_align < 150 nts • pkalign < 800 nts • LARA < 1600 nts on pc-wolfson1 (2GB RAM)
HARP Motivation • Preserved 3D structure → preserved function • Preserved relative 3D distances → preserved function • Preserved relative 2D distances → preserved function?
HARP • Aligns RNA 2D structures with no limitation on the pseudoknot type • Exploits inherent RNA distance constraints • Distances between 2D elements are usually conserved • Pseudoknots often create spatial distance constraints • Goal: Finding the largest set of conserved helices • Heuristic method based on an analog of Geometric Hashing
Geometric hashing • Each pair of points defines a “view” • Voting table [Figure: point of “view” and voting table]
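As background, classic 2D geometric hashing can be sketched as follows. This illustrates the general technique on planar points only; HARP uses an analog of it on helix graphs, which is not reproduced here. The function names, the cell size of 0.1 and the sample points are assumptions made for the illustration.

```python
from collections import defaultdict
from itertools import permutations


def frame_coords(p, a, b):
    """Coordinates of p in the frame where a maps to (0, 0) and b to (1, 0)."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    norm2 = ux * ux + uy * uy
    dx, dy = p[0] - a[0], p[1] - a[1]
    return ((dx * ux + dy * uy) / norm2, (dy * ux - dx * uy) / norm2)


def cell_of(xy, cell=0.1):
    """Quantize frame coordinates into a hash-table cell."""
    return (round(xy[0] / cell), round(xy[1] / cell))


def build_table(model, cell=0.1):
    """For every ordered basis pair (a "view"), hash all remaining model points."""
    table = defaultdict(list)
    for i, j in permutations(range(len(model)), 2):
        for k, p in enumerate(model):
            if k not in (i, j):
                table[cell_of(frame_coords(p, model[i], model[j]), cell)].append((i, j))
    return table


def vote(table, scene, basis, cell=0.1):
    """Express the scene in one scene basis and count votes per model basis."""
    i, j = basis
    votes = defaultdict(int)
    for k, p in enumerate(scene):
        if k in (i, j):
            continue
        for model_basis in table.get(cell_of(frame_coords(p, scene[i], scene[j]), cell), ()):
            votes[model_basis] += 1
    return votes


# Hypothetical model, and a scene that is the model rotated by 90 degrees,
# scaled by 2 and translated by (5, -3).
model = [(0, 0), (2, 0), (2, 1), (0, 3), (1, 1)]
scene = [(5 - 2 * y, 2 * x - 3) for x, y in model]

table = build_table(model)
votes = vote(table, scene, basis=(0, 1))
```

Because the frame coordinates are invariant to translation, rotation and uniform scaling, the model basis (0, 1) collects a vote from each of the three remaining scene points.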
HARP - Overview • Input: two RNA structures, R1 and R2 • Generate reduced “helix” graph representations G1 and G2 • Build a look-up table of geodesic distances in all bases • Query the look-up table • Refine alignments and extend the match
Reduced graph representation • Vertices: stable helices • Helix beginning, termination and length • Edges connect adjacent helices • Direction: polymerization direction • Weight: minimal number of nucleotides needed for connection
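The vertex set of such a graph can be sketched by grouping stacked base pairs into helices and recording each helix's beginning, termination and length. This is an illustration under stated assumptions, not HARP's procedure: a helix is taken here to be a run of strictly stacked pairs (i, j), (i+1, j-1), and so on, so a bulge ends the run; "stable" is approximated by a hypothetical `min_length` threshold; and the pair list is made up.

```python
from typing import List, NamedTuple, Tuple


class Helix(NamedTuple):
    start: int    # 5' index of the outermost base pair
    end: int      # 3' index of the outermost base pair
    length: int   # number of stacked base pairs


def extract_helices(pairs: List[Tuple[int, int]], min_length: int = 1) -> List[Helix]:
    """Group strictly stacked base pairs into helices of at least min_length pairs."""
    helices: List[Helix] = []
    run: List[Tuple[int, int]] = []

    def flush() -> None:
        # Close the current run and keep it if it is long enough.
        if len(run) >= min_length:
            helices.append(Helix(run[0][0], run[0][1], len(run)))

    for i, j in sorted(pairs):
        if run and (i, j) == (run[-1][0] + 1, run[-1][1] - 1):
            run.append((i, j))      # stacks directly on the previous pair
        else:
            if run:
                flush()
            run = [(i, j)]
    if run:
        flush()
    return helices


# Hypothetical base pairs: a 3-pair helix, a 4-pair helix and an isolated pair.
pairs = [(0, 9), (1, 8), (2, 7), (12, 25), (13, 24), (14, 23), (15, 22), (30, 40)]
all_helices = extract_helices(pairs)
stable_helices = extract_helices(pairs, min_length=3)
```

Sorting by the 5' index keeps stacked pairs adjacent even when pseudoknots are present, because the 5' indices within a helix are consecutive integers.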
Graph representation [Figure: helices i, j and k connected by weighted forward and backward edges]
Building a look-up table • Shortest path (forward and backward) between any two vertices • Any two vertices (i, j) define a “view”
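Shortest paths between all pairs of vertices in a directed, weighted graph can be computed, for example, with the Floyd-Warshall algorithm. The slides do not say which algorithm HARP uses, so the choice here is an assumption, and the three-vertex graph with its edge weights is hypothetical.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on a directed graph with vertices 0..n-1 and edges (u, v, weight)."""
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):                      # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Hypothetical helix graph: forward edges 0->1, 1->2, 0->2 and a backward edge 2->0.
edges = [(0, 1, 4), (1, 2, 7), (0, 2, 16), (2, 0, 5)]
dist = all_pairs_shortest_paths(3, edges)
```

Here the direct edge from vertex 0 to vertex 2 (weight 16) is replaced by the shorter path through vertex 1 (4 + 7 = 11). The cubic running time is acceptable because the reduced graph has one vertex per helix rather than one per nucleotide.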
Similar views • Inserting G1 triangles • Querying with G2 triangles
Querying the vote table • Querying the table with the indexing edges of G2 • Filtering by • Triangle type F/B • ε-vicinity [Figure: basis edge and indexing edges, with ε-vicinity]
Alignment refinement • Weighted matching between the vertices of G1 and G2, solved with the Hungarian algorithm • Weight w: distance between the vertices • Correlation between helices’ lengths
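The Hungarian algorithm solves the assignment problem: given a cost for matching each vertex of one graph to each vertex of the other, it finds the one-to-one matching of minimal total cost. The sketch below is a standard O(n^3) formulation with row and column potentials. The cost matrix is hypothetical; how HARP combines vertex distances and helix lengths into a weight is not given in the slides.

```python
def hungarian(cost):
    """Minimum-cost assignment of rows to columns; requires len(cost) <= len(cost[0])."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)          # row potentials (1-based)
    v = [0] * (m + 1)          # column potentials (1-based)
    match = [0] * (m + 1)      # match[j] = row assigned to column j, 0 if free
    way = [0] * (m + 1)        # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if reduced < minv[j]:
                        minv[j], way[j] = reduced, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:     # reached a free column: augment
                break
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if match[j]:
            assignment[match[j] - 1] = j - 1
    return assignment, sum(cost[i][assignment[i]] for i in range(n))


# Hypothetical costs of matching helix r of G1 (rows) to helix c of G2 (columns).
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = hungarian(cost)
```

In this example the greedy choice of the zero-cost entry (row 1, column 1) is not part of the optimum; the best matching assigns row 0 to column 1, row 1 to column 0 and row 2 to column 2.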
Alignment extension and scoring • Greedy approach • Starting with the largest (pair of bases) match • Extending by adding the pair that contributes most to the extension • Score
Complexity • Generating reduced graph representations • Building a look-up table • Querying the look-up table • Generating alignments: • Alignment refinement • Alignment extension • In practice: • Average-size structures: less than a second • Big structures (~2800 nucleotides): less than a minute and 10 MB
Results • HARP’s statistics • Average score and p-value • Comparison with LARA • Alignment examples
Similar 2D structure yet different function • 5S ribosomal RNA • SRP
Comparison with LARA 23 rRNA
Comparison with LARA [Figure: ROC curves for HARP and LARA] • Sensitivity = TP / P = TP / (TP + FN) • 1 - Specificity = FPR = FP / N = FP / (FP + TN)
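The two rates on the axes of such a plot follow directly from the confusion-matrix counts. The sketch below computes them and traces ROC points over a set of scored pairs; the scores and labels are invented for illustration and are not HARP or LARA results.

```python
def sensitivity(tp, fn):
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """1 - specificity: FP / (FP + TN)."""
    return fp / (fp + tn)


def roc_points(scored):
    """(FPR, sensitivity) at each score threshold, for (score, is_positive) items."""
    points = []
    for threshold in sorted({score for score, _ in scored}, reverse=True):
        tp = sum(1 for s, pos in scored if s >= threshold and pos)
        fn = sum(1 for s, pos in scored if s < threshold and pos)
        fp = sum(1 for s, pos in scored if s >= threshold and not pos)
        tn = sum(1 for s, pos in scored if s < threshold and not pos)
        points.append((false_positive_rate(fp, tn), sensitivity(tp, fn)))
    return points


# Hypothetical alignment scores with a flag marking truly related structure pairs.
scored = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)]
roc = roc_points(scored)
```

Lowering the threshold moves along the curve from the lower left, where few pairs are accepted, to the upper right, where every pair is accepted.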
Self-splicing group I introns • 68.9% similarity • (left) PDB id 1zzn chain B, 10 stable helices • (right) PDB id 1y0q chain A, 13 stable helices
Catalytic domains of ribonuclease P • (left) PDB id 2a2e chain A, 19 stable helices • (right) PDB id 2a64 chain A, 16 stable helices
Conclusions • HARP is a tool for the alignment of RNA secondary structures, which may include pseudoknots • Accurate: capable of distinguishing between homologous and non-homologous structures • Highly efficient • Takes less than a second for average-size structures • Less than a minute and 10 MB for very big structures • Web server: http://bioinfo3d.cs.tau.ac.il/HARP
Thank you for your attention!