Mira Abraham-Cohen and Haim J. Wolfson

Mira Abraham-Cohen and Haim J. Wolfson, Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel

Why RNA?
RNA (ribonucleic acid) is:
• not solely a carrier of genetic information (non-coding RNAs)
[Figure: the central dogma of molecular biology (DNA, RNA, protein)]

Why RNA?
RNA (ribonucleic acid) is:
• not solely a carrier of genetic information (non-coding RNAs)
• a key player in essential cellular processes (e.g. protein synthesis and transport, gene silencing)
• involved in pathological processes (e.g. cancerous tumors, AIDS)
• a potential drug or drug target (e.g. RNAi, bacterial ribosomes as antibiotic targets)

RNA structure: 1D (sequence), 2D (secondary structure), 3D (tertiary structure)

Why RNA secondary structure?
• “RNA structure” usually refers to 2D structure
• Easier to obtain (2D structures are more common than 3D structures)
• Secondary structure elements
  • Helix
  • Loop

Secondary structure elements: helix, internal loop, bulge, multi-branch loop, hairpin

GUCUGUCCCCACACGACAGAUAAUCGGGUGCAACUCCCGCCCCUUUUCCGAGGGUCAUCGGAACCA
.((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))...
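The dot-bracket string on this slide can be turned into a list of base pairs with a small parser. The following is a minimal sketch, not HARP's own code: the function name `parse_dot_bracket` is hypothetical, and support for `{}` and `<>` as additional bracket types is an assumption beyond the `()` and `[]` that appear on the slide.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) base pairs from extended dot-bracket notation.

    Each bracket type gets its own stack, so pairs written with '[' and ']'
    may cross pairs written with '(' and ')' (a pseudoknot).
    """
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol in stacks:
            stacks[symbol].append(pos)
        elif symbol in closers:
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


SLIDE_STRUCTURE = (
    ".((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))..."
)
slide_pairs = parse_dot_bracket(SLIDE_STRUCTURE)
```

Keeping one stack per bracket type is what lets the square-bracket helix overlap the round-bracket helix that follows it.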

Pseudoknot structural motif
• Important for the function of many RNAs
• Two helices, helix1 spanning (i1, j1) and helix2 spanning (i2, j2), form a pseudoknot when i1 < i2 < j1 < j2
• RNA 2D structure alignment
  • Disregarding pseudoknots: O(n^4) [Zhang and Shasha 1989]
  • Including pseudoknots: NP-hard [Zhang et al. 1999]
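The crossing condition i1 < i2 < j1 < j2 can be checked directly on base pairs. This is a small illustrative sketch; the helper names `crosses` and `has_pseudoknot` are hypothetical, and the quadratic scan over all pairs is chosen for clarity rather than speed.

```python
from itertools import combinations


def crosses(pair_a, pair_b):
    """True when the two base pairs interleave: i1 < i2 < j1 < j2."""
    (i1, j1), (i2, j2) = sorted([pair_a, pair_b])
    return i1 < i2 < j1 < j2


def has_pseudoknot(pairs):
    """True when any two base pairs in the structure cross."""
    return any(crosses(a, b) for a, b in combinations(pairs, 2))
```

Nested pairs such as (1, 19) and (2, 18) do not cross, while (40, 53) and (45, 62) do.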

Why do pseudoknots make a difference?
• Are they common? Over 30% of the functional groups
• Less than 70% 2D similarity

Previous work – RNA 2D alignment
• Methods disregarding pseudoknots
  • RNAforester [Hofacker et al. 2004]
  • Migals [Allali and Sagot 2005]
  • MARNA [Siebert and Backofen 2005]
• Methods that deal with limited cases
  • rna_align (DP) [Jiang et al. 2001]
  • pkalign (DP) [Mohl et al. 2009]

Previous work – RNA 2D alignment
• A method that deals with the general problem
  • LARA (ILP) [Bauer et al. 2007]
• All current methods dealing with pseudoknots
  • High time and memory complexity
  • Impractical for big structures
    • rna_align < 150 nts
    • pkalign < 800 nts
    • LARA < 1600 nts
    on pc-wolfson1 (2 GB RAM)

HARP Motivation
• Preserved 3D structure, preserved function
• Preserved relative 3D distances, preserved function
• Preserved relative 2D distances, preserved function?

HARP
• Aligns RNA 2D structures with no limitation on the pseudoknot type
• Exploits inherent RNA distance constraints
  • Distances between 2D elements are usually conserved
  • Pseudoknots often create spatial distance constraints
• Goal: finding the largest set of conserved helices
• Heuristic method based on an analog of Geometric Hashing

Geometric hashing
• Each pair of points defines a “view”
• Voting table
[Figure labels: point of “view”]

HARP - Overview
• Input: RNA structures R1, R2
• Generate reduced “helix” graph representations G1, G2
• Build a look-up table of geodesic distances in all bases
• Query the look-up table
• Refine alignments and extend the match

Reduced graph representation
• Vertices: stable helices
  • Helix beginning, termination and length
• Edges connect adjacent helices
  • Direction: polymerization direction
  • Weight: minimal number of nucleotides needed for connection
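One way to picture the reduced graph is the sketch below, which is an illustration under stated assumptions and not HARP's implementation. It assumes that a "stable" helix is a stack of at least `min_length` consecutive base pairs, that two helices are "adjacent" when one of their strands directly follows the other along the backbone, and that the edge weight is the number of unpaired nucleotides between those strands. The names `find_helices` and `backbone_edges` are hypothetical.

```python
def find_helices(pairs, min_length=2):
    """Group stacked pairs (i, j), (i+1, j-1), ... into helices.

    Returns a list of (start, end, length), where start is the 5' end of the
    first strand and end is the 3' end of the second strand.
    """
    pair_set = set(pairs)
    helices = []
    for i, j in sorted(pairs):
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its stack
        length = 1
        while (i + length, j - length) in pair_set:
            length += 1
        if length >= min_length:  # assumed notion of a "stable" helix
            helices.append((i, j, length))
    return helices


def backbone_edges(helices):
    """Directed 5'->3' edges between helices with consecutive strands.

    Returns {(u, v): weight}; weight is the smallest number of unpaired
    nucleotides separating a strand of helix u from a following strand of v.
    """
    strands = []
    for idx, (i, j, length) in enumerate(helices):
        strands.append((i, i + length - 1, idx))  # 5' strand
        strands.append((j - length + 1, j, idx))  # 3' strand
    strands.sort()
    edges = {}
    for (_, end_a, u), (start_b, _, v) in zip(strands, strands[1:]):
        if u != v:
            gap = start_b - end_a - 1
            edges[(u, v)] = min(gap, edges.get((u, v), gap))
    return edges
```

For a structure with four helices, two of which interleave as a pseudoknot, the interleaved helices end up connected in both directions, which is how the crossing shows up in the graph.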

Graph representation

Graph representation
[Figure: vertices i, j, k joined by forward and backward edges; edge weights shown: 4, 4, 7, 11, 16, 20]

Building a look-up table
• Forward and backward edges
• Shortest path between any two vertices
• Any two vertices (i, j) define a “view”
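The "shortest path between any two vertices" step can be sketched with the Floyd-Warshall algorithm. The slides do not say which shortest-path algorithm HARP uses, so this choice is an assumption, as are the function name `all_pairs_shortest_paths` and the `{(u, v): weight}` edge format.

```python
import math


def all_pairs_shortest_paths(num_vertices, edges):
    """Floyd-Warshall over directed weighted edges given as {(u, v): weight}.

    Returns a matrix dist where dist[u][v] is the shortest-path weight from
    u to v, or math.inf when v cannot be reached from u.
    """
    dist = [[math.inf] * num_vertices for _ in range(num_vertices)]
    for v in range(num_vertices):
        dist[v][v] = 0
    for (u, v), weight in edges.items():
        dist[u][v] = min(dist[u][v], weight)
    for k in range(num_vertices):
        for i in range(num_vertices):
            for j in range(num_vertices):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist
```

The cubic running time is acceptable here because the vertices are helices rather than nucleotides, so the graph is small.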

Similar views
• Inserting G1 triangles
• Querying with G2 triangles

Querying the vote table
[Figure labels: indexing edges, basis edge]
• Filtering by
  • Triangle type F/B
  • ε-vicinity
• Querying the table with the indexing edges of G2
  • ε-vicinity
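The insert-then-query voting scheme can be sketched as follows. This is a simplified illustration, not HARP's data structure: it assumes each triangle is described by three distances (basis edge plus two indexing edges), it bins those distances with bin width ε so that an ε-vicinity query only needs to inspect neighbouring bins, and it omits the forward/backward triangle-type filter. All function names are hypothetical, and `eps` must be positive.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations, product


def triangles(dist):
    """Yield (basis, sides) for every ordered basis (i, j) and third vertex k."""
    n = len(dist)
    for i, j in permutations(range(n), 2):
        for k in range(n):
            if k in (i, j):
                continue
            sides = (dist[i][j], dist[i][k], dist[j][k])
            if all(math.isfinite(s) for s in sides):
                yield (i, j), sides


def build_table(dist1, eps):
    """Insert all G1 triangles into a table keyed by binned side lengths."""
    table = defaultdict(list)
    for basis, sides in triangles(dist1):
        key = tuple(int(s // eps) for s in sides)
        table[key].append((basis, sides))
    return table


def vote(table, dist2, eps):
    """Query with G2 triangles; count votes per (G1 basis, G2 basis) pair."""
    votes = Counter()
    for basis2, sides2 in triangles(dist2):
        key = tuple(int(s // eps) for s in sides2)
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(a + b for a, b in zip(key, offset))
            for basis1, sides1 in table.get(neighbour, ()):
                if all(abs(a - b) <= eps for a, b in zip(sides1, sides2)):
                    votes[(basis1, basis2)] += 1
    return votes
```

Basis pairs that collect many votes are the candidate "similar views" that the later refinement step would start from.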

Alignment refinement
[Figure: vertices of G1 matched to vertices of G2, with match weight w]
• Distance between the vertices
• Correlation between helices’ lengths
• Hungarian algorithm

Alignment extension and scoring
• Greedy approach
  • Starting with the largest (pair of bases) match
  • Extending by adding the pair that contributes most to the extension
• Score
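The greedy extension loop can be sketched generically. The slides do not give the scoring formula, so the sketch below takes the contribution of a candidate pair as a caller-supplied function `gain`; that function, the name `greedy_extend`, and the rule that each helix may be matched at most once are all assumptions made for illustration.

```python
def greedy_extend(seed, candidates, gain):
    """Greedily extend a seed alignment of (helix_in_G1, helix_in_G2) pairs.

    gain(alignment, pair) is a hypothetical scoring callback returning how
    much adding `pair` would improve the current alignment.
    """
    alignment = list(seed)
    used1 = {a for a, _ in alignment}
    used2 = {b for _, b in alignment}

    def is_free(pair):
        return pair[0] not in used1 and pair[1] not in used2

    remaining = [pair for pair in candidates if is_free(pair)]
    while remaining:
        best = max(remaining, key=lambda pair: gain(alignment, pair))
        if gain(alignment, best) <= 0:
            break  # no candidate improves the alignment any further
        alignment.append(best)
        used1.add(best[0])
        used2.add(best[1])
        remaining = [pair for pair in remaining if is_free(pair)]
    return alignment
```

Because each step commits to the locally best pair, the result depends on the seed, which is why the slide starts from the largest match.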

Complexity
• Generating reduced graph representations
• Building a look-up table
• Querying the look-up table
• Generating alignments:
  • Alignment refinement
  • Alignment extension
• In practice:
  • Average-size structures: less than a second
  • Big structures (~2800 nucleotides): less than a minute and 10 MB

Results
• HARP’s statistics
  • Average score and p-value
• Comparison with LARA
• Alignment examples

HARP’s statistics

Similar 2D yet different function: 5S ribosomal RNA and SRP

Comparison with LARA 23 rRNA

Comparison with LARA
[Figure: sensitivity versus 1 - specificity for HARP and LARA]
• Sensitivity = TP / P = TP / (TP + FN)
• 1 - Specificity = FPR = FP / N = FP / (FP + TN)
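The two rates on this slide follow directly from confusion-matrix counts. A minimal sketch, with hypothetical function names and made-up counts in the usage note:

```python
def sensitivity(tp, fn):
    """True positive rate: TP / P = TP / (TP + FN)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """1 - specificity: FP / N = FP / (FP + TN)."""
    return fp / (fp + tn)
```

For example, 8 true positives with 2 false negatives give a sensitivity of 8 / 10, and 1 false positive with 9 true negatives give a false positive rate of 1 / 10.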

Self-splicing group I introns, 68.9% similarity: (left) PDB id 1zzn chain B, 10 stable helices; (right) PDB id 1y0q chain A, 13 stable helices.

Catalytic domains of ribonuclease P: (left) PDB id 2a2e chain A, 19 stable helices; (right) PDB id 2a64 chain A, 16 stable helices.

Conclusions
• HARP is a tool for the alignment of RNA secondary structures, which may include pseudoknots
• Accurate tool capable of distinguishing between homologous and non-homologous structures
• Highly efficient
  • Takes less than a second for average-size structures
  • Less than a minute and 10 MB for very big structures
• Web server: http://bioinfo3d.cs.tau.ac.il/HARP

Thank you for your attention!
