200 likes | 497 Vues
Prediction of Secondary Structure of RNA. Swetha Nandyala. Overview. Introduction How RNA Secondary Structure is obtained ? Various types of loops possible in RNA Secondary Structure Motivation Resource Implements the Algorithms to be Understood Algorithm Nussinovs RNA Folding Example.
E N D
Prediction of Secondary Structure of RNA Swetha Nandyala
Overview Introduction How RNA Secondary Structure is obtained ? Various types of loops possible in RNA Secondary Structure Motivation Resource Implements the Algorithms to be Understood Algorithm Nussinovs RNA Folding Example
RNA Secondary Structure • Primary Structure of RNA : A sequence of the bases A,G,C and U. • Due to hydrogen bonds, the bases of the RNA may form base pairs a) Watson-Click base pairs: • G≡C : formed by a triple hydrogen bond • A=U : formed by a double hydrogen bond b) Wooble base pairs: G-U : formed by single hydrogen bond • Secondary Structure of an RNA : The W-C and Wooble base pairs occurring in an RNA Fold
Structural Features Of RNA Hairpin Loop Multibranched Loop Stem Bulge Internal Loop External Base
MOTIVATION Why is Prediction of RNA Secondary Structure important? • Tertiary Structure Prediction • Identify highly conserved Motifs • Design and Testing of Pharmaceutical products.
Algorithm • Input • an alignment of RNA sequence • Output • single common structure for the sequence • The Model • The evolutionary model (Nussinovs) • The SCFG
http://ludwig-sun2.unil.ch/~bsondere/nussinov/form.html#nussinovhttp://ludwig-sun2.unil.ch/~bsondere/nussinov/form.html#nussinov Is this the only Secondary Structure possible? No How to distinguish the correct structure? Need for both a function and an algorithm arises at this point, Hence Dynamic Programming Key Idea of Nussinovs algorithm Recursive and only 4 possible ways Resource: Questions:
Nussinovs’ Dynamic Programming i+1 i j-1 j i Unpaired base i Unpaired base j j j-1 i+1 i j k k+1 i j Bifurcation paired i,j
Energy Matrix • E(i,j)=maximum energy for sub chain starting at i and ending at j and • s(ri,rj)=energy of pair ri, rj(rj=base at position j) • i is unpaired E(i,j)=E(i+1,j) • j is unpaired E(i,j)=E(i,j-1) • i,j is paired E(i,j)=E(i+1,j-1)+s(ri,rj) • Bifurcation E(i,j)=E(i,k)+E(k+1,j)
RNA Secondary Structure Algorithm • Given: RNA Sequence x1,x2,x3…….,xL • Initialization: E(i,i-1)=0 for i=2 to L E(i,i)=0 for i=1 to L • Recursion: for n=2 to L //iteration over length E(i,j)=max{ E(i+1,j), E(i,j-1), E(i+1,j-1)+s(ri,rj) max i<k<j {E(i,k)+E(k+1,j)} }
Example j Let s(ri,rj)=1 if ri, rj form a base pair and 0 otherwise Input : GGGAAAUCC E(i,j)=maximum energy conformation for sub chain from i to j i Here we should have max energy for AAAUC
GGG (i=1,j=3) max {0, 0, 0+s(G,G) }=0 AAU (i=5,j=7) max {0, 0, 0+s(A,U) }=1
Recovering the Structure • Main difference to sequence Alignment-We are tracing back a tree like structure not a single optimal path (Bifurcation introduces branch points) • Method 1: Leave pointers as you compute the table:for each element of the table store(atmost two) pointers the subsequences used in solution • Method2: Recover history based on numerical values in the table -Stacking- check value along diagonal -Bifurcation-find k such that E(i,k)+E(k+1,j)=E(i,j)
Trace back Algorithm • Initialization: Push(1,L) onto stack • Recursion: • Repeat until stack is empty • pop(i,j) • if i>=j continue; else if E(i+1,j)=E(i,j) push(i+1,j) else if E(i+1,j)=E(i,j) push(i,j-1) else if E(i+1,j)=E(i,j) push(i,j) record i,j base pair push(i+1,j-1) else for k=i+1 to j-1 : if E(i,k)+E(k+1,j)=E(i,j); push(k+1,j) push(i,k) break
A A A • U G • C G • C G
Problems and Improvements • Advantages: • Accurate Output can be obtained if able to provide certain basic details • Cost:O(n3) • Success rate was 70% • Main Drawbacks: • Hairpin loops could be of any length. • Developed in 1978.Therefore, it is not state of art today, but is a good starting point • Improvements: • Minimization Gibbs’ Free energy • SCFGs’
http://ludwig-sun2.unil.ch/~bsondere/nussinov/ http://www.daimi.au.dk/~schauser/genome_analysis_F03/lectures_F03/RNA-struct-prediction.pdf http://www.scs-pw.gmu.edu/jamison/CSI730F01/Lec10.pdf Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids- R.Durbin, S.Eddy, A.Korgh, G.Mitchison. University of Cambridge Press,1998 References: