
RNA Secondary Structure Prediction

RNA Secondary Structure Prediction. Dong Xu Computer Science Department 271C Life Sciences Center 1201 East Rollins Road University of Missouri-Columbia Columbia, MO 65211-2060 E-mail: xudong@missouri.edu 573-882-7064 (O) http://digbio.missouri.edu. Final Report. Due on Dec. 8.


Presentation Transcript


  1. RNA Secondary Structure Prediction Dong Xu Computer Science Department 271C Life Sciences Center 1201 East Rollins Road University of Missouri-Columbia Columbia, MO 65211-2060 E-mail: xudong@missouri.edu 573-882-7064 (O) http://digbio.missouri.edu

  2. Final Report • Due on Dec. 8. • A numerical score (0-15) will be assigned based on: • Clear formulation of the project (2) • Method (4) • Significant results achieved (4) • Discussion (3) • Writing of the project (2)

  3. Final Presentation • Preferably a PowerPoint file; PDF is fine • Preferably 20 minutes (up to 25 min), plus 5 min for questions • 15 points for the presentation (introduction, methods, results, discussion) • 15 points for the software demo • Implementation of the software • Major functionalities • Documentation • Perform a test run

  4. Presentation Evaluation • A numerical score (0-15) will be assigned based on: • Did the student put in enough effort? (3) • Is the work interesting or novel? (3) • Is the method technically sound? (3) • Is the discussion insightful? (3) • Is the presentation clear? (3)

  5. Software Demo • A numerical score (0-15) will be assigned based on: • Whether the program can run on actual biological data (3) • Documentation (3) • Whether it is easy to use (3) • Performance in accuracy (3) • Performance in computing time and memory usage (3)

  6. Outline • RNA Secondary Structure • Comparative Approach • Base-Pair Maximization • Free Energy Minimization • Local Structure Prediction

  7. RNA Types • siRNA: short interfering RNA • miRNA: microRNA • stRNA: small temporal RNA • snoRNA: small nucleolar RNA • snRNA: small nuclear RNA

  8. Features of RNA • RNA: polymer composed of a combination of four nucleotides • adenine (A) • cytosine (C) • guanine (G) • uracil (U)

  9. Features of RNA • G-C and A-U form complementary hydrogen-bonded base pairs (canonical Watson-Crick pairs) • G-C base pairs are more stable (3 hydrogen bonds); A-U base pairs are less stable (2 hydrogen bonds) • Non-canonical pairs can also occur in RNA; the most common is G-U
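
The pairing rules on this slide can be written down as a small lookup. This is only an illustrative sketch: the function name `pair_type` and the labels it returns are assumptions made for the example, not part of the lecture.

```python
# Canonical Watson-Crick pairs and the G-U wobble pair, as listed on the slide.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(x, y):
    """Classify two RNA bases as 'watson-crick', 'wobble', or None (no pair)."""
    pair = (x.upper(), y.upper())
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return None


if __name__ == "__main__":
    for x, y in [("G", "C"), ("A", "U"), ("G", "U"), ("A", "C")]:
        print(x, y, pair_type(x, y))
```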

  10. RNA Pairs • A-U • G-C • G-U

  11. RNA Structure Hierarchy • Primary structure: 5’ ACCACCUGCUGA 3’ • Secondary structure (figure) • Tertiary structure (figure)

  12. Secondary Structure Categories • Hairpin loop • Stem • Internal loop • Bulge loop • Pseudoknots

  13. tRNA structure

  14. Circular Representation

  15. Assumptions in Secondary Structure Prediction • The most likely structure is similar to the energetically most stable structure • The energy associated with any position is influenced only by local sequence and structure • The structure formed does not contain pseudoknots

  16. Exceptions • Pseudoknot • Kissing hairpins • Hairpin-bulge • These structures do not obey the “parentheses rule”
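
One way to read the "parentheses rule" is that base pairs must nest like balanced brackets: two pairs (i, j) and (k, l) may never interleave as i < k < j < l. The sketch below checks this for a list of pairs. The function names and the use of dot-bracket strings with 0-based positions are assumptions for illustration.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string (0-based)."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def has_crossing_pairs(pairs):
    """True if any two pairs (i, j), (k, l) interleave as i < k < j < l."""
    ordered = sorted(pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False


if __name__ == "__main__":
    nested = parse_dot_bracket("((..))")
    print(nested, has_crossing_pairs(nested))
    # A pseudoknot-like arrangement: pair (0, 4) crosses pair (2, 6).
    print(has_crossing_pairs([(0, 4), (2, 6)]))
```

A plain dot-bracket string can only encode nested pairs, which is why the crossing example is given directly as a pair list.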

  17. Outline • RNA Secondary Structure • Comparative Approach • Base-Pair Maximization • Free Energy Minimization • Local Structure Prediction

  18. Inferring Structure By Comparative Sequence Analysis • First step is to calculate a multiple sequence alignment • Requires sequences be similar enough so that they can be initially aligned • Sequences should be dissimilar enough for correlated mutation to be detected

  19. Mutual Information • f_xi: frequency of base x_i in column i • f_xixj: joint (pairwise) frequency of the base pair (x_i, x_j) in columns i and j • Mutual information ranges from 0 to 2 bits • If columns i and j are uncorrelated, the mutual information is 0
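
The formula itself did not survive in this transcript. The standard definition consistent with the symbols above is M_ij = Σ f_xixj · log2( f_xixj / (f_xi · f_xj) ), summed over all observed base combinations (x_i, x_j). Below is a minimal sketch of that calculation, assuming the alignment is a list of equal-length strings and that gap characters, if present, are simply counted as another symbol. The function name is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j of a sequence alignment."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    f_i = Counter(col_i)               # single-column counts for column i
    f_j = Counter(col_j)               # single-column counts for column j
    f_ij = Counter(zip(col_i, col_j))  # joint counts for the column pair
    mi = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return mi


if __name__ == "__main__":
    # Perfectly covarying columns: every base in column 0 fixes the base in column 1.
    print(mutual_information(["GC", "CG", "AU", "UA"], 0, 1))
    # Independent columns: the base in column 0 says nothing about column 1.
    print(mutual_information(["GA", "GC", "CA", "CC"], 0, 1))
```

The first toy alignment reaches the 2-bit maximum because four equally frequent bases covary perfectly; the second gives 0 because the columns vary independently.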

  20. Mutual Information Plot

  21. Mutual Information Plot

  22. Outline • RNA Secondary Structure • Comparative Approach • Base-Pair Maximization • Free Energy Minimization • Local Structure Prediction

  23. Dot Plot

  24. Base-Pair Maximization • Find structure with the most base pairs • Efficient dynamic programming approach to this problem introduced by Nussinov (1970s). • Four ways to get the best structure between position i and j from the best structures of the smaller subsequences

  25. Nussinov Algorithm • 1) Add the i,j pair onto the best structure found for subsequence i+1, j-1 • 2) Add unpaired position i onto the best structure for subsequence i+1, j • 3) Add unpaired position j onto the best structure for subsequence i, j-1 • 4) Combine two optimal structures i,k and k+1, j

  26. Dynamic Programming - 1 Notation: • e(ri,rj) : free energy of a base pair joining ri and rj • S(i,j) : optimal free energy associated with segment ri…rj

  27. Dynamic Programming - 2 • i is unpaired, added on to a structure for i+1…j: S(i,j) = S(i+1,j) • j is unpaired, added on to a structure for i…j-1: S(i,j) = S(i,j-1)

  28. Dynamic Programming - 3 • i and j are both paired, but not to each other; the structure for i…j adds together structures for 2 sub-regions, i…k and k+1…j: S(i,j) = max over i<k<j of {S(i,k)+S(k+1,j)} • i and j are paired to each other, added on to a structure for i+1…j-1: S(i,j) = S(i+1,j-1)+e(ri,rj)

  29. Dynamic Programming - 4 Since there are only four cases, the optimal score S(i,j) is just the maximum of the four possibilities: S(i,j) = max { S(i+1,j-1)+e(ri,rj), S(i+1,j), S(i,j-1), max over i<k<j of [S(i,k)+S(k+1,j)] }
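
The four cases can be sketched as a table fill followed by a traceback. This is a minimal illustration, not the lecture's implementation. It assumes e(ri,rj) = 1 for every allowed pair (pure base-pair maximization), allows G-U pairs, and uses a hypothetical `min_loop` parameter to forbid pairs between positions that are too close together. All function names are made up for the example.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    return (x, y) in ALLOWED


def nussinov(seq, min_loop=3):
    """Fill S so that S[i][j] is the maximum number of pairs in seq[i..j]."""
    n = len(seq)
    S = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # shorter spans stay at 0
        for i in range(n - span):
            j = i + span
            best = max(S[i + 1][j], S[i][j - 1])  # i unpaired, or j unpaired
            if can_pair(seq[i], seq[j]):          # i paired with j
                best = max(best, S[i + 1][j - 1] + 1)
            for k in range(i + 1, j):             # bifurcation, i < k < j
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S


def traceback(seq, S, min_loop=3):
    """Recover one optimal structure (dot-bracket) from the filled table."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))
        elif S[i][j] == S[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq[i], seq[j]) and S[i][j] == S[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


if __name__ == "__main__":
    seq = "GGGAAACCC"
    table = nussinov(seq)
    print(table[0][len(seq) - 1], traceback(seq, table))
```

The three nested loops over i, j, and k give the cubic running time of the method. The traceback returns only one of possibly several structures with the same score; which one depends on the order in which the cases are tested.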

  30. Initialisation (matrix figure: rows i, columns j): no close base pairs

  31. Propagation: C5…U9 • C5 unpaired: S(6,9) = 0 • U9 unpaired: S(5,8) = 0 • C5-U9 paired: S(6,8) + e(C,U) = 0 • C5 paired, U9 paired (not to each other): S(5,6) + S(7,9) = 0; S(5,7) + S(8,9) = 0

  32. Propagation: C5…G11 • C5 unpaired: S(6,11) = 3 • G11 unpaired: S(5,10) = 3 • C5-G11 paired: S(6,10) + e(C,G) = 6 • C5 paired, G11 paired (not to each other): S(5,6) + S(7,11) = 1; S(5,7) + S(8,11) = 0; S(5,8) + S(9,11) = 0; S(5,9) + S(10,11) = 0

  33. Propagation (matrix figure: rows i, columns j)

  34. Traceback (matrix figure: rows i, columns j)

  35. Final Prediction • Sequence: AUACCCUGUGGUAU • (figure: predicted secondary structure) • Total free energy: -12 kcal/mol

  36. Some Notes • Computational complexity: O(N^3) • Does not work with pseudoknots (they would invalidate the DP algorithm) • Methods that include pseudoknots: Rivas and Eddy, JMB 285, 2053 (1999) • These methods are at least O(N^6)

  37. Outline • RNA Secondary Structure • Comparative Approach • Base-Pair Maximization • Free Energy Minimization • Local Structure Prediction

  38. Energy Minimization Methods • RNA folding is determined by biophysical properties • The energy minimization algorithm predicts the secondary structure by minimizing the free energy (ΔG) • ΔG is calculated as the sum of individual contributions of: • loops • base pairs • secondary structure elements • Energies of stems are calculated as stacking contributions between neighboring base pairs
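
The last bullet can be illustrated by summing a stacking term for each pair of neighboring base pairs in a stem. The sketch below is only a schematic. The function name is hypothetical, and the values in `TOY_STACKING` are made-up placeholders chosen for the example, not measured thermodynamic parameters.

```python
def stem_energy(pairs, stacking):
    """Sum stacking energies (kcal/mol) over consecutive base pairs of one stem.

    pairs: base pairs as (5' base, 3' base) tuples, ordered from the outermost
           pair of the stem inward.
    stacking: dict mapping (outer_pair, inner_pair) to a stacking energy.
    """
    total = 0.0
    for outer, inner in zip(pairs, pairs[1:]):
        total += stacking[(outer, inner)]
    return total


# Placeholder values for illustration only; NOT real thermodynamic parameters.
TOY_STACKING = {
    (("G", "C"), ("G", "C")): -3.0,
    (("G", "C"), ("A", "U")): -2.0,
}

if __name__ == "__main__":
    stem = [("G", "C"), ("G", "C"), ("A", "U")]
    print(stem_energy(stem, TOY_STACKING))
```

A stem of n base pairs contributes n-1 stacking terms. Loop contributions would be added separately in a full energy model.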

  39. Example for Thermodynamic Parameters

  40. Calculating Best Structure • The sequence is compared against itself using a dynamic programming approach • Similar to finding the maximum base-paired structure • Instead of counting base pairs, the score is based on free energy values • Gaps represent some form of loop • The most widely used software that incorporates this minimum free energy algorithm is MFOLD.

  41. How well do they perform? • Current RNA folding programs get about 60-70% of base pairs correct, on average: useful, but not yet good. • The problem is the scoring system: thermodynamic model is accurate within 5-10%, and many alternative structures are within 10%. • Possible solution: combination of thermodynamic score with comparative sequence information

  42. Outline • RNA Secondary Structure • Comparative Approach • Base-Pair Maximization • Free Energy Minimization • Local Structure Prediction

  43. RNA Motif in HIV • TAR motif: Transactivating Response Element

  44. RNA Motifs Associated with Transcription Termination • A Rho-independent terminator stops the transcription process via its hairpin structure

  45. Algorithm in Rnall • Definition 1. A “match”: a canonical base pair • Definition 2. A “mismatch”: a non-canonical base pair • Definition 3. An “insertion”/“deletion”: an unpaired nucleotide

  46. RNA LSS in HIV • TAR (30) • DIS (260) • PolyA (82) • SD (292) • PSI (319)

  47. Some RNA Resources • Comparative RNA web site: http://www.rna.icmb.utexas.edu/ • RNA world: http://www.imb-jena.de/RNA.html • RNA page by Michael Zuker: http://www.bioinfo.rpi.edu/~zukerm/rna/ • RNA structure databases: http://www.rnabase.org/ ; http://ndbserver.rutgers.edu/ (nucleic acid database) ; http://prion.bchs.uh.edu/bp_type/ (non-canonical base pairs) • RNA structure classification: http://scor.berkeley.edu/ • RNA visualisation: http://ndbserver.rutgers.edu/services/download/index.html#rnaview ; http://rutchem.rutgers.edu/~xiangjun/3DNA/

  48. Reading Assignments • Suggested reading: Chapter 14 in “Current Topics in Computational Molecular Biology”, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press, 2002. • Optional reading: http://www.bioinfo.rpi.edu/~zukerm/seqanal/mfold-3.0-manual.pdf
