Folding RNA: a confluence of biology, mathematics, and physics

A. Zee, Institute for Theoretical Physics, University of California, Santa Barbara, CA, USA

Parisi Fest, Rome, 2008 (from http://chimera.roma1.infn.it/GIORGIO/indexhome.htm)

Unhappily, I have co-authored only one paper with Giorgio Parisi (thanks to Marc Mezard): M. Mezard, G. Parisi, & A. Zee, “Spectra of Euclidean Random Matrices,” Nuclear Physics B559 (3), pp. 689-701, 1999. But certainly, in my work with E. Brezin, M. Kardar, and others, I am constantly invoking and quoting Parisi’s classic work, such as the BIPZ paper on random matrix theory.

Field Theorists Fold RNA
An international collaboration with:
• Henri Orland
• Matt Pillsbury
• Anthony Taylor
• Graziano Vernizzi
• Paolo Ribeca
• Michael Bon
(It started with a sabbatical year spent by H. Orland at KITP.)
We exploit topological concepts embedded in large N matrix field theory to attack an important problem in biology: the folding of RNA.

Think of beads on a string.
DNA consists of two complementary strands twisted into a double helix, with attraction between C and G and between A and T.
Our parents worked hard to give us DNA, a message written in an alphabet with 4 letters: C, G, A, T.

“Central Dogma of Biology” (Watson & Crick): DNA → RNA → protein
• DNA: information storage ~ hard disk
• RNA ~ floppies to give to your friends
• Protein: function (enzyme, motor, muscle, …)
“You don’t want to mess with your hard disk, but you can have as many copies of RNA as you want.”

RNA
• Much, much, much shorter than DNA
• A message written in a 4-letter alphabet: C, G, A, U
• Attraction between C and G and between A and U
• Hydrogen bonds saturate: once a C is paired with a G, it does not pair with another G
• Think of beads on a long chain connected by rigid rods. Saturation of the hydrogen bonds means that each bead can be glued at most once.

In each one of our cells, stretches of DNA open up & information is copied onto RNA

RNA is single stranded, unlike DNA. Because of the attraction between C and G and between A and U, once RNA floats free, it folds into a definite shape. A different message gives a different shape.
RNA folding problem: given the message, what is its shape? Why is this important?

Watson & Crick thought that RNA is “merely” a “passive carrier of information” from DNA to protein (“messenger RNA”).
Revenge of RNA: an important discovery of the last 15 years or so is that RNA plays a crucial enzymatic role. The biochemistry is incredibly complicated, but to a first approximation what matters is the shape of the molecule (“lock and key”).
By the way, biologists working on the origin of life now generally believe that it started with an “RNA world”; DNA came later.

Yeast tRNA: 40% overlap with the corresponding human tRNA (“common metabolism”).
[Figure: mutant form of U; sequence space vs. shape space]

Secondary structure: small subunit ribosomal RNA of Escherichia coli
[Figure: C, G, A, U nucleotides, with “hairpin” and “cloverleaf” motifs labeled]

[Figure: examples of basic RNA secondary structure motifs (spacefill view, 3D secondary structure, secondary structure motifs): PDB 283d, PDB 405d, PDB 1e4p, PDB 1r7w, PDB 1kh6]

[Figure: beads on a chain, with arcs (“glue”) joining paired beads]
• Planar as shown: two “hairpins”
• But if we glue 4 to 3, the diagram is no longer planar
• Rigidity constraint: the chain is not infinitely flexible, so gluing 1 and 2 is not allowed
• If we glue 5 and 1, we get “kissing hairpins”
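The gluing rules can be written as simple checks on a list of arcs. A Python sketch, assuming sites numbered in order along the chain and a hypothetical rigidity cutoff `min_sep = 2` (so that neighbours such as sites 1 and 2 cannot be glued); the example arcs use this numbering and are not read off the figure.

```python
from itertools import combinations


def arcs_cross(a, b):
    """True if arcs a = (i, j) and b = (k, l) interleave, i.e. they cannot
    both be drawn on the same side of the chain without crossing."""
    i, j = sorted(a)
    k, l = sorted(b)
    return i < k < j < l or k < i < l < j


def is_allowed(arcs, min_sep=2):
    """Saturation: each site is glued at most once.
    Rigidity (hypothetical cutoff): glued sites are at least min_sep apart."""
    sites = [s for arc in arcs for s in arc]
    if len(sites) != len(set(sites)):
        return False
    return all(abs(i - j) >= min_sep for i, j in arcs)


def is_planar(arcs):
    """Secondary structure: no two arcs cross."""
    return not any(arcs_cross(a, b) for a, b in combinations(arcs, 2))


hairpins = [(1, 4), (5, 8)]          # two side-by-side hairpins
kissing = [(1, 4), (5, 8), (3, 6)]   # extra arc linking the two hairpin loops
print(is_allowed(hairpins), is_planar(hairpins))  # True True
print(is_allowed(kissing), is_planar(kissing))    # True False
print(is_allowed([(1, 2)]))                       # False: neighbours cannot be glued
```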

Statistical mechanics: gluing the nucleotide at site i to the nucleotide at site j gives a contribution V_ij to the partition function. V contains the Boltzmann factor, including the rigidity constraint, etc. V carries the information about the specific sequence we are studying. There is also a literature on random sequences.

$$Z = 1 + \sum V_{ij} + \sum V_{ij}V_{kl} + \sum V_{ik}V_{jl} + \sum V_{ik}V_{jl}V_{mn} + \cdots + \sum VV\cdots V$$
These look like Feynman diagrams: a quark propagating along, emitting and absorbing gluons. Just list all ~L! possibilities.
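The sum can be written out literally for a short chain. A minimal Python sketch, assuming a hypothetical Boltzmann matrix in which complementary bases (C-G, A-U) at least `min_sep` sites apart get weight `weight` and all other pairs get 0; these numbers are placeholders, not measured energies. Every pairing, crossing or not, is included, exactly as in the series for Z.

```python
import math

# Watson-Crick pairs only, as on the slides; wobble pairs are ignored here.
PAIRS = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}


def boltzmann_matrix(seq, weight=2.0, min_sep=2):
    """Hypothetical V: V[i][j] = weight if bases i and j are complementary
    and at least min_sep apart along the chain (rigidity), else 0."""
    L = len(seq)
    return [[weight if (seq[i], seq[j]) in PAIRS and abs(i - j) >= min_sep else 0.0
             for j in range(L)] for i in range(L)]


def partial_matchings(sites):
    """Yield every set of disjoint pairs (i, j), i < j, on the given sites,
    so that each site is glued at most once."""
    if not sites:
        yield []
        return
    first, rest = sites[0], sites[1:]
    yield from partial_matchings(rest)            # first site left unpaired
    for idx, partner in enumerate(rest):          # first site glued to partner
        remaining = rest[:idx] + rest[idx + 1:]
        for m in partial_matchings(remaining):
            yield [(first, partner)] + m


def partition_function(V):
    """Z = 1 + sum V_ij + sum V_ij V_kl + ..., one term per pairing."""
    sites = list(range(len(V)))
    return sum(math.prod(V[i][j] for i, j in m) for m in partial_matchings(sites))


print(partition_function(boltzmann_matrix("GGCC")))  # 11.0
```

With every V_ij = 1 the same function just counts pairings (10 for L = 4), and the count explodes so quickly that explicit enumeration is hopeless beyond a couple of dozen sites.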

Mathematically, better to glue the two ends to form a circle

Mathematicians do not like to draw on a flat piece of paper; they prefer manifolds without boundary, so they “compactify” the plane into a sphere, with no boundary at infinity. Put a hole in the sphere to represent the RNA chain: a “punctured sphere”. Sphere: genus g = 0.

Physicists: lines cross.
Mathematicians: on what kind of surface (what genus?) can we draw this without the lines crossing?
“Punctured torus”: a torus (doughnut, bagel) with a hole put into it. Torus: genus g = 1.
Different meaning in biology and mathematics!

Genus
• Mathematics (topology): how many handles? (A better word than holes.)
• Biology (taxonomy): a group of species, e.g. Homo, Pan.
A joke about a conference in mathematical biology

Two handles

Biologists call planar diagrams, i.e. diagrams that can be drawn on a sphere without crossing, secondary structures. They call diagrams that can be drawn on a torus without crossing tertiary structures. Experimentally, the relative importance of tertiary to secondary structures can be dialed by varying the Mg++ ion concentration (this has to do with screening due to Mg++).
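Because secondary structures are exactly the planar diagrams, their contribution to Z can be summed by a recursion instead of enumeration. A sketch of the standard non-crossing recursion (a bare-bones relative of McCaskill-type algorithms, with no stacking or loop energies), assuming a user-supplied matrix V of pair weights as in the partition function Z; `secondary_partition_function` is a hypothetical name.

```python
from functools import lru_cache


def secondary_partition_function(V):
    """Sum over planar (non-crossing) pairings only, i.e. secondary structures:
    Z(i, j) = Z(i+1, j) + sum_k V[i][k] * Z(i+1, k-1) * Z(k+1, j)."""
    L = len(V)

    @lru_cache(maxsize=None)
    def Z(i, j):
        if i >= j:                      # empty or single-site segment
            return 1.0
        total = Z(i + 1, j)             # site i left unpaired
        for k in range(i + 1, j + 1):   # site i glued to site k; since arcs cannot
            if V[i][k]:                 # cross, inside and outside factorize
                total += V[i][k] * Z(i + 1, k - 1) * Z(k + 1, j)
        return total

    return Z(0, L - 1)


ones = [[1.0] * 5 for _ in range(5)]
print(secondary_partition_function(ones))  # 21.0
```

With all V_ij = 1 this counts non-crossing pairings (the Motzkin numbers): 9 for L = 4, against 10 pairings in total, the missing one being the crossing pair (1,3)(2,4). The memoized recursion costs O(L^3) and, as written, is limited by Python's recursion depth to a few hundred sites.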

[Figure: hairpin (PDB 1a51) and H-pseudoknot (PDB 1a60)]

More RNA pseudoknots

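Euler’s Theorem
Tetrahedron: F = 4, V = 4, E = 6, so F + V - E = 4 + 4 - 6 = 2.
A move that adds two faces, one vertex, and three edges (e.g. a new vertex inside a triangular face, joined to its three corners) has ΔF + ΔV - ΔE = 2 + 1 - 3 = 0.
Hence F + V - E = 2 is preserved.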
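For surfaces with handles the same counting generalizes to F + V - E = 2 - 2g. A short worked derivation, assuming the standard bookkeeping in which the RNA chain is closed into a circle and then shrunk to a single vertex, so that the P pairs become the edges and the faces are the index loops (“colors”); it recovers the genus formula g = (P - L)/2 quoted in the talk, with L there meaning the number of loops.

```latex
\begin{aligned}
F + V - E &= 2 - 2g &&\text{(closed orientable surface of genus } g\text{)}\\
F + 1 - P &= 2 - 2g &&(V = 1 \text{ after shrinking the closed backbone},\ E = P)\\
g &= \frac{P + 1 - F}{2} = \frac{P - \#\text{loops}}{2} &&(\#\text{loops} = F - 1 = \#\text{colors} - 1)
\end{aligned}
```

For example, a single hairpin arc has P = 1 and F = 2, giving g = 0, while two crossing arcs (ABAB) have P = 2 and F = 1, giving g = 1.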

Now theoretical physicists can exploit our knowledge of random matrix theory and large N quantum chromodynamics to formulate the RNA folding problem, and have fun! Random matrix theory was invented by Wigner (a Nobel laureate) in the mid-1950s to study nuclear physics, but its growth and ramifications over the last 50 years have been beyond the wildest dreams of Wigner and his friends, with implications and uses in diverse fields, e.g. disordered materials in condensed matter physics, pure mathematics, operator algebra, M-theory (supposed to contain string theory), and, more recently, econophysics.

Hollywood already knows! The matrix secretly rules the universe.

Matrix theory representation of Z
• L matrices, one per site (the “flavor” or site index), living in 1d; their matrix (“color”) indices run over N values
• A non-translation-invariant Gaussian field theory with a non-trivial observable
• Wick contraction reproduces Z as given earlier
• Expansion in 1/N gives us a systematic procedure for extracting secondary structure, tertiary structure, etc.
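The 1/N bookkeeping can be mimicked by brute force on short chains: enumerate all pairings, weight each by its product of V's, and group by genus. A sketch assuming the normalization in which a genus-g diagram carries a factor N^(-2g), so that N = 1 restores the full Z and N → ∞ keeps only secondary structures; `genus_expansion` and `Z_of_N` are hypothetical names, and the genus comes from Euler's relation g = (P + 1 - F)/2 with the backbone closed into a circle. This checks the combinatorics only; it does not evaluate the matrix integral.

```python
import math
from collections import defaultdict


def partial_matchings(sites):
    """Yield every set of disjoint pairs (i, j), i < j, on the given sites."""
    if not sites:
        yield []
        return
    first, rest = sites[0], sites[1:]
    yield from partial_matchings(rest)
    for idx, partner in enumerate(rest):
        for m in partial_matchings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + m


def genus(arcs):
    """Genus via Euler's relation on the one-vertex fat graph obtained by
    closing the backbone into a circle: g = (P + 1 - F) / 2."""
    if not arcs:
        return 0
    ends = sorted(s for arc in arcs for s in arc)
    pos = {s: n for n, s in enumerate(ends)}      # relabel arc ends 0..2P-1
    n = len(ends)
    partner = [0] * n
    for i, j in arcs:
        partner[pos[i]], partner[pos[j]] = pos[j], pos[i]
    seen, faces = [False] * n, 0
    for start in range(n):                          # faces = cycles of h -> partner(h) + 1
        if not seen[start]:
            faces += 1
            h = start
            while not seen[h]:
                seen[h] = True
                h = (partner[h] + 1) % n
    return (len(arcs) + 1 - faces) // 2


def genus_expansion(V):
    """Coefficients Z_g with Z(N) = sum_g Z_g * N**(-2g) (assumed normalization)."""
    coeffs = defaultdict(float)
    for m in partial_matchings(list(range(len(V)))):
        coeffs[genus(m)] += math.prod(V[i][j] for i, j in m)
    return dict(sorted(coeffs.items()))


def Z_of_N(V, N):
    return sum(c * N ** (-2 * g) for g, c in genus_expansion(V).items())


ones = [[1.0] * 6 for _ in range(6)]
print(genus_expansion(ones))  # {0: 51.0, 1: 25.0}
```

For six sites with every V_ij = 1, the 76 pairings split into 51 planar (secondary) ones and 25 of genus 1; genus 2 first appears at eight sites.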

Trouble is, N is implicit in the size of the matrices.
But field theorists have a bag of tricks to massage the integral and render the N dependence explicit:
• Write the product as the inverse of an operator
• Write the inverse as an integral over fermionic variables
• Integrate out φ to get an interacting 4-fermion theory
• Fierz transform to write it in terms of color singlets
• Introduce a matrix field A to couple to the color singlets
Now N is explicit and we can do the large N expansion.
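The first trick, writing a product as the inverse of an operator, can be checked numerically. A sketch assuming the product is the ordered product of the L factors (1 + φ_i) and using one standard construction, a block-bidiagonal operator on L + 1 sites; this is not necessarily the exact operator of the papers, the random real matrices are stand-ins for the φ_i, and the later steps are not attempted. `product_via_inverse` is a hypothetical name.

```python
import numpy as np


def product_via_inverse(mats):
    """Ordered product A_1 A_2 ... A_L read off as the (0, L) block of M^{-1}.

    M = 1 - U, where U holds A_1, ..., A_L on the first block superdiagonal.
    U is nilpotent (U^{L+1} = 0), so M^{-1} = 1 + U + ... + U^L, and only
    U^L connects block 0 to block L; that block of U^L is A_1 ... A_L.
    """
    L, n = len(mats), mats[0].shape[0]
    M = np.eye((L + 1) * n, dtype=np.result_type(*mats))
    for i, A in enumerate(mats):
        M[i * n:(i + 1) * n, (i + 1) * n:(i + 2) * n] = -A
    return np.linalg.inv(M)[:n, L * n:]


rng = np.random.default_rng(0)
N, L = 3, 4
phis = [rng.standard_normal((N, N)) for _ in range(L)]   # stand-ins for the phi_i
factors = [np.eye(N) + phi for phi in phis]                # the factors (1 + phi_i)
direct = np.linalg.multi_dot(factors)
print(np.allclose(product_via_inverse(factors), direct))   # True
```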

Pillsbury, Orland, & Zee, Phys. Rev. E72 showed that there are 8 topologies of pseudoknots with genus 1. In an obvious pairing notation: ABAB, ABACBC, ABCABC, ABCADBCD, plus 4 obtained by nesting, as shown in the following figure.
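The genus of each pattern in this list can be checked mechanically. A sketch with a hypothetical helper `genus_of_pattern` that reads the pairing notation, closes the backbone into a circle, counts the boundary cycles F of the resulting one-vertex fat graph, and applies Euler's relation g = (P + 1 - F)/2. It only computes genus; it does not test irreducibility or enumerate the 8 topologies.

```python
from collections import Counter


def genus_of_pattern(pattern):
    """Genus of an arc diagram in pairing notation, e.g. "ABAB".

    Closing the backbone into a circle gives a one-vertex fat graph with
    P = len(pattern) // 2 edges; Euler's relation then gives
    g = (P + 1 - F) / 2, where F counts its boundary cycles (faces).
    """
    if any(count != 2 for count in Counter(pattern).values()):
        raise ValueError("each letter must appear exactly twice")
    n = len(pattern)
    if n == 0:
        return 0
    first_seen, partner = {}, [0] * n
    for pos, letter in enumerate(pattern):
        if letter in first_seen:
            partner[pos], partner[first_seen[letter]] = first_seen[letter], pos
        else:
            first_seen[letter] = pos
    seen, faces = [False] * n, 0
    for start in range(n):             # faces = cycles of h -> partner(h) + 1 (mod n)
        if not seen[start]:
            faces += 1
            h = start
            while not seen[h]:
                seen[h] = True
                h = (partner[h] + 1) % n
    return (n // 2 + 1 - faces) // 2


for p in ["ABAB", "ABACBC", "ABCABC", "ABCADBCD"]:
    print(p, genus_of_pattern(p))        # each of these prints genus 1
print("ABBA", genus_of_pattern("ABBA"))  # nested arcs: genus 0
```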

There are only 8 types of irreducible pseudoknots with genus g = 1.
[Figure: the 8 types, annotated as “quite common in RNA databases”, “very rare”, and “not yet reported”]
Encouragement to experimentalists! (e.g., Prof. A. Mondragon, Northwestern University, Evanston, IL)

Recent progress: a focus on massive Monte Carlo, with the numerical work largely carried out by Graziano Vernizzi et al. See Bon, Vernizzi, Orland, & Zee, q-bio.BM/0607032 (21 July 2006).
Caution to particle and quantum field theorists:
• In this paper, #Loops = #Colors - 1, and genus g = (P - L)/2, where P is the number of pairs and L = #Loops
• But the concepts of reducibility, primitive diagrams, etc. are the same as in Dyson

More progress: we (this means the young people!) analyzed the two main databases for RNA:
• PSEUDOBASE
• wwPDB (Worldwide Protein Data Bank; it contains some RNA)

PDB 1vou: 70S ribosome from Escherichia coli (primitive arc structure). Genus 7, length 2825: a pseudoknot with genus 1, decorated with 6 simple H- and K-pseudoknots.

As a theoretical benchmark, we solved analytically a simple model (V taken to be 1 except for the diagonal elements, which are non-zero as required to regulate the calculation):
• Vernizzi, Orland, & Zee, Phys. Rev. Letters 94 (2005) (generalized Laguerre polynomials vs. the Kazakov method)
• Asymptotically, genus ~ 0.25 L. For L = 2000, g ~ 500.
• Including steric constraints reduces this to g ~ 280 (Vernizzi, Ribeca, Orland, & Zee, Phys. Rev. E73 (2006)).
• Interestingly, Nature reduces the possible topology. (Our analysis shows that most structures are built from very simple primitive blocks, of genus 1 or 2, nested inside more complex pseudoknots, of genus typically smaller than 8.)
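The benchmark can also be explored by Monte Carlo. A sketch assuming that, at N = 1 with every V_ij = 1, all partial pairings are equally weighted (the regularizing diagonal elements are ignored), so the mean genus is an average over uniformly random pairings. `uniform_partial_matching` and `mean_genus` are hypothetical helpers; the sampler uses the recursion I(n) = I(n - 1) + (n - 1) I(n - 2) for the number of pairings. It makes no claim to reproduce the analytic ~0.25 L result and ignores steric constraints.

```python
import random


def uniform_partial_matching(L, rng):
    """Draw one pairing uniformly from all partial matchings of L sites."""
    # r[n] = I(n-1) / I(n) for I(n) = I(n-1) + (n-1) I(n-2), kept as ratios
    # so that large L does not produce huge integers.
    r = [1.0] * (L + 1)
    for n in range(2, L + 1):
        r[n] = 1.0 / (1.0 + (n - 1) * r[n - 1])
    free, arcs = list(range(L)), []
    while free:
        n = len(free)
        site = free.pop()                         # last remaining free site
        if n == 1 or rng.random() < r[n]:
            continue                              # leave it unpaired
        partner = free.pop(rng.randrange(n - 1))  # glue it to a uniform partner
        arcs.append((partner, site))
    return arcs


def genus(arcs):
    """Euler's relation g = (P + 1 - F) / 2 on the backbone closed into a circle."""
    if not arcs:
        return 0
    ends = sorted(s for arc in arcs for s in arc)
    pos = {s: k for k, s in enumerate(ends)}
    n = len(ends)
    partner = [0] * n
    for i, j in arcs:
        partner[pos[i]], partner[pos[j]] = pos[j], pos[i]
    seen, faces = [False] * n, 0
    for start in range(n):
        if not seen[start]:
            faces += 1
            h = start
            while not seen[h]:
                seen[h] = True
                h = (partner[h] + 1) % n
    return (len(arcs) + 1 - faces) // 2


def mean_genus(L, samples=200, seed=0):
    rng = random.Random(seed)
    return sum(genus(uniform_partial_matching(L, rng)) for _ in range(samples)) / samples


print(mean_genus(2000) / 2000)  # Monte Carlo estimate of <g>/L at L = 2000
```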

Recent paper: Bon, Vernizzi, Orland, & Zee, Journal of Molecular Biology, vol. 379 (2008), p. 900: an explanatory review written for biologists.

An amusing footnote:
• A classic problem in combinatorial optimization is the bipartite matching problem, known as the “marriage problem”
• Given a “happiness” (energy < 0) when m_i and w_j are matched, minimize the total energy
• Studied by Knuth, Nieuwenhuizen, Orland, Mezard & Parisi, Y.-C. Zhang, …
• Obviously, the N = 1 limit of our RNA folding representation!
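The marriage problem is small enough to state in a few lines. A brute-force sketch with a hypothetical helper `best_marriage` that tries every one-to-one assignment and keeps the one of lowest total energy; the energy matrix is made up for illustration, and the n! cost makes this a definition by example only.

```python
from itertools import permutations


def best_marriage(energy):
    """energy[i][j] < 0 is the 'happiness' of matching m_i with w_j.
    Returns (minimum total energy, assignment) with assignment[i] = j."""
    n = len(energy)
    best = min(permutations(range(n)),
               key=lambda p: sum(energy[i][p[i]] for i in range(n)))
    return sum(energy[i][best[i]] for i in range(n)), list(best)


E = [[-1.0, -5.0, -2.0],
     [-4.0, -2.0, -3.0],
     [-2.0, -1.0, -6.0]]
print(best_marriage(E))  # (-15.0, [1, 0, 2])
```

For realistic sizes one would use a polynomial-time method such as the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, which minimizes the summed cost of a one-to-one assignment.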
