Mathematical Model of the Genetic Code: Structure and Applications

Antonino Sciarrino, Università di Napoli “Federico II” and INFN, Sezione di Napoli. TAG 2006, Annecy-le-Vieux, 9 November 2006

Work in collaboration with Luc FRAPPAT, Paul SORBA, Diego COCURULLO

SUMMARY
• Introduction
• Description of the model
• Applications: codon usage frequencies; DNA dimers free energy
• Work in progress

It is amazing that the complex biochemical relations between DNA and proteins were very quickly reduced to a mathematical model. Just a few months after the Watson-Crick discovery, G. Gamow proposed the “diamond code”.

Gamow “diamond code”. Gamow, Nature (1954). Nucleotides are denoted by the numbers 1, 2, 3, 4. Amino acids fit the rhomb-shaped “holes” formed by the 4 nucleotides ⇒ 20 a.a.!

Since 1954 many mathematical models of the genetic code have been proposed (based on information, thermodynamic, symmetry, topology… arguments). Weak point of the models: often poor explanatory and/or predictive power.

The genetic code

Crystal basis model of the genetic code. L. Frappat, A. Sciarrino, P. Sorba: Phys. Lett. A (1998). The 4 bases C, U/T (pyrimidines) and G, A (purines) are identified by a pair of “spin” labels (+ ≡ 1/2, − ≡ −1/2). Mathematically, C, U/T, G, A transform as the 4 basis vectors of the irrep. (1/2, 1/2) of $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$.

Crystal basis model of the genetic code • Dinucleotides are composite states (16 basis vectors of $(1/2, 1/2)^{\otimes 2}$) belonging to “sets” identified by two integer numbers $J_H$, $J_V$. In each “set” the dinucleotide is identified by two labels, $-J_H \le J_{H,3} \le J_H$ and $-J_V \le J_{V,3} \le J_V$. Ex. CU = (+,+) ⊗ (−,+) ($J_H = 0$, $J_{H,3} = 0$; $J_V = 1$, $J_{V,3} = 1$). Follows from a property of $U_{q \to 0}(sl(2))$.

DINUCLEOTIDE Representation Content

Crystal basis model of the genetic code • Codons are composite states (64 basis vectors of $(1/2, 1/2)^{\otimes 3}$) belonging to “sets” identified by half-integer $J_H$, $J_V$ (“set” ≡ irreducible representation = irrep.). Ex. CUA = (+,+) ⊗ (−,+) ⊗ (−,−) ($J_H = 1/2$, $J_{H,3} = -1/2$; $J_V = 1/2$, $J_{V,3} = 1/2$). Follows from a property of $U_{q \to 0}(sl(2))$.
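The labelling of composite states can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the assignments C = (+,+), U/T = (−,+), A = (−,−) are read off the CUA example, G = (+,−) is inferred as the one remaining combination, and the cancellation rule used to find J (a "+" followed by a "−" forms a singlet, as in bracket matching) is an assumed convention for the crystal tensor product that the slides do not spell out. The function names are hypothetical.

```python
from fractions import Fraction

# Spin labels (H, V) in units of 1/2. C, U/T and A are taken from the
# CUA example; G = (+, -) is an inference (the remaining combination).
LABELS = {"C": (+1, +1), "U": (-1, +1), "T": (-1, +1),
          "G": (+1, -1), "A": (-1, -1)}


def crystal_spin(signs):
    """Return (J, J3) for an ordered product of spin-1/2 crystal states.

    Assumed convention: a '+' followed by a '-' (after earlier
    cancellations) forms a singlet and is removed, like a matched pair
    of brackets. The signs left uncancelled carry the total spin J.
    """
    stack = []
    for s in signs:
        if s == -1 and stack and stack[-1] == +1:
            stack.pop()           # cancel a (+, -) pair
        else:
            stack.append(s)
    j = Fraction(len(stack), 2)
    j3 = Fraction(sum(signs), 2)  # third components simply add up
    return j, j3


def crystal_labels(seq):
    """Return (J_H, J_H3, J_V, J_V3) for a nucleotide string."""
    h = [LABELS[n][0] for n in seq]
    v = [LABELS[n][1] for n in seq]
    return crystal_spin(h) + crystal_spin(v)
```

Calling `crystal_labels("CUA")` or `crystal_labels("CU")` returns the four labels of a codon or a dinucleotide; the same function accepts a sequence of any length.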

Codons in the crystal basis

Codon usage frequency
• Synonymous codons are not used uniformly (codon bias)
• Codon bias (not fully understood) is ascribed to evolutionary-selective effects
• Codon bias depends on: biological species (b.sp.); sequence analysed; amino acid (a.a.) encoded; structure of the considered multiplet; nature of codon XYZ; …

Codon usage in Homo sap.

Our analysis deals with global codon usage, i.e. computed over all the coding sequences (exonic region) for the b.sp. of the considered specimen ⇒ to bring out possible general features of the standard eukaryotic genetic code ascribable to its organisation and its evolution.

Let us define the codon usage probability for the codon XZN (X, Z, N ∈ {A, C, G, U (T in DNA)}): $P(XZN) = \lim_{N_{tot} \to \infty} n_{XZN} / N_{tot}$, where $n_{XZN}$ is the number of times the codon XZN is used in the processes and $N_{tot}$ is the total number of codons in the same processes. For fixed XZ, normalization: $\sum_N P(XZN) = 1$. Note - Sextets are considered as quartets + doublets ⇒ 8 quartets.
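As a finite-sample illustration of this definition, the following Python sketch estimates P(XZN) from a list of observed codons. It assumes that N_tot is read as the number of codons within the quartet XZ, which is what the normalization over N implies; the function name and the toy data are hypothetical.

```python
from collections import Counter


def quartet_probabilities(codons, xz):
    """Estimate P(XZN), N in {A, C, G, U}, for a fixed dinucleotide XZ.

    Counts are normalised within the quartet (assumed reading of
    N_tot), so the four probabilities sum to 1.
    """
    counts = Counter(c for c in codons if c[:2] == xz)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no codon starting with {xz!r} was observed")
    return {n: counts[xz + n] / total for n in "ACGU"}


# Toy data, not taken from any database.
observed = ["CUA", "CUA", "CUG", "CUC", "GGA"]
probabilities = quartet_probabilities(observed, "CU")
```

With real data the list of codons would come from all coding sequences of one biological species, in line with the global codon usage considered here.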

Def. - Correlation coefficient $r_{XY}$ for two variables $X \equiv P(..X)$, $Y \equiv P(..Y)$
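The formula on this slide did not survive in the transcript. Assuming the standard Pearson definition is meant (an assumption, not confirmed by the text), the coefficient can be computed as in the sketch below, where each list holds one value per biological species; the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Values close to +1 or −1 indicate strong correlation or anticorrelation; values near 0 indicate no linear relation.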

Specimen (GenBank Release 149.0, 09/2005; N_codons > 100,000)
• 26 VERTEBRATES
• 28 INVERTEBRATES
• 38 PLANTS
• TOTAL - 92 biological species

Correlation coefficient VERTEBRATES

Correlation coefficient PLANTS

Correlation coefficient INVERTEBRATES

Averaged value of P(..N)

Averaged value of the sum of two correlated P(N)

Ratios of $\sigma^2_{obs}(X+Y)$ and $\sigma^2_{th}(X+Y) = \sigma^2_{obs}(X) + \sigma^2_{obs}(Y)$, averaged over the 8 a.a., for the sum of two codon probabilities
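The comparison rests on the identity Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X, Y): for uncorrelated variables the ratio is 1, and a ratio well below 1 signals anticorrelation. A minimal sketch, assuming population variances are used (the slides do not say which estimator was taken) and with a hypothetical function name:

```python
from statistics import pvariance


def variance_ratio(xs, ys):
    """Observed variance of X+Y over the sum of the separate variances."""
    sums = [x + y for x, y in zip(xs, ys)]
    return pvariance(sums) / (pvariance(xs) + pvariance(ys))
```

Here `xs` and `ys` would hold, for one quartet, the values of two codon probabilities across the biological species of the specimen.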

⇒ Indication of correlation between the codon usage probabilities P(A) and P(C) (and between P(U) and P(G)) for quartets.

Correlation between codon probabilities for different a.a. • Correlation coefficients between the 28 couples P(XZN)-P(X’Z’N), where XZ (X’Z’) specify the 8 quartets. The following pattern emerges for the whole eukaryote specimen (n = 92)

The set of 8 quartets splits into 3 subsets
• 4 a.a. with correlated codon usage (Ser, Pro, Ala, Thr)
• 2 a.a. with correlated codon usage (Leu, Val)
• 2 a.a. with generally uncorrelated codon usage (Arg, Gly)

Statistical analysis ⇒
• Correlation for P(XZA)-P(XZC), XZ ∈ quartets
• Correlation for P(N) between {Ser, Pro, Thr, Ala} and {Leu, Val}
The observed correlations fit well in the mathematical scheme of the crystal basis model of the genetic code

In the crystal basis model P(XYZ) can be written as a function of

ASSUMPTION

SUM RULES: K INDEPENDENT OF THE b.sp., XZ ∈ QUARTETS

SUM RULES ⇒ “Theoretical” correlation matrix, XZ = NC, CG, GG, CU, GU

Observed averaged value of the correlation matrix; in red the theoretical value

Shannon Entropy. Let us define the Shannon entropy for the amino acid specified by the first two nucleotides XZ (8 quartets)
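The defining formula is missing from the transcript. Assuming the usual form $S_{XZ} = -\sum_N P(XZN) \log_2 P(XZN)$ (the base-2 logarithm is an assumption), the entropy of one quartet can be sketched as follows; the function name is hypothetical.

```python
import math


def quartet_entropy(probs):
    """Shannon entropy, in bits, of the four probabilities P(XZN).

    Terms with zero probability contribute nothing, by the usual
    convention 0 * log 0 = 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A uniformly used quartet gives the maximum of 2 bits, while a quartet in which a single codon is always used gives 0.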

Shannon Entropy. Using the previous expression for P(XZN) we get, with …_N ≡ …(XZN), $H_{bs}^{N} \equiv H_{bs}(XZN)$, $P_N \equiv P(XZN)$ ⇒ $S_{XZ}$ largely independent of the b.sp.

Shannon Entropy

DNA dinucleotide free energy. Free energy for a pair of nucleotides, e.g. GC, lying on one strand of DNA, coupled with the complementary pair, CG, on the other strand. CG from 5’ → 3’ is correlated with GC from 3’ → 5’.
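The pairing between a dimer and its partner on the opposite strand can be illustrated with a short sketch, assuming standard Watson-Crick complementarity and antiparallel strands. Under that assumption the 16 dinucleotides group into 10 distinct stacked pairs. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_dimer(dimer):
    """Dimer on the opposite strand, read 5' -> 3' (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(dimer))


def stacked_pairs():
    """Group the 16 dimers into classes {dimer, partner on other strand}."""
    classes = set()
    for first in "ACGT":
        for second in "ACGT":
            dimer = first + second
            classes.add(tuple(sorted({dimer, partner_dimer(dimer)})))
    return sorted(classes)
```

For GC the partner read 5’ → 3’ is again GC, which is CG when read 3’ → 5’, matching the example in the text.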

DINUCLEOTIDE Representation Content

SUM RULES for FREE ENERGY

Comparison with exp. data: ΔG in kcal/mol

DINUCLEOTIDE Distribution

Comparison with experimental data

Work in progress and future perspectives. From the correspondence {C, U/T, G, A} ⇔ I.R. (1/2, 1/2) of $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$ ⇒ any ordered sequence of N nucleotides ⇔ vector of an I.R. ⊂ $(1/2, 1/2)^{\otimes N}$ of $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$ ⇒ new parametrisation of nucleotide sequences

“Spin” parametrisation

Algorithm for the “spin” parametrisation of an ordered n-nucleotide sequence

From this parametrisation:
• Alternative construction of a mutation model, where the mutation intensity does not depend on the Hamming distance between the sequences, but on the change of the “labels” of the “sets”. C. Minichini, A.S., Biosystems (2006)
• Characterization of particular sequences (exons, introns, promoter, 5’ or 3’ UTR sequences, …). L. Frappat, P. Sorba, A.S., L. Vuillon, in progress
