
Mathematics Population Genetics. Introduction to the Stochastic Theory

Mathematics Population Genetics. Introduction to the Stochastic Theory. Guanajuato March 2009 Warren J Ewens.



Presentation Transcript


  1. Mathematics Population Genetics. Introduction to the Stochastic Theory. Guanajuato, March 2009. Warren J Ewens

  2. Genes are of different types (= different “alleles” = different colors). We assume initially that at the gene locus of interest there are only two possible alleles, usually denoted (and denoted in the handout notes) as A1 and A2. To be colorful, in both senses of the word, we sometimes refer to these as the “red” allele and the “green” allele respectively.

  3. The individual shown is A1A2 (= red / green). The other two possibilities are (of course) A1A1 (=red / red) and A2A2 (= green / green). We next consider the entire population (of genes) at this locus, and discuss the evolution of the A1 and A2 allelic frequencies. Although these lectures (and slides) concern the stochastic theory of population genetics, we first consider (briefly) some simple aspects of the deterministic theory.

  4. Hardy-Weinberg frequencies

     Genotype:      A1A1      A1A2       A2A2
     Frequencies:   x^2       2x(1-x)    (1-x)^2    (eqn. (6))
     Fitnesses:     w11       w12        w22        (eqn. (8))
     or             1 + s     1 + sh     1          (eqn. (9))
     or             1 - s1    1          1 - s2     (eqn. (10))

  5. x' - x ≈ s x(1-x){x + h(1-2x)} (eqn. (11))
     dx/dt ≈ s x(1-x){x + h(1-2x)} (eqn. (12))
     (eqn. (13))
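The approximation in eqn. (11) can be checked numerically. The sketch below assumes the standard deterministic recursion x' = x{w11 x + w12 (1-x)} / w̄ with the fitnesses 1 + s, 1 + sh, 1 of eqn. (9); that recursion is not written out on the slide, and the function names and parameter values are purely illustrative.

```python
def next_frequency(x, s, h):
    """One generation of selection, assuming Hardy-Weinberg genotype
    frequencies and fitnesses 1 + s, 1 + sh, 1 (assumed standard recursion)."""
    w11, w12, w22 = 1 + s, 1 + s * h, 1.0
    mean_fitness = w11 * x * x + w12 * 2 * x * (1 - x) + w22 * (1 - x) ** 2
    return x * (w11 * x + w12 * (1 - x)) / mean_fitness


def approximate_change(x, s, h):
    """Right-hand side of eqn. (11): s x (1 - x) {x + h (1 - 2x)}."""
    return s * x * (1 - x) * (x + h * (1 - 2 * x))


# Illustrative values only: weak selection, no dominance.
x, s, h = 0.3, 0.01, 0.5
exact_change = next_frequency(x, s, h) - x
approx_change = approximate_change(x, s, h)
print(exact_change, approx_change)
```

The two changes differ only by the factor 1 / w̄, which is close to 1 when s is small.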

  6. Markov chain theory. Standard results are given in the notes: equation (20), absorption probabilities; equation (21), mean absorption times; equations (24)-(28), conditional processes; equation (32), stationary distribution; equation (34), reversibility.

  7. We use Markov chain theory to discuss the case where random changes in these frequencies occur from one generation to the next. We first consider the cases where there are no complicating features such as selection, mutation, two sexes, etc. Even for this very simple situation, there are MANY possible stochastic models describing these changes (with greater or lesser accuracy). The first one that we consider is the “simple” Wright-Fisher model. This is a model of pure binomial sampling. It assumes a diploid population size that is constant over time at the value N, with non-overlapping generations, and no complicating features.

  8. Since only two alleles (A1 and A2) are allowed, and since the population size is assumed to be constant (= N individuals = 2N genes), it is sufficient to focus on the number of A1 genes in any generation. In generation t, this number is denoted by X(t). The number of A2 (“green”) genes in generation t is then automatically 2N - X(t). The binomial random sampling assumption implies that the Markov chain model for the number of “red” genes in the population is as shown on the following slide.

  9. The “simple” Wright-Fisher model (eqn. (35))
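The equation on this slide did not survive in the transcript. As a minimal sketch, assuming the usual binomial form p_ij = C(2N, j) (i/2N)^j (1 - i/2N)^(2N-j) implied by "pure binomial sampling", the transition matrix can be built exactly with rational arithmetic. The function name and the choice 2N = 4 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def wright_fisher_matrix(two_n):
    """Transition matrix of the simple Wright-Fisher model, assuming
    p_ij = C(2N, j) (i/2N)^j (1 - i/2N)^(2N - j), as exact fractions."""
    rows = []
    for i in range(two_n + 1):
        p = Fraction(i, two_n)  # current frequency of A1 genes
        rows.append([comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
                     for j in range(two_n + 1)])
    return rows


P = wright_fisher_matrix(4)  # 2N = 4 genes, purely illustrative
for row in P:
    print([str(entry) for entry in row])
```

Rows 0 and 2N put all their mass on the diagonal, which is why the states "all A2" and "all A1" are absorbing.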

  10. There are two absorbing states (corresponding to “all genes are A1” and “all genes are A2”). With probability 1, one or the other of these two states will eventually be entered, and “fixation” has then occurred. We can ask: (i) What is the probability that the “all A1” state is eventually entered? (ii) What is the mean number of generations until one of the absorbing states is entered? (iii) Given that eventually all genes are A1, what is the mean number of generations until this happens?

  11. The answer to question (i) is straightforward. Standard Markov chain theory shows that this probability depends on the initial number of A1 genes. If, for the different possible initial numbers i (i = 0, 1, 2, …, 2N), this probability is denoted by π_i, the set of values (π_0, π_1, π_2, …, π_{2N}) satisfies π_i = Σ_j p_ij π_j (i = 1, 2, …, 2N-1), with π_0 = 0 and π_{2N} = 1. Because binomial sampling gives Σ_j j p_ij = i, it is easy to see from this that π_i = i / (2N) (eqn. (36)). Thus the required probability is X(0) / (2N). This result can also be found using martingale arguments: see eqn. (37).
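The boundary-value system for the fixation probabilities can also be solved directly. The sketch below assumes the binomial Wright-Fisher transition probabilities and uses a small exact Gauss-Jordan solver. All helper names and the choice 2N = 6 are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction
from math import comb


def wright_fisher_matrix(two_n):
    """Binomial-sampling transition matrix (assumed form), exact fractions."""
    return [[comb(two_n, j) * Fraction(i, two_n) ** j
             * Fraction(two_n - i, two_n) ** (two_n - j)
             for j in range(two_n + 1)] for i in range(two_n + 1)]


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fixation_probabilities(two_n):
    """Solve pi_i = sum_j p_ij pi_j with pi_0 = 0 and pi_2N = 1."""
    P = wright_fisher_matrix(two_n)
    interior = range(1, two_n)
    A = [[(1 if i == j else 0) - P[i][j] for j in interior] for i in interior]
    b = [P[i][two_n] for i in interior]  # one-step jump straight to "all A1"
    return [Fraction(0)] + solve(A, b) + [Fraction(1)]


pi = fixation_probabilities(6)
print([str(value) for value in pi])
```

For 2N = 6 the solution agrees with i / (2N) at every starting state.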

  12. A more “genetic” way of getting this result is this: eventually all genes in the population will be descended from one gene in the parental generation. The probability that this is an A1 gene is, by symmetry, simply the initial proportion X(0) / 2N of A1 genes in the population. (Later we “time-reverse” this argument when considering the coalescent.)

  13. “Mean time” questions are much harder to answer, and to this day no exact answers are known. Early approaches to this problem centered around the eigenvalues of the Wright-Fisher transition matrix (see eqn. (38)): λ_0 = λ_1 = 1, λ_j = {(2N)(2N-1)…(2N-j+1)} / (2N)^j, j = 2, 3, …, 2N. In particular, λ_2 = 1 - 1/(2N).
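The eigenvalue formula can be verified exactly for a small population. The sketch below assumes the binomial Wright-Fisher transition matrix and checks that det(P - λ_j I) vanishes for each λ_j. The helper names and the choice 2N = 4 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def wright_fisher_matrix(two_n):
    """Binomial-sampling transition matrix (assumed form), exact fractions."""
    return [[comb(two_n, j) * Fraction(i, two_n) ** j
             * Fraction(two_n - i, two_n) ** (two_n - j)
             for j in range(two_n + 1)] for i in range(two_n + 1)]


def wf_eigenvalue(two_n, j):
    """lambda_j = (2N)(2N-1)...(2N-j+1) / (2N)^j, with lambda_0 = 1."""
    value = Fraction(1)
    for k in range(j):
        value *= Fraction(two_n - k, two_n)
    return value


def determinant(M):
    """Exact determinant by Gaussian elimination with row swaps."""
    M = [row[:] for row in M]
    n, det = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return det


two_n = 4
P = wright_fisher_matrix(two_n)
eigenvalues = [wf_eigenvalue(two_n, j) for j in range(two_n + 1)]
residuals = [determinant([[P[i][k] - (lam if i == k else 0)
                           for k in range(two_n + 1)]
                          for i in range(two_n + 1)])
             for lam in eigenvalues]
trace = sum(P[i][i] for i in range(two_n + 1))
print([str(lam) for lam in eigenvalues], [str(r) for r in residuals])
```

A zero determinant confirms that each value is an eigenvalue, and comparing the trace with the sum of the λ_j is a quick consistency check on multiplicities.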

  14. The right eigenvector corresponding to λ_2 is r_2' = (0, 1(2N-1), …, i(2N-i), …, 1(2N-1), 0). The left eigenvector is unknown; it is approximately (1, 1, 1, …, 1, 1, 1). This leads to p_ij^(n) ≈ C i(2N-i){1 - 1/(2N)}^n for large n.
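The right-eigenvector claim is easy to confirm exactly: with r_i = i(2N - i), the product P r should equal {1 - 1/(2N)} r. This is a small check under the assumed binomial transition probabilities, and the value 2N = 6 is illustrative.

```python
from fractions import Fraction
from math import comb

two_n = 6  # illustrative population of 2N = 6 genes

# Binomial-sampling transition matrix (assumed form), exact fractions.
P = [[comb(two_n, j) * Fraction(i, two_n) ** j
      * Fraction(two_n - i, two_n) ** (two_n - j)
      for j in range(two_n + 1)] for i in range(two_n + 1)]

r2 = [i * (two_n - i) for i in range(two_n + 1)]  # candidate eigenvector
lam2 = 1 - Fraction(1, two_n)                     # lambda_2 = 1 - 1/(2N)

P_r2 = [sum(P[i][j] * r2[j] for j in range(two_n + 1))
        for i in range(two_n + 1)]
is_eigenvector = all(P_r2[i] == lam2 * r2[i] for i in range(two_n + 1))
print(r2, is_eigenvector)
```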

  15. The Taylor series approach. (This is essentially the diffusion approximation approach; see later.) eqns. (41, 42, 43)

  16. Mean times: Taylor series approximation. eqns. (47,49,50)
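Although no closed-form answer is known, the mean absorption times for a small population can be found numerically by solving t_i = 1 + Σ_j p_ij t_j over the transient states. The sketch below does this exactly and compares the result with the standard diffusion expression -4N{p log p + (1-p) log(1-p)}, p = i/(2N). That expression is quoted here as an assumption, since the slide's own equations are not reproduced in the transcript, and all names and the value 2N = 20 are illustrative.

```python
from fractions import Fraction
from math import comb, log


def wright_fisher_matrix(two_n):
    """Binomial-sampling transition matrix (assumed form), exact fractions."""
    return [[comb(two_n, j) * Fraction(i, two_n) ** j
             * Fraction(two_n - i, two_n) ** (two_n - j)
             for j in range(two_n + 1)] for i in range(two_n + 1)]


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def mean_absorption_times(two_n):
    """Solve t_i = 1 + sum_j p_ij t_j over transient states; t_0 = t_2N = 0."""
    P = wright_fisher_matrix(two_n)
    interior = range(1, two_n)
    A = [[(1 if i == j else 0) - P[i][j] for j in interior] for i in interior]
    b = [Fraction(1)] * (two_n - 1)
    return [Fraction(0)] + solve(A, b) + [Fraction(0)]


def diffusion_mean_time(i, two_n):
    """Assumed diffusion approximation: -4N {p log p + (1-p) log(1-p)}."""
    p = i / two_n
    return -2 * two_n * (p * log(p) + (1 - p) * log(1 - p))


two_n = 20
exact = mean_absorption_times(two_n)
for i in (1, 5, 10):
    print(i, float(exact[i]), diffusion_mean_time(i, two_n))
```

Exact rational arithmetic keeps the comparison free of rounding error, at the cost of being practical only for small 2N.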

  17. Mean times with one initial A1 gene. eqns. (49) and (53)

  18. Conditional process (conditional on fixation of A1). eqns. (24,27,28)

  19. Conditional mean times. eqns. (59,60,61). Applying these to the Wright-Fisher model, we get

  20. One-way mutation: the Wright-Fisher model. eqn. (63)

  21. One-way mutation: Taylor series (= diffusion) approximation. eqns. (66), (67)

  22. Two-way mutation. eqns. (76), (77), (78)

  23. Homozygosity probability. The case = u = v. eqn. (79)

  24. The Cannings (exchangeable) model

  25. Suppose that in the Cannings model, we write X_t for the number of A1 genes in generation t. There will then be a transition matrix for X_t. The eigenvalues of this transition matrix are (eqn. (81)): λ_0 = 1, λ_j = E(y_1 y_2 y_3 ··· y_j), j = 1, 2, …, 2N, where y_i is the number of offspring genes left by the i-th gene of the parental generation. Here λ_1 ≥ λ_2 ≥ λ_3 ≥ … ≥ λ_{2N}. This is a very useful formula.
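As an illustration of the formula λ_j = E(y_1 y_2 ··· y_j), the Wright-Fisher model can be treated as a Cannings model in which each offspring gene picks its parent uniformly at random, so that (y_1, …, y_2N) is multinomial. The sketch below computes the expectations by brute-force enumeration for 2N = 4. The function name and this choice of model are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import prod


def cannings_eigenvalue_wf(two_n, j):
    """E(y_1 y_2 ... y_j) when each of the 2N offspring genes chooses its
    parent uniformly at random (Wright-Fisher as a Cannings model).
    Enumerates all (2N)^(2N) parent assignments, so only for tiny 2N."""
    total = 0
    for parents in product(range(two_n), repeat=two_n):
        y = [parents.count(k) for k in range(two_n)]  # offspring numbers
        total += prod(y[:j])  # empty product is 1 when j = 0
    return Fraction(total, two_n**two_n)


lambdas = [cannings_eigenvalue_wf(4, j) for j in range(5)]
print([str(lam) for lam in lambdas])
```

For this model the expectations reduce to (2N)(2N-1)…(2N-j+1) / (2N)^j, the Wright-Fisher eigenvalues of eqn. (38).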

  26. An example eqn(84)

  27. The Moran (birth-death) model eqns. (92,93,94)

  28. Mean sojourn times eqn. (97)

  29. Mean times to fixation or loss eqn. (98)

  30. Conditional mean times eqns. (99,100, 101)

  31. Largest non-unit eigenvalue and its eigenvectors eqn. (104)
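The equations for this slide are not in the transcript. The sketch below therefore assumes the standard Moran model on 2N genes, with p_{i,i+1} = p_{i,i-1} = i(2N - i)/(2N)^2. Under that assumption it checks that 1 - 2/(2N)^2 is an eigenvalue with right eigenvector r_i = i(2N - i) and with left eigenvector (1, 1, …, 1) on the transient states. All names and the value 2N = 6 are illustrative.

```python
from fractions import Fraction


def moran_matrix(two_n):
    """Assumed Moran transition matrix: one birth and one death per step,
    with p_{i,i+1} = p_{i,i-1} = i (2N - i) / (2N)^2."""
    size = two_n + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        step = Fraction(i * (two_n - i), two_n**2)
        if i > 0:
            P[i][i - 1] = step
        if i < two_n:
            P[i][i + 1] = step
        P[i][i] = 1 - 2 * step
    return P


two_n = 6
P = moran_matrix(two_n)
lam = 1 - Fraction(2, two_n**2)  # candidate largest non-unit eigenvalue

# Right eigenvector check: (P r)_i = lam * r_i with r_i = i (2N - i).
r = [i * (two_n - i) for i in range(two_n + 1)]
right_ok = all(sum(P[i][j] * r[j] for j in range(two_n + 1)) == lam * r[i]
               for i in range(two_n + 1))

# Left eigenvector check on the transient states 1, ..., 2N - 1:
# the interior column sums should all equal lam.
interior = range(1, two_n)
left_ok = all(sum(P[i][j] for i in interior) == lam for j in interior)
print(str(lam), right_ok, left_ok)
```

Unlike the Wright-Fisher case, the all-ones left eigenvector is exact here rather than approximate, under the assumed transition probabilities.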

  32. (Approximate) mean times (with one-way mutation) eqns. (109,110)

  33. Another (approximate) expression

  34. Infinitely many alleles: Wright-Fisher model eqn. (119)

  35. Homozygosity probability eqns. (120,121)

  36. Identity probability with three genes eqn. (136)

  37. Population mean of K eqns. (125,126,127)

  38. Identity probability with i genes eqn. (138)

  39. Sample partition formula eqn. (143)

  40. Sample distribution of K eqns. (145,146,147)

  41. From the sampling formula, Prob {one allele observed in a sample of n genes} = (n-1)! / {(1+θ)(2+θ)···(n-1+θ)}.
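The monomorphic-sample probability is simple to evaluate. The following is a minimal sketch of the formula above; the function name and the example values of n and θ are illustrative.

```python
from math import factorial, prod


def prob_one_allele(n, theta):
    """(n-1)! / {(1+theta)(2+theta)...(n-1+theta)}: probability that a
    sample of n genes contains only one allele."""
    return factorial(n - 1) / prod(k + theta for k in range(1, n))


for n in (2, 5, 10):
    print(n, prob_one_allele(n, 1.0), prob_one_allele(n, 0.1))
```

When θ = 1 the expression reduces to (n-1)!/n! = 1/n, and as θ tends to 0 it tends to 1.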

  42. Moran model: the entire population eqns. (151,152)

  43. Exact (Moran model) mean number of alleles with j representing genes eqn. (157), used in eqn. (156)

  44. Probability of quasi-fixation: eqn. (158). See also eqn. (159)

  45. Quasi-fixation probabilities: the case θ = 1. eqn. (161)

  46. Mean number of generations until loss of all current alleles
