Population genetics break
coalesce • To grow together; fuse. • To come together so as to form one whole; unite: The rebel units coalesced into one army to fight the invaders.
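The coalescent
Under the neutral model, θ = 4Nu determines the level of variation. With infinite alleles, the expected heterozygosity is:
$$H = \frac{4Nu}{4Nu + 1} = \frac{\theta}{1 + \theta}$$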
The coalescent
Any two alleles in the sample share a common ancestor, so the ancestry of the sample can be represented by a tree.
The coalescent
[Figure: genealogy of a sample of four alleles (s1, s2, s3, s4), with coalescence times T = 0, t1, t2, t3 marked going back in time.]
The genealogy of the sample. The alleles may or may not be identical by state.
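The coalescent
[Figure: the same genealogy of s1 to s4, with T = 0, t1, t2, t3 marked.]
The total time in the coalescent (the summed length of all branches) is:
$$T_c = 4t_1 + 3(t_2 - t_1) + 2(t_3 - t_2)$$
Define Ti to be the time needed to reduce a coalescent with i alleles to one with i-1 alleles. Thus T4 = t1, T3 = t2 - t1, and T2 = t3 - t2. Combining these equations gives:
$$T_c = 4T_4 + 3T_3 + 2T_2$$
More generally, for n alleles:
$$T_c = \sum_{i=2}^{n} i\,T_i$$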
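The bookkeeping behind Tc = Σ i·Ti can be checked with a short sketch. It assumes the coalescence times are supplied as a list [t1, t2, ..., t_{n-1}] measured back from T = 0. The function name and the example times are hypothetical illustrations, not taken from the slides.

```python
def total_branch_length(times):
    """Total time in the coalescent, Tc, from coalescence times.

    times = [t1, t2, ..., t_{n-1}], measured back from the present (T = 0);
    between consecutive times the number of lineages drops by one.
    """
    n = len(times) + 1              # number of sampled alleles
    total = 0.0
    previous = 0.0
    for k, t in enumerate(times):
        lineages = n - k            # i lineages alive during this interval
        total += lineages * (t - previous)   # adds i * T_i
        previous = t
    return total

# Four alleles with hypothetical coalescence times t1=1, t2=3, t3=6 (generations)
print(total_branch_length([1, 3, 6]))   # 4*1 + 3*2 + 2*3 = 16.0
```

Summing lineages × interval length is the same as summing i·Ti, because each interval with i lineages lasts Ti.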
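The coalescent
[Figure: n alleles in one generation traced back to n-1 ancestral alleles in the previous generation.]
Tc is a function of N (the population size) and n (the number of alleles in the sample). We can compute its expected value, E(Tc), assuming the infinite allele model.
Focus on the last generation. For 2 alleles, what is the probability that they have different ancestors in the previous generation? A diploid population has 2N gene copies, so the second allele picks a parent different from that of the first with probability:
$$1 - \frac{1}{2N}$$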
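The coalescent
We have n alleles. What is the probability that they all have different ancestors in the previous generation? The (i+1)-th allele must avoid the i parents already chosen, so:
$$P = \left(1 - \frac{1}{2N}\right)\left(1 - \frac{2}{2N}\right)\cdots\left(1 - \frac{n-1}{2N}\right)$$
If N is very large, we can ignore terms with N^2 (or higher powers) in the denominator. This gives:
$$P \approx 1 - \sum_{i=1}^{n-1}\frac{i}{2N} = 1 - \frac{n(n-1)}{4N}$$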
The coalescent
The probability that n alleles have different ancestors in the previous generation:
$$P(\text{no coalescence}) \approx 1 - \frac{n(n-1)}{4N}$$
The probability that at least 2 of the n alleles have a common ancestor in the previous generation:
$$P(\text{coalescence}) \approx \frac{n(n-1)}{4N}$$
This is the probability of a coalescent event in each generation.
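This sketch checks how much the large-N approximation loses. It compares the exact product (1 - 1/2N)(1 - 2/2N)...(1 - (n-1)/2N) with 1 - n(n-1)/4N. It assumes a diploid population with 2N gene copies and each allele choosing its parent uniformly. The function names are illustrative.

```python
def prob_all_distinct_exact(n, N):
    """P(n alleles have n different parents) among 2N gene copies."""
    p = 1.0
    for i in range(1, n):
        p *= 1 - i / (2 * N)        # (i+1)-th allele avoids the i parents already taken
    return p

def prob_all_distinct_approx(n, N):
    """Large-N approximation: drop terms with N^2 in the denominator."""
    return 1 - n * (n - 1) / (4 * N)

n = 5
for N in (100, 1000, 10000):
    exact = prob_all_distinct_exact(n, N)
    approx = prob_all_distinct_approx(n, N)
    # probability of a coalescent event in one generation = 1 - P(all distinct)
    print(N, round(1 - exact, 6), round(1 - approx, 6))
```

The gap between the two columns shrinks roughly like 1/N^2, which is why the approximation is safe for realistic population sizes.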
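The coalescent
The probability of a coalescent event in a single generation is:
$$p = \frac{n(n-1)}{4N}$$
The number of generations until a coalescent event is geometrically distributed with this p. Thus, the expected time until a coalescent event is 1/p = 4N/[n(n-1)]. In other words:
$$E(T_n) = \frac{4N}{n(n-1)}, \qquad \text{and in general} \qquad E(T_i) = \frac{4N}{i(i-1)}$$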
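A small Monte Carlo sketch can check the geometric waiting time. It assumes a Wright-Fisher population in which each of the n lineages independently picks one of 2N parental gene copies every generation. It stops at the first generation in which two lineages share a parent. The seed, N, n, and the number of replicates are arbitrary choices for illustration.

```python
import random

def generations_until_coalescence(n, N, rng):
    """Trace n lineages back one Wright-Fisher generation at a time until
    at least two of them pick the same parent among the 2N gene copies."""
    generations = 0
    while True:
        generations += 1
        parents = [rng.randrange(2 * N) for _ in range(n)]
        if len(set(parents)) < n:   # at least one shared parent: a coalescent event
            return generations

rng = random.Random(1)
n, N, reps = 4, 200, 4000
mean_wait = sum(generations_until_coalescence(n, N, rng) for _ in range(reps)) / reps
expected = 4 * N / (n * (n - 1))    # 1/p from the geometric distribution, about 66.7
print(round(mean_wait, 1), round(expected, 1))
```

The simulated mean should land close to 4N/[n(n-1)]. Part of the small remaining gap comes from the N^2 terms dropped in the approximation.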
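The coalescent
From the following two equations, we can obtain E(Tc):
$$T_c = \sum_{i=2}^{n} i\,T_i, \qquad E(T_i) = \frac{4N}{i(i-1)}$$
$$E(T_c) = \sum_{i=2}^{n} i \cdot \frac{4N}{i(i-1)} = 4N\sum_{i=2}^{n}\frac{1}{i-1} = 4N\sum_{i=1}^{n-1}\frac{1}{i}$$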
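As a sanity check of E(Tc) = 4N Σ 1/i, the sketch below draws each Ti from a geometric distribution with success probability i(i-1)/4N. It then averages Tc = Σ i·Ti over many replicates. It assumes N is large enough that i(i-1) < 4N, and it samples the geometric distribution by inversion. Parameter values and names are hypothetical.

```python
import math
import random

def expected_total_time(n, N):
    """E(Tc) = 4N * sum_{i=1}^{n-1} 1/i, in generations."""
    return 4 * N * sum(1 / i for i in range(1, n))

def simulate_total_time(n, N, rng):
    """One draw of Tc = sum_{i=2}^{n} i * T_i, each T_i geometric
    (support 1, 2, ...) with success probability p_i = i(i-1)/(4N)."""
    total = 0
    for i in range(2, n + 1):
        p = i * (i - 1) / (4 * N)
        assert p < 1, "the large-N approximation needs i(i-1) < 4N"
        u = 1.0 - rng.random()                         # uniform on (0, 1]
        t_i = max(1, math.ceil(math.log(u) / math.log(1 - p)))
        total += i * t_i
    return total

rng = random.Random(7)
n, N, reps = 10, 1000, 20000
mean_tc = sum(simulate_total_time(n, N, rng) for _ in range(reps)) / reps
print(round(mean_tc), round(expected_total_time(n, N)))
```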
The coalescent: adding mutation
[Figure: genealogy of s1 to s4 with coalescence times T = 0, t1, t2, t3.]
The n alleles may or may not be identical by state. Each mutation in the history of these alleles produces a segregating site. One mutation gives one segregating site, and two mutations give two. In general, k mutations give k segregating sites. This assumes that every mutation hits a site that has not mutated before (the infinite-sites model).
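The coalescent
Let u be the mutation rate per generation. The total number of mutations in a coalescent is then, on average, u E(Tc), which is:
$$u\,E(T_c) = 4Nu\sum_{i=1}^{n-1}\frac{1}{i} = \theta\sum_{i=1}^{n-1}\frac{1}{i}, \qquad \theta = 4Nu$$
This is exactly the expected number of segregating sites, S:
$$E(S) = \theta\sum_{i=1}^{n-1}\frac{1}{i}$$
S can be observed directly in the sample as the number of segregating sites. This gives an estimate of θ:
$$\hat{\theta}_S = \frac{S}{\sum_{i=1}^{n-1} 1/i}$$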
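The claim E(S) = θ Σ 1/i can be illustrated by simulation. The sketch builds Tc from geometric Ti as before. It then places a Poisson(u·Tc) number of mutations on the tree and, under the infinite-sites assumption, counts each mutation as one segregating site. The Poisson sampler (Knuth's multiplication method), the seed, and the parameter values (chosen so that θ = 4Nu = 5) are assumptions made for this sketch.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_segregating_sites(n, N, u, rng):
    """One replicate: build Tc = sum_{i=2}^{n} i * T_i with geometric T_i
    (success probability i(i-1)/4N), then drop Poisson(u * Tc) mutations
    on the tree; with infinite sites, each mutation is one segregating site."""
    tc = 0
    for i in range(2, n + 1):
        p = i * (i - 1) / (4 * N)
        r = 1.0 - rng.random()                          # uniform on (0, 1]
        tc += i * max(1, math.ceil(math.log(r) / math.log(1 - p)))
    return poisson(u * tc, rng)

rng = random.Random(3)
n, N, u, reps = 10, 1000, 0.00125, 20000    # hypothetical values, theta = 4Nu = 5
theta = 4 * N * u
a_n = sum(1 / i for i in range(1, n))
mean_s = sum(simulate_segregating_sites(n, N, u, rng) for _ in range(reps)) / reps
print(round(mean_s, 2), round(theta * a_n, 2))   # simulated mean S vs theta * sum 1/i
```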
The coalescent
Example: assume 11 sequences, each 768 nucleotides long, were sampled and 14 segregating sites were found. Estimate θ for each allele (sequence) and for each nucleotide site.
Here n = 11, so the sum runs from i = 1 to 10: 1 + 1/2 + ... + 1/10 = 2.929. We estimate E(S) by the observed S = 14, so the estimate of θ is 14/2.929 = 4.78. That is, 4Nu is estimated to be 4.78, where u is the mutation rate per allele (sequence). If u is instead the per-nucleotide mutation rate, 4Nu is estimated as 4.78/768 = 0.0062.
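The arithmetic of the example can be reproduced with a few lines. The helper below implements the segregating-sites estimator S / Σ_{i=1}^{n-1} 1/i, commonly called Watterson's estimator. The function name is an illustrative choice.

```python
def watterson_theta(S, n):
    """theta_S = S / sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1 / i for i in range(1, n))
    return S / a_n

n, S, length = 11, 14, 768
a_n = sum(1 / i for i in range(1, n))
theta_seq = watterson_theta(S, n)       # per sequence (allele)
theta_site = theta_seq / length         # per nucleotide site
print(round(a_n, 3), round(theta_seq, 2), round(theta_site, 4))
# prints: 2.929 4.78 0.0062
```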
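The coalescent
• A few words about the harmonic series $\sum_{i=1}^{n} 1/i$:
• The sum is infinite. Proof: since $1/i \ge \int_i^{i+1} dx/x$,
$$\sum_{i=1}^{n}\frac{1}{i} \ge \int_{1}^{n+1}\frac{dx}{x} = \ln(n+1) \to \infty$$
• The partial sum converges in the sense that
$$\lim_{n\to\infty}\left(\sum_{i=1}^{n}\frac{1}{i} - \ln n\right) = \gamma \approx 0.5772 \quad \text{(the Euler-Mascheroni constant)}$$
• So the series grows at the same rate as ln(n). The sum reaches about 3 with roughly 10 samples, and only about 4 with roughly 30 samples.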
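The slow growth of the harmonic sum can be made concrete. The sketch below finds the smallest number of terms m for which 1 + 1/2 + ... + 1/m reaches a target. Because the estimator's sum runs to n - 1, that corresponds to n = m + 1 samples. It also checks that H_m - ln(m) approaches Euler's constant. The exact crossing points it reports (11 and 31 terms, i.e. 12 and 32 samples) sit slightly above the round figures on the slide. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m."""
    return sum(1 / i for i in range(1, m + 1))

def terms_needed(target):
    """Smallest m with 1 + 1/2 + ... + 1/m >= target."""
    m, total = 0, 0.0
    while total < target:
        m += 1
        total += 1 / m
    return m

for target in (3, 4):
    m = terms_needed(target)
    print(target, m, m + 1)            # target, terms needed, samples n = m + 1

m = 10**6
print(harmonic(m) - math.log(m))       # close to Euler's constant 0.5772...
```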
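The coalescent
• We thus have 2 methods for estimating θ.
• Based on the heterozygosity H (inverting H = θ/(1+θ)):
$$\hat{\theta}_H = \frac{H}{1 - H}$$
• Based on the number of segregating sites S:
$$\hat{\theta}_S = \frac{S}{\sum_{i=1}^{n-1} 1/i}$$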
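This sketch applies both estimators to the same small alignment. It treats each distinct sequence as one allele and takes H = 1 - Σ p² over haplotype frequencies, using the plain sample frequencies without the n/(n-1) small-sample correction that is also commonly applied. It counts S as the number of polymorphic columns. The toy sequences are made-up data for illustration only.

```python
from collections import Counter

def theta_from_heterozygosity(seqs):
    """theta_H = H / (1 - H), with H = 1 - sum of squared haplotype frequencies
    (each distinct sequence is treated as one allele)."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    h = 1 - sum(p * p for p in freqs)
    return h / (1 - h)

def theta_from_segregating_sites(seqs):
    """theta_S = S / sum_{i=1}^{n-1} 1/i, with S = number of polymorphic columns."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    return s / sum(1 / i for i in range(1, n))

toy = ["AAAA", "AAAT", "AATA", "AAAA"]   # hypothetical aligned sequences
print(round(theta_from_heterozygosity(toy), 4), round(theta_from_segregating_sites(toy), 4))
```

The heterozygosity estimate only sees which haplotypes are distinct. The segregating-sites estimate counts every polymorphic position, so the two can disagree on the same data.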
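The coalescent
The heterozygosity-based estimate does not use the information from each individual site. The contrast between the two estimates can be used to test the neutral theory (Tajima's D test). In its standard form, the test compares the average number of pairwise differences between sequences with the estimate based on S.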
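For reference, here is a sketch of Tajima's D as it is usually stated (Tajima 1989). It contrasts π, the mean number of pairwise differences, with S / Σ 1/i and scales the difference by its approximate standard deviation under neutrality. It assumes equal-length aligned sequences and at least one segregating site. The toy data is hypothetical, and this sketch is not meant to represent the slides' exact formulation.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (standard 1989 form)."""
    n = len(seqs)
    pairs = list(combinations(seqs, 2))
    # pi: mean number of differences over all pairs of sequences
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # S: number of segregating (polymorphic) columns
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # numerator contrasts the pairwise-difference estimate with S / a1
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

toy = ["AAAA", "AAAT", "AATA", "AAAA"]   # hypothetical aligned sequences
print(round(tajimas_d(toy), 2))
```

A negative D means π is smaller than the S-based estimate, as happens here. Values near zero are what the neutral model predicts.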