
Lecture 8: Linkage Analysis I

  1. Lecture 8: Linkage Analysis I • Date: 9/19/02 • General likelihood method for phase-known gametes • Backcross, F2 variants, mixed crosses • Statistical properties of the $\theta$ estimate.

  2. Limitations of Partitioned Test Statistics • Test statistics can be partitioned only when: • single mating type in all crosses (families) • same genotypes in all crosses (families) • In addition, the partitioning becomes cumbersome when the number of loci is large, as there is one partition for every locus.

  3. General Likelihood Method • Example notation: genotype $G_j = A_1A_2$ at locus A and genotype $G_k = B_1B_1$ at locus B. • $c$ = number of crosses (or families). • $n_{Ai}$ = number of genotypes at locus A in cross $i$. • $n_{Bi}$ = number of genotypes at locus B in cross $i$. • $f_{ijk}$ = observed count of genotype $jk$ in cross $i$. • $p_{ijk}$ = expected frequency of genotype $jk$ in cross $i$, a function of $\theta_i$.
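A minimal Python sketch of the log-likelihood these definitions lead to, assuming the usual multinomial form $L = \sum_i \sum_j \sum_k f_{ijk} \ln p_{ijk}(\theta_i)$ with the multinomial constant dropped. The function names are hypothetical, and the backcross frequency model and counts are only an illustration:

```python
import math
from typing import Callable, Sequence

# Sketch only: helper names and example counts are hypothetical.
# p_jk(theta) for one cross, flattened to a list in a fixed genotype order.
FreqFn = Callable[[float], Sequence[float]]


def cross_log_likelihood(counts: Sequence[int], freq: FreqFn, theta: float) -> float:
    """sum_jk f_jk * ln p_jk(theta) for one cross (multinomial constant dropped)."""
    return sum(f * math.log(p) for f, p in zip(counts, freq(theta)) if f > 0)


def total_log_likelihood(crosses: Sequence[tuple[Sequence[int], FreqFn]],
                         thetas: Sequence[float]) -> float:
    """Sum over crosses i with cross-specific theta_i; equal thetas give the pooled model."""
    return sum(cross_log_likelihood(counts, freq, theta)
               for (counts, freq), theta in zip(crosses, thetas))


def backcross_freq(theta: float) -> list[float]:
    """AB/ab x ab/ab backcross; classes AaBb, Aabb, aaBb, aabb."""
    return [(1 - theta) / 2, theta / 2, theta / 2, (1 - theta) / 2]


if __name__ == "__main__":
    # Hypothetical counts for two backcross families.
    crosses = [([40, 10, 10, 40], backcross_freq), ([35, 15, 15, 35], backcross_freq)]
    print(total_log_likelihood(crosses, [0.2, 0.3]))    # one theta per cross
    print(total_log_likelihood(crosses, [0.25, 0.25]))  # pooled: common theta
```

Passing one $\theta_i$ per cross versus a shared value is what distinguishes the heterogeneous and pooled models compared in the heterogeneity test.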

  4. A Test for Heterogeneity in Linkage • Perhaps the first thing you want to do is check for heterogeneity in linkage among crosses. If there is no heterogeneity, the crosses (families) can be pooled. • The likelihood-ratio (LR) test requires the MLE of $\theta$, which makes it more tedious than a goodness-of-fit statistic in cases where the latter also applies.

  5. Finding the MLE: General Approach • Only in the backcross (BC) is an analytic MLE available. • In general, numerical methods are required. After listing all observable genotypes/phenotypes, the following are needed for each: • the observed counts; • the expected frequency in terms of $\theta$; • for EM, also $P_i(\text{recombination} \mid \text{genotype})$; • for NR or confidence intervals, the information.

  6. Newton-Raphson: Linkage Analysis • Obtain an expression for the score $S(\theta) = dL(\theta)/d\theta$. • Obtain an expression for the information per observation $I(\theta)$. • Make a first guess $\theta_0$. • Iterate $\theta_{n+1} = \theta_n + S(\theta_n)/[N I(\theta_n)]$ until $|\theta_{n+1} - \theta_n| <$ tolerance (e.g. 0.00001).
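The sketch below applies this update to an F2 from a coupling-phase F1 (AB/ab) scored with dominant markers, classes A-B-, A-bb, aaB-, aabb. It assumes expected frequencies $(3-2\theta+\theta^2)/4$, $\theta(2-\theta)/4$, $\theta(2-\theta)/4$, $(1-\theta)^2/4$, derived for this sketch from gamete frequencies $(1-\theta)/2$ (parental) and $\theta/2$ (recombinant). Because it uses the expected information $I(\theta) = \sum_k p_k'^2/p_k$, it is strictly Fisher scoring. The function names and counts are hypothetical, and clamping to $(0, 1/2]$ is a choice made here:

```python
# Sketch: F2 coupling-phase dominant markers; names and counts are hypothetical.
# Class order: A-B-, A-bb, aaB-, aabb.


def probs(theta):
    return [(3 - 2 * theta + theta ** 2) / 4,
            theta * (2 - theta) / 4,
            theta * (2 - theta) / 4,
            (1 - theta) ** 2 / 4]


def dprobs(theta):
    # Derivative of each class frequency with respect to theta.
    h = (1 - theta) / 2
    return [-h, h, h, -h]


def score(theta, counts):
    """S(theta) = sum_k f_k p_k'(theta) / p_k(theta)."""
    return sum(f * dp / p for f, p, dp in zip(counts, probs(theta), dprobs(theta)))


def info_per_obs(theta):
    """Expected information per observation, I(theta) = sum_k p_k'^2 / p_k."""
    return sum(dp ** 2 / p for p, dp in zip(probs(theta), dprobs(theta)))


def newton_raphson(counts, theta0=0.25, tol=1e-5, max_iter=100):
    n = sum(counts)
    theta = theta0
    for _ in range(max_iter):
        new = theta + score(theta, counts) / (n * info_per_obs(theta))
        new = min(max(new, 1e-6), 0.5)  # keep the iterate inside (0, 1/2]
        if abs(new - theta) < tol:
            return new
        theta = new
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    print(newton_raphson([132, 18, 18, 32]))
```

The hypothetical counts 132, 18, 18, 32 are exactly $N = 200$ times the expected frequencies at $\theta = 0.2$, so the score vanishes there and the iteration should settle at 0.2.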

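  7. EM Algorithm: Linkage Analysis • Make an initial guess $\theta_{\text{previous}}$. • Compute the expected number of recombinants $E_i = f_i P_i(R|G)$, evaluated at $\theta_{\text{previous}}$. • Compute the updated estimate $\theta_{\text{new}} = \frac{1}{N} \sum_i E_i$. • Set $\theta_{\text{previous}} = \theta_{\text{new}}$ and iterate until $|\theta_{\text{new}} - \theta_{\text{previous}}| <$ tolerance.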
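A sketch of these steps for the F2 coupling-phase dominant case (F1 AB/ab). It assumes $P_i(R|G)$ is the expected fraction of an individual's two gametes that are recombinant; derived here under Mendelian segregation, this gives $2\theta/(3-2\theta+\theta^2)$ for A-B-, $1/(2-\theta)$ for A-bb and aaB-, and 0 for aabb. The function name and counts are hypothetical:

```python
# Sketch: EM for F2 coupling-phase dominant markers; name and counts are hypothetical.


def em_coupled_dominant(counts, theta=0.25, tol=1e-5, max_iter=1000):
    """counts = (f_A-B-, f_A-bb, f_aaB-, f_aabb); returns the EM estimate of theta."""
    f_AB, f_Abb, f_aaB, f_aabb = counts
    n = sum(counts)
    for _ in range(max_iter):
        # E-step: expected recombinants E_i = f_i * P_i(R|G) at the current theta.
        expected = (f_AB * 2 * theta / (3 - 2 * theta + theta ** 2)
                    + (f_Abb + f_aaB) / (2 - theta)
                    + f_aabb * 0.0)  # aabb = ab/ab carries no recombinant gametes
        # M-step: theta_new = (1/N) * sum_i E_i.
        new = expected / n
        if abs(new - theta) < tol:
            return new
        theta = new
    raise RuntimeError("EM did not converge")


if __name__ == "__main__":
    print(em_coupled_dominant([132, 18, 18, 32]))
```

No derivatives appear anywhere; each step only re-weights the observed counts, which is the simplicity the next slide points to. Convergence is linear, so the stopping tolerance should be tighter than the accuracy you need.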

  8. Advantages of EM Algorithm for Linkage Analysis • No need to calculate the first and second derivatives of the log likelihood. • The calculations are simpler.

  9. Information and Variance • Recall that the MLE is approximately normally distributed, with variance $\mathrm{Var}(\hat\theta) \approx 1/[N I(\theta)]$. • So calculating $I(\theta)$ is necessary for both NR and variance estimates.
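As a small worked case: for the backcross, $I(\theta) = 1/[\theta(1-\theta)]$ per observation, so $\mathrm{Var}(\hat\theta) \approx \hat\theta(1-\hat\theta)/N$. The sketch computes the standard error and a normal-approximation (Wald) 95% interval. The function names, clipping of the interval to $[0, 1/2]$, and $N = 400$ in the usage line are assumptions for illustration:

```python
import math

# Sketch: hypothetical helper names; clipping to [0, 0.5] is a choice made here.


def wald_interval(theta_hat, n, info_per_obs, z=1.96):
    """SE = 1 / sqrt(N I(theta_hat)) and the normal-approximation interval."""
    se = 1.0 / math.sqrt(n * info_per_obs(theta_hat))
    return se, (max(0.0, theta_hat - z * se), min(0.5, theta_hat + z * se))


def backcross_info(theta):
    # Backcross information per observation: 1 / [theta (1 - theta)].
    return 1.0 / (theta * (1.0 - theta))


if __name__ == "__main__":
    se, ci = wald_interval(0.2, 400, backcross_info)
    print(se, ci)
```

With $\hat\theta = 0.2$ and $N = 400$, the standard error is $\sqrt{0.16/400} = 0.02$, giving roughly (0.161, 0.239).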

  10. Three Things We Need • Expected frequencies $p_i$. • Conditional probabilities of recombination $P_i(R|G)$. • Score and information per individual.

  11. Inserted Slide: Genotype Probabilities

  12. Finding $p_i$ and $P_i(R|G)$: F2 Double Codominant
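A derivation sketch to check against, not the lecture's own table. It assumes an F1 in coupling phase (AB/ab), Mendelian segregation, and gamete frequencies $(1-\theta)/2$ for AB and ab and $\theta/2$ for Ab and aB. $P_i(R|G)$ is taken as the expected fraction of the individual's two gametes that are recombinant, so $\sum_i p_i P_i(R|G) = \theta$. The function name is hypothetical:

```python
# Sketch: F2 double codominant table under the stated assumptions (hypothetical name).


def f2_double_codominant(theta):
    """{genotype: (p_i, P_i(R|G))} for an F2 from a coupling-phase F1 AB/ab."""
    r, q = theta, 1 - theta
    single_het = r * q / 2           # e.g. AABb = AB/Ab: one recombinant gamete
    double_het = (q ** 2 + r ** 2) / 2  # AaBb = AB/ab (q^2/2) or Ab/aB (r^2/2)
    return {
        "AABB": (q ** 2 / 4, 0.0),
        "AABb": (single_het, 0.5),
        "AAbb": (r ** 2 / 4, 1.0),
        "AaBB": (single_het, 0.5),
        "AaBb": (double_het, r ** 2 / (q ** 2 + r ** 2)),
        "Aabb": (single_het, 0.5),
        "aaBB": (r ** 2 / 4, 1.0),
        "aaBb": (single_het, 0.5),
        "aabb": (q ** 2 / 4, 0.0),
    }


if __name__ == "__main__":
    for genotype, (p, rec) in f2_double_codominant(0.2).items():
        print(genotype, round(p, 4), round(rec, 4))
```

Only the double heterozygote AaBb has a phase ambiguity; every other class has a fixed number of recombinant gametes.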

  13. Finding $p_i$ and $P_i(R|G)$: F2 Coupled Dominant
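The same derivation with dominant scoring at both loci collapses the nine genotypes into four phenotype classes. This sketch makes the same assumptions (coupling F1 AB/ab, Mendelian segregation, $P_i(R|G)$ as the expected recombinant fraction of the two gametes), and the function name is hypothetical:

```python
# Sketch: F2 coupled dominant table under the stated assumptions (hypothetical name).


def f2_coupled_dominant(theta):
    """{phenotype: (p_i, P_i(R|G))} for an F2 from AB/ab, dominant at both loci."""
    q = 1 - theta
    return {
        # A-B- pools AABB, AABb, AaBB and AaBb; its recombinant mass is theta/2.
        "A-B-": ((2 + q ** 2) / 4, 2 * theta / (2 + q ** 2)),
        # A-bb pools AAbb (2 recombinants) and Aabb (1 recombinant).
        "A-bb": (theta * (2 - theta) / 4, 1 / (2 - theta)),
        "aaB-": (theta * (2 - theta) / 4, 1 / (2 - theta)),
        "aabb": (q ** 2 / 4, 0.0),
    }


if __name__ == "__main__":
    print(f2_coupled_dominant(0.2))
```

Note that $(2 + (1-\theta)^2)/4 = (3 - 2\theta + \theta^2)/4$.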

  14. Sample Data: F2 Coupled Dominant

  15. Log Likelihood Profile: F2 Coupled Dominant

  16. Finding $p_i$ and $P_i(R|G)$: F2 Repulsion Dominant
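For an F1 in repulsion (Ab/aB), the parental gametes are Ab and aB and the recombinants are AB and ab. Under the same assumptions as the coupling case, a derivation sketch (hypothetical function name) gives:

```python
# Sketch: F2 repulsion dominant table under the stated assumptions (hypothetical name).


def f2_repulsion_dominant(theta):
    """{phenotype: (p_i, P_i(R|G))} for an F2 from Ab/aB, dominant at both loci."""
    return {
        "A-B-": ((2 + theta ** 2) / 4, theta * (2 + theta) / (2 + theta ** 2)),
        # A-bb pools AAbb = Ab/Ab (0 recombinants) and Aabb = Ab/ab (1 recombinant).
        "A-bb": ((1 - theta ** 2) / 4, theta / (1 + theta)),
        "aaB-": ((1 - theta ** 2) / 4, theta / (1 + theta)),
        # aabb = ab/ab is formed only from two recombinant gametes.
        "aabb": (theta ** 2 / 4, 1.0),
    }


if __name__ == "__main__":
    print(f2_repulsion_dominant(0.2))
```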

  17. Finding $p_i$ and $P_i(R|G)$: F2 Codominant/Dominant
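Rather than deriving each table by hand, the $p_i$ and $P_i(R|G)$ can be produced by enumerating all pairs of F1 gametes and collapsing zygotes to the observable classes. The sketch below scores locus A codominantly and locus B either as dominant or codominant, for either F1 phase. It assumes Mendelian segregation and the same definition of $P_i(R|G)$ as the expected recombinant fraction of the two gametes. The function name and its parameters are hypothetical:

```python
from collections import defaultdict
from itertools import product

# Sketch: generic F2 enumerator; function name and parameters are hypothetical.


def f2_table(theta, phase="coupling", b_dominant=True):
    """Return {class: (p_i, P_i(R|G))} for an F2 from a double-heterozygous F1.

    Locus A is scored codominantly (AA, Aa, aa); locus B as dominant (B-, bb)
    when b_dominant is True, otherwise codominantly. Assumes 0 < theta <= 0.5.
    """
    if phase == "coupling":
        parental = {("A", "B"), ("a", "b")}   # F1 is AB/ab
    else:
        parental = {("A", "b"), ("a", "B")}   # F1 is Ab/aB
    gamete_freq = {g: (1 - theta) / 2 if g in parental else theta / 2
                   for g in product("Aa", "Bb")}
    table = defaultdict(lambda: [0.0, 0.0])    # class -> [probability, recombinant mass]
    for g1, g2 in product(gamete_freq, repeat=2):
        prob = gamete_freq[g1] * gamete_freq[g2]
        n_rec = (g1 not in parental) + (g2 not in parental)
        a_class = "".join(sorted(g1[0] + g2[0]))      # "AA", "Aa" or "aa"
        b_alleles = g1[1] + g2[1]
        if b_dominant:
            b_class = "B-" if "B" in b_alleles else "bb"
        else:
            b_class = "".join(sorted(b_alleles))      # "BB", "Bb" or "bb"
        entry = table[a_class + b_class]
        entry[0] += prob
        entry[1] += prob * n_rec / 2                  # fraction of the 2 gametes
    return {k: (p, m / p) for k, (p, m) in table.items()}


if __name__ == "__main__":
    for cls, (p, rec) in sorted(f2_table(0.2).items()):
        print(cls, round(p, 4), round(rec, 4))
```

With `b_dominant=False` the same function reproduces the double-codominant case, which is a convenient cross-check on hand derivations.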

  18. Information Per Observation
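One way to compare designs is to evaluate the expected information per observation, $I(\theta) = \sum_k p_k'(\theta)^2 / p_k(\theta)$, for each cross at a fixed $\theta$. The sketch assumes the expected frequencies derived for a coupling or repulsion F1 as above, and obtains $p_k'$ by central differences (exact up to rounding here, since every $p_k$ is quadratic in $\theta$). The names in `DESIGNS` are hypothetical labels:

```python
# Sketch: per-observation information by design; labels and helper are hypothetical.


def info_per_obs(freq, theta, h=1e-6):
    """I(theta) = sum_k p_k'(theta)^2 / p_k(theta), p_k' by central differences."""
    p, up, down = freq(theta), freq(theta + h), freq(theta - h)
    return sum(((u - d) / (2 * h)) ** 2 / pk for pk, u, d in zip(p, up, down))


DESIGNS = {
    "backcross": lambda t: [(1 - t) / 2, t / 2, t / 2, (1 - t) / 2],
    # AABB/aabb, four single heterozygotes, AAbb/aaBB, AaBb
    "F2 codominant": lambda t: ([(1 - t) ** 2 / 4] * 2 + [t * (1 - t) / 2] * 4
                                + [t ** 2 / 4] * 2 + [((1 - t) ** 2 + t ** 2) / 2]),
    "F2 coupling dominant": lambda t: [(3 - 2 * t + t * t) / 4, t * (2 - t) / 4,
                                       t * (2 - t) / 4, (1 - t) ** 2 / 4],
    "F2 repulsion dominant": lambda t: [(2 + t * t) / 4, (1 - t * t) / 4,
                                        (1 - t * t) / 4, t * t / 4],
}

if __name__ == "__main__":
    for name, freq in DESIGNS.items():
        print(name, round(info_per_obs(freq, 0.1), 2))
```

At $\theta = 0.1$ this gives roughly 19.8 for the F2 codominant, 11.1 for the backcross, 9.8 for the F2 coupling dominant, and about 1.0 for the F2 repulsion dominant design, consistent with the later point that repulsion-phase dominant markers carry little information.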

  19. ML Estimation in Mixed Populations: Data

  20. ML Estimation in Mixed Populations: Profile

  21. ML Estimation in Mixed Populations: Hypothesis Tests • $G_{F2} = 2[L_{F2}(\hat\theta) - L_{F2}(0.5)] = 2(103.03 - 52.33) = 101.4$ ($p < 0.00001$); $\hat\theta = 0.11$ • $G_{BC} = 2[L_{BC}(\hat\theta) - L_{BC}(0.5)] = 2(-200.16 + 277.1) = 154.18$ ($p < 0.00001$); $\hat\theta = 0.20$ • $G_{pool} = 2[L_{pool}(\hat\theta) - L_{pool}(0.5)] = 2(-101.15 + 224.77) = 247.24$ ($p < 0.00001$); $\hat\theta = 0.17$ • $G_{total} = G_{F2} + G_{BC} = 255.6$ • $G_{heterogeneity} = G_{total} - G_{pool} = 8.36$ ($p < 0.01$)
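The heterogeneity arithmetic can be sketched as follows, assuming $G_{heterogeneity}$ is referred to a chi-square distribution with $c - 1$ degrees of freedom (1 df for the two populations here). The function names are hypothetical; the input G values are the ones on this slide:

```python
import math

# Sketch: hypothetical helper names; only the 1-df case is implemented.


def chi2_sf_1df(x):
    """Upper-tail probability of a central chi-square with 1 df: P(Z^2 > x)."""
    return math.erfc(math.sqrt(x / 2))


def heterogeneity_test(g_separate, g_pooled):
    """G_het = sum of per-cross G statistics minus the pooled G, with c - 1 df."""
    df = len(g_separate) - 1
    if df != 1:
        raise ValueError("this sketch only computes p-values for two crosses (1 df)")
    g_het = sum(g_separate) - g_pooled
    return g_het, df, chi2_sf_1df(g_het)


if __name__ == "__main__":
    # G values from the slide: F2 and backcross separately, then pooled.
    g_het, df, p = heterogeneity_test([101.4, 154.18], 247.24)
    print(f"G_het = {g_het:.2f}, df = {df}, p = {p:.4f}")
```

Summing the unrounded G values gives 8.34 rather than 8.36, which comes from rounding $G_{total}$ to 255.6 first; either way $p$ is about 0.004, below 0.01.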

  22. Statistical Properties of $\hat\theta$ • In the backcross, the number of recombinants $N\hat\theta$ is binomially distributed, so $\hat\theta$ is a scaled binomial random variable with variance $\theta(1-\theta)/N$. • For all other crosses and mixtures, the distribution of $\hat\theta$ is obtained from the asymptotic properties of ML estimators, so the sample size needs to be large.
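For the backcross, the exact sampling distribution can be tabulated directly. This sketch assumes the unrestricted estimator $\hat\theta = R/N$ with $R \sim \text{Binomial}(N, \theta)$ (no truncation at 1/2). The function name and the values $N = 50$, $\theta = 0.2$ are hypothetical:

```python
from math import comb

# Sketch: hypothetical helper; theta_hat = R / N is not truncated at 0.5 here.


def backcross_mle_distribution(n, theta):
    """Exact distribution of theta_hat = R / N, where R ~ Binomial(N, theta)."""
    return {r / n: comb(n, r) * theta ** r * (1 - theta) ** (n - r)
            for r in range(n + 1)}


if __name__ == "__main__":
    dist = backcross_mle_distribution(50, 0.2)   # hypothetical N and theta
    mean = sum(t * p for t, p in dist.items())
    var = sum((t - mean) ** 2 * p for t, p in dist.items())
    print(mean, var, 0.2 * 0.8 / 50)             # var should equal theta(1 - theta)/N
```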

  23. Empirical Variance Using Bootstrap • Sample $N$ individuals from the data with replacement, $b$ times, to generate $b$ bootstrap data sets, such that the $i$th data set has genotype counts ${}^{i}f_{A-B-}$, ${}^{i}f_{aaB-}$, ${}^{i}f_{A-bb}$, and ${}^{i}f_{aabb}$.

  24. Comparison of Parametric & Empirical Variance Estimates

  25. Empirical Estimate of Bias

  26. Confidence Intervals • Simulation studies have shown that, when the sample size is small (100-200), bootstrap confidence intervals are shorter and have better coverage probabilities than the normal approximation.
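The bootstrap steps above (variance, bias, and an interval) can be sketched for the F2 coupling-phase dominant case. Assumptions: the estimator is the EM update with $P_i(R|G)$ equal to $2\theta/(3-2\theta+\theta^2)$ for A-B-, $1/(2-\theta)$ for A-bb and aaB-, and 0 for aabb; the interval is the simple percentile interval, which is one of several bootstrap intervals and not necessarily the one used in the cited simulation studies. Function names, counts, $b$, and the seed are hypothetical:

```python
import random
import statistics

# Sketch: hypothetical names; class order is A-B-, A-bb, aaB-, aabb.


def em_theta(counts, theta=0.25, tol=1e-8, max_iter=10_000):
    """EM estimate of theta for F2 coupling-phase dominant counts."""
    f_AB, f_Abb, f_aaB, _f_aabb = counts  # aabb carries no recombinant gametes
    n = sum(counts)
    for _ in range(max_iter):
        new = (f_AB * 2 * theta / (3 - 2 * theta + theta ** 2)
               + (f_Abb + f_aaB) / (2 - theta)) / n
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta


def bootstrap(counts, b=1000, seed=1):
    """Bootstrap variance, bias, and 95% percentile interval for theta_hat."""
    rng = random.Random(seed)
    n = sum(counts)
    theta_hat = em_theta(counts)
    estimates = []
    for _ in range(b):
        # Resample N individuals with replacement by drawing each one's class.
        draw = rng.choices(range(4), weights=counts, k=n)
        estimates.append(em_theta([draw.count(k) for k in range(4)]))
    var = statistics.variance(estimates)
    bias = statistics.fmean(estimates) - theta_hat
    cuts = statistics.quantiles(estimates, n=40)  # cut points 2.5%, 5%, ..., 97.5%
    return theta_hat, var, bias, (cuts[0], cuts[-1])


if __name__ == "__main__":
    print(bootstrap([132, 18, 18, 32]))
```

For these hypothetical counts the parametric variance $1/[N I(\hat\theta)]$ is about 0.00104 (with $I(0.2) \approx 4.80$), which gives a direct parametric-versus-empirical comparison.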

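  27. Power and Linkage Analysis • $G(\theta) = 2[\ln L(\theta) - \ln L(1/2)]$ • Calculate the noncentrality parameter for the value of $\theta$ you wish to be able to detect. That noncentrality parameter is given by $EG = E[G(\theta)]$. • The power is then given by $P[\chi^2_1(EG) > \chi^2_{1,\alpha}]$, where $\chi^2_1(EG)$ is a noncentral chi-square with 1 degree of freedom and noncentrality $EG$, and $\chi^2_{1,\alpha}$ is the central chi-square critical value.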
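A sketch of this power calculation, assuming $EG = N \cdot EG_0$ with $EG_0 = 2\sum_k p_k(\theta)\ln[p_k(\theta)/p_k(1/2)]$ (the expected $G$ per observation under the true $\theta$), and using the identity $\chi^2_1(\lambda) = (Z + \sqrt{\lambda})^2$ for a standard normal $Z$. The helper names, the backcross example, and the sample sizes are hypothetical:

```python
import math
from statistics import NormalDist

# Sketch: hypothetical helper names; backcross used only as an example design.


def expected_g_per_obs(freq, theta):
    """EG0 = 2 * sum_k p_k(theta) * ln[p_k(theta) / p_k(1/2)]."""
    return 2 * sum(p * math.log(p / p0)
                   for p, p0 in zip(freq(theta), freq(0.5)) if p > 0)


def power(eg, alpha=0.05):
    """P[chi2_1(eg) > chi2_{1,alpha}], via chi2_1(lam) = (Z + sqrt(lam))^2."""
    z = NormalDist()
    c = z.inv_cdf(1 - alpha / 2)  # square root of the central chi-square(1) critical value
    s = math.sqrt(eg)
    return z.cdf(s - c) + z.cdf(-s - c)


def backcross(theta):
    return [(1 - theta) / 2, theta / 2, theta / 2, (1 - theta) / 2]


if __name__ == "__main__":
    eg0 = expected_g_per_obs(backcross, 0.2)
    for n in (10, 20, 30, 40):  # hypothetical sample sizes
        print(n, round(power(n * eg0), 3))
```

With $EG = 0$ the function returns $\alpha$, as it should for a test under the null.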

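  28. Sample Size and Linkage Analysis • For $N$ independent observations, $EG = N \cdot EG_0$, where $EG_0$ is the per-observation expected log likelihood ratio. • The sample size needed to reach the noncentrality $EG$ that gives the desired power is therefore $N = EG/EG_0$.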

  29. Sample Size and Cross • Sample size needed to detect a given $\theta$ with 95% power
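A sketch of how such a sample-size comparison across crosses could be computed. It assumes the noncentral chi-square power formula with 1 df and $\alpha = 0.05$, finds the required noncentrality $EG$ by bisection, and divides by $EG_0$ for each design. The helper names and the two example designs are hypothetical choices for illustration:

```python
import math
from statistics import NormalDist

# Sketch: hypothetical helper names; designs use the frequencies derived earlier.


def expected_g_per_obs(freq, theta):
    """EG0 = 2 * sum_k p_k(theta) * ln[p_k(theta) / p_k(1/2)]."""
    return 2 * sum(p * math.log(p / p0)
                   for p, p0 in zip(freq(theta), freq(0.5)) if p > 0)


def power(eg, alpha=0.05):
    z = NormalDist()
    c = z.inv_cdf(1 - alpha / 2)
    s = math.sqrt(eg)
    return z.cdf(s - c) + z.cdf(-s - c)


def required_noncentrality(target=0.95, alpha=0.05):
    """Smallest EG with power >= target, by bisection (power increases with EG)."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if power(mid, alpha) < target:
            lo = mid
        else:
            hi = mid
    return hi


def sample_size(freq, theta, target=0.95, alpha=0.05):
    """N = EG / EG0, rounded up."""
    return math.ceil(required_noncentrality(target, alpha) / expected_g_per_obs(freq, theta))


def backcross(t):
    return [(1 - t) / 2, t / 2, t / 2, (1 - t) / 2]


def f2_repulsion_dominant(t):
    return [(2 + t * t) / 4, (1 - t * t) / 4, (1 - t * t) / 4, t * t / 4]


if __name__ == "__main__":
    for name, freq in [("backcross", backcross), ("F2 repulsion dominant", f2_repulsion_dominant)]:
        print(name, sample_size(freq, 0.2))
```

Under these assumptions the required noncentrality is about 13.0. At $\theta = 0.2$, this gives 34 backcross individuals versus roughly 130 F2 individuals scored with repulsion-phase dominant markers.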

  30. The Problem of Dominant Markers • The success of techniques like RAPD and AFLP has produced many dominant markers. • Dominant markers in repulsion phase carry little information (a larger sample size is required to obtain the same confidence). • The $\theta$ estimate is also biased for dominant markers in repulsion.

  31. Trans Dominant Linked Markers (TDLM) • Two dominant markers in linkage repulsion can be recoded as a codominant marker if they are linked closely enough.

  32. Assumption Violation: Segregation Distortion • The methods assume no segregation distortion at the individual loci. • Additive distortion, $P(Aa) \to P(Aa) + a$: the false-positive rate increases in the F2 cross. • Penetrance distortion, $P(Aa) \to a\,P(Aa)$: the power of detection decreases.

  33. Summary • General likelihood method for phase-known gametes. • NR & EM for linkage analysis. • Backcross, F2 double codominant, F2 coupled dominant, F2 repulsion dominant, mixed populations • Bootstrap estimates of variance & bias. • Some problems: dominant markers, segregation distortion.

  34. Self Test • Can you derive the expected frequencies and conditional probabilities given in the tables here?
