1 / 60

Association mapping for mendelian, and complex disorders

Association mapping for mendelian, and complex disorders. UG Bioinformatics specialization at UCSD. Abstraction of a causal mutation. Looking for the mutation in populations.

yehuda
Télécharger la présentation

Association mapping for mendelian, and complex disorders

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Association mapping for mendelian, and complex disorders Bafna, BfB

  2. UG Bioinformatics specialization at UCSD Bafna, BfB

  3. Abstraction of a causal mutation Bafna, BfB

  4. Looking for the mutation in populations A possible strategy is to collect cases (affected) and control individuals, and look for a mutation that consistently separates the two classes. Next, identify the gene. Bafna, BfB

  5. Looking for the causal mutation in populations Problem 1: many unrelated common mutations, around one every 1000bp Case Control Bafna, BfB

  6. Case Control Bafna, BfB

  7. Looking for the causal mutation in populations Problem 2: We may not sample the causal mutation. Case Control Bafna, BfB

  8. How to hunt for disease genes • We are guided by two simple facts governing these mutations • Nearby mutations are correlated • Distal mutations are not Case Control Bafna, BfB

  9. This lecture • Nearby mutations are correlated • Distal mutations are not Case Control • The bottom line: How do these facts help in finding disease genes? • The genetics: why should this happen? • The computation • Challenge of complex diseases. Bafna, BfB

  10. The basics of association mapping 0 0 Case Control 0 • Sample a population of individuals at variant locations across the genome. Typically, these variants are single nucleotide polymorphisms (SNPs). • Create a new bi-allelic variant corresponding to cases and controls, and test for correlations. • By our assumptions, only the proximal variants will be correlated. • Investigate genes near the correlated variants. 0 1 1 1 1 Bafna, BfB

  11. So, why should the proximal SNPs be correlated, and distal SNPs not? Bafna, BfB

  12. A bit of evolution Time • Consider a fixed population (of chromosomes) evolving in time. • Each individual arises from a unique, randomly chosen parent from the previous generation Bafna, BfB

  13. Genealogy of a chromosomal population Time Current (extant) population Bafna, BfB

  14. Adding mutations Infinite sites assumption: A mutation occurs at most once at a site. Bafna, BfB

  15. SNPs The collection of acquired mutations in the extant population describe the SNPs Bafna, BfB

  16. Fixation and elimination • Not all mutations survive. • Some mutations get fixed, and are no longer polymorphic Bafna, BfB

  17. Removing extinct genealogies Bafna, BfB

  18. Removing fixed mutations Bafna, BfB

  19. The coalescent Bafna, BfB

  20. Disease mutation • We drop the ancestral chromosomes, and place the mutations on the internal branches. Bafna, BfB

  21. Disease mutation • A causal mutation creates a clade of affected descendants. Bafna, BfB

  22. Disease mutation • Note that the tree (genealogy) is hidden. • However, the underlying tree topology introduces a correlation between each pair of SNPs Bafna, BfB

  23. What have we learnt? • The underlying genealogy creates a correlation between SNPs. • By itself, this is not sufficient, because distal SNPs might also be correlated. • Fortunately, for us the correlation between distal SNPs is quickly destroyed. Bafna, BfB

  24. Recombination Bafna, BfB

  25. Recombination • In our idealized model, we assume that each individual chromosome chooses two parental chromosomes from the previous generation Bafna, BfB

  26. Multiple recombination change the local genealogy Bafna, BfB

  27. A bit of evolution • Proximal SNPs are correlated, distal SNPs are not. • The correlation (Linkage disequilibirium) decays rapidly after 20-50kb Bafna, BfB

  28. Basic statistics Bafna, BfB

  29. Testing for correlation • In the absence of correlation Bafna, BfB

  30. Testing for correlation • When correlated Bafna, BfB

  31. Assigning confidence Observed Expected X X X X Bafna, BfB

  32. Assigning confidence Observed Expected X X X X Bafna, BfB

  33. Assigning confidence Observed Expected X X X X Bafna, BfB

  34. Statistical tests of association Bafna, BfB

  35. Tests for association: Pearson Cases Controls O1 MM Mm mm O2 • Case-control phenotype: • Build a 3X2 contingency table • Pearson test (2df)= O3 O4 O5 O6 Bafna, BfB

  36. The χ2 test Cases Controls O1 O2 MM O3 O4 Mm O5 O6 mm • The statistic behaves like a χ2 distribution. • A p-value can be computed directly Bafna, BfB

  37. Χ2 distribution properties A related distribution is the F-distribution Bafna, BfB

  38. Likelihood ratio • Another way to check the extremeness of the distribution is by computing a (log) likelihood ratio. • We have two competing hypothesis. Let N be the total number of observations Bafna, BfB

  39. LLR • An LLR value close to 0, implies that the null hypothesis is true. Asymptotically, the LLR statistic also follows the chi-square distribution. Bafna, BfB

  40. Exact test • The chi-square test does not work so well when the numbers are small. • How can we compute an exact probability of seeing a specific distribution of values in the cells? • Remember: we know the marginals (# cases, # controls, Bafna, BfB

  41. Fischer exact test Cases Controls a b MM c d Mm e f mm • Num: #ways of getting configuration (a,b,c,d,e,f) • Den: #ways of ensuring that the row sums and column sums are fixed Bafna, BfB

  42. Fischer exact test • Remember that the probability of seeing any specific values in the cells is going to be small. • To get a p-value, we must sum over all similarly extreme values. How? Bafna, BfB

  43. Test for association: Fisher exact test Cases Controls a b MM c d Mm e f mm • Here P is the probability of seeing the exact count. • The actual significance is computed by summing over all such tables that are at least this extreme. Bafna, BfB

  44. Test for association: Fisher exact test Cases Controls a b MM c d Mm e f mm Bafna, BfB

  45. Continuous outcomes • Instead of discrete (Case/control) data, we have real-valued phenotypes • Ex: Diastolic Blood Pressure • In this case, how do we test for association Bafna, BfB

  46. Continuous outcome ANOVA • Often, the phenotypes are not offered as case-controls but like a continuous variable • Ex: blood-pressure measurements • Question: Are the mean values of the two groups significantly different? MM mm Bafna, BfB

  47. Two-sided t-test • For two categories, ANOVA is also known as the t-test • Assume that the variables from the two sets are drawn from Normal distributions • Different means, equal variances • Null hypothesis is that they are both from the same distribution Bafna, BfB

  48. t-test continued Bafna, BfB

  49. Two-sample t-test • As the variance is not known, we use an estimate S, defined by • The T-statistic is given by • Significant deviations from 0 are used to reject the Null hypothesis Bafna, BfB

  50. Two-sample t-test (unequal variances) • If the variances cannot be assumed to be equal, we use • The T-statistic is given by • Significant deviations from 0 are used to reject the Null hypothesis Bafna, BfB

More Related