
Linkage Analysis I -- Parametric

Linkage Analysis I -- Parametric . 2006.3.3 I-Ping Tu. Book reference. http://www.math.chalmers.se/Stat/Grundutb/Chalmers/TMS120/kompendium.pdf Genetic Linkage Web Resource: http://linkage.rockefeller.edu/. 1 Introduction. Quality Trait: e.g. tall/short, green/yellow,


Presentation Transcript


  1. Linkage Analysis I -- Parametric • 2006.3.3 • I-Ping Tu

  2. Book reference • http://www.math.chalmers.se/Stat/Grundutb/Chalmers/TMS120/kompendium.pdf • Genetic Linkage Web Resource: http://linkage.rockefeller.edu/

  3. 1 Introduction • Qualitative trait: e.g. tall/short, green/yellow, affected/unaffected • Genetic model assumed • Parametric linkage analysis • Lod score method • Large pedigrees • No genetic model assumed • Nonparametric linkage analysis • Affected relative pairs

  4. Parametric vs. non-parametric linkage analysis • Parametric • Assume genetic model known • Non-parametric • No assumptions about the genetic model • The parametric method is more powerful when the genetic model is correctly specified. • Problem size limitations • Parametric: large pedigrees, small number of markers • Non-parametric: small pedigrees, many markers

  5. Phenotype • Binary • affected or unaffected • Left handed or right handed • Affected, unaffected, and unknown • Unknown – possibly part of the syndrome • Quantitative • Insulin resistance • Blood Pressure

  6. Definitions • Locus • Position on a chromosome • Marker locus • Disease locus • Marker • A measurable unit on a chromosome • Dinucleotide repeat (CA)n • Single nucleotide polymorphism (SNP) • Allele • The measurement at a marker locus • 2 alleles per locus (one per chromosome) • Figure labels: marker alleles 1 and 4; alleles at the disease locus A and a

  7. The recombination fraction θ • θ = probability of recombination between two loci • θ = 0.5 if the distance is "large" • θ < 0.5 if the distance is "short" • An odd number of crossovers = recombination • An even number = no recombination

  8. Haldane’s Mapping function
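The formula on this slide did not survive extraction. As a hedged reconstruction: in its standard textbook form, Haldane's map function assumes crossovers fall as a Poisson process without interference and converts a map distance d (in Morgans) into a recombination fraction θ = (1 - e^(-2d))/2. The Python sketch below implements that form and its inverse, and checks it against the "odd number of crossovers = recombination" rule from slide 7. The function names are illustrative and do not come from the slides.

```python
import math


def haldane_theta(d_morgans: float) -> float:
    """Recombination fraction for a map distance d (in Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance(theta: float) -> float:
    """Inverse map: distance in Morgans for 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


def prob_odd_crossovers(d_morgans: float, max_k: int = 60) -> float:
    """P(odd number of crossovers) when the count is Poisson with mean d.

    The series is truncated at max_k, which is ample for small d.
    """
    return sum(
        math.exp(-d_morgans) * d_morgans**k / math.factorial(k)
        for k in range(1, max_k, 2)
    )


if __name__ == "__main__":
    for d in (0.01, 0.1, 0.5, 2.0):
        print(d, haldane_theta(d), prob_odd_crossovers(d))
```

For short distances θ is close to d, and as d grows θ approaches 0.5, matching the "large distance" case on slide 7.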

  9. Recombination fraction: an example • No! Recombination fractions are not additive over large distances.
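To make the "No!" concrete, here is a small sketch for three loci A, B, C in that order. It assumes Haldane's map function (no interference), θ = (1 - e^(-2d))/2, and uses made-up distances. Map distances add, but recombination fractions do not: under this assumption θ_AC = θ_AB + θ_BC - 2·θ_AB·θ_BC.

```python
import math


def haldane_theta(d_morgans: float) -> float:
    """Haldane's map function (assumes no crossover interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Hypothetical map distances in Morgans, chosen only for illustration.
d_ab, d_bc = 0.30, 0.40

theta_ab = haldane_theta(d_ab)
theta_bc = haldane_theta(d_bc)
theta_ac = haldane_theta(d_ab + d_bc)  # distances are additive

naive_sum = theta_ab + theta_bc  # the tempting but wrong answer
combined = theta_ab + theta_bc - 2.0 * theta_ab * theta_bc

print("theta_AB + theta_BC       =", naive_sum)
print("theta_AC (from distance)  =", theta_ac)
print("theta_AC (combined rule)  =", combined)
```

With these distances the naive sum exceeds 0.5, which no recombination fraction can, while the two correct computations agree.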

  10. Penetrance (genetic model) • Probability of being affected • Penetrance parameters: f = (f0, f1, f2) • Definition: fk = probability of being affected given k disease alleles, i.e. fk = P(affected | k disease alleles), k = 0, 1, 2 • Notation: A = disease allele, a = normal allele • Disease genotypes: aa, Aa, or AA

  11. Penetrance continued

  12. Population prevalence • Kp = proportion of affected individuals in a population = P(aff) • Disease allele frequency p = 0.05 • Assume that the population is in Hardy-Weinberg equilibrium (HWE) • P(aa) = (1-p)^2 = 0.95^2 = 0.9025 • P(Aa) = 2p(1-p) = 0.095 • P(AA) = p^2 = 0.0025 • Definition of conditional probability • Kp = P(aff) = ? • Figure: genotype classes aa, Aa, AA, with the affected portion marked

  13. Population prevalence contd. • Kp = area of the red square (affected) / total area (aa + Aa + AA) • = P(aff ∩ aa) + P(aff ∩ Aa) + P(aff ∩ AA) • = P(aff | aa)P(aa) + P(aff | Aa)P(Aa) + P(aff | AA)P(AA) (the law of total probability) • = f0*(1-p)^2 + f1*2p(1-p) + f2*p^2 • = 0.03*0.9025 + 0.12*0.095 + 0.50*0.0025 = 0.039725 ≈ 0.04
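The prevalence calculation is easy to check numerically. The sketch below recomputes Kp from the slide's own numbers (p = 0.05, f = (0.03, 0.12, 0.50)) under the same Hardy-Weinberg assumption. The function names are illustrative.

```python
def hwe_genotype_freqs(p: float) -> tuple:
    """(P(aa), P(Aa), P(AA)) under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def prevalence(p: float, penetrances: tuple) -> float:
    """Kp = sum over k of f_k * P(k disease alleles) (law of total probability)."""
    return sum(f * g for f, g in zip(penetrances, hwe_genotype_freqs(p)))


kp = prevalence(0.05, (0.03, 0.12, 0.50))
print(round(kp, 6))  # 0.039725
```

The three genotype frequencies sum to 1, so Kp is a weighted average of the penetrances and always lies between the smallest and largest fk.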

  14. Estimation of the genetic model • Segregation analysis • It is possible to estimate • mode of inheritance • number of loci contributing to a segregating phenotype. • penetrance parameters • Relative frequency (p) of the disease allele in the population • Problems? • Large population based samples required • Ascertainment bias • In parametric linkage analysis we assume that the genetic model is known.

  15. 2. Parametric two-point linkage analysis • Let θ be the recombination fraction between the disease gene and the observed marker. • H0: θ = 0.5 vs. HA: θ < 0.5

  16. Example: N = 4 trios with affected mother and daughter • Estimation of the recombination fraction θ • Assume: • that all 12 individuals have been genotyped for a specific DNA marker • that all the mothers are heterozygous at the marker locus • that mothers and fathers have disease genotypes (Aa) and (aa), respectively • that each daughter has inherited a disease allele from her mother • that parental marker genotypes are not identical • that the phase is known for all the mothers (unrealistic) • Data: • Trios 1-3: no recombination between marker and disease locus • Trio 4: recombination between marker and disease locus • Estimate: θ* = 1/4

  17. Estimation of θ continued • Assume that all meioses can be scored unequivocally as recombinant or non-recombinant with regard to a marker locus and a disease locus • n = number of meioses • r = number of recombinant meioses • Estimate: θ* = r/n • Estimates above 0.5 are not biologically meaningful • Definition: θ* = min(0.5, r/n)

  18. The binomial distribution • The number of recombinants r among n independent meioses follows a binomial distribution. • The probability of r recombinants out of n is a function of the recombination fraction θ; denote this function L(θ). • L(θ) is the probability (likelihood) of the observed data if the recombination fraction is θ. • The maximum likelihood estimate (MLE) of θ is the value θ* at which L(θ) reaches its maximum. • MLE: θ* = r/n
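As a quick illustration, the sketch below writes out the binomial likelihood L(θ) = C(n, r) θ^r (1-θ)^(n-r) and confirms on a grid that it peaks at r/n for the trio example from slide 16 (r = 1 recombinant among n = 4 meioses). It also applies the cap at 0.5 from slide 17. The function names and the grid resolution are illustrative choices.

```python
import math


def likelihood(theta: float, r: int, n: int) -> float:
    """Binomial probability of r recombinants among n scored meioses."""
    return math.comb(n, r) * theta**r * (1.0 - theta) ** (n - r)


def theta_mle(r: int, n: int) -> float:
    """MLE of theta, restricted to the biologically meaningful range [0, 0.5]."""
    return min(0.5, r / n)


r, n = 1, 4  # slide 16: one recombinant trio out of four
theta_star = theta_mle(r, n)

# Brute-force check: evaluate L on a grid over [0, 0.5] and take the argmax.
grid = [i / 1000 for i in range(501)]
theta_grid = max(grid, key=lambda t: likelihood(t, r, n))

print(theta_star, theta_grid, likelihood(theta_star, r, n))
```

When r/n exceeds 0.5, the likelihood is still increasing at θ = 0.5, so the restricted maximum sits on the boundary. That is why the definition on slide 17 takes the minimum.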

  19. Lod score history • Score proposed by Haldane & Smith 1947 • Newton E. Morton analysed the distribution of the lod score statistic under various assumptions • Lod scores below -2 are generally accepted as significant evidence against linkage. • Common in replicating studies.
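The slides name the lod score but its formula did not survive extraction. The standard definition is Z(θ) = log10[L(θ) / L(0.5)], that is, the log-odds of the data under linkage at θ versus free recombination. The sketch below assumes that definition and the simplest setting of phase-known, fully scored meioses (the binomial coefficient cancels in the ratio). The function names are illustrative.

```python
import math


def lod(theta: float, r: int, n: int) -> float:
    """Two-point lod score for r recombinants among n phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        # Any observed recombinant rules out theta = 0 completely.
        return n * math.log10(2.0) if r == 0 else float("-inf")
    return (
        r * math.log10(theta)
        + (n - r) * math.log10(1.0 - theta)
        + n * math.log10(2.0)
    )


def excluded_thetas(r: int, n: int, grid, cutoff: float = -2.0):
    """Grid values of theta whose lod falls below the exclusion cutoff."""
    return [t for t in grid if lod(t, r, n) < cutoff]


if __name__ == "__main__":
    # Slide 16 example: r = 1, n = 4, evaluated at the MLE theta* = 0.25.
    print(lod(0.25, 1, 4))
    print(excluded_thetas(1, 4, [0.0, 0.0001, 0.001, 0.01, 0.1]))
```

By construction Z(0.5) = 0, so positive values favour linkage and values below the -2 cutoff mentioned on the slide exclude linkage at that θ.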

  20. More complicated situations • Phase unknown • Marker or disease gene homozygosity • Reduced penetrance • Varying penetrance • age, sex, phenotype, diagnostic uncertainty • Phenocopies • Missing marker data • Extended pedigrees • Pedigree loops • Multilocus genotypes

  21. Recessive mode of inheritance • Prerequisites • Autosomal recessive inheritance • 100% penetrance f0=f1=0, f2=1 • No phenocopies • Nuclear family typed for one informative marker • All four meioses are informative

  22. More complicated situations • Reduced penetrance • Varying penetrance • age, sex, phenotype, diagnostic uncertainty • Phenocopies • Missing marker data • Extended pedigrees • Pedigree loops • Multilocus genotypes

  23. Lod score assignment

  24. The pedigree likelihood contd. • g = (G1, G2, G3, G4) in the recessive example. • P(y|g) depends on the penetrance parameters f = (f0, f1, f2) • P(g|θ) depends on disease and marker allele frequencies • Ex: G1 in the recessive example: (1A|2a, 3A|4a) • P(g|θ) contributions: • 2pq*2p1p2 for the father • 2pq*2p3p4 for the mother • θ^2/4 for affected daughter 3 • θ^2/4 for affected daughter 4

  25. P(g|θ) • P(y|g): genetic model • P(g|θ) = Π_i P(gi) · Π_j P(gj | gFj, gMj) • i indexes founders • j indexes non-founders, with father Fj and mother Mj • Genotypes g include those of the marker and disease genes • Missing data, multilocus markers…
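The founder/non-founder factorisation can be illustrated on a deliberately simplified case. The sketch below assumes the usual form of the pedigree likelihood, L = Σ_g P(y|g) P(g), and applies it to one father-mother-child trio at the disease locus only. With no marker, θ does not appear, which is a simplification made here and not the two-locus computation in the slides. Genotypes are coded by the number of disease alleles (0, 1, 2), founders follow Hardy-Weinberg proportions, and all function names are illustrative.

```python
from itertools import product


def founder_prior(k: int, p: float) -> float:
    """HWE probability that a founder carries k disease alleles."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[k]


def transmission(child: int, father: int, mother: int) -> float:
    """Mendelian P(child genotype | parental genotypes), coded 0/1/2."""
    tf, tm = father / 2.0, mother / 2.0  # P(parent passes on the disease allele)
    return (
        (1.0 - tf) * (1.0 - tm),
        tf * (1.0 - tm) + (1.0 - tf) * tm,
        tf * tm,
    )[child]


def penetrance_term(status, k: int, f) -> float:
    """P(y | g): status is True (affected), False (unaffected), None (unknown)."""
    if status is None:
        return 1.0
    return f[k] if status else 1.0 - f[k]


def trio_likelihood(y_father, y_mother, y_child, p: float, f) -> float:
    """Sum over all genotype configurations g of P(y|g) * P(g)."""
    total = 0.0
    for gf, gm, gc in product(range(3), repeat=3):
        p_y_given_g = (
            penetrance_term(y_father, gf, f)
            * penetrance_term(y_mother, gm, f)
            * penetrance_term(y_child, gc, f)
        )
        # Founders (i) contribute priors; the non-founder (j) a transmission term.
        p_g = founder_prior(gf, p) * founder_prior(gm, p) * transmission(gc, gf, gm)
        total += p_y_given_g * p_g
    return total


if __name__ == "__main__":
    # Fully penetrant recessive model (slide 21): unaffected parents, affected child.
    print(trio_likelihood(False, False, True, 0.05, (0.0, 0.0, 1.0)))
```

As a consistency check, setting both parents' phenotypes to unknown and the child to affected reproduces the population prevalence Kp from slide 13, since the child of two random HWE founders is itself in HWE.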

  26. More on missing marker data • Good estimates of the allele frequencies are necessary • Assuming a uniform allele frequency distribution is usually not a good idea • Bias • See e.g. Ott (1999) • Allele frequencies for markers are available on websites. • Genotype, say, 50 unrelated controls from the same population • It is also possible to use alleles from individuals in the study without introducing bias.

  27. Heterogeneity • Allelic heterogeneity • Ex: Different mutations in BRCA1 lead to the same phenotype • Genetic heterogeneity • Only a proportion of the families in a study can be explained by one disease locus. • Test for heterogeneity • Smith (1963) - The admixture test • Implemented in HOMOG (a program in the LINKAGE package) • Estimates the proportion of linked families
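A minimal sketch of the idea behind the admixture test, under stated assumptions: a proportion α of families is linked at recombination fraction θ and the rest are unlinked, so family i contributes α·L_i(θ) + (1-α)·L_i(0.5) to the likelihood. In lod units this gives the heterogeneity lod Σ_i log10(α·10^Z_i(θ) + 1 - α). The per-family lod scores below are made up, θ is held fixed (a full analysis maximises over α and θ jointly), and this is not HOMOG's actual code or interface.

```python
import math


def hlod(alpha: float, family_lods) -> float:
    """Heterogeneity lod at one fixed theta, given per-family lods Z_i(theta)."""
    return sum(
        math.log10(alpha * 10.0**z + (1.0 - alpha)) for z in family_lods
    )


def best_alpha(family_lods, steps: int = 1000) -> float:
    """Grid-search estimate of the proportion of linked families."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: hlod(a, family_lods))


# Hypothetical per-family lod scores at a fixed theta (illustration only).
family_lods = [1.2, 0.9, -1.5, 1.1, -2.0]

alpha_hat = best_alpha(family_lods)
print("sum of lods (alpha = 1):", sum(family_lods))
print("alpha_hat:", alpha_hat, "hlod:", hlod(alpha_hat, family_lods))
```

With α = 1 the heterogeneity lod reduces to the ordinary summed lod, and with α = 0 it is zero. A maximum strictly between the two suggests that only some of the families are linked.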
