Introduction to linkage analysis

Harald H.H. Göring. Course “Study Design and Data Analysis for Genetic Studies”, Universidad del Zulia, Maracaibo, Venezuela, 9-10 April 2005

“Marker” loci
• There are many different types of polymorphisms, e.g.:
  • single nucleotide polymorphism (SNP):
      AAACATAGACCGGTT
      AAACATAGCCCGGTT
  • microsatellite/variable number of tandem repeat (VNTR):
      AAACATAGCACACA----CCGGTT
      AAACATAGCACACACACCGGTT
  • insertion/deletion (indel):
      AAACATAGACCACCGGTT
      AAACATAG--------CCGGTT
  • restriction fragment length polymorphism (RFLP)
  • …

Tracing chromosomal inheritance using “marker” locus genotypes
[Pedigree figure; marker genotypes shown: 1/2, 5/5, 1/4, 3/4, 5/5, 5/5, 1/5, 4/5]

Tracing chromosomal inheritance (fully informative situation)

Linkage analysis: locus with known genotypes
[Pedigree figure; genotypes shown: 1/2, 3/3, 2/4, 1/1, 1/3, 1/2, 2/3, 1/3]
Where do the observed genotypes “fit”?

Linkage analysis
In linkage analysis, one evaluates statistically whether or not the alleles at 2 loci co-segregate during meiosis more often than expected by chance. If the evidence of increased co-segregation is convincing, one generally concludes that the 2 loci are “linked”, i.e. are located on the same chromosome (“syntenic loci”). The degree of co-segregation provides an estimate of the proximity of the 2 loci, with near complete co-segregation for very tightly linked loci.

Let’s step back…to Mendel

One of Mendel’s pea crosses
[Figure: cross from P1 to F1 (Mendel’s law of uniformity) to F2 (Mendel’s law of independent assortment)]
F2 counts observed: 315, 108, 101, 32
approximate ratio: 9 : 3 : 3 : 1
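As a rough check on how closely the F2 counts match the 9 : 3 : 3 : 1 expectation, a Pearson chi-square statistic can be computed. This is a minimal sketch; the function name `chi_square_statistic` is illustrative and does not come from the slides.

```python
def chi_square_statistic(observed, ratio):
    """Pearson chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    # Expected counts if the total were split exactly according to the ratio.
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed_f2 = [315, 108, 101, 32]  # F2 counts quoted on the slide
mendel_ratio = [9, 3, 3, 1]        # expectation under independent assortment

chi2 = chi_square_statistic(observed_f2, mendel_ratio)
degrees_of_freedom = len(observed_f2) - 1
print(f"chi-square = {chi2:.3f} on {degrees_of_freedom} degrees of freedom")
```

A statistic this small relative to its 3 degrees of freedom suggests that the counts are well within what independent assortment would produce.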

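[Figure: single-locus cross with alleles 1 and 2. P1: 1/1 and 2/2; F1: all 1/2 (Mendel’s law of uniformity); F2: 1/1, 1/2, 2/2 at 25%, 50%, 25% in expectation (Mendel’s law of segregation)]

[Figure: the same cross at a second locus with alleles a and b. P1: a/a and b/b; F1: all a/b (Mendel’s law of uniformity); F2: a/a, a/b, b/b at 25%, 50%, 25% in expectation (Mendel’s law of segregation)]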

[Figure: two-locus cross followed at both loci (alleles 1, 2 and a, b) from P1 to F1 (Mendel’s law of uniformity) to F2]
Mendel’s law of independent assortment predicts nine two-locus F2 genotype classes, in expectation: four classes at 6.25% each, four classes at 12.5% each, and one class at 25%.
Assume we did this experiment and observed the following: F2 classes at 25%, 50%, 25% (non-independent assortment).
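Under independent assortment, the two-locus F2 expectations are simply products of the single-locus 25% / 50% / 25% expectations. The sketch below tabulates them; the dictionary names are illustrative.

```python
from itertools import product

# Single-locus F2 expectations under Mendel's law of segregation.
locus_1 = {"1/1": 0.25, "1/2": 0.50, "2/2": 0.25}
locus_2 = {"a/a": 0.25, "a/b": 0.50, "b/b": 0.25}

# Independent assortment: the joint probability is the product of the marginals.
joint = {
    (g1, g2): p1 * p2
    for (g1, p1), (g2, p2) in product(locus_1.items(), locus_2.items())
}

for (g1, g2), prob in joint.items():
    print(f"{g1} {g2}: {100 * prob:.2f}%")
```

Observing only three classes at 25%, 50%, 25% instead of these nine is what signals non-independent assortment.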

Co-segregation (due to linkage)
P1 generation (diploid): 1 a / 1 a and 2 b / 2 b
gametes (haploid): 1 a and 2 b
F1 generation (diploid), Mendel’s law of uniformity: 1 a / 2 b
gametes (haploid): 1 a and 2 b
F2 generation (diploid), Mendel’s law of segregation: 1 a / 1 a (25%), 1 a / 2 b (50%), 2 b / 2 b (25%)

Recombination
Recombination between 2 loci is said to have occurred if an individual received, from one parent, alleles (at these 2 loci) that originated in 2 different grandparents.
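The definition can be turned into a small scoring rule. The sketch below assumes the transmitting parent's phase is known, i.e. it is known which two-locus haplotype came from which grandparent; the function name `classify_gamete` and the tuple representation are illustrative choices, not part of the slides.

```python
def classify_gamete(haplotype_from_grandparent_1, haplotype_from_grandparent_2, gamete):
    """Score a transmitted gamete as 'N' (non-recombinant) or 'R' (recombinant).

    Each haplotype and the gamete are (allele at locus 1, allele at locus 2) tuples.
    """
    h1, h2 = haplotype_from_grandparent_1, haplotype_from_grandparent_2
    if h1[0] == h2[0] or h1[1] == h2[1]:
        # The parent must be heterozygous at both loci for the meiosis to be informative.
        raise ValueError("parent is not doubly heterozygous: meiosis is uninformative")
    if gamete in (h1, h2):
        return "N"  # both alleles originated in the same grandparent
    if gamete in ((h1[0], h2[1]), (h2[0], h1[1])):
        return "R"  # the two alleles originated in different grandparents
    raise ValueError("gamete is inconsistent with the parental haplotypes")


# Parent 1/2, a/b who received haplotype 1-a from one grandparent and 2-b from the other.
print(classify_gamete(("1", "a"), ("2", "b"), ("1", "a")))  # N
print(classify_gamete(("1", "a"), ("2", "b"), ("2", "a")))  # R
```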

[Pedigree figure: grandparents 1/1 a/a and 2/2 b/b; their child 1/2 a/b (phase known) mated to a 3/3 c/c spouse; 10 offspring genotyped at both loci]
Who is a recombinant?
Offspring scored: N N N R N N N N R N (N = non-recombinant, R = recombinant)

Possible explanations for recombination
[Figure: grandparents 1/1 a/a and 2/2 b/b; transmitted gametes scored N, R, R, N]
I. the loci are on different chromosomes
II. homologous recombination during meiosis
III. genotyping error

Recombination fraction
The recombination fraction between 2 loci is defined as the proportion of meioses resulting in a recombinant gamete. For loci on different chromosomes (or for loci far apart on the same, large chromosome), the recombination fraction is 0.5. Such loci are said to be unlinked. For loci close together on the same chromosome, the recombination fraction is < 0.5. Such loci are said to be linked. The closer the loci, the smaller the recombination fraction (down to a minimum of 0).

Estimation of recombination fraction
[Pedigree figure: grandparents 1/1 a/a and 2/2 b/b; parents 1/2 a/b (phase known) and 3/3 c/c; 10 offspring genotyped at both loci]
Offspring scored: N N N R N N N N R N
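With phase known, the natural estimate of the recombination fraction is the observed proportion of recombinant meioses. A minimal sketch using the N/R scores from the pedigree (variable names are illustrative):

```python
# N = non-recombinant, R = recombinant, one score per offspring (from the slide).
scores = "N N N R N N N N R N".split()

recombinants = scores.count("R")
informative_meioses = len(scores)

# Proportion of recombinant meioses among all informative meioses.
theta_hat = recombinants / informative_meioses
print(f"{recombinants} recombinants in {informative_meioses} meioses: estimate = {theta_hat}")
```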

Missing phase information: Who is a recombinant??
[Pedigree figure: parents 1/2 a/b (phase unknown, grandparental genotypes not shown) and 3/3 c/c; offspring genotyped at both loci. Each offspring is scored under both possible phases: an offspring that is N under phase 1 is R under phase 2, and vice versa.]

Missing phase and genotype information: Who is a recombinant??
[Pedigree figure; parental genotype labels shown: ?/?, 1/2, 3/3, a/b, c/c; 10 offspring genotyped at both loci]

Missing phase and genotype information: Who is a recombinant???
[Pedigree figure; parental genotype labels shown: ?/?, ?/?, a/b, c/c; 10 offspring genotyped at both loci]

Likelihood
• The likelihood of a hypothesis (e.g. specific parameter value(s)) on a given dataset, L(hypothesis|data), is defined to be proportional to the probability of the data given the hypothesis, P(data|hypothesis): L(hypothesis|data) = constant * P(data|hypothesis)
• Because of the proportionality constant, a likelihood by itself has no interpretation.
• The likelihood ratio (LR) of 2 hypotheses is meaningful if the 2 hypotheses are nested (i.e., one hypothesis is contained within the other): LR = L(H1|data) / L(H0|data) = P(data|H1) / P(data|H0)
• Under certain conditions, maximum likelihood estimates are asymptotically unbiased and asymptotically efficient.
• Likelihood theory describes how to interpret a likelihood ratio.

Evaluating the evidence of linkage: lod score
The lod (logarithm of odds) score is defined as the logarithm (to the base 10) of the likelihood ratio of 2 hypotheses on a given dataset: Z = log10[ L(H1|data) / L(H0|data) ]
In linkage analysis, typically the different hypotheses refer to different values of the recombination fraction θ: Z(θ) = log10[ L(θ|data) / L(θ = 0.5|data) ]

[Pedigree figure repeated: grandparents 1/1 a/a and 2/2 b/b; parents 1/2 a/b (phase known) and 3/3 c/c; 10 offspring scored N N N R N N N N R N]

Example lod score calculation
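For the phase-known pedigree scored N N N R N N N N R N (2 recombinants in 10 meioses), the likelihood as a function of the recombination fraction θ is θ^2 (1 - θ)^8, and the lod score compares it with the likelihood at θ = 0.5. The sketch below evaluates that lod score on a grid; the function names are illustrative.

```python
import math


def log10_likelihood(theta, recombinants, meioses):
    """log10 of theta^R * (1 - theta)^N for R recombinants and N non-recombinants."""
    total = 0.0
    for count, prob in ((recombinants, theta), (meioses - recombinants, 1.0 - theta)):
        if count == 0:
            continue  # a factor raised to the power 0 contributes nothing
        if prob == 0.0:
            return -math.inf  # observed events are impossible at this theta
        total += count * math.log10(prob)
    return total


def lod_known_phase(theta, recombinants, meioses):
    """Lod score Z(theta) relative to the no-linkage hypothesis theta = 0.5."""
    return (log10_likelihood(theta, recombinants, meioses)
            - log10_likelihood(0.5, recombinants, meioses))


for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"theta = {theta:.1f}  Z = {lod_known_phase(theta, 2, 10):.3f}")
```

The curve peaks at θ = 0.2, the observed proportion of recombinants, and drops to minus infinity at θ = 0 because recombinants were observed.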

[Pedigree figure repeated: parents 1/2 a/b (phase unknown) and 3/3 c/c; each offspring is N under one phase and R under the other]

Example lod score calculation (missing phase information)
With θ denoting the recombination fraction:
P(data|θ) = P(phase 1) P(data|phase 1, θ) + P(phase 2) P(data|phase 2, θ)
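The phase-averaged likelihood can be sketched as follows. Two things here are assumptions made for illustration rather than values taken from the slide: both phases are given prior probability 0.5, and the counts (2 recombinants in 10 meioses under phase 1, hence 8 under phase 2) are carried over from the phase-known example.

```python
import math


def lod_unknown_phase(theta, recombinants_phase1, meioses, p_phase1=0.5):
    """Lod score when the parental phase is unknown.

    A meiosis that is recombinant under phase 1 is non-recombinant under
    phase 2, so the two phase-specific likelihoods swap the exponents.
    """
    r, n = recombinants_phase1, meioses - recombinants_phase1
    like_phase1 = theta ** r * (1.0 - theta) ** n
    like_phase2 = theta ** n * (1.0 - theta) ** r
    likelihood = p_phase1 * like_phase1 + (1.0 - p_phase1) * like_phase2
    if likelihood == 0.0:
        return -math.inf
    # At theta = 0.5 every meiosis has probability 0.5 under either phase.
    null_likelihood = 0.5 ** meioses
    return math.log10(likelihood / null_likelihood)


for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"theta = {theta:.1f}  Z = {lod_unknown_phase(theta, 2, 10):.3f}")
```

For the same counts, the lod score is lower than with known phase, which reflects the information lost by not knowing which offspring are the recombinants.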

[Pedigree figure repeated: missing phase and genotype information; parental genotype labels shown: ?/?, ?/?, a/b, c/c; 10 offspring genotyped at both loci]

Example lod score calculation (missing phase and genotype information)
Lod score Z(θ) as a function of the recombination fraction θ, under two sets of assumed allele frequencies:

θ      Z(θ), assuming 3 equally frequent alleles,    Z(θ), assuming P(1) = 0.495,
       i.e. P(1) = P(2) = P(3) = 0.333               P(2) = 0.495, P(3) = 0.010
0      -0.304                                        -0.378
0.1     0.204                                         0.183
0.2     0.346                                         0.332
0.3     0.264                                         0.253
0.4     0.096                                         0.091
0.5     0                                             0

[Figure comparing three scenarios: known phase, known genotypes; unknown phase, known genotypes; unknown phase, unknown genotypes]

Interpretation of lod score
• The traditional threshold for declaring evidence of linkage statistically significant is a lod score of 3, or a likelihood ratio of 1000:1, meaning the likelihood of linkage on the data is 1000 times higher than the likelihood of no linkage on the data.
• Asymptotically, a lod score of 3 has a point-wise significance level (p-value) of 0.0001. In other words, the probability of obtaining a lod score of at least this magnitude by chance is 0.0001.
• Due to the many linkage tests being conducted as part of a genome-wide linkage scan, a lod score of 3 has a genome-wide significance level of ~0.05.
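The asymptotic point-wise p-value can be sketched numerically. The sketch relies on the standard large-sample result that, under no linkage, 2 ln(10) times the maximized lod score follows a 50:50 mixture of a point mass at 0 and a chi-square distribution with 1 degree of freedom (the test is one-sided because the recombination fraction cannot exceed 0.5). The function name is illustrative.

```python
import math


def lod_to_pointwise_p(lod):
    """Asymptotic one-sided point-wise p-value for a maximized 2-point lod score."""
    if lod <= 0:
        return 1.0  # a maximized lod score is never below 0
    chi_square = 2.0 * math.log(10.0) * lod
    # Upper tail of chi-square with 1 df is erfc(sqrt(x / 2)); halve it for the mixture.
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))


for lod in (1, 2, 3):
    print(f"lod = {lod}: point-wise p = {lod_to_pointwise_p(lod):.6f}")
```

For a lod score of 3 this gives a value close to the 0.0001 quoted above.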

P-value
The p-value is defined as the probability of obtaining an outcome at least as extreme as observed by chance (i.e. when the null hypothesis is true).
Example: Testing whether a coin is fair
H0: P(head) = 0.5
H1: P(head) ≠ 0.5 (2-sided alternative hypothesis)
You observe 1 head out of 10 coin tosses. The p-value then is the probability of observing exactly 1 head in 10 trials (observed outcome), or 0 heads in 10 trials (more extreme outcome), or 9 heads (equally extreme outcome) or 10 heads (more extreme outcome) in 10 trials.
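The coin example can be worked through by summing binomial probabilities over all outcomes that are no more probable than the observed one. This is a minimal sketch; the function name is illustrative.

```python
from math import comb


def two_sided_binomial_p(successes, trials, p_null=0.5):
    """Two-sided exact binomial p-value: sum over outcomes at most as likely as observed."""
    def pmf(k):
        return comb(trials, k) * p_null ** k * (1.0 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so that equally extreme outcomes are not lost to rounding.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


p_value = two_sided_binomial_p(successes=1, trials=10)
print(f"p-value for 1 head in 10 tosses: {p_value:.4f}")
```

The outcomes 0, 1, 9 and 10 heads contribute 1 + 10 + 10 + 1 = 22 of the 1024 equally likely toss sequences, so the p-value is 22/1024, about 0.021.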

P-value
The p-value is defined as the probability of obtaining an outcome at least as extreme as observed by chance (i.e. when the null hypothesis is true).
Example: Testing whether 2 loci are linked
H0: P(recombination) = 0.5
H1: P(recombination) < 0.5 (1-sided alternative hypothesis)
You observe 0 recombinants and 10 non-recombinants in 10 informative meioses. The p-value then is the probability of observing exactly 0 recombinants in 10 trials (observed outcome; there is no more extreme outcome).

Lod score
With 0 recombinants in 10 fully informative meioses, the lod score at a recombination fraction of 0 is log10(1 / 0.5^10) = 10 × log10(2) ≈ 3.01. In the ideal case, 10 fully informative meioses may suffice to obtain significant evidence of linkage.
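The ideal case can be checked numerically: when every informative meiosis is non-recombinant, each one contributes log10(2) to the lod score at a recombination fraction of 0, and the exact p-value is 0.5 raised to the number of meioses. The function names below are illustrative.

```python
import math


def lod_all_nonrecombinant(meioses):
    """Lod score at a recombination fraction of 0 when no meiosis is recombinant."""
    return meioses * math.log10(2)


def p_value_no_recombinants(meioses):
    """Exact probability of seeing 0 recombinants if the loci are unlinked."""
    return 0.5 ** meioses


# Smallest number of fully informative, non-recombinant meioses reaching a lod score of 3.
meioses_needed = next(n for n in range(1, 100) if lod_all_nonrecombinant(n) >= 3)

print(f"meioses needed: {meioses_needed}")
print(f"lod score: {lod_all_nonrecombinant(meioses_needed):.3f}")
print(f"exact p-value: {p_value_no_recombinants(meioses_needed):.6f}")
```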

Lod score and significance level

Linkage analysis reduces the multiple testing problem
• Linkage analysis is so useful because it greatly reduces the multiple testing problem: ~3,000,000,000 bp of DNA are interrogated in ~500 independent linkage tests for human data. This is possible because a meiotic recombination event occurs on average only once every 100,000,000 bp.
• No specification of prior hypotheses is therefore necessary, as all possible hypotheses can be screened.
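The link between a point-wise significance level of 0.0001 and a genome-wide level of about 0.05 can be sketched with the ~500 independent tests quoted above. Treating the tests as exactly independent and exactly 500 in number is a simplifying assumption for illustration.

```python
pointwise_p = 0.0001     # asymptotic point-wise p-value for a lod score of 3
independent_tests = 500  # approximate number of independent linkage tests per genome scan

# Probability of at least one false positive across all independent tests.
genome_wide_p = 1.0 - (1.0 - pointwise_p) ** independent_tests

# Bonferroni-style upper bound on the same quantity.
bonferroni_bound = pointwise_p * independent_tests

print(f"genome-wide significance level: {genome_wide_p:.4f}")
print(f"Bonferroni bound:               {bonferroni_bound:.4f}")
```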

Linkage analysis: trait locus with unknown genotypes
[Pedigree figure: all 8 individuals have trait-locus genotype ?/?]
Where do the observed genotypes “fit”?

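Statistical gene mapping with trait phenotypes
[Diagram: observed marker genotypes are related by genetic distance (linkage, allelic association) to unobserved trait locus genotypes, which are related by etiology (?) to observed trait phenotypes; the correlation to be detected is between the observed marker genotypes and the observed trait phenotypes]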

Many different types of linkage methods
• penetrance model-based linkage analysis (“classical” linkage analysis)
• penetrance model-free linkage analysis (“model-free” or “non-parametric” linkage analysis)
  • affected sib-pair linkage analysis
  • affected relative-pair linkage analysis
  • regression-based linkage analysis
  • variance components-based linkage analysis
  • …

Variations within each linkage method
• 2-point analysis vs. multiple 2-point analysis vs. multi-point analysis
• exact calculation vs. approximation (e.g., MCMC)
• qualitative traits vs. quantitative traits
• rare “simple mendelian” diseases vs. common “complex multifactorial” diseases
• …

Penetrance-model-based linkage analysis

Segregation analysis
In segregation analysis, one attempts to characterize the mode of inheritance of a trait by statistically examining the segregation pattern of the trait through a sample of related individuals. In a way, heritability analysis is a form of segregation analysis. In heritability analysis, however, the focus is not on characterizing the segregation pattern per se, but on quantifying inheritance under an assumed mode of inheritance (generally additivity/co-dominance).

Relationship between genotypes and phenotypes (penetrances) at the ABO blood group locus
penetrance: P(phenotype given genotype)

            Phenotype (blood group)
Genotype    A    B    AB   O
A/A         1    0    0    0
A/B         0    0    1    0
A/O         1    0    0    0
B/B         0    1    0    0
B/O         0    1    0    0
O/O         0    0    0    1

Probability model correlating trait phenotypes and trait locus genotypes: penetrances
penetrance: P(phenotype given genotype)
Ex.: fully-penetrant dominant disease without “phenocopies”

              Phenotype
Genotype      unaffected   affected
+/+           1            0
D/+ or +/D    0            1
D/D           0            1
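A penetrance table can be combined with prior genotype probabilities to infer the unobserved trait-locus genotype from a phenotype. The sketch below does this for a single unrelated individual under the fully penetrant dominant model above. The Hardy-Weinberg priors and the disease allele frequency of 0.001 are illustrative assumptions, not values from the slides, and information from relatives is ignored.

```python
# Penetrances P(phenotype | genotype) for the fully penetrant dominant model.
PENETRANCE = {
    "+/+": {"unaffected": 1.0, "affected": 0.0},
    "D/+": {"unaffected": 0.0, "affected": 1.0},
    "D/D": {"unaffected": 0.0, "affected": 1.0},
}


def genotype_priors(disease_allele_freq):
    """Prior genotype probabilities under Hardy-Weinberg proportions (an assumption)."""
    p = disease_allele_freq
    q = 1.0 - p
    return {"+/+": q * q, "D/+": 2.0 * p * q, "D/D": p * p}


def genotype_posterior(phenotype, disease_allele_freq):
    """P(genotype | phenotype) by Bayes' rule: prior times penetrance, renormalized."""
    priors = genotype_priors(disease_allele_freq)
    joint = {g: priors[g] * PENETRANCE[g][phenotype] for g in priors}
    total = sum(joint.values())
    return {g: value / total for g, value in joint.items()}


# Hypothetical rare disease allele with frequency 0.001.
posterior_affected = genotype_posterior("affected", 0.001)
posterior_unaffected = genotype_posterior("unaffected", 0.001)
print(posterior_affected)
print(posterior_unaffected)
```

With a rare disease allele, an affected individual is almost certainly D/+ and an unaffected individual is certainly +/+, which is why the dominant model lets trait-locus genotypes be read off the phenotypes.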

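Statistical gene mapping with trait phenotypes: “simple” dominant inheritance model
[Diagram: observed marker genotypes are related by genetic distance (linkage, allelic association) to unobserved trait locus genotypes; under this model the trait locus genotypes follow directly from the observed trait phenotypes (not affected = +/+, affected = D/+); the correlation to be detected is between the observed marker genotypes and the observed trait phenotypes]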

Linkage analysis: trait locus (genotypes based on assumed dominant inheritance model)
[Pedigree figure; inferred trait-locus genotypes shown: +/+, D/+, +/+, +/+, D/+, +/+, D/+, D/+]
Where do the observed genotypes “fit”?

Example of multipoint lod score curve: Pseudoxanthoma elasticum
From: Le Saux et al. (1999) Pseudoxanthoma elasticum maps to an 820 kb region of the p13.1 region of chromosome 16. Genomics 62:1-10.

Genetic heterogeneity
[Figure: four panels, each drawn along a time axis]
• locus homogeneity, allelic homogeneity
• locus homogeneity, allelic heterogeneity
• locus heterogeneity, allelic homogeneity (at each locus)
• locus heterogeneity, allelic heterogeneity (at each locus)

Pros and cons of penetrance-model-based linkage analysis
+ potentially very powerful (under suitable penetrance model)
+ statistically well-behaved
- requires specification of penetrance model; not powerful at all under unsuitable penetrance model

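Effects of model misspecification
[Figure: the same nuclear family analyzed under two penetrance models; marker genotypes: parents 1/2 and 3/4, children 1/3, 1/4, 2/3]
dominant inheritance: P(aff.|DD or D+) = 1, P(aff.|++) = 0; trait genotypes: parents +/+ and D/+, children D/+, +/+, D/+; parental meioses labeled uninformative and informative
recessive inheritance: P(aff.|DD) = 1, P(aff.|++ or D+) = 0; trait genotypes: parents D/+ and D/D, children D/D, D/+, D/D; parental meioses labeled informative and uninformative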

Pros and cons of penetrance-model-based linkage analysis (continued)
- modeling flexibility limited
- computationally intensive
