Human Genetic Variation Genetics of Complex Diseases
Challenge 2: Correcting genotyping errors • How can we detect genotyping errors? • Test each SNP for deviation from Hardy-Weinberg equilibrium. • If we have mother-father-child trios, we can check Mendelian consistency.
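Both checks on this slide can be sketched in a few lines. This is a minimal illustration, not a production quality-control pipeline: the function names, the genotype-count inputs, and the use of a 1-degree-of-freedom chi-square test are assumptions made for the example.

```python
from math import erfc, sqrt

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for deviation from Hardy-Weinberg proportions.

    Inputs are the observed counts of the three genotypes AA, AB, BB.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

def mendelian_consistent(mother, father, child):
    """True if the child can inherit one allele from each parent.

    Each genotype is an unordered pair of alleles, e.g. ("A", "G").
    """
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

# A SNP with no heterozygotes at allele frequency 0.5 is a strong outlier.
chi2, p_value = hwe_chi_square(50, 0, 50)
# A G/G child cannot come from an A/A mother.
ok = mendelian_consistent(("A", "A"), ("A", "G"), ("G", "G"))
```

SNPs with a very small Hardy-Weinberg p-value, or with many Mendelian inconsistencies across trios, are natural candidates for genotyping errors.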
Challenge 3: Population Substructure • Imagine that all the cases are collected from Africa, and all the controls are from Europe. • Many association signals are going to be found. • The vast majority of them are false. Why? • Populations differ in allele frequencies because of different evolutionary forces: drift, selection, mutation, migration, population bottlenecks.
Shaping Genetic Variation • Mutations add to genetic variation • Natural Selection controls the frequency of certain traits and alleles • Genetic drift
[Figure: an ancestral population splits through migration; genetic drift then produces different allele frequencies in the descendant populations.]
Population Substructure • Imagine that all the cases are collected from Africa, and all the controls are from Europe. • Many association signals are going to be found. • The vast majority of them are false. • What can we do about it?
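A small worked example shows how strong the false signal can be. The sketch below assumes a SNP that has nothing to do with the disease but whose allele frequency is 0.8 in one population and 0.2 in the other; these frequencies, the sample sizes, and the function name are hypothetical values chosen for illustration.

```python
def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# 1000 cases (2000 alleles) all from population 1, allele frequency 0.8;
# 1000 controls (2000 alleles) all from population 2, allele frequency 0.2.
confounded = allelic_chi_square(1600, 400, 400, 1600)

# Same SNP, but cases and controls are each drawn half from each population.
balanced = allelic_chi_square(1000, 1000, 1000, 1000)
```

In the confounded design the statistic is enormous even though the SNP is unrelated to the disease; in the balanced design it vanishes.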
Ancestry Inference • To what extent can population structure be detected from SNP data? • What can we learn from these inferences? • Can we build the tree of life? • How do we analyze complex (mixed) populations? Novembre et al., Nature, 2008
Principal Component Analysis • Dimensionality reduction • Based on linear algebra • Intuition: find the ‘most important’ features of the data.
Principal Component Analysis • Project the data onto the one-dimensional line along which the spread (variance) is maximized.
Principal Component Analysis • In our case, we want to look at two dimensions at a time. • The original data points have many dimensions – each SNP corresponds to one dimension.
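The projection onto the top two dimensions can be sketched with a singular value decomposition. This is a minimal illustration on simulated data, assuming genotypes are coded as 0/1/2 copies of an allele; the function name, the simulated allele frequencies, and the sample sizes are hypothetical and not taken from any real dataset.

```python
import numpy as np

def top_principal_components(genotypes, k=2):
    """Project individuals (rows) onto the top-k principal components.

    genotypes: individuals x SNPs matrix, each entry 0, 1, or 2.
    """
    centered = genotypes - genotypes.mean(axis=0)  # center each SNP
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]  # coordinates of each individual

# Simulate two populations with different allele frequencies at each SNP.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b]).astype(float)

pcs = top_principal_components(genotypes)  # shape: (100, 2)
```

Plotting `pcs[:, 0]` against `pcs[:, 1]` gives the kind of two-dimensional view described above; in this simulation the first component is expected to separate the two populations.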
The International HapMap Project • An international consortium that aims to genotype 270 individuals from four different populations. HUJI 2006
Launched in 2002. • First phase (2005): ~1 million SNPs for 270 individuals from four populations • Second phase (2007): ~3.1 million SNPs for 270 individuals from four populations • Third phase (ongoing): > 1 million SNPs for 1115 individuals across 11 populations
HapMap Populations • MKK, LWK, YRI, GIH, ASW, MEX, JPT, CHD, CHB, CEU, TSI
Lessons from the HapMap • African populations have higher genetic diversity than other populations • Evidence for bottlenecks or founder effects in the other populations • Evidence for the out-of-Africa theory • HapMap was used to detect: • Common deletions across the genome • Regions under selection • Recombination rates, hotspots • Associations of SNPs with disease
Example: detection of deletions using SNPs Conrad et al., Nature Genetics, 2006
Example: detection of deletions using SNPs • Conrad et al. applied the method to the HapMap data and found: • Typical individuals have roughly 30-50 deletions larger than 5 kb (500-750 kb total sequence length). • Deletions tend to be gene-poor. • The deletions detected in the HapMap span 267 known and predicted genes. • Deletions were found to be related to different conditions such as schizophrenia (Stefansson et al., 2008), lupus glomerulonephritis (Aitman et al., Nature, 2006), and others.
Distribution of deletion length Conrad et al., Nature Genetics, 2006
Significant Region • Why do we have differences between data1 and data2? • How come so many SNPs seem to be associated in this region? • Maybe there are multiple ‘causal SNPs’? • Or maybe there are correlations between the SNPs?
Linkage Disequilibrium: Signatures of History
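Correlation between nearby SNPs can be quantified directly from the genotypes. The sketch below uses the squared Pearson correlation between two genotype vectors, which is a common proxy for the haplotype-based r² measure of linkage disequilibrium; the function name and the toy genotype vectors are made up for illustration.

```python
import numpy as np

def genotype_r2(snp_x, snp_y):
    """Squared Pearson correlation between two SNPs coded as 0/1/2."""
    r = np.corrcoef(snp_x, snp_y)[0, 1]
    return r ** 2

# Toy genotypes for eight individuals at three SNPs (hypothetical values).
snp_a = np.array([0, 1, 2, 0, 1, 2, 0, 1])
snp_b = np.array([0, 1, 2, 0, 1, 2, 0, 2])  # almost identical to snp_a
snp_c = np.array([1, 0, 1, 2, 1, 0, 1, 2])  # only weakly related to snp_a

r2_ab = genotype_r2(snp_a, snp_b)
r2_ac = genotype_r2(snp_a, snp_c)
```

If `snp_a` were causal, `snp_b` would show nearly the same association signal simply because the two are highly correlated, which is one way a whole region can light up around a single causal variant.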
Haplotypes vs. Genotypes • Cost-effective genotyping technology gives genotypes and not haplotypes. • Haplotypes (mother chromosome, father chromosome): ATCCGA, AGACGC • Genotype: A {T,G} {C,A} C G {A,C} • Possible phases: ATACGA / AGCCGC, AGACGA / ATCCGC, ….
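The ambiguity can be made concrete by enumerating every haplotype pair compatible with a genotype. This is a minimal sketch; the function name and the representation of a genotype as a list of unordered allele pairs are assumptions, and the example genotype is the one implied by the two haplotypes on the slide (ATCCGA and AGACGC).

```python
from itertools import product

def possible_phases(genotype):
    """List all haplotype pairs consistent with an unphased genotype.

    genotype: one unordered allele pair per site, e.g. ("T", "G").
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    # Keep the first heterozygous site fixed so that a pair and its
    # swapped copy (hap2, hap1) are not counted twice.
    free_sites = het_sites[1:]
    phases = []
    for flips in product([False, True], repeat=len(free_sites)):
        hap1 = [a for a, _ in genotype]
        hap2 = [b for _, b in genotype]
        for site, flip in zip(free_sites, flips):
            if flip:
                hap1[site], hap2[site] = hap2[site], hap1[site]
        phases.append(("".join(hap1), "".join(hap2)))
    return phases

genotype = [("A", "A"), ("T", "G"), ("C", "A"),
            ("C", "C"), ("G", "G"), ("A", "C")]
phases = possible_phases(genotype)
```

A genotype with k heterozygous sites has 2^(k-1) possible phases, so the three heterozygous sites here give four candidate haplotype pairs, only one of which is the true one.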