
The International HapMap Project: a Rich Resource of Genetic Information

Julia Krushkal, Department of Preventive Medicine, The University of Tennessee Health Science Center, jkrushka{at}utmem.edu


Presentation Transcript


  1. The International HapMap Project: a Rich Resource of Genetic Information. Julia Krushkal, Department of Preventive Medicine, The University of Tennessee Health Science Center, jkrushka{at}utmem.edu

  2. HapMap Population Samples
     Project launched in 2002 to provide a public resource for accelerating medical genetic research.
     • 270 individuals from 4 geographically diverse populations
     • YRI: 90 Yoruba individuals from Ibadan, Nigeria (30 parent-offspring trios)
     • CEU: 90 Utah, USA residents of northern and western European descent, from the Centre d’Etude du Polymorphisme Humain (CEPH) collection (30 parent-offspring trios)
     • CHB: 45 unrelated Han Chinese from Beijing, China
     • JPT: 45 unrelated Japanese from Tokyo, Japan
     HapMap: http://www.hapmap.org/
     NHGRI: http://www.genome.gov/page.cfm?pageID=10001688

  3. The International HapMap Project
     “…Determine the common patterns of DNA sequence variation in the human genome, by characterizing sequence variants, their frequencies, and correlations between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe.” Nature (2003)
     • Population-specific sequence variation
     • Allele frequencies
     • Linkage disequilibrium patterns
     • Haplotype information
     • Tag SNPs
     • Structural genome variation
     • Better understanding of human population dynamics and of the history of human populations
     • Cell lines available from the Coriell Institute for Medical Research
     • A rich resource for biomedical genetic analysis

  4. International HapMap Project Papers
     • The Int. HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851-861. 2007.
     • The Int. HapMap Consortium. A Haplotype Map of the Human Genome. Nature 437:1299-1320. 2005.
     • The Int. HapMap Consortium. The International HapMap Project. Nature 426:789-796. 2003.
     • The Int. HapMap Consortium. Integrating Ethics and Science in the International HapMap Project. Nature Reviews Genet 5:467-475. 2004.
     • Thorisson et al. The International HapMap Project Web site. Genome Res 15:1591-1593. 2005.
     HapMap-related papers
     • Sabeti et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449:913-918. 2007.
     • Clark et al. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15:1496-1502. 2005.
     • Clayton et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nature Genet 37(11):1243-1246. 2005.
     • de Bakker et al. Efficiency and power in genetic association studies. Nature Genet 37(11):1217-1223. 2005.
     • Goldstein, Cavalleri. Genomics: Understanding human diversity. Nature 437:1241-1242. 2005.
     • Hinds et al. Whole genome patterns of common DNA variation in three human populations. Science 307:1072-1079. 2005.
     • Myers et al. A fine-scale map of recombination rates and hotspots across the human genome. Science 310:321-324. 2005.
     • Nielsen R et al. Genomic scans for selective sweeps using SNP data. Genome Res 15:1566-1575. 2005.
     • Smith et al. Sequence features in regions of weak and strong linkage disequilibrium. Genome Res 15:1519-1534. 2005.
     • Weir et al. Measures of human population structure show heterogeneity among genomic regions. Genome Res 15:1468-1476. 2005.

  5. Nature (2003)

  6. Human Chromosomes
     • Contain DNA
     • 22 pairs of autosomes + sex chromosomes (X and Y) + mitochondrial genome
     • Contain functional units (genes) and other DNA
     The human genome sequence is available as a reference, as a result of the Human Genome Project.
     A significant amount of inter-individual variation exists.

  7. Some Basic Definitions
     • Locus: a site in the genome
     • The DNA in the human genome is not a static entity; there are differences between different copies.
     • Allele: a genetic variant, i.e., a form (state) of a locus
     • Mutation: a genetic change
     • An individual carries two copies of each locus on autosomes
     • Alleles are passed from parents to offspring (one from each parent)
     • Genotype: the set of alleles an individual carries at a given locus

  8. Chromosomes are sets of continuously linked genetic loci Example: Integrated map of chromosome 5 from the International HapMap Project, http://www.hapmap.org

  9. Genetic Variation
     • Some DNA loci vary among individuals
     • Linked genetic loci are inherited non-independently
     • Loci may change with time (mutation, selection, genetic drift)
     • Some DNA changes lead to quantitative changes in RNA expression and to quantitative or qualitative changes in protein production
     • Some genetic changes, even small ones, may lead to disease
     • A large amount of natural variation occurs in healthy individuals, i.e., many changes are neutral
     • Loci genetically linked to the disease-causing locus can be used as genetic markers to search for the disease locus (figure: markers SNP1, SNP2)
     There are many types of DNA variation, e.g.:
     • Sequence variation: AAA(C/T)GGCTA
     • Microsatellite repeats: …AATG AATG AATG AATG…

  10. Polymorphic Site
      • A locus with common DNA variation
      • ≥ 2 alleles in a population
      • Shows difference in DNA sequence among individuals
      • In most definitions: the most common allele has frequency < 99%, or minor allele frequency (MAF) ≥ 1%, or MAF ≥ 2%, or at least two alleles have frequencies ≥ 1%.
      • A locus whose rare allele occurs in < 1% of the population is usually not considered a polymorphic site.
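The MAF-based definition above can be illustrated with a minimal Python sketch. The function names, the 1% default threshold, and the sample allele counts are hypothetical choices for illustration; they are not taken from the presentation or from any HapMap tool.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for a list of observed allele copies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele (0.0 if only one allele is seen)."""
    freqs = sorted(allele_frequencies(alleles).values(), reverse=True)
    return freqs[1] if len(freqs) > 1 else 0.0


def is_polymorphic(alleles, maf_threshold=0.01):
    """Apply the 'MAF >= threshold' definition of a polymorphic site."""
    return minor_allele_frequency(alleles) >= maf_threshold


# Hypothetical sample: 200 chromosome copies (100 diploid individuals).
sample = ["A"] * 170 + ["C"] * 30
print(minor_allele_frequency(sample), is_polymorphic(sample))
```

Changing `maf_threshold` to 0.02 or 0.05 gives the stricter definitions mentioned on the slide.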

  11. SNP = Single Nucleotide Polymorphism
      A SNP locus on the distal end of the long arm of human chromosome 5 (data from Ensembl, http://www.ensembl.org)
      SNP locus rs6870660: CAAATTCCATG[A or C]AGAAGGAAATACAT
      A and C are alleles at SNP locus rs6870660

  12. A SNP locus on the distal end of the long arm of chromosome 5 SNP locus rs6870660 http://www.hapmap.org

  13. Regulatory Interactions: The ENCODE Project
      • 2003: Pilot project launched (1% of the genome)
      • 2007: Pilot project completed; production phase launched on the entire genome
      • High-throughput experimental and computational approaches to studies of DNA regulatory sites, regulatory interactions, and DNA modification
      Project components: Production Scale Effort, Pilot Scale Effort, Data Coordination Center, Technology Development Effort

  14. Genome SNP Variation
      • Size of the human genome is ≈ 3.2 × 10^9 bp
      • 99.9% identical
      • 9-10 million SNPs may have MAF ≥ 5%
      • ≈ 30,000 genes
      HapMap SNP Density Coverage
      • Phase I (published in 2005)
        • 1,007,329 SNPs that passed quality control
        • 1 SNP / 3000 bp
        • 11,500 nsSNP
        • 10 ENCODE regions, 500 kb each: 17,944 SNPs, 1 SNP / 279 bp
      • Phase II (published in 2007)
        • >3,806,000 SNPs
        • 1 SNP / 875 bp
        • 25-30% of all SNPs with MAF ≥ 5%
      Figure caption: The cumulative number of non-redundant SNPs (each mapped to a single location in the genome) is shown as a solid line, as well as the number of SNPs validated by genotyping (dotted line) and double-hit status (dashed line). Years are divided into quarters (Q1–Q4).
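The Phase I and ENCODE density figures on this slide follow from simple division, as the sketch below shows. It assumes the genome length of 3.2 × 10^9 bp quoted on the slide; the function name is hypothetical.

```python
GENOME_BP = 3.2e9  # genome size as quoted on the slide (assumed here)


def bp_per_snp(region_bp, n_snps):
    """Average spacing between SNPs in base pairs."""
    return region_bp / n_snps


# Phase I, genome-wide: 1,007,329 SNPs that passed quality control.
phase1_spacing = bp_per_snp(GENOME_BP, 1_007_329)

# Phase I, ENCODE regions: 10 regions of 500 kb each, 17,944 SNPs in total.
encode_spacing = bp_per_snp(10 * 500_000, 17_944)

print(round(phase1_spacing), round(encode_spacing))
```

The genome-wide result is close to the slide's rounded "1 SNP / 3000 bp", and the ENCODE result matches "1 SNP / 279 bp".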

  15. http://www.hapmap.org/

  16. SNP Differences among Individuals Far Exceed Differences among Populations
      Phase I:
      • Autosomes: across the 1 million SNPs genotyped, only 11 have fixed differences between CEU and YRI, 21 between CEU and CHB/JPT, and 5 between YRI and CHB/JPT.
      • X chromosome: 123 SNPs were completely differentiated between YRI and CHB/JPT, but only 2 between CEU and YRI and 1 between CEU and CHB/JPT.

  17. Haplotypes
      A haplotype is a set of alleles at multiple loci located on the same copy of a chromosome.
      Genotype calls obtained from sequencing or DNA chip genotyping do not indicate which of the two chromosomal copies a particular allele belongs to.
      E.g., genotypes for individual X:

      SNP     Alleles   Genotype
      SNP A   A1 A2     A T
      SNP B   B1 B2     T C
      SNP C   C1 C2     G C

      Haplotypes:
      Haplotype 1: A1 B2 C2 = A C C
      Haplotype 2: A2 B1 C1 = T T G
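To see why unphased genotypes are ambiguous, the following Python sketch enumerates every pair of haplotypes compatible with the three genotypes of individual X from the slide. The function name and data layout are hypothetical. The sketch only lists the possibilities; it does not infer which pair is correct, which is the job of a phasing method.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """List all unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of (allele, allele) tuples, one per SNP.
    """
    pairs = set()
    # For each SNP, choose which of its two alleles goes on the first chromosome.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


# Genotypes of individual X from the slide: SNP A = A/T, SNP B = T/C, SNP C = G/C.
genotypes_x = [("A", "T"), ("T", "C"), ("G", "C")]
pairs_x = compatible_haplotype_pairs(genotypes_x)
for pair in pairs_x:
    print(pair)
```

With three heterozygous SNPs there are four candidate pairs, and the pair shown on the slide (ACC and TTG) is one of them.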

  18. Recombination
      • “Random” event
      • Occurs during meiosis
      • The larger the distance between loci, or the more generations pass, the more likely it is that recombination(s) will occur
      Parental haplotypes: A1 B1 and A2 B2
      After recombination (crossing-over) between loci A and B:
      • Nonrecombinant haplotypes: A1 B1, A2 B2
      • Recombinant haplotypes: A1 B2, A2 B1

  19. Two ancestral chromosomes being scrambled through recombination over many generations to yield different descendant chromosomes. If an A allele on the ancestral chromosome increases the risk of a disease, the two individuals in the current generation who inherit that part of the ancestral chromosome will be at increased risk. Source: the International HapMap Project

  20. Linkage Disequilibrium
      Associations among alleles at different loci (locus A with alleles A1, A2; locus B with alleles B1, B2).
      Linkage disequilibrium coefficient (coefficient of association): $D = p_{A_1B_1} - p_{A_1}p_{B_1}$
      Normalized disequilibrium coefficient: $D' = D / |D|_{\max}$, where $|D|_{\max} = \min(p_{A_1}p_{B_2},\, p_{A_2}p_{B_1})$ for $D > 0$ and $|D|_{\max} = \min(p_{A_1}p_{B_1},\, p_{A_2}p_{B_2})$ for $D < 0$; $-1 \le D' \le 1$
      Correlation coefficient: $r = D / \sqrt{p_{A_1}p_{A_2}p_{B_1}p_{B_2}}$
      In case of no association, $D = 0$ (linkage equilibrium).
      Practical implications in fine gene mapping: search for locus B using association of marker loci with disease.
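As a worked illustration of these coefficients, the sketch below computes D, D' and r² from one haplotype frequency and two allele frequencies. The function name and the example frequencies are hypothetical. The bound used for negative D is the standard one, which the slide does not spell out.

```python
from math import sqrt


def ld_measures(p_a1b1, p_a1, p_b1):
    """Return (D, D', r^2) for two biallelic loci.

    p_a1b1: frequency of the A1-B1 haplotype
    p_a1, p_b1: frequencies of alleles A1 and B1
    """
    p_a2, p_b2 = 1 - p_a1, 1 - p_b1
    d = p_a1b1 - p_a1 * p_b1
    # The largest attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = sqrt(p_a1 * p_a2 * p_b1 * p_b2)
    r = d / denom if denom > 0 else 0.0
    return d, d_prime, r * r


# Hypothetical frequencies: haplotype A1-B1 at 0.4, allele A1 at 0.6, allele B1 at 0.5.
d, d_prime, r2 = ld_measures(0.4, 0.6, 0.5)
print(d, d_prime, r2)
```

In this example D' is 0.5 while r² is only about 0.17, which shows that the two normalizations can give quite different pictures of the same pair of loci.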

  21. The value of D decreases geometrically with each generation
      For locus A (alleles A, a) and locus B (alleles B, b) separated by recombination fraction $\theta$:
      $D(t) = (1 - \theta)\, D(t-1)$
      $D(t) = (1 - \theta)^{t}\, D(0)$
      Unless the two loci are closely linked, the value of D should rapidly decrease to 0. The occurrence of association between two loci implies that they are closely linked.
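The geometric decay can be checked numerically. The sketch below writes the recombination fraction between the two loci as `theta`. That symbol is an assumption, since the original character was lost in the transcript, and the function names are hypothetical.

```python
def ld_after_generations(d0, theta, t):
    """Closed form: D(t) = (1 - theta)**t * D(0)."""
    return (1 - theta) ** t * d0


def ld_by_recurrence(d0, theta, t):
    """Same quantity computed generation by generation: D(t) = (1 - theta) * D(t-1)."""
    d = d0
    for _ in range(t):
        d = (1 - theta) * d
    return d


# Compare unlinked loci (theta = 0.5) with tightly linked loci (theta = 0.01).
for theta in (0.5, 0.1, 0.01):
    print(theta, ld_after_generations(0.25, theta, 10), ld_after_generations(0.25, theta, 100))
```

For unlinked loci D halves every generation, whereas with a recombination fraction of 0.01 roughly a third of the initial disequilibrium is still present after 100 generations.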

  22. Haplotype Maps Generated by The International HapMap Project

  23. Haplotype Maps of the Human Genome
      Helmuth 2001, Science 293:583-585
      Find correlations among groups of SNPs
      Haplotypes were inferred for the HapMap project from trio data and from unrelated individuals using PHASE (Stephens et al. 2001; Stephens and Donnelly 2003)

  24. Haplotype Maps of the Human Genome Genome regions decomposed into discrete haplotype blocks, which capture similarity in haplotype organization Patil et al. 2001, Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21. Science 294(5547):1719-23

  25. Haplotype Block Partition Results for Three Populations
      1,586,383 SNPs genotyped in 71 Americans of European, African, and Asian ancestry

      Population          Blocks    Average size, kb*   Required SNPs†
      African-American    235,663   8.8                 570,886
      European-American   109,913   20.7                275,960
      Han Chinese         89,994    25.2                220,809

      * Average distance spanned by segregating sites in each block.
      † Minimum number of SNPs required to distinguish common haplotype patterns with frequencies of 5% or higher.
      Hinds et al. 2005, Science

  26. Extended LD bin and haplotype block structure around the CFTR gene.
      LD bins, where each bin has at least one SNP with r² > 0.8 with every other SNP, are depicted as light horizontal bars, with the positions of constituent SNPs indicated by vertical tick marks as well as the extreme ends of the bars. Isolated SNPs are indicated by plain tick marks. Haplotype blocks, within which at least 80% of observed haplotypes could be grouped into common patterns with frequencies of at least 5%, are depicted as dark horizontal bars. Unlike haplotype blocks, which are by design sequential and nonoverlapping, SNPs in one LD bin can be interdigitated with SNPs in multiple other overlapping bins. (Hinds et al. 2005)
      • Population differences in local bin structure
      • Differences in allele and haplotype frequencies
      “Although analysis panels are characterized both by different haplotype frequencies and, to some extent, different combinations of alleles, both common and rare haplotypes are often shared across populations” (The Int. HapMap Consortium, Nature, 2005)

  27. Tag SNP (htSNP) selection
      • Pairwise LD-based and haploblock-based tagging methods
      • Partition haplotypes into blocks
      • Can use haplotype-based (haploblocks) or genotype-based (LD-blocks) partitioning
      • Select representative htSNPs from each block
      • Latest DNA microarrays aim to capture SNPs with r² ≥ 0.8
      “Tags are the subset of variants genotyped in a disease study. SNPs that are not typed in the study but whose effect can be studied through LD with a tag are termed proxies. A tag with perfect correlation (r² = 1) to an untyped putative causal allele is termed a perfect proxy.” de Bakker et al., 2005
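One simple way to picture pairwise LD-based tagging is a greedy selection over a matrix of pairwise r² values, sketched below. This is an illustrative heuristic under stated assumptions, not the method used by the HapMap Consortium or by any particular tagging program. The SNP names and r² values are hypothetical.

```python
def greedy_tag_snps(names, r2, threshold=0.8):
    """Greedy pairwise tagging sketch.

    names: list of SNP identifiers
    r2: symmetric matrix (list of lists) of pairwise r^2 values
    Returns tags such that every SNP is a tag or has r^2 >= threshold with a tag.
    """
    untagged = set(range(len(names)))
    tags = []
    while untagged:
        def covered(i):
            # SNPs still untagged that SNP i would capture, including itself.
            return {j for j in untagged if j == i or r2[i][j] >= threshold}

        # Pick the SNP capturing the most untagged SNPs (lowest index wins ties).
        best = max(sorted(untagged), key=lambda i: len(covered(i)))
        tags.append(names[best])
        untagged -= covered(best)
    return tags


# Hypothetical SNPs: rs_a is strongly correlated with rs_b and rs_c; rs_d is isolated.
snp_names = ["rs_a", "rs_b", "rs_c", "rs_d"]
r2_matrix = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.70, 0.10],
    [0.85, 0.70, 1.00, 0.10],
    [0.10, 0.10, 0.10, 1.00],
]
print(greedy_tag_snps(snp_names, r2_matrix))
```

Here two tags stand in for four SNPs: `rs_b` and `rs_c` are proxies captured through `rs_a`, while the isolated `rs_d` has to be typed directly.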

  28. Tag SNP, Haplotypes, and LD The Int. HapMap Consortium, Nature, 2005

  29. Use of Haplotypes in Association Analysis
      • Testing one marker at a time for associations is very time-consuming
      • Problem of multiple testing
      • When testing individual SNPs, we are not utilizing information from other markers
      Benefits of Using Haplotypes
      • Haplotypes allow us to use information from multiple loci simultaneously
      • LD information between loci is captured

  30. Benefits of Haplotype Analysis
      • Construct a single highly informative mega-locus from a number of less informative but closely linked loci
      • Identify genotyping or data entry errors: likelihood ratio tests indicate which typings are more likely to be in error
      • Find boundaries of conserved haplotypes associated with a trait
      • Employs recombinations from the entire history of a population

  31. Amount of Captured Sequence Variation in HapMap Phase II
      • For common variants (MAF ≥ 0.05) the mean maximum r² of any SNP to a typed one is 0.90 in YRI, 0.96 in CEU and 0.95 in CHB/JPT.
      • 1.09 million SNPs capture all common Phase II SNPs with r² ≥ 0.8 in YRI.
      • Very common SNPs with MAF ≥ 0.25 are captured extremely well (mean maximum r² of 0.93 in YRI to 0.97 in CEU).
      • Rarer SNPs with MAF < 0.05 are less well covered (mean maximum r² of 0.74 in CHB/JPT to 0.76 in YRI).

  32. Recombination Hot Spots

  33. Structural Genome Variation
      HapMap samples are also used as a resource for CNV analysis
      • Large number of copy number variants (CNVs) and other genome rearrangements found among individuals
      • Some variation is assumed to be normal; other variation may cause disease
      • Genome databases, e.g. the Database of Genomic Variants at The Centre for Applied Genomics (TCAG), The Hospital for Sick Children, Toronto, and the Copy Number Variation Project Map at the Sanger Institute

  34. Segmental duplications are recombination hotspots, causing global genome rearrangements

  35. HapMap Genome Browser

  36. Perlegen Genotype Browser

  37. UCSC Genome Browser http://genome.ucsc.edu/

  38. DNA Chips and Resequencing: High-Throughput Analysis of Sequence Variation
      • An easy way to access genome-wide variation
      • Both Affymetrix and Illumina DNA chips contain representative SNP and CNV probes
      • Affymetrix GeneChip 6.0: 1.8 million markers for genetic variation, including 906,000 SNPs and 946,000 copy number probes
      • Illumina 1M BeadChip and 1M-Duo BeadChip: ~950,000 genome-spanning tag SNPs; ~100,000 additional non-HapMap SNPs; >565,000 SNPs in and near coding regions, such as nsSNPs, promoter regions, 3’ and 5’ UTRs; dense coverage in ADME and MHC regions; ~260,000 markers located in novel and reported copy number polymorphic regions
      • Sequenom mass arrays (based on MALDI-TOF)

  39. Genome-Wide Association
      • Select representative htSNPs from low-diversity haplotype blocks
      • Adjustment for multiple comparisons
      • LD values highly variable: smoothing function needed
      • Haplotypes in a sliding window
      • OR screen for: top SNPs; likely functional SNPs; SNPs in genes involved in pathways of interest
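The slide does not say which adjustment for multiple comparisons is meant. As one common and conservative possibility, the sketch below applies a Bonferroni correction, dividing the significance level by the number of SNPs tested. The function names, SNP identifiers and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05, n_tests=None):
    """Return SNP names whose p-value passes the Bonferroni threshold.

    p_values: dict mapping SNP name -> p-value
    n_tests: number of tests performed (defaults to len(p_values))
    """
    n = n_tests if n_tests is not None else len(p_values)
    cutoff = bonferroni_threshold(alpha, n)
    return sorted(name for name, p in p_values.items() if p <= cutoff)


# Hypothetical scan of one million SNPs, of which three results are shown.
example_p = {"rs_x": 1e-9, "rs_y": 3e-6, "rs_z": 0.04}
print(bonferroni_threshold(0.05, 1_000_000))
print(significant_snps(example_p, alpha=0.05, n_tests=1_000_000))
```

With one million tests the per-SNP cutoff is 5 × 10^-8, so only `rs_x` would be reported. Because nearby SNPs are correlated through LD, this correction is stricter than necessary, which is one motivation for haplotype- and tag-based approaches.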

  40. Use of Phase-Resolved Data in Association Analysis • Find association with haplotypes similar to analyses of individual SNP alleles; Need to consider multiple testing • Test for tendency of cases to ‘cluster’ around groups of ‘similar’ haplotypes • Extend log-linear approach to take haplotype structure into account • Modifications also used for ambiguous phase

  41. http://www.genome.gov/26525384 As of 04/14/2008, GWAS of 150 traits posted
