
Human Genome Structural Variation

Evan Eichler, Howard Hughes Medical Institute, University of Washington. June 3rd, 2006, HGM, Helsinki. Structural Variation & Disease: insertions, deletions, inversions, duplications, translocations; three different models of human “genetic” disease.





Presentation Transcript


  1. Human Genome Structural Variation. Evan Eichler, Howard Hughes Medical Institute, University of Washington. June 3rd, 2006, HGM, Helsinki.

  2. Structural Variation & Disease • Insertions, Deletions, Inversions, Duplications, Translocations • Three different models of human “genetic” disease: • Rare Mendelian Disease: rare structural variants that segregate and cause disease (e.g. Parkinson’s and Alzheimer’s disease) • Recurrent Genomic Disorders: recurrent microdeletion or microduplication syndromes mediated by segmental duplications (most de novo) (e.g. PWS, CMT1A) • Common Disease Susceptibility: common copy-number polymorphisms that predispose to disease (LPA and coronary heart disease; FCGR3 and lupus; CCL3L1 and AIDS susceptibility).

  3. Genome-Wide Screens of Normal Variation [Figure labels: Large (~100 kb), Small (~7 kb)] • 1503 variants, 115 Mb, 800 genes structurally variant • Non-randomly distributed • Associated with environmental-interaction genes and segmental duplications • Normal individuals differ by thousands of events. Eichler (2006) Nat. Genet. http://humanparalogy.gs.washington.edu/structuralvariation

  4. Structural Variation Disease Hypothesis • Structural variants are common, more likely to be under selective constraint, more recurrent, and associated with environmental-interaction genes. • We hypothesize that they will have a significant impact on rare and common disease in the population, and that this impact cannot be tracked by LD association mapping. • Approach: genomic architecture approach; target dynamic regions first, then follow up with disease studies.

  5. Objectives. • Discovery of Novel Genomic Disorders • Sequence-Based Resolution of Structural Variation

  6. Model of Genomic Disease/Variation [Figure: homologous chromosome segments (A B C, TEL) undergo aberrant recombination; the resulting gametes carry rearrangements causing human disease through triplosensitive, haploinsufficient and imprinted genes] • Hypothesis: is this a mechanism underlying uncharacterized mental retardation?

  7. Duplication Map of Human Genome • 130 candidate regions (298 Mb) • 23 associated with genetic disease • Targeted for patient screening by array CGH. Bailey et al. (2002), Science 297:1003-1007

  8. Array Comparative Genomic Hybridization [Figure: normal human DNA sample (Cy3 channel) and disease-individual DNA sample (Cy5 channel) hybridized to an array of human BAC clones (12 mm); channels merged] • High-throughput detection of large-scale variation (>50 kb) • CNV (copy-number variation)
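The per-BAC readout of array CGH is a log2 ratio of disease-sample to normal-sample signal. The sketch below is a minimal illustration, not the study's pipeline: the call threshold `LOG2_THRESHOLD = 0.3` and the function names are hypothetical, and the dye-swap check assumes that the reciprocal hybridization used in this study reverses the sign of a true change. For a diploid genome a single-copy gain gives a ratio of 3/2 (log2 about 0.58) and a single-copy loss 1/2 (log2 = -1).

```python
import math

# Hypothetical call threshold on |log2 ratio|; the slides do not give one.
LOG2_THRESHOLD = 0.3


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """Log2 of test (disease sample) over reference (normal sample) intensity."""
    return math.log2(test_signal / reference_signal)


def call_bac(forward: float, swapped: float, threshold: float = LOG2_THRESHOLD) -> str:
    """Call one BAC from a reciprocal dye-swap pair of log2 ratios.

    In the dye-swapped hybridization the samples exchange labels, so a true
    copy-number change appears with the opposite sign. A call is made only
    when both hybridizations pass the threshold in consistent directions.
    """
    if forward >= threshold and swapped <= -threshold:
        return "gain"
    if forward <= -threshold and swapped >= threshold:
        return "loss"
    return "no call"


if __name__ == "__main__":
    # Illustrative intensities only: 3 copies vs 2, 1 copy vs 2, and no change.
    examples = {
        "BAC_gain": (1500.0, 1000.0),
        "BAC_loss": (500.0, 1000.0),
        "BAC_normal": (1050.0, 1000.0),
    }
    for name, (test, ref) in examples.items():
        fwd = log2_ratio(test, ref)
        swp = log2_ratio(ref, test)  # dye swap inverts the measured ratio
        print(name, round(fwd, 2), call_bac(fwd, swp))
```

Requiring agreement between the two dye orientations is one simple way to suppress dye-specific artifacts; real analyses also normalize intensities first, which this sketch omits.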

  9. Duplication Microarray: Experimental Design [Figure: BACs spanning regions flanked by duplications; distance between duplications >50 kb and <5 Mb; duplication properties: 95% identity, 10 kb] • 130 regions of the human genome • 2178 BACs, or on average ~10-12 BACs per region • Perform array CGH: reciprocal dye-swap experiments • Study population: • Normal: 269 samples + 75 individual samples = 344 total (284 unrelated; 60 trios) • Idiopathic mental retardation: 291 probands (Flint cohort, negative for fragile X and with normal karyotype)
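The design criteria above (duplications >50 kb and <5 Mb apart, 95% identity, 10 kb) amount to a filter over pairs of intrachromosomal segmental duplications. A minimal sketch, assuming hypothetical coordinates in a `DuplicationPair` record and treating "95% identity, 10 kb" as minimums; it ignores the relative orientation of the copies (directly oriented copies are the ones expected to mediate deletions and duplications, inverted copies inversions).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Thresholds read from the slide's design figure; treating "95% identity, 10 kb"
# as minimums is an assumption.
MIN_SPACING = 50_000       # bp between the two duplication copies (exclusive)
MAX_SPACING = 5_000_000    # bp (exclusive)
MIN_IDENTITY = 0.95
MIN_DUP_LENGTH = 10_000    # bp


@dataclass
class DuplicationPair:
    """Hypothetical record for two intrachromosomal copies of a segmental duplication."""
    chrom: str
    start_a: int
    end_a: int
    start_b: int
    end_b: int
    identity: float  # sequence identity as a fraction, e.g. 0.98


def flanked_region(pair: DuplicationPair) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) of the region between the copies if it qualifies."""
    first, second = sorted([(pair.start_a, pair.end_a), (pair.start_b, pair.end_b)])
    spacing = second[0] - first[1]
    shortest_copy = min(first[1] - first[0], second[1] - second[0])
    if (
        MIN_SPACING < spacing < MAX_SPACING
        and pair.identity >= MIN_IDENTITY
        and shortest_copy >= MIN_DUP_LENGTH
    ):
        return (pair.chrom, first[1], second[0])
    return None


if __name__ == "__main__":
    pairs = [
        DuplicationPair("chrA", 1_000_000, 1_020_000, 1_500_000, 1_520_000, 0.98),
        DuplicationPair("chrA", 3_000_000, 3_020_000, 3_030_000, 3_050_000, 0.99),  # too close
        DuplicationPair("chrB", 100_000, 130_000, 900_000, 930_000, 0.90),  # identity too low
    ]
    for p in pairs:
        print(flanked_region(p))
```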

  10. Study Populations • Normal unaffected (diversity panel and HapMap samples). Target = 800 samples; completed: 75 + 269 samples = 344 total (284 unrelated; 60 trios). • Idiopathic Mental Retardation

  11. Large-Scale CNPs among HapMap Samples [Figure: 108 novel CNP BACs; 149 CNP BACs previously identified; altered CNV frequency] • Total of 384 CNV BACs among HapMap DNAs (n=263 HapMap samples passed criteria) • A total of 257 CNP BACs were identified (>2 individuals) • 147 CNPs observed previously in our study of 55 samples or published by others • 3.1% (9/257) were population-specific • 127 singleton BAC observations
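Separating CNP BACs from singletons is a per-BAC count of distinct carriers. In the sketch below, `classify_bacs` and its input format are hypothetical. The slide writes ">2 individuals", but the totals (257 CNP + 127 singleton = 384 CNV BACs) suggest that every BAC seen in two or more individuals was counted as a CNP, so the default cutoff here is 2; that reading is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_bacs(
    calls: Iterable[Tuple[str, str]], min_individuals: int = 2
) -> Tuple[Set[str], Set[str]]:
    """Split altered BACs into CNPs and singletons.

    calls: (individual_id, bac_id) pairs, one per altered BAC observation.
    Repeated observations of the same BAC in the same individual count once.
    """
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for individual, bac in calls:
        carriers[bac].add(individual)
    cnps = {bac for bac, people in carriers.items() if len(people) >= min_individuals}
    singletons = {bac for bac, people in carriers.items() if len(people) == 1}
    return cnps, singletons


if __name__ == "__main__":
    observations = [
        ("NA_1", "BAC_X"), ("NA_2", "BAC_X"),  # two carriers -> CNP
        ("NA_1", "BAC_Y"), ("NA_1", "BAC_Y"),  # one carrier seen twice -> singleton
    ]
    cnps, singletons = classify_bacs(observations)
    print(sorted(cnps), sorted(singletons))
```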

  12. Validation using NimbleGen High-Density Oligo Arrays [Figure: log2 T/R ratios showing a deletion and a duplication] • n=9, high-density oligonucleotide arrays (385,000 probes) • Overall, validation for 194/257 (75.4% true positives). Locke, Sharp, McCarroll et al., in press

  13. Study Populations • Normal unaffected (diversity panel and HapMap samples). Target = 800 samples; completed: 75 + 269 samples = 344 total; identified an additional 257 CNPs. • Idiopathic Mental Retardation: • Target = 700 samples (300 Flint/Knight samples, 400 CWRU samples); 291 complete

  14. Evidence for Novel Recurrent Microdeletion Syndromes from the Screening of ~291 Mental Retardation/MCA Cases • 4 individuals with an apparently identical microdeletion, never seen in >300 normals

  15. Refinement of the Breakpoints of 17q21.3 Microdeletion • Customized oligonucleotide microarray (n=11,000) [Figure: critical region between segmental duplications (SD), with CNPs; labeled sizes 212.8 kb, 489.9 kb and 459.7 kb; intrachromosomal duplication “hub” >100 kb at >98.5% identity; log2 relative hybridization intensity (-1.0 to 1.0) for IMR103, father and mother; threshold 1.5 STDEV]
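Breakpoint refinement on a dense oligonucleotide array can be illustrated by flagging probes below a standard-deviation cutoff, as with the 1.5 STDEV threshold in the profile above, and bracketing each breakpoint between the last normal probe and the first deleted one. This is a minimal sketch: `refine_deletion` is hypothetical, estimating the mean and SD from the profile itself is an assumption, and production tools typically use segmentation methods such as circular binary segmentation rather than a single threshold.

```python
import statistics
from typing import Dict, List, Optional, Tuple


def refine_deletion(
    positions: List[int], log2_values: List[float], n_sd: float = 1.5
) -> Optional[Dict[str, object]]:
    """Locate the longest run of probes more than n_sd SDs below the profile mean.

    Breakpoints are bracketed by the last normal probe and the first deleted
    probe on each side; None is returned if no probe falls below the cutoff.
    """
    mean = statistics.mean(log2_values)
    sd = statistics.stdev(log2_values)
    cutoff = mean - n_sd * sd

    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    for i, value in enumerate(log2_values):
        if value < cutoff:
            if run_start is None:
                run_start = i
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
        else:
            run_start = None

    if best is None:
        return None
    s, e = best
    return {
        "first_deleted": positions[s],
        "last_deleted": positions[e],
        "left_breakpoint_interval": (positions[s - 1], positions[s]) if s > 0 else None,
        "right_breakpoint_interval": (positions[e], positions[e + 1]) if e + 1 < len(positions) else None,
    }
```

For example, a synthetic profile of 20 probes spaced 1 kb apart with probes 8 to 12 at -1.0 returns a deletion spanning positions 8000 to 12000, with the left breakpoint between 7000 and 8000.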

  16. 17q21.3 Microdeletion is Recurrent [Figure: array profiles of normal controls and affected individuals IMR103, IMR253, IMR255 and IMR376] • 4/291 patients; estimated ~1% of mental retardation

  17. Clinical Phenotypes of 17q21.3 Microdeletion Patients • IMR253: moderate developmental delay, sparse eyebrows, protruding tongue, low-set lop ears, large nose, seizures (4 yrs), markedly blonde hair • IMR669: delayed speech, hypotonia, mild dysmorphic features, seizures and fits until 16 months, upslanting palpebral fissures, prominent philtrum, bulbous nose, fair • IMR376: severe learning difficulties, marked hypotonia, hypopigmented, Mongolian slant, pale blue almond-shaped eyes, protruding tongue, extensible joints

  18. 1q21.1 Duplication-Mediated Microdeletion [Figure: array profiles of IMR43, father and mother; 2.4 Mb]

  19. 15q24.1-24.2 Duplication-Mediated Microdeletion [Figure: array profiles of IMR349, father and mother; 4.7 Mb]

  20. Variation in IMR • 291 IMR samples (Oxford cohort) screened to date • 23 (n=31) novel sites of variation defined by >2 BACs • 5 are seen in more than one unrelated patient • 7/9 events are de novo • Novel genomic disorder candidates
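A "de novo" call like the 7/9 figure above comes from comparing the proband with both parents. The sketch below assumes presence/absence calls per individual for a single event; `inheritance` and `count_de_novo` are hypothetical names, and complications such as non-paternity, parental mosaicism or failed parental hybridizations are ignored.

```python
from typing import Iterable, Tuple


def inheritance(proband: bool, father: bool, mother: bool) -> str:
    """Classify a copy-number event from its presence/absence in a parent-child trio."""
    if not proband:
        return "absent"
    if father and mother:
        return "present in both parents"
    if father:
        return "paternal"
    if mother:
        return "maternal"
    return "de novo"


def count_de_novo(trios: Iterable[Tuple[bool, bool, bool]]) -> Tuple[int, int]:
    """Return (de novo events, events present in the proband)."""
    calls = [inheritance(*trio) for trio in trios]
    in_proband = [c for c in calls if c != "absent"]
    return sum(c == "de novo" for c in in_proband), len(in_proband)


if __name__ == "__main__":
    # (proband, father, mother) presence calls for three hypothetical events
    events = [(True, False, False), (True, False, True), (True, False, False)]
    print(count_de_novo(events))
```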

  21. Objectives. • Discovery of Novel Genomic Disorders • Sequence-Based Resolution of Structural Variation

  22. Intermediate-Size Structural Variation (ISV) and Inversions
Gene | Type | Freq. | Locus | Size | Phenotype | Dup
GSTT1 | Deletion | 20% -/- | 22q11.2 | 54.3 kb | Halothane/epoxide sensitivity | 17 kb/94%
DEF3A-OR | Inversion | 26% -/+ | 8p23 | 5 Mb | Heart disease susceptibility | 400 kb/98.9%
EMD/FLN | Inversion | 33% -/+ | Xq28 | 219 kb | None | 48 kb/99%
HERC2 | Inversion | 4% -/+ | 15q11.2 | 3 Mb | Susceptibility to Angelman syndrome | >300 kb/99.8%
IGVH26 | Deletion/Dup | 4-15% +/- | 14q32.3 | Variable | Immune response variation | 91-97%
GSTM1 | Deletion | 50% -/- | 1p13.3 | 18 kb | Toxin resistance, cancer susceptibility | 24 kb/95.6%
CYP2D6 | Duplication | 1-29% +++ | 22q13.1 | 5 kb | Antidepressant drug resistance | 5.4 kb/91-97%
CYP21A2 | Duplication | 1.6% +/- | 6p21.3 | 35 kb | Congenital adrenal hyperplasia | 0
CYP2A6 | Duplication | 1.3% +/- | 19q13.2 | 7 kb | Nicotine metabolism | 24 kb/96.2%
SMN2 | Duplication | 50% +++/- | 5q13 | >100 kb | SMA susceptibility | 88.7/99.8%
Adapted from Buckland, Ann Med

  23. Genome-wide Detection of Structural Variation (>8 kb) by Fosmid End-Sequence Pairs [Figure: end-sequence pairs mapping >48 kb apart indicate a putative deletion and <32 kb apart a putative insertion; pairs discordant by orientation indicate an inversion; browser tracks show pairs discordant by orientation (yellow/gold), discordant by size (red), and a duplication track] • Identified ~295 potential candidates: deletions, insertions and inversions • Enriched among environmental-interaction genes. Tuzun et al., 2005
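The classification in this figure can be sketched as a rule on each fosmid end pair's mapped span and orientation, followed by a requirement for at least two supporting pairs (the "2 or more" clusters of the next slide). This is a simplified illustration, not the published pipeline: the function names are hypothetical, `region_id` stands in for a clustering step that is not shown, and orientation is checked before span although real inversions can also distort the span.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Concordant span window from the slide: pairs mapping >48 kb apart suggest a
# deletion in the fosmid genome, <32 kb apart an insertion.
MIN_CONCORDANT_KB = 32.0
MAX_CONCORDANT_KB = 48.0


def classify_pair(span_kb: float, orientation_ok: bool) -> str:
    """Classify one fosmid end-sequence pair mapped against the reference."""
    if not orientation_ok:
        return "putative inversion"
    if span_kb > MAX_CONCORDANT_KB:
        return "putative deletion"
    if span_kb < MIN_CONCORDANT_KB:
        return "putative insertion"
    return "concordant"


def supported_events(
    pairs: Iterable[Tuple[str, float, bool]], min_support: int = 2
) -> Dict[Tuple[str, str], int]:
    """Keep discordant calls backed by >= min_support pairs in the same region.

    pairs: (region_id, span_kb, orientation_ok); region_id is a hypothetical
    pre-computed cluster label.
    """
    counts = Counter((region, classify_pair(span, ok)) for region, span, ok in pairs)
    return {
        key: n for key, n in counts.items()
        if key[1] != "concordant" and n >= min_support
    }
```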

  24. Fine-Scale Structural Variation Map (build35 vs. fosmids) • 1.3% discordant fosmids • Identified 295 clusters (2 or more fosmids) • 246 supported by a second haplotype • 147 insertions, 93 deletions, 57 inversions • 18 putative L1 events: 10 deletions and 8 insertions (6 kb insertion) • 89 located within gene regions • 138 in unique regions of the genome • 159 in duplicated regions of the genome [Figure legend: insertion (fosmid), deletion, inversions, “heterochromatic” regions, “duplicated” regions]

  25. Sequenced Structural Variation of APOBEC3B [Figure: breakpoint structure of allele #1 (primers A and B) and allele #2 (primers A and C); lanes 1-10, GM15510] • 24.5 kb deletion eliminates most of APOBEC3B but creates a fusion gene • Fusion APOBEC3A/3B: <1% frequency in Africans, >35% in Papua New Guineans

  26. Sequenced Structural Variation of DEFA1 [Figure: oligo array CGH, log2 T/R, showing deletion and insertion] http://humanparalogy.gs.washington.edu/structuralvariation • >25 kb duplication of DEFA1, additional exon 3

  27. Sequenced Structural Variation of LCE1E/D [Figure: oligo array CGH, log2 relative signal intensity] • 9.2 kb deletion of the LCE1D gene creates a fusion gene, LCE1D/E • Confirmed in 3 unrelated individuals by oligonucleotide microarray technology. http://humanparalogy.gs.washington.edu/structuralvariation

  28. Sequencing Genic Structural Variation [Figure: panels a-f comparing build35 (b35) and fosmid sequence at SIGLEC5A, MEGF11, KCNJ16, LSP1, TNNT3, KCNJ2, GSTT2 and DDT]

  29. PCR Breakpoint Genotyping Assays for Structural Variation • Tested 11 structural variants (5 insertions, 4 deletions and 2 inversions) • 7 successful assays (6 with >20% minor allele frequency)
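Breakpoint PCR genotyping reduces to reading which junction products amplify, and the minor allele frequency then follows from allele counts. The sketch assumes a three-primer design like the one shown for APOBEC3B, with primers A+B amplifying only allele #1 and A+C only allele #2; the function names and genotype strings are hypothetical.

```python
from typing import Iterable, Optional


def call_genotype(band_ab: bool, band_ac: bool) -> Optional[str]:
    """Genotype from the two breakpoint PCR products.

    Assumes primers A+B amplify only allele #1 and A+C only allele #2;
    returns None when neither product appears (failed reaction).
    """
    if band_ab and band_ac:
        return "1/2"
    if band_ab:
        return "1/1"
    if band_ac:
        return "2/2"
    return None


def minor_allele_frequency(genotypes: Iterable[Optional[str]]) -> float:
    """Frequency of the rarer allele among successfully genotyped samples."""
    counts = {"1": 0, "2": 0}
    for genotype in genotypes:
        if genotype is None:
            continue
        for allele in genotype.split("/"):
            counts[allele] += 1
    total = counts["1"] + counts["2"]
    if total == 0:
        raise ValueError("no successful genotypes")
    return min(counts.values()) / total
```

With four samples typed as 1/1, 1/1, 1/2 and 2/2, allele #2 accounts for 3 of 8 chromosomes, a minor allele frequency of 0.375, which would clear the >20% bar mentioned above.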

  30. Genotyping: Illumina GoldenGate Assays for Binary Events. Newman et al., 2006

  31. A Human Genome Structural Variation Initiative • 2 scientific meetings (2005) • 2 working groups (AHG, MSWG) (12/05) • Coordinating Committee (1/06) • NIH Council (2/06) • Press release (3/15/06) • Goal: complete characterization of structural variation in 48 HapMap samples (Japanese and Chinese, Yoruba, CEPH)

  32. A Structural Variation Map of the Human Genome (<1 Mb): Putative Structural Variants from Four Individuals [Figure: ABC9 (Japanese), ABC8 (African), ABC7 (African), G248 (Hispanic); legend: insertion (fosmid), deletion, inversions, “heterochromatic” regions, “duplicated” regions]

  33. Summary • Genomic architecture approach: systematically identify dynamic regions of structural variation, then relate them to phenotype • Large-scale variation: • Normals: identified 257 CNPs using a microarray targeted to duplicated regions • IMR: identified 31 sites (>2 BACs) unique to patients (n=291 probands) (5 are recurrent and 3 are confirmed de novo) • Goal: discovery of novel genomic disorders • Fine-scale variation: developed an approach to map and sequence common fine-scale variation within the human genome; ~1000 differences >8 kb from 4 individuals • Goal: disease association with common disease/susceptibility

  34. Acknowledgements • NimbleGen: Rebecca Selzer, Peggy Eis • MIT: Steve McCarroll, David Altshuler • Eichler Lab: Andy Sharp, Devin Locke, Sierra Hansen, Sean McGrath, Eray Tuzun, Matthew Johnson, Zhaoshi Jiang, Jon Bleyhl, Tera Newman, Jeff Bailey, Anne Morrison, Lisa Pertz, Ze Cheng, Xinwei She, James Sprague • UCSF: Dan Pinkel, Donna Albertson • UWGSC: Maynard Olson, Rajinder Kaul, Hillary Hayden, Eric Haugen • CWRU/UChicago: Stuart Schwartz, Laurie Christ • Agencourt: Doug Smith • Oxford: Jonathan Flint, Samantha Knight • NHGRI: Jim Mullikin • AHG/MSWG • UW: Mark Rieder, Debbie Nickerson

  35. Capturing Smaller Variants (<8 kb) • (a) Select clones >2 STDEV from the library mean (96 clones), fingerprint (MCD), select 32 variant clones, then sequence • (b) Optimize the insert-size distribution: test sublibraries prior to large-scale end sequencing and select the library with low STDEV (mean 39.4 kb, median 39.5 kb, mode 39.2 kb, STDEV 2.2 kb)
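Step (a) can be sketched as a z-score filter on each clone's inferred insert size, using the library statistics above (interpreting them as kb is an assumption). With an SD of 2.2 kb, a 2-SD cutoff flags size deviations above 4.4 kb, which is consistent with why a tight library helps reach variants smaller than the 8 kb limit of the 32-48 kb window. `select_outlier_clones` and the clone IDs are hypothetical.

```python
from typing import Dict, List

# Library statistics from the slide (insert size, assumed to be in kb).
LIBRARY_MEAN_KB = 39.4
LIBRARY_SD_KB = 2.2


def select_outlier_clones(
    insert_sizes_kb: Dict[str, float],
    mean_kb: float = LIBRARY_MEAN_KB,
    sd_kb: float = LIBRARY_SD_KB,
    n_sd: float = 2.0,
) -> List[str]:
    """Return clone IDs whose inferred insert size lies more than n_sd SDs from the mean."""
    return [
        clone for clone, size in insert_sizes_kb.items()
        if abs(size - mean_kb) / sd_kb > n_sd
    ]


if __name__ == "__main__":
    # Hypothetical clones: c2 and c3 deviate by more than 4.4 kb from 39.4 kb.
    print(select_outlier_clones({"c1": 39.0, "c2": 44.5, "c3": 34.0, "c4": 40.1}))
```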

  36. The Missing Human Genome • Fosmids traverse gaps: closed 21 gaps, representing 207.5 kb of novel sequence (GenBank) • Singleton fosmids extend into gaps: extended 55 gaps (69 clones), adding 1,152 kb of novel sequence • Orphan fosmid contigs: 450 fosmids placed into 119 contigs/singletons tested by FISH • 22 contigs (1,296 kb) acrocentric, 12 contigs (458 kb) pericentromeric • 48 contigs (2,577 kb) subtelomeric • 32 contigs interstitial euchromatin (1,608 kb) (9 correspond to gaps)

  37. …Finding Novel Human Sequence

  38. Fosmid Pairs that Fail to Map to build35 • 1573 fosmid paired-end sequences fail to map to build35 • 644 have 150 bp >Q30 at either end and >100 bp of unique sequence • 565 of these have no hit to HTGS BAC sequence • Fingerprinted with four independent restriction enzymes (EcoRI, HindIII, BglII and NsiI) • 26 contigs (constructed using the Composite Mutual Overlap Statistic, CMOS) and 94 singletons • Range in size from 38 kb to 208 kb (based on fingerprint data) • Do they represent human sequence?
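The read-quality criterion above ("150 bp >Q30") can be sketched as a count of high-quality bases per end read. Assumptions, all labeled: bases are counted whether or not they are contiguous, "at either end" is read as requiring both ends by default (switchable with `require_both`), and the ">100 bp unique sequence" test is not modeled. Phred quality Q corresponds to an error probability of 10^(-Q/10), so Q30 means one expected error in 1000 bases. All function names are hypothetical.

```python
from typing import Sequence


def phred_error_probability(q: float) -> float:
    """Per-base error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def high_quality_bases(quals: Sequence[int], min_q: int = 30) -> int:
    """Count bases whose Phred quality is strictly above min_q."""
    return sum(q > min_q for q in quals)


def pair_passes(
    end1_quals: Sequence[int],
    end2_quals: Sequence[int],
    min_bases: int = 150,
    min_q: int = 30,
    require_both: bool = True,
) -> bool:
    """Apply the high-quality-length filter to one fosmid end-sequence pair."""
    ok1 = high_quality_bases(end1_quals, min_q) >= min_bases
    ok2 = high_quality_bases(end2_quals, min_q) >= min_bases
    return (ok1 and ok2) if require_both else (ok1 or ok2)
```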

  39. FISH Summary of Orphan Fosmids • 119 contigs/singletons tested by FISH • 22 contigs (1,296 kb) acrocentric, 12 contigs (458 kb) pericentromeric • 48 contigs (2,577 kb) subtelomeric • 32 interstitial euchromatin (9 corresponding to known gaps)

  40. Hybridization [Figure: log2 relative hybridization intensity (test/reference) across BAC probes (x-axis 0-20) for samples R921, D3767 and R1080]

  41. Genomic Variation: Forms of Genetic Variation [Figure: spectrum from sequence to cytogenetics] • Single base-pair changes: point mutations • Small insertions/deletions: frameshift, microsatellite, minisatellite • Mobile elements: retroelement insertions (300 bp-10 kb in size) • Large-scale genomic variation (>10 kb) • Large-scale deletions • Segmental duplications • Chromosomal variation: translocations, inversions, fusions
