
Array-based Comparative Genomic Hybridization

Array-based Comparative Genomic Hybridization. Bastien JOB 2010-10-19. Structural Genomics: sequence variations (CGHa, SNPa, DNAseq, mutations…). Functional Genomics: gene expression / splicing… (GEa, Q-PCR, RNAseq…). Proteomics (antibody arrays, 2D EP + MS/MS, HPLC + MS/MS, …). Genome.



Presentation Transcript


  1. Array-based Comparative Genomic Hybridization Bastien JOB 2010-10-19

  2. Structural genomics: sequence variations (CGHa, SNPa, DNAseq, mutations…). Functional genomics: gene expression / splicing… (GEa, Q-PCR, RNAseq…). Proteomics (antibody arrays, 2D EP + MS/MS, HPLC + MS/MS, …).
     [Diagram labels: Genome (DNA: gene; promoter and regulatory sequences; intron; exon). Transcriptome (transcription; RNA; splicing, editing; mRNA: transcript; miRNA). Proteome (translation; protein; post-translational modification). Nucleus. Cell membrane.]

  3. History and context Technical principle, classical designs Description of oligo CGH arrays Data preprocessing Bioinformatic analysis Cross-technology correlation

  4. History and context
     CGH array is a method that aims to identify variations in the copy number of the genomic content of a test sample, by comparison to a reference sample, using an array of (at least) thousands of measurement points along the genome.
     A bit of history of cytogenomics:
     • [196x]: Karyotyping
     • [1993]: Spectral karyotyping (SKY)
     • [199x]: CGH (comparative genomic hybridization) on chromosomes
     • [200x]: cDNA-based and BAC-based CGH array
     • [2005]: Oligo-based CGH array
     In cancer:
     • Profiling the patterns defined by these alterations for a patient or a pathology.
     • Exploring the association between some of these patterns and clinical annotations.
     Other uses:
     • Developmental abnormalities, autism, diabetes, inter-individual CNVs (HapMap project), ...
     It is an established method in the cancer research field, and is becoming established in the diagnostic field.

  5. 196x: Karyotype. 1993: SKY. 199x: CGH on chromosomes. 200x: cDNA/BAC-based CGH array. 2005: Oligo-based CGH array.

  6. Rearrangements in tumors creating fusion genes

  7. Rearrangements in tumors altering gene regulation: MYC – IgH translocation in Burkitt lymphoma. Image credit: Gregory Schuler, NCBI, NIH, Bethesda, MD, USA. Also a common fusion in prostate cancer (Tomlins et al., Science 2005).

  8. Chromosomal amplifications: EGFR amplification in lung cancer as an HSR (homogeneously staining region); EGFR amplification in lung cancer as several double minutes. Varella-Garcia et al., J Clin Pathol 2009.

  9. Common alterations across tumors and pathologies
     • Mutations activating / repressing pathways
     • Breakpoints creating duplications / amplifications / deletions / fusions
     • Known "master genes" like TP53, PTEN, CDKN2A/B, MYC, EGFR, FGF, …
     • Some are tissue-specific, others more widespread
     [Diagram labels: duplicated genes, deleted genes, activation, repression]

  10. History and context Technical principle, classical designs Description of oligo CGH arrays Data preprocessing Bioinformatic analysis Cross-technology correlation

  11. Technical principle (dual color)

  12. Designs (dual color)
      • For dual-channel CGH arrays, most of the time: test sample DNA (tumor) in Cy5 versus reference DNA (normal) in Cy3
      • Mainly use of a sex-matched commercial normal DNA as reference
        • Sex-matched: anomalies on gonosomes
        • "Outside" reference: polymorphisms (CNV, "copy number variations")
      • More rarely (cancer field): using the same person's normal DNA
        • No polymorphism
        • Same origin ≈ same preparation
        • Some difficulties for blood DNA extraction
      • Use of a "stable" cell line with a complete ploidy as a reference (e.g. Coriell NA10851)
      • More complex designs can be performed (circular, …)
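
As a minimal sketch of how a dual-color design turns into a measurement, the snippet below computes the log2(ratio) of test (Cy5) over reference (Cy3) intensity for a few probes. The intensity values and the function name `log2_ratio` are hypothetical, and the intensities are assumed to be already background-corrected.

```python
import math

def log2_ratio(test_cy5, ref_cy3):
    """Return log2(test / reference) for one probe.

    Both intensities are assumed to be background-corrected and positive.
    """
    if test_cy5 <= 0 or ref_cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_cy5 / ref_cy3)

# Hypothetical (Cy5, Cy3) intensity pairs for three probes:
# balanced, test higher than reference, test lower than reference.
probes = [(1000.0, 1000.0), (1500.0, 1000.0), (500.0, 1000.0)]
ratios = [log2_ratio(t, r) for t, r in probes]
```

In an idealized pure diploid sample, a ratio of 3/2 (one extra copy) gives about +0.58 and a ratio of 1/2 (one lost copy) gives -1; real tumor samples rarely reach these values because of normal-cell contamination and ploidy changes.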

  13. CGH array simplified process on the platform: from sample to analysis. [Diagram labels: T (R); Samples; DNA extraction; Qualification & quantitation; Fragmentation & labelling; Hybridization; oligo microarray; Scan, signals acquisition & normalization; Segmentation & visualization; Bioinformatic analyses]

  14. History and context Technical principle, classical designs Description of oligo CGH arrays Data preprocessing Bioinformatic analysis Cross-technology correlation

  15. Long oligo Agilent CGH arrays. G2: 244K Agilent oligo array, spots: 60 µm (@ 5 µm/px). G3: 4 x 180K Agilent oligo array, spots: 30 µm (@ 2 µm/px).

  16. Available formats (for human)
      • 2nd generation
        • 4 x 44K
        • 2 x 105K
        • 1 x 244K
      • 3rd generation (current)
        • 8 x 60K
        • 4 x 180K
        • 2 x 400K
        • 1 x 1M
      • Most formats also available for mouse and rat
      • Possibility to design one's own custom array for any format

  17. Long oligo NimbleGen arrays

  18. Short oligo Affymetrix SNP 6.0 array
      • 4 x 906,600 SNP probes
      • 945,826 CN probes (*)
      • 25-mer oligos
      • ~700 b average interval
      • ~2 Kb real CN interval
      (*) ~200,000 CNVs

  19. Illumina Infinium BeadChips

  20. History and context Technical principle, classical designs Description of oligo CGH arrays Data preprocessing Bioinformatic analysis Cross-technology correlation

  21. Simplified bioinformatics analysis pipeline. [Diagram labels: Signals acquisition; Quality controls; Normalization; Segmentation; Genomic profile; Description of the population; Identification of genomic regions of interest; Describing genomic contents. Tools and resources named: Feature Extraction v10.x; CBS; R, aCGH; STAC; Public databases + Clinical annotations]

  22. SIGNALS ACQUISITION

  23. Spot position identification
      • By 2D intensity histograms
      • By a circle (fixed / variable diameter)
      • Adaptive segmentation by random seed propagation
      Current oligo generation: perfect disc-shaped spots.
      Credits: Pierre NEUVIAL (ENSAE)

  24. Spot extraction
      Two methods:
      • Intensity segmentation
        • Isolation of the real signal from a local background
        • Needed for both signals
        • Needs a background correction method
        • Then a ratio can be computed
      • Linear regression (Novikov, 2004)
        • (1) First linear regression on all intensities
        • (2) Identification of outliers
        • (3) Sequential removal of outlier pixels
        • (4) Unbiased linear regression on the kept pixels
        • Can only be used when the background is fairly low and homogeneous.
        • The ratio is directly extracted as the slope.
      [Figure labels: (1), (2, 3), (4)] Credits: Pierre NEUVIAL
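
The four regression steps can be sketched roughly as follows. This is an illustrative approximation, not the published Novikov (2004) procedure: the outlier rule (largest residual beyond `k` standard deviations), the parameters `k` and `min_pixels`, and the function names are all assumptions made for the example. Pixels of one spot are given as paired reference and test intensities, and the fitted intercept absorbs a constant background offset.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)  # assumes the xs are not all equal
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def spot_ratio(ref_px, test_px, k=3.0, min_pixels=5):
    """Estimate the test/reference ratio of a spot as a regression slope.

    Pixels are removed one at a time (largest residual first) while that
    residual exceeds k standard deviations of the residuals.
    Returns (slope, number_of_pixels_kept).
    """
    xs, ys = list(ref_px), list(test_px)
    while True:
        a, b = fit_line(xs, ys)                             # steps (1) and (4)
        res = [y - (a + b * x) for x, y in zip(xs, ys)]
        sd = statistics.pstdev(res)
        worst = max(range(len(res)), key=lambda i: abs(res[i]))  # step (2)
        if len(xs) <= min_pixels or sd == 0 or abs(res[worst]) <= k * sd:
            return b, len(xs)
        del xs[worst], ys[worst]                            # step (3)

# Hypothetical spot: true ratio 2, small alternating noise, one aberrant pixel.
ref = [100.0 + 10.0 * i for i in range(20)]
test = [2.0 * x + 50.0 + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(ref)]
test[10] += 500.0
ratio, kept = spot_ratio(ref, test)
```

On this toy spot the aberrant pixel is discarded and the slope comes out close to the true ratio of 2.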

  25. ARRAY QUALITY CONTROLS

  26. Scans visualization

  27. Array quality controls (from Agilent)
      • General information and some parameters
      • Grid positioning check
      • Control of channels (signal, background, …)
      • Control of outliers (number and position)
      • Control of intensity distributions
      • Control of the randomness of signals

  28. QC: spatial homogeneity controls. Spatial representation of signals, background, log2(ratio), p-value, errors (…). Distribution of signals and log2(ratio).

  29. Spatial Homogeneity (the bubbly one)

  30. NORMALIZATION. Why? Some biases can be removed by specific algorithms.

  31. Spatial biases: intensity gradients; block effects; print-tip bias; local bias. Most of these biases are linked to spotted arrays.

  32. Spatial biases correction (example) Credits : Pierre NEUVIAL

  33. Dye biases : intensities

  34. Dye biases : impact on log2(ratio)

  35. GC% biases

  36. GC% biases
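
The slides do not say which algorithm is used to remove the GC% bias. As one possible illustration only, the sketch below assumes the bias is roughly linear in probe GC fraction, fits that trend by least squares and subtracts it while keeping the mean log2(ratio) unchanged. The function name and the toy data are hypothetical; real pipelines often use a smoother (for example a loess fit) rather than a straight line.

```python
import statistics

def correct_gc_trend(log2_ratios, gc_fractions):
    """Remove a linear GC trend from log2(ratio) values, keeping their mean.

    Returns (corrected_values, fitted_slope).
    """
    mean_gc = statistics.fmean(gc_fractions)
    mean_lr = statistics.fmean(log2_ratios)
    sxx = sum((g - mean_gc) ** 2 for g in gc_fractions)
    sxy = sum((g - mean_gc) * (y - mean_lr)
              for g, y in zip(gc_fractions, log2_ratios))
    slope = sxy / sxx  # assumes probes do not all share the same GC content
    corrected = [y - slope * (g - mean_gc)
                 for g, y in zip(gc_fractions, log2_ratios)]
    return corrected, slope

# Toy data: a flat profile at 0.1 with an artificial GC-dependent tilt.
gc = [0.30, 0.40, 0.50, 0.60, 0.70]
raw = [0.1 + 0.5 * (g - 0.5) for g in gc]
corrected, slope = correct_gc_trend(raw, gc)
```

On real profiles the trend would have to be estimated with care, since large true aberrations in GC-rich or GC-poor regions can distort the fit.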

  37. Dye + GC (step 1)

  38. Dye + GC (step 2)

  39. Dye + GC (step 3)

  40. CENTRALIZATION. Why? Data generated by this method are relative values (ratio of a test versus a reference): we lack information about the "real" normality level.

  41. Centralization: an obvious example. Identifying the most probable normal genomic level is easy here, as we have a main central peak. [Plot axis labels: Frequency, Ratio, Log2(ratio), Chromosomes]
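
For the easy case described in this slide, centralization can be sketched as finding the most populated bin of the log2(ratio) histogram and shifting the whole profile so that this peak sits at 0. The bin width and the function name are assumptions made for the example, and this simple rule is not expected to cope with complex tumor profiles where the highest peak is not the normal level.

```python
from collections import Counter

def centralize(log2_ratios, bin_width=0.05):
    """Shift values so that the most populated histogram bin is centered on 0.

    Returns (shifted_values, estimated_center). Ties between equally
    populated bins are broken in favor of the bin closest to 0.
    """
    bins = Counter(round(v / bin_width) for v in log2_ratios)
    peak_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -abs(kv[0])))
    center = peak_bin * bin_width
    return [v - center for v in log2_ratios], center

# Toy profile: a dominant peak at 0.2 plus smaller gained and lost populations.
profile = [0.2] * 60 + [0.78] * 20 + [-0.8] * 20
shifted, center = centralize(profile)
```

Here the dominant population at 0.2 is taken as the normal level, so every value is shifted down by 0.2.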

  42. Centralization: a cancer example. It is much more difficult here, due to the higher complexity of the distribution / profile… [Plot axis labels: Frequency, Ratio, Log2(ratio), Chromosomes]

  43. Centralization

  44. Centralization : simplification of the distribution

  45. Centralization : Comparing to the center of the distribution

  46. Centralization : Comparing peaks height

  47. GENOMIC PROFILE VISUALIZATION & DATA SEGMENTATION. Why segment? Data reduction: the data obtained are a list of hundreds of thousands of values. However, a genomic profile can be simplified to a limited list of segments considered as abnormal.

  48. A normalized, centered, segmented genomic profile with called aberrations Example taken from a breast cancer profile

  49. Challenge: identifying breakpoints
      • Data consist of a continuous log2(ratio) distribution
      • Two main difficulties:
        • The location of breakpoints is unknown by default
        • So is their number
      • Two general models:
        • Homoscedastic (m)
        • Heteroscedastic (m, V)
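
To make the breakpoint problem concrete, here is a minimal sketch under the homoscedastic model (segments differ only by their mean): it scans every possible position and returns the single split that minimizes the within-segment sum of squared errors. It addresses only the location of one breakpoint; deciding how many breakpoints exist needs an additional criterion (recursion with a stopping test, a penalty, etc.) that is not shown. The function name and data are hypothetical.

```python
def best_breakpoint(values):
    """Return (index, rss) of the best single split of a profile.

    The split at index k separates values[:k] from values[k:], and rss is
    the summed squared deviation of each side from its own mean.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(values)
    total_sq = sum(v * v for v in values)
    best_k, best_rss = None, float("inf")
    left = left_sq = 0.0
    for k in range(1, n):
        left += values[k - 1]
        left_sq += values[k - 1] ** 2
        right, right_sq = total - left, total_sq - left_sq
        # Sum of squares around the mean: sum(x^2) - (sum(x))^2 / count
        rss = (left_sq - left * left / k) + (right_sq - right * right / (n - k))
        if rss < best_rss:
            best_k, best_rss = k, rss
    return best_k, best_rss

# Toy profile: six probes at the normal level followed by four gained probes.
profile = [0.0] * 6 + [1.0] * 4
k, rss = best_breakpoint(profile)
```

The running sums make the scan linear in the number of probes, which matters for arrays with hundreds of thousands of values.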

  50. Several segmentation methods available
      • Initial methods: median smoothing; EM mixture clustering
      • "Newer", well-known methods: HMM/EM; CBS
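
Of the methods listed, median smoothing is the simplest to illustrate. The sketch below replaces each log2(ratio) by the median of a window centered on it, which suppresses isolated outlier probes while keeping step-like level changes. The window size and the function name are assumptions for the example; the window is simply truncated at the ends of the profile.

```python
import statistics

def median_smooth(values, window=5):
    """Running median with an odd window size, truncated at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [statistics.median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

# Toy profile: one aberrant probe (5) in a normal region, then a real gain.
profile = [0, 0, 5, 0, 0, 1, 1, 1, 1]
smoothed = median_smooth(profile, window=3)
```

The isolated spike disappears while the step from 0 to 1 is preserved at the same position, which is why smoothing was an early way to make breakpoints visible before dedicated segmentation methods.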
