Introduction to microarray technology and analysis

Introduction to microarray technology and analysis. Carol Bult Associate Professor The Jackson Laboratory carol.bult@jax.org. Measuring Gene Expression.

Presentation Transcript


  1. Introduction to microarray technology and analysis Carol Bult Associate Professor The Jackson Laboratory carol.bult@jax.org

  2. Measuring Gene Expression. Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein might be more direct, but is currently harder.

  3. Central Assumption of Gene Expression Microarrays • The level of a given mRNA is positively correlated with the expression of the associated protein • Higher mRNA levels mean higher protein expression; lower mRNA levels mean lower protein expression • Other factors also influence protein levels: • Protein degradation, mRNA degradation, polyadenylation, codon preference, translation rates, alternative splicing, translation lag…

  4. Principal Uses of Microarrays • Genome-scale gene expression analysis • Differential gene expression between two (or more) sample types • Responses to environmental factors • Disease processes (e.g. cancer) • Effects of drugs • Identification of genes associated with clinical outcomes (e.g. survival)

  5. Microarray example: biomarker identification in lung cancer [Figure: samples vs. genes] Garber, Troyanskaya et al. Diversity of gene expression in adenocarcinoma of the lung. PNAS 2001, 98(24):13784-9.

  6. Data partitioning is clinically important: patient survival for lung cancer subgroups [Figure: cumulative survival (0 to 1) vs. time (0 to 60 months) for Groups 1, 2 and 3; p = 0.002 for Group 1 vs. Group 3] Garber, Troyanskaya et al. Diversity of gene expression in adenocarcinoma of the lung. PNAS 2001, 98(24):13784-9.

  7. Analysis workflow: Biological question (differentially expressed genes, sample class prediction, etc.) → Experimental design → Microarray experiment → Image analysis → Normalization → Estimation, Testing, Clustering, Discrimination → Biological verification and interpretation

  8. Technology basics • Microarrays are composed of short, specific DNA sequences attached to a glass or silicon slide at high density. • A microarray works by exploiting the ability of an mRNA molecule to bind specifically to (hybridize with) the complementary DNA template from which it was transcribed. • RNA or DNA from the sample of interest is fluorescently labeled so that relative or absolute abundances can be measured quantitatively.
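
As a toy illustration of the base-pairing rule this slide relies on, the sketch below checks whether a probe could pair perfectly with a labeled target. It assumes both sequences are written 5'→3' as DNA strings (A, C, G, T) and ignores mismatches, hybridization thermodynamics and secondary structure; the function names are hypothetical.

```python
# Toy model of hybridization: a probe pairs with a target if the target
# contains the probe's reverse complement (antiparallel Watson-Crick pairing).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """True if some stretch of the target is perfectly complementary to the probe."""
    return reverse_complement(probe) in target.upper()


print(reverse_complement("AACG"))                # CGTT
print(hybridizes("GATTACA", "CCTGTAATCGG"))      # True
```

Real hybridization tolerates partial matches, which is exactly why the cross-hybridization controls discussed later are needed.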

  9. Two-color vs. single-color. Bakel and Holstege. 2007. http://www.cell-press.com/misc/page?page=ETBR

  10. Other applications of microarray technology (besides measuring gene expression) • DNA copy number analysis • SNP analysis • ChIP-chip (interaction data) • Competitive growth assays • …

  11. Major technologies • cDNA probes (> 200 nt), usually produced by PCR, attached to either nylon or glass supports • Oligonucleotides (25-80 nt) attached to a glass support • Oligonucleotides (25-30 nt) synthesized in situ on silica wafers (Affymetrix) • Probes attached to tagged beads

  12. cDNA Microarray Design • Probe selection • Non-redundant set of probes • Includes genes of interest to the project • Corresponds to physically available clones • Chip layout • Grouping of probes by function • Correspondence between wells in microtiter plates and spots on the chip

  13. Building the chip [Figures: Ngai Lab arrayer, UC Berkeley; print-tip head]

  14. http://transcriptome.ens.fr/sgdb/presentation/principle.php

  15. Example dual channel cDNA array results

  16. Affymetrix GeneChips • Probes are oligos synthesized in situ using a photolithographic approach • There are at least 5 oligos per cDNA, plus an equal number of negative controls • The apparatus requires a fluidics station for hybridization and a special scanner • Only a single fluorochrome is used per hybridization

  17. http://genome.ucsc.edu/cgi-bin/hgTracks

  18. Affy • There may be 5,000-100,000 probe sets per chip • A probe set = 11-20 PM/MM pairs

  19. http://www.weizmann.ac.il/home/ligivol/pictures/system.jpg

  20. Interpreting Affymetrix Output: Perfect Match/Mismatch Strategy. For each probe designed to be perfectly complementary to a target sequence, a partner probe is generated that is identical except for a single base mismatch in its center. These probe pairs, called the Perfect Match probe (PM) and the Mismatch probe (MM), allow the quantitation and subtraction of signals caused by non-specific cross-hybridization. The difference in hybridization signals between the partners serves as an indicator of specific target abundance.
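
The PM/MM idea can be sketched in a few lines. This is a simplified illustration, not Affymetrix's software: it assumes the MM partner is made by replacing the central base of the PM probe with its complement (for a 25-mer, the 13th base, as it is commonly described), and it summarizes a probe set by the plain mean of PM − MM over its probe pairs. Real summarization algorithms add outlier handling and treat MM > PM cases differently; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Build the MM partner: identical to PM except the central base is complemented."""
    mid = len(pm) // 2  # index 12, i.e. the 13th base, for a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def average_difference(pm_values, mm_values):
    """Mean of (PM - MM) over the probe pairs of one probe set (simplified)."""
    if not pm_values or len(pm_values) != len(mm_values):
        raise ValueError("need equal-length, non-empty PM and MM intensity lists")
    diffs = [pm - mm for pm, mm in zip(pm_values, mm_values)]
    return sum(diffs) / len(diffs)


pm = "ACGT" * 6 + "A"                 # hypothetical 25-mer PM probe
print(mismatch_probe(pm))             # differs from pm only at position 13
print(average_difference([100, 200, 300], [50, 100, 150]))  # 100.0
```

Subtracting MM removes, per pair, the part of the PM signal attributed to non-specific binding before averaging across the 11-20 pairs of a probe set.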

  21. Analysis workflow: Biological question (differentially expressed genes, sample class prediction, etc.) → Experimental design → Microarray experiment → Image analysis → Normalization → Estimation, Testing, Clustering, Discrimination → Biological verification and interpretation

  22. Experimental Design Bakel and Holstege. 2007. http://www.cell-press.com/misc/page?page=ETBR

  23. Microarray Analysis: Controlling for the Known Knowns and Unknown Unknowns (after Donald Rumsfeld, former Secretary of Defense)

  24. http://www.bioconductor.org/workshops/2003/NGFN03/experimental-design.pdf

  25. Selected references • http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp Best advice? Consult a statistician before you start!

  26. Statistical Power • The probability that a test will reject a null hypothesis if it is false (power = 1 − probability of a Type II error) • Type I and Type II errors • Type I: rejecting the null hypothesis when it is true • We say there is a difference in gene expression between gene A and gene B when there really isn’t • Type II: failing to reject the null hypothesis when it is false • We say there is no difference in gene expression between gene A and gene B when there actually is!

  27. Power in Perspective. What are the 4 main components that determine what conclusions are drawn from a study? • Sample size (number of units) • Effect size (signal to noise) • Alpha level (significance level) • Power (likelihood of detecting a treatment effect if it is there)
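
The four components are linked: fix any three and the fourth follows. The sketch below makes that concrete for a two-sided, two-sample comparison using the normal approximation, with effect size expressed as Cohen's d (difference in means divided by the common standard deviation, i.e. signal to noise). It is an illustrative approximation, not the method of any particular power tool; an exact t-test calculation needs slightly larger samples. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_two_sample(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability the test statistic lands in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Smallest n per group reaching the target power (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


print(samples_per_group(1.0))  # 16
print(samples_per_group(0.5))  # 63
```

Halving the effect size roughly quadruples the required sample size, which is why small expression changes are hard to detect with few arrays.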

  28. Check out this pithy description of Statistical Power and Hypothesis Testing • http://www.socialresearchmethods.net/kb/power.php

  29. Microarray Image Analysis. Based on slides from Robin Liechti (robin.liechti@ie-bpv.unil.ch)

  30. Microarray analysis • Array construction, hybridisation, scanning • Quantitation of fluorescence signals • Data visualisation • Meta-analysis (clustering) • More visualisation

  31. Technical [Figure labels: pseudo-colour image; sample (labelled); probe (on chip). Image from Jeremy Buhler]

  32. Experimental design • Track what’s on the chip • which spot corresponds to which gene • Duplicate experimental spots • reproducibility • Controls • DNAs spotted on glass • positive probe (induced or repressed) • negative probe (bacterial genes on human chip) • oligos on glass or synthesised on chip (Affymetrix) • point mutants (hybridisation plus/minus)

  33. Images from scanner • Resolution • standard 10 µm [currently, max 5 µm] • 100 µm spot on chip = 10 pixels in diameter • Image format • TIFF (tagged image file format) 16 bit (65,536 levels of grey) • 1 cm x 1 cm image at 16 bit = 2 MB (uncompressed) • other formats exist, e.g. SCN (used at Stanford University) • Separate image for each fluorescent sample • channel 1, channel 2, etc.
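
The numbers on this slide follow from simple arithmetic: pixels across = length / pixel size, and uncompressed size = pixel count × bits per pixel / 8. A minimal sketch (function names hypothetical):

```python
UM_PER_CM = 10_000  # micrometres per centimetre


def pixels_across(length_um, resolution_um):
    """Number of pixels spanning a length at a given scan resolution (pixel size)."""
    return round(length_um / resolution_um)


def image_bytes(side_cm, resolution_um, bits_per_pixel=16):
    """Uncompressed size in bytes of a square, single-channel scan."""
    side_px = pixels_across(side_cm * UM_PER_CM, resolution_um)
    return side_px * side_px * bits_per_pixel // 8


print(pixels_across(100, 10))  # 10       (100 µm spot at 10 µm resolution)
print(image_bytes(1, 10))      # 2000000  (1 cm x 1 cm at 10 µm, 16 bit)
print(image_bytes(1, 5))       # 8000000  (same area at 5 µm)
print(2 ** 16)                 # 65536    (grey levels in a 16-bit image)
```

"2 MB" here means 2,000,000 bytes (about 1.9 MiB). Halving the pixel size quadruples the file size, and each fluorescent channel is stored as its own image.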

  34. Images in analysis software • The two 16-bit images (Cy3, Cy5) are reduced to 8-bit images • Goal: display fluorescence intensities for both wavelengths using a 24-bit RGB overlay image • RGB image: • Blue values (B) are set to 0 • Red values (R) are used for Cy5 intensities • Green values (G) are used for Cy3 intensities • Qualitative representation of results
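
A minimal sketch of the overlay described on this slide, assuming the 16-bit to 8-bit reduction is a linear rescale (dropping the low 8 bits); analysis software may instead use a square-root or logarithmic transform. Images are plain lists of rows, and the function names are hypothetical.

```python
def to_8bit(value16: int) -> int:
    """Linear rescale of a 16-bit intensity (0..65535) to 8 bits (0..255)."""
    return value16 >> 8


def overlay_pixel(cy3: int, cy5: int) -> tuple:
    """(R, G, B) with red = Cy5, green = Cy3 and blue fixed at 0."""
    return (to_8bit(cy5), to_8bit(cy3), 0)


def overlay_image(cy3_img, cy5_img):
    """Combine two same-sized 16-bit channel images into one RGB overlay."""
    return [
        [overlay_pixel(g, r) for g, r in zip(row3, row5)]
        for row3, row5 in zip(cy3_img, cy5_img)
    ]


print(overlay_pixel(65535, 0))       # (0, 255, 0): Cy3 only, green spot
print(overlay_pixel(0, 65535))       # (255, 0, 0): Cy5 only, red spot
print(overlay_pixel(65535, 65535))   # (255, 255, 0): equal signal, yellow spot
```

Because the 8-bit overlay discards most of the dynamic range, it is only a qualitative view; quantitation uses the original 16-bit channels.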

  35. Images: examples [Figures: Cy3, Cy5 and pseudo-color overlay]

  36. Processing of images • Addressing or gridding • Assigning coordinates to each of the spots • Segmentation • Classification of pixels either as foreground or as background • Intensity extraction (for each spot) • Foreground fluorescence intensity pairs (R, G) • Background intensities • Quality measures
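
For the intensity-extraction step, one common way to combine a spot's foreground and background values for the two channels is a background-subtracted log ratio. This sketch assumes red = Cy5 and green = Cy3 as on slide 34, uses the M (log ratio) / A (average log intensity) convention common in the normalization literature, and simply flags spots whose background-corrected signal is not positive; real pipelines may handle those differently. The function name is hypothetical.

```python
import math


def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Return (M, A) for one spot, or None if either corrected signal is <= 0.

    M = log2(R / G) and A = 0.5 * log2(R * G), where R and G are the
    background-subtracted red and green intensities.
    """
    r = r_fg - r_bg
    g = g_fg - g_bg
    if r <= 0 or g <= 0:
        return None  # background at or above foreground: no usable ratio
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)
    return m, a


print(log_ratio(1100, 100, 600, 100))  # M = 1.0: red signal twice the green
```

Working on a log2 scale makes a two-fold increase (M = 1) and a two-fold decrease (M = −1) symmetric around zero.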

  38. Addressing (I) • The basic structure of the images is known (determined by the arrayer) • Parameters to address the spot positions: • Separation between rows and columns of grids • Individual translation of grids • Separation between rows and columns of spots within each grid • Small individual translation of spots • Overall position of the array in the image [Figure: ScanAlyze]
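
The addressing parameters listed above can be combined into a nominal spot position. The sketch below is a hypothetical model (class and field names invented for illustration, not taken from ScanAlyze): a spot's centre is the array origin, plus its grid's offset and individual translation, plus its within-grid offset and individual translation, all in pixels.

```python
from dataclasses import dataclass, field


@dataclass
class GridLayout:
    origin: tuple        # overall position (x, y) of the array in the image
    grid_spacing: tuple  # separation (dx, dy) between grid columns and grid rows
    spot_spacing: tuple  # separation (dx, dy) between spot columns/rows in a grid
    # (grid_row, grid_col) -> (dx, dy): individual translation of a grid
    grid_shift: dict = field(default_factory=dict)
    # (grid_row, grid_col, row, col) -> (dx, dy): small translation of one spot
    spot_shift: dict = field(default_factory=dict)

    def spot_center(self, grid_row, grid_col, row, col):
        """Nominal (x, y) pixel centre of one spot."""
        gx, gy = self.grid_shift.get((grid_row, grid_col), (0.0, 0.0))
        sx, sy = self.spot_shift.get((grid_row, grid_col, row, col), (0.0, 0.0))
        x = (self.origin[0] + grid_col * self.grid_spacing[0] + gx
             + col * self.spot_spacing[0] + sx)
        y = (self.origin[1] + grid_row * self.grid_spacing[1] + gy
             + row * self.spot_spacing[1] + sy)
        return (x, y)


layout = GridLayout(origin=(50, 40), grid_spacing=(300, 300), spot_spacing=(15, 15))
print(layout.spot_center(1, 2, 3, 4))  # (710.0, 385.0)
```

Automatic gridding can be seen as fitting these parameters to the image; manual intervention amounts to editing the shift tables for grids or spots that the fit places badly.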

  39. Addressing (II) • The measurement process depends on the addressing procedure • Addressing efficiency can be enhanced by allowing user intervention (slow!) • Most software systems now provide for both manual and automatic gridding procedures

  40. Segmentation (I) • Classification of pixels as foreground or background -> fluorescence intensities are calculated for each spot as a measure of transcript abundance • Production of a spot mask: set of foreground pixels for each spot

  41. Segmentation (II) • Segmentation methods : • Fixed circle segmentation • Adaptive circle segmentation • Adaptive shape segmentation • Histogram segmentation

  42. Fixed circle segmentation • Fits a circle with a constant diameter to all spots in the image • Easy to implement • The spots need to be of the same shape and size [Figure: bad example!]
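
Fixed circle segmentation can be sketched as building one circular spot mask of constant radius and splitting a spot's window into foreground (inside) and background (outside). This is a simplified illustration with hypothetical names; for brevity, background here is every pixel of the window outside the circle, whereas real software may sample a separate local region.

```python
def circle_mask(center, radius, shape):
    """Boolean mask (list of rows): True where a pixel lies within radius of center."""
    cx, cy = center  # (column, row) of the spot centre
    n_rows, n_cols = shape
    return [
        [(x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 for x in range(n_cols)]
        for y in range(n_rows)
    ]


def foreground_background(image, mask):
    """Mean intensity inside the mask (foreground) and outside it (background)."""
    fg = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if m]
    bg = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if not m]
    return sum(fg) / len(fg), sum(bg) / len(bg)


# The same radius is used for every spot: the defining limitation of this method.
mask = circle_mask(center=(2, 2), radius=1, shape=(5, 5))
```

If a spot is smaller than the circle, background pixels dilute the foreground mean; if it is larger or misshapen, spot pixels leak into the background.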

  43. Adaptive circle segmentation • The circle diameter is estimated separately for each spot • Problematic if spots have oval shapes • Dapple finds spots by detecting their edges (second derivative)
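
Dapple locates spot edges with a second-derivative filter. The sketch below swaps in a simpler, openly different idea to show what estimating the diameter per spot means: count the pixels above an intensity threshold in the spot's window and treat that area as a disk, so r = sqrt(area / π). The threshold value and the function name are assumptions for illustration.

```python
import math


def estimate_radius(window, threshold):
    """Estimate a spot's radius from its above-threshold area, assuming a disk."""
    area = sum(1 for row in window for value in row if value > threshold)
    return math.sqrt(area / math.pi)


# Hypothetical 9 x 9 window holding a bright disk of radius 2 on a dim background.
window = [
    [1000 if (x - 4) ** 2 + (y - 4) ** 2 <= 4 else 10 for x in range(9)]
    for y in range(9)
]
print(round(estimate_radius(window, threshold=100), 2))  # 2.03
```

Each spot gets its own radius, but the result is still a circle, so an oval spot is either under- or over-covered, as the slide notes.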

  44. Adaptive shape segmentation • Specification of starting points or seeds • Regions grow outwards from the seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region.
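
The region-growing idea on this slide can be sketched for a single seed: candidate neighbours are kept in a priority queue ordered by how far their value is from the region's running mean, and a candidate joins if, when popped, it is within a tolerance of the current mean. This is a simplified sketch under stated assumptions (4-connected neighbours, one seed, a fixed hypothetical `tolerance`, priorities computed when a pixel is queued); published seeded-region-growing methods differ in details such as competing seeds and priority updates.

```python
import heapq


def grow_region(image, seed, tolerance):
    """Grow a region from seed (row, col); return the set of member pixels."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = float(image[seed[0]][seed[1]])
    seen = {seed}
    heap = []  # entries: (|value - mean at queueing time|, row, col)

    def push_neighbors(r, c, mean):
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                heapq.heappush(heap, (abs(image[nr][nc] - mean), nr, nc))

    push_neighbors(seed[0], seed[1], total)
    while heap:
        _, r, c = heapq.heappop(heap)
        mean = total / len(region)
        if abs(image[r][c] - mean) > tolerance:
            continue  # too different from the running mean: treat as outside
        region.add((r, c))
        total += image[r][c]
        push_neighbors(r, c, total / len(region))
    return region
```

Because the region follows the pixel values rather than a fixed geometry, irregular or oval spots can be captured, at the cost of needing good seeds and a sensible tolerance.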

  45. Histogram segmentation • Uses a target mask chosen to be larger than any spot • Foreground and background intensities are determined from the histogram of pixel values for pixels within the masked area • Example: QuantArray • Background: mean between 5th and 20th percentile • Foreground: mean between 80th and 95th percentile • Unstable when a large target mask is set to compensate for variation in spot size [Figure labels: Bkgd, Foreground]
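
The percentile rule quoted for QuantArray can be sketched on the pixel values inside one target mask. The rank-based percentile definition used here (slicing the sorted pixel list) is an assumption, and QuantArray's own definition may differ; function names are hypothetical.

```python
def mean_between_percentiles(values, lo_pct, hi_pct):
    """Mean of the sorted values whose rank falls between two percentiles."""
    ordered = sorted(values)
    n = len(ordered)
    start, stop = int(n * lo_pct / 100), int(n * hi_pct / 100)
    chunk = ordered[start:stop]
    if not chunk:
        raise ValueError("too few pixels for this percentile range")
    return sum(chunk) / len(chunk)


def histogram_segmentation(pixels):
    """(foreground, background) from the pixel values within the target mask."""
    background = mean_between_percentiles(pixels, 5, 20)
    foreground = mean_between_percentiles(pixels, 80, 95)
    return foreground, background


print(histogram_segmentation(list(range(1, 101))))  # (88.0, 13.0)
```

Skipping the top 5% and bottom 5% of pixels discards saturated pixels and dust or dropout artefacts. With an oversized mask, however, a small spot may not fill the top 20% of the histogram, which is the instability the slide mentions.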
