Microarray Design and Analysis Jeremy D. Glasner Genetics 875 November 20, 2007
What is a Microarray? A collection of DNA sequences arrayed on a solid substrate usually with thousands of individual DNA spots
Gene expression analysis Massively parallel biochemistry aimed at measuring RNA levels
Why do gene expression analysis? • Predict new gene functions by expression patterns • Determine the effect of a drug on gene expression • Compare a mutant strain to wild-type • Identify an expression signature for a cancer type • Find the targets of a transcriptional regulator • Measure the half-lives of RNAs in the cell . . .
TraSH: transposon site hybridization Genome wide localization of insertion mutations Sassetti CM, Boyd DH, Rubin EJ. Proc Natl Acad Sci U S A. 2001 98(22):12712-7
CGH: Comparative genome hybridization Rajashekara G, Glasner JD, Glover DA, Splitter GA. J Bacteriol. 2004 186(15):5040-51.
ChIP-Chip: Chromatin immunoprecipitation, chip hybridization a.k.a. genome-wide occupancy profiling Identify the chromosomal locations of a DNA binding protein
Flow of Information in Array Analyses Experimental Design Array Production Sample Preparation Scanning Image Analysis Data Processing Data Analysis Information Integration
Experimental Design Issues Number of Replicates What samples should be compared? Directly on same chip/across arrays? Calibrators and common references What controls are necessary?
Considerations when designing the sequences for a chip Sequence Annotation ORF, UTR, functional RNA prediction Oligo Selection Array Design, Replication
Sensitivity vs. specificity as a function of oligo length They are inversely related Hughes TR, et al. Nat Biotechnol. 2001 Apr;19(4):342-7.
Two methods for array production photolithography spotting
Affymetrix Chips 10^5 to 10^6 “Probes” Perfect Match and Mismatch Average Difference Values
Affymetrix “Units” PM MM A “probe set” A “probe pair”
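As a rough illustration of how a probe set of PM/MM probe pairs can be reduced to one number, the sketch below takes the plain mean of PM minus MM over the probe pairs. This is a simplified reading of "average difference"; production software may trim outlier pairs first, and the intensities here are hypothetical.

```python
from statistics import mean

def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return mean(p - m for p, m in zip(pm, mm))

# Hypothetical intensities for a probe set with four probe pairs.
pm = [1200.0, 950.0, 1100.0, 1300.0]   # perfect-match probes
mm = [300.0, 250.0, 400.0, 350.0]      # matching mismatch probes
avg_diff = average_difference(pm, mm)
print(avg_diff)
```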
DMD “The Digital Light Switch” DMD Close-Up • Mirror spacing 17 µm • Mirror transit time <20 µs • Tilt angle ±10 degrees • Five mirrors = diameter of a human hair • Analog pictures from digital switches?
Sample Preparation RNA samples are extracted for the experiment and fluorescent dyes are incorporated RNA stabilization Direct vs. indirect labeling
Hybridization & Scanning PMT settings Lasers Focus Data Tracking
Image Analysis Spot Finding Background subtraction Intensity Calculation
Automatic Grid Finding Sum signal intensities in X and Y directions
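The idea of summing intensities along X and Y can be sketched as follows: rows and columns that pass through spots give peaks in the summed profiles, and the peak positions suggest grid lines. The peak rule used here (a local maximum above the profile mean) and the toy image are assumptions for illustration only.

```python
def projection_profiles(image):
    """Return (column_sums, row_sums) for a 2D list of pixel intensities."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return col_sums, row_sums

def find_peaks(profile):
    """Indices of local maxima that rise above the mean of the profile."""
    threshold = sum(profile) / len(profile)
    peaks = []
    for i in range(1, len(profile) - 1):
        if (profile[i] > threshold
                and profile[i] >= profile[i - 1]
                and profile[i] > profile[i + 1]):
            peaks.append(i)
    return peaks

# Toy 7x7 image: dim background with four bright single-pixel "spots".
image = [[1] * 7 for _ in range(7)]
for r in (1, 4):
    for c in (2, 5):
        image[r][c] = 100

col_sums, row_sums = projection_profiles(image)
col_peaks = find_peaks(col_sums)   # candidate grid columns
row_peaks = find_peaks(row_sums)   # candidate grid rows
print(col_peaks, row_peaks)
```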
Estimating Foreground and Background with the “Fixed Circle” Method
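A minimal sketch of the fixed-circle idea, assuming the spot centre and radius are already known from grid finding: pixels inside the circle count as foreground and the remaining pixels of the patch as background. Using the mean for foreground and the median for background is an illustrative choice, and the patch values are hypothetical.

```python
from statistics import mean, median

def fixed_circle_estimate(patch, cx, cy, radius):
    """Split a 2D patch into circle (foreground) and non-circle (background) pixels."""
    foreground, background = [], []
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                foreground.append(value)
            else:
                background.append(value)
    return mean(foreground), median(background)

# Hypothetical 5x5 patch around one spot centred at (2, 2).
patch = [
    [10, 10, 10, 10, 10],
    [10, 10, 110, 10, 10],
    [10, 110, 110, 110, 10],
    [10, 10, 110, 10, 10],
    [10, 10, 10, 10, 10],
]
fg, bg = fixed_circle_estimate(patch, cx=2, cy=2, radius=1)
signal = fg - bg   # background-subtracted intensity
print(fg, bg, signal)
```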
Estimating Foreground and Background with the “Histogram” Method [Figure: histogram of pixel intensities; x-axis: intensity, y-axis: number of pixels]
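One way to read the histogram approach: sort the pixel intensities in a box around the spot, then take a low-percentile band as background and a high-percentile band as foreground. The percentile bands below (5 to 20 and 80 to 95) are assumed values for illustration, not settings taken from the slides.

```python
from statistics import mean

def histogram_estimate(pixels, bg_band=(5, 20), fg_band=(80, 95)):
    """Estimate (foreground, background) from percentile bands of sorted pixels."""
    ordered = sorted(pixels)
    n = len(ordered)

    def band_mean(lo_pct, hi_pct):
        start = n * lo_pct // 100
        stop = max(n * hi_pct // 100, start + 1)  # keep at least one pixel
        return mean(ordered[start:stop])

    return band_mean(*fg_band), band_mean(*bg_band)

# Hypothetical box of 100 pixels: 60 dim background pixels, 40 bright spot pixels.
pixels = [10] * 60 + [200] * 40
hist_fg, hist_bg = histogram_estimate(pixels)
print(hist_fg, hist_bg)
```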
Quality Filtering of Data From Tseng et al., 2001. NAR 29(12):2549-2557
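As a sketch of what a quality filter can look like, the example below drops genes whose replicate spots disagree too much, measured by the coefficient of variation (CV). The CV cutoff of 0.2, the gene names, and the intensities are assumptions for illustration and are not taken from the cited paper.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return stdev(values) / mean(values)

def filter_by_cv(replicates, max_cv=0.2):
    """Keep genes whose replicate intensities have CV at or below max_cv."""
    return {
        gene: values
        for gene, values in replicates.items()
        if coefficient_of_variation(values) <= max_cv
    }

# Hypothetical replicate spot intensities per gene.
replicates = {
    "geneA": [1000, 1050, 980],   # consistent replicates
    "geneB": [200, 900, 50],      # inconsistent replicates
}
kept = filter_by_cv(replicates)
print(sorted(kept))
```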
Data Normalization Normalization is necessary when the overall signal differs between experiments, and it can be complicated when the relationship between experiments is nonlinear. Internal controls can also be used for normalization. Data from Schadt et al., 2001. Journal of Cellular Biochemistry Supplement 37:120-125.
Normalization Methods Assume linear relationship Apply non-linear normalization Normalize to “house-keeping genes” Normalize to internal Standards
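Two of the listed approaches can be sketched briefly: a linear rescaling that matches the median intensity of a sample to a reference, and centring log ratios on a set of house-keeping genes. The function names, gene names, and values are hypothetical, and non-linear methods (for example intensity-dependent smoothing) are not shown.

```python
from statistics import median

def median_scale(reference, sample):
    """Linear normalization: rescale sample so its median matches the reference."""
    factor = median(reference) / median(sample)
    return [value * factor for value in sample]

def housekeeping_center(log_ratios, housekeeping):
    """Shift log ratios so the house-keeping genes have a median of zero."""
    offset = median(log_ratios[gene] for gene in housekeeping)
    return {gene: value - offset for gene, value in log_ratios.items()}

# Hypothetical intensities: the sample array is half as bright overall.
reference = [100, 200, 400]
sample = [50, 100, 200]
scaled = median_scale(reference, sample)

# Hypothetical log2 ratios with two house-keeping genes.
log_ratios = {"hk1": 1.0, "hk2": 1.0, "geneX": 3.0}
centered = housekeeping_center(log_ratios, ["hk1", "hk2"])
print(scaled, centered)
```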
Detecting differential expression Determine which changes are significant: • Fixed cutoff (e.g., fold-change > 4) • Replication allows assessment of variability • Common statistics such as the t-test are often used for gene expression data; the significance of the t statistic is then determined by referring to the t distribution. This assumes that the data are normally distributed, which may not be true. • Gene expression experiments may require thousands of statistical tests, and significance thresholds should be adjusted to reflect this. A standard Bonferroni correction multiplies each p-value by the number of tests, but it is likely too conservative.
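The two calculations mentioned above can be sketched as follows: a two-sample t statistic for one gene and a Bonferroni adjustment over many tests. The unequal-variance (Welch) form of the statistic is an assumption made here; converting it to a p-value needs the t distribution, which is left to a statistics library. All numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch form)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

# Hypothetical log expression values for one gene in two conditions.
treated = [10, 12, 14]
control = [4, 6, 8]
t_stat = welch_t(treated, control)

# Hypothetical raw p-values from four genes.
raw_p = [0.001, 0.02, 0.25, 0.5]
adjusted_p = bonferroni(raw_p)
print(t_stat, adjusted_p)
```

With thousands of genes the multiplier becomes very large, which is why the correction is described as conservative.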
Different methods, different results Millenaar et al., BMC Bioinformatics 2006, 7:137