A Quantitative Overview to Gene Expression Profiling in Animal Genetics

Analysis of (cDNA) Microarray Data: Part I. Sources of Bias and Normalisation. Armidale Animal Breeding Summer Course, UNE, Feb. 2006.

Presentation Transcript


  1. A Quantitative Overview to Gene Expression Profiling in Animal Genetics Analysis of (cDNA) Microarray Data: Part I. Sources of Bias and Normalisation Armidale Animal Breeding Summer Course, UNE, Feb. 2006

  2. MICROARRAY ANALYSIS: My (Educated?) View
     • Data included in GEXEX
        • Whole data stored and “securely” available
        • GP3xCLI on each hybridisation
     • Relaxed data acquisition criteria
        • Signal to Noise > 1.00 (more relaxed thresholds exist)
        • Mean to Median > 0.85 (Tran et al. 2002)
     • Data Normalisation
     • Mixed-Model Equations
        • Check Residuals (plot Residuals vs Predicted)
        • Check REML estimates of Variance Components
        • Proportion of Total Variance due to Gene x Variety
     • Process Gene x Treatment BLUPs → Differentially Expressed Genes
        • t-statistics → Z-score → P-value
        • Mixtures of Distributions → Posterior Probabilities
     • Process Differentially Expressed genes
        • Hierarchical clustering
        • Gene ontology analysis

  3. MICROARRAY ANALYSIS: BASIC PIECES FOR SIGNAL DETECTION
     • Foreground RED and GREEN: Rf, Gf
     • Background RED and GREEN: Rb, Gb
     • Background-corrected RED: R = Rf – Rb
     • Background-corrected GREEN: G = Gf – Gb
     • Log-transformed: Log2(R), Log2(G)
     • Difference (“Minus”): M = Log2(R) – Log2(G) = Log2(R/G)
     • Mean (“Average”): A = 0.5 * ( Log2(R) + Log2(G) ) = 0.5 * Log2(R*G)
     • MA-Plots …to come
     True Signals!
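The quantities on this slide can be computed spot by spot. The following is a minimal sketch, not the course's own code: the function name `ma_values` and the choice to return `None` when a background-corrected signal is not positive are assumptions made for illustration.

```python
import math

def ma_values(rf, rb, gf, gb):
    """Return (M, A) for one spot, or None if either corrected signal is <= 0.

    rf, gf: foreground RED and GREEN; rb, gb: background RED and GREEN.
    Returning None for non-positive signals is an illustrative assumption.
    """
    r = rf - rb  # background-corrected RED
    g = gf - gb  # background-corrected GREEN
    if r <= 0 or g <= 0:
        return None  # the log is undefined; the spot carries no valid signal
    log_r, log_g = math.log2(r), math.log2(g)
    m = log_r - log_g          # "Minus":   Log2(R/G)
    a = 0.5 * (log_r + log_g)  # "Average": 0.5 * Log2(R*G)
    return m, a

# Made-up intensities chosen so that R = 1024 and G = 256.
m, a = ma_values(rf=1100.0, rb=76.0, gf=300.0, gb=44.0)
weak_spot = ma_values(rf=40.0, rb=44.0, gf=300.0, gb=44.0)
```

Plotting M against A for all valid spots gives the MA-plot used in the later slides.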

  4. Data Acquisition Criteria: The Red/Green intensities can be spatially biased

  5. Data Acquisition Criteria: The Red/Green intensities can be intensity-biased. MA-Plot: values should scatter around zero

  6. Data Acquisition Criteria: Background Correction: Why bother?

  7. Data Acquisition Criteria: Background Correction: Why bother?

  8. Data Acquisition Criteria: RED versus GREEN. Log-transformation: Why bother?

  9. Data Acquisition Criteria: MA-Plots: All versus only valid signals

  10. Data Acquisition Criteria: Mean to Median Correlation; Signal to Noise Ratio
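The two acquisition criteria can be applied as a simple per-spot filter. This is a sketch under stated assumptions: the `Spot` record and its field names are hypothetical, and each spot is assumed to arrive with its signal-to-noise and mean-to-median values already computed by the image-analysis software, since the slides do not define them. The thresholds (1.00 and 0.85) are the ones quoted on slide 2.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    # Hypothetical record; field names are illustrative, not from the slides.
    gene: str
    snr: float             # signal-to-noise value, computed upstream
    mean_to_median: float  # mean-to-median value, computed upstream

def is_valid(spot, min_snr=1.00, min_mean_to_median=0.85):
    """Keep a spot only if it passes both acquisition criteria."""
    return spot.snr > min_snr and spot.mean_to_median > min_mean_to_median

# Made-up spots: g2 fails on signal-to-noise, g3 fails on mean-to-median.
spots = [Spot("g1", 2.4, 0.95), Spot("g2", 0.8, 0.97), Spot("g3", 3.1, 0.60)]
valid = [s.gene for s in spots if is_valid(s)]
```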

  11. Data Normalisation (http://genome-www5.stanford.edu/mged/normalization.html)
     • Normalisation is an attempt to correct for systematic bias.
     • Normalisation allows you to compare data from one array to another.
     • Systematic bias can be introduced into microarray experiments at all stages.
     • Need to:
        • Avoid it (as much as possible)
        • Recognize it
        • Correct for it
        • Discard unrecoverable data
     • In practice we do not always understand the data; inevitably some biology will be removed too (or at least not revealed).

  12. Data Normalisation
     • Different amounts of starting material
     • Differential labeling efficiency of dyes
     • Different amounts of RNA in each channel
     • Differential efficiency of scanning in each channel
     • Differential efficiency of hybridization over slide surface
     [Figure labels: Pool of Cell Lines; Tumor] Source: Catherine Ball (Stanford)

  13. Systematic Bias: Sources … and Dealing with it
     Sources:
     • Different labeling efficiencies or dye effects
     • Scanner malfunction
     • Differences in concentration of DNA on arrays (plate effects)
     • Printing or tip problems
     • Uneven hybridization
     • Batch bias
     • Experimenter issues
     Dealing with it:
     • Detect and recognize the effect → Note something odd
     • Determine magnitude and effect on data → Try a few methods
     • Identify source of bias → Think big!
     • Eliminate or reduce contributing factors
     • Correct data
     • Discard uncorrectable data

  14. Systematic Bias: Labeling Efficiencies Cause Bias
     • One channel of a two-channel array has higher intensity than the other (usually GREEN).
     • Most common source of recognizable bias.
     • Solution: the easiest bias to address (e.g. dye-swaps, balanced loops).
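Why a dye-swap addresses this bias can be shown with a little arithmetic. The sketch below assumes the dye effect is additive on the log-ratio scale and identical on both slides of the pair; that is a simplification made for illustration, and the function name `dye_swap_combine` is hypothetical.

```python
def dye_swap_combine(m_forward, m_swapped):
    """Combine log-ratios from a dye-swap pair.

    Assumed model (illustrative): m_forward = effect + bias,
    m_swapped = -effect + bias, with the same additive dye bias on both slides.
    """
    effect = 0.5 * (m_forward - m_swapped)    # the dye bias cancels
    dye_bias = 0.5 * (m_forward + m_swapped)  # the biological effect cancels
    return effect, dye_bias

# Made-up gene: true log2 ratio of 1.0 and an additive dye bias of 0.3.
true_effect, bias = 1.0, 0.3
effect, dye_bias = dye_swap_combine(true_effect + bias, -true_effect + bias)
```

Half the difference of the two slides recovers the biological effect, and half the sum isolates the dye bias.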

  15. Systematic Bias: Scanning (operator?) Bias
     • Mis-aligned lasers can cause big problems
     • In this case, the two channels are slightly out of register
     • Solution: fix the scanner and repeat

  16. Systematic Bias: Printing (operator?) Bias
     • Irregularly shaped spots are often observed (printing error)
     • Slides from the same printing batch cluster together
     • Solution: probably limited to better printing technique and image analysis, rather than normalization

  17. Systematic Bias: Probe Bias
     • Different concentrations of probes might produce patterns in arrays
     • Biological role of probes can produce patterns in arrays
     • These patterns can create a spatial bias that is not artificial, but biological

  18. Systematic Bias: Probe Bias
     • Probes arranged on the array based on biological function cause spatial bias
     • Solution: avoid arranging reporters based on function, know your experimental design
     [Figure labels: Coding regions; Intergenic regions]

  19. Systematic Bias: Hybridisation (operator?) Bias
     • Poor technique during hybridisation can cause a spatial bias
     • Operator is one of the largest sources of systematic bias
     • Experiments done by the same operator often cluster together more tightly than warranted by the biology
     • Solution: consistent methods, successful techniques

  20. Data Normalisation …and other beautifying techniques

  21. Data Normalisation: Transformation …to near normality
     Solution: Explore the entire Box-Cox family of power transformations. Maximum at λ ≈ 0, hence use the log-transformation.
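The Box-Cox search can be sketched as a grid search over λ. The standard Box-Cox family is y = (x^λ − 1)/λ for λ ≠ 0 and y = log(x) for λ = 0, and λ is usually chosen by maximising the profile log-likelihood. The data below are simulated log-normal intensities, not the course data, and the grid and function names are illustrative assumptions.

```python
import math
import random

def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if abs(lam) < 1e-12:
        return math.log(x)  # limiting case as lambda -> 0
    return (x ** lam - 1.0) / lam

def profile_loglik(xs, lam):
    """Profile log-likelihood of lambda under normality (constants dropped)."""
    n = len(xs)
    ys = [box_cox(x, lam) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    jacobian = (lam - 1.0) * sum(math.log(x) for x in xs)
    return -0.5 * n * math.log(var) + jacobian

# Simulated "exponential-like" raw intensities (log-normal by construction).
random.seed(1)
intensities = [math.exp(random.gauss(7.0, 1.0)) for _ in range(2000)]

# Illustrative grid: lambda from -1.0 to 1.0 in steps of 0.1.
grid = [round(-1.0 + 0.1 * k, 1) for k in range(21)]
best_lam = max(grid, key=lambda lam: profile_loglik(intensities, lam))
```

For data of this kind the maximising λ should fall close to zero, which is the argument for the log-transformation.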

  22. Data Normalisation: Transformation …to near normality
     Raw Data …exponential-like; Log2 Transformed …normal-like

  23. Data Normalisation: Transformation …to near normality
     Lin-Log Transformation, with x = background corrected = Fg – Bg

  24. Data Normalisation: Transformation …to near normality
     • The Edwards transformation and the Lin-Log transformation both attempt to use the entire data, not only the spots for which foreground is greater than background.
     • The reasoning is that errors are linear for small signals and multiplicative for large signals.
     • The search for and choice of the transformation parameter could be rather unconvincing (e.g. different for different array slides).
     • Solution: Use Log2 if Foreground > Background
        • Otherwise, use a small arbitrary value (say 0),
        • Or simply disregard.
     • Alternatively: use only Foreground and Log2 it
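The rule on this slide translates into a few lines. This is a sketch only: the function name `log2_signal` and its parameters are hypothetical, with `fallback=None` standing for "simply disregard".

```python
import math

def log2_signal(fg, bg, fallback=0.0, use_foreground_only=False):
    """Log2 signal for one channel of one spot.

    fallback: value used when foreground <= background
              (0.0 as on the slide, or None to disregard the spot).
    use_foreground_only: the slide's alternative, Log2 of foreground alone.
    """
    if use_foreground_only:
        return math.log2(fg)
    if fg > bg:
        return math.log2(fg - bg)
    return fallback

strong = log2_signal(300.0, 44.0)                  # foreground above background
weak = log2_signal(40.0, 44.0)                     # falls back to 0.0
dropped = log2_signal(40.0, 44.0, fallback=None)   # disregarded
fg_only = log2_signal(1024.0, 44.0, use_foreground_only=True)
```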

  25. Location Normalisation
     Normalised value: Log2(R/G) – c = M – c, where c is the Location Parameter
     GLOBAL:
     • Mean: c = Mean of M’s
     • Median: c = Median of M’s
        → Assumption: Changes roughly symmetric around Mean or Median
     • LOWESS: c = Weighted Regression of M on A
        → Assumption: Changes roughly symmetric at all intensities
     LOCAL:
     • LOWESS: c = c(i) = Weighted Regression of M on A within print-tip-group i
     LOWESS = Locally WEighted Regression and Smoothing Scatterplots
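The global options can be sketched in plain Python. The smoother below is a simplified stand-in for LOWESS (local linear fits with tricube weights and no robustness iterations), written for illustration rather than taken from the course material; the function names and the `span` default are assumptions.

```python
import statistics

def global_median_normalise(m):
    """Subtract the global median of M from every M."""
    c = statistics.median(m)
    return [mi - c for mi in m]

def lowess_fit(a, m, span=0.4):
    """Fitted c(A) at each observed A: local linear fit, tricube weights.

    Simplified sketch: no robustness iterations, O(n^2) running time.
    """
    n = len(a)
    k = min(n, max(3, int(round(span * n))))  # neighbours per local fit
    fitted = []
    for a0 in a:
        dists = sorted(abs(ai - a0) for ai in a)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = [(1.0 - min(abs(ai - a0) / h, 1.0) ** 3) ** 3 for ai in a]
        sw = sum(w)
        mean_a = sum(wi * ai for wi, ai in zip(w, a)) / sw
        mean_m = sum(wi * mi for wi, mi in zip(w, m)) / sw
        sxx = sum(wi * (ai - mean_a) ** 2 for wi, ai in zip(w, a))
        sxy = sum(wi * (ai - mean_a) * (mi - mean_m)
                  for wi, ai, mi in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted

def lowess_normalise(a, m, span=0.4):
    """Return M - c(A), with c(A) estimated by the simplified smoother."""
    return [mi - ci for mi, ci in zip(m, lowess_fit(a, m, span))]

# Made-up slide with a purely intensity-dependent bias: M = 0.5 + 0.1 * A.
a_vals = [6.0 + 0.1 * i for i in range(50)]
m_vals = [0.5 + 0.1 * ai for ai in a_vals]
m_median = global_median_normalise(m_vals)
m_lowess = lowess_normalise(a_vals, m_vals)
```

In this toy example the global median only recentres M, so the trend with A remains, whereas the intensity-dependent fit removes it. The local (print-tip-group) version would apply `lowess_normalise` separately to the spots of each group.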

  26. Location Normalisation: LOWESS = Locally WEighted Regression and Smoothing Scatterplots. Source: G Rosa 2003.

  27. Location Normalisation: LOWESS, SAS Code. Source: G Rosa 2003. Genetic analysis of complex traits using SAS. ISBN 1-59047-507-0

  28. Location Normalisation: LOWESS, Normalised Intensities. Source: G Rosa 2003.

  29. Location Normalisation: LOWESS. Source: G Rosa 2003.

  30. Location Normalisation: None. Source: Yang et al 2002

  31. Location Normalisation: After Global Median. Source: Yang et al 2002

  32. Location Normalisation: Global Lowess. Source: Yang et al 2002

  33. Location Normalisation: Print-tip-Group Lowess. Source: Yang et al 2002

  34. Location Normalisation: After Print-tip-Group Lowess. Source: Yang et al 2002

  35. Location Normalisation
     Additional Assumption (other than symmetry of changes): The proportion of genes that are Differentially Expressed (DE) is minimal
     Question: Which genes to use?
     Answer: Only the ones (housekeeping) that we know are not DE
     Comment: “Boutique” arrays become a nuisance

  36. Scale Normalisation (Standardisation)
     “Some scale adjustments may be required so that the relative expression levels from one particular experiment (slide) do not dominate the average relative expression levels across replicate experiments.” (Yang et al 2002)
     Scaled value: ( Log2(R/G) – c(i) ) / a(i)
     Notes:
     1. The scaling a(i) is such that Var(M) = a(i)^2 σ^2
     2. The estimation requires a (“robust”) approximation based on the geometric mean, where MAD is the Median Absolute Deviation.
     3. It doesn’t get any more heuristic (funnier?) than this
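A sketch of the scale adjustment follows. It assumes the estimator usually attributed to Yang et al 2002, namely a(i) = MAD(i) divided by the geometric mean of the MADs across slides; the formula itself did not survive on the slide, so treat it as an assumption. The M values are taken to be already location-normalised, and the function names are hypothetical.

```python
import math
import statistics

def mad(values):
    """Median Absolute Deviation about the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def scale_factors(m_by_slide):
    """a(i) = MAD(i) / geometric mean of all MADs (assumed estimator)."""
    mads = [mad(m) for m in m_by_slide]
    geo_mean = math.exp(sum(math.log(d) for d in mads) / len(mads))
    return [d / geo_mean for d in mads]

def scale_normalise(m_by_slide):
    """Divide each slide's location-normalised M values by its a(i)."""
    factors = scale_factors(m_by_slide)
    return [[v / a_i for v in m] for m, a_i in zip(m_by_slide, factors)]

# Two made-up slides; the second has twice the spread of the first.
slides = [[-2.0, -1.0, 0.0, 1.0, 2.0], [-4.0, -2.0, 0.0, 2.0, 4.0]]
factors = scale_factors(slides)
scaled = scale_normalise(slides)
```

After scaling, both slides share the same MAD, so neither dominates an average of M across slides.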

  37. Data Normalisation …and other beautifying techniques
     Notes:
     • Except Log2, everything else applies only to Ratios: M = log2(R/G)
     • Except Log2, everything else applies only within slide
     • Everything is beautified to identify DE genes straight from the MA-plot, either from a single slide or from a function of M’s across slides.
     • The uncertainty in measurements increases as intensity decreases
     • Measurements close to the detection limit are the most uncertain (cf. Sensitivity)
     • Fold-change measurements ignore these effects
     • We can calculate an intensity-dependent z-score that measures the ratio relative to the standard deviation in the data
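One way to sketch an intensity-dependent z-score is to standardise each M by the mean and standard deviation of the spots nearest to it in A. The sliding window, its size and the single symmetric standard deviation are illustrative assumptions, not the exact procedure behind the figures that follow.

```python
import statistics

def local_z_scores(a, m, window=20):
    """Z-score of each M relative to the spots nearest to it in A.

    Sketch only: a sliding window over spots ranked by A, with one
    symmetric local standard deviation.
    """
    order = sorted(range(len(a)), key=lambda i: a[i])
    half = window // 2
    z = [0.0] * len(a)
    for rank, i in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        neighbours = [m[j] for j in order[lo:hi]]
        mu = statistics.mean(neighbours)
        sd = statistics.pstdev(neighbours)
        z[i] = (m[i] - mu) / sd if sd > 0 else 0.0
    return z

# Made-up slide: noisy at low intensity (|M| = 1), quiet at high (|M| = 0.1).
a_vals = [6.0 + 0.1 * i for i in range(100)]
m_vals = [(1.0 if i < 50 else 0.1) * (1 if i % 2 == 0 else -1)
          for i in range(100)]
m_vals[80] = 1.0  # a 2-fold change sitting in the quiet, high-intensity region
z = local_z_scores(a_vals, m_vals)
```

Spots 20 and 80 show the same 2-fold change (M = 1), yet only the high-intensity spot stands out against its local noise, which is the effect a plain fold-change cut-off ignores.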

  38. Data Normalisation …and other beautifying techniques
     [Figure: Corrected Log10(Ratio) versus Mean(Log10(Intensity)), with 2-fold lines and the locally estimated standard deviations of positive and negative ratios drawn at Z = ±1, ±2 and ±5; and the resulting local Log10(Ratio) Z-score versus Mean(Log10(Intensity)).]
     |Z| > 2 is at the ~95% confidence level. Source: J Pevsner 2004

  39. Normalisation: References
     • Bilban M, Buehler LK, Head S, Desoye G, Quaranta V. Normalizing DNA microarray data. Curr Issues Mol Biol. 2002 Apr;4(2):57-64.
     • Durbin BP, Hardin JS, Hawkins DM, Rocke DM. A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics. 2002 Jul;18 Suppl 1:S105-10.
     • Kepler TB, Crosby L, Morgan KT. Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biol. 2002 Jun 28;3(7):RESEARCH0037.
     • Schuchhardt J, Beule D, et al. Normalization strategies for cDNA microarrays. Nucleic Acids Res. 2000;28(10):e47.
     • Tran PH, Peiffer DA, Shin Y, Meek LM, Brody JP, Cho KW. Microarray optimizations: increasing spot accuracy and automated identification of true microarray signals. Nucleic Acids Res. 2002 Jun 15;30(12):e54.
     • Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH. Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 2001 Jun 15;29(12):2549-57.
     • Tsodikov A, Szabo A, Jones D. Adjustments and measures of differential expression for microarray data. Bioinformatics. 2002 Feb;18(2):251-60.
     • Yang MC, Ruan QG, Yang JJ, Eckenrode S, Wu S, McIndoe RA, She JX. A statistical method for flagging weak spots improves normalization and ratio estimates in microarrays. Physiol Genomics. 2001 Oct 10;7(1):45-53.
     • Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002 Feb 15;30(4):e15.
