Analysis of DNA Microarray Data: Sensitivity, Specificity, and Other Real-World Issues

edana

Presentation Transcript


  1. Analysis of DNA Microarray Data: Sensitivity, Specificity, and Other Real-World Issues

  2. 1. Definitions and basic considerations

  3. DNA microarrays • Major advantage • Simultaneous measurement of the level of expression of nearly all transcribed genes within a given cell or tissue • Major disadvantage • Cost

  4. Therefore, to get the most bang for the buck, it is imperative to understand the role of uncertainty in measurement…

  5. Categorical tests (yes/no, based upon threshold) • Gene arrays • Is gene expressed or not? • Is gene differentially expressed under two different experimental conditions? • Medical tests • Does patient have disease or not?

  6. Key concepts for categorical tests • Specificity • true negative rate • 1 – FPR (false positive rate) • Sensitivity • TPR (true positive rate)

  7. Specificity provides the answer to questions like… • What fraction of patients who are disease-free are correctly classified as disease-free? • What fraction of genes that are not differentially expressed are correctly classified as being non-differentially expressed?

  8. Specificity • Specificity is defined as true negative rate • Probability that disease-free patient will be correctly categorized as disease-free • False positive rate (FPR) = 1 – specificity • Probability that disease-free patient will be incorrectly categorized as having disease

  9. Sensitivity and specificity deal with distinct sets of patients or genes • Specificity • Healthy patients lacking the disease • Non-expressed genes • Non-differentially expressed genes • Sensitivity • Sick patients having the disease • Expressed genes • Differentially expressed genes

  10. Sensitivity provides the answer to questions like… • What fraction of patients who have a given disease are correctly classified as diseased? • What fraction of genes that are differentially expressed are correctly classified as being differentially expressed?

  11. Sensitivity • Sensitivity is defined as true positive rate • Probability that diseased patient will be correctly categorized as having the disease
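Both rates reduce to simple ratios over a 2×2 table of outcomes, and, as slide 9 stresses, each uses a different denominator (diseased versus disease-free items). A minimal Python sketch, assuming you already have counts of true/false positives and negatives; the function name `classification_rates` and the counts in the usage line are hypothetical.

```python
def classification_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return sensitivity, specificity and FPR from confusion-matrix counts.

    tp: diseased (or truly differentially expressed) items called positive
    fn: diseased items called negative
    tn: disease-free (or non-differentially expressed) items called negative
    fp: disease-free items called positive
    """
    sensitivity = tp / (tp + fn)  # true positive rate (TPR), diseased items only
    specificity = tn / (tn + fp)  # true negative rate, disease-free items only
    fpr = fp / (tn + fp)          # false positive rate = 1 - specificity
    return {"sensitivity": sensitivity, "specificity": specificity, "fpr": fpr}


# Hypothetical counts: 90 of 100 diseased patients test positive,
# 950 of 1000 disease-free patients test negative.
rates = classification_rates(tp=90, fn=10, tn=950, fp=50)
print(rates)  # {'sensitivity': 0.9, 'specificity': 0.95, 'fpr': 0.05}
```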

  12. Yin and yang of sensitivity and specificity • For a given test, moving the threshold to improve specificity worsens sensitivity • Moving it to improve sensitivity worsens specificity

  13. Since when is the world ever ideal?

  14. If we choose a threshold λ of 1.5, then...

  15. And if we choose a threshold λ of 0.5, then...
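The two threshold slides can be illustrated numerically. A minimal sketch, assuming (hypothetically) that true test values are normally distributed with mean 0 and SD 1 in disease-free patients and with mean 2 and SD 1 in diseased patients, so the two groups overlap; only the threshold values 1.5 and 0.5 come from the slides. Values at or above λ are called positive.

```python
from statistics import NormalDist

# Hypothetical, overlapping distributions of the true test value.
HEALTHY = NormalDist(mu=0.0, sigma=1.0)
DISEASED = NormalDist(mu=2.0, sigma=1.0)


def rates_at_threshold(lam: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when values >= lam are called positive."""
    sensitivity = 1.0 - DISEASED.cdf(lam)  # diseased patients at or above the threshold
    specificity = HEALTHY.cdf(lam)         # disease-free patients below the threshold
    return sensitivity, specificity


for lam in (1.5, 0.5):
    sens, spec = rates_at_threshold(lam)
    print(f"lambda={lam}: sensitivity={sens:.3f}, specificity={spec:.3f}")
# lambda=1.5: sensitivity=0.691, specificity=0.933
# lambda=0.5: sensitivity=0.933, specificity=0.691
```

With these distributions fixed, lowering λ buys sensitivity only by giving up specificity; improving both at once requires a test whose distributions overlap less.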

  16. 2. Sources of uncertainty in categorical tests

  17. S_MEASURE = measured signal • S_TRUE = true signal • N = noise (error)

  18. Noise-to-Signal (N:S) Ratio • N : S << 1 • reliable and trustworthy measurement • N ~ S • unreliable measurement • N > S • highly unreliable measurement
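A small simulation makes the N:S categories concrete. It assumes an additive model, S_MEASURE = S_TRUE + N, with N drawn from a normal distribution of fixed SD; the model, the helper name `simulate_ns_ratio` and all numbers are illustrative assumptions, not values from the study.

```python
import random
import statistics


def simulate_ns_ratio(s_true: float, noise_sd: float, n_draws: int = 10_000,
                      seed: int = 0) -> float:
    """Estimate N:S as (SD of the measured signal) / (true signal).

    Assumes S_MEASURE = S_TRUE + N with N ~ Normal(0, noise_sd).
    """
    rng = random.Random(seed)
    measured = [s_true + rng.gauss(0.0, noise_sd) for _ in range(n_draws)]
    return statistics.stdev(measured) / s_true


# Same absolute noise, very different reliability:
print(simulate_ns_ratio(s_true=1000.0, noise_sd=100.0))  # about 0.1: N:S << 1
print(simulate_ns_ratio(s_true=50.0, noise_sd=100.0))    # about 2: N > S
```

The point of the sketch is that an identical noise level yields a trustworthy measurement for a strongly expressed gene and a highly unreliable one for a weakly expressed gene.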

  19. Sources of uncertainty in categorical measurements • Measurement uncertainty • S_MEASURE does not necessarily equal S_TRUE • N ~ S or N > S • “Overlap” uncertainty • Some patients with disease truly have negative test values • Some patients without disease truly have positive test values

  20. Gene arrays and medical tests have distinct and different sources of uncertainty

  21. Variability in medical tests is mostly “overlap” • Measurement variability • Essentially none (error is of no clinical significance) • N : S << 1 • Hence, perform test once and only once • “Overlap” variability • Ubiquitous and essentially unavoidable • Feature of all medical tests to one degree or another • So what’s the solution? • Search for a better test

  22. Variability in DNA microarrays is mostly measurement uncertainty • Measurement variability • Ever-present • N > S for many genes • “Overlap” variability • None • An absent gene has an expression level of zero, whereas a present gene has a non-zero expression level • Differentially expressed gene… • So what’s the solution? • Repeated measurements

  23. So how do we improve the N:S ratio?

  24. Take mean of repeated measurements...

  25. Benefits of repeated measurements • Assuming that noise N is normally (Gaussian) distributed and independent from one measurement to the next, the error of the mean decreases in proportion to 1/√n, where n is the number of measurements • Example: to reduce N : S by half, take the mean of 4 measurements (1/√4 = 1/2)
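The square-root rule can be checked by simulation. A minimal sketch, assuming independent, normally distributed noise with the same SD on every replicate; the noise SD of 1.0, the replicate counts and the helper name `sd_of_mean` are hypothetical.

```python
import math
import random
import statistics


def sd_of_mean(n: int, noise_sd: float = 1.0, trials: int = 20_000,
               seed: int = 1) -> float:
    """Empirical SD of the mean of n noisy replicates (true signal set to 0)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, noise_sd) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)


for n in (1, 4, 16):
    # Columns: replicates, simulated SD of the mean, theoretical noise_sd / sqrt(n).
    print(n, round(sd_of_mean(n), 3), round(1.0 / math.sqrt(n), 3))
```

Halving N:S takes 4 replicates; halving it again takes 16, which is why each further gain in reliability costs disproportionately more arrays.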

  26. 3. Measurements using Affymetrix (MSV 5.0)

  27. Affymetrix Microarray Suite Version 5.0 (MSV 5.0)

  28. For our analysis, we used...

  29. Signal Log Ratio (SLR) • SLR = log2(S_A1 / S_N1), i.e., the base-2 logarithm of the ratio of the signal for a gene under experimental condition A (S_A1) to that for the same gene under experimental condition N (S_N1)

  30. Examples of SLR • S_A1 = 4000, S_N1 = 1000: SLR = log2(4) = 2 • S_A1 = 2, S_N1 = 16: SLR = log2(1/8) = –3
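The SLR definition and its two worked examples translate directly into code. A minimal sketch using the simplified definition on the slide (base-2 log of a ratio of two signal values); it is not a reimplementation of how MSV 5.0 itself computes SLR, and the function names `signal_log_ratio` and `fold_change` are hypothetical.

```python
import math


def signal_log_ratio(s_a: float, s_n: float) -> float:
    """SLR = log2(S_A / S_N) for one gene under conditions A and N."""
    if s_a <= 0 or s_n <= 0:
        raise ValueError("signals must be positive to take a log ratio")
    return math.log2(s_a / s_n)


def fold_change(slr: float) -> float:
    """Convert an SLR back to a linear fold change (2 ** SLR)."""
    return 2.0 ** slr


print(signal_log_ratio(4000, 1000))  # 2.0
print(signal_log_ratio(2, 16))       # -3.0
print(round(fold_change(0.69), 2))   # 1.61
```

Working in log2 makes induction and repression symmetric: a 4-fold increase is +2 and a 4-fold decrease is –2.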

  31. 4. Specificity of MSV 5.0

  32. To get a handle on specificity, perform same-versus-same comparisons • SLR_TRUE must be zero • log2(1) = 0 • Hence, SLR_MEASURE is all noise

  33. Perform separate analyses for “present” and “absent” genes • Present genes • N : S << 1 • Absent genes • N : S ≥ 1

  34. Experimental system • Primary cultures of peritoneal macrophages from mice of 3 strains • BALB/c (normal) • MRL/+ (autoimmune lupus) • MRL/lpr (autoimmune lupus) • Each array represents mRNA pooled from distinct sets of ~ 6 mice harvested on separate days • Macrophages were stimulated with bacterial endotoxin (lipopolysaccharide, LPS) for 8 or 24 hours

  35. Present genes: same-vs.-same comparison (single array)

  36. Present genes: same-vs.-same comparison (single array)

  37. Present genes: same-vs.-same comparison (single array)

  38. Present genes: Same-vs.-same comparison (single array) • Average SLR = ~ 0.02 ± 0.04 (~ 1.014-fold) • not different from zero • that’s good! • Standard deviation = ~ 0.69 ± 0.30 • ~ 32% of genes have |SLR| > 0.69 (> 1.61-fold change in either direction) • ~ 4% of genes have |SLR| > 1.38 (> 2.60-fold change in either direction) • that’s not good
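For comparison with the observed percentages, the fraction of genes expected beyond ±1 SD and ±2 SD follows from the normal CDF if same-vs.-same SLR values were exactly normal with mean 0 and SD 0.69. A minimal sketch; the normal model is the assumption being compared against the data, and the function name `fraction_beyond` is hypothetical.

```python
from statistics import NormalDist

# Same-vs.-same noise model for present genes (assumed normal, SD from the slide).
SLR_NOISE = NormalDist(mu=0.0, sigma=0.69)


def fraction_beyond(cutoff: float, dist: NormalDist = SLR_NOISE) -> float:
    """Two-sided fraction of genes with |SLR| > cutoff under dist."""
    return dist.cdf(-cutoff) + (1.0 - dist.cdf(cutoff))


print(f"{fraction_beyond(0.69):.3f}")  # 0.317, i.e. about 32% of genes
print(f"{fraction_beyond(1.38):.3f}")  # 0.046, i.e. about 4.6% of genes
```

Since every comparison here is same-vs.-same, each of those genes would be a false positive if a 1.61-fold or 2.60-fold cutoff were used on a single array pair.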

  39. Present genes: Statistical distribution of SLR • Entire distribution • Not normal (p < 0.01, by D statistic) • Central 95% • Consistent with normal (p > 0.2, by D statistic) • Highly noteworthy, since the D statistic detects very small deviations from normality • The 5% at the tails overestimate the SLR
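The D statistic is the largest vertical gap between the empirical cumulative distribution of the SLR values and a fitted normal CDF. A minimal sketch of computing D in its Kolmogorov–Smirnov form (assumed here; the slides do not name the exact test variant) and of trimming to the central 95% before testing. The function names are hypothetical, and p-values, which need a reference distribution such as Lilliefors' tables when the mean and SD are estimated from the data, are not computed.

```python
from statistics import NormalDist, fmean, stdev


def ks_d_statistic(values: list[float]) -> float:
    """Max |ECDF - normal CDF|, with the normal fitted to the data's mean and SD."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def central_fraction(values: list[float], keep: float = 0.95) -> list[float]:
    """Drop (1 - keep) / 2 of the sorted values from each tail."""
    xs = sorted(values)
    cut = int(round(len(xs) * (1.0 - keep) / 2.0))
    return xs[cut:len(xs) - cut]
```

Applied to a list of SLR values, comparing `ks_d_statistic(values)` with `ks_d_statistic(central_fraction(values))` shows how much of the departure from normality is carried by the tails alone.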

  40. Present genes: same-vs.-same comparison (single array)

  41. If we compare genes in the central 95% versus genes in the 5% tails… • Center (95% of genes) • Mean signal intensity = 1493 • Tails (5% of genes) • Mean signal intensity = 620 (p < 10^-19, t-test) • Consistent with the intuitive idea that measurement variability is inversely related to a gene’s level of expression

  42. Absent genes: same-vs.-same comparison (single array)

  43. Absent genes: same-vs.-same comparison (single array)

  44. Absent genes: same-vs.-same comparison (single array)

  45. Absent genes: Same-vs.-same comparison (single array) • Average SLR = ~ 0.33 ± 0.31 (~ 1.26-fold induction) • definitely not good • Standard deviation = ~ 1.12 ± 0.24 • > 35% of genes have |SLR| > 1.0 (> 2-fold change in either direction) • > 5% of genes have |SLR| > 2.0 (> 4-fold change in either direction) • even worse!

  46. Absent genes: Statistical distribution of SLR • Entire distribution • Not normal (p < 0.01, by D statistic) • Central 95% • Not normal (p < 0.01, by D statistic) • Central 60% • Not normal (p < 0.01, by D statistic)

  47. Summary of same-vs.-same comparisons (single array) • Use SLR only for genes that are actually expressed (i.e., “present” genes) • Central 95% consistent with a normal distribution with a standard deviation of ~ 0.69 • The 2.5% at each tail exceeds what a normal distribution predicts • Do not use SLR for genes that are marginally, if at all, expressed (i.e., “absent” genes) • Most of the measured signal is noise • SLR is therefore the logarithm of a ratio of two small, randomly distributed values
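The last point can be illustrated by simulation: when the true signal is small relative to the noise, the SLR of a same-vs.-same comparison spreads widely even though SLR_TRUE = 0. A minimal sketch, assuming an additive normal noise model and (hypothetically) clipping measured signals at a small positive floor so that the log is defined; the floor, signal and noise values and the helper name `same_vs_same_slr_sd` are illustrative only and do not describe MSV 5.0 behavior.

```python
import math
import random
import statistics


def same_vs_same_slr_sd(s_true: float, noise_sd: float, n_genes: int = 5_000,
                        floor: float = 1.0, seed: int = 2) -> float:
    """SD of SLR when the same true signal is measured twice with additive noise.

    Measured signals below `floor` are clipped to `floor` (an assumption)
    so that log2 is defined.
    """
    rng = random.Random(seed)
    slrs = []
    for _ in range(n_genes):
        a = max(floor, s_true + rng.gauss(0.0, noise_sd))
        b = max(floor, s_true + rng.gauss(0.0, noise_sd))
        slrs.append(math.log2(a / b))  # SLR_TRUE is 0 by construction
    return statistics.stdev(slrs)


print(same_vs_same_slr_sd(s_true=1500.0, noise_sd=150.0))  # small spread: "present"-like gene
print(same_vs_same_slr_sd(s_true=20.0, noise_sd=40.0))     # large spread: "absent"-like gene
```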

  48. Specificity of single array comparisons
