
Identifying Differentially Expressed Genes in Unreplicated Multiple-Treatment Time-Course Microarray Experiments

Rhonda R. DeCook and Dan Nettleton, Iowa State University


Presentation Transcript


  1. Identifying Differentially Expressed Genes in Unreplicated Multiple-Treatment Time-Course Microarray Experiments. Rhonda R. DeCook and Dan Nettleton, Iowa State University

  2. Experiment Background • J – number of treatments • K – number of time points at which measurements are collected • G – number of genes • One microarray for each treatment/time combination (J·K microarrays in total) • Non-repeated measures: different experimental units are used at each time point • Post-normalized microarray data

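  3. An Example Experiment • Treatments (J = 3 genotypes): Wild Type, Mutant 1, Mutant 2 • Time points (K = 5): 0, 1, 4, 10, and 24 hours of exposure to UVA radiation • One microarray per genotype/time combination gives 3 × 5 = 15 microarrays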

  4. Genes of Interest Genes showing any of: • Treatment effects • Time effects • Interaction between treatment and time • In general, any departure from coincident lines with zero slope

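  5. Tests of Interest • We assume that $y_{gjk} \sim F_{gjk}$, where $y_{gjk}$ is the expression of gene $g$ under treatment $j$ at time point $k$, and that $y_{g11}, \dots, y_{gJK}$ are independent for any given $g$. • For every gene, we wish to test $H_{0g}: F_{gjk} = F_g$ for all $j, k$ against all alternatives. • The $g$th null hypothesis says that the distribution of gene expression for gene $g$ is identical for all combinations of treatment level and time point.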

  6. Identifying Genes of Interest • With only one observation per treatment/time cell, a cell-means model has 0 d.f. for error. • Instead, we consider regression models with time as a quantitative variable (13 possible models). • The simplest models are linear in time or have only treatment effects. • The most complicated model has treatment effects and is cubic in time with all possible treatment × time interactions (3 d.f. for error).
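The error degrees of freedom quoted above follow from n minus the rank of the design matrix. The NumPy sketch below checks them under an assumed design: three genotypes crossed with five time points (0, 1, 4, 10, 24 hours, our reading of slide 3), one array per cell. The helper `error_df` and the rescaling of time are illustrative choices, not part of the original method.

```python
import numpy as np

# Assumed design (our reading of slide 3): J = 3 genotypes, K = 5 time points (hours).
J = 3
times = np.array([0.0, 1.0, 4.0, 10.0, 24.0])
trt = np.repeat(np.arange(J), times.size)  # treatment label of each array
t = np.tile(times, J)                       # time point of each array
n = t.size                                  # one array per cell: J*K = 15

def error_df(X):
    """Error degrees of freedom: number of observations minus rank of X."""
    return X.shape[0] - np.linalg.matrix_rank(X)

# Cell-means model: one mean per treatment/time cell; each array is its own cell.
cell = np.eye(n)

# Most complicated model: separate intercept, linear, quadratic, cubic terms per treatment.
ts = t / t.max()  # rescale time so the polynomial columns stay well conditioned
cubic = np.column_stack(
    [(trt == j) * ts**p for j in range(J) for p in range(4)]
)

print(error_df(cell), error_df(cubic))  # 0 3
```

The cubic model has 3 × 4 = 12 linearly independent columns, because each genotype has 5 distinct time points, which leaves 15 − 12 = 3 d.f. for error.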

  7. Analysis 1. Use BIC to select the "best" model for each gene among the 13 alternative models considered. 2. Separately for each gene, compute a reduced-vs.-full model F-statistic with the "best" model as the full model and the single-mean model implied by the null hypothesis as the reduced model. 3. Randomly assign the data vectors associated with each GeneChip to the combinations of treatment and time (all genes on a chip move together, which preserves the dependence among genes). 4. Recompute the same F-statistic computed in step 2 using the permuted data.
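A minimal sketch of steps 1 and 2 for a single gene, assuming Gaussian linear models fit by least squares, BIC computed as n·log(RSS/n) + p·log(n), and an intercept-only reduced model. The slides do not list the 13 candidate models, so `design_matrices` builds a hypothetical subset of six; every model in it contains an intercept, so the reduced model is nested in each. All function names are illustrative.

```python
import numpy as np

def design_matrices(trt, t, J):
    """Hypothetical subset of candidate models; the slides do not list all 13."""
    ts = t / t.max()
    ind = np.column_stack([(trt == j).astype(float) for j in range(J)])

    def by_treatment(degree):
        # Separate polynomial in time for each treatment (full interaction).
        return np.column_stack(
            [ind[:, j] * ts**p for j in range(J) for p in range(degree + 1)]
        )

    return {
        "treatment": ind,
        "linear": np.column_stack([np.ones_like(ts), ts]),
        "treatment+linear": np.column_stack([ind, ts]),
        "treatment*linear": by_treatment(1),
        "treatment*quadratic": by_treatment(2),
        "treatment*cubic": by_treatment(3),
    }

def rss_and_rank(X, y):
    """Least-squares residual sum of squares and rank of X."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(rank)

def bic(X, y):
    """Gaussian BIC up to an additive constant: n log(RSS/n) + p log(n)."""
    n = y.size
    rss, p = rss_and_rank(X, y)
    return n * np.log(rss / n) + p * np.log(n)

def f_vs_intercept(X, y):
    """F-statistic comparing the intercept-only (reduced) model with model X (full)."""
    n = y.size
    rss1, p1 = rss_and_rank(X, y)
    rss0 = float(((y - y.mean()) ** 2).sum())
    return ((rss0 - rss1) / (p1 - 1)) / (rss1 / (n - p1))

def best_model_and_f(y, models):
    """Step 1: pick the BIC-best model; step 2: its F-statistic."""
    name = min(models, key=lambda k: bic(models[k], y))
    return name, f_vs_intercept(models[name], y)

# Usage on simulated data for one gene (values are illustrative only).
rng = np.random.default_rng(0)
J, times = 3, np.array([0.0, 1.0, 4.0, 10.0, 24.0])
trt = np.repeat(np.arange(J), times.size)
t = np.tile(times, J)
models = design_matrices(trt, t, J)
y = 0.2 * t + rng.normal(size=t.size)
print(best_model_and_f(y, models))
```

Steps 3 and 4 reuse `best_model_and_f`'s chosen design matrix on each permuted response vector rather than reselecting the model, which is what makes the step 2 statistic favor the alternative.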

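  8. Analysis (ctd.) 5. Repeat steps 3 and 4 B times, yielding $F_g^{(1)}, \dots, F_g^{(B)}$ for each gene g. 6. For each gene g, compute a p-statistic $\tilde p_g^{(0)} = \frac{1}{B}\sum_{b=1}^{B} I\big(F_g^{(b)} \ge F_g^{(0)}\big)$, where $F_g^{(0)}$ is the observed F-statistic from step 2. Note that $\tilde p_g^{(0)}$ will tend to be smaller than a proper permutation p-value because the F-statistic used for gene g was chosen using BIC to favor the alternative hypothesis.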
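Step 6 can be sketched in vectorized form across all genes. This assumes the p-statistic is the share of the B permuted F-statistics that match or exceed the observed one, mirroring the definition given for permuted data sets in step 7c; `p_statistics` is a hypothetical helper name.

```python
import numpy as np

def p_statistics(f_obs, f_perm):
    """p-statistic per gene: share of the B permuted F-statistics >= the observed one.

    f_obs:  shape (G,)   F-statistic for each gene from the observed data (step 2)
    f_perm: shape (B, G) F-statistics recomputed on B permuted data sets (steps 3-5)
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_perm = np.asarray(f_perm, dtype=float)
    # Broadcast the observed row against every permuted row, then average over B.
    return (f_perm >= f_obs[None, :]).mean(axis=0)

# Usage: two genes, B = 4 permutations (illustrative numbers).
print(p_statistics([5.0, 1.0], [[1.0, 2.0], [6.0, 0.0], [3.0, 1.0], [5.0, 3.0]]))
```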

  9. Analysis (ctd.) 7. For each permuted data set b and each gene g, compute a p-statistic $\tilde p_g^{(b)}$: a) Choose the "best" model for that permuted data set and gene using BIC. b) Using that model, compute the relevant F-statistic for all other data sets. c) Find the proportion of F-statistics from the other data sets that match or exceed the F-statistic for the permuted data set in question. 8. Compute a permutation p-value for each gene: $p_g = \frac{1}{B+1}\sum_{b=0}^{B} I\big(\tilde p_g^{(b)} \le \tilde p_g^{(0)}\big)$, where $b = 0$ denotes the observed data.
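Steps 7 and 8 can be sketched for one gene, assuming the F-statistics have already been arranged in a hypothetical (B+1) × (B+1) matrix whose row b holds the statistics computed with the model BIC chose for data set b (row and column 0 being the observed data). The form of the final p-value, the share of all B + 1 p-statistics at or below the observed one, is an assumption.

```python
import numpy as np

def permutation_p_value(F):
    """Permutation p-value for one gene from a (B+1) x (B+1) matrix of F-statistics.

    F[b, c] = F-statistic computed on data set c using the model that BIC chose
    for data set b; index 0 is the observed data, 1..B are the permuted data sets.
    """
    F = np.asarray(F, dtype=float)
    m = F.shape[0]
    own = np.diag(F)  # each data set's F-statistic under its own BIC-chosen model
    # Step 7c: for each data set b, share of the OTHER m - 1 data sets whose
    # F-statistic (under b's model) matches or exceeds b's own F-statistic.
    exceed = F >= own[:, None]
    np.fill_diagonal(exceed, False)
    p_stat = exceed.sum(axis=1) / (m - 1)
    # Step 8 (assumed form): share of all m p-statistics <= the observed one.
    return float((p_stat <= p_stat[0]).mean())

# Usage: observed data plus B = 2 permuted data sets (illustrative numbers).
print(permutation_p_value([[5, 1, 2], [4, 3, 1], [0, 9, 2]]))
```

Row 0 of the p-statistic vector is the step 6 quantity; the other rows give the reference distribution that corrects for the BIC selection. For B = 2499 this needs a 2500 × 2500 matrix per gene, so in practice one would stream rows rather than store them.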

  10. Histogram of P-Values [Figure: histogram of the gene-wise p-values; x-axis: P-Value, y-axis: Number of P-Values]

  11. Accounting for Many Tests • Many dependent hypothesis tests • Controlling the probability of even one type I error is too conservative • Use Storey and Tibshirani's method to estimate the false discovery rate (FDR): 'Estimating the FDR under dependence, with applications to DNA microarrays' (2001) • Compare the observed p-value distribution with the 'average' null distribution

  12. [Figure, two panels: Histogram of P-Values for the Observed Data; Histogram of P-Values Averaged over 2499 Permuted Data Sets. Axes: P-Value (x), Number of P-Values (y)]

  13. Zooming in on Smallest P-Values [Figure, two panels: Histogram of P-Values for the Observed Data; Histogram of P-Values Averaged over 2499 Permuted Data Sets. Axes: P-Value (x), Number of P-Values (y)]

  14. Zooming in on Smallest P-Values [Figure, two panels: Histogram of P-Values for the Observed Data; Histogram of P-Values Averaged over 2499 Permuted Data Sets. Axes: P-Value (x), Number of P-Values (y)] Ratio of bar heights is ~11%.

  15. Zooming in on Largest P-Values [Figure, two panels: Histogram of P-Values for the Observed Data; Histogram of P-Values Averaged over 2499 Permuted Data Sets. Axes: P-Value (x), Number of P-Values (y)]

  16. Zooming in on Largest P-Values [Figure, two panels: Histogram of P-Values for the Observed Data; Histogram of P-Values Averaged over 2499 Permuted Data Sets. Axes: P-Value (x), Number of P-Values (y)] Ratio of bar heights is ~59%.

  17. Estimating the False Discovery Rate (FDR) 94 p-values computed from the observed data are less than or equal to 0.002. The average number of p-values less than or equal to 0.002 in the 2499 permuted data sets is 10.273. An initial estimate of the FDR is 10.273/94 ≈ 10.9%.
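The initial estimate is a ratio of two counts: the average number of p-values at or below the threshold across permuted data sets, divided by the observed number. A minimal sketch, assuming the observed p-values sit in a length-G array and the permuted-data p-values in a B × G array (hypothetical layout and function name):

```python
import numpy as np

def initial_fdr(p_obs, p_perm, t):
    """Initial FDR estimate for the rule p <= t, treating every null as true.

    p_obs:  shape (G,)   p-values from the observed data
    p_perm: shape (B, G) p-values from each of B permuted data sets
    """
    n_obs = np.count_nonzero(np.asarray(p_obs) <= t)
    # Average, over permuted data sets, of the number of p-values <= t.
    n_null = (np.asarray(p_perm) <= t).sum(axis=1).mean()
    return n_null / n_obs

# Usage with toy arrays (4 genes, B = 2); with the slide's counts this is 10.273 / 94.
print(initial_fdr([0.001, 0.002, 0.5, 0.9],
                  [[0.001, 0.3, 0.6, 0.95], [0.4, 0.5, 0.7, 0.8]], 0.002))
```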

  18. Estimating the False Discovery Rate (FDR) The initial estimate of 10.9% is too high because the estimate is based on assuming that all null hypotheses are true.

  19. Estimating the False Discovery Rate (FDR) 488 p-values computed from the observed data are greater than or equal to 0.9. The average number of p-values greater than or equal to 0.9 in the 2499 permuted data sets is 827.699. Thus we estimate that 488/827.699 ≈ 59% of the null hypotheses are true in our data set.
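The proportion of true nulls is estimated with the same counting idea at the upper end of the p-value range, where nearly all p-values should come from true nulls. A sketch under the same hypothetical array layout; the cap at 1 is a common safeguard we added, not something stated on the slides.

```python
import numpy as np

def estimate_pi0(p_obs, p_perm, lam):
    """Estimated proportion of true null hypotheses using p-values >= lam.

    p_obs:  shape (G,)   p-values from the observed data
    p_perm: shape (B, G) p-values from each of B permuted data sets
    """
    n_obs = np.count_nonzero(np.asarray(p_obs) >= lam)
    # Expected count of p-values >= lam if every null were true.
    n_null = (np.asarray(p_perm) >= lam).sum(axis=1).mean()
    # Cap at 1 (our assumption): a proportion cannot exceed 1.
    return min(1.0, n_obs / n_null)

# Usage with toy arrays; with the slide's counts this is 488 / 827.699.
print(estimate_pi0([0.95, 0.1], [[0.92, 0.99], [0.91, 0.97]], 0.9))
```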

  20. Estimating the False Discovery Rate (FDR) Our final estimate of the FDR for p ≤ 0.002 is $\frac{488}{827.699} \times \frac{10.273}{94} \approx 0.590 \times 0.109 \approx 6.4\%$.

  21. Estimating the False Discovery Rate (FDR) Our final estimate of the FDR for p ≤ 0.002 is $\frac{488}{827.699} \times \frac{10.273}{94} \approx 6.4\%$, where 10.273 is the estimated number of p-values ≤ 0.002 if all null hypotheses were true, 94 is the number of p-values ≤ 0.002, 488 is the number of p-values ≥ 0.9, and 827.699 is the estimated number of p-values ≥ 0.9 if all null hypotheses were true.
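Plugging the summary counts reported on slides 17 and 19 into the product reproduces the final estimate. This is plain arithmetic on the reported numbers, not a re-analysis of the data.

```python
# Summary counts reported on the slides (rejection rule p <= 0.002, lambda = 0.9).
n_le_obs, n_le_null = 94, 10.273     # observed / average permuted count of p <= 0.002
n_ge_obs, n_ge_null = 488, 827.699   # observed / average permuted count of p >= 0.9

pi0_hat = n_ge_obs / n_ge_null       # estimated proportion of true nulls
fdr_initial = n_le_null / n_le_obs   # assumes all nulls are true
fdr_final = pi0_hat * fdr_initial    # initial estimate scaled by pi0_hat

print(f"pi0 = {pi0_hat:.3f}, initial FDR = {fdr_initial:.3f}, final FDR = {fdr_final:.3f}")
# pi0 = 0.590, initial FDR = 0.109, final FDR = 0.064
```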

  22. Key Points of Method • Partitions the gene set according to the best-fitting model • Requires few assumptions about gene expression distributions • Has power to detect a variety of alternatives to the null when replication is lacking

  23. Acknowledgements • Rhonda DeCook, Iowa State University Department of Statistics • Carol Foster, Iowa State University Department of Botany • Eve Wurtele, Iowa State University Department of Botany
