Roli Shrivastava


Presentation Transcript


  1. Extracting binary signals from microarray time-course data • Debashis Sahoo (1), David L. Dill (2), Rob Tibshirani (3) and Sylvia K. Plevritis (4) • (1) Department of Electrical Engineering, (2) Department of Computer Science, (3) Department of Radiology and (4) Department of Health Research and Policy and Department of Statistics, Stanford University • Roli Shrivastava

  2. Introduction • Problem Statement • To identify up- and down-regulated genes • To identify the time of transition • Experimental Technique • Microarray (tens of thousands of distinct probes on one array, which perform the equivalent number of genetic tests in parallel) • Computational Technique • A tool called StepMiner that extracts biologically meaningful results from large amounts of data

  3. Types of Transitions 1. One-step 2. Two-step 3. Neither: genes for which the one- or two-step patterns do not fit appreciably better than a constant mean value (the null hypothesis).

  4. Fitting One or Two-Step Function • Procedure: calculate the F statistic for the model and data set, then calculate the P-value and compare it with Pthreshold = 0.05 • If P < Pthreshold, the model fits • If P > Pthreshold, the model does not fit • F1 statistic: measures how well the one-step model fits the data • F2 statistic: measures how well the two-step model fits the data • F12 statistic: compares the fit of the one-step model and the two-step model on the same data • P-value: a low P-value represents a good fit of the model to the data
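The one-step fit described on this slide can be sketched as follows. This is a minimal illustration, not StepMiner's own code: it tries every step position, keeps the one with the smallest squared error, and forms a standard regression F statistic against the constant-mean null. The function names and the parameter count of 3 for the one-step model (two means plus a step position) are assumptions, not taken from the slides.

```python
def sse(values):
    """Sum of squared deviations of values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_one_step(y):
    """Try every step position; return (best_position, sse_of_best_fit).

    A step at position k means y[:k] shares one mean and y[k:] another.
    """
    best = None
    for k in range(1, len(y)):
        err = sse(y[:k]) + sse(y[k:])
        if best is None or err < best[1]:
            best = (k, err)
    return best


def f_statistic(y, sse_model, model_dof):
    """Regression F statistic of a fitted model versus the constant-mean null.

    model_dof is the number of fitted parameters (assumed: 3 for one step).
    """
    n = len(y)
    ss_regression = sse(y) - sse_model
    if sse_model == 0.0:
        return float("inf")  # perfect fit: no residual error left
    return (ss_regression / (model_dof - 1)) / (sse_model / (n - model_dof))


# Hypothetical expression values with an obvious rise after the fifth point.
example = [0.1, -0.1, 0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 5.0, 5.1]
position, error = fit_one_step(example)
f1 = f_statistic(example, error, model_dof=3)
```

Turning the F statistic into a P-value needs the F distribution's survival function, for example `scipy.stats.f.sf(f1, 2, len(example) - 3)` if SciPy is available; the result is then compared with the 0.05 threshold.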

  5. StepMiner Algorithm • One-step: the one-step model fits the data AND fits better than the two-step model • Two-step: the two-step model fits the data AND the one-step model does not fit it • Other: neither the one-step nor the two-step model fits the data
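The decision rule on this slide can be written as a small function. This is a sketch under stated assumptions: the function and argument names are hypothetical, the three inputs are taken to be the P-values of the F1, F2 and F12 statistics, and the case the slide leaves open (both models fit, but two-step fits significantly better) is assigned to two-step here.

```python
def classify(p_one_step, p_two_step, p_one_vs_two, threshold=0.05):
    """Label a gene as 'one-step', 'two-step' or 'other' from three P-values.

    p_one_vs_two is assumed to be small when the two-step model fits
    significantly better than the one-step model.
    """
    one_fits = p_one_step < threshold
    two_fits = p_two_step < threshold
    two_is_better = p_one_vs_two < threshold

    if one_fits and not two_is_better:
        return "one-step"
    if two_fits:
        # Assumption: also covers "both fit, two-step significantly better".
        return "two-step"
    return "other"
```

For instance, `classify(0.001, 0.001, 0.5)` selects the one-step model because the extra step brings no significant improvement.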

  6. Comparison of 4 Algorithms • Figure: StepMiner algorithm • Step height = 5σ • Number of time points = 15 • A total of 2000 random, 2000 one-step and 2000 two-step data sets with random step positions

  7. Comparison of 4 Algorithms • Step height = 5σ • Number of time points = 15 • A total of 2000 random, 2000 one-step and 2000 two-step data sets with random step positions

  8. Generation of Simulated Data • Microarray data with 15 non-uniform time points • 4000 genes with 2000 one-step and 2000 two-step patterns • Gaussian noise was added to the above data • A P-value threshold of 0.05 was used
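A data set of this kind can be generated along the following lines. This is a minimal sketch rather than the authors' procedure: the function names are hypothetical, time points are treated as plain indices (the non-uniform spacing is ignored), every signal starts low and rises first, and the default height of 5 with σ = 1 mirrors the 5σ setting used elsewhere in the slides.

```python
import random


def make_signal(n_points, steps, height, sigma, rng):
    """Piecewise-constant signal that toggles between 0 and height at each
    step position, with Gaussian noise of standard deviation sigma added."""
    level = 0.0
    signal = []
    for t in range(n_points):
        if t in steps:
            level = height - level  # flip between the two levels
        signal.append(level + rng.gauss(0.0, sigma))
    return signal


def simulate(n_one, n_two, n_points=15, height=5.0, sigma=1.0, seed=0):
    """Return a list of (label, signal) pairs with random step positions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_one):
        step = rng.randrange(1, n_points)
        data.append(("one-step", make_signal(n_points, {step}, height, sigma, rng)))
    for _ in range(n_two):
        first, second = rng.sample(range(1, n_points), 2)
        data.append(("two-step", make_signal(n_points, {first, second}, height, sigma, rng)))
    return data


dataset = simulate(n_one=20, n_two=20)
noiseless = make_signal(6, {2, 4}, 5.0, 0.0, random.Random(1))
```

Calling `simulate(2000, 2000)` would give the gene counts quoted on the slide; the smaller call above only keeps the example quick.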

  9. Results of Simulated Data - I • σ is the standard deviation of the noise • Step position is fixed at 5 for one-step signals • Step positions are fixed at 5 and 9 for two-step signals • The higher the step, the easier it is to identify

  10. Results of Simulated Data - II • σ is the standard deviation of the noise • Random step positions • Small reduction in accuracy • More matches occur when every constant segment of a curve contains several time points • It is desirable to design experiments so that there are several points before the first interesting transition and after the last interesting transition

  11. Results of Simulated Data - III • Shows sensitivity to the P-value threshold and the number of time points • Random step position and step height of 5σ • Two-step signals require more time points than one-step signals • Matches increase as the P-value threshold is raised, but at the cost of a higher False Discovery Rate

  12. Results of Simulated Data - IV • Shows sensitivity to the spacing between steps • For 15 time points, the first step is fixed at position 4 • A spacing of at least 3 time points is required when the step height is > 3σ • Steps must be placed at least 3 time points from an end point

  13. Diauxic Shift • In the initial phases of a growing batch culture, yeast prefers to metabolize glucose and produce ethanol even when oxygen is abundant. • When the glucose is exhausted, cells undergo a “diauxic shift,” in which they switch abruptly to an oxidative metabolism. This pathway allows the oxidation of the accumulated fermentation products and is highly efficient as a mechanism for generating ATP. (Brauer et al., Mol Biol Cell. 2005 May; 16(5): 2503–2517)

  14. Analysis of Experimental Data • Figure: fitted functions for 3 genes • 2284 genes in the diauxic shift data • 1088 matched a one-step transition • 267 matched a two-step transition • 929 matched neither pattern

  15. Heat Maps • Analysis by Brauer et al. • Same data reanalyzed using StepMiner • The heat map shows two transitions at 8.25 and 9.25 h

  16. Comparison With Brauer et al.’s Results • The GO annotations and FDR-corrected P-values for the clusters reported in Brauer et al. were recomputed with the latest yeast gene annotations from the Gene Ontology Consortium website • The table shows the P-values from GO-TermFinder as well as from StepMiner.

  17. Table for Comparison

  18. Results of Comparison • The annotations that had the lowest P-values in Brauer et al. also had low P-values in the StepMiner groups. • In most cases, the P-values in the reanalysis are lower than those of Brauer et al., which implies that grouping by time of change is at least as effective as hierarchical clustering at identifying relevant genes. • GO annotations are obtained fully automatically using StepMiner; it is not necessary to select interesting clusters manually. • The clusters that have no P-values from StepMiner were “less interpretable in terms of diauxic shift”, in the words of Brauer et al.

  19. Comparison of StepMiner to Other Tools • Hierarchical clustering: finds clusters that transition at the same time point • A manual search is required to find the transitions • SAM: finds transitions by looking for significant differences in average expression before and after a specified time point • However, many of the genes selected by this method do not, in fact, have a transition at the specified time point • EDGE: identifies genes whose expression changes systematically over time and differs significantly from its mean over time • This method does not directly provide the direction and position of a significant change

  20. Hierarchical vs. StepMiner • Hierarchical clustering: cluster that transitions at 3 hours • StepMiner clearly shows other transition times

  21. Comparison of StepMiner to Other Tools - STEM • Provides model profiles and their significance values • But the profiles do not look like step functions and therefore do not help locate transitions

  22. Strengths and Limitations • Strengths • Easy to understand • Few parameters • Transitions can be more interesting biologically • Very fast: < 15 s for 15 microarrays of 40000 genes • Can deal with missing measurements • Provides statistical parameters such as P-value and FDR • Limitations • Binary model • There can be other cases, e.g. the transition is not a step • Not well suited to very short or very long time courses; most appropriate for 10-30 time measurements

  23. Post StepMiner Analysis • Once StepMiner has been run, genes undergoing binary transitions can easily be partitioned into sets based on the number, direction, and timing of transitions. • These sets can be merged at the user’s discretion (e.g., the set of one-step genes that rise at time 3 could be merged with the two-step genes that rise at time 3), or further subdivided.
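The partitioning and merging described on this slide can be illustrated with plain dictionaries. This is only a sketch: the input format (gene name mapped to a list of `(time, direction)` transitions), the gene names and the function names are hypothetical and do not reflect StepMiner's actual output.

```python
from collections import defaultdict


def partition_by_transitions(transitions):
    """Group genes that share the same number, direction and timing of steps."""
    groups = defaultdict(list)
    for gene, steps in transitions.items():
        groups[tuple(steps)].append(gene)
    return dict(groups)


def genes_with_step(transitions, time, direction):
    """Merge across sets: every gene with the given step, whatever else it does."""
    return sorted(
        gene for gene, steps in transitions.items() if (time, direction) in steps
    )


# Hypothetical fitted transitions for four genes.
example = {
    "geneA": [(3, "up")],
    "geneB": [(3, "up"), (7, "down")],
    "geneC": [(5, "down")],
    "geneD": [(3, "up")],
}
groups = partition_by_transitions(example)
rising_at_3 = genes_with_step(example, 3, "up")
```

Here `groups` keeps the one-step and two-step genes in separate sets, while `rising_at_3` corresponds to the merged set from the slide's example.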

  24. BACK UP SLIDES

  25. Replication vs. Resolution • For accuracy, it is better to take more frequent measurements than to take replicates • However, this comes at a cost in correctly identifying the kind of step
