
Microarray Analysis with a Small Number of Replicates

By Kung-Hua Chang & Dhondup Pemba. Mentors: Cecilie Boysen, Ph.D. & Jim Breaux, Ph.D. Southern California Bioinformatics Institute, Summer 2005. Funded by NSF/NIH.


Presentation Transcript


  1. Microarray Analysis with a Small Number of Replicates • By Kung-Hua Chang & Dhondup Pemba • Mentors: Cecilie Boysen, Ph.D. & Jim Breaux, Ph.D. • Southern California Bioinformatics Institute, Summer 2005 • Funded by NSF/NIH

  2. Outline • Background: Affymetrix GeneChip® Microarrays; VMAxS; Steps in Microarray Data Analysis • Our Task: Statistical Analysis with a Small Number of Replicates; Functional Analysis • Additional Projects

  3. Affymetrix GeneChip® Microarrays • Signal detection: fluorescence detection of hybridization between RNA target and oligonucleotide probe. • 22 probes define one gene. FOR MORE INFO... http://www.affymetrix.com

  4. Each gene on an Affy chip is represented by a probe set • Perfect Match (PM) probe represents short segment of gene of interest. • Mismatch (MM) probe measures background signal • Data for probe set is summarized into single number (“gene-level” data) FOR MORE INFO... “Processing Affy chip Data: GCOS/MAS 5.0, RMA, and gcRMA”(Roger Bumgarner University of Washington).

  5. ViaLogy’s data analysis service for DNA microarray chip data • Employs Quantum Resonance Interferometry technology to detect signals below background noise. FOR MORE INFO... Visit Vialogy.com.

  6. Steps in Microarray Data Analysis • Raw Data Image • Image Analysis (extract cell-level data) • VMAxS • Gene-level summarization • Normalization (remove non-biological variation) • Statistical Analysis (select differentially expressed genes) • Functional Analysis (identify affected processes and pathways)

  7. Statistical Analysis with a Small Number of Replicates: Overview • Overall objective: Perform end-to-end analysis on a client’s microarray data set (from raw image to pathway analysis). • Problem: The data set contained a small number of replicates.

  8. Problem with a small number of replicates • A small number of replicates yields unreliable estimates of gene variances. • With seven replicates, we are more confident that gene 1 is upregulated. FOR MORE INFO... Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays (Jain et al.)

  9. Approach to dealing with a small number of replicates • Analyze a larger data set that has a good number of replicates (n = 8x8). • Assume this is the “truth” • Analyze a randomly selected subset of this data set (n = 3x3) using three different algorithms. • Compare output from 8x8 analysis to 3x3 analysis. • Decide how to analyze client’s data set based on results
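The subsampling step on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the array labels, the seed, and the helper names `subsample_replicates` and `agreement` are hypothetical and do not come from the presentation, and the slides do not say how the random subset was actually drawn.

```python
import random

def subsample_replicates(control_ids, treated_ids, k=3, seed=2005):
    """Draw k replicate arrays per group, without replacement (assumed scheme)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(control_ids, k), rng.sample(treated_ids, k)

def agreement(reference_genes, candidate_genes):
    """Fraction of candidate genes that also appear in the reference list."""
    candidate = set(candidate_genes)
    if not candidate:
        return 0.0
    return len(candidate & set(reference_genes)) / len(candidate)

# Hypothetical labels for the 8 control and 8 treated arrays (the "8x8" set).
control = [f"control_{i}" for i in range(1, 9)]
treated = [f"treated_{i}" for i in range(1, 9)]

# The "3x3" subset: three arrays drawn from each group.
sub_control, sub_treated = subsample_replicates(control, treated)
```

The 8x8 gene list plays the role of `reference_genes`, and each 3x3 method's list is passed as `candidate_genes`.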

  10. Statistical Analysis Algorithms • SAM: Significance Analysis of Microarrays (Tusher, Tibshirani & Chu) • J-Score (Jim Breaux) • Cyber-T (Baldi & Long)

  11. SAM • Each gene receives a score based on the difference in average gene expression relative to the standard deviation of the repeated measurements. • Genes with scores greater than a threshold are considered significant. • This threshold is determined by the false discovery rate the user desires. FOR MORE INFO... Significance analysis of microarrays applied to the ionizing radiation response(Tusher et al)
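As a rough illustration of the score described on this slide, the sketch below computes a SAM-style relative difference for one gene: the difference in group means divided by a pooled standard error plus a small constant `s0`. The function name and the example values are hypothetical. In SAM itself `s0` is chosen from the data and the significance threshold comes from permutations; both steps are omitted here, so `s0` is simply passed in.

```python
import math
from statistics import mean

def sam_style_score(treated, control, s0):
    """Relative difference d = (mean_t - mean_c) / (s + s0) for one gene.

    s is the pooled standard error of the difference in means;
    s0 is a small positive constant supplied by the caller (assumption).
    """
    n1, n2 = len(treated), len(control)
    m1, m2 = mean(treated), mean(control)
    # Pooled sum of squared deviations over both groups.
    ss = sum((x - m1) ** 2 for x in treated) + sum((x - m2) ** 2 for x in control)
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)

# Made-up log-expression values for a single gene, three replicates per group.
d = sam_style_score([8.1, 8.3, 8.2], [7.0, 7.2, 7.1], s0=0.1)
```

A positive `s0` keeps genes with very small measured variance from receiving inflated scores, which matters most when there are only a few replicates.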

  12. J-Score • Each gene receives a score based on average fold-change in gene expression relative to the standard deviation of the repeated measurements. • Cut-off for selection of “significant” genes is arbitrary.

  13. Cyber-T ‘Regularized t-test’ (Baldi & Long) • “Assumes genes of similar expression levels have similar measurement errors. • The variance of any single gene can be estimated from the variance from a number of genes of similar expression level. • The variance of any gene within any given treatment can be estimated by the weighted average of a prior estimate of variance for that gene [and the variance observed for that gene].” FOR MORE INFO... Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework (Long et al.).
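The weighted-average idea can be sketched as follows. This is a minimal illustration, not the Cyber-T implementation: the background variance is taken as the mean variance of genes with neighbouring average expression, and it is combined with the gene's own variance as (nu0 * sigma0^2 + (n - 1) * s^2) / (nu0 + n - 2), which is the form assumed here from Baldi & Long's description and should be checked against the paper. The function name, the `window` and `nu0` defaults, and the data are all hypothetical.

```python
from statistics import mean, variance

def regularized_variances(replicates_by_gene, window=1, nu0=10):
    """Shrink each gene's variance toward that of genes with similar expression.

    replicates_by_gene: dict mapping gene name -> replicate values (one treatment).
    window: number of neighbours on each side, by mean expression (assumed).
    nu0: weight given to the background (prior) variance (assumed).
    """
    # Sort genes by mean expression so neighbours have similar expression levels.
    stats = sorted(
        (mean(v), variance(v), len(v), gene) for gene, v in replicates_by_gene.items()
    )
    result = {}
    for i, (_, s2, n, gene) in enumerate(stats):
        lo, hi = max(0, i - window), min(len(stats), i + window + 1)
        sigma0_sq = mean(s[1] for s in stats[lo:hi])  # background variance
        result[gene] = (nu0 * sigma0_sq + (n - 1) * s2) / (nu0 + n - 2)
    return result

# Made-up data: g2 shows zero variance across its three replicates.
data = {"g1": [5.0, 5.1, 5.2], "g2": [5.5, 5.5, 5.5], "g3": [6.0, 6.4, 6.2]}
reg = regularized_variances(data)
```

Here `g2` would make an ordinary t-test blow up because its sample variance is zero; after regularization it receives a small positive variance borrowed from its neighbours.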

  14. Results: Comparison between SAM 8x8 and 3x3 methods • At 1% False Discovery Rate (FDR) SAM 8x8 picked up 762 significant genes (estimated number of false significant genes = 8). • Agreement between SAM 8x8 and the top 1000 genes from the 3x3 methods:

  15. Results: Comparison between 3x3 methods • Venn Diagram: Union of all three methods = 433 unique genes

  16. Results: Comparison between 3x3 methods • Agreement between any two methods: • These findings are consistent with a previous study by a group at NIH (Hosack et al.): • Found that agreement between various methods tested ranged from 7% to 60%.

  17. Possible Approaches for Final Analysis • Method 1: Final set of significant genes is derived from the method that had the most overlap with SAM 8x8 (J-Score). • Final result: • 1000 total significant genes • At most 356 true positives • At most 652 false positives • Pro: • Decent number of true positives • Con: • Large number of false positives • Might be missing important genes found by other two methods

  18. Possible Approaches for Final Analysis • Method 2: Final set of significant genes is the intersection of the three methods. • Final result: • 174 total significant genes • At most 174 true positives • At most 8 false positives • Pro: • Lowest number of false positives • Con: • Lowest number of true positives

  19. Possible Approaches for Final Analysis • Method 3: Final set of significant genes is the union of the three methods • Final result: • 1631 total significant genes • At most 433 True positives • At most 1206 False positives • Pro: • Highest number of true positives. • Con: • Highest number of false positives
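The bounds quoted for the three methods appear to follow one piece of arithmetic, reconstructed here from the numbers on the slides rather than stated by the authors: the true positives are at most the overlap with the SAM 8x8 list, and the false positives are at most the remaining genes plus the 8 genes that SAM 8x8 itself is estimated to have called falsely. The function and constant names are hypothetical.

```python
# Estimated false calls within the SAM 8x8 reference list (slide 14).
SAM_8X8_EXPECTED_FALSE = 8

def bounds(total_selected, overlap_with_sam_8x8):
    """Return (max true positives, max false positives) under the assumed reading."""
    max_true = overlap_with_sam_8x8
    max_false = total_selected - overlap_with_sam_8x8 + SAM_8X8_EXPECTED_FALSE
    return max_true, max_false

method_1 = bounds(1000, 356)   # J-Score alone
method_2 = bounds(174, 174)    # intersection of the three methods
method_3 = bounds(1631, 433)   # union of the three methods
print(method_1, method_2, method_3)  # (356, 652) (174, 8) (433, 1206)
```

Under this reading the three slides are mutually consistent, and the trade-off is plain: the union raises the ceiling on true positives from 356 to 433 while nearly doubling the ceiling on false positives.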

  20. Final Approach • Return the largest number of true positives to the client (Method 3). • To deal with large number of potential false positives in the results, we rank each gene based on the ranking from Cyber-T, J-Score, and SAM methods. • For example, if “Gene 02” is ranked number 2 in Cyber-T, number 3 in J-Score, and number 4 in SAM, then the overall ranking is (2 + 3 + 4) / 3 = 3 • Higher ranking = more likely to be true positive
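The rank-averaging rule can be sketched as below. The sketch assumes every gene has a rank in all three methods; the slides do not say how a gene missing from one method's list would be handled. The gene names, ranks, and the helper `combined_ranking` are hypothetical.

```python
def combined_ranking(rankings):
    """Average each gene's rank across methods; lowest average comes first.

    rankings: dict mapping method name -> {gene: rank}, where rank 1 is best.
    Only genes ranked by every method are kept (assumption).
    """
    shared = set.intersection(*(set(r) for r in rankings.values()))
    average = {
        gene: sum(r[gene] for r in rankings.values()) / len(rankings)
        for gene in shared
    }
    # Sort by average rank, breaking ties by gene name for a stable order.
    return sorted(average.items(), key=lambda item: (item[1], item[0]))

rankings = {
    "Cyber-T": {"Gene 01": 1, "Gene 02": 2, "Gene 03": 5},
    "J-Score": {"Gene 01": 1, "Gene 02": 3, "Gene 03": 4},
    "SAM":     {"Gene 01": 2, "Gene 02": 4, "Gene 03": 3},
}
ordered = combined_ranking(rankings)
```

For "Gene 02" this reproduces the slide's example, (2 + 3 + 4) / 3 = 3. A gene near the top of all three lists ends up near the top of the combined list, which is how the likely false positives in the union get pushed down.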

  21. Example Output of Our Approach

  22. Functional Analysis • Mapping to biological processes. • - EASE, the Expression Analysis Systematic Explorer from the National Institute of Allergy and Infectious Diseases at the National Institutes of Health. • Mapping to pathways. • - PathwayAssist software from Ariadne Genomics. FOR MORE INFO... http://apps1.niaid.nih.gov/david/ http://www.ariadnegenomics.com/products/pathway.html

  23. Mapping to biological processes • The lists of up- and down-regulated genes were entered into EASE. • The lower the EASE score, the more highly the process is ranked. • Example of the top 14 processes, locations and functions found from our significant genes.

  24. Mapping to pathways • Genes 1, 2 and 3 are significantly up- or down-regulated according to our combination method. • Investigation of gene 1 reveals that genes 2 and 3 are involved in gene 1’s pathway.

  25. Conclusion • Three algorithms for selecting differentially expressed genes produced different lists of genes with ~60% to 70% agreement. • Taking the union of the results from the three algorithms yielded the most true positives for our client. • Biological processes and pathways found through functional analysis correspond to what we expected based on samples studied. • Helps to make microarray results more believable.

  26. Additional Projects: Chris’s GUI • Automation of the previously discussed analyses with a GUI.

  27. Chris’s GUI project

  28. Chris’s GUI project screen 2

  29. Additional Projects: Dhonam’s GUI • ViaLogy has individual scripts that are used to test quality of VMAxS output. • Current implementation requires working knowledge of R scripting. • Project: implement a user-friendly GUI program to execute multiple QC tests.

  30. Dhonam’s GUI Project Screen 1

  31. Dhonam’s GUI Screen 2

  32. Dhonam’s GUI Screen 3 Optional window pops up if default parameters are not desired

  33. Acknowledgements • SoCalBSI: Dr. Sandra Sharp; Dr. Wendie Johnston; Dr. Jamil Momand; Dr. Nancy Warter-Perez; other SoCalBSI staff and faculty; SoCalBSI 2005 participants; Lien Chung (SoCalBSI participant 2004) • ViaLogy: Dr. Cecilie Boysen; Dr. Jim Breaux; other ViaLogy employees

  34. References • Hosack DA, Dennis G Jr, Sherman BT, Lane HC, Lempicki RA. Identifying biological themes within lists of genes with EASE. Genome Biol 2003; 4:R70. • Cope LM, Irizarry RA, Jaffee HA, Wu Z, Speed TP. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 2004; 20:323–331. • Long AD, Mangalam HJ, Chan BYP, Tolleri L, Hatfield GW, Baldi P. Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. J Biol Chem 2001; 276(23):19937–19944. • Jain N, Thatte J, Braciale T, Ley K, O'Connell M, Lee JK. Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics 2003; 19(15):1945–1951. • Bumgarner R. Processing Affy chip Data: GCOS/MAS 5.0, RMA, and gcRMA. • Saviozzi S, Calogero RA. Microarray probe expression measures, data normalization and statistical validation. Comp Funct Genom 2003; 4:442–446. Conference review. • Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. PNAS 2001; 98:5116–5121. • http://www.tau.ac.il/lifesci/bioinfo/teaching/2002-2003/Differential_Genes_Dec03.ppt • http://www.kochi-u.ac.jp/~tatataa/RA/RA-targets.html • http://www.biostat.jhsph.edu/~ririzarr/Teaching/688/04-preproc-norm.pdf/ • http://nibn.bgu.ac.il/core_units/microarray_facility/microarray_technique.htm • http://www.Vialogy.com
