
Microarray Data Analysis


Presentation Transcript


  1. Microarray Data Analysis The Bioinformatics side of the bench

  2. The anatomy of your data files from MAS 5.0 (Microarray Suite 5.0) • .DAT • .CEL • .EXP • .CHP • .txt files generated from .CHP

  3. Quality Control (QC) of the chip – visual inspection • Look at the .DAT file or the .CHP file image • Scratches? Spots? • Corners and outside border checkerboard appearance (B2 oligo) • Positive hybridization control • Used by software to place grid over image • Array name is written out in oligos!

  4. Scratch on a chip

  5. Possible chip contamination

  6. Internal controls • B. subtilis genes (added poly-A tails) • Assessment of quality of sample preparation • Also as hybridization controls • Not used in our module

  7. More internal controls • Eukaryotic Hybridization controls (bioB, bioC, bioD, cre) • E. coli and P1 bacteriophage biotin-labeled cRNAs • Spiked into the hybridization cocktail • Assess hybridization efficiency

  8. And still more internal controls • Actin and GAPDH assess RNA sample/assay quality • Compare signal values from 3’ end to signal values from 5’ end • ratio generally should not exceed 3 • Percent genes present (%P) • Replicate samples - similar %P values
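The two checks on slide 8 can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are hypothetical, and the cutoff of 3 is the rule of thumb quoted on the slide.

```python
def three_to_five_ratio(signal_3prime, signal_5prime):
    """Ratio of the 3' probe set signal to the 5' probe set signal."""
    if signal_5prime <= 0:
        raise ValueError("5' signal must be positive")
    return signal_3prime / signal_5prime


def percent_present(calls):
    """Percentage of probe sets whose detection call is 'P' (present)."""
    if not calls:
        raise ValueError("no detection calls given")
    return 100.0 * sum(1 for c in calls if c == "P") / len(calls)


# Hypothetical GAPDH signals, for illustration only.
gapdh_ratio = three_to_five_ratio(1800.0, 1200.0)
passes_ratio_check = gapdh_ratio <= 3.0  # slide 8: ratio should not exceed 3

# Hypothetical detection calls (P = present, M = marginal, A = absent).
pct = percent_present(["P", "A", "P", "M", "P", "A", "P", "P"])
```

Replicate samples would be expected to give similar `pct` values; how similar is a judgement call that the slides leave open.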

  9. MAS 5.0 output files • For each transcript (gene) on the chip: • signal intensity • a “present” or “absent” call (presence call) • p-value (significance value) for making that call • Each gene associated with GenBank accession number (NCBI database)

  10. How are transcripts determined to be present or absent? • Probe pair (PM vs. MM) intensities • generate a detection p-value • assign “Present”, “Absent”, or “Marginal” call for transcript • Every probe pair in a probe SET has a potential “vote” for presence call

  11. Discrimination score • Probe pairs “vote” via discrimination score (R) • R compared to a predetermined threshold: Tau • R > Tau = present • R < Tau = absent • Voting result expressed as p-value • Reflects confidence of expression call

  12. Altering Tau • You can fine tune Tau yourself within MAS 5.0 • Increase Tau: reduce “false positives”, may also reduce number of TRUE present calls • Our rule: use the default!

  13. Calculation of R • R = (PM – MM) / (PM + MM) • (PM – MM): • intensity difference of probe pair • (PM + MM): • overall hybridization intensity • R value closer to 1: lower p-value (detection call is more significant) • PM >> MM • R value close to 0 or negative: higher p-value (detection call is less significant) • MM ≥ PM • One-sided Wilcoxon’s Signed Rank test used to determine Detection p-value
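Slides 10 to 13 can be tied together in a short sketch: compute R for every probe pair, test whether the R values sit above Tau with a one-sided Wilcoxon signed rank test, and turn the p-value into a call. The code below is a simplified illustration, not MAS 5.0 itself. The values `tau=0.015`, `alpha1=0.04` and `alpha2=0.06` are commonly cited defaults and are assumptions here, as are the function names and the probe intensities.

```python
def discrimination_scores(pm, mm):
    """R = (PM - MM) / (PM + MM) for each probe pair."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def wilcoxon_one_sided_p(values, threshold):
    """Exact one-sided signed rank p-value for 'values lie above threshold'."""
    diffs = [v - threshold for v in values if v != threshold]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank |diffs|, giving tied values their average rank.
    # Ranks are stored doubled so they stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        doubled_avg_rank = (i + 1) + (j + 1)
        for k in range(i, j + 1):
            ranks2[order[k]] = doubled_avg_rank
        i = j + 1
    observed = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # Under the null hypothesis each rank is positive with probability 1/2,
    # so count how many of the 2**n sign patterns give a sum >= observed.
    counts = {0: 1}
    for r in ranks2:
        updated = {}
        for s, c in counts.items():
            updated[s] = updated.get(s, 0) + c
            updated[s + r] = updated.get(s + r, 0) + c
        counts = updated
    tail = sum(c for s, c in counts.items() if s >= observed)
    return tail / 2 ** n


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return (call, p-value); the thresholds are assumed defaults."""
    p = wilcoxon_one_sided_p(discrimination_scores(pm, mm), tau)
    if p < alpha1:
        return "P", p
    if p < alpha2:
        return "M", p
    return "A", p


# Hypothetical probe set with 11 probe pairs where PM >> MM throughout.
pm = [1200, 980, 1500, 870, 1320, 1100, 940, 1610, 1250, 1005, 1430]
mm = [300, 410, 350, 290, 500, 280, 330, 420, 390, 310, 360]
call, p_value = detection_call(pm, mm)
```

Swapping `pm` and `mm` makes every R negative, which pushes the p-value to 1 and gives an absent call. Raising `tau` makes a present call harder to reach, which is the trade-off described on slide 12.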

  14. Calculating signal • One-Step Tukey Biweight Estimate • Yields robust weighted mean • Relatively insensitive to even extreme outliers • Signal intensity value is created • related to amount of transcript present for that gene
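A minimal sketch of a one-step Tukey biweight estimate shows why it resists outliers: each value is weighted by its distance from the median, and values far enough away get a weight of zero. The tuning constant `c=5.0`, the small `epsilon` and the log2 example values are assumptions for illustration. The real signal algorithm also adjusts the MM values before this step, which is omitted here.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a robust weighted mean of `values`."""
    values = list(values)
    center = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median([abs(v - center) for v in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - center) / (c * mad + epsilon)
        # Values far from the median (|u| >= 1) get zero weight.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical log2 probe values with one extreme outlier (12.5).
log_values = [8.1, 8.3, 7.9, 8.0, 8.2, 12.5]
robust_log_signal = tukey_biweight(log_values)
plain_mean = sum(log_values) / len(log_values)
signal = 2 ** robust_log_signal  # back on the intensity scale
```

Here the ordinary mean is pulled up toward the outlier, while the biweight estimate stays near the bulk of the values around 8.1.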

  15. Thank goodness for software!!! • MAS 5.0 does these calculations for you • .CHP file • Basic analysis in MAS 5.0, but it won’t handle replicates • Import MAS 5.0 (.CHP) data into GeneSifter • web based microarray data analysis software package designed BY biologists FOR biologists

  16. How do we want to analyze this data? • Pairwise analysis is most appropriate • Control vs. DMSO • List of genes that are “upregulated” or “downregulated” • Determine fold up or down cutoffs • What is significant? • 1.5 fold up/down? • 2 fold up/down? • 10 fold up/down?
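To make the cutoff question concrete, here is a small sketch of a signed fold change and a cutoff test for one gene in a pairwise comparison. The function names and signal values are hypothetical, and the sign convention (negative numbers for "fold down") is one common choice, not necessarily the one GeneSifter uses.

```python
def fold_change(control, treated):
    """Signed fold change: +2.0 means doubled, -2.0 means halved."""
    if control <= 0 or treated <= 0:
        raise ValueError("signals must be positive")
    ratio = treated / control
    return ratio if ratio >= 1 else -1 / ratio


def classify(control, treated, cutoff=2.0):
    """Label a gene 'up', 'down' or 'unchanged' for a given fold cutoff."""
    fc = fold_change(control, treated)
    if fc >= cutoff:
        return "up"
    if fc <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical signal values for a single gene.
label_at_2_fold = classify(100.0, 150.0, cutoff=2.0)
label_at_1_5_fold = classify(100.0, 150.0, cutoff=1.5)
```

The same gene is "unchanged" at a 2 fold cutoff and "up" at a 1.5 fold cutoff, which is why the choice of cutoff has to be made deliberately.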

  17. Normalization • “Normalizing” data allows comparisons ACROSS different chips • Intensity of fluorescent markers might be different from one batch to the other • Normalization allows us to compare those chips without altering the interpretation of changes in GENE EXPRESSION
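One simple way to normalize is global scaling: multiply every signal on a chip by a single factor so that the chip's (trimmed) mean signal hits a common target. The slides do not say which method is used, so treat this as an assumed example; the target of 500, the 2% trim and the toy chips are all illustrative.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def scale_chip(signals, target=500.0, trim=0.02):
    """Scale one chip so its trimmed mean equals `target`."""
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals], factor


# Two hypothetical chips: chip_b is uniformly twice as bright as chip_a,
# as might happen with a different labeling batch.
chip_a = [100.0, 200.0, 300.0, 400.0]
chip_b = [200.0, 400.0, 600.0, 800.0]
scaled_a, factor_a = scale_chip(chip_a)
scaled_b, factor_b = scale_chip(chip_b)
```

After scaling, the two chips agree gene by gene, so the overall brightness difference no longer looks like a change in gene expression. Because every gene on a chip is multiplied by the same factor, ratios between genes on that chip are unchanged.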

  18. Statistics • Statistical tests allow us to determine how SIGNIFICANT the data are • t-test statistic • compares the means of two groups while taking into account the standard deviations of those groups • p value (probability value) of ≤ 0.05 • (if there were no REAL change in gene expression, a difference this large would arise by chance only 5 times out of 100 or less)
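The t-test on slide 18 can be sketched with the standard library alone. The slides do not say which variant is used, so this example assumes Welch's t-test (which does not require equal variances) and gets the two-sided p-value by numerically integrating the t distribution. The function names and replicate values are hypothetical; in practice a statistics package would do this for you.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps is even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # probability between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


# Hypothetical log2 signals for one gene, three replicates per group.
control = [8.0, 8.2, 8.1]
treated = [9.1, 9.3, 9.0]
t_stat, dof = welch_t(control, treated)
p_value = two_sided_p(t_stat, dof)
significant = p_value <= 0.05
```

With only a few replicates per group the test has little power, so a large fold change can still fail the 0.05 cutoff if the replicates disagree with each other.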

  19. Present or absent? • Can do analysis on genes that are considered “absent” under all conditions • The transcript should be “present” in at least ONE sample of a pairwise analysis

  20. Thresholds/cutoffs • What is a significant change in gene expression? • Some think 2 fold at the lowest • Judgement call • Can also set upper limit of expression changes • Remember we are talking about changes in mRNA expression • does that always mean more protein?

  21. The output • Run analysis, get output of a GENE LIST • List indicates what genes are up or down regulated • p values for t-test • Graphs of signal levels • Absolute numbers not as important here as the trends you see • Now what????

  22. Follow the links • Click on a gene • Find links to other databases • Follow links to discover what the protein does • Now the fun part begins….

  23. Back to Biology • Do the changes you see in gene expression make sense BIOLOGICALLY? • If they don’t make sense, can you hypothesize as to why those genes might be changing? • Leads to many, many more experiments

  24. Validation • Not enough to just do microarrays • Usually “validate” microarray results via some other technique • RT-PCR • TaqMan • Northern analysis • Protein level analysis • No technique is perfect…

  25. Why microarrays? • Ask a single question, and get more answers than you dreamed of! • Can assess GLOBAL changes in gene expression under a certain experimental condition • Can discover new pathways, gene regulation, the possibilities are almost endless

  26. Caveat… • There is NO standard way to analyze microarray data • Still figuring out how to get the “best” answers from microarray experiments • Best to combine knowledge of biology, statistics, and computers to get answers

  27. One last note • Microarrays are “cutting edge” technology • You now have experience doing a technique that most Ph.D.s have never done • Looks great on a resume…
