
Differential Expression Analysis: Multiple Hypotheses Testing

Differential Expression Analysis: Multiple Hypotheses Testing. Xiaole Shirley Liu, STAT115 / STAT215. Variance stabilization in differential expression analysis: the problem of estimating variance when the sample size is small (e.g. 3 treatments + 3 controls). Use a constant σ for all the genes?

Presentation Transcript


  1. Differential Expression Analysis: Multiple Hypotheses Testing. Xiaole Shirley Liu, STAT115 / STAT215

  2. Variance Stabilization in Differential Expression Analysis • Problem: estimating the variance is unreliable when the sample size is small (e.g. 3 treatments + 3 controls) • Use a constant σ for all the genes? • Significance Analysis of Microarrays (SAM) • Modified t*: increase σ based on the σ of other genes on the array (i.e. the lowest 5th percentile of σ) • LIMMA: Smyth 2004
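A minimal sketch of the SAM-style modified statistic, assuming a two-group comparison with a pooled variance and taking the fudge constant s0 as the 5th percentile of the gene-wise standard errors, as the slide describes. The SAM software may choose s0 differently, so treat this as illustrative; the function names and toy data are hypothetical.

```python
import statistics


def pooled_se(treat, control):
    """Standard error of the difference in group means, using pooled variance."""
    n1, n2 = len(treat), len(control)
    pooled_var = ((n1 - 1) * statistics.variance(treat)
                  + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
    return (pooled_var * (1 / n1 + 1 / n2)) ** 0.5


def sam_like_scores(genes):
    """genes: list of (treat_values, control_values), one pair per gene.

    Returns (s0, scores) with score d = (mean_treat - mean_control) / (s + s0),
    where s0 is the 5th percentile of the gene-wise s values (an assumption
    taken from the slide, not from the SAM software).
    """
    ses = [pooled_se(t, c) for t, c in genes]
    # 'inclusive' interpolates between observed values, so s0 >= min(ses)
    s0 = statistics.quantiles(ses, n=20, method="inclusive")[0]
    scores = [(statistics.mean(t) - statistics.mean(c)) / (s + s0)
              for (t, c), s in zip(genes, ses)]
    return s0, scores


# Toy data (hypothetical): gene 0 has a small difference but, by chance,
# almost no within-group variance; gene 1 has a large difference.
genes = [
    ([5.00, 5.01, 5.00], [4.90, 4.91, 4.90]),
    ([8.0, 8.5, 7.5], [5.0, 5.5, 4.5]),
]
s0, scores = sam_like_scores(genes)
plain_t = [(statistics.mean(t) - statistics.mean(c)) / pooled_se(t, c)
           for t, c in genes]
print(plain_t, scores)
```

With the plain t-statistic, gene 0 ranks above gene 1 because its tiny variance inflates the ratio; adding s0 to the denominator reverses that order.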

  3. LIMMA: Design Matrix • Specifies the RNA samples used on the arrays • > Mat
              Treat1 Treat2 Control
      Sample1      1      0       0
      Sample2      1      0       0
      Sample3      1      0       0
      Sample4      0      1       0
      Sample5      0      1       0
      Sample6      0      1       0
      Sample7      0      0       1
      Sample8      0      0       1
      Sample9      0      0       1

  4. LIMMA: Contrast Matrix • Specifies which comparisons are of interest • > contrast
              Treat1-Control Treat2-Control
      Treat1               1              0
      Treat2               0              1
      Control             -1             -1
  • Smooth the gene-wise variance towards a common (typical) value by borrowing information from all the genes, while allowing flexibility for individual genes
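The two matrices can be mirrored in a few lines of plain Python. This is a hypothetical sketch, not limma's code: with this indicator (no-intercept) design, the least-squares coefficient for each group is simply that group's mean, and the contrast matrix turns those coefficients into the Treat1-Control and Treat2-Control comparisons.

```python
GROUPS = ["Treat1", "Treat2", "Control"]
LABELS = ["Treat1"] * 3 + ["Treat2"] * 3 + ["Control"] * 3  # Sample1..Sample9

# Contrast matrix as on the slide: one weight per group for each comparison.
CONTRASTS = {
    "Treat1-Control": {"Treat1": 1, "Treat2": 0, "Control": -1},
    "Treat2-Control": {"Treat1": 0, "Treat2": 1, "Control": -1},
}


def design_matrix(labels, groups):
    """One row per array, one 0/1 indicator column per group (no intercept)."""
    return [[1 if label == g else 0 for g in groups] for label in labels]


def fit_group_means(expr, design, groups):
    """For this indicator design, the least-squares coefficients are group means."""
    coefs = {}
    for k, g in enumerate(groups):
        vals = [x for x, row in zip(expr, design) if row[k] == 1]
        coefs[g] = sum(vals) / len(vals)
    return coefs


def apply_contrasts(coefs, contrasts):
    """Each comparison is a weighted sum of the fitted group coefficients."""
    return {name: sum(w * coefs[g] for g, w in weights.items())
            for name, weights in contrasts.items()}


design = design_matrix(LABELS, GROUPS)
expr = [6, 7, 5, 9, 10, 8, 4, 5, 6]  # one gene's log-expression (hypothetical)
coefs = fit_group_means(expr, design, GROUPS)
print(apply_contrasts(coefs, CONTRASTS))  # {'Treat1-Control': 1.0, 'Treat2-Control': 4.0}
```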

  5. LIMMA Hierarchical Model • Prior s0 in effect adds d0 extra arrays for estimating the variance σ_g² of gene g
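For reference, the shrunken (posterior) variance in Smyth (2004) is a weighted average of the prior variance s0², carrying d0 degrees of freedom, and the gene-wise sample variance s_g², carrying d_g residual degrees of freedom; the weight d0 is what plays the role of the extra arrays mentioned above.

```latex
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2} + d_g\, s_g^{2}}{d_0 + d_g}
```

When d0 is 0 this reduces to the ordinary gene-wise variance; as d0 grows, every gene's variance is pulled further toward s0².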

  6. LIMMA Moderated T-test • Ordinary t-test • Moderated t-test: increased degrees of freedom; the test for comparison j depends on the number of samples in that particular comparison
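A small sketch contrasting the two statistics for one gene and one comparison. It follows the form of the moderated t in Smyth (2004) but treats the prior s0 and d0 as known inputs, whereas limma's eBayes() estimates them from all genes; v is the unscaled variance factor for the comparison (1/n1 + 1/n2 for a difference of two group means). Function names and numbers are hypothetical.

```python
import math


def ordinary_t(beta_hat, s_g, v):
    """Ordinary t: estimated effect / (gene-wise SD * sqrt(v)), with d_g df."""
    return beta_hat / (s_g * math.sqrt(v))


def moderated_t(beta_hat, s_g, d_g, s0, d0, v):
    """Moderated t: same numerator, but the SD is shrunk toward the prior s0.

    Returns (statistic, degrees_of_freedom); the df grow from d_g to d0 + d_g.
    """
    s2_tilde = (d0 * s0 ** 2 + d_g * s_g ** 2) / (d0 + d_g)
    return beta_hat / (math.sqrt(s2_tilde) * math.sqrt(v)), d0 + d_g


# 3 treatment vs 3 control arrays: v = 1/3 + 1/3, residual df d_g = 4.
v, d_g = 1 / 3 + 1 / 3, 4
t_ord = ordinary_t(beta_hat=0.5, s_g=0.05, v=v)
t_mod, df = moderated_t(beta_hat=0.5, s_g=0.05, d_g=d_g, s0=0.3, d0=4, v=v)
print(round(t_ord, 2), round(t_mod, 2), df)
```

A gene whose sample SD happens to be very small gets a much less extreme moderated statistic, but is compared against a t distribution with more degrees of freedom.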

  7. Multiple Hypotheses Testing • We test every gene for differential expression against a p-value cutoff, e.g. 0.01 • With ~20K genes on the array, potentially 0.01 x 20K = 200 genes are wrongly called • H0: no differential expression; H1: differential expression • Rejecting H0 means calling a gene differentially expressed • We should control the family-wise error rate or the false discovery rate
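The 200-gene figure is the expected number of null p-values below the cutoff. A quick simulation, assuming every one of the 20K genes is truly null so that its p-value is Uniform(0, 1):

```python
import random

random.seed(115)  # fixed seed so the run is reproducible
m, cutoff = 20_000, 0.01

# Under H0 a valid test's p-value is uniformly distributed on [0, 1).
null_pvals = [random.random() for _ in range(m)]
false_calls = sum(p < cutoff for p in null_pvals)

print(false_calls, "false calls; expected about", cutoff * m)
```

The count fluctuates around 200 from seed to seed (binomial standard deviation about 14).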

  8. Family-Wise Error Rate • FWER = P(at least one false rejection) ≤ α, i.e. P(no false rejection) ≥ 1 - α • Bonferroni correction: to control the FWER when testing m hypotheses at level α, test each individual hypothesis at level α/m • If α is 0.05, for 20K genes the p-value cutoff is 0.05/20K = 2.5E-6 • Too conservative for selecting differentially expressed genes
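A minimal sketch of the Bonferroni rule and of why it controls the FWER: if the m tests were independent, the chance of at least one false rejection among m true nulls, each tested at level α/m, is 1 - (1 - α/m)^m, which stays just below α. Independence is assumed only for this illustration; Bonferroni itself does not need it. Function names are hypothetical.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0 for each p-value at or below alpha / m."""
    threshold = alpha / len(pvals)
    return [p <= threshold for p in pvals]


def fwer_if_independent(per_test_level, m):
    """P(at least one false rejection) when m independent true nulls are tested."""
    return 1 - (1 - per_test_level) ** m


m, alpha = 20_000, 0.05
print(alpha / m)                          # per-gene cutoff, about 2.5e-06
print(fwer_if_independent(alpha / m, m))  # a little under 0.05
print(fwer_if_independent(alpha, m))      # essentially 1 without correction
```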

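  9. False Discovery Rate • V: type I errors, false positives • T: type II errors, false negatives • R: all rejections (genes called) • FDR = V / R, i.e. FP / all called (formally the expected value of this ratio, taken as 0 when R = 0)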

  10. False Discovery Rate • Less conservative than the family-wise error rate • Benjamini and Hochberg (1995) method for FDR control, e.g. FDR ≤ α* • Assumes the p-values from the different tests are independent • Rank the m genes by p-value and plot p-value (y) against rank (x) • Draw the line y = x · α* / m, x = 1…m • Find the largest rank k whose p-value falls below the line, and call the k genes with the smallest p-values
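A minimal sketch of the Benjamini-Hochberg step-up procedure described above, in plain Python (the function name and example p-values are hypothetical). Note the step-up rule: once the largest rank k below the line is found, all k top-ranked genes are called, even one whose own p-value sits above the line.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return the (original) indices of hypotheses rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:  # p-value at or below the line
            k = rank                      # remember the largest such rank
    return sorted(order[:k])


pvals = [0.9, 0.035, 0.001, 0.03]
print(benjamini_hochberg(pvals))  # [1, 2, 3]
```

Here 0.03 (rank 2) lies above its line value 0.025 but is still called, because 0.035 (rank 3) lies below its line value 0.0375.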

  11. FDR Threshold • [Figure: genes ranked by p-value; p-value plotted against index / m, with the threshold line x · α* / m]

  12. Q-value • Storey & Tibshirani, PNAS, 2003 • Empirically derived q-value • Every p-value has its corresponding q-value (FDR) • FDR’s academic vs practical values
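A simplified sketch of Storey-style q-values, assuming a single tuning value λ = 0.5 for estimating π0, the fraction of truly null genes. Storey & Tibshirani choose λ more carefully (e.g. by smoothing over a grid of λ values), so results from their software will differ. With π0 fixed at 1 (here via λ = 0), the same formula gives Benjamini-Hochberg adjusted p-values. The function name is hypothetical.

```python
def storey_qvalues(pvals, lam=0.5):
    """Simplified q-values with a single lambda.

    pi0 = #{p > lam} / (m * (1 - lam)), capped at 1.
    q for the i-th smallest p = min over j >= i of pi0 * m * p_(j) / j.
    """
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # q-values never exceed 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return pi0, q


print(storey_qvalues([0.9, 0.035, 0.001, 0.03]))
```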

  13. Gene Ontology

  14. Gene Annotation • How to report differentially expressed genes or gene clusters? • Enriched for certain pathways, certain functions, or proteins localized in the same complex, etc? • Gene Ontology Consortium • Ashburner et al 1998 • Annotate gene function in the human genome • Now extended to many model organisms • Why do we care? • Effectively communicate biomedical knowledge • Organize and summarize annotations in structured way • Allow effective and meaningful computation on gene annotations

  15. GO Categories • Molecular function • Describes a gene's jobs or abilities • E.g. transporters, transcription factors • Biological process • Events or pathways • E.g. cell differentiation, maturation, development • Cellular component • Describes locations (subcellular structures, macromolecular complexes) • E.g. nucleus, cell membrane, protein complexes

  16. GO

  17. GO • Relationships: • Subclass: is_a • Membership: part_of • Topological: adjacent_to; Derivation: derives_from • E.g. 5_prime_UTR is part_of a transcript, and mRNA is_a kind of transcript • The same term can appear in multiple branches (a term may have more than one parent) • Directed acyclic graph
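Because GO is a directed acyclic graph in which a term may have several parents, annotating a gene to a term implicitly annotates it to every ancestor reached through is_a and part_of edges (often called the true path rule). The sketch below collects those ancestors with a depth-first walk; the term names are hypothetical placeholders, not real GO identifiers.

```python
# Hypothetical fragment: child -> list of (parent, relationship)
PARENTS = {
    "term_C": [("term_A", "is_a"), ("term_B", "part_of")],  # two parents
    "term_A": [("root", "is_a")],
    "term_B": [("root", "is_a")],
    "root": [],
}


def ancestors(term, parents):
    """All terms reachable upward from `term`; each is visited once even if
    it can be reached along several paths."""
    seen, stack = set(), [term]
    while stack:
        for parent, _relation in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(gene_terms, parents):
    """Map each gene to its direct terms plus all of their ancestors."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in gene_terms.items()}


print(sorted(ancestors("term_C", PARENTS)))  # ['root', 'term_A', 'term_B']
```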

  18. Evaluate Differentially Expressed Genes • NetAffx mapped GO terms for all probesets
                   Whole genome   Up genes
      GO term X    100            80
      Total        20K            200
  • Statistical significance? • Binomial proportion test • p = 100 / 20K = 0.005 • Check the z table
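A sketch of the binomial proportion test using the normal approximation (a one-sample z-test for a proportion), with the slide's counts: 80 of 200 up-regulated genes carry GO term X, against a genome-wide rate p = 0.005. The function name is hypothetical. With only 200 x 0.005 = 1 expected hit, the normal approximation is rough, which is one reason to prefer an exact test.

```python
import math


def proportion_z_test(hits, n, p0):
    """z-test of whether hits/n exceeds the background rate p0.

    Returns (z, upper-tail p-value) using the normal approximation.
    """
    expected = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = (hits - expected) / sd
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return z, p_upper


z, p = proportion_z_test(hits=80, n=200, p0=100 / 20_000)
print(round(z, 1), p)
```

Here z is about 79, far beyond any z table, and the upper-tail probability underflows to 0.0 in double precision.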

  19. Evaluate Differentially Expressed Genes
                   Whole genome   Up genes
      GO term X    100            80
      Total        20K            200
  • Chi-square test or Fisher's exact test, observed (expected) counts:
                   Up            !Up                Total
      GO:          80 (1)        20 (99)            100
      !GO:         120 (199)     20K-220 (19701)    20K-100
      Total:       200           20K-200            20K
  • Check the chi-square table
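A sketch of both tests on the 2x2 table, using only the standard library (math.comb needs Python 3.8+). The chi-square version is Pearson's statistic without continuity correction; the Fisher version is the one-sided hypergeometric tail for over-representation of GO term X among up-regulated genes. Function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    return chi2, p_value


def fisher_one_sided(a, b, c, d):
    """P(X >= a), X hypergeometric with the table's margins held fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)
    tail = sum(math.comb(row1, k) * math.comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total


# Rows: GO / !GO; columns: Up / !Up (counts from the slide, 20K genes).
table = (80, 20, 120, 20_000 - 220)
print(chi_square_2x2(*table))
print(fisher_one_sided(*table))
```

The expected counts computed inside chi_square_2x2 (1, 99, 199, 19701) match the values in parentheses in the slide's table.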

  20. GO Tools for Microarray Analysis • http://neurolex.org/wiki/Category:Resource:Gene_Ontology_Tools • Hundreds of tools • DAVID

  21. Gene Set Enrichment Analysis

  22. Gene Set Enrichment Analysis • In some microarray experiments comparing two conditions, no single gene may be significantly differentially expressed, yet a group of genes may each be slightly differentially expressed • Check a set of genes with similar annotation (e.g. GO) and examine their expression values • Kolmogorov-Smirnov test • GSEA at the Broad Institute

  23. Gene Set Enrichment Analysis • Mootha et al, PNAS 2003 • Kolmogorov-Smirnov test • Cumulative fraction function: What fraction of genes are below this fold change?
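A minimal sketch of the two-sample Kolmogorov-Smirnov statistic on fold changes: D is the largest vertical gap between the cumulative fraction functions of the gene set and of the remaining genes. This is the plain KS statistic the slide names; the published GSEA uses a running-sum variant over ranked genes and assesses significance by permutation, which this sketch does not attempt. Names and data are hypothetical.

```python
def ks_statistic(set_values, other_values):
    """Largest gap between the two cumulative fraction functions."""
    a, b = sorted(set_values), sorted(other_values)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # advance past every value <= x in both samples (handles ties)
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical log fold changes: the annotated set is shifted downwards.
gene_set = [-1.2, -0.9, -0.8, -0.7]
other_genes = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.4]
print(ks_statistic(gene_set, other_genes))  # 1.0
```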

  24. Gene Set Enrichment Analysis • Alternative to KS: one-sample z-test • The population of all genes follows a normal distribution N(μ, σ²) • Under H0, the average X̄ of the n genes with a specific annotation follows N(μ, σ²/n), so z = (X̄ - μ) / (σ / √n) • STAT115 03/18/2008
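A sketch of that z-test, assuming μ and σ are estimated from all genes on the array (here by their mean and population standard deviation) and that under H0 the n annotated genes behave like a random sample from N(μ, σ²). The function name and numbers are hypothetical.

```python
import math
import statistics


def gene_set_z(all_values, set_values):
    """z = (mean of the set - mu) / (sigma / sqrt(n))."""
    mu = statistics.mean(all_values)
    sigma = statistics.pstdev(all_values)
    n = len(set_values)
    return (statistics.mean(set_values) - mu) / (sigma / math.sqrt(n))


# Hypothetical log fold changes for all genes, and for one annotated set.
all_genes = [-2.0, -1.0, 0.0, 1.0, 2.0]
annotated = [-1.0, -1.0, -1.0, -1.0]
print(gene_set_z(all_genes, annotated))  # about -1.41, i.e. -sqrt(2)
```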

  25. Gene Set Enrichment Analysis • Set of genes with specific annotation involved in coordinated down-regulation • Need to define the set before looking at the data • Can only see the significance by looking at the whole set

  26. Expanded Gene Sets • Subramanian, et al PNAS 2005

  27. Examples of GSEA

  28. Summary • LIMMA: uses a hierarchical model to stabilize gene-wise variance • FDR: adjusts for multiple hypotheses testing • FWER, Benjamini-Hochberg, q-value • GO annotation: a directed acyclic graph • 3 categories, and simple relationships • Test for statistical enrichment • GSEA: uses existing GO categories and other profile gene sets, KS tests
