
Pathway Analysis Karl Brand, June 2012

  1. Pathway Analysis Karl Brand, June 2012

  2. overview 1. goal 2. annotation 3. tools (various approaches, pros & cons) 4. underlying statistics (Fisher’s exact test) 5. in use (DAVID) 6. to summarise

  3. goal To understand genomics results &/or translate genomics data into knowledge &/or “…for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power” [1]. To facilitate generating a testable hypothesis. [1] Khatri et al., 2012

  4. tools You have: applied methods to identify differentially regulated biological entities (BEs), e.g. p < 0.05 with fold change greater than 1.5. What now? You could pass this list to your chosen pathway analysis tool, but first…

  5. annotation

  6. annotation: a modern problem • Synonyms: different names for the same biological entity, e.g. 5418 genes with synonyms (38% of total) • Homonyms: same name for different biological entities, e.g. PAP, alias for: PAP (Pancreatitis-associated protein), MRPS30 (Mitochond ribosomal prot 30S), PAPOLA (Poly(A) polymerase alpha) • Acronyms: reduced words representing biological entities, e.g. SCT stands for: Stem cell transplant, Secretin, Salmon calcitonin

  7. annotation Dutch printed map, 1600s. Discoveries of Willem Jansz: 1606 is the first recorded European discovery of Australia (New Holland), at Cape York Peninsula. Slide by A. Stubbs

  8. annotation • And now! • Post-genome view of the world. Slide by A. Stubbs

  9. annotation Databases (and their IDs) change over time… The shifting sands of databases and genome builds… • These changes reflect new information or analysis • The frequency of the changes can be problematic • Attempts are made to ‘hide’ this • IDs are merged, deleted or temporarily un-mapped on the genome sequence • Even common concepts change: gene boundaries move, TF binding sites are discovered. Slide by M. Moorhouse

  10. annotation Khatri et al., 2012

  11. annotation Khatri et al., 2012

  12. annotation

  13. annotation

  14. tools You have: applied methods to identify differentially expressed genes* (DEGs), e.g. p < 0.05 with fold change greater than 1.5. What now? You could pass this list to your chosen pathway analysis tool, but first… ensure you have mapped your identifiers to the latest annotations. And then what? *or proteins, metabolites

  15. tools You get the latest pathway analysis tools... February 2012 | Volume 8 | Issue 2

  16. tools February 2012 | Volume 8 | Issue 2 Huang et al., 2009

  17. tools February 2012 | Volume 8 | Issue 2 Khatri et al., 2012

  18. tools • First generation – over-representation analysis (ORA) • aka singular enrichment analysis (SEA) • e.g. EASE, DAVID, IPA* • 0. Use parametric statistics to identify DEGs, e.g. limma • 1. Choose significance level, e.g. FDR < 0.05, FC > 1.5 • 2. Use a statistical test to identify annotations over-represented within your list compared to what was assayed, e.g. Fisher’s exact test *disclosure – our department has a licensing agreement with Ingenuity Systems, Inc.
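
Step 2 above can be sketched in a few lines. This is only an illustration, not how EASE, DAVID or IPA implement it: the function name `ora_p_value` and its parameter names are made up for this example, and it computes the one-sided Fisher's exact test as an upper hypergeometric tail.

```python
from math import comb


def ora_p_value(n_assayed, n_annotated, n_selected, n_overlap):
    """One-sided Fisher's exact test (upper hypergeometric tail).

    n_assayed:   genes measured on the platform (the background)
    n_annotated: background genes that carry the annotation
    n_selected:  genes in the DEG list
    n_overlap:   DEGs that carry the annotation
    (all names are hypothetical, chosen for this sketch)
    """
    max_overlap = min(n_annotated, n_selected)
    # Number of ways to draw the DEG list from the background.
    total = comb(n_assayed, n_selected)
    # Count the draws with at least n_overlap annotated genes.
    tail = sum(
        comb(n_annotated, k) * comb(n_assayed - n_annotated, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Made-up numbers: 20 genes assayed, 5 annotated, 4 DEGs, 3 of them annotated.
p = ora_p_value(20, 5, 4, 3)
print(round(p, 4))
```

Note that the background is what was assayed, not the whole genome; changing `n_assayed` changes the p-value.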

  19. tools First generation – over-representation analysis (ORA) Caveats: 1. thresholding – what about the transcript with p = 0.050001, FC = 1.4999? 2. equality – transcript X with p = 0.0000001, FC = 100 is considered equal to transcript Y with p = 0.049, FC = 1.51 3. assumption of independence between both genes and pathways inflates significance 4. ignores relationships between genes/gene products 5. significance increases with population size

  20. tools • Second generation – gene set enrichment analysis (GSEA) • aka functional class scoring (FCS) • e.g. GSEA, GlobalTest, Gazer, IPA • 1. Use parametric statistics to determine DE for all genes, e.g. t-distribution statistics • 2. Use various statistics to combine gene statistics and determine pathway statistics, e.g. Wilcoxon rank sum, Kolmogorov-Smirnov • 3. Permute phenotypes and pathways to determine pathway significance
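
The three steps above can be sketched roughly as follows. This is a simplified illustration, not the published GSEA algorithm: it uses an unweighted Kolmogorov-Smirnov-style running sum, and it permutes gene labels rather than phenotypes to keep the example short. The function names and the toy data are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Largest deviation from zero of an unweighted running sum."""
    n_hit = sum(1 for g in ranked_genes if g in gene_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for g in ranked_genes:
        # Step up on a gene in the set, step down otherwise.
        running += 1.0 / n_hit if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(gene_stats, gene_set, n_perm=1000, seed=0):
    """Score a gene set and estimate significance by permuting gene labels."""
    rng = random.Random(seed)
    # Rank all genes by their per-gene statistic (e.g. a t-statistic).
    ranked = sorted(gene_stats, key=gene_stats.get, reverse=True)
    observed = enrichment_score(ranked, gene_set)
    size = sum(1 for g in ranked if g in gene_set)
    hits = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked, size))
        if abs(enrichment_score(ranked, random_set)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: ten genes with made-up statistics; the set holds the top three.
stats = {f"g{i}": 11.0 - i for i in range(1, 11)}
score, p = permutation_p_value(stats, {"g1", "g2", "g3"})
print(score, p)
```

Because every gene contributes to the ranking, no p-value or fold-change cutoff is needed, which is the main contrast with ORA.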

  21. tools Second generation – gene set enrichment analysis (GSEA) Overcomes most ORA limitations, except… Caveats: 1. assumes independence between pathways 2. dependence on ranking approaches misses the magnitude of changes between phenotypes, e.g. sham FC = 10; treated similar FC = 100 3. ignores relationships between genes/gene products 4. difficult or impossible to use your own special list – not an issue for ORA

  22. tools • Third generation – pathway topology (PT) • aka modular enrichment analysis (MEA) • e.g. DAVID, SPIA, IPA • 1. Use various statistics to determine differences in gene-gene* interactions** for all genes, e.g. Pearson’s correlation • 2. Use various statistics to combine gene interaction statistics and determine pathway significance, e.g. permutation, hypergeometric distribution *aka node-node **edges

  23. tools • Third generation – pathway topology (PT) • Caveats: • limited interaction knowledge, i.e. hampered by immature interaction databases (KEGG, BioCarta, Reactome, PantherDB etc.) • Not to mention a lack of cellular and temporal resolution of interactions.

  24. underlying statistics Fisher’s exact test demonstration (if time permits)
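
For reference, the one-sided Fisher's exact test as it is commonly applied to enrichment can be written as below. The symbols are chosen for this note and do not come from the slides: N genes assayed, K of them carrying the annotation, n genes in the DEG list, and k DEGs carrying the annotation.

```latex
% 2x2 table behind the test:
%                  annotated     not annotated     total
%   in list            k             n - k           n
%   not in list      K - k       N - K - n + k     N - n
%   total              K             N - K           N
\[
  p \;=\; \sum_{i=k}^{\min(K,\,n)}
      \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
\]
```

The sum is the probability of seeing k or more annotated genes in a list of size n drawn at random from the assayed genes.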

  25. in use DAVID

  26. in use • DAVID • Keep in mind, before uploading: • does your list of DEGs contain genes expected a priori? • have you generated at least three* lists with different cutoffs, e.g. p < 0.05 / 0.01, FC > 1.3 / 1.5 • And after uploading: • are the pathway(s) expected a priori identified in your analysis? *only for ORA analysis
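
Generating several lists at different cutoffs, as suggested above, could look like the sketch below. The function name `select_degs`, the input layout, and the toy values are all hypothetical. It also assumes fold change is an unlogged, positive ratio and that down-regulation should count symmetrically, which the slide does not state.

```python
def select_degs(results, p_cutoff, fc_cutoff):
    """Return the genes passing both cutoffs, sorted by name.

    results maps gene -> (p_value, fold_change); fold_change is assumed
    to be a positive, unlogged ratio (an assumption of this sketch).
    """
    selected = []
    for gene, (p, fc) in results.items():
        # Treat 0.5 (halved) like 2.0 (doubled).
        magnitude = fc if fc >= 1 else 1 / fc
        if p < p_cutoff and magnitude > fc_cutoff:
            selected.append(gene)
    return sorted(selected)


# Made-up results for four genes.
results = {
    "geneA": (0.001, 2.4),
    "geneB": (0.03, 1.4),
    "geneC": (0.04, 0.5),
    "geneD": (0.2, 3.0),
}

# Three lists, from lenient to strict, each to be uploaded separately.
for p_cut, fc_cut in [(0.05, 1.3), (0.05, 1.5), (0.01, 1.5)]:
    print(p_cut, fc_cut, select_degs(results, p_cut, fc_cut))
```

Pathways that stay enriched across all three lists are less likely to be an artefact of one arbitrary threshold.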

  27. in use • DAVID • Withstood the test of time (released 2003) • 1. proven functionality – highly cited • 2. comprehensive – many databases accessible • 3. feature rich – ORA, MEA, annotation mapping, etc. • 4. constantly updated & maintained – v6.7 • 5. well supported – personal experience • 6. easy to use, well documented • 7. free as in gratis

  28. in use DAVID home

  29. in use DAVID upload

  30. in use DAVID list management *Ariel Pink's Haunted Graffiti

  31. in use DAVID background selection

  32. in use DAVID functional annotation chart

  33. in use DAVID functional annotation chart (options)

  34. in use DAVID download results

  35. in use DAVID results in spreadsheet

  36. in use DAVID results in spreadsheet

  37. in use DAVID functional annotation clustering

  38. in use DAVID functional annotation clustering

  39. to summarise • 1. choose your analysis approach: • ORA if you must use your own special gene list • GSEA or PT, in addition to ORA, where possible

  40. to summarise DAVID Khatri et al., 2012

  41. to summarise • 1. choose your analysis approach: • ORA if you must use your own special gene list • GSEA or PT, in addition to ORA, where possible • 2. use a range of cut-offs for ORA analysis • 3. verify gene lists and pathway analysis output with a priori biology • 4. choose free (gratis & libre) tools where possible, in addition to proprietary apps

  42. questions? k.brand@erasmusmc.nl
