MSCL Analyst’s Toolbox

Instructors: Jennifer Barb, Zoila G. Rangel, Peter Munson. Jan 2008. Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience.


Presentation Transcript


  1. MSCL Analyst’s Toolbox. Instructors: Jennifer Barb, Zoila G. Rangel, Peter Munson. Jan 2008. Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience

  2. Course Outline. Day 1: • MSCL Analyst’s Toolbox and JMP™ overview • MSCL Toolbox Concepts • JMP™ fundamentals • Lunch • Affymetrix Expression Console™, processing .cel files, exporting data • MSCL Toolbox Demo • Data input • Basic Analysis (Master File, Final File, Data normalization, QC, PCA) • Gene selection, statistical tests (p-values, FDR) • Annotation. Day 2: • Statistical Topics (PCA, Data normalization, FDR) • MSCL Analyst’s Toolbox Demo (cont.) • Complex Analysis (2-way ANOVA, blocked ANOVA) • Data Visualization

  3. Topics not included • Exon Array Analysis -- coming soon! • SNP chip • Resequencing analysis, ChIP-Chip, copy number • 2-color or spotted cDNA array analysis • Complete JMP tutorial • JMP on Mac, Linux • JMP scripting language • Data management commands in JMP (Stack, Split, Concatenate, Sort)

  4. Why use JMP? • Interactive graphics facilitate data exploration and discovery of features • Powerful, > 2,00,000 rows by 100s of columns (currently, 2 GB limit) • Scripting language -- object oriented, allows matrix manipulation • Connects to database servers including NIHLIMS or local GCOS • JMP is also a general-purpose statistics package • Good technical support for JMP: (919) 677-8008 or www.jmp.com • No direct cost to individual NIH users* (centrally supported in most NIH ICs) • MSCL Analyst's Toolbox is FREE, adds tools for microarray studies

  5. MSCL Analyst’s Toolbox Features • Menu driven • Automated gene annotations • Web link-out** • Highly interactive, intuitive user interface • Analysis pipeline, based on years of experience • Familiar parametric analysis, e.g. ANOVA • Exploratory Data Analysis • Adaptable to new designs, analyses (e.g. Exon chips, SNP chips) • Powerful, handles largest Affy chips, probe-level analysis • Up to hundreds of chips at once • PC, Mac or Linux desktops • Support available through MSCL

  6. MSCL Analyst’s Toolbox Capabilities • Connects to the central NIHLIMS database or local GCOS databases • Reads in Pivot Tables from Affymetrix EC™ or GCOS™ • Visualizes Principal Components • Analyzes simple experiments (paired, unpaired T-tests) • Analyzes complex experiments (multiple treatments, time series, linear trends, slope changes between treatments) • Compensates for “batch” effects • Selects and annotates significant genes • Manages multiple gene lists (intersection, union, Venn diagrams) • Multivariate, Cluster, Discriminant, Neural net analysis • Uses dynamic visualization tools
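
The gene-list management mentioned above (intersection, union, Venn diagrams) amounts to set operations on probe set IDs. The following is a minimal Python sketch of that idea, not the Toolbox's own JMP implementation; the two hit lists and their probe set IDs are made up for illustration.

```python
# Hypothetical gene "hit" lists, identified by probe set ID (made-up IDs).
list_a = {"1000_at", "1001_at", "1002_f_at"}
list_b = {"1001_at", "1002_f_at", "1003_s_at"}

common = list_a & list_b   # intersection: genes on both lists
either = list_a | list_b   # union: genes on at least one list
only_a = list_a - list_b   # Venn region: on list A only
only_b = list_b - list_a   # Venn region: on list B only

# Counts for the three regions of a two-list Venn diagram.
venn_counts = {
    "A only": len(only_a),
    "A and B": len(common),
    "B only": len(only_b),
}
print(venn_counts)
```

The three counts always sum to the size of the union, which is a quick sanity check when combining lists by hand.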

  7. How to obtain: • JMP: http://isdp.cit.nih.gov/downloads/stats.asp; find your desktop support person at http://isdp.cit.nih.gov/information/contact_lookup_nih.asp; JMP technical support from (919) 677-8008 • The MSCL Analyst's Toolbox: download from http://affylims.cit.nih.gov; help offered on collaborative basis by MSCL; email questions to munson@helix.nih.gov

  8. NIH Bioinformatics Cooperative http://affylims.cit.nih.gov

  9. MSCL Toolbox Data Pipeline • Input files or Fetch data • Transform and normalize • Principal Components Analysis • Create Master file, add treatment groups • Compute statistical test, get p-values • Correct for multiple comparisons or use False Discovery Rate • Compute log fold-change • Visualize results • Select relevant genes (Diagram labels: files, Xform, PCA, Final, Master)

  10. Data sources: • NIHLIMS database via ODBC connection • Local GCOS database via ODBC connection • GCOS pivot table • EC pivot table (NEW support for this option) • Excel spreadsheet • Text files

  11. Data Input or data fetch (data-flow diagram; labels from the figure, grouped) • Instruments and software: Fluidics, Platform, Scanner, EC™ or GCOS™, MAS5™ • File types: .dat files, .cel files, .chp files, .rpt files, .txt client files • Databases: NIHLIMS database, Process DB <experiment>, MSCL Publish DB <username/password>, DCEG/NCI Publish DB <username/password>, CCMD Publish DB • Operations: Publish (MAS), Analyze (MAS), Import (LM), Export (LM), archive (LM), delete (LM), assume ownership (LM), Import, ODBC access • Client workstation tools: DMT, A-SCAN, Partek, GeneSpring

  12. Gene Expression Data Matrix (diagram): an Expression Matrix of Genes (1 to 20,000) by Samples (1 to 16), shown with the accompanying Sample information and Gene Annotations

  13. Annotations for each gene (Genes 1 to 20,000) • Probe Set ID • Genbank ID • Unigene ID, Title • Entrez Gene ID • Cytogenetic map location • Physical map location • HUGO gene symbol, synonyms • Functional relevance • Associated literature references ... • GO terms for molecular function, biological process or cellular component

  14. Annotation Files: • Affymetrix annotations for each probe set have been downloaded and formatted for MSCL Toolbox, available at affylims.cit.nih.gov • Annotations are updated quarterly • Annotation tables may be JOINed by ProbeSetID • Fields: Probe Set ID, Gene Title, Gene Symbol, UnigeneID, Transcript ID, Ensembl, Entrez Gene, Representative Public ID, First SwissProt, Genome Alignment Chromosome, Genome Alignment Start Address, Genome Alignment Stop Address, Genome Alignment Strand, Chromosomal Location (Diagram labels: Final, Annot., Final-Annot)

  15. Annotating Genes (diagram): your data file is “JOIN”ed on ProbeSetID to the reformatted NetAffx annotations
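
The “JOIN” on ProbeSetID can be pictured as a keyed lookup from each data row into the annotation table. The sketch below is a plain-Python illustration, not the Toolbox's JMP code: the function name `join_on_probe_set_id`, the sample rows, and the placeholder gene symbol are all hypothetical, and it assumes a left join (data rows with no annotation are kept unchanged).

```python
def join_on_probe_set_id(data_rows, annotation_rows, key="ProbeSetID"):
    """Left join: keep every data row, add annotation columns where the key matches."""
    # Index the annotation table by probe set ID for constant-time lookup.
    annot_by_id = {row[key]: row for row in annotation_rows}
    joined = []
    for row in data_rows:
        merged = dict(row)  # copy so the input rows are not modified
        for col, val in annot_by_id.get(row[key], {}).items():
            if col != key:
                merged[col] = val
        joined.append(merged)
    return joined


# Made-up example rows; real files would have thousands of probe sets.
data = [
    {"ProbeSetID": "1000_at", "FDR": 0.03},
    {"ProbeSetID": "9999_at", "FDR": 0.08},  # no annotation available
]
annot = [{"ProbeSetID": "1000_at", "Gene Symbol": "GENE_A"}]

result = join_on_probe_set_id(data, annot)
print(result)
```

A left join is chosen here so that a missing or outdated annotation never silently drops a gene from the results.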

  16. Information about each Sample, Samples 1 to 16 (transposed into Master File) • Clinical information (human) • Diagnosis • Demographic information • Treatment (in vivo, in vitro) in designed experiment • Tissue of origin • Cell culture, strain, passage • Sampling date/time • RNA preparation protocol • Operator/batch/lot/laboratory information • QC information (rawQ, scale factor, 3'/5' actin, 3'/5' GAPDH, etc.)

  17. Table formats • JMP usually deals with a single Table, but… • TWO tables are needed for MSCL Analyst’s Toolbox: • 1. "Master File" layout • Each ROW represents a chip • Columns define treatment, replicate number, etc. • 2. "Final" layout • COLUMNs correspond to chips (rows in Master File) • Each ROW is a probe set, unique identifier is probe set ID • Tables are LINKED by “Shortnames” field in Master

  18. Linked Table Formats • Master File -- one row per chip • Final File -- one row per probe set
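
To make the two linked layouts concrete, here is a small Python sketch with a Master table (one row per chip) and a Final table (one row per probe set). It is an illustration only: the treatment labels, values, and helper names are hypothetical, and it assumes that each Final column name is a data-type prefix such as "SG-" followed by the chip's Shortnames value.

```python
# Master File: one row per chip (hypothetical chips and treatment labels).
master = [
    {"Shortnames": "33NH", "Treatment": "control", "Replicate": 1},
    {"Shortnames": "33TH", "Treatment": "treated", "Replicate": 1},
]

# Final File: one row per probe set, one data column per chip.
final = [
    {"ProbeSetID": "1000_at", "SG-33NH": 120.0, "SG-33TH": 480.0},
    {"ProbeSetID": "1001_at", "SG-33NH": 55.0, "SG-33TH": 50.0},
]


def columns_for_group(master_rows, treatment, prefix="SG-"):
    """Final-table column names for all chips in one treatment group."""
    return [prefix + r["Shortnames"] for r in master_rows if r["Treatment"] == treatment]


def unlinked_chips(master_rows, final_rows, prefix="SG-"):
    """Shortnames in the Master table that have no matching Final column."""
    cols = set(final_rows[0])
    return [r["Shortnames"] for r in master_rows if prefix + r["Shortnames"] not in cols]


print(columns_for_group(master, "treated"))
print(unlinked_chips(master, final))
```

Checking for unlinked chips before running a test catches the common case where a column was renamed in one table but not the other.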

  19. Naming Convention for Final File Columns (prefixes) • Data type: AD-, SG-, PA- • Data transform: L-, Lmed-, GL-, S10- • Statistical results: p-, FDR-, mean-, SFC- • Column Naming Tips: • Avoid punctuation, hyphen, period, slash, etc. • Avoid spaces, use underscore “_” instead • Shorter is better • Toolbox utility available for trimming column names • Example column names: ITEM_NAME, SG-33NH, SG-33TH, S10-33NH, S10-33TH, PA-33NH, PA-33TH, SFC-7, SFC-11, p-slope&cent2, FDR slope&cent2
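
The naming tips can be applied mechanically to a raw sample name before a prefix is attached. The function below is a hypothetical Python sketch of such a clean-up, not the Toolbox's trimming utility (whose behavior the slides do not describe); the name `tidy_column_name` and the default length limit of 12 are assumptions.

```python
import re


def tidy_column_name(name, max_len=12):
    """Replace spaces and punctuation with underscores, then shorten the name."""
    # Any run of characters other than letters, digits, or "_" becomes one "_".
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip())
    # Collapse repeated underscores and drop leading/trailing ones.
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    # Truncate, then make sure truncation did not leave a trailing underscore.
    return cleaned[:max_len].rstrip("_")


print(tidy_column_name("33 N.H."))
print(tidy_column_name("Liver 33 N.H. (rep/1)"))
```

Truncation can make two different names collide, so a real utility would also need to check the shortened names for uniqueness.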

  20. Data Pipeline (overview repeated from slide 9)

  21. Data Transformation and Normalization

  22. Log(x/median x) transform (“Lmed”)
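
A minimal Python sketch of the “Lmed” transform named on this slide follows. The slide gives only the formula Log(x/median x); the sketch assumes a base-10 logarithm, a median taken over all values on one chip, and strictly positive signal values. The function name and sample values are illustrative.

```python
import math
from statistics import median


def lmed_transform(values):
    """Return log10(x / median(x)) for one chip's positive signal values."""
    med = median(values)
    return [math.log10(x / med) for x in values]


# Made-up signal values for a single chip; their median is 100.
chip = [50.0, 100.0, 200.0, 1000.0, 10.0]
lmed = lmed_transform(chip)
print(lmed)
```

After this transform the chip's median value maps to 0, so chips with different overall brightness are placed on a common scale.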

  23. Data Pipeline (overview repeated from slide 9)

  24. Principal Components Analysis (plot axes: PC 1 (38%), PC 2 (12%))

  25. Data Pipeline (overview repeated from slide 9)

  26. Analysis Scripts • ANOVA1 • T-test, unequal variance • Paired t-test • Consistency test • ANOVA1 with blocking • ANOVA2 with interaction terms (unbalanced data allowed) • ANOVA2 with blocking • Linear regression • ANCOVA with blocking (balanced data case) • ANCOVA2 with blocking (balanced data case) • Other tests are easily added (requires scripting)
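
As one illustration of the tests listed above, the unequal-variance t-test can be sketched in a few lines of Python. This is the standard Welch statistic with Welch-Satterthwaite degrees of freedom, offered as an assumption about what "T-test, unequal variance" means here, not as the Toolbox's script; the sample values are made up, and converting the statistic to a p-value (which needs the t distribution) is left out.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard error of each group mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up log-scale expression values for one probe set.
treated = [2.0, 2.2, 1.8, 2.0]
control = [1.0, 1.4, 0.6, 1.0]
t_stat, dof = welch_t(treated, control)
print(t_stat, dof)
```

In a microarray setting this calculation is repeated once per probe set, which is why the multiple-comparison step later in the pipeline matters.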

  27. Data Pipeline (overview repeated from slide 9)

  28. Log(FoldChange) = “LFC” • FoldChange = treated / control • Log(FoldChange) = Log(treated / control) = Log(treated) - Log(control) • Rule of thumb for base-10 logarithms: Log10(2-fold change) ≈ 0.3; Log10(10-fold change) = 1; Log10(0.1-fold change) = -1
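
The rule of thumb can be checked numerically. The short Python sketch below computes the base-10 log fold-change as a difference of logs; the function name and the signal values are illustrative only.

```python
import math


def log_fold_change(treated, control):
    """Base-10 log fold-change: Log(treated) - Log(control)."""
    return math.log10(treated) - math.log10(control)


# Made-up signal values: 2-fold up, 10-fold up, 10-fold down.
lfc_2x = log_fold_change(200.0, 100.0)
lfc_10x = log_fold_change(1000.0, 100.0)
lfc_tenth = log_fold_change(10.0, 100.0)
print(round(lfc_2x, 3), round(lfc_10x, 3), round(lfc_tenth, 3))
```

Because the log scale is symmetric, a 2-fold decrease gives about -0.3, the mirror image of a 2-fold increase.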

  29. Data Pipeline (overview repeated from slide 9)

  30. Volcano Plot Selection Regions (plot axes: Magnitude of change, Log Scale; Significance of change)
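
Selecting genes from the outer regions of a volcano plot means requiring both a large magnitude of change and a significant test result. The Python sketch below shows one way to express that rule; the thresholds (|LFC| of at least 0.3 and p of at most 0.01), the function name, and the gene records are assumptions for illustration, not values taken from the slides.

```python
def select_volcano(genes, min_abs_lfc=0.3, max_p=0.01):
    """Split genes passing both cutoffs into up- and down-regulated lists."""
    selected = {"up": [], "down": []}
    for g in genes:
        if g["p"] <= max_p and abs(g["lfc"]) >= min_abs_lfc:
            selected["up" if g["lfc"] > 0 else "down"].append(g["id"])
    return selected


# Made-up results: log10 fold-change and p-value per gene.
genes = [
    {"id": "g1", "lfc": 0.5, "p": 0.001},    # large and significant: up
    {"id": "g2", "lfc": -0.4, "p": 0.005},   # large and significant: down
    {"id": "g3", "lfc": 0.05, "p": 0.0001},  # significant but tiny change
    {"id": "g4", "lfc": 1.2, "p": 0.2},      # large but not significant
]
hits = select_volcano(genes)
print(hits)
```

Requiring both conditions excludes genes like g3 and g4, which pass only one of the two criteria.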

  31. Interpreting Gene Lists (diagram labels: Final, Annot., Filter (FDR<10%), GeneList, Ingenuity™, GeneGo™, Significant Terms)
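
The FDR<10% filter presupposes an FDR value for every gene. The slides do not say which FDR procedure the Toolbox uses; the Python sketch below assumes the widely used Benjamini-Hochberg adjustment and is meant only to show how per-gene p-values could become FDR values that are then filtered. The p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five genes.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(p)
gene_list = [i for i, q in enumerate(fdr) if q < 0.10]  # the FDR<10% filter
print(fdr, gene_list)
```

With these values the first four genes pass the 10% filter and the fifth does not.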

  32. GO-SCAN: Gene Ontology Annotations • Gene Ontology for Significant Collection of Annotations: GO-SCAN is a bioinformatics tool that selects and presents relevant Gene Ontology (GO) annotations for a gene "hit" list from an Affymetrix microarray experiment. http://goscan.cit.nih.gov/

  33. Ingenuity Pathway Analysis (Doug Joubert, NIH Library)
