Designing a metabolomics experiment

Grier P. Page, Ph.D., Senior Statistical Geneticist, RTI International, Atlanta Office. gpage@rti.org, 770-407-4907. Topics: types of metabolomics; designing a good study; primary considerations of good experimental design.

Presentation Transcript


  1. Designing a metabolomics experiment • Grier P. Page, Ph.D., Senior Statistical Geneticist, RTI International, Atlanta Office • gpage@rti.org • 770-407-4907

  2. Types of Metabolomics

  3. Designing a good study

  4. Primary consideration of good experimental design • Understand the strengths and weaknesses of each step of the experiments. • Take these strengths and weaknesses into account in your design.

  5. From Drug Discov Today. 2005 Sep 1;10(17):1175-82.

  6. State the Question and Articulate the Goals

  7. The Myth That Data Mining has No Hypothesis • There always needs to be a biological question in the experiment. If there is not even a question, don't bother. • The question can be nebulous: What happens to the gene expression of this tissue when I apply Drug A? • The purpose of the question is to drive the experimental design. • Make sure the samples answer the question: cause vs. effect.

  8. Experimental Design

  9. Biological replication is essential. • Two types of replication • Biological replication – samples from different individuals are analyzed. • Technical replication – the same sample is measured repeatedly. • Technical replicates allow only the effects of measurement variability to be estimated and reduced, whereas biological replicates allow this for both measurement variability and biological differences between cases. • Almost all experiments that use statistical inference require biological replication.
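
To make the replication point concrete, here is a minimal sketch that assumes a simple nested model, which is not stated on the slide: each individual contributes an independent biological deviation, and each measurement adds independent technical noise. Under that assumption the variance of a group mean is sigma_bio^2 / n_bio + sigma_tech^2 / (n_bio * n_tech). The function name and the standard deviations are hypothetical and chosen only for illustration.

```python
def variance_of_group_mean(sigma_bio, sigma_tech, n_bio, n_tech):
    """Variance of a group mean under an assumed nested model.

    n_bio individuals are sampled, and each is measured n_tech times.
    Technical replication only shrinks the second term.
    """
    return sigma_bio ** 2 / n_bio + sigma_tech ** 2 / (n_bio * n_tech)


# Hypothetical standard deviations (illustrative, not from the slides).
SIGMA_BIO = 1.0
SIGMA_TECH = 0.5

# 30 measurements spent on 3 individuals measured 10 times each...
few_subjects_many_repeats = variance_of_group_mean(SIGMA_BIO, SIGMA_TECH, n_bio=3, n_tech=10)
# ...versus only 10 measurements spent on 10 different individuals.
many_subjects_no_repeats = variance_of_group_mean(SIGMA_BIO, SIGMA_TECH, n_bio=10, n_tech=1)

print(few_subjects_many_repeats, many_subjects_no_repeats)
```

With these illustrative numbers, ten individuals measured once give a more precise group mean than three individuals measured ten times each, because repeated measurement cannot reduce the biological term.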

  10. Statistical analyses • Supervised analyses – linear models, etc. • Using fold change alone as a differential expression test is not valid. • 'Shrinkage' and/or Bayesian methods can be a good thing. • Controlling the false-discovery rate is a good alternative to conventional multiple-testing approaches. • Data are not missing at random. • Pathway testing is desirable.
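
As an illustration of false-discovery-rate control, the sketch below implements the Benjamini-Hochberg step-up adjustment in plain Python. The slide does not name a specific FDR procedure, so the choice of Benjamini-Hochberg is an assumption, and the p-values are made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, one per metabolite (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in q_values]
print(significant)
```

In this example five raw p-values fall below 0.05, but only the two smallest remain significant once the false-discovery rate is held at 5%.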

  11. Classification • Supervised classification • Supervised-classification procedures require independent cross-validation. • See the MAQC-II recommendations: Nat Biotechnol. 2010 Aug;28(8):827–838. doi:10.1038/nbt.1665. • Keep model building and validation as wholly separate stages. This can be a three-stage process when multiple models are tested. • Unsupervised classification • Unsupervised classification should be validated using resampling-based procedures.
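
One way to keep model building and validation wholly separate is to set aside a validation set before any feature selection or model fitting. The following is a minimal sketch of a stratified hold-out split using only the Python standard library. The function name, the 30% validation fraction, and the labels are hypothetical, and this is not the MAQC-II protocol itself.

```python
import random


def stratified_holdout(labels, validation_fraction=0.3, seed=0):
    """Split sample indices into training and validation sets, per class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, validation = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one sample of each class for validation.
        n_validation = max(1, round(len(indices) * validation_fraction))
        validation.extend(indices[:n_validation])
        train.extend(indices[n_validation:])
    return sorted(train), sorted(validation)


# Hypothetical study with 10 cases and 10 controls.
labels = ["case"] * 10 + ["control"] * 10
train_idx, validation_idx = stratified_holdout(labels)

# Feature selection, tuning, and any internal cross-validation should use
# train_idx only; validation_idx is touched once, by the final model.
print(len(train_idx), len(validation_idx))
```

In a three-stage design, the training indices could be split once more in the same way so that competing models are compared on a middle set before the final validation.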

  12. Sample size estimation for metabolomics studies

  13. There is strength in numbers: power and sample size. • Unsupervised analyses • Principal components, clustering, heat maps and variants • These are data transformations or data displays rather than hypothesis tests, so it is unclear whether sample size estimation is appropriate or even possible. • It may be appropriate to think about the stability of clustering. Garge et al. (2005) suggested 50+ samples for any stability.

  14. Sample size in supervised experiments • Supervised analyses • Linear models and variants • Methods are still evolving, but we suggest the approach we developed for microarrays may be appropriate for metabolomics (being evaluated)
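
The slide does not describe the microarray-derived approach, so the sketch below substitutes a standard textbook calculation instead: the normal-approximation sample size for a two-group comparison of means, with a Bonferroni-adjusted significance level to account for testing many metabolites. The function name, the standardized effect size, and the number of tests are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8, n_tests=1):
    """Approximate samples per group for a two-sided, two-group comparison.

    effect_size is the difference in means divided by the standard deviation.
    Uses n = 2 * ((z_alpha + z_power) / effect_size) ** 2, rounded up.
    """
    standard_normal = NormalDist()
    alpha_per_test = alpha / n_tests  # Bonferroni adjustment (an assumption)
    z_alpha = standard_normal.inv_cdf(1 - alpha_per_test / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical scenario: a one-standard-deviation difference between groups.
n_single_test = samples_per_group(effect_size=1.0)
n_500_metabolites = samples_per_group(effect_size=1.0, n_tests=500)
print(n_single_test, n_500_metabolites)
```

The comparison shows how the multiple-testing burden alone raises the required sample size. A false-discovery-rate criterion would typically be less demanding than Bonferroni, so this sketch is best read as a conservative starting point.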

  15. Experimental Conduct • All experiments are subject to non-biological variability that can confound any study.

  16. UMSA Analysis [Figure; labels: Day 1, Day 2, Insulin Resistant, Insulin Sensitive]

  17. Design Issues • Known sources of non-biological error (not exhaustive) that must be addressed • Technician / post-doc • Reagent lot • Temperature • Protocol • Date • Location • Cage/ Field positions

  18. Control Everything! • Know what you are doing • Practice! • Practice!

  19. Metabolite quality • This is still an evolving field. There are few good metrics, comparable to the RIN score or A260/A280 ratio for RNA, to assess contamination and quality of extraction.

  20. Example from RNA • RNA integrity is confirmed by a 28S:18S ratio greater than 1.5, as quantified by the Agilent Bioanalyzer and formaldehyde gel electrophoresis. • However, Drosophila RNA has a split peak for the 28S ribosomal RNA on the Bioanalyzer. • [Figure: intact RNA vs. degraded RNA; images from Agilent]

  21. Drosophila RNA has a split peak for the 28S ribosomal RNA on the Bioanalyzer, and no 18S peak. • Be aware of what your specific species should look like.

  22. What if you can’t control or make all things uniform? • Randomize • Orthogonalize

  23. What are Orthogonalization and Randomization? • Orthogonalization – spreading the biological sources of error evenly across the non-biological sources of error. • Maximally powerful for known sources of error. • Randomization – spreading the biological sources of error at random across the non-biological sources of error. • Useful for controlling for unknown sources of error.
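
The two strategies can be sketched as sample-to-batch assignment functions. This is a minimal illustration, assuming one known non-biological factor (processing day) and one biological factor (phenotype). The sample IDs, group names, and function names are hypothetical.

```python
import random
from collections import Counter


def randomize(samples, batches, seed=0):
    """Shuffle all samples, then deal them across batches in turn."""
    rng = random.Random(seed)
    ids = [sample_id for sample_id, _ in samples]
    rng.shuffle(ids)
    return {sid: batches[i % len(batches)] for i, sid in enumerate(ids)}


def orthogonalize(samples, batches, seed=0):
    """Spread each biological group evenly across batches."""
    rng = random.Random(seed)
    by_group = {}
    for sample_id, group in samples:
        by_group.setdefault(group, []).append(sample_id)
    assignment = {}
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for position, sample_id in enumerate(ids):
            assignment[sample_id] = batches[position % len(batches)]
    return assignment


# Hypothetical design: 6 insulin-resistant and 6 insulin-sensitive samples.
samples = [(f"S{i}", "resistant" if i < 6 else "sensitive") for i in range(12)]
batches = ["day1", "day2"]

randomized = randomize(samples, batches)
balanced = orthogonalize(samples, batches)

# Count how many samples of each group land on each day.
balanced_counts = Counter((group, balanced[sid]) for sid, group in samples)
print(balanced_counts)
```

Orthogonalizing guarantees that every phenotype is equally represented on every day, which randomization achieves only on average. Randomization still earns its place because it also protects against factors nobody thought to record.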

  24. Examples of Orthogonalization and Randomization [Figure: the experiment, shown randomized and orthogonalized]

  25. Know your data - What should it look like

  26. These are OK

  27. These are not OK

  28. One bad sample can contaminate an experiment

  29. Histogram of p-values

  30. Potentially Bad Chip

  31. Histogram of p-values with bad chip removed

  32. Quality of Database, Bioinformatics and Interpretative tools

  33. Understand what databases include, what they don’t include, and their assumptions • Just because a database says something does not mean it is right. Read the evidence. • Databases are biased. • Databases are incomplete. • Databases have lots of data. • Understand data before you use it. • Databases are useful!

  34. Issues in the Annotation of Genes

  35. Annotation is inconsistent across sources

  36. Issues with pathway data

  37. TCA cycle from Ingenuity

  38. TCA cycle from GenMAPP

  39. TCA cycle from Ingenuity

  40. Summary • Design your experiment well. • Conduct your experiment well. • Control for non-biological sources of error. • Know what good and bad quality data look like at each stage, including metabolite, image, data, and annotation. • If you are aware of these issues and control for them, highly powerful and reproducible metabolite experimentation is possible. • Otherwise, you get garbage.

  41. Overshare your data and show work • Practice compendium research – to allow others to replicate your work • Many high profile omic studies are not even technically reproducible

  42. References • The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. 2010 Aug;28(8):827-838. • Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65. • Reproducible clusters from microarray research: whither? BMC Bioinformatics. 2005 Jul 15;6 Suppl 2:S10. • Baggerly K. Disclose all data in publications. Nature. 2010 Sep 23;467(7314):401. PMID: 20864982. • Repeatability of published microarray gene expression analyses. Nat Genet. 2009 Feb;41(2):149-55. • A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness. Nutrition. 2003 Nov-Dec;19(11-12):997-1000.

  43. If time allows

  44. RTI Regional Comprehensive Metabolomics Resource Core (RTI RCMRC) • Susan Sumner, PhD, Director, RTI RCMRC • Discovery Sciences, Proteomics and Metabolomics Programs • RTI International

  45. Contact Information for the RTI RCMRC Susan C.J. Sumner, PhD Director RTI RCMRC Senior Scientist nanoSafety RTI International Discovery Sciences 3040 Cornwallis Drive Research Triangle Park North Carolina 27709 ssumner@rti.org 919-541-7479 (office) 919-622-4456 (cell) Jason P. Burgess, PhD Program Coordinator, RTI RCMRC Associate Director, Discovery Sciences RTI International 3040 Cornwallis Drive Research Triangle Park North Carolina 27709 jpb@rti.org 919-541-6700 (office)

  46. MS and NMR Instruments at RTI and DHMRI (counts given as RTI / DHMRI) • Mass spectrometers (38 total) • LC-MS: 13 / 6 • GC-MS: 4 / 3 • GC x GC-TOF-MS: 1 / 1 • ICP-MS: 6 / 1 • MALDI ToF/ToF: 2 / 1 • NMR (6 total): 2 / 4

  47. Some RTI Metabolomics Applications and Pilots Experience with adolescent and adult human subject research, animal model and cell based research, e.g., • Apoptosis- cells • Drug induced liver injury- animal models • in utero exposure to chemicals and fetal imprinting- animal models • Dietary exposure and imprinting- animal models • NAFLD - pediatric obesity; microbiome • Weight Loss- pediatric obesity • Preterm delivery- human subjects • Response to vaccine- human subjects • Nicotine withdrawal- human subjects • Colon cancer- human subjects

  48. Pilot and Feasibility Studies • The aim of the pilot and feasibility program is to foster collaborations and promote the use of metabolomics. • Studies will be selected through an application process. • The application involves an abstract, a description of the samples available (matrix type, volume, type and duration of storage, sample processing, freeze-thaw cycles, etc.), a description of phenotypes, and a plan for subsequent grant/contract submissions for metabolomics analysis beyond the initial pilot study. • Applications may also include technology development. • Applicants must agree to deposit data in DRCC, coauthor publications, and submit joint grant/contract proposals. • Deadlines are being defined.
