
Integrative Omics for Cancer Biology

Integrative Omics for Cancer Biology. Xiang Zhang, PhD, Department of Chemistry, Center for Regulatory and Environmental Analytical Metabolomics, University of Louisville, Louisville, KY 40292, xiang.zhang@louisville.edu.


Presentation Transcript


  1. Integrative Omics for Cancer Biology. Xiang Zhang, PhD, Department of Chemistry, Center for Regulatory and Environmental Analytical Metabolomics, University of Louisville, Louisville, KY 40292, xiang.zhang@louisville.edu

  2. Systems biology is a field of biology that aims at a systems-level understanding of biological processes, in which many parts are connected to one another and work together. It attempts to create predictive models of cells, organs, biochemical processes, and complete organisms. • Integrative systems biology • Extracting biological knowledge from the ‘omics through integration • Predictive systems biology • Predicting the future state of a biosystem using ‘omics knowledge, e.g., in-silico biosystems Davidov, E.; Clish, C. B.; Oresic, M.; Zhang, X.; et al. Omics: A Journal of Integrative Biology 2004, 8, 267-288. Clish, C. B.; Davidov, E.; Oresic, M.; Zhang, X.; et al. Omics: A Journal of Integrative Biology 2004, 8, 3-13.

  3. Omics Space: Differential omics is the beginning of systems biology. (Figure: omics space across molecule, cell, tissue, organism, …)

  4. Differential Proteomics & Metabolomics • Differential proteomics and metabolomics are the qualitative and quantitative comparison of the proteome and metabolome under different conditions, which can help unravel complex biological processes • They can be used to study any scientific phenomenon that may change the proteome and/or metabolome of a living system • Cancer biomarker discovery • Nano-medicine NIH • Environment • Food and nutrition preventative medicine

  5. Biomarker Discovery Is a Major Research Field of Differential Omics Biomarkers are naturally occurring biomolecules useful for measuring the prognosis and/or progress of diseases and therapies. • These substances may normally be present in small amounts in the blood or other tissues • When the amounts of these substances change, they may indicate disease • Valid biomarkers should • demonstrate drug activity sooner • facilitate clinical trial design by defining patient populations • optimize dosing for safety and efficacy • be sensitive and easy to assay to speed drug development

  6. What Types of Change Are Expected? • Sensing structural change is a major element of comparative proteomics • Most metabolomics studies focus on concentration changes only

  7. Challenges in Proteomics • Sample complexity • About 25,000 protein-coding genes are present in humans. The IPI human database (v3.25) has 67,250 entries, which could generate about 10^6 to 10^8 peptides • More than one hundred post-translational modifications (PTMs) can occur in a proteome • Large differences in protein concentration • 10^7 to 10^8 in human cells, and at least 10^12 in human plasma • The dynamic range of an LC-MS instrument is about 10^4 to 10^6 • The 12 most abundant proteins constitute approximately 95% of the total protein mass of plasma/serum • Albumin, IgG, fibrinogen, transferrin, IgA, IgM, haptoglobin, alpha 2-macroglobulin, alpha 1-acid glycoprotein, alpha 1-antitrypsin, and HDL (Apo A-I & Apo A-II) • Dynamic system, large subject variation
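The mismatch between plasma's concentration range and the instrument's dynamic range can be checked with simple arithmetic. This sketch reuses only the order-of-magnitude figures on the slide. The variable and function names are illustrative.

```python
import math

# Order-of-magnitude figures quoted on the slide (dimensionless ratios).
PLASMA_CONCENTRATION_RANGE = 1e12  # "at least 10^12 in human plasma"
LCMS_RANGE_LOW = 1e4               # LC-MS dynamic range, lower estimate
LCMS_RANGE_HIGH = 1e6              # LC-MS dynamic range, upper estimate
TOP12_MASS_FRACTION = 0.95         # share of plasma protein mass in the top 12 proteins


def uncovered_decades(sample_range, instrument_range):
    """Orders of magnitude of concentration that a single LC-MS run cannot span."""
    return math.log10(sample_range) - math.log10(instrument_range)


best_case = uncovered_decades(PLASMA_CONCENTRATION_RANGE, LCMS_RANGE_HIGH)
worst_case = uncovered_decades(PLASMA_CONCENTRATION_RANGE, LCMS_RANGE_LOW)
remaining_after_depletion = 1.0 - TOP12_MASS_FRACTION

print(f"Decades not covered by one run: {best_case:.0f} to {worst_case:.0f}")
print(f"Protein mass left after removing the top 12: {remaining_after_depletion:.0%}")
```

Even at the optimistic end, six orders of magnitude remain uncovered. That gap is why depletion of abundant proteins and multidimensional separation are commonly used before MS.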

  8. Challenges in Metabolomics • Metabolites have a wide range of molecular weights and large variations in concentration • The metabolome is much more dynamic than the proteome and genome, which makes it more time-sensitive • Metabolites can be polar or nonpolar, organic or inorganic, which makes chemical separation a key step in metabolomics • The chemical structures of metabolites make their identification by MS extremely challenging (example structure: cholesta-3,5-diene)

  9. Differential Omics: Biomarker Discovery (Figure: molecules A, B, C, D, …, Z profiled in diseased and healthy samples S1 to S8.)

  10. Informatics Platform

  11. Roadmap Systems Biology Differential omics • Experimental design • Molecular identification • Data preprocessing • Statistical significance test • Pattern recognition • Molecular networks

  12. MDLC Platforms • MudPIT, i.e., SCX followed by RP • The proteome is split into 10-20X more fractions • There is carry-over between fractions • LC fractions are generally still too complex for MS • Affinity selection • Avidin selection of Cys-containing peptides • Cu-IMAC for His-containing peptides • Ga-IMAC for phosphorylated peptides • Lectins for glycosylated peptides (Diagram labels: sample, APR, AP, digestion, SCX, F1, F2, RPC-MS.) Qiu, R.; Zhang, X.; Regnier, F. E. J. Chromatogr. B 2007, 845, 143-150. Wang, S.; Zhang, X.; Regnier, F. E. J. Chromatogr. A 2002, 949, 153-162. Regnier, F. E.; Amini, A.; Chakraborty, A.; Geng, M.; Ji, J.; Sioma, C.; Wang, S.; Zhang, X. LC/GC 2001, 19(2), 200-213. Geng, M.; Zhang, X.; Bina, M.; Regnier, F. E. J. Chromatogr. B 2001, 752, 293-306.

  13. In-Gel Stable Isotope Labeling: A Sample Gel-Based Platform • Avoids gel-to-gel variability • Labels only K-containing (lysine-containing) peptides • Accurate quantification Asara, J. M.; Zhang, X.; Zheng, B.; Christofk, H. H.; Wu, N.; Cantley, L. C. Nature Protocols 2006, 1, 46-51. Asara, J. M.; Zhang, X.; Zheng, B.; Christofk, H. H.; Wu, N.; Cantley, L. C. J. Proteome Res. 2006, 5, 155-163. Ji, J.; Chakraborty, A.; Geng, M.; Zhang, X.; Amini, A.; Bina, M.; Regnier, F. E. J. Chromatogr. A 2000, 745, 197-210.

  14. Roadmap Systems Biology Differential omics • Experimental design • Molecular identification • protein identification • metabolite identification • Data preprocessing • Statistical significance test • Pattern recognition • Molecular networks

  15. Protein Identification: Database Searching The database searching approach uses a protein database to find the peptide whose theoretically predicted spectrum best matches the experimental data. (Figure labels: protein, peptide, mass, matched peptide.)
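The matching idea can be sketched with a shared-peak count. For each candidate peptide, predict its singly charged b- and y-ion masses and count how many fall within a tolerance of an observed peak. This scoring scheme is a deliberately simple stand-in chosen for illustration. It is not the scoring function of Sequest, Mascot, or any other engine. The candidate list, the 0.02 m/z tolerance, and the function names are hypothetical. Real searches also filter candidates by precursor mass and handle multiple charge states and modifications.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # Da
WATER = 18.010565  # Da


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide, sorted."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b_i: N-terminal fragment
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y_(n-i): C-terminal fragment
    return sorted(ions)


def shared_peak_count(theoretical, observed, tol=0.02):
    """Number of predicted ions that have an observed peak within +/- tol (m/z)."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


def best_match(candidates, observed, tol=0.02):
    """Candidate whose predicted spectrum explains the most observed peaks (first wins ties)."""
    return max(candidates,
               key=lambda pep: shared_peak_count(fragment_ions(pep), observed, tol))


# Synthetic "experimental" spectrum: half of PEPTIDE's ions plus one noise peak.
observed = fragment_ions("PEPTIDE")[::2] + [500.0]
winner = best_match(["PEPTIDE", "PEPSIDE", "EDITPEP"], observed)
print(winner)
```

Isoleucine and leucine have identical residue masses, so "PEPTIDE" and "PEPTLDE" would receive exactly the same score. This limitation is shared by every mass-based method.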

  16. Protein Identification: Database Searching • Sequest • Spectrum Mill • Mascot • X! Tandem • OMSSA More than 20 algorithms have been developed. • About 20% of tandem MS spectra provide a confident peptide identification • Fewer than 50% of peptides are identified by all algorithms Zhang, X.; Oh, C.; Riley, C. P.; Buck, C. Current Proteomics 2007, 4, 121-130.

  17. Protein Identification: De Novo Sequencing De novo sequencing reconstructs the partial or complete sequence of a peptide directly from its MS/MS spectrum. The performance of de novo methods is limited by low mass accuracy, mass equivalence (residues or residue combinations with the same mass), and incomplete fragmentation. Pevtsov, S.; Fedulova, I.; Mirzaei, H.; Buck, C.; Zhang, X. Journal of Proteome Research 2006, 5, 3018-3028. Fedulova, I.; Ouyang, Z.; Buck, C.; Zhang, X. The Open Spectroscopy Journal 2007, 1, 1-8.

  18. Incorporating Peptide Separation Information for Protein Identification: Structure of the Pattern Classifier Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.

  19. Training the ANNs with a Genetic Algorithm Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.
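Training a network with a genetic algorithm replaces gradient descent with evolution of weight vectors. A population is scored by prediction error, the best members survive, and offspring are produced by crossover and mutation. This sketch shows only that principle. A single linear neuron stands in for the network. The data are synthetic, and all settings (population 40, mutation sigma 0.1, 100 generations, top-quarter elitism) are assumptions. None of these is the architecture, feature set, or configuration used by Oh et al.

```python
import random


def predict(weights, features):
    """Single linear neuron; the last weight is the bias term."""
    return sum(w * x for w, x in zip(weights, features)) + weights[-1]


def mse(weights, data):
    """Mean squared error of the neuron over (features, target) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def evolve(data, n_weights, pop_size=40, generations=100, sigma=0.1, seed=0):
    """Evolve weight vectors with elitism, uniform crossover and Gaussian mutation.

    Returns the best weight vector and the best MSE recorded at each generation.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: mse(w, data))
        history.append(mse(population[0], data))
        parents = population[: pop_size // 4]  # best quarter survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene from either parent, then mutate it.
            child = [rng.choice(genes) + rng.gauss(0.0, sigma)
                     for genes in zip(mother, father)]
            children.append(child)
        population = parents + children
    population.sort(key=lambda w: mse(w, data))
    history.append(mse(population[0], data))
    return population[0], history


# Synthetic training set: three features and a known linear target.
data_rng = random.Random(1)
training = []
for _ in range(50):
    x = [data_rng.uniform(-1.0, 1.0) for _ in range(3)]
    training.append((x, 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] + 1.0))

best_weights, history = evolve(training, n_weights=4)
print(f"best MSE: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the best individual is always carried over unchanged, the best error can never increase from one generation to the next. This is the usual reason for keeping an elite set.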

  20. Protein Identification Using Multiple Algorithms and Predicted Peptide Separation in HPLC: PIUMA Architecture Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118. Zhang, X.; Oh, C.; Riley, C. P.; Buck, C. Current Proteomics 2007, 4, 121-130.

  21. Roadmap Systems Biology Differential omics • Experimental design • Molecular identification • Data preprocessing • Spectrum deconvolution • Quality control • Alignment • Normalization • Statistical significance test • Pattern recognition • Molecular networks

  22. Spectrum Deconvolution: GISTool, Single-Sample Analysis • To distinguish signals arising from real analytes from signals arising from contaminants or instrument noise • To reduce data dimensionality, which benefits downstream statistical analysis Functionality • Smoothing and centralization • Peak cluster detection • Charge recognition • De-isotoping • Peak identification at the LC level • Doublet recognition • Doublet quantification

  23. GISTool Algorithm: Deconvoluting MS Spectra (Figure labels: peaks at 748.6354 (3+) and 748.9694 (2+); single-sample analysis.) Zhang, X.; Hines, W.; Adamec, J.; Asara, J.; Naylor, S.; Regnier, F. E. J. Am. Soc. Mass Spectrom. 2005, 16, 1181-1191.
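Charge recognition and de-isotoping rest on one relation. Adjacent peaks in an isotope envelope are about 1.00335/z apart in m/z, where 1.00335 Da is the 13C minus 12C mass difference. Once z is known, the neutral mass of an [M + zH]z+ ion is M = z(m/z - 1.007276). This sketch applies that step to the two labeled peaks on the slide. It assumes the labeled values are monoisotopic m/z. The synthetic three-peak envelopes, the 0.01 tolerance, and the function names are assumptions. GISTool's actual algorithm is not reproduced here.

```python
PROTON = 1.007276          # Da
ISOTOPE_SPACING = 1.00335  # Da, 13C minus 12C


def charge_from_envelope(mz_peaks, tol=0.01, max_charge=6):
    """Smallest charge z whose spacing 1.00335/z matches the first isotope gap."""
    gap = mz_peaks[1] - mz_peaks[0]
    for z in range(1, max_charge + 1):
        if abs(gap - ISOTOPE_SPACING / z) <= tol:
            return z
    return None  # no charge fits: likely noise or overlapping envelopes


def neutral_mass(mz, z):
    """Neutral monoisotopic mass of an [M + zH]z+ ion."""
    return z * (mz - PROTON)


for mono_mz, true_z in [(748.6354, 3), (748.9694, 2)]:
    # Synthetic envelopes built from the slide's labeled values.
    envelope = [mono_mz + k * ISOTOPE_SPACING / true_z for k in range(3)]
    z = charge_from_envelope(envelope)
    print(f"{mono_mz}: charge {z}+, neutral mass {neutral_mass(mono_mz, z):.4f} Da")
```

The two peaks are only 0.334 apart in m/z, but their different charge states put them roughly 747 Da apart in neutral mass. This is why charge recognition has to precede any comparison across peaks.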

  24. Quality Assessment / Control • Biological sample QA/C • Protein assay • Experimental data QA/C • 2D K-S test • Percentile of detected peaks • Percentile of aligned peaks • Retention time variance vs. retention time • m/z variance vs. retention time • Frequency distribution of RT & m/z variance Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; Elmagarmid, A. K. Bioinformatics 2005, 21, 4054-4059.

  25. Data Alignment: Recognizing peaks of the same molecule that occur in different samples among the thousands of peaks detected during an experiment. • MS-to-MS data alignment • MS-to-MS/MS data alignment • Referenced alignment • Blind alignment • Quality depends on the information from peak detection • Depends on the experimental design

  26. LC-MS Data Alignment: XAlign Software for Proteomics & Metabolomics Data • Detecting the median sample • Aligning samples to the median sample $M_j = \sum_i I_{i,j} M_{i,j} / \sum_i I_{i,j}$, $T_j = \sum_i I_{i,j} T_{i,j} / \sum_i I_{i,j}$, $D_i = \sum_{j=1}^{s} |T_{i,j} - \mu_j|$ Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; Elmagarmid, A. K. Bioinformatics 2005, 21, 4054-4059.
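The median-sample step can be sketched directly from the formulas above under a few stated assumptions. I[i][j], M[i][j] and T[i][j] are read as the intensity, m/z and retention time of peak j in sample i, with peaks already matched across samples. μ_j is taken to be the intensity-weighted mean retention time T_j. The alignment step is a simple nearest-neighbor stand-in with hypothetical tolerances (0.01 m/z, 0.5 min), not XAlign's published procedure.

```python
def weighted_mean_per_peak(I, X):
    """Intensity-weighted mean over samples i for each peak j:
    sum_i I[i][j] * X[i][j] / sum_i I[i][j]  (M_j for m/z, T_j for RT)."""
    n_samples, n_peaks = len(I), len(I[0])
    return [
        sum(I[i][j] * X[i][j] for i in range(n_samples))
        / sum(I[i][j] for i in range(n_samples))
        for j in range(n_peaks)
    ]


def median_sample(I, T):
    """Index minimizing D_i = sum_j |T[i][j] - mu_j|, with mu_j = T_j (assumed)."""
    mu = weighted_mean_per_peak(I, T)
    D = [sum(abs(t - m) for t, m in zip(row, mu)) for row in T]
    return min(range(len(D)), key=D.__getitem__), D


def align_to_reference(peaks, reference, mz_tol=0.01, rt_tol=0.5):
    """Map each (m/z, RT) peak to the RT-closest reference peak within both tolerances."""
    matches = []
    for mz, rt in peaks:
        candidates = [(abs(rt - ref_rt), k) for k, (ref_mz, ref_rt) in enumerate(reference)
                      if abs(mz - ref_mz) <= mz_tol and abs(rt - ref_rt) <= rt_tol]
        matches.append(min(candidates)[1] if candidates else None)
    return matches


# Three samples x two matched peaks: intensities and retention times (min).
I = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
T = [[10.0, 20.0], [10.2, 20.1], [11.0, 21.5]]
ref_index, D = median_sample(I, T)
print(ref_index, [round(d, 3) for d in D])
```

Picking the median sample as the reference minimizes the total retention-time shift that the other samples have to absorb. An arbitrary reference, such as the first sample run, could sit at one extreme of the drift.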

  27. Chromatogram of Serum Analyzed on GC×GC/TOF-MS GC×GC/TOF-MS Data Alignment: Metabolite Components of Human Serum • Four-dimensional data • 1,535 peaks detected

  28. GC×GC/TOF-MS Data Alignment: MSort Software for Metabolomics • Criteria for alignment • First-dimension retention time • Second-dimension retention time • Spectral correlation • Features • Peak entry merging • cont. exclusion Oh, C.; Huang, X.; Buck, C.; Regnier, F. E.; Zhang, X. J. Chromatogr. A 2008, 1179, 205-215.
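The three alignment criteria listed for MSort can be combined into a pairwise test. Two GC×GC peaks are treated as the same metabolite only if both retention times agree and their mass spectra are similar. In this sketch, cosine similarity stands in for "spectral correlation". The tolerances (10 s in the first dimension, 0.1 s in the second, similarity of at least 0.9) and the peak dictionary layout are assumptions. MSort's published scoring and merging rules may differ.

```python
import math


def spectral_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {nominal m/z: intensity}."""
    mz_values = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(mz, 0.0) * spec_b.get(mz, 0.0) for mz in mz_values)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_compound(p, q, rt1_tol=10.0, rt2_tol=0.1, min_similarity=0.9):
    """Apply all three criteria: 1st-dim RT, 2nd-dim RT and spectral similarity."""
    return (abs(p["rt1"] - q["rt1"]) <= rt1_tol
            and abs(p["rt2"] - q["rt2"]) <= rt2_tol
            and spectral_similarity(p["spectrum"], q["spectrum"]) >= min_similarity)


peak_a = {"rt1": 600.0, "rt2": 1.20, "spectrum": {73: 100.0, 147: 40.0, 233: 15.0}}
peak_b = {"rt1": 605.0, "rt2": 1.25, "spectrum": {73: 50.0, 147: 20.0, 233: 7.5}}
peak_c = {"rt1": 602.0, "rt2": 1.22, "spectrum": {55: 100.0, 69: 80.0}}
print(same_compound(peak_a, peak_b), same_compound(peak_a, peak_c))
```

In this example, peak_c elutes at almost the same position as peak_a but has an unrelated spectrum. The spectral criterion rejects the pair, which is what separates co-eluting compounds in crowded regions of a GC×GC chromatogram.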

  29. Analysis Results of MAlign: 53 Standard Acids • 8 [OA + FA] samples and 8 [AA + FA] samples • Derivatization reagent: (N-Methyl-N-t-butyldimethylsilyl)-trifluoroacetamide (MTBSTFA) Oh, C.; Huang, X.; Buck, C.; Regnier, F. E.; Zhang, X. J. Chromatogr. A 2008, 1179, 205-215.

  30. Normalization To reduce concentration effects and experimental variance so that the data are comparable. • Methods • Log-linear model $x_{ij} = a_i \, r_j \, e_{ij}$ • Reference sample normalization • Auto-scaling • Constant mean / trimmed constant mean • Constant median / trimmed constant median
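Several of the listed methods are short enough to sketch: constant median, constant mean or trimmed constant mean, and auto-scaling. In this sketch, samples are rows and peaks are columns. Two choices are assumptions: the median of all sample centers serves as the common target, and the trim fraction is 0.1. A designated reference sample would be an equally valid target. Auto-scaling works on columns and will fail on a peak with zero variance across samples.

```python
import statistics


def trimmed_mean(values, trim=0.1):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def constant_center(samples, center=statistics.median):
    """Scale each sample (row) so that its center equals a common target.

    `center` can be statistics.median, statistics.mean, trimmed_mean, etc.
    The target is the median of all sample centers (an assumption).
    """
    centers = [center(row) for row in samples]
    target = statistics.median(centers)
    return [[x * target / c for x in row] for row, c in zip(samples, centers)]


def autoscale(samples):
    """Center each peak (column) to mean 0 and scale it to unit standard deviation."""
    columns = list(zip(*samples))
    moments = [(statistics.mean(col), statistics.stdev(col)) for col in columns]
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, moments)] for row in samples]


# Three samples with the same profile at different overall concentrations.
raw = [[1, 2, 3, 4], [2, 4, 6, 8], [10, 20, 30, 40]]
print(constant_center(raw))  # every row becomes [2.0, 4.0, 6.0, 8.0]
print(autoscale([[1, 10], [2, 20], [3, 30]]))
```

Constant-center methods remove a per-sample scale factor, such as a dilution difference, while keeping each sample's profile. Auto-scaling instead puts every peak on the same unit variance, so abundant peaks do not dominate later pattern recognition.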

  31. CV Distribution of Peak Intensities: Human Serum Sample (Figure labels: 20.7% and 17.3%.) Log-linear model: $x_{ij} = a_i \, r_j \, e_{ij}$, $\log(x_{ij}) = \log(a_i) + \log(r_j) + \log(e_{ij})$
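Taking logs turns the multiplicative model into an additive one. The sample effect can then be estimated and removed before computing the per-peak coefficient of variation (CV) shown on this slide. This sketch assumes that i indexes samples and j indexes peaks, and that the data matrix is complete. With complete data, the least-squares estimate of each sample effect is that row's mean log intensity minus the grand mean. The toy data below are noise-free, so the CVs drop to essentially zero. It does not reproduce the 20.7% and 17.3% values from the serum data.

```python
import math
import statistics


def log_linear_normalize(x):
    """Remove sample effects a_i under x_ij = a_i * r_j * e_ij.

    Rows are samples (i), columns are peaks (j). On the log scale the model
    is additive; each estimated sample effect is the row mean of the logs
    minus the grand mean, and it is subtracted before transforming back.
    """
    logs = [[math.log(v) for v in row] for row in x]
    grand = statistics.mean(v for row in logs for v in row)
    effects = [statistics.mean(row) - grand for row in logs]
    return [[math.exp(v - a) for v in row] for row, a in zip(logs, effects)]


def peak_cvs(x):
    """Coefficient of variation (sd / mean) of each peak across samples."""
    return [statistics.stdev(col) / statistics.mean(col) for col in zip(*x)]


peak_levels = [100.0, 200.0, 50.0, 400.0]  # r_j
sample_factors = [1.0, 1.5, 0.7]           # a_i, e.g. loading differences
x = [[a * r for r in peak_levels] for a in sample_factors]

before = statistics.median(peak_cvs(x))
after = statistics.median(peak_cvs(log_linear_normalize(x)))
print(f"median CV before: {before:.1%}, after: {after:.1%}")
```

With real data, the error term e_ij remains after normalization. The CVs shrink but do not vanish, which is consistent with a modest reduction rather than a collapse to zero.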

  32. Roadmap Systems Biology Differential omics • Experimental design • Molecular identification • Data preprocessing • Statistical significance test • Pattern recognition • Molecular networks

  33. Statistical Significance Tests To find individual peaks that differ significantly between groups. Methods • Pairwise t-test (different means?) • Mann-Whitney U test (different medians?) • Kolmogorov-Smirnov test (different distributions?) • Kruskal-Wallis analysis of variance
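The statistics behind two of these tests are easy to compute by hand. In a real analysis the p-values would come from a library: scipy.stats provides ttest_ind, mannwhitneyu, ks_2samp and kruskal. This dependency-free sketch computes the Mann-Whitney U statistic and the two-sample Kolmogorov-Smirnov D statistic. It also includes a permutation test on the difference in means, which is swapped in here as a distribution-free stand-in for the t-test p-value. The function names, the 5,000 permutations and the seed are assumptions.

```python
import random


def mann_whitney_u(x, y):
    """U statistic for x: pairs (x_i, y_j) with x_i > y_j, ties counted as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)
    return max(abs(ecdf(x, t) - ecdf(y, t)) for t in set(x) | set(y))


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)

    def mean_gap(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_gap(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_gap(pooled[:len(x)], pooled[len(x):]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 keeps the estimate away from zero


control = [1, 2, 3, 4, 5]
exposed = [6, 7, 8, 9, 10]
print(mann_whitney_u(control, exposed), ks_statistic(control, exposed))
print(permutation_p_value(control, exposed))
```

With complete separation of the two groups, U reaches its extreme value of 0 and D reaches 1. The permutation p-value approaches the exact value 2/252, because only 2 of the 252 ways to split ten values into two groups of five are as extreme as the observed split.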

  34. Statistical Significance Tests: Metabolome of Great Blue Heron Fertilized Eggs Contaminated by PCBs (PCBs: polychlorinated biphenyls) (Figure labels: down-regulated, up-regulated; fold change = I_c / I_n; blue line: p = 0.05; dashed line: log fold change = 0.)
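The plot on this slide combines a p-value cut-off with the direction of the fold change I_c / I_n. This sketch shows that classification on the log scales typically used for volcano plots. The slide shows only the p = 0.05 line. The minimum two-fold change, the tuple layout (name, I_c, I_n, p) and the peak names are assumptions.

```python
import math


def volcano_points(peaks):
    """(name, log2 fold change, -log10 p) for (name, I_c, I_n, p) tuples."""
    return [(name, math.log2(i_c / i_n), -math.log10(p)) for name, i_c, i_n, p in peaks]


def classify(peaks, alpha=0.05, min_fold=2.0):
    """Label each peak from the p-value cut-off and a minimum fold change."""
    labels = {}
    for name, log2_fc, neg_log_p in volcano_points(peaks):
        significant = neg_log_p > -math.log10(alpha)
        large_enough = abs(log2_fc) >= math.log2(min_fold)
        if not (significant and large_enough):
            labels[name] = "unchanged"
        elif log2_fc > 0:
            labels[name] = "up-regulated"
        else:
            labels[name] = "down-regulated"
    return labels


# Hypothetical peaks: (name, contaminated intensity, normal intensity, p-value)
peaks = [("m1", 400.0, 100.0, 0.001), ("m2", 25.0, 100.0, 0.01),
         ("m3", 300.0, 100.0, 0.2), ("m4", 120.0, 100.0, 0.001)]
print(classify(peaks))
```

On the log2 scale, a fourfold increase and a fourfold decrease sit symmetrically at +2 and -2. That symmetry is why fold changes are usually plotted as logs rather than as the raw ratio I_c / I_n.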

  35. Roadmap Systems Biology Differential omics • Experimental design • Molecular identification • Data preprocessing • Statistical significance test • Pattern recognition • Molecular networks

  36. Clustering or Classification Pattern recognition provides a first glimpse into the underlying biology. Unsupervised methods: Principal component analysis (PCA), Clustering objects on subsets of attributes (COSA). Supervised methods: Linear discriminant analysis (LDA), Support vector machine (SVM), Artificial neural network (ANN)
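PCA is unsupervised: it projects samples onto the directions of largest variance without using group labels. This sketch computes only the first component, by power iteration on the sample covariance matrix, using nothing beyond the standard library. The toy data (rows are samples, columns are peak intensities) and the iteration count are assumptions. For real data, a linear-algebra routine such as numpy.linalg.svd would be the usual choice.

```python
import math


def pca_first_component(data, iterations=200):
    """First principal component via power iteration on the sample covariance matrix.

    data: rows are samples, columns are variables (e.g. peak intensities).
    Returns (unit loading vector, PC1 score of every sample).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # start vector; must not be orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores


controls = [[1.0, 2.0, 0.5], [1.2, 2.1, 0.4], [0.9, 1.9, 0.6]]
cases = [[3.0, 4.1, 0.5], [3.2, 3.9, 0.6], [2.9, 4.0, 0.4]]
loadings, scores = pca_first_component(controls + cases)
print([round(s, 2) for s in scores])
```

The two toy groups fall on opposite sides of zero on PC1 even though the function never sees which rows are controls. A supervised method such as LDA, by contrast, is given the labels and looks for the direction that best separates them.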

  37. Cross Species Comparison 27 of the 28 control humans and all 8 control rats cluster into one group; 11 of the 14 diseased humans and all diseased rats cluster into a second group

  38. Differential Metabolomics of Human Blood: Breast Cancer Samples vs. Control Samples

  39. Differential Metabolomics of Human Blood: Breast Cancer Samples vs. Control Samples

  40. Roadmap Systems Biology Differential omics • Experimental design • Protein identification • Data preprocessing • Statistical significance test • Pattern recognition • Molecular networks • correlation network • interaction network • regulation network • pathway analysis

  41. Molecular Correlation Analysis: Pairwise Correlation of Proteins and Metabolites (Figure: molecules A, B, C, D, …, Z profiled in healthy and diseased samples S1 to S8.)
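A correlation network of this kind can be sketched in two steps. First compute Pearson's r between every pair of molecular profiles across samples. Then keep the pairs whose absolute correlation passes a cut-off. The slides do not state which correlation measure or threshold was used. The |r| ≥ 0.8 cut-off, the profile dictionary layout and the molecule names here are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def correlation_network(profiles, threshold=0.8):
    """Edges (a, b, r) between molecules whose profiles have |r| >= threshold.

    profiles: {molecule name: [intensity in S1, S2, ...]}
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


profiles = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],  # rises with A
    "C": [5, 4, 3, 2, 1],   # falls as A rises
    "D": [1, 3, 1, 3, 1],   # unrelated pattern
}
print(correlation_network(profiles))
```

Building one network from the healthy samples and one from the diseased samples, then comparing their edges, highlights relationships that change with disease state even when abundance levels do not. This is the complementary information described on the next slide.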

  42. Molecular Correlation Network: An Example of Drug Effect on Disease State • Reveals important relationships among the various components • Complementary to abundance-level information • Provides information about the biochemical processes underlying the disease or drug response Clish, C. B.; Davidov, E.; Oresic, M.; Plasterer, T.; Lavine, G.; Londo, T. R.; Meys, M.; Snell, P.; Stochaj, W.; Adourian, A.; Zhang, X.; Morel, N.; Neumann, E.; Verheij, E.; Vogels, J. T. W. E.; Havekes, L. M.; Afeyan, N.; Regnier, F. E.; Greef, J.; Naylor, S. Omics: A Journal of Integrative Biology 2004, 8, 3-13.

  43. SysNet: Interactive Visual Data Mining of Molecular Correlation Networks An interactive environment for integrating and visualizing molecular correlations in ‘omics data. • Integrates molecular expression information generated in different ‘omics • Visualizes molecular correlations interactively • Enables time-course data visualization and analysis • Automatically organizes molecules based on their time-course expression patterns Zhang, M.; Ouyang, Q.; Stephenson, A.; Salt, D.; Kane, D. M.; Burgner, J.; Buck, C.; Zhang, X. BMC Systems Biology (accepted).

  44. Biomarker Verification • Wet-lab verification • AQUA • MRM • Antibody • In-silico verification • tracing lineage • pathway analysis

  45. Automated Lineage Tracing • Identifies the connections between a program's input and output data • Traces fine-grained lineage through run-time analysis • Based on dynamic slicing techniques used in debugging • Applicable to arbitrary functions (Diagram labels: analysis software, lineage tracing.) Zhang, M.; Zhang, X.; Zhang, X.; Prabhakar, S. 33rd International Conference on Very Large Data Bases (VLDB 2007), 2007.
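Lineage tracing asks which inputs a given output actually depends on. The work cited here uses run-time dynamic slicing. This sketch illustrates only the underlying idea, using a much simpler technique: value-level provenance propagated through operator overloading. Each value carries the set of input indices it was derived from. Unlike dynamic slicing, this tracks data dependences through arithmetic but ignores control dependences, for example an input that only decides which if-branch runs. The Traced class and the example analysis step are hypothetical.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Traced:
    """A numeric value tagged with the indices of the inputs it was derived from."""
    value: float
    lineage: frozenset

    def _apply(self, other, op, reverse=False):
        other_value = other.value if isinstance(other, Traced) else other
        other_lineage = other.lineage if isinstance(other, Traced) else frozenset()
        args = (other_value, self.value) if reverse else (self.value, other_value)
        return Traced(op(*args), self.lineage | other_lineage)

    def __add__(self, other):
        return self._apply(other, operator.add)

    def __radd__(self, other):  # lets built-in sum() start from 0
        return self._apply(other, operator.add, reverse=True)

    def __sub__(self, other):
        return self._apply(other, operator.sub)

    def __mul__(self, other):
        return self._apply(other, operator.mul)

    def __truediv__(self, other):
        return self._apply(other, operator.truediv)


def trace_inputs(values):
    """Wrap raw input records so each one carries its own index as lineage."""
    return [Traced(v, frozenset({i})) for i, v in enumerate(values)]


# Hypothetical analysis step: mean of two case intensities over one control.
intensities = trace_inputs([10.0, 20.0, 30.0, 40.0])
ratio = (sum(intensities[:2]) / 2) / intensities[3]
print(ratio.value, sorted(ratio.lineage))  # 0.375 [0, 1, 3]
```

Input 2 never influences the result, so it is absent from the lineage. In biomarker verification, this is the kind of check that lets one trace a reported marker back to the raw measurements that produced it.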

  46. Summary • The informatics platform developed in my group can be used to analyze protein and metabolite profiling data, differentiating disease and normal samples for biomarker discovery • Groups identified by clustering analysis reflected the phenotypic categories (cancer and control samples, animal and human subjects, etc.) with a high degree of accuracy • SysNet uses an interactive visual data mining approach to integrate omics data into a single environment, enabling biologists to perform data mining • Lineage tracing is an efficient and effective approach for in-silico biomarker verification and is expected to significantly reduce the false discovery rate (FDR) of biomarker discovery

  47. Acknowledgements Irina Fedulova, Dr. Hamid Mirzaei, Dr. Cheolhwan Oh, Sergey E. Pevtsov, Ouyang Qi, Alan Stephenson, Mingwu Zhang, Dr. John Burger, Dr. Michael D. Kane, Dr. Fred E. Regnier, Dr. David Salt, Dr. Mohammad Sulma, Dr. Daniel Raftery, Dr. Sunil Prabhakar, Dr. David Clemmer, Dr. John Asara, Dr. Mu Wang, Dr. Jake Chen, Dr. Steve Valentine, Dr. Steve Naylor

  48. Postdoc Positions Posting Title: Industrial Postdoctoral Fellow - Bioinformatician. Work Location: University of Louisville, KY. Job Type: Full time. Starting Date: Position immediately available. Job Description: Predictive Physiology and Medicine (PPM) Inc. is an exciting health and life sciences company based in Bloomington, Indiana, focused on developing analytical systems for the individualized health and wellness industry. We have an immediate opening for a postdoctoral fellow. The successful candidate will develop bioinformatics systems for mass spectrometry-based quantitative proteomics and metabolomics. Requirements: The position requires a bioinformatician with a strong computational background. Priority will be given to candidates with a PhD in bioinformatics, computer science, statistics, engineering, or computational physics. The successful candidate should have a strong understanding of statistics and pattern recognition. Programming skills in Matlab, Microsoft .NET, or Java are required. Experience in analyzing biological data is not required; however, interest in multidisciplinary research is a must.
