Proteomics Data analysis

Proteomics Data analysis Sanjoy Dey

iTRAQ labeling

Isobaric labeling

Case Study 1: Goal
• Find a biosignature of alveolar epithelial cell repair and migration.
• This can shed light on lung injury.
• Relate molecular networks and pathways to physiological states.
• Identify molecular targets for drug design.

Case Study 1: Experimental design
• Isolated alveolar epithelial cells, comparing lung injury and recovery.
• Hyperoxia exposure during the first three time steps.
• Recovery in the room-air stage.
• iTRAQ labeling with respect to a control without any hyperoxia exposure.
• Two sets of proteins: soluble and membrane.
• Three duplicate sets for each run.

Case Study 1: Proteins
• 740 proteins, with 293 in the soluble and 609 in the insoluble fraction.
• Filtered with ProteinPilot 3: false discovery rate of 1%, fold-change threshold (>1.2), and an associated P-value of at most 0.05.
• 74 proteins for the soluble and 294 for the insoluble fraction.
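The filtering criteria above can be illustrated with a minimal sketch. It is not the ProteinPilot procedure: the `ProteinRatio` record, the protein names, and the example values are hypothetical, and treating the fold-change threshold as symmetric (ratio above 1.2 or below 1/1.2) is an assumption the slide does not state.

```python
from dataclasses import dataclass


@dataclass
class ProteinRatio:
    name: str        # hypothetical protein identifier
    ratio: float     # iTRAQ ratio versus the control channel
    p_value: float   # p-value reported for that ratio


def passes_filter(protein, fold=1.2, alpha=0.05):
    """Keep a protein that changes by more than `fold` in either direction
    (assumed symmetric threshold) and whose p-value is at most `alpha`."""
    changed = protein.ratio > fold or protein.ratio < 1.0 / fold
    return changed and protein.p_value <= alpha


# Made-up example values, for illustration only.
proteins = [
    ProteinRatio("protein_A", 1.50, 0.01),  # increased, significant
    ProteinRatio("protein_B", 1.10, 0.01),  # change too small
    ProteinRatio("protein_C", 0.60, 0.03),  # decreased, significant
    ProteinRatio("protein_D", 2.00, 0.20),  # large change, not significant
]

selected = [p.name for p in proteins if passes_filter(p)]
print(selected)
```

Only the proteins that satisfy both the fold-change and the p-value criteria are kept; the FDR step would be applied separately.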

Data quality evaluation
• Computed the Hamming distance/similarity of the 72 soluble proteins.
• Binarized the data with fold change > 1.
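A minimal sketch of this comparison, assuming each run is represented as a list of fold changes for the same proteins in the same order. The function names and the example values are hypothetical; only the binarization threshold (fold change > 1) comes from the slide.

```python
def binarize(fold_changes, threshold=1.0):
    """Map each fold change to 1 if it exceeds the threshold, else 0."""
    return [1 if fc > threshold else 0 for fc in fold_changes]


def hamming_distance(a, b):
    """Count the positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def hamming_similarity(a, b):
    """Fraction of positions at which the two vectors agree."""
    return 1.0 - hamming_distance(a, b) / len(a)


# Made-up fold changes for five proteins measured in two runs.
run1 = [1.4, 0.8, 1.1, 0.9, 2.0]
run2 = [1.2, 0.7, 0.95, 0.85, 1.6]

b1, b2 = binarize(run1), binarize(run2)
distance = hamming_distance(b1, b2)
similarity = hamming_similarity(b1, b2)
print(distance, similarity)
```

Because binarization keeps only the direction of change, this measure captures qualitative agreement between runs and ignores differences in magnitude.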

Data quality evaluation
• Data agree between the runs qualitatively but not quantitatively.
• Different environmental effects may lead to different experimental biases.

Some groups of biomarkers
Figure: A cluster of 25 proteins included ten proteins with greater than 1.5-fold change at 36 hours of recovery (Panel A). These proteins participate in different cellular processes (Panel B).

Case Study 2: Enhancing Prostate Cancer Diagnosis
• Current approach: diagnosis using Prostate Specific Antigen (PSA) has several limitations:
  • Lack of specificity
  • Cancer missed in biopsy-negative patients
  • Inability to distinguish aggressive from latent prostate cancer
• Our approach: finding proteomic biomarkers for prostate cancer that can be used to improve the specificity of the current diagnostic technique.
  • Can take advantage of the field effect of the tumor
  • Can detect malignancy-associated changes in normal cells
  • Can discover novel proteins that are altered by cancer and/or related to Gleason grade

• Frozen tissue blocks of 7 prostates were collected.
• Identified four tissue areas of interest:
  1. Cancer (Ca)
  2. Benign close to cancer (BN)
  3. Benign distant to cancer (BD)
  4. Benign prostatic hyperplasia (BPH)

Mass Spec analysis
• We performed 2D liquid chromatography.
• 8-plex iTRAQ labeling scheme.
• Peptide identification through tandem mass spectrometry (MS/MS).
• ProteinPilot 3 software was used for protein quantification with p-value < 0.05 (FDR corrected).

Methods
Challenges:
• Somewhat different proteins are identified in different iTRAQ runs
• Data contain relative abundances rather than absolute values
• Many pairs of comparisons, e.g., BPH vs. Ca, BPH vs. BN, etc.
• Extremely low sample size (n << p)
• Missing values for most of the proteins
Method:
• Biomarker discovery was conducted for each pair of the four regions
• Data were normalized to reduce sample variation
• Both parametric (one-sided t-test) and non-parametric (signed-rank) hypothesis tests were used
• Multiple hypothesis testing was corrected using FDR, with p-value < 0.05
• Missing values were imputed using the K-nearest-neighbor imputation algorithm
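Two of the steps listed above, K-nearest-neighbor imputation and FDR correction, can be sketched in plain Python. This is an illustration under stated assumptions, not the analysis actually run: the slide does not say which FDR procedure or which KNN variant was used, so the Benjamini-Hochberg adjustment, the row-wise (protein-wise) neighbor search, the function names, and all example values are assumptions.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.
    Rows are assumed to be proteins and columns samples."""
    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Only rows with column j observed can donate a value.
            candidates = sorted(
                (distance(row, other), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )
            nearest = [rows[idx][j] for d, idx in candidates[:k] if d < math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up log-ratios: four proteins (rows) in three samples (columns).
data = [
    [1.0, 1.2, None],
    [1.1, 1.3, 1.4],
    [0.9, 1.1, 1.2],
    [3.0, 3.2, 3.5],
]
imputed = knn_impute(data, k=2)

# Made-up raw p-values, one per protein.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
print(imputed[0], adj_p, significant)
```

With very few samples per protein, the neighbor distances rest on only a handful of shared columns, so imputed values should be treated with caution.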

Results
• [Figure: BN, Ca, BD, and BPH tissue regions]
• Cardiovascular System Development and Function, Organism Development, Tissue Morphology

Summary on iTRAQ data analysis
• Data quality is not great.
  • Variability among different runs.
  • Some proteins are inherently abundant. (Wang et al. Proteomics 09)
• Extremely low sample size.
  • Statistical power is low.
  • Finding interaction between proteins is hard.
• Contains only relative abundance rather than absolute abundance.
• Prior knowledge about the pathway from other sources can be incorporated.

Acknowledgement
Michael Wilson, Chris Wendt, Pratik D. Jagtap, LeeAnn Higgins, Lorraine Anderson, Maneesh Bhargava, Trisha L. Becker, Gaurav Pandey

References
• Oberg, A. et al. Statistical design of quantitative mass spectrometry-based proteomic experiments. J Proteome Res 2009.
• Roy, P. et al. Protein mass spectra data analysis for clinical biomarker discovery: a global review. Briefings in Bioinformatics 2010.
• de Jong, E.P. et al. Quantitative proteomics reveals myosin and actin as promising saliva biomarkers for distinguishing pre-malignant and malignant oral lesions. PLoS ONE 2010.
• Hill, E.G. et al. A statistical model for iTRAQ data analysis. J Proteome Res 2008.
• Liu, J. et al. Bayesian mass spectra peak alignment from mass charge ratios. Cancer Informatics 2008.
• Barla, A. et al. Machine learning methods for predictive proteomics. Briefings in Bioinformatics 2008.
• Wang et al. Generally detected proteins in comparative proteomics - A matter of cellular stress response? Proteomics 2008.

Questions/Comments Thanks!
