Advancing Statistical Analysis of Multiplexed MS/MS Quantitative Data with Scaffold Q+. Brian C. Searle and Mark Turner, Proteome Software Inc., Vancouver, Canada. ASMS 2012. Creative Commons Attribution.
[Figure: Reference and channels 114, 115, 116, 117]
[Figure: Ref and channels 114, 115, 116, 117, each shown twice] ANOVA: Oberg et al 2008 (doi:10.1021/pr700734f)
“High Quality” Data • Virtually no missing data • Symmetric distribution • High Kurtosis
“Normal Quality” Data • High Skew due to truncation • >20% of intensities are missing in this channel! • Either ignore channels with any missing data (0.8^4 ≈ 41%) …
“Normal Quality” Data …Or deal with very non-Gaussian data!
Contents • A Simple, Non-parametric Normalization Model • Refinement 1: Intelligent Intensity Weighting • Refinement 2: Standard Deviation Estimation • Refinement 3: Kernel Density Estimation • Refinement 4: Permutation Testing
Additive Effects on Log Scale • Experiment: sample handling effects across MS acquisitions (LC and MS variation, calibration etc) • Sample: sample handling effects between channels (pipetting errors, etc) • Peptide: ionization effects • Error: variation due to imprecise measurements Oberg et al 2008 (doi:10.1021/pr700734f)
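The four effects listed above can be written as one additive model on the log scale. The symbols below are illustrative labels chosen for this sketch, not notation taken from the slides or from Oberg et al; the indices e, s and p stand for experiment, sample (channel) and peptide.

```latex
\log_2 y_{e,s,p} \;=\; E_{e} \;+\; S_{s} \;+\; P_{p} \;+\; \varepsilon_{e,s,p}
```

Here y is a measured reporter intensity, E, S and P are the experiment, sample and peptide effects, and ε is the measurement error term. Normalization amounts to estimating and subtracting E, S and P.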
Median Polish “Non-Parametric ANOVA” Remove Inter-Experiment Effects Remove Intra-Sample Effects 3x Remove Peptide Effects
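As a rough illustration of the idea, here is a minimal sketch of Tukey's two-way median polish in plain Python. It assumes a rectangular table of log intensities with rows as spectra (peptide effects) and columns as channels (sample effects); the function name `median_polish` and the table layout are assumptions for this example, and the three-stage scheme on the slide (experiment, sample, peptide) is not reproduced in full.

```python
from statistics import median


def median_polish(table, n_iter=3):
    """Two-way median polish on a rectangular list of lists.

    Returns (overall, row_effects, col_effects, residuals) such that
    table[r][c] == overall + row_effects[r] + col_effects[c] + residuals[r][c].
    """
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]  # working copy of the data
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(n_iter):  # the slide repeats the sweep 3 times
        # Sweep the median out of every row (e.g. peptide effects).
        for r in range(n_rows):
            m = median(resid[r])
            row_eff[r] += m
            resid[r] = [v - m for v in resid[r]]
        # Keep column effects centered; move their median to the overall term.
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]

        # Sweep the median out of every column (e.g. channel effects).
        for c in range(n_cols):
            m = median(resid[r][c] for r in range(n_rows))
            col_eff[c] += m
            for r in range(n_rows):
                resid[r][c] -= m
        # Keep row effects centered as well.
        m = median(row_eff)
        overall += m
        row_eff = [x - m for x in row_eff]

    return overall, row_eff, col_eff, resid


# Usage: a perfectly additive table leaves residuals of zero.
example = [[10 + r + c for c in (-0.5, 0.0, 0.5)] for r in (-1.0, 0.0, 1.0)]
overall, row_eff, col_eff, resid = median_polish(example)
```

Because only medians are used, a few outlying intensities do not drag the fitted effects the way means would in a classical ANOVA fit.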
Linear Intensity Weighting Low Intensity, Low Weight High Intensity, High Weight
Desired Intensity Weighting Most Data, High Weight Saturated Data, Decreased Weight Low Intensity, Low Weight
Estimate Confidence from Protein Deviation • Pij = 2 * cumulative t-distribution(tij), where: i = raw intensity bin; j = each spectrum in bin i; [symbol missing in source] = protein median for spectrum j; tij = [formula missing in source] • Pi = [formula missing in source]
Data Dependent Intensity Weighting Most Data, High Weight Saturated Data, Decreased Weight Low Intensity, Low Weight
Desired Intensity Weighting Most Data, High Weight Saturated Data, Decreased Weight Low Intensity, Low Weight
Data Dependent Intensity Weighting Most Data, High Weight Low Intensity, Low Weight
Algorithm Schematic Remove Inter-Experiment Effects Remove Intra-Sample Effects Data Dependent Intensity Weighting 3x Remove Peptide Effects
Standard Deviation Estimation • i = intensity bin • j = each spectrum in bin i • [symbol missing in source] = protein median for spectrum j
Algorithm Schematic Remove Inter-Experiment Effects Remove Intra-Sample Effects Data Dependent Intensity Weighting 3x Remove Peptide Effects Data Dependent Standard Dev Estimation
Kernel Density Estimation 0.3 shift on Log2 Scale Deviation that shifts distribution
Improved Kernels • We have a better estimate for Pi: the intensity-based weight! • We have a better estimate for Stdevi: the intensity-based standard deviation!
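One way to picture the improved kernels is a weighted mixture in which each spectrum contributes one kernel, scaled by its intensity-based weight and widened by its intensity-based standard deviation. The sketch below assumes Gaussian kernels, which the slides do not state; `weighted_kde` and its parameter names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_kde(x, centers, weights, sds):
    """Weighted kernel density estimate at x.

    centers: log2 ratios, one per spectrum
    weights: intensity-based weight for each spectrum (assumed non-negative)
    sds:     intensity-based standard deviation for each spectrum
    """
    total = sum(weights)
    mixture = sum(w * gaussian_pdf(x, c, s)
                  for c, w, s in zip(centers, weights, sds))
    return mixture / total  # normalize so the density integrates to 1


# Usage: the low-weight, wide kernel at 1.0 barely moves the estimate.
centers = [0.0, 0.3, 1.0]
weights = [1.0, 0.8, 0.1]
sds = [0.2, 0.2, 0.5]
density_at_zero = weighted_kde(0.0, centers, weights, sds)
density_at_one = weighted_kde(1.0, centers, weights, sds)
```

In this form a single low-confidence spectrum with a large deviation adds only a small, broad bump instead of shifting the whole distribution.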
Improved Kernel Density Estimation Significant Deviation Worth Investigating Unimportant Deviation
Improved Kernel Density Estimation 1.0 shift on Log2 Scale = 2 Fold Change
Why Use Permutation Testing? • Why go through all this work to just use a t-test or ANOVA? • Rank-based Mann-Whitney and Kruskal-Wallis tests “work”, but lack power
Basic Permutation Test T=4.84
Basic Permutation Test T=4.84 T=1.49
Basic Permutation Test x1000 T=4.84 T=1.49 T=1.34 T=1.14
Basic Permutation Test 950 below 50 above
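The basic permutation test built up over the preceding slides can be sketched as follows. This is a generic two-group version, assuming a Welch t-statistic and a two-sided comparison; the slides do not say which t-statistic is used, and `welch_t` and `permutation_p_value` are names invented for this example.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs at least 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Fraction of label shuffles whose |t| is at least the observed |t|."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    above = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        t = abs(welch_t(pooled[:len(a)], pooled[len(a):]))
        if t >= observed:
            above += 1
    return above / n_perm


# Usage with two clearly separated groups of log2 ratios (made-up numbers).
group_a = [1.0, 1.2, 0.9, 1.1, 1.3]
group_b = [0.1, 0.0, 0.2, -0.1, 0.15]
p = permutation_p_value(group_a, group_b)
```

With the counts on the slide, 50 of 1000 shuffled statistics above the observed one corresponds to a p-value of 50 / 1000 = 0.05. The resolution is limited to 1 / n_perm, so a result of 0 only means "smaller than one in n_perm".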
Improvements… • N is frequently very small • Instead of randomizing N points, randomly select N points from Kernel Densities • Expensive! What if you want more precision?
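Drawing points from kernel densities, rather than reshuffling the N observed points, could look like the sketch below. It assumes Gaussian kernels described by a center, a weight and a standard deviation per spectrum; how the slides combine these draws with the test statistic is not stated, so only the sampling step is shown, and `sample_from_kernels` is a hypothetical name.

```python
import random


def sample_from_kernels(centers, weights, sds, n, rng):
    """Draw n values from a weighted mixture of Gaussian kernels.

    Each draw first picks a kernel with probability proportional to its
    weight, then samples from that kernel's normal distribution.
    """
    picks = rng.choices(range(len(centers)), weights=weights, k=n)
    return [rng.gauss(centers[i], sds[i]) for i in picks]


# Usage: simulate a pseudo-sample of N = 4 values from three kernels.
rng = random.Random(42)  # fixed seed for reproducibility
draws = sample_from_kernels([0.0, 0.3, 1.0], [1.0, 0.8, 0.1],
                            [0.2, 0.2, 0.5], n=4, rng=rng)
```

Each simulated sample requires fresh random draws for every permutation, which is where the cost noted on the slide comes from.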
Extrapolating Precision 1000 below 0 above Actual T-Statistic of 6.6? Last Usable Permutation
Extrapolating Precision Actual T-Statistic of 6.6? Knijnenburg, et al 2011 (doi:10.1186/1471-2105-12-411)