Cross-site and Cross-platform Concordance of Microarray Analysis Improved by Variance Stabilization • Pan Du, Simon Lin • Robert H. Lurie Comprehensive Cancer Center
Outline • Why Variance Stabilization? • How to Stabilize Variance? • Illumina • Affymetrix • Does it work?
Introduction to Microarray Studies • Quality control studies: the same sample (A vs. A) measured on array x and array y • Biomedical applications: normal vs. cancer samples measured on array x and array y (Johnson and Lin, Nature 411:885, 2001)
Evaluation criterion of reproducibility: Concordance • Lab A produces gene list A, Lab B produces gene list B: anything in common? • [Figure: % in common (up to 100) vs. number of genes selected; higher is better, 100% is ideal] • FDA-led Quality Control Study • cross-time • cross-site • cross-platform (Tong et al., Nature Biotech 24:1132, 2006)
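The concordance criterion above can be sketched in a few lines. This is a minimal illustration, assuming concordance is defined as the percentage of genes shared by the top-n entries of two ranked gene lists; the function name `concordance` and the gene lists are hypothetical and not taken from the slides.

```python
def concordance(ranked_a, ranked_b, n):
    """Percentage of genes shared by the top-n entries of two ranked gene lists."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_a = set(ranked_a[:n])
    top_b = set(ranked_b[:n])
    return 100.0 * len(top_a & top_b) / n


# Hypothetical ranked lists (most significant gene first) from two labs.
lab_a = ["TP53", "MYC", "EGFR", "BRCA1", "KRAS", "PTEN"]
lab_b = ["MYC", "TP53", "KRAS", "CDK4", "EGFR", "RB1"]

# Concordance as a function of the number of genes selected.
for n in (2, 4, 6):
    print(n, round(concordance(lab_a, lab_b, n), 1))
```

Plotting this value against n gives the kind of curve summarized on the slide: a curve closer to 100% means the two labs agree on more of their selected genes.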
General Microarray Analysis Procedure • Sample preparation • Microarray experiment and data collection • Background adjustment • Transformation (log2) • Normalization • Gene identification
Why Variance Stabilization? • [Figure: x-y plots and mean-variance plots compared for raw x, log2(x), log2(x + offset), and the ideal case]
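Why do we care? • A general assumption of statistical tests applied to microarray data: variance is independent of intensity • Gene A: 7 (normal) → 8 (cancer) • Gene B: 13 (normal) → 14 (cancer) • Both genes change by one unit, but the two changes can be judged on the same scale only if the variance is the same at both intensities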
Variance Stabilization: the model • A mathematical model of microarray hybridization (Rocke and Durbin, Bioinformatics 19:996, 2003)
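The model formula itself is not preserved in this transcript. The two-component error model usually associated with Rocke and Durbin is commonly written as y = alpha + mu * exp(eta) + epsilon, with background alpha, true signal mu, multiplicative error eta, and additive error epsilon, both normally distributed with mean zero. The sketch below simulates that form as an assumption; the function name and all parameter values are illustrative, not taken from the slides.

```python
import math
import random
import statistics


def simulate_intensity(mu, rng, alpha=50.0, sd_eta=0.15, sd_eps=20.0):
    """Draw one measurement y = alpha + mu * exp(eta) + eps (assumed model form)."""
    eta = rng.gauss(0.0, sd_eta)  # multiplicative error, dominates at high intensity
    eps = rng.gauss(0.0, sd_eps)  # additive error, dominates near background
    return alpha + mu * math.exp(eta) + eps


rng = random.Random(0)
for mu in (0.0, 100.0, 1000.0, 10000.0):
    ys = [simulate_intensity(mu, rng) for _ in range(5000)]
    print(f"mu={mu:>8.0f}  mean={statistics.fmean(ys):10.1f}  sd={statistics.stdev(ys):8.1f}")
```

Under these assumptions the standard deviation stays near the additive noise level for unexpressed genes and grows roughly in proportion to the signal for bright ones, which is the mean-variance dependence that variance stabilization is meant to remove.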
Variance Stabilization: deriving h(y) • An asymptotic variance-stabilizing transformation h(y) can be derived from the relation between mean and variance (Tibshirani, JASA, 1988)
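The slide's formula is not preserved in this transcript. As a hedged reconstruction, the standard delta-method result is that integrating the reciprocal of the standard deviation function gives a transformation with approximately constant variance. The variance function v(u) and the constants c_1 > 0, c_2, c_3 > 0 below are assumptions introduced for illustration: a linear mean-to-standard-deviation trend plus a constant background variance.

```latex
h(y) = \int^{y} \frac{du}{\sqrt{v(u)}},
\qquad
v(u) = (c_1 u + c_2)^2 + c_3
\;\Longrightarrow\;
h(y) = \frac{1}{c_1}\,\operatorname{arcsinh}\!\left(\frac{c_1 y + c_2}{\sqrt{c_3}}\right) + \text{const}.
```

Differentiating confirms the result: h'(y) = 1 / \sqrt{(c_1 y + c_2)^2 + c_3} = 1 / \sqrt{v(y)}, so Var[h(y)] is approximately h'(y)^2 v(y) = 1 at every intensity.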
Huber’s Solution (2002) • VSN (Variance Stabilizing Normalization) • Estimate the mean and variance from a set of arrays • Assume most genes are not differentially expressed • Technically challenging because the normalization between arrays has to be considered • Practically challenging because usually only 2 to 6 arrays are available (Huber et al., Bioinformatics, 2002)
Illumina BeadArray Technology • More than 30 technical replicates of each probe are on each array • Beads are randomly assembled and held in microwells • Multiple arrays on the same slide • Cost: < $200
Variance Stabilizing Transformation (VST) • Fit the relation between mean and standard deviation • Relation between log2 and VST (arcsinh) (Lin, Du, Huber, and Kibbe, 2007)
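The two steps on this slide (fit the mean-to-standard-deviation relation, then apply an arcsinh transform) can be sketched on simulated bead-level data. This is a minimal sketch under stated assumptions, not the lumi implementation: the helper names `fit_mean_sd` and `vst`, the choice of a weighted linear fit on the brighter half of the probes, the estimate of the background variance c3 from the dimmest tenth, and every simulation parameter are hypothetical.

```python
import math
import random
import statistics


def fit_mean_sd(means, sds):
    """Weighted least-squares fit of sd = c1 * mean + c2, with weights 1 / mean**2."""
    w = [1.0 / m ** 2 for m in means]
    sw = sum(w)
    m_bar = sum(wi * m for wi, m in zip(w, means)) / sw
    s_bar = sum(wi * s for wi, s in zip(w, sds)) / sw
    sxx = sum(wi * (m - m_bar) ** 2 for wi, m in zip(w, means))
    sxy = sum(wi * (m - m_bar) * (s - s_bar) for wi, m, s in zip(w, means, sds))
    c1 = sxy / sxx
    return c1, s_bar - c1 * m_bar


def vst(y, c1, c2, c3):
    """arcsinh transform that stabilizes the variance v(u) = (c1*u + c2)**2 + c3."""
    return math.asinh((c1 * y + c2) / math.sqrt(c3)) / c1


# Simulated data: 250 probes, 30 bead replicates each (illustrative values only).
rng = random.Random(0)
alpha, sd_eta, sd_eps = 50.0, 0.15, 20.0
true_mu = [0.0] * 50 + [10 ** (1 + 3.3 * i / 199) for i in range(200)]
beads = [
    [alpha + mu * math.exp(rng.gauss(0.0, sd_eta)) + rng.gauss(0.0, sd_eps)
     for _ in range(30)]
    for mu in true_mu
]

probes = sorted(beads, key=statistics.fmean)  # dimmest probe first
means = [statistics.fmean(b) for b in probes]
sds = [statistics.stdev(b) for b in probes]

n = len(probes)
c3 = statistics.fmean([s ** 2 for s in sds[: n // 10]])    # background variance
c1, c2 = fit_mean_sd(means[n // 2:], sds[n // 2:])         # trend on bright probes

vst_sds = [statistics.stdev([vst(y, c1, c2, c3) for y in b]) for b in probes]

print(f"c1={c1:.3f}  c2={c2:.1f}  c3={c3:.0f}")
print("raw sd, dimmest vs brightest 25 probes:",
      round(statistics.fmean(sds[:25]), 1), round(statistics.fmean(sds[-25:]), 1))
print("VST sd, dimmest vs brightest 25 probes:",
      round(statistics.fmean(vst_sds[:25]), 2), round(statistics.fmean(vst_sds[-25:]), 2))
```

On the raw scale the brightest probes are far noisier than the dimmest ones; after the transform the two groups should have standard deviations of similar size. For large intensities arcsinh(x) behaves like log(2x), which is the sense in which VST and log2 agree at the bright end and differ near background.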
Evaluation Data Sets • Barnes data (Barnes, M., et al., 2005) • A dilution series (two replicates and six dilution ratios) of two human tissues: blood and placenta • MAQC-I (Shippy, R., et al., 2006) • A similar dilution series, run at multiple microarray facilities on both the Illumina and Affymetrix platforms
Cross-site concordance evaluation • MAQC data • VST improves the cross-site concordance
VST for Affymetrix • Hypothesis: VST also works for Affymetrix arrays • Treat each pixel as a technical replicate • Model the mean and variance the same way
Cross-platform: Affymetrix and Illumina • Evaluation procedure • Comparing samples C and D in the MAQC study • The probe IDs were first mapped to Entrez IDs • Legend notation • “Current”: RMA (Affymetrix), Log2+Quantile (Illumina) • “Improved”: VST+RMA (Affymetrix), VST+Quantile (Illumina)
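Both Illumina pipelines in the legend end with quantile normalization. The sketch below shows the basic idea on plain Python lists, assuming every array holds the same probes in the same order; the function name `quantile_normalize` and the example values are hypothetical, and ties are simply broken by probe order rather than averaged.

```python
def quantile_normalize(arrays):
    """Force every array (equal-length list of intensities) onto one shared distribution."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=a.__getitem__)  # probe indices, smallest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]               # replace value by reference quantile
        normalized.append(out)
    return normalized


# Three hypothetical arrays measuring the same four probes.
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
for row in quantile_normalize(arrays):
    print([round(v, 2) for v in row])
```

After normalization each array contains exactly the same set of values, and only the assignment of those values to probes differs between arrays.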
Bioconductor lumi package • The VST and related algorithms are included in the Bioconductor lumi package • Bioconductor: http://www.bioconductor.org
Acknowledgements • Robert H. Lurie Comprehensive Cancer Center, Northwestern University • Warren A. Kibbe and other members in the Bioinformatics group • Denise Scholtens, Biostatistics • European Bioinformatics Institute • Wolfgang Huber • The Walter and Eliza Hall Institute of Medical Research, Australia • Gordon Smyth