Robust microarray experiments by design: a multiphase framework


Presentation Transcript


  1. Robust microarray experiments by design: a multiphase framework Chris Brien Phenomics & Bioinformatics Research Centre, University of South Australia http://chris.brien.name/multitier Chris.brien@unisa.edu.au

  2. Outline • Phases in microarray experiments. • Designing and analysing multiphase microarray experiments. • Randomization to avoid bias; • Minimizing variability. • Conclusions

  3. 1. Phases in microarray technologies • Single-channel oligo and two-channel spotted technologies: similar multi-step processes. • Figures by Jiang Long (from The Science Creative Quarterly, URL: http://www.scq.ubc.ca/image-bank/).

  4. Speed et al.’s (INI, 2008) main phases in ‘omics studies. • Workshoppers were challenged to consider how design might be done in these phases.

  5. Taking up Speed et al.’s (2008) challenge: a multiphase framework • The multiphase framework is based on Brien et al. (2011). • They define a phase to be the period of time during which a set of units are engaged in producing a particular outcome. • The outcome can be material for processing in the next phase, or values for response variables, or both. • Only the final phase need have a response variable. • Also, one phase might overlap another phase. • Then, multiphase experiments consist of two or more such phases. • Generally, multiphase experiments randomize (randomly allocate, not randomly sample as asserted by some) the outcomes from one phase to the next phase. • For microarray experiments, while the need for randomization is often mentioned in general terms, how to actually deploy it is little discussed.

  6. Physical phases in a microarray experiment • One list of phases, and their outcomes, that commonly occurs (phase → outcome): Selection or production of biological material → Organisms/Tissues; Sample acquisition & storage → Samples; RNA extraction → Extracts; Labelling → Aliquots; Array production or purchase → Arrays; Hybridization, incl. post-washing → Hybridized arrays; Scanning → Measurements. • Potentially a designed experiment at every phase; this is not usually considered. • Two-phase design (McIntyre, 1955; Kerr, 2003): • 1st phase can be an experiment or an observational (epidemiological) study; • 2nd phase is a laboratory phase (Brien et al., 2011). • Multiple randomizations (Brien & Bailey, 2006) in a seven-phase process.

  7. A sources-of-variability Affymetrix microarray experiment • A modified version of a study to examine the variability of labelling and hybridization described by Zakharkin et al. (2005): • It involved 8 rats; • The phases (phase → outcome, per rat) are: Production of biological material → Rat; Sample acquisition & storage → Mammary gland sample; RNA extraction → Extract (extract halved for labelling); Labelling → 2 Aliquots (each aliquot halved for hybridization); Array purchase → 4 Affymetrix arrays; Hybridization, incl. post-washing → 4 Hybridized arrays; Scanning → Measurements.

  8. Single-set description of the experiment [Diagram: phases and outcomes of the experiment, as on slide 7.] • Need to identify a single set of factors that uniquely indexes the 32 observations: • {8 Rats, 2 Labellings, 2 Hybridizations} (others are possible) • The factors are nested so the ANOVA is that for a doubly-nested study. • Zakharkin et al. (2005) model: $Y = X\mu + Z_R u_R + Z_{L[R]} u_{L[R]} + e$ • Or, symbolically: Grand mean | Rats + Rats∧Labellings + Error, where terms to the left of ‘|’ are fixed and those to the right are random.
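As a rough illustration of the single-set description, the following Python sketch enumerates the 32 index combinations and the degrees of freedom of the corresponding balanced doubly-nested ANOVA. The variable names and the bracket notation for nesting are illustrative choices, not taken from the slides.

```python
from itertools import product

# Sizes taken from the slide: 8 rats, 2 labellings per rat,
# 2 hybridizations per labelling.
N_RATS, N_LABELLINGS, N_HYBRIDIZATIONS = 8, 2, 2

# Each observation is indexed by one (rat, labelling, hybridization) triple.
observations = list(product(range(1, N_RATS + 1),
                            range(1, N_LABELLINGS + 1),
                            range(1, N_HYBRIDIZATIONS + 1)))

# Degrees of freedom for a balanced doubly-nested structure.
df = {
    "Rats": N_RATS - 1,
    "Labellings[Rats]": N_RATS * (N_LABELLINGS - 1),
    "Hybridizations[Rats, Labellings]":
        N_RATS * N_LABELLINGS * (N_HYBRIDIZATIONS - 1),
}

print(len(observations))  # number of uniquely indexed observations
print(df)                 # df for each nested source
print(sum(df.values()))   # total df about the grand mean
```

The three sources account for all the degrees of freedom about the grand mean, which is why this single set of factors is sufficient to index the data.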

  9. 2. Designing and analysing multiphase microarray experiments Approach to be taken: • Identify the set of phases involved. • Consider how the outcome of one phase is to be assigned to the units in the next phase. • Produce factor-allocation description (Brien et al., 2011). • Formulate the full mixed model. • Derive the ANOVA table and use it to • investigate design; and • obtain a mixed model of convenience.

  10. 2(a) Randomization to avoid bias • Illustrate using the example experiment. • We have the set of phases involved [diagram as on slide 7]. • But, how is the outcome of one phase to be assigned to the units in the next phase? • We agree on the need to randomize to avoid bias. • Suppose we take the simplest option: completely randomize every phase.

  11. Sampling randomization
      Sampling  Rat
      1         8
      2         6
      3         5
      4         4
      5         1
      6         2
      7         7
      8         3

  12. Extraction randomization
      Extract  Sampling  Rat
      1        4         4
      2        1         8
      3        2         6
      4        6         2
      5        5         1
      6        7         7
      7        3         5
      8        8         3

  13. Expansion to HalfExtracts
      Extract  HalfExtr  Sampling  Rat
      1        1         4         4
      1        2         4         4
      2        1         1         8
      2        2         1         8
      3        1         2         6
      3        2         2         6
      4        1         6         2
      4        2         6         2
      5        1         5         1
      5        2         5         1
      6        1         7         7
      6        2         7         7
      7        1         3         5
      7        2         3         5
      8        1         8         3
      8        2         8         3

  14. Labelling randomization
      Labelling  Extract  HalfExtr  Sampling  Rat
      1          8        2         8         3
      2          5        1         5         1
      3          5        2         5         1
      4          6        1         7         7
      5          7        1         3         5
      6          6        2         7         7
      7          4        2         6         2
      8          1        2         4         4
      9          3        2         2         6
      10         8        1         8         3
      11         4        1         6         2
      12         2        1         1         8
      13         7        2         3         5
      14         2        2         1         8
      15         1        1         4         4
      16         3        1         2         6

  15. and eventually .......

  16. Scan randomization (1 – 24)
      Scan  HybArray  Chip  Labelling  HalfAliq  Extract  HalfExtr  Sampling  Rat
      1     12        5     1          2         8        2         8         3
      2     21        14    3          1         5        2         5         1
      3     27        10    8          1         1        2         4         4
      4     24        15    14         1         2        2         1         8
      5     29        21    4          2         6        1         7         7
      6     13        7     10         2         8        1         8         3
      7     19        22    8          2         1        2         4         4
      8     6         9     1          1         8        2         8         3
      9     8         2     12         1         2        1         1         8
      10    5         28    2          2         5        1         5         1
      11    26        13    9          1         3        2         2         6
      12    20        24    16         2         3        1         2         6
      13    15        3     5          2         7        1         3         5
      14    22        19    7          1         4        2         6         2
      15    3         1     15         2         1        1         4         4
      16    30        23    12         2         2        1         1         8
      17    25        32    15         1         1        1         4         4
      18    28        26    4          1         6        1         7         7
      19    1         31    6          2         6        2         7         7
      20    14        4     11         2         4        1         6         2
      21    18        8     14         2         2        2         1         8
      22    9         12    16         1         3        1         2         6
      23    23        27    3          2         5        2         5         1
      24    11        17    5          1         7        1         3         5
      25    2         20    7          2         4        2         6         2
      26    10        29    9          2         3        2         2         6
      27    32        6     13         2         7        2         3         5
      28    7         25    11         1         4        1         6         2
      29    4         18    6          1         6        2         7         7
      30    31        30    10         1         8        1         8         3
      31    16        16    2          1         5        1         5         1
      32    17        11    13         1         7        2         3         5
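The chain of complete randomizations that leads to a layout like the one tabulated above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function names, the seed and the tuple layout are hypothetical, and the slides do not say what software produced the original tables.

```python
import random

def random_order(n, rng):
    """Return the integers 1..n in a random order (a complete randomization)."""
    order = list(range(1, n + 1))
    rng.shuffle(order)
    return order

def completely_randomized_layout(seed=1):
    """Randomize every phase in turn and return one row per scan."""
    rng = random.Random(seed)  # the seed is arbitrary; fix it for reproducibility

    rat_for_sampling = random_order(8, rng)      # position = sampling order
    sampling_for_extract = random_order(8, rng)  # position = extraction order

    # Each extract is halved; the 16 half-extracts are randomized to labellings.
    half_extracts = [(e, h) for e in range(1, 9) for h in (1, 2)]
    rng.shuffle(half_extracts)                   # position = labelling order

    # Each labelled aliquot is halved; the 32 half-aliquots go to chips at random.
    half_aliquots = [(lab, a) for lab in range(1, 17) for a in (1, 2)]
    rng.shuffle(half_aliquots)                   # position = chip number

    chip_for_hyb = random_order(32, rng)         # position = hybridization order
    hyb_for_scan = random_order(32, rng)         # position = scan order

    rows = []
    for scan in range(1, 33):
        hyb = hyb_for_scan[scan - 1]
        chip = chip_for_hyb[hyb - 1]
        labelling, half_aliq = half_aliquots[chip - 1]
        extract, half_extr = half_extracts[labelling - 1]
        sampling = sampling_for_extract[extract - 1]
        rat = rat_for_sampling[sampling - 1]
        rows.append((scan, hyb, chip, labelling, half_aliq,
                     extract, half_extr, sampling, rat))
    return rows

layout = completely_randomized_layout()
print("Scan HybArray Chip Labelling HalfAliq Extract HalfExtr Sampling Rat")
for row in layout[:4]:
    print(*row)
```

Each shuffle corresponds to one randomization between consecutive phases, so the final table is obtained by following the allocations back from scan to rat.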

  17. Factor-allocation description of experiment • Factor-allocation diagrams have a panel for a set of objects (here the set of outcomes of a phase); a panel lists the factors indexing a set. • An arrow indicates a randomization (Brien et al., 2011). • Panels, as set of objects (factors): 8 rats (8 Rats); 8 samples (8 Samplings); 16 half-extracts (8 Extractions, 2 HalfExtracts in E); 32 half-aliquots (16 Labellings, 2 AliquotHalves in L); 32 arrays (32 Chips); 32 hybridized arrays (32 Hybridizations); 32 scans (32 Scans).

  18. Mixed model for the experiment [Factor-allocation diagram as on slide 17.] • To get mixed model use Brien & Demétrio’s (2009) method: • In each panel, form terms as all combinations of the factors, subject to nesting restrictions; • For each term from each panel, add to either fixed or random model. • Mixed model is: • Grand mean | Rats + Samplings + Extractions + Extractions∧HalfExtracts + Labellings + Labellings∧AliquotHalves + Chips + Hybridizations + Scans. • Not all terms are estimable: use ANOVA to show this.
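The term-forming step of the method can be sketched in Python. This is a simplified illustration, not Brien & Demétrio's own implementation: the `PANELS` dictionary, the `panel_terms` helper and the use of "∧" to join factor names are assumptions made here, and every term is simply placed in the random model, as in the model on this slide.

```python
from itertools import combinations

# Each panel maps a factor to the factors it is nested within
# (an empty list means the factor is not nested).
PANELS = {
    "rats": {"Rats": []},
    "samples": {"Samplings": []},
    "half-extracts": {"Extractions": [], "HalfExtracts": ["Extractions"]},
    "half-aliquots": {"Labellings": [], "AliquotHalves": ["Labellings"]},
    "arrays": {"Chips": []},
    "hybridized arrays": {"Hybridizations": []},
    "scans": {"Scans": []},
}

def panel_terms(factors):
    """All combinations of a panel's factors that respect the nesting restrictions."""
    names = list(factors)
    terms = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # A nested factor may only appear together with all its nesting factors.
            if all(set(factors[f]) <= set(combo) for f in combo):
                terms.append("∧".join(combo))
    return terms

random_terms = [t for factors in PANELS.values() for t in panel_terms(factors)]
print("Grand mean | " + " + ".join(random_terms))
```

The same helper handles deeper nesting, for example a panel with Batches, Labellings nested in Batches, and AliquotHalves nested in both.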

  19. ANOVA for the experiment [Factor-allocation diagram as on slide 17; the ANOVA table is not reproduced in this transcript.]

  20. ANOVA for the experiment (cont’d) • Shows that we can measure variability from: • Rats + Samplings + Extractions (biological replication); • Extractions∧HalfExtracts + Labellings; • Labellings∧AliquotHalves + Chips + Hybridizations + Scans. • The last is referred to as ‘Error’ by Zakharkin et al. (2005); the factor-allocation description shows that it comprises several sources. • The aim is a model in which all potential variability sources are identified, not one in which all are separately estimable. • Mixed model of convenience for fitting: • Grand mean | Rats + Extractions∧HalfExtracts + Labellings∧AliquotHalves; • This model is equivalent to that of Zakharkin et al. (2005).
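The confounding can be made explicit through the expected mean squares of the balanced doubly-nested ANOVA. The following is a sketch in notation introduced here (a variance component $\sigma^2_X$ for each term $X$); the coefficients follow from having 8 rats, 2 labellings per rat and 2 hybridizations per labelling, with terms that index the same grouping of the 32 observations pooled together.

```latex
\begin{align*}
E[\mathrm{MS}_{\text{Rats}}] &= \sigma^2_{\text{hyb}} + 2\,\sigma^2_{\text{lab}} + 4\,\sigma^2_{\text{bio}} && (7 \text{ df}) \\
E[\mathrm{MS}_{\text{Labellings within Rats}}] &= \sigma^2_{\text{hyb}} + 2\,\sigma^2_{\text{lab}} && (8 \text{ df}) \\
E[\mathrm{MS}_{\text{Residual}}] &= \sigma^2_{\text{hyb}} && (16 \text{ df})
\end{align*}
where
\begin{align*}
\sigma^2_{\text{bio}} &= \sigma^2_{\text{Rats}} + \sigma^2_{\text{Samplings}} + \sigma^2_{\text{Extractions}}, \\
\sigma^2_{\text{lab}} &= \sigma^2_{\text{Extractions} \wedge \text{HalfExtracts}} + \sigma^2_{\text{Labellings}}, \\
\sigma^2_{\text{hyb}} &= \sigma^2_{\text{Labellings} \wedge \text{AliquotHalves}} + \sigma^2_{\text{Chips}} + \sigma^2_{\text{Hybridizations}} + \sigma^2_{\text{Scans}}.
\end{align*}
```

Only the three sums are estimable from the data, which is why the model of convenience keeps a single term for each of them.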

  21. 2(b) Minimizing variability It is desirable to minimize variability so that the variance of treatment estimates is as small as possible. What are the sources of variability in microarray experiments?

  22. Sources of variability • Three basic types of variability (Novak et al., 2002): • Technical: arising from processing a particular RNA source; • Physiological: cell differences arising from macroscopically identical conditions (replicate cell cultures); • Sampling: arising from different biological material (different tissue sections, organs, organisms). • The latter two are specific types of biological variability as they involve different RNA extractions. • Conclusion: var(technical)  var(physiological) < var(sampling)

  23. Sources of technical variability • var(hybridization) > var(labelling): need to optimize hybridization. • var(hybridization + scanning) > var(RNA extraction). • For spotted microarrays, var(array production) >? var(labelling + hybridization). • Clearly some parts of the microarray production process are more variable than others.

  24. Three possibilities for minimizing technical variability • As usual, three possibilities for this: • Stringent experimental protocols to reduce variability; • Experimental design to avoid variability; • Statistical analysis to adjust for variability.

  25. i) Stringent experimental protocols We have noted that the hybridization phase is often more variable than other phases. It is not surprising that Han et al. (2006) advocate the adoption of protocols that minimize variability in the hybridization phase.

  26. ii) Experimental design • The need to take account of batches is often stressed. • Indeed, it is the only aspect of processing order that is dealt with specifically.

  27. Batching • MAQC_Sample_Processing_Overview_SOP stipulates: • All 20 target preparations (4 RNAs X 5 replicates) should be processed together starting on the same day by a single person; • If all hybridizations and downstream wash processes cannot be performed together, the samples should be distributed evenly by replicates. • Spruill et al. (2002) state: • Performing hybridizations over multiple days increases the risk of introducing more variation at this level. • Some general discussion in textbooks: • Table 4.1 in Scherer (2009) makes it clear that batch effects can occur in each of the phases.

  28. Batching (cont’d) • Generally, look for batches built into the processes: • e.g. different acquisition times, processing days, operators, batches of reagent or sets of simultaneously-processed specimens. • When there are treatments, we will want to block them according to batches.

  29. Design with batches for the sources-of-variability example experiment • Suppose that: • For each rat, the tissue is to be obtained and the RNA extracted immediately; i.e. RNA-extraction order from samples is not random. • In the labelling phase, one half-extract from all extracts will be labelled in one batch, and the remaining ones in a second batch. • This separation of duplicates will yield a better estimate of the labelling variability; consecutively processed replicates are likely to be more similar than units processed over the entire course of an experiment. • This batching will be carried through to the hybridization phase, where washing occurs in batches of 16. Further, for practical reasons, 2 half-aliquots from an aliquot are hybridized consecutively. • In the scanning phase, the arrays will be scanned in a completely random order • other possibilities are: (i) in the same order as produced and (ii) batched.
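The labelling-phase batching described above might be generated along the following lines. This is a hedged Python sketch: the function name, the seed, and the assumption that the choice of half-extract per batch and the order within each batch are both randomized are illustrative choices, not details stated on the slide.

```python
import random

def labelling_batches(n_extracts=8, seed=29):
    """Put one half-extract of every extract in each of two labelling batches."""
    rng = random.Random(seed)  # arbitrary seed, fixed only for reproducibility

    assigned = []
    for extract in range(1, n_extracts + 1):
        halves = [1, 2]
        rng.shuffle(halves)  # assumption: which half goes to which batch is random
        for batch, half in enumerate(halves, start=1):
            assigned.append({"batch": batch, "extract": extract,
                             "half_extract": half})

    plan = []
    for batch in (1, 2):
        units = [u for u in assigned if u["batch"] == batch]
        rng.shuffle(units)   # assumption: labelling order within a batch is random
        for position, unit in enumerate(units, start=1):
            plan.append({**unit, "labelling": position})
    return plan

for unit in labelling_batches():
    print(unit["batch"], unit["labelling"], unit["extract"], unit["half_extract"])
```

Because every extract appears once in each batch, batch differences are separated from differences between extracts, in the manner of a complete block design.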

  30. Factor-allocation description • Panels, as set of objects (factors): 8 rats (8 Rats); 8 samples (8 Samplings); 16 half-extracts (8 Extractions, 2 HalfExtracts in E, 2 H1); 32 half-aliquots (2 Batches, 8 Labellings in B, 2 AliquotHalves in B,L); 32 arrays (32 Chips); 32 hybridized arrays (2 Occasions, 8 Periods in O, 2 Hybridizations in O,P); 32 scans (32 Scans). • Mixed model: • Grand mean | Rats + Samplings + Extractions + Extractions∧HalfExtracts + Batches + Batches∧Labellings + Batches∧Labellings∧AliquotHalves + Occasions + Occasions∧Periods + Occasions∧Periods∧Hybridizations + Chips + Scans • However, this model will not fit because terms are confounded. • Will use the ANOVA table to decide on a model that will fit.

  31. ANOVA & mixed model of convenience [Factor-allocation diagram as on slide 30.] • The only real effect of the systematic assignment of samplings is that a systematic trend in the extractions would reinforce one in the samplings. • Mixed model of convenience: • Grand mean | Rats + Batches + Batches∧Labellings + Batches∧Labellings∧AliquotHalves

  32. 3. Conclusions: Pros and cons of the approach • Cons • Requires extra planning and work to organize: “I randomized the rats at the start. Isn’t this enough?” • A lot of needless redundancy: “Such a lot of factors!” • Pros • Encourages consideration of appropriate design in all phases, even if ultimately it is decided not to randomize all phases. • As for all experiments, in a microarray experiment, randomizing in a phase makes it robust to systematic biases in that phase. • Often processing order is not considered, perhaps on the assumption that there is no systematic change during a phase or while processing a batch. • But is this tenable? Is it OK to process material from the same rat first in every phase, during operator or equipment warm-up in a phase? Randomization is insurance. • Blocking, based on batches, assists in minimizing variability. • Promotes the identification of all the sources of variability at play in a microarray experiment, even if not all are estimable.

  33. Overall summary • Microarray experiments are multiphase: • One might employ an experimental design in every phase to randomize and block the processing order in the current phase. • Factor-allocation description can be used to formulate the analysis for an experiment, this analysis including terms and sources from every phase. • The multiphase framework is flexible in that it can easily be adapted to another set of phases, including new technologies.

  34. References • Brien, C.J. and Bailey, R.A. (2006) Multiple randomizations (with discussion). J. Roy. Statist. Soc., Ser. B, 68, 571–609. • Brien, C.J. and Demétrio, C.G.B. (2009) Formulating mixed models for experiments, including longitudinal experiments. J. Agr. Biol. Env. Stat., 14, 253–80. • Brien, C.J., Harch, B.D., Correll, R.L. and Bailey, R.A. (2011) Multiphase experiments with at least one later laboratory phase. I. Orthogonal designs. J. Agr. Biol. Env. Stat., available online. • Han, T., et al. (2006) Improvement in the reproducibility and accuracy of DNA microarray quantification by optimizing hybridization conditions. BMC Bioinformatics, 7, S17–S29. • Kerr, M.K. (2003) Design considerations for efficient and effective microarray studies. Biometrics, 59(4), 822–828. • MAQC Consortium (2009) MAQC Sample Processing Overview SOP, U.S. Food and Drug Administration. • McIntyre, G.A. (1955) Design and analysis of two phase experiments. Biometrics, 11, 324–334. • Novak, J.P., Sladek, R. and Hudson, T.J. (2002) Characterization of variability in large-scale gene expression data: implications for study design. Genomics, 79, 104–113.

  35. References (cont’d) • Scherer, A. (2009) Batch effects and noise in microarray experiments: sources and solutions, in Wiley Series in Probability and Statistics. Wiley-Blackwell: Oxford. • Speed, T.P. and Yang, J.Y.H. with Smyth, G. (2008) Experimental design in genomics, proteomics and metabolomics: an overview. Advanced Topics in Design of Experiments. Workshop held at INI, Cambridge, U.K. • Spruill, S.E., et al. (2002) Assessing sources of variability in microarray gene expression data. Biotechniques, 33(4), 916–20, 922–3. • Tu, Y., Stolovitzky, G. and Klein, U. (2002) Quantitative noise analysis for gene expression microarray experiments. Proceedings of the National Academy of Sciences, 99(22), 14031–14036. • Zakharkin, S.O., et al. (2005) Sources of variation in Affymetrix microarray experiments. BMC Bioinformatics, 6, 214-11. • Web address for Multitiered experiments site: http://chris.brien.name/multitier
