Optimizing Two-Color Microarray Reference RNA Designs

Two-Color Microarrays: Reference Designs and Reference RNAs. Kathleen Kerr, Department of Biostatistics, University of Washington. Collaborators: Kyle Serikawa, Mette Peters, Caimiao Wei, Roger Bumgarner

“Reference Design” “Loop Design”

Advantages: Reference Design • Simple; easy to execute • (Relatively) easy to analyze • If a “tonsil” RNA is used as the reference RNA in a reference design, then measurements on other RNAs can be considered to be measured in “tonsil” units

What goes here?

Some previous work on reference RNAs • Gorreta et al, Biotechniques, 2004. • He et al, Biotechniques, 2004. • Novoradovskaya et al, BMC Genomics, 2004. • All assert that a “good” reference RNA gives strong signal for all the genes on the array: most genes expressed “above background” in the reference.

This assertion is based on the conventional wisdom that signals “near background” are unreliable • (“Unreliable” may overstate the case) • Consider: the popular methods of data normalization all assume that co-hybridized RNAs are “not too different”

Method Validation • Ideally, methods should be evaluated in terms of how well they answer a scientific question of interest. • Analogous to using clinically relevant endpoints in clinical trials rather than surrogates. • The “proportion of spots above background” does not satisfy this ideal.

How to validate reference RNAs? • A better (though still not ideal) criterion is to evaluate whether a comparison of RNAs made through a reference matches the comparison that would have been achieved through direct comparison. • If the estimates agree well, then the results are not “reference-specific”
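The criterion above can be sketched in a few lines. This is a minimal illustration, not the analysis used in the study: the function names, the choice of Pearson correlation and mean absolute discrepancy as agreement summaries, and the five made-up log-ratios are all assumptions introduced here.

```python
import numpy as np

def indirect_log_ratio(a_vs_ref, b_vs_ref):
    """Log-ratio of A vs. B inferred through a shared reference.

    log2(A/Ref) - log2(B/Ref) = log2(A/B), so the reference cancels in principle.
    """
    return np.asarray(a_vs_ref, dtype=float) - np.asarray(b_vs_ref, dtype=float)

def concordance(indirect, direct):
    """Return (Pearson correlation, mean absolute discrepancy) between estimates."""
    indirect = np.asarray(indirect, dtype=float)
    direct = np.asarray(direct, dtype=float)
    r = np.corrcoef(indirect, direct)[0, 1]
    mean_abs_diff = np.mean(np.abs(indirect - direct))
    return r, mean_abs_diff

# Made-up log2 ratios for five genes (illustrative only, not data from the study).
a_vs_ref = np.array([1.0, -0.5, 2.0, 0.0, 0.3])   # test RNA A vs. reference
b_vs_ref = np.array([0.2, -0.4, 0.5, 0.1, -0.7])  # test RNA B vs. reference
direct = np.array([0.9, 0.0, 1.4, -0.2, 1.1])     # A vs. B on the same array

indirect = indirect_log_ratio(a_vs_ref, b_vs_ref)
r, mean_abs_diff = concordance(indirect, direct)
print(f"correlation = {r:.3f}, mean |indirect - direct| = {mean_abs_diff:.3f}")
```

High correlation and small discrepancy would suggest the reference-based comparison is not “reference-specific”; repeating the calculation for each candidate reference RNA gives one way to rank them.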

Experimental Design

3 “Test” RNA pairs: • Placenta: assumed to be most similar to the placenta reference • Kidney: a component of the commercial reference • Lung: not a component of the commercial reference

3 “Test” RNA pairs: • Placenta • Kidney • Lung • Predictions • Placenta reference will work best for the placenta test pair • Commercial reference will work better for the kidney test pair than the lung test pair • Pool reference will work well overall

3 “Test” RNA pairs: • Placenta • Kidney • Lung • 3 Reference RNAs: • Placenta • Commercial • Pool • Predictions • (1) Placenta reference will work best for the placenta test pair • (2) Commercial reference will work better for the kidney test pair than the lung test pair • (3) Pool reference will work well overall • How did we do on our predictions? • Predictions 1 & 3 were borne out; prediction 2 was not. • However, the main result was that choice of reference RNA did not matter as much as we thought.

Compare: data with background subtraction

Compare: data without intensity-based normalization

Compare: data with background subtraction, No intensity-based normalization
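For readers unfamiliar with the preprocessing options being compared, here is a rough sketch of a per-spot log-ratio computed with and without local background subtraction. The function name, the handling of non-positive corrected signals (set to NaN), and the intensities are assumptions made for illustration; the slides do not say how such spots were treated.

```python
import numpy as np

def log_ratio(red, green, red_bg=None, green_bg=None):
    """log2(red/green) per spot, optionally after subtracting local background.

    Spots whose signal is not positive in both channels are returned as NaN
    (one possible convention, assumed here).
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    if red_bg is not None:
        red = red - np.asarray(red_bg, dtype=float)
    if green_bg is not None:
        green = green - np.asarray(green_bg, dtype=float)
    out = np.full(red.shape, np.nan)
    ok = (red > 0) & (green > 0)
    out[ok] = np.log2(red[ok] / green[ok])
    return out

# Made-up foreground and background intensities for three spots.
red = [1200.0, 150.0, 90.0]
green = [300.0, 100.0, 100.0]
red_bg = [100.0, 100.0, 100.0]
green_bg = [80.0, 80.0, 80.0]

raw = log_ratio(red, green)                           # no background subtraction
subtracted = log_ratio(red, green, red_bg, green_bg)  # with background subtraction
```

In this toy example the low-intensity spots change the most under background subtraction, and the third spot cannot be log-transformed at all once its red background is removed.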

Concordance for Low-intensity Genes

• Indirect and direct log-ratios are in reasonable agreement for the vast majority of these genes • Some low-intensity genes are reproducibly measured as differentially expressed • Most “highly discrepant genes” are NOT picked out by flagging low-intensity genes • Conclusion: discarding data from low-intensity spots is a very crude filter

Conclusion: Measurements on low-intensity genes are less reliable but not unreliable. * Moving average for low-intensity genes only * Moving average for all genes
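A moving-average curve of this kind could be produced roughly as follows. The slide does not state which quantity was averaged, so this sketch assumes the absolute discrepancy between indirect and direct log-ratios, smoothed over genes ordered by average log intensity; the function name, window size, and numbers are hypothetical.

```python
import numpy as np

def moving_average_by_intensity(intensity, discrepancy, window=3):
    """Smooth |discrepancy| over genes sorted by intensity.

    Returns the windowed mean intensity and the windowed mean absolute
    discrepancy, one value per full window.
    """
    intensity = np.asarray(intensity, dtype=float)
    discrepancy = np.asarray(discrepancy, dtype=float)
    order = np.argsort(intensity)
    sorted_intensity = intensity[order]
    sorted_abs_disc = np.abs(discrepancy[order])
    kernel = np.ones(window) / window
    smooth_intensity = np.convolve(sorted_intensity, kernel, mode="valid")
    smooth_disc = np.convolve(sorted_abs_disc, kernel, mode="valid")
    return smooth_intensity, smooth_disc

# Made-up average log2 intensities and indirect-minus-direct discrepancies.
intensity = np.array([9.0, 6.0, 12.0, 7.0, 10.0, 8.0])
discrepancy = np.array([0.2, -0.9, 0.1, 0.6, -0.1, 0.3])

x, y = moving_average_by_intensity(intensity, discrepancy, window=3)
```

A curve that rises toward low intensity but stays well below the size of typical log-ratios would match the reading “less reliable but not unreliable.”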

Is this a reasonable way to evaluate reference RNAs? • Concordance between reference-based log-ratios and direct log-ratios means that the comparison through the reference is not “reference-specific”

The evaluation is based on a kind of reproducibility, which is not the same as accuracy. However: • This is a much stronger kind of reproducibility than reproducibility among technical replicates. • Though not sufficient, good reproducibility is necessary for low error. • The best kind of “accuracy” for microarray measurements is an open issue. • Opinion: For most intents and purposes, a low-variance, biased estimate of the log-ratio is preferable to a high-variance, unbiased estimate. • The results here suggest that, on average, microarrays give estimates of log-ratios that are proportional to the true log-ratios • Shi et al, BMC Bioinformatics, 2005
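The preference for a low-variance, biased estimate can be motivated by the standard mean-squared-error decomposition. The symbols below are introduced here for illustration: \theta is a gene's true log-ratio, \hat{\theta} its estimate, and c a hypothetical proportionality constant.

```latex
\mathrm{MSE}(\hat{\theta})
  = E\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big(E[\hat{\theta}] - \theta\big)^2 .

\text{If } E[\hat{\theta}] = c\,\theta \text{ with } c > 0, \text{ then }
\mathrm{Bias}(\hat{\theta}) = (c - 1)\,\theta
\quad\text{and}\quad
\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + (c - 1)^2\,\theta^2 .
```

Under this assumption the bias is a common rescaling that preserves the sign and ordering of expected log-ratios across genes, so a reduction in variance can outweigh it for most purposes.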

Summary/Conclusions • To date, evaluations of reference RNAs • Have used criteria distant from scientific objectives • Have not accounted for the assumptions invoked in the popular methods of data normalization • We found results to be robust to choice of reference • Is there such a thing as a “universal” reference? • The concordance between our indirect and direct log-ratios is encouraging, but the “linearity” issue needs more attention

Kerr KF, Serikawa, Wei, Peters, Bumgarner (2007). What is the best reference RNA? And other questions regarding the design and analysis of two-color microarray experiments. OMICS 11:152-65. (Pre-print at www.bepress.com/uwbiostat ) Many Thanks to the Conference Organizers!
