
Variances are Not Always Nuisance Parameters


Presentation Transcript


  1. Variances are Not Always Nuisance Parameters Raymond J. Carroll Department of Statistics Texas A&M University http://stat.tamu.edu/~carroll

  2. Dedicated to the Memory of Shanti S. Gupta • Head of the Purdue Statistics Department for 20 years • I was student #11 (1974)

  3. Palo Duro Canyon, the Grand Canyon of Texas [Map of Texas: Wichita Falls, my hometown; College Station, home of Texas A&M University; Guadalupe Mountains and Big Bend National Parks; I-35 and I-45]

  4. Overview • Main point: there are problems/methods where variance structure essentially determines the answer • Assay Validation • Measurement error • Other Examples mentioned briefly • Logistic mixed models • Quality technology • DNA Microarrays for gene expression (Fisher!)

  5. Variance Structure • My Definition: Encompasses • Systematic dependence of variability on known factors • Random effects: their inclusion, exclusion or dependence on covariates • My point: • Variance structure can be important in itself • Variance structure can have a major impact on downstream analyses

  6. Collaborators on This Talk David Ruppert also works with me outside the office • Statistics: David Ruppert • Assays: Marie Davidian, Devan Devanarayan, Wendell Smith • Measurement error: Larry Freedman, Victor Kipnis, Len Stefanski

  7. Acknowledgments Matt Wand Peter Hall Alan Welsh Xihong Lin (who nominated me!) Naisyin Wang Mitchell Gail

  8. Assay Validation • Immunoassays: used to estimate concentrations in plasma samples from outcomes • Intensities • Counts • Calibration problem: predict X from Y • My Goal: to show you that cavalier handling of variances leads to wrong answers in real life • David Finney: anticipates just this point

  9. Assay Validation David Finney is the author of a classic text • “Here the weighted analysis has also disclosed evidence of invalidity” • “This needs to be known and ought not to be concealed by imperfect analysis”

  10. Assay Validation Wendell Smith motivated this work • Assay validation is an important facet of the drug development process • One goal: find a working range of concentrations for which the assay has • small bias (< 30% say) • small coefficient of variation (< 20% say)

  11. Assay Validation: The Data These data are from a paper by M. O'Connell, B. Belanger and P. Haaland, Chemometrics and Intelligent Laboratory Systems (1993)

  12. Assay Validation Unweighted and Weighted Fits • Main trends: for fitting the mean curve, any method will do • Typical to fit a 4-parameter logistic model
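
To make the curve-fitting step concrete, here is a minimal sketch of an unweighted 4-parameter logistic fit in Python; the concentrations, responses, and starting values are hypothetical stand-ins, not the O'Connell et al. data.

```python
import numpy as np
from scipy.optimize import curve_fit

def fpl(x, a, b, c, d):
    """4-parameter logistic mean: a = response at zero concentration,
    d = response at infinite concentration, c = concentration at the
    inflection point, b = slope parameter."""
    return d + (a - d) / (1.0 + (x / c) ** b)

# Hypothetical calibration data: known concentrations, assay responses
conc = np.array([10, 30, 100, 300, 1000, 3000, 10000], dtype=float)
resp = np.array([0.05, 0.12, 0.35, 0.80, 1.50, 2.10, 2.40])

# Unweighted (ordinary least squares) fit: fine for the mean trend
popt, pcov = curve_fit(fpl, conc, resp, p0=[0.0, 1.0, 500.0, 2.5])
print("a, b, c, d =", np.round(popt, 3))
```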

  13. Assay Validation: Unweighted Prediction Intervals

  14. Assay Validation David Rodbard (L) and Peter Munson (R) in 1978 proposed the 4-parameter logistic for assays • The data exhibit heteroscedasticity • Typical to model variance as a power of the mean • Most often: Var(Y | x) = σ²[f(x, β)]^(2θ)
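
A minimal sketch of one way to fold the power-of-the-mean model into the fit, via iteratively reweighted least squares with θ fixed at 1 (constant CV); this is a common implementation, not necessarily the estimation scheme used in the talk, and the data are again hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def fpl(x, a, b, c, d):
    # 4PL mean: a = zero-dose response, d = infinite-dose response
    return d + (a - d) / (1.0 + (x / c) ** b)

conc = np.array([10, 30, 100, 300, 1000, 3000, 10000], dtype=float)
resp = np.array([0.05, 0.12, 0.35, 0.80, 1.50, 2.10, 2.40])

theta = 1.0  # Var(Y|x) = sigma^2 * f(x,beta)^(2*theta); theta = 1: constant CV
popt, _ = curve_fit(fpl, conc, resp, p0=[0.0, 1.0, 500.0, 2.5])
for _ in range(5):  # iteratively reweight using the current mean estimate
    sd = np.clip(np.abs(fpl(conc, *popt)) ** theta, 1e-6, None)
    popt, _ = curve_fit(fpl, conc, resp, p0=popt, sigma=sd)
print("weighted 4PL estimates:", np.round(popt, 3))
```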

  15. Assay Validation: Weighted Prediction Intervals Marie Davidian and David Giltinan have written extensively on this topic

  16. Assay Validation: Working Range • Goal: predict X from observed Y • Working Range (WR): the range where the CV < 20% • Validation experiments (accuracy and precision): done on the working range • If the WR is shifted away from small concentrations: the assay is never validated for those small concentrations • No success, even if you try (see the %-recovery plots)
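
A simulation sketch of how a working range might be read off from a fitted curve plus a variance model: simulate responses at each concentration, back-calculate the concentration, and keep the region where the CV of the back-calculated value stays under 20%. All parameter values are hypothetical, and a delta-method calculation would be the more standard route.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d, cv_y = 0.0, 1.0, 500.0, 2.5, 0.07  # hypothetical fit, constant CV

def fpl(x):
    return d + (a - d) / (1.0 + (x / c) ** b)

def back_calc(y):
    # invert the 4PL to recover concentration from a response
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

grid = np.geomspace(1.0, 1e5, 60)  # candidate concentrations
cv_x = []
for x in grid:
    mu = fpl(x)
    y = rng.normal(mu, cv_y * abs(mu), 5000)  # power-of-the-mean noise
    y = np.clip(y, a + 1e-3, d - 1e-3)        # keep y on the curve's range
    xhat = back_calc(y)
    cv_x.append(xhat.std() / xhat.mean())
work = grid[np.array(cv_x) < 0.20]
print("approximate working range: %.0f to %.0f" % (work.min(), work.max()))
```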

  17. Assay Validation: Variances Matter • No weighting: lower and upper quantification limits LQL = 1,057, UQL = 9,505 • Weighting: LQL = 84, UQL = 3,866

  18. Working Ranges for Different Variance Functions [Figure: unweighted vs. weighted working ranges; weighted: LQL = 84, UQL = 3,866]

  19. Assay Validation: % Recovery Devan Devanarayan, my statistical grandson, organized this example • Goal: predict X from observed Y • Measure: % recovery = 100 × (predicted concentration / true concentration) • Want the confidence interval to be within 30% of the actual concentration
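
A minimal sketch of the % recovery computation: back-calculate the concentration from the fitted curve and compare with the spiked truth. The fitted parameters and observed responses here are hypothetical.

```python
import numpy as np

a, b, c, d = 0.0, 1.0, 500.0, 2.5  # hypothetical fitted 4PL parameters

def back_calc(y):
    # invert y = d + (a - d) / (1 + (x/c)^b) to recover x
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

true_conc = np.array([50.0, 200.0, 1000.0, 5000.0])  # spiked concentrations
observed  = np.array([0.23, 0.73, 1.68, 2.27])       # measured responses

recovery = 100.0 * back_calc(observed) / true_conc
print(np.round(recovery, 1))  # want to stay within 100% +/- 30%
```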

  20. Assay Validation: % Recovery • Note: acceptable ranges (IL-10 validation experiment) depend on accounting for variability [Figure: unweighted vs. weighted % recovery]

  21. Assay Validation: Summary • Accounting for changing variability is pointless if the interest is merely in fitting the curve • In other contexts, standard errors actually matter (power is important after all!) • The gains in precision from a weighted analysis can change conclusions about statistical significance • Accounting for changing variability is crucial if you want to solve the problem • Concentrations for which the assay can be used depend strongly on a model for variability

  22. The Structure of Measurement Error See Wayne Fuller’s 1987 text • Measurement error has an enormous literature • Hundreds of papers on the structure for covariates W = X + e • Here X = “truth”, W = “observed” • X is a latent variable

  23. The Structure of Measurement Error • For most regressions, if • X is the only predictor • W = X + e • then, when the error is ignored: • parameter estimates are biased • power is lost (my focus today)
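
For simple linear regression both effects can be written in closed form; this standard result (not specific to the talk) makes the bias explicit:

```latex
% Classical additive error: W = X + e, with e independent of X and of
% the regression error. Regressing Y on W instead of X attenuates the slope:
\begin{align*}
Y = \beta_0 + \beta_x X + \epsilon, \quad W = X + e
\;\Longrightarrow\;
E(\widehat{\beta}_{x,\mathrm{naive}}) = \lambda\,\beta_x,
\quad \lambda = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_e^2} < 1,
\end{align*}
% and the residual variance grows, so tests on the slope lose power.
```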

  24. The Structure of Measurement Error • My point: the simple measurement error model is too simple W = X + e • A different variance structure suggests different conclusions

  25. The Structure of Measurement Error Ross Prentice has written extensively on this topic • Nutritional epidemiology: dietary intake measured via food frequency questionnaires (FFQ) • Prospective studies: none have found a statistically significant fat intake effect on breast cancer • Controversy in post-hoc power calculations: • what is the power to detect such an effect?

  26. Dietary Intake Data • The essential quantity controlling power is the attenuation • Let Q = FFQ, X = long-term dietary intake • Attenuation λ = slope of the regression of X on Q • Interpretation: the % of variation in Q that is due to true intake • 100% is good • 0% is bad • Sample size needed for fixed power can be thought of as proportional to λ⁻²
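
Under the classical model Q = X + e, the two descriptions of λ on this slide coincide; this standard identity is worth making explicit:

```latex
% The slope of X on Q equals the fraction of the variance of Q due to X:
\begin{align*}
\lambda = \frac{\operatorname{cov}(X, Q)}{\operatorname{var}(Q)}
        = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_e^2},
\qquad n_{\text{required}} \propto \lambda^{-2}.
\end{align*}
```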

  27. Post hoc Power Calculation Larry Freedman has done fundamental work on dietary instrument validation • FFQ: known to be biased • F: "reference instrument" thought to be unbiased (but much more expensive than Q) • F = X + e • F = a 24-hour recall or some type of diary • Then λ = slope of the regression of F on Q
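
A sketch of the validation-study calculation under these assumptions: with an unbiased reference F = X + e, regressing F on Q estimates λ. The data are simulated with hypothetical variances, chosen so that λ lands near the 0.30 reported on a later slide.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400

# Hypothetical validation data under the classical assumptions:
# F = X + e (unbiased reference), Q = X + u (FFQ), errors independent
x = rng.normal(5.0, 1.0, n)      # true long-term intake
f = x + rng.normal(0.0, 0.7, n)  # reference instrument
q = x + rng.normal(0.0, 1.5, n)  # FFQ

# Attenuation = slope of the regression of F on Q
lam = np.cov(f, q)[0, 1] / q.var(ddof=1)
print("estimated attenuation:", round(lam, 2))
print("implied sample-size inflation ~", round(lam ** -2, 1))
```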

  28. Post hoc Power Calculation Walt Willett: a leader in nutritional epidemiology • If “reference instrument” is unbiased then • Can estimate attenuation • Can estimate mean of X • Can estimate variance of X • Can estimate power in the study at hand • Many, many papers assume that the reference instrument is unbiased in this way • Plenty of power

  29. Dietary Intake Data • The attenuation λ ≈ 0.30 for absolute amounts, ≈ 0.50 for food composition • Remember, the attenuation is the % of variability that is not noise • All based on the validity of the reference instrument F = X + e • Pearson and Cochran now weigh in

  30. The Structure of Measurement Error Karl Pearson • 1902: “On the mathematical theory of errors of judgment” • Interested in nature of errors of measurement when the quantity is fixed and definite, while the measuring instrument is a human being • Individuals bisected lines of unequal length freehand, errors recorded

  31. The Structure of Measurement Error Karl Pearson • FFQ’s are also self-report • Findings have relevance today • Individuals were biased • Biases varied from individual to individual

  32. Measurement Error Structure William G. Cochran • Classic 1968 Technometrics paper • Used Pearson’s paper • Suggested an error model that had systematic and random biases • This structure seems to fit dietary self-report instruments

  33. Measurement Error Structure: Cochran • Reference instrument: F_ij = a_F + b_F X_ij + r_Fi + e_Fij, with r_Fi ~ Normal(0, σ²_Fr) • We call r_Fi the "person-specific bias" • We call b_F the "group-level bias" • Similarly, for the FFQ: Q_ij = a_Q + b_Q X_ij + r_Qi + e_Qij, with r_Qi ~ Normal(0, σ²_Qr)
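
A simulation sketch (hypothetical parameter values) of why this structure matters: if the reference instrument shares a person-specific bias with the FFQ, because both are self-report, then the slope of F on Q no longer estimates the attenuation of X on Q.

```python
import numpy as np

rng = np.random.default_rng(33)
n = 2000

# Cochran-style model with a person-specific bias r shared by the
# reference F and the FFQ Q, so F is not a pure classical measurement
x = rng.normal(5.0, 1.0, n)        # true intake
r = rng.normal(0.0, 0.8, n)        # shared person-specific bias
f = 1.0 * x + r + rng.normal(0.0, 0.7, n)        # b_F = 1 here
q = 0.5 + 0.6 * x + r + rng.normal(0.0, 1.0, n)  # a_Q, b_Q: group-level bias

naive = np.cov(f, q)[0, 1] / q.var(ddof=1)  # slope of F on Q
truth = np.cov(x, q)[0, 1] / q.var(ddof=1)  # attenuation actually needed
print("naive:", round(naive, 2), " true:", round(truth, 2))
```

In this simulation the naive slope roughly doubles the true attenuation, which is exactly the direction that leads to overestimated power and underestimated sample sizes.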

  34. Measurement Error Structure • The horror: the model is unidentified • Sensitivity analyses • suggest potential that measurement error causes much greater loss of power than previously suggested • Needed: Unbiased measures of intake • Biomarkers • Protein via urinary nitrogen • Calories via doubly-labeled water

  35. Biomarker Data Victor Kipnis was the driving force behind OPEN • Protein: • Available from a number of European studies • Calories and Protein: • Available from NCI’s OPEN study • Results are stunning

  36. Biomarker Data: Attenuations • Protein (and Calories and Protein Density for OPEN)

  37. Biomarker Data: Sample Size Inflation • Protein (and Calories and Protein Density for OPEN)

  38. Measurement Error Structure • The variance structure of the FFQ and other self-report instruments appears to have individual-level biases • Pearson and Cochran model • Ignoring this: • Overestimation: of power • Underestimation: of sample size • It may not be possible to understand the effect of total intakes • Food composition more hopeful

  39. Other Examples of Variance Structure • Nonlinear and generalized linear mixed models (NLMIX and GLIMMIX) • Quality Technology: Robust parameter design • Microarrays

  40. Nonlinear Mixed Models • Mixed models have random effects • Typical to assume normality • Robustness to normality has been a major concern • Many now conclude that this is not that major an issue • There are exceptions!!

  41. Logistic Mixed Models Patrick Heagerty • Heagerty & Kurland (2001): "Estimated regression coefficients for cluster-level covariates can be highly sensitive to assumptions about whether the variance of a random intercept depends on a cluster-level covariate" • i.e., heteroscedastic random effects, or variance structure

  42. Logistic Mixed Models • Heagerty (Biometrics, 1999, Statistical Science 2000, Biometrika 2001) • See also Zeger, Liang & Albert (1988), Neuhaus & Kalbfleisch (1991) and Breslow & Clayton (1993) • Gender is a cluster-level variable • Allowing cluster-level variability to depend on gender results in a large change in the estimated gender regression coefficient and p-value. • Marginal contrasts can be derived and are less sensitive • In the presence of variance structure, regression coefficients alone cannot be interpreted marginally
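
One way to see why regression coefficients cannot be read marginally in the presence of variance structure: for a logistic model with a normal random intercept, a well-known approximation (Zeger, Liang & Albert, 1988) gives, in the homoscedastic case,

```latex
% Random intercept b_i ~ N(0, sigma^2); the population-averaged slope
% is attenuated by an amount that depends on the random-effect variance:
\begin{align*}
\beta_{\text{marginal}} \approx
\frac{\beta_{\text{conditional}}}{\sqrt{1 + c^2\sigma^2}},
\qquad c = \frac{16\sqrt{3}}{15\pi}.
\end{align*}
```

If σ² itself depends on a cluster-level covariate such as gender, the attenuation differs across groups, so the conditional coefficient alone no longer determines the marginal contrast.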

  43. Robust Parameter Design Jeff Wu and Mike Hamada’s text is an excellent introduction • “The Taguchi Method” • From Wu and Hamada: “aims to reduce the variation of a system by choosing the setting of control factors to make it less sensitive to noise variation” • Set target, optimize variance

  44. Robust Parameter Design • Modeling variability is an intrinsic part of the method • Maximizing the signal to noise ratio (Taguchi) • Modeling location and dispersion separately • Modeling location and then minimizing the transmitted variance • Ideas are used in optimizing assays, among many other problems
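
A minimal sketch of the first idea on the list, Taguchi's "nominal-the-best" signal-to-noise ratio, computed for two hypothetical control settings; maximizing it favors settings with small variance relative to the mean.

```python
import numpy as np

# Hypothetical replicated measurements at two control-factor settings
y_a = np.array([10.1, 9.8, 10.3, 9.9, 10.0])
y_b = np.array([10.2, 8.9, 11.0, 9.4, 10.6])

def sn_nominal_the_best(y):
    """Taguchi 'nominal-the-best' signal-to-noise ratio (in dB):
    larger is better, rewarding small variance relative to the mean."""
    return 10.0 * np.log10(y.mean() ** 2 / y.var(ddof=1))

for name, y in [("setting A", y_a), ("setting B", y_b)]:
    print(name, round(sn_nominal_the_best(y), 1), "dB")
```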

  45. Robust Parameter Design: Microarrays for Gene Expression R. A. Fisher • cDNA and oligo microarrays have attracted immense interest • Multiple steps (sample preparation, imaging, etc.) affect the quality of the results • Processes could clearly benefit from robust parameter design (Kerr & Churchill)

  46. Robust Parameter Design: Microarrays • Experiment (oligo-arrays): • 28 rats given different diets (corn oil, fish oil and olive oil enhanced) • 15 rats have duplicated arrays • How much of the variability in gene expression is due to the array? • We have consistently found that 2/3 of the variability is noise: within-animal rather than between-animal
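
A sketch of how the noise split can be estimated from duplicated arrays for a single gene, using one-way ANOVA variance components; the data are simulated with a true ICC of 0.35, the value used on the next slide.

```python
import numpy as np

rng = np.random.default_rng(46)
n_animals, icc_true = 15, 0.35

# Simulated duplicate-array data for one gene: expression is
# between-animal signal plus within-animal (array) noise
animal = rng.normal(0.0, np.sqrt(icc_true), n_animals)
dup = animal[:, None] + rng.normal(0.0, np.sqrt(1 - icc_true),
                                   (n_animals, 2))

# One-way ANOVA variance components from duplicates (k = 2 per animal)
ms_between = 2.0 * dup.mean(axis=1).var(ddof=1)
ms_within = 0.5 * np.sum((dup[:, 0] - dup[:, 1]) ** 2) / n_animals
sigma2_b = max((ms_between - ms_within) / 2.0, 0.0)
icc = sigma2_b / (sigma2_b + ms_within)
print("estimated ICC:", round(icc, 2))  # ~1/3 => ~2/3 of variability is noise
```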

  47. Intraclass Correlations r in the Nutrition Data Set [Figure: simulated ICC for 8,000 independent genes with common r = 0.35 vs. estimated ICC for 8,000 genes from mixed models] • Clearly, more control of noise via robust parameter design has the potential to impact power for analyses

  48. Conclusion • My Definition: Variance Structure encompasses • Systematic dependence of variability on known factors • Random effects: their inclusion or exclusion • My point: • Variance structure can be important in itself • Variance structure can have a major impact on downstream analyses

  49. And Finally At the Falls on the Wichita River, West Texas • I’m really happy to be on the faculty at A&M (and to be the Fisher Lecturer!)
