Explore the challenges of estimating population means from skewed data, specifically in Health Economics, using a sample dataset of asthma patients' costs. Learn about Bayesian analysis, model comparisons, and various distributions for robust inference. Discover recommendations for modeling costs effectively in complex structures and utilizing prior knowledge for reliable results.
The problem with costs
Tony O’Hagan
CHEBS, University of Sheffield
The 2003 CHEBS Seminar
A simple problem
• Given a sample from a population, how can we estimate the mean of that population?
• Sample mean?
  • Unbiased and consistent
  • But sensitive to extreme observations
• In Health Economics
  • Costs are invariably very skewed
  • Can also arise with times to events and other kinds of data
  • And we really require inference about population means
A simple dataset
• Costs incurred by 26 asthma patients in a trial comparing two inhalers
• Patients who used pMDI and had no exacerbations
Some estimates & intervals
• Sample mean (use CLT to justify normality)
  • Estimate 2104, 95% CI (-411, 4619)
• Bayesian analysis assuming normality (weak prior)
  • Posterior mean 2104, 95% credible interval (-411, 4619)
• Nonparametric bootstrap
  • Estimate 2104, 95% CI (298, 4785)
• Bayesian bootstrap
  • Posterior mean 2104, 95% credible interval (575, 5049)
• Bayesian analysis assuming lognormality
  • Posterior median 1112, 95% credible interval (510, 3150)
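The first, third and fourth methods on this slide can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the asthma costs themselves are not reproduced in these slides, so the `costs` list below is synthetic lognormal data of the same size (26), and the function names, seed and replicate counts are hypothetical choices. The numbers it prints will not match the slide.

```python
import math
import random
import statistics


def quantile(sorted_vals, q):
    """Nearest-rank empirical quantile of an already sorted list."""
    idx = min(len(sorted_vals) - 1, int(q * len(sorted_vals)))
    return sorted_vals[idx]


def clt_interval(x, z=1.96):
    """Normal-theory 95% interval for the mean, justified by the CLT."""
    m = statistics.fmean(x)
    se = statistics.stdev(x) / math.sqrt(len(x))
    return m - z * se, m + z * se


def bootstrap_interval(x, rng, reps=2000):
    """Nonparametric bootstrap: percentile interval for the resampled mean."""
    n = len(x)
    means = sorted(statistics.fmean(rng.choices(x, k=n)) for _ in range(reps))
    return quantile(means, 0.025), quantile(means, 0.975)


def bayesian_bootstrap_interval(x, rng, reps=2000):
    """Bayesian bootstrap: Dirichlet(1, ..., 1) weights on the observed values."""
    draws = []
    for _ in range(reps):
        # Normalised standard exponentials are a Dirichlet(1, ..., 1) draw.
        g = [rng.expovariate(1.0) for _ in x]
        total = sum(g)
        draws.append(sum(w * v for w, v in zip(g, x)) / total)
    draws.sort()
    return quantile(draws, 0.025), quantile(draws, 0.975)


rng = random.Random(2003)
# ASSUMPTION: synthetic skewed costs, NOT the asthma trial data.
costs = [rng.lognormvariate(7.0, 1.2) for _ in range(26)]

mean_cost = statistics.fmean(costs)
clt_lo, clt_hi = clt_interval(costs)
boot_lo, boot_hi = bootstrap_interval(costs, rng)
bb_lo, bb_hi = bayesian_bootstrap_interval(costs, rng)

print(f"sample mean         {mean_cost:10.0f}")
print(f"CLT interval        ({clt_lo:.0f}, {clt_hi:.0f})")
print(f"bootstrap           ({boot_lo:.0f}, {boot_hi:.0f})")
print(f"Bayesian bootstrap  ({bb_lo:.0f}, {bb_hi:.0f})")
```

The CLT interval is symmetric about the sample mean and can run below zero, as on the slide. Both bootstrap intervals are built from weighted averages of positive costs, so they can never go negative.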
Which is right?
• Data appear to fit lognormal much better than normal
  • But many other distributions might visually fit well yet give completely different results
• Results from analysis assuming normality are supported by bootstrap and by analysis based on CLT
  • These are well known to be robust methods
  • But bootstrapping the sample mean will always give the same estimate and will tend to back up the normal-theory analysis
  • And extreme skewness evident in the population suggests non-robustness of the sample mean
• Real cost distributions won’t follow any standard form
The problem
• The population mean depends critically on the shape of the tail
• How can we learn about that tail from a small sample?
  • Or even quite a large one?
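A small simulation can make the tail dependence concrete. The sketch below assumes a lognormal population with hypothetical parameters (`mu = 0`, `sigma = 2`), chosen only for illustration. It draws many samples of size 26 and counts how often the sample mean falls below the true population mean. The sample mean is unbiased, but when the tail is heavy most samples miss the rare large values that carry much of the mean.

```python
import math
import random
import statistics

rng = random.Random(1)

# ASSUMPTION: illustrative lognormal population, not fitted to any real costs.
mu, sigma = 0.0, 2.0
n, reps = 26, 2000
true_mean = math.exp(mu + sigma ** 2 / 2)  # lognormal mean depends on sigma, i.e. on the tail

sample_means = []
for _ in range(reps):
    sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

share_below = sum(m < true_mean for m in sample_means) / reps

print(f"true population mean     {true_mean:.2f}")
print(f"median of sample means   {statistics.median(sample_means):.2f}")
print(f"share of samples below   {share_below:.2f}")
```

In a typical run, well over half of the samples underestimate the population mean, and a few overestimate it badly.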
Bayesian model comparison
• Bayes factors for the example data
  • Lognormal versus normal, 10^28
  • Lognormal versus square-root normal, 10^12
    • Lognormal is favoured over any other power transformation to normality
  • Lognormal versus gamma, 10^3
• This is far from conclusive
  • Distributions we can’t distinguish could still have completely different tails
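For reference, the standard definition of a Bayes factor is given below. The slides do not say which priors or computational method produced the figures above, so the notation here (models M_1 and M_2, parameters theta_k) is generic and not the author's own.

```latex
% Bayes factor for model M_1 against model M_2, given data x
B_{12} = \frac{p(x \mid M_1)}{p(x \mid M_2)},
\qquad
p(x \mid M_k) = \int p(x \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k .
```

Each marginal likelihood averages the fit over the whole prior, so the comparison is driven by the bulk of the 26 observations. It carries little information about the far tail, which is where the population mean is determined.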
Possible distributions
• Normal – unrealistic, very thin tailed
• Gamma – thin tailed (exponential)
  • Sample mean is MVUE
• Lognormal – heavier tailed
  • Population mean exists but its posterior mean may not
• Inverse gamma – heavy tailed (polynomial)
  • Population mean exists if enough degrees of freedom
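The lognormal and inverse gamma remarks above can be made explicit with standard results. The parameterisations below are assumptions chosen for illustration: log X is N(mu, sigma^2) for the lognormal, and shape alpha with scale beta for the inverse gamma. The posterior shown is the usual inverse-gamma form that arises under conventional weak priors, not necessarily the one used in the talk.

```latex
% Lognormal: the population mean always exists
\log X \sim N(\mu, \sigma^2)
\quad\Longrightarrow\quad
E[X] = \exp\!\left(\mu + \tfrac{1}{2}\sigma^2\right).

% If the posterior for sigma^2 has a polynomial (inverse-gamma) tail,
% the posterior expectation of exp(sigma^2 / 2) diverges:
p(\sigma^2 \mid x) \propto (\sigma^2)^{-(a+1)} e^{-b/\sigma^2}
\quad\Longrightarrow\quad
E\!\left[e^{\sigma^2/2} \mid x\right]
= \int_0^\infty e^{\sigma^2/2}\, p(\sigma^2 \mid x)\, d\sigma^2 = \infty .

% Inverse gamma: the population mean exists only for shape alpha > 1
X \sim \text{Inv-Gamma}(\alpha, \beta)
\quad\Longrightarrow\quad
E[X] = \frac{\beta}{\alpha - 1} \quad (\alpha > 1).
```

The divergence is why the lognormal result on the earlier slide is reported as a posterior median rather than a posterior mean.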
• Log-gamma, log-logistic – too heavy tailed?
  • Population mean never exists
• Generalised Pareto – range of tail weights
  • Used in extreme value theory
• Mixtures and chimeras
  • More flexible and realistic
  • Harder to fit
  • Bayesian methods essential
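The "range of tail weights" offered by the generalised Pareto can be read off its survival function. The form below is the standard one with threshold zero, scale sigma > 0 and shape xi > 0. The symbols are the conventional ones and do not come from the slides.

```latex
% Generalised Pareto with threshold 0, scale sigma > 0, shape xi > 0
P(X > x) = \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}, \qquad x \ge 0,

% exponential tail in the limit, polynomial tail otherwise
\lim_{\xi \to 0} P(X > x) = e^{-x/\sigma},

% the mean is finite only when the shape is below 1
E[X] = \frac{\sigma}{1 - \xi} \quad (\xi < 1), \qquad E[X] = \infty \quad (\xi \ge 1).
```

A single shape parameter therefore spans everything from a gamma-like exponential tail to tails so heavy that the population mean does not exist.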
More complex structures
• We nearly always wish to compare means
  • Extreme data can heavily influence comparison
  • Asthma dataset
• We also often need to model costs in more complex ways
  • Components of costs
  • Covariates
• Tail shape can again be very influential
Recommendations
• Try a variety of models
  • If sample size is large enough, answers may be robust to modelling assumptions
• Use prior information
  • We need evidence of what kinds of distributions can arise in different situations
  • And of how different they can be between different groups
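As a rough illustration of the first recommendation, the sketch below estimates the population mean under two models at a small and a large sample size. Everything in it is an assumption made for illustration: the data are synthetic lognormal draws, so the lognormal model is correctly specified by construction (the favourable case), and the function names and parameter values are hypothetical. With real costs the two answers need not converge.

```python
import math
import random
import statistics


def normal_model_mean(x):
    """Estimate of the population mean under a normal model: the sample mean."""
    return statistics.fmean(x)


def lognormal_model_mean(x):
    """Plug-in estimate exp(m + s^2 / 2) from the mean and variance of log costs."""
    logs = [math.log(v) for v in x]
    return math.exp(statistics.fmean(logs) + statistics.variance(logs) / 2)


rng = random.Random(14)
mu, sigma = 7.0, 1.2  # ASSUMPTION: illustrative parameters, not fitted to real data
true_mean = math.exp(mu + sigma ** 2 / 2)

results = {}
for n in (26, 20000):
    x = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    results[n] = (normal_model_mean(x), lognormal_model_mean(x))
    print(f"n={n:6d}  normal: {results[n][0]:8.0f}  lognormal: {results[n][1]:8.0f}")

print(f"true mean {true_mean:.0f}")
```

At n = 26 the two estimates can differ noticeably. At n = 20000 they agree closely here, but only because the lognormal assumption happens to be true for this synthetic population.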