Understanding Population Mean Estimation in Skewed Data: Bayesian Approach

The problem with costs
Tony O’Hagan, CHEBS, University of Sheffield
The 2003 CHEBS Seminar

A simple problem
• Given a sample from a population, how can we estimate the mean of that population?
• Sample mean?
  • Unbiased and consistent
  • But sensitive to extreme observations
• In Health Economics
  • Costs are invariably very skewed
  • Can also arise with times to events and other kinds of data
  • And we really require inference about population means

A simple dataset
• Costs incurred by 26 asthma patients in a trial comparing two inhalers
• Patients who used pMDI and had no exacerbations

Some estimates & intervals
• Sample mean (use CLT to justify normality)
  • Estimate 2104, 95% CI (-411, 4619)
• Bayesian analysis assuming normality (weak prior)
  • Posterior mean 2104, 95% credible interval (-411, 4619)
• Nonparametric bootstrap
  • Estimate 2104, 95% CI (298, 4785)
• Bayesian bootstrap
  • Posterior mean 2104, 95% credible interval (575, 5049)
• Bayesian analysis assuming lognormality
  • Posterior median 1112, 95% credible interval (510, 3150)
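The three sample-mean-based intervals above can be sketched in a few lines. This is a minimal illustration only: the 26 asthma costs are not reproduced in the slides, so the data below are synthetic lognormal draws, and the helper names (`normal_interval`, `bootstrap_interval`, `bayesian_bootstrap_interval`, `percentile`) are hypothetical. The CLT interval uses z = 1.96, which is an assumption; the original analysis may have used a t quantile.

```python
import math
import random
import statistics

def normal_interval(xs, z=1.96):
    """CLT-based interval for the mean: mean +/- z * s / sqrt(n)."""
    m = statistics.fmean(xs)
    se = statistics.stdev(xs) / math.sqrt(len(xs))
    return m - z * se, m + z * se

def percentile(sorted_vals, q):
    """Simple rank-based percentile of an already sorted list, 0 <= q <= 1."""
    idx = int(round(q * (len(sorted_vals) - 1)))
    return sorted_vals[idx]

def bootstrap_interval(xs, reps=2000, rng=None):
    """Nonparametric bootstrap: resample with replacement, take percentiles of the means."""
    rng = rng or random.Random(0)
    n = len(xs)
    means = sorted(statistics.fmean(rng.choices(xs, k=n)) for _ in range(reps))
    return percentile(means, 0.025), percentile(means, 0.975)

def bayesian_bootstrap_interval(xs, reps=2000, rng=None):
    """Bayesian bootstrap: Dirichlet(1, ..., 1) weights on the observed points."""
    rng = rng or random.Random(0)
    draws = []
    for _ in range(reps):
        # Normalised standard exponentials are a Dirichlet(1, ..., 1) draw.
        g = [rng.expovariate(1.0) for _ in xs]
        total = sum(g)
        draws.append(sum(w * x for w, x in zip(g, xs)) / total)
    draws.sort()
    return percentile(draws, 0.025), percentile(draws, 0.975)

# Synthetic, skewed stand-in for the cost data (NOT the trial data).
data_rng = random.Random(2003)
costs = [data_rng.lognormvariate(7.0, 1.2) for _ in range(26)]

print("sample mean       ", statistics.fmean(costs))
print("CLT interval      ", normal_interval(costs))
print("bootstrap         ", bootstrap_interval(costs))
print("Bayesian bootstrap", bayesian_bootstrap_interval(costs))
```

All three intervals are built around the same sample mean, and both bootstrap variants can only place mass on values already observed, so neither can say anything about a tail beyond the largest observation.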

Which is right?
• Data appear to fit lognormal much better than normal
  • But many other distributions might visually fit well yet give completely different results
• Results from analysis assuming normality are supported by bootstrap and by analysis based on CLT
  • These are well known to be robust methods
  • But bootstrapping the sample mean will always give the same estimate and will tend to back up the normal-theory analysis
  • And extreme skewness evident in the population suggests non-robustness of the sample mean
• Real cost distributions won’t follow any standard form

The problem
• The population mean depends critically on the shape of the tail
• How can we learn about that tail from a small sample?
  • Or even quite a large one?
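To make the dependence on the tail concrete, here is a small sketch using the lognormal as an example. For a lognormal with log-scale parameters mu and sigma, the median is exp(mu) and the mean is exp(mu + sigma^2/2), so the mean-to-median ratio depends only on sigma. The share of the mean contributed by the top fraction p of the population is Phi(sigma - z), where z is the standard normal quantile at 1 - p. The function names are hypothetical and the sigma values are arbitrary illustrations, not estimates from the asthma data.

```python
import math
from statistics import NormalDist

def mean_to_median_ratio(sigma):
    """Lognormal mean / median = exp(sigma^2 / 2); mu cancels."""
    return math.exp(0.5 * sigma ** 2)

def top_share_of_mean(sigma, top=0.05):
    """Fraction of the lognormal mean contributed by the top `top` of the population."""
    z = NormalDist().inv_cdf(1.0 - top)
    return NormalDist().cdf(sigma - z)

for sigma in (0.5, 1.0, 1.5, 2.0):
    print(sigma, mean_to_median_ratio(sigma), top_share_of_mean(sigma))
```

With larger sigma, a growing share of the mean comes from the most expensive few percent of patients, which is exactly the part of the distribution a small sample observes least.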

Bayesian model comparison
• Bayes factors for the example data
  • Lognormal versus normal, 10^28
  • Lognormal versus square-root normal, 10^12
    • Lognormal is favoured over any other power transformation to normality
  • Lognormal versus gamma, 10^3
• This is far from conclusive
  • Distributions we can’t distinguish could still have completely different tails
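The slides do not say how these Bayes factors were computed. As a rough stand-in, and not the method used in the talk, the sketch below uses the BIC approximation to a Bayes factor for lognormal versus normal. Both models have two parameters, so the BIC penalties cancel and the approximate log Bayes factor is just the difference in maximised log-likelihoods, with the lognormal likelihood including the Jacobian term for the log transformation. The function names and the toy data are hypothetical.

```python
import math
import statistics

def normal_max_loglik(xs):
    """Maximised normal log-likelihood: -n/2 * (log(2*pi*var_hat) + 1)."""
    n = len(xs)
    var_hat = statistics.pvariance(xs)  # MLE of the variance (divides by n)
    return -0.5 * n * (math.log(2 * math.pi * var_hat) + 1)

def lognormal_max_loglik(xs):
    """Normal fit to log(x), minus sum(log x) for the change of variables."""
    logs = [math.log(x) for x in xs]
    return normal_max_loglik(logs) - sum(logs)

def approx_log10_bf_lognormal_vs_normal(xs):
    """BIC-style approximation to log10 of the Bayes factor (equal parameter counts)."""
    return (lognormal_max_loglik(xs) - normal_max_loglik(xs)) / math.log(10)

# Toy positive data that are symmetric on the log scale (not the trial data).
toy = [math.exp(k) for k in (-2, -1, 0, 1, 2)]
print(approx_log10_bf_lognormal_vs_normal(toy))
```

A positive value favours the lognormal. A proper Bayes factor would integrate over priors on the parameters, which this approximation ignores, so the result is an order-of-magnitude guide at best.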

Possible distributions
• Normal – unrealistic, very thin tailed
• Gamma – thin tailed (exponential)
  • Sample mean is MVUE
• Lognormal – heavier tailed
  • Population mean exists but its posterior mean may not
• Inverse gamma – heavy tailed (polynomial)
  • Population mean exists if enough degrees of freedom

• Log-gamma, log-logistic – too heavy tailed?
  • Population mean never exists
• Generalised Pareto – range of tail weights
  • Used in extreme value theory
• Mixtures and chimeras
  • More flexible and realistic
  • Harder to fit
  • Bayesian methods essential

More complex structures
• We nearly always wish to compare means
  • Extreme data can heavily influence comparison
  • Asthma dataset
• We also often need to model costs in more complex ways
  • Components of costs
  • Covariates
  • Tail shape can again be very influential


Recommendations
• Try a variety of models
  • If sample size is large enough, answers may be robust to modelling assumptions
• Use prior information
  • We need evidence of what kinds of distributions can arise in different situations
  • And of how different they can be between different groups
