
##### Presentation Transcript

1. SPH6004 Advanced Biostatistics Part 1: Bayesian Statistics Chapter 1: Introduction to Bayesian Statistics

3. Objectives • Describe differences between Bayesian and classical statistics • Develop appropriate Bayesian solutions to non-standard problems, describe the model, fit it, relate analysis to problem • Describe differences between computational methods used in Bayesian inference, understand how they work, implement them in a programming language • Understand modelling and data analytic principles

4. Expectations (what you should already know or be able to do) • Basic and intermediate statistics • The likelihood function • Picking up programming in R • Generalised linear models • Reading the notes

5. The fundamental theorem of statistics?

6. Why the profundity? • Bayes' rule is THE way to invert conditional probabilities • ALL probabilities are conditional • Bayes' rule therefore provides the 'calculus' to manipulate probability, moving from p(A|B) to p(B|A).
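For reference, Bayes' rule for two events A and B in its standard form. The second line expands the denominator by the law of total probability for a binary B (B or its complement), which is the case in the screening example that follows:

```latex
\begin{aligned}
p(B \mid A) &= \frac{p(A \mid B)\, p(B)}{p(A)} \\
p(A) &= p(A \mid B)\, p(B) + p(A \mid \bar{B})\, p(\bar{B})
\end{aligned}
```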

7. For early detection of breast cancer, starting at some age, women are encouraged to have routine screening, even if they have no symptoms. Imagine you conduct such screening using mammography (Prof Gerd Gigerenzer). The following information is available about asymptomatic women aged 40 to 50 in your region who have mammography screening.

8. • The probability that a woman has breast cancer is 0.8% • If she has breast cancer, the probability is 90% that she has a positive mammogram • If she does not have breast cancer, the probability is 7% that she still has a positive mammogram • The challenge: imagine a woman who has a positive mammogram. What is the probability that she actually has breast cancer?

9. Their answers... I never inform my patients about statistical data. I would tell the patient that mammography is not so exact, and I would in any case perform a biopsy.

10. The following information is available about asymptomatic women aged 40 to 50 in your region who have mammography screening • The probability a woman has breast cancer is 0.8% • If she has breast cancer, the probability is 90% that she has a positive mammogram • If she does not have breast cancer, the probability is 7% that she still has a positive mammogram Can we write the above mathematically?
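One way to write this information mathematically, using the indicator notation that appears in Key point 1. Our reading of the symbols (an assumption, since the slide does not define them in this transcript): A = 1 means the woman is asymptomatic and aged 40 to 50, B = 1 means she has breast cancer, and M = 1 means a positive mammogram.

```latex
\begin{aligned}
p(B = 1 \mid A = 1) &= 0.008 \\
p(M = 1 \mid B = 1, A = 1) &= 0.90 \\
p(M = 1 \mid B = 0, A = 1) &= 0.07 \\
\text{Target: } p(B = 1 \mid M = 1, A = 1) &= \;?
\end{aligned}
```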

11. Key point 1 • p(B = 1 | A = 1): the probability before observing the mammogram (the prior) • p(B = 1 | M = 1, A = 1): the probability after observing it (the posterior) • Bayes’ rule provides the way to update the prior probability to reflect the new information, giving the posterior probability • (Even the prior is a posterior)
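Applying Bayes' rule to the screening numbers answers the challenge. A minimal Python sketch (the course itself uses R; the function and argument names are hypothetical, chosen for illustration):

```python
def posterior_positive(prior, sensitivity, false_positive_rate):
    """Bayes' rule for a binary condition after a positive test.

    prior               : p(B = 1 | A = 1)
    sensitivity         : p(M = 1 | B = 1, A = 1)
    false_positive_rate : p(M = 1 | B = 0, A = 1)
    returns             : p(B = 1 | M = 1, A = 1)
    """
    true_pos = sensitivity * prior                  # p(M = 1, B = 1 | A = 1)
    false_pos = false_positive_rate * (1 - prior)   # p(M = 1, B = 0 | A = 1)
    return true_pos / (true_pos + false_pos)        # divide by p(M = 1 | A = 1)


post = posterior_positive(prior=0.008, sensitivity=0.90, false_positive_rate=0.07)
print(f"p(B = 1 | M = 1, A = 1) = {post:.3f}")     # prints 0.094
```

The same arithmetic in counts: out of 1000 such women, about 7.2 are true positives (0.9 × 8) and about 69.4 are false positives (0.07 × 992), so only roughly 9% of women with a positive mammogram have breast cancer.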

12. Key point 2 Bayes' rule allows you to switch from pr(something known | something unknown) to pr(something unknown | something known)

13. Bayesians and frequentists • Bayesians: Bayes' rule is used to switch to pr(unknowns | knowns) in all situations in which there is uncertainty, including parameter estimation • Frequentists: Bayes' rule is only used to make probability statements about events that could, in principle, be repeatedly observed. Parameter estimation is done using methods that perform well under some arbitrary desiderata, such as being unbiased, and uncertainty is quantified by appealing to large samples

14. The Thai AIDS vaccine trial

15. The modified intention to treat analysis Q: what is the “underlying” probability pv of infection over this time window for those on the vaccine arm?

16. What does that actually mean? • Participants are not randomly selected from the population: they are referred or volunteer • Participants must meet eligibility requirements • Not representative of the Thai population • Risk of infection differs between Thailand and, e.g., Singapore • Nebulous: the risk of infection in a hypothetical second trial in the same group of participants • Hope that pv/pu has some relevance in other settings

17. Model for data • Seems appropriate to assume Xv ~ Bin(Nv,pv) • Xv = 51 = number vaccinees infected • Nv = 8197 = number vaccinees • pv = ?

18. Refresher: frequentist approach • Traditional approach to estimating pv: • find the value of pv that maximises the probability of the data, given that this hypothetical value were the true value, either • using calculus, or • numerically (Newton-Raphson, simulated annealing, cross entropy, etc.) • In either case, use the log likelihood

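19. Refresher: frequentist approach • Differentiating the log likelihood $\ell(p_v) = X_v \log p_v + (N_v - X_v)\log(1 - p_v) + \text{const}$ with respect to the argument we want to maximise over gives $\frac{d\ell}{dp_v} = \frac{X_v}{p_v} - \frac{N_v - X_v}{1 - p_v}$ • setting the derivative to zero, adding a hat, and solving gives $\hat{p}_v = X_v / N_v = 51/8197 \approx 0.62\%$ • which is just the empirical proportion infected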
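Slide 18 notes that the maximum can also be found numerically, e.g. by Newton-Raphson. A minimal Python sketch, assuming we iterate on the logit scale $\theta = \log(p_v/(1-p_v))$ (our choice, not necessarily the course's, because every iterate then maps back to a valid probability in (0, 1)). On that scale the log likelihood is $X_v\theta - N_v\log(1+e^\theta)$, with score $X_v - N_v p_v$ and information $N_v p_v(1-p_v)$. The result should match the empirical proportion infected:

```python
import math

def binomial_mle_newton(x, n, theta0=0.0, tol=1e-12, max_iter=100):
    """Maximise the binomial log likelihood by Newton-Raphson on the logit scale."""
    theta = theta0                          # start at p = 0.5
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-theta))  # back-transform to a probability
        score = x - n * p                   # first derivative of log lik wrt theta
        info = n * p * (1.0 - p)            # minus the second derivative
        step = score / info
        theta += step                       # Newton-Raphson update
        if abs(step) < tol:
            break
    return 1.0 / (1.0 + math.exp(-theta))


x_v, n_v = 51, 8197                          # vaccinees infected, number of vaccinees
p_newton = binomial_mle_newton(x_v, n_v)
print(f"Newton-Raphson: {p_newton:.6f}, closed form: {x_v / n_v:.6f}")
```

Working on the logit scale avoids a real pitfall: Newton steps taken directly on pv from a poor starting value can jump below 0, where the log likelihood is undefined.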

20. Refresher: frequentist approach • To quantify the uncertainty, we might take a 95% interval • You probably know $\hat{p}_v \pm 1.96\sqrt{\hat{p}_v(1 - \hat{p}_v)/N_v}$ • (this involves cheating: assuming you know pv, since $\hat{p}_v$ is plugged in for it, and assuming the sample size is close to infinity; there are actually better formulae for small samples)
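The interval most readers will know here is presumably the large-sample (Wald) interval $\hat{p}_v \pm 1.96\sqrt{\hat{p}_v(1-\hat{p}_v)/N_v}$. That is our assumption about what the slide shows, but it reproduces the (0.45, 0.79)% quoted on slide 21. A minimal Python sketch:

```python
import math

def wald_interval(x, n, z=1.959964):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # plug-in SE: treats p_hat as if it were p
    return p_hat - z * se, p_hat + z * se


lo, hi = wald_interval(51, 8197)
print(f"({100 * lo:.2f}, {100 * hi:.2f})%")     # prints (0.45, 0.79)%
```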

21. Interpretation • The maximum likelihood estimate of pv is not the most likely value of pv • Classical statisticians cannot make probabilistic statements about parameters • There is not a 95% chance that pv lies in the interval (0.45, 0.79)% • Rather, 95% of such intervals computed over your lifetime (with no systematic error and no small-sample problems) will contain the true value
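The long-run reading in the last bullet can be checked directly. Fix a hypothetical true pv, enumerate every possible count x out of Nv, and add up the binomial probabilities of the counts whose interval contains that pv. A minimal Python sketch, assuming the Wald interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/N}$ and taking the true value to be 51/8197 purely for illustration:

```python
import math

def binom_logpmf(x, n, p):
    # log of C(n, x) p^x (1 - p)^(n - x), via lgamma to avoid huge integers
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log1p(-p))

def wald_interval(x, n, z=1.959964):
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

def wald_coverage(p_true, n):
    """Long-run proportion of repeated trials whose interval contains p_true."""
    total = 0.0
    for x in range(n + 1):                      # every possible trial outcome
        lo, hi = wald_interval(x, n)
        if lo <= p_true <= hi:                  # this outcome's interval covers p_true
            total += math.exp(binom_logpmf(x, n, p_true))
    return total


# Hypothetical "true" value, set equal to the observed proportion for illustration only
coverage = wald_coverage(p_true=51 / 8197, n=8197)
print(f"coverage = {coverage:.3f}")
```

The result is close to, but not exactly, 95%. For small Nv, or pv near 0 or 1, the shortfall can be much larger, which is one reason better small-sample formulae exist.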

22. ...we know all that oredy... this is so boring, tell us something new dr cook

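23. Tackling it Bayesianly • Target: point and interval estimates • Intermediate: the probability of the parameter pv given the data Xv and Nv, i.e. $p(p_v \mid X_v, N_v) = \dfrac{p(X_v \mid p_v, N_v)\, p(p_v \mid N_v)}{\int_0^1 p(X_v \mid \pi, N_v)\, p(\pi \mid N_v)\, d\pi}$, where $p(p_v \mid X_v, N_v)$ is the posterior for pv, $p(X_v \mid p_v, N_v)$ is the likelihood function, $p(p_v \mid N_v)$ is the prior for pv, and π is a dummy variable of integration • The likelihood function is the same as before • What is the prior?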

24. What is the prior? • There is no "the" prior • There is a prior: you choose it, just as you choose a Binomial model for the data • It represents information on the parameter (the proportion of vaccinees who would be infected) before the data are in hand • Perhaps it is justifiable to assume all probabilities in [0,1] are equally probable before the data are observed

25. What is the prior? • Uniform: $p(p_v \mid N_v) = p(p_v) = 1\{0 \le p_v \le 1\}$ • 1{A} = 1 if A is true and 0 if A is false • Nv can be dropped from the condition as I assume sample size and probability of infection are independent

26. What is the posterior? • $p(p_v \mid X_v, N_v) = C\, p_v^{X_v}(1 - p_v)^{N_v - X_v}$ • pv on the range (0,1) • C a constant (it does not depend on pv)

27. The dumb way • Take a grid of finely spaced values for pv on a sensible range • Evaluate the log posterior + C at each grid point • Transform to the posterior × C • Approximate the integral by a sum over the grid • Rescale to get rid of C, exploiting the fact that the posterior is a pdf and integrates to 1
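The grid recipe can be sketched in Python. This is a minimal sketch: the range (0, 0.02) and the 20,000 grid points are our assumptions for a "sensible range", chosen because the observed proportion is about 0.6%. Under the uniform prior the log posterior is the binomial log likelihood plus a constant:

```python
import math

X_V, N_V = 51, 8197                      # vaccinees infected, number of vaccinees
UPPER, N_GRID = 0.02, 20_000             # assumed sensible range (0, 0.02) for pv
h = UPPER / N_GRID                       # grid spacing

# Cell midpoints, so log(0) is never evaluated
grid = [(i + 0.5) * h for i in range(N_GRID)]

# Step 1: log posterior + C (uniform prior, so just the binomial log likelihood)
log_post = [X_V * math.log(p) + (N_V - X_V) * math.log1p(-p) for p in grid]

# Step 2: exponentiate to get posterior x C; subtracting the max avoids underflow
top = max(log_post)
unnorm = [math.exp(v - top) for v in log_post]

# Steps 3-4: approximate the integral by a Riemann sum, then rescale so the pdf integrates to 1
area = sum(unnorm) * h
density = [u / area for u in unnorm]

peak = max(density)
print(f"integral ~ {sum(density) * h:.6f}, peak density ~ {peak:.0f}")
```

The peak density comes out in the hundreds: a pdf concentrated on a narrow range must take values well above 1 to integrate to 1.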

28. The dumb way

29. The posterior density can take values > 1 • Note the asymmetry

30. Point estimates • If you have a sample x1, x2, ... from a distribution, you can represent its overall location using the: • mean • median • mode • Similarly, you can report the mean, median or mode of the posterior as a point estimate
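On a grid, the three summaries are a probability-weighted average, the point where the cumulative probability first reaches 0.5, and the grid point with the highest density. A minimal Python sketch, rebuilding the same kind of grid as "the dumb way" (grid range and size are our assumptions). As a check, under the uniform prior this posterior is exactly a Beta(52, 8147) density, whose mean is 52/8199 ≈ 0.634% and whose mode is 51/8197 ≈ 0.622%:

```python
import math

X_V, N_V = 51, 8197
UPPER, N_GRID = 0.02, 20_000             # assumed grid over (0, 0.02)
h = UPPER / N_GRID
grid = [(i + 0.5) * h for i in range(N_GRID)]
log_post = [X_V * math.log(p) + (N_V - X_V) * math.log1p(-p) for p in grid]
top = max(log_post)
w = [math.exp(v - top) for v in log_post]
total = sum(w)
mass = [wi / total for wi in w]          # posterior probability of each grid cell

post_mean = sum(p * q for p, q in zip(grid, mass))              # weighted average
post_mode = grid[max(range(N_GRID), key=mass.__getitem__)]      # highest grid cell

cum = 0.0
for p, q in zip(grid, mass):             # first point where the CDF reaches 0.5
    cum += q
    if cum >= 0.5:
        post_median = p
        break

print(f"mean {100 * post_mean:.3f}%, median {100 * post_median:.3f}%, "
      f"mode {100 * post_mode:.3f}%")
```

Because the posterior is right-skewed, the three summaries differ slightly, in the order mode < median < mean.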

31. In R

32. Uncertainty • Two common methods to get an uncertainty interval (also called a credible interval): • quantiles of the posterior (e.g. the 2.5th and 97.5th percentiles) • the highest posterior density interval • Since a parameter value drawn from the posterior has a 95% chance of falling in this interval, its interpretation is the way many people think of confidence intervals
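The quantile (equal-tailed) interval reads off where the cumulative posterior probability crosses 0.025 and 0.975. A minimal Python sketch on an assumed grid over (0, 0.02); the helper `grid_quantile` is hypothetical, written for this illustration:

```python
import math

X_V, N_V = 51, 8197
UPPER, N_GRID = 0.02, 20_000             # assumed grid over (0, 0.02)
h = UPPER / N_GRID
grid = [(i + 0.5) * h for i in range(N_GRID)]
log_post = [X_V * math.log(p) + (N_V - X_V) * math.log1p(-p) for p in grid]
top = max(log_post)
w = [math.exp(v - top) for v in log_post]
total = sum(w)
mass = [wi / total for wi in w]           # posterior probability of each grid cell

def grid_quantile(prob):
    """Smallest grid value whose cumulative posterior probability reaches prob."""
    cum = 0.0
    for p, q in zip(grid, mass):
        cum += q
        if cum >= prob:
            return p
    return grid[-1]


lo, hi = grid_quantile(0.025), grid_quantile(0.975)
print(f"95% quantile interval: ({100 * lo:.2f}, {100 * hi:.2f})%")
```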

33. Highest posterior density intervals need to draw sketch
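For a unimodal posterior, the 95% HPD interval is the set of pv values whose density exceeds some level k, with k chosen so that the set holds 95% of the probability. It is the shortest 95% interval, and both ends have equal density. On a grid this can be approximated by taking cells in order of decreasing density until 95% is accumulated. A minimal Python sketch (grid range and size are our assumptions; the greedy method assumes a single mode):

```python
import math

X_V, N_V = 51, 8197
UPPER, N_GRID = 0.02, 20_000             # assumed grid over (0, 0.02)
h = UPPER / N_GRID
grid = [(i + 0.5) * h for i in range(N_GRID)]
log_post = [X_V * math.log(p) + (N_V - X_V) * math.log1p(-p) for p in grid]
top = max(log_post)
w = [math.exp(v - top) for v in log_post]
total = sum(w)
mass = [wi / total for wi in w]           # posterior probability of each grid cell

# HPD: add cells from the highest density downwards until 95% is covered
included, cum = [], 0.0
for i in sorted(range(N_GRID), key=mass.__getitem__, reverse=True):
    included.append(i)
    cum += mass[i]
    if cum >= 0.95:
        break
i_lo, i_hi = min(included), max(included)  # contiguous because the posterior is unimodal
hpd = (grid[i_lo], grid[i_hi])

# Equal-tailed interval on the same grid, for comparison
cdf, running = [], 0.0
for q in mass:
    running += q
    cdf.append(running)
eq = (grid[next(i for i, c in enumerate(cdf) if c >= 0.025)],
      grid[next(i for i, c in enumerate(cdf) if c >= 0.975)])

print(f"HPD:          ({100 * hpd[0]:.2f}, {100 * hpd[1]:.2f})%")
print(f"equal-tailed: ({100 * eq[0]:.2f}, {100 * eq[1]:.2f})%")
```

The HPD interval sits slightly to the left of the equal-tailed one and is a little narrower, because the posterior is right-skewed.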

34. In R • Posterior quantile interval: (0.47, 0.82)% • Classical 95% interval: (0.45, 0.79)% • Highest posterior density interval: (0.47, 0.81)%

35. Important points • In some situations it doesn’t really matter whether you do a Bayesian or a classical analysis, as the results are effectively the same, e.g. when: • the sample size is large and asymptotic theory is justified • there is no prior/external information for the analysis • someone has already developed a classical routine • In other situations, Bayesian methods come into their own!

36. Philosophical points • If you really love frequentism and hate Bayesianism, you can pragmatically use Bayesian approaches and interpret them like classical ones • If vice versa, you can • use classical estimates from literature as if Bayesian • arguably interpret classical point/interval estimates the way you want to

37. Priors and posteriors • A prior probability of breast cancer (BC) reflects the information you have before observing the mammogram: all you know is the risk class the patient sits in • The posterior probability of BC reflects the information after observing the mammogram • A prior probability density function for pv reflects the information you have before the study results are known • The posterior probability density function reflects the information after the study, including anything known before and everything from the study itself • In each case: how much knowledge, how much uncertainty

38. Justification • Statistician Ms A is analysing some data. She comes up with a model for the data based on some simplifying assumptions. She must justify this choice if others are to believe her • For instance, Ms A wants to do a logistic regression on the following data: outcome: infection with H1N1, as measured by serology; predictors: age, gender, recent overseas travel, number of children in household, ... • Bayesian statistician Mr B is analysing some data. He must come up with a model for the data and for the parameters. He too must justify his choice • There is no reason why the effect of age on the risk of infection should be linear in the logit of risk. There is no reason why each predictor’s effect should be additive on the logit of risk. There is no reason why individuals should be taken to be independent. These are choices made by the statistician

39. Support • Each parameter of a model has a support • The prior should match this • All a bit silly: