
Dealing with Nuisances: Principled and Ad Hoc Methods



1. Dealing with Nuisances: Principled and Ad Hoc Methods
Xiao-Li Meng, Department of Statistics, Harvard University
Joint work with Jingchen Liu (and CHASC)
Harvard University

2. Dealing with Nuisance Parameters
• Bringing in a little “Bee”: Posterior Predictive Assessment
• Giving up a bit of power: Using an alternative alternative (or a “working” alternative)
• Being further away from the big “Bee”: Profiling via moments

3. A Simple Spectral Model
• A source spectrum with two components: a continuum modeled by a power law $E^{-\beta}$, and an emission line modeled as a Gaussian profile with total flux $F$.
• The expected observed flux $F_j$ from the source within an energy bin $E_j$ for a “perfect” instrument is given by [equation not preserved in the transcript], where $dE_j$ is the energy width of bin $j$ and $\pi_j$ is the proportion of the Gaussian line falling in bin $j$.
• If the exact energy is observed, then the distribution follows [equation not preserved in the transcript].
• Reference: Protassov et al. (2002)

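4. Hypothesis Testing – Notation
• Likelihood: $L(\theta \mid x) = f(x \mid \theta)$, with $\Theta = \Theta_0 \cup \Theta_1$ and $\Theta_0 \cap \Theta_1 = \emptyset$
• Null hypothesis $H_0$: $\theta \in \Theta_0$
• Alternative hypothesis $H_A$: $\theta \in \Theta_1$
• Critical region $C$: reject the null hypothesis if $x \in C$.
• Type I error: $P(X \in C \mid \theta \in \Theta_0)$, the false positive rate
• Type II error: $P(X \in C^c \mid \theta \in \Theta_1)$, the false negative rate
• Power function: $p(\theta) = P(X \in C \mid \theta)$
• Hypothesis testing of size $\alpha$: $p(\theta) \le \alpha$ for all $\theta \in \Theta_0$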

5. Hypothesis Testing – Likelihood Ratio Test
• Uniformly most powerful (UMP) test: the most powerful test among all tests of size $\alpha$
• Likelihood ratio test (LRT): $C(c) = \{x : LR(x) > c\}$
• In the simple null hypothesis case, if the UMP test exists, it is a likelihood ratio test.

6. Seeking Pivotal Quantity
• Hypothesis testing of size $\alpha$: $\max_{\theta \in \Theta_0} P(X \in C \mid \theta) = \alpha$, which is hard to maximize.
• Ideally, we seek a pivotal quantity $T(X)$: its distribution is completely known under the null $\Theta_0$.
• Then the type I error is $P(T(X) > t \mid \theta) = \alpha$ for all $\theta \in \Theta_0$.
• It is easy to control the type I error this way, but typically it is very hard to find a useful/powerful pivotal quantity.
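As a small illustration of the pivot idea on this slide, the following Python sketch uses a hypothetical statistic, T(X) = max(X) / mean(X), for exponential data with an unknown scale. Neither the statistic nor the function names come from the talk; they are assumptions chosen because rescaling the data leaves T unchanged, so its null distribution, and hence the critical value t, should not depend on the scale.

```python
import random


def scale_free_stat(sample):
    """T(X) = max(X) / mean(X); unchanged if every X_i is multiplied by a constant."""
    return max(sample) / (sum(sample) / len(sample))


def simulate_null(scale, n=50, reps=2000, seed=0):
    """Sorted draws of T under the null: n iid exponentials with the given mean."""
    rng = random.Random(seed)
    return sorted(
        # random.expovariate takes a rate, i.e. 1 / mean
        scale_free_stat([rng.expovariate(1.0 / scale) for _ in range(n)])
        for _ in range(reps)
    )


def upper_quantile(sorted_values, alpha):
    """Empirical (1 - alpha) quantile, used as the critical value t."""
    index = min(len(sorted_values) - 1, int((1.0 - alpha) * len(sorted_values)))
    return sorted_values[index]


# Different seeds and very different scales: the critical values should still agree
# up to Monte Carlo error, because T is a pivot for the scale parameter.
t_scale_1 = upper_quantile(simulate_null(scale=1.0, seed=1), alpha=0.05)
t_scale_10 = upper_quantile(simulate_null(scale=10.0, seed=2), alpha=0.05)
print(t_scale_1, t_scale_10)
```

The two printed critical values should be close, which is what makes the type I error easy to control. Whether such a statistic has useful power is a separate question, as the slide notes.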

7. Posterior Predictive Assessment
• p-value $= P(T(X) > T(x) \mid \theta_0)$
• In the presence of a nuisance parameter $\eta$, under the null, the p-value is a function of $\eta$: $p(\eta) = P(T(X) > T(x) \mid \eta)$.
• Posterior predictive p-value: $\mathrm{ppp} = E(p(\eta) \mid x) = \int p(\eta)\, f(\eta \mid x)\, d\eta$, where $f(\eta \mid x)$ is the posterior density of $\eta$. That is, the p-value is calculated under the posterior predictive distribution $f(X^{rep} \mid x) = \int f(X^{rep} \mid \theta_0, \eta)\, f(\eta \mid x)\, d\eta$.
• An extreme ppp casts doubt on the null hypothesis/model.
• Can use a realized discrepancy $D(X, \eta)$: $p(\eta) = P(D(X, \eta) > D(x, \eta) \mid \eta)$.
• Can assess the entire posterior distribution of $p(\eta)$.
• References: Rubin (1984), Meng (1994), Gelman, Meng and Stern (1996)
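The posterior predictive p-value can be approximated by simulation: draw the nuisance parameter from its posterior, draw replicated data under the null, and count how often the replicated statistic is at least as large as the observed one. The sketch below is a toy version under stated assumptions that are not from the talk: bin counts are iid Poisson under the null, the unknown rate has a conjugate Gamma prior, the statistic is the largest bin count, and the data are made-up numbers. It uses ">=" rather than ">" because the counts are discrete.

```python
import math
import random


def draw_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def posterior_predictive_p_value(counts, statistic, prior_shape=1.0,
                                 prior_rate=1.0, draws=4000, seed=0):
    """Monte Carlo estimate of ppp = E[p(rate) | counts] under the null model."""
    rng = random.Random(seed)
    n_bins = len(counts)
    # Gamma prior + Poisson likelihood gives a Gamma posterior for the rate.
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + n_bins
    observed = statistic(counts)
    exceed = 0
    for _ in range(draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        rate = rng.gammavariate(post_shape, 1.0 / post_rate)
        replicate = [draw_poisson(rate, rng) for _ in range(n_bins)]
        if statistic(replicate) >= observed:
            exceed += 1
    return exceed / draws


# Made-up toy data: one spectrum without and one with a conspicuous bin.
plain_counts = [3, 2, 4, 3, 2, 6, 4, 2, 3, 3]
spiked_counts = [3, 2, 4, 3, 2, 15, 4, 2, 3, 3]
ppp_plain = posterior_predictive_p_value(plain_counts, max)
ppp_spiked = posterior_predictive_p_value(spiked_counts, max)
print(ppp_plain, ppp_spiked)
```

A ppp near 0 for the spiked data would cast doubt on the no-line model, while a moderate ppp for the plain data would not.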

8. MODEL 0. There is no emission line.
• MODEL 1. There is an emission line with fixed location in the spectrum, but unknown intensity.
• MODEL 2. There is an emission line with unknown location and intensity.
• Reference: van Dyk & Kang (2004)

9. The posterior predictive check. The two histograms compare the observed likelihood ratio test statistics (vertical lines) with 1000 simulations from the posterior predictive distribution. The left plot is the comparison between Model 0 and Model 1, and the right plot is the comparison between Model 0 and Model 2. Both model checks indicate strong evidence for including the emission line.

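10. Mixture Model - Testing p = 0
• Hypothesis testing of the mixture model $(1 - p)\, f(x \mid \beta) + p\, g(x \mid \mu, \sigma)$, with $H_0$: $p = 0$.
• In particular, $f(x \mid \beta) \propto x^{-\beta}$ and $g(x \mid \mu, \sigma) = \phi(x \mid \mu, \sigma)$. (To avoid the singularity at 0 when $\beta > 1$, we need to truncate the density away from 0. Without loss of generality, we assume $x > 1$.)
• The LR is not a pivotal quantity under this model. But if we use a different model for the $g$ component, then we can construct an LR test that is a pivotal quantity.
• Let $y = \log(x)$ and $\lambda = 1/(\beta - 1)$; then we can model $y$ as a mixture of an exponential component with mean $\lambda$ and a normal component, so that the $g$ component is log-normal in $x$.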
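The change of variables on this slide can be checked directly. The derivation below is a sketch that assumes the truncated power law is normalized on x > 1 and uses λ as a label for 1/(β − 1); it shows that y = log x is then exponential with mean λ, which is a pure scale family.

```latex
\begin{align*}
f(x \mid \beta) &= (\beta - 1)\, x^{-\beta}, \qquad x > 1,\ \beta > 1, \\
f_Y(y) &= f(e^{y} \mid \beta)\, \frac{dx}{dy}
        = (\beta - 1)\, e^{-\beta y}\, e^{y}
        = (\beta - 1)\, e^{-(\beta - 1) y}, \qquad y > 0, \\
       &= \frac{1}{\lambda}\, e^{-y/\lambda}, \qquad \lambda = \frac{1}{\beta - 1}.
\end{align*}
```

Because multiplying y by a constant c only changes λ to cλ, a statistic that is invariant to rescaling y has a null distribution that does not depend on λ.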

11. Difference between the Two Choices
[Figure: density plots comparing normal(1, 0.2) vs log-normal(0, 0.2), and normal(1, 0.02) vs log-normal(0, 0.02)]

12. Power Comparison: LR under log-normal mixture vs LR under normal mixture when the true model is (almost) normal mixture
[Figure: two power-comparison panels]
• Panel 1: $\lambda = 1$, $\mu = 1$, $\sigma = 0.02$ are treated as known; p = 0.0001, 0.005, 0.01, 0.015, 0.02, 0.03. Only one free parameter, p.
• Panel 2: $\lambda = 1$, $\mu = 1$, $\sigma = 0.3$ are treated as known; p = 0.0001, 0.005, 0.01, 0.015, 0.02, 0.03. Only one free parameter, p.

13. Likelihood Ratio Test and Pivotal Quantity
• $H_0$: $p = 0$, $H_A$: $p > 0$
• The LRT is a pivotal quantity, i.e., the distribution of the likelihood ratio is free of $\lambda$.
• The maximization can be done via the EM algorithm by viewing the subgroup membership as missing data.

14. Expectation-Maximization Algorithm
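The content of this slide is not preserved in the transcript, so the following Python sketch is only an illustration of the idea stated earlier: treat each observation's subgroup membership as missing data. It assumes a two-component model on the log scale (an exponential continuum with mean `scale` and a normal line with mean `mu` and standard deviation `sigma`), keeps `mu` and `sigma` fixed for brevity, and fits only `p` and `scale`. All function names and the simulated data are hypothetical.

```python
import math
import random


def exp_pdf(y, scale):
    return math.exp(-y / scale) / scale


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(ys, p, scale, mu, sigma):
    return sum(math.log((1.0 - p) * exp_pdf(y, scale) + p * normal_pdf(y, mu, sigma))
               for y in ys)


def em_fit(ys, mu, sigma, p_start=0.5, iterations=200):
    """EM for (p, scale); mu and sigma are held fixed (a simplifying assumption)."""
    p = p_start
    scale = sum(ys) / len(ys)
    for _ in range(iterations):
        # E-step: posterior probability that each point belongs to the line.
        weights = []
        for y in ys:
            line = p * normal_pdf(y, mu, sigma)
            continuum = (1.0 - p) * exp_pdf(y, scale)
            weights.append(line / (line + continuum))
        # M-step: weighted proportion and weighted exponential mean.
        total = sum(weights)
        p = total / len(ys)
        scale = sum((1.0 - w) * y for w, y in zip(weights, ys)) / (len(ys) - total)
    return p, scale


def lr_statistic(ys, mu, sigma):
    """2 log LR for H0: p = 0, using the EM fit under the alternative."""
    null_scale = sum(ys) / len(ys)  # MLE of the exponential mean when p = 0
    p_hat, scale_hat = em_fit(ys, mu, sigma)
    return 2.0 * (log_likelihood(ys, p_hat, scale_hat, mu, sigma)
                  - log_likelihood(ys, 0.0, null_scale, mu, sigma))


def simulate(n, p, scale, mu, sigma, seed):
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) if rng.random() < p else rng.expovariate(1.0 / scale)
            for _ in range(n)]


ys = simulate(n=500, p=0.1, scale=1.0, mu=1.0, sigma=0.02, seed=3)
p_hat, scale_hat = em_fit(ys, mu=1.0, sigma=0.02)
stat = lr_statistic(ys, mu=1.0, sigma=0.02)
print(p_hat, scale_hat, stat)
```

EM only guarantees that the likelihood does not decrease from its starting point, so with several modes the result can depend on the starting values; trying more than one start is a reasonable precaution.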

15. Multiple Modes
[Figure: log-likelihood of $\mu$ given that $\lambda = 1$, $\sigma = 0.02$, $p = 0.01$; the sample size is 500]

16. A “Profiled” Likelihood Ratio Test
• “Profile likelihood” via moments
• $L_p(p, \mu, \sigma \mid y)$ can be maximized with a numerical optimization method (the correct likelihood was harder to maximize without using EM).
• Define the critical region $C(c) = \{y : LR_p(y) > c\}$.
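The exact moment equation used in the talk is not preserved in the transcript. The Python sketch below shows one way such a "profiled" likelihood could be set up, under an assumed first-moment condition: if y is a mixture of an exponential with mean `scale` and a normal with mean `mu`, then E[y] = (1 − p)·scale + p·mu, so `scale` can be replaced by (ȳ − p·mu)/(1 − p). For brevity `mu` and `sigma` are held fixed and `p` is maximized on a grid; these choices and all names are assumptions for illustration.

```python
import math
import random


def exp_pdf(y, scale):
    return math.exp(-y / scale) / scale


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def profiled_log_likelihood(ys, p, mu, sigma):
    """Log-likelihood with the exponential mean replaced by its moment estimate."""
    mean_y = sum(ys) / len(ys)
    scale = (mean_y - p * mu) / (1.0 - p)  # assumed first-moment condition
    if scale <= 0.0:
        return float("-inf")
    return sum(math.log((1.0 - p) * exp_pdf(y, scale) + p * normal_pdf(y, mu, sigma))
               for y in ys)


def profiled_lr_statistic(ys, mu, sigma, grid_size=200, p_max=0.5):
    """2 log LR_p for H0: p = 0, maximizing over a grid of p values in [0, p_max]."""
    null_value = profiled_log_likelihood(ys, 0.0, mu, sigma)
    best = max(profiled_log_likelihood(ys, k * p_max / grid_size, mu, sigma)
               for k in range(grid_size + 1))
    return 2.0 * (best - null_value)


def simulate(n, p, scale, mu, sigma, seed):
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) if rng.random() < p else rng.expovariate(1.0 / scale)
            for _ in range(n)]


null_data = simulate(n=500, p=0.0, scale=1.0, mu=1.0, sigma=0.02, seed=11)
line_data = simulate(n=500, p=0.1, scale=1.0, mu=1.0, sigma=0.02, seed=12)
stat_null = profiled_lr_statistic(null_data, mu=1.0, sigma=0.02)
stat_line = profiled_lr_statistic(line_data, mu=1.0, sigma=0.02)
print(stat_null, stat_line)
```

Because the grid includes p = 0, the statistic is never negative. A grid search is used only to keep the sketch dependency-free; any numerical optimizer could take its place.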

17. A Sketch of Proof

18. Demonstrating a pivot: QQ-plot of LRs when $\lambda = 1$ vs $\lambda = 10$
[Figure: two QQ-plot panels, Profile Likelihood and EM]

19. Distribution of 2 log(LR)’s under the null hypothesis
[Figure: two panels, Profile Likelihood and EM; EM starting from E( | y) = 0.5]

20. Power Comparison: “Profile” LRT vs “EM” LRT

21. References
• Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica, 6, 733–807.
• Meng, X.-L. (1994). Posterior predictive p-values. Annals of Statistics, 22, 1142–1160.
• Protassov, R., van Dyk, D. A., Connors, A., Kashyap, V. L., and Siemiginowska, A. (2002). Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test. The Astrophysical Journal, 571, 545–559.
• Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12(4), 1151–1172.
• van Dyk, D. A., and Kang, H. (2004). Highly Structured Models for Spectral Analysis in High-Energy Astrophysics. Statistical Science, 19(2), 275–293.

22. Topic “B” reinstated:
• How to measure “ego”?
• How to classify professions by such “ego” measures?
• Finding the most powerful test for testing Ego_Particle Physicists > Ego_Astrophysicists > Ego_Statisticians
