Comparing analyses

Comparing analyses Greg Francis PSY 626: Bayesian Statistics for Psychological Science Fall 2018 Purdue University

Facial feedback • Is your emotional state influenced by your facial muscles? • If I ask you to smile, you may report feeling happier • But this could just be because you guess I want you to report feeling happier • Or because intentional smiling is associated with feeling happier • You can ask people to use smiling muscles without them realizing it

Facial feedback • Within subjects design (n=21) • Judge “happiness” in a piece of abstract art while • Holding a pen in your teeth (smiling) • Holding a pen in your lips (frowning/pouting) • No pen • 11 trials for each condition • Different art on eachtrial

Data • The HappinessRating is a number between 0 (no happiness) to 100 (lots of happiness) • The facial feedback hypothesis suggests that the mean HappinessRating values should be larger when the pen is held in the teeth and lower when the pen is held in the lips • File FacialFeedback.csv contains all the data

Models # load data file FFdata<-read.csv(file="FacialFeedback.csv",header=TRUE,stringsAsFactors=TRUE) # By default, participant numbers are treated as _numbers_. Need to correct that. FFdata$ParticipantFactor = factor(FFdata$Participant) # load the brms library library(brms) # null model model1 = brm(HappinessRating ~ 1, data = FFdata) # without random effect on participant model2 = brm(HappinessRating ~ 0 + Condition, data = FFdata) # with random effect on participant (for shrinkage) model3 = brm(HappinessRating ~ 0 + Condition + (Condition | Participant), data = FFdata) # compare models print(model_weights(model1, model2, model3, weight="loo") ) • Consider three models > print(model_weights(model1, model2, model3, weight="loo") ) model1 model2 model3 3.026681e-05 8.068823e-02 9.192815e-01

Model comparison • What does this mean? • If you wanted to predict future data, the random effects model is your best choice among these models • That’s largely because there seem to be differences between participants, and only this model considers them • Maybe we should consider other models

Various null models • We usually think of there being one null • But oftentimes there are many such models # Different null model: no effect of condition, but different values across participants model4 = brm(HappinessRating ~ ParticipantFactor, data = FFdata) # Random effects null model: no effect of condition, but different values across participants that are related to each other model5 = brm(HappinessRating ~ (1 | ParticipantFactor), data = FFdata) # compare models print(model_weights(model1, model2, model3, model4, model5, weight="loo")) > model_weights(model1, model2, model3, model4, model5, weight="loo") model1 model2 model3 model4 model5 2.480789e-02 5.444721e-02 6.586768e-01 2.620574e-01 1.074924e-05

Evidential support • What does the Facial feedback hypothesis actually predict? • Is your emotional state influenced by your facial muscles? • ->Differences in happiness ratings across the conditions > print(model3) Family: gaussian Links: mu = identity; sigma = identity Formula: HappinessRating ~ 0 + Condition + (Condition | Participant) Data: FFdata (Number of observations: 693) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Group-Level Effects: ~Participant (Number of levels: 21) Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sd(Intercept) 11.01 2.20 7.34 15.83 2114 1.00 sd(ConditionPenInLips) 8.59 3.19 2.33 15.05 1152 1.00 sd(ConditionPenInTeeth) 7.19 3.00 1.18 13.15 791 1.00 cor(Intercept,ConditionPenInLips) 0.03 0.32 -0.53 0.67 2304 1.00 cor(Intercept,ConditionPenInTeeth) -0.46 0.30 -0.88 0.29 2435 1.00 cor(ConditionPenInLips,ConditionPenInTeeth) 0.07 0.40 -0.78 0.73 1472 1.00 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat ConditionNoPen 43.69 2.76 38.17 49.20 1174 1.01 ConditionPenInLips 45.88 3.37 39.29 52.46 1540 1.01 ConditionPenInTeeth 47.44 2.54 42.29 52.42 1700 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 19.02 0.55 17.98 20.11 4000 1.00

Evidential support > print(model3) Family: gaussian Links: mu = identity; sigma = identity Formula: HappinessRating ~ 0 + Condition + (Condition | Participant) Data: FFdata (Number of observations: 693) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Group-Level Effects: ~Participant (Number of levels: 21) Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sd(Intercept) 11.01 2.20 7.34 15.83 2114 1.00 sd(ConditionPenInLips) 8.59 3.19 2.33 15.05 1152 1.00 sd(ConditionPenInTeeth) 7.19 3.00 1.18 13.15 791 1.00 cor(Intercept,ConditionPenInLips) 0.03 0.32 -0.53 0.67 2304 1.00 cor(Intercept,ConditionPenInTeeth) -0.46 0.30 -0.88 0.29 2435 1.00 cor(ConditionPenInLips,ConditionPenInTeeth) 0.07 0.40 -0.78 0.73 1472 1.00 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat ConditionNoPen 43.69 2.76 38.17 49.20 1174 1.01 ConditionPenInLips 45.88 3.37 39.29 52.46 1540 1.01 ConditionPenInTeeth 47.44 2.54 42.29 52.42 1700 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 19.02 0.55 17.98 20.11 4000 1.00 • What does the Facial feedback hypothesis actually predict? • Use of “smiling” facial muscles should lead to higher happiness ratings than “pouting” facial muscles • -> Higher ratings for teeth than for lips conditions

Evidential support > print(model3) Family: gaussian Links: mu = identity; sigma = identity Formula: HappinessRating ~ 0 + Condition + (Condition | Participant) Data: FFdata (Number of observations: 693) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Group-Level Effects: ~Participant (Number of levels: 21) Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sd(Intercept) 11.01 2.20 7.34 15.83 2114 1.00 sd(ConditionPenInLips) 8.59 3.19 2.33 15.05 1152 1.00 sd(ConditionPenInTeeth) 7.19 3.00 1.18 13.15 791 1.00 cor(Intercept,ConditionPenInLips) 0.03 0.32 -0.53 0.67 2304 1.00 cor(Intercept,ConditionPenInTeeth) -0.46 0.30 -0.88 0.29 2435 1.00 cor(ConditionPenInLips,ConditionPenInTeeth) 0.07 0.40 -0.78 0.73 1472 1.00 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat ConditionNoPen 43.69 2.76 38.17 49.20 1174 1.01 ConditionPenInLips 45.88 3.37 39.29 52.46 1540 1.01 ConditionPenInTeeth 47.44 2.54 42.29 52.42 1700 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 19.02 0.55 17.98 20.11 4000 1.00 • What does the Facial feedback hypothesis actually predict? • “Smiling” facial muscles leads to more happiness, “pouting” facial muscles leads to less happiness • -> Order of means: teeth > none > lips

Ordered means • You should build the model that best represents the theoretical claim • Consider a model for ordered effects: teeth > none > lips • First attempt (failure) • More generally, “hard” boundaries are dangerous. • Can lead to model convergence problems • Better to use “soft” boundaries via priors priors = c(prior(normal(0, 10), class = "b", coef="ConditionPenInLips", ub=0), prior(normal(0, 10), class = "b", coef="ConditionPenInTeeth", lb=0)) > Error: Argument 'coef' may not be specified when using boundaries.

Ordered means • You should build the model that best represents the theoretical claim • Consider a model for ordered effects: teeth > none > lips • Intercept (None), bias negative for Lips, bias Positive for Teeth • Versus a model without ordered effects priors = c(prior(normal(-5, 5), class = "b", coef="ConditionPenInLips"), prior(normal(5, 5), class = "b", coef="ConditionPenInTeeth")) model6 = brm(HappinessRating ~ 1 + Condition + (1 + Condition | Participant), data = FFdata, prior=priors) > model_weights(model3, model6, weights="loo") model3 model6 0.3262669 0.6737331

Ordered means • You should build the model that best represents the theoretical claim • We set different priors, so this might not be a fair comparison • Not much difference priors = c(prior(normal(0, 5), class = "b", coef="ConditionPenInLips"), prior(normal(0, 5), class = "b", coef="ConditionPenInTeeth")) model7 = brm(HappinessRating ~ 1 + Condition + (1 + Condition | Participant), data = FFdata, prior=priors) > model_weights(model6, model7, weights="loo") model6 model7 0.5610197 0.4389803

Model comparison • To do much more, we need better priors • Which means we need to better understand the precise predictions of the Facial Feedback hypothesis • Perhaps the theory does make more precise predictions • But oftentimes, psychological theories make only vague predictions

BayesFactor Package # load data file FFdata<-read.csv(file="FacialFeedback.csv",header=TRUE,stringsAsFactors=TRUE) # By default, participant numbers are treated as _numbers_. Need to correct that. FFdata$ParticipantFactor = factor(FFdata$Participant) # load the BayesFactor library library(BayesFactor) bf = anovaBF(HappinessRating ~ Condition + ParticipantFactor, data = FFdata, whichRandom="ParticipantFactor") • It is less flexible, but it provides a nice interface for common comparisons • Favors the null > bf Bayes factor analysis -------------- [1] Condition + ParticipantFactor : 0.1314427 ±0.95% Against denominator: HappinessRating ~ ParticipantFactor --- Bayes factor type: BFlinearModel, JZS

BRMS vs BayesFactor # Different null model: no effect of condition, but different values across participants model4 = brm(HappinessRating ~ ParticipantFactor, data = FFdata, save_all_pars = TRUE) model5b = brm(HappinessRating ~ Condition + (1 | ParticipantFactor), data = FFdata, save_all_pars = TRUE) • We can set up similar comparisons in BRMS • Favors an effect! > model_weights(model4, model5b, weights="waic") model4 model5b 0.2545269 0.7454731

Why the differences? • Same model structure: • Condition + random effect of participant • Vs. participants only • Differences: • Priors • BF vs. WAIC/loo • Lets look at details

Bayes Factor Iterations = 1:1000 Thinning interval = 1 Number of chains = 1 Sample size per chain = 1000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE mu 45.6104 2.4044 0.076032 0.076032 Condition-NoPen -1.8950 1.0304 0.032585 0.032585 Condition-PenInLips 0.1665 1.0324 0.032649 0.032649 Condition-PenInTeeth 1.7285 1.0157 0.032118 0.032118 ParticipantFactor-1 2.0154 3.8732 0.122481 0.116345 ParticipantFactor-2 -3.8689 3.8867 0.122908 0.117546 ParticipantFactor-3 -6.5852 3.7818 0.119590 0.119590 ParticipantFactor-4 7.1901 3.8973 0.123244 0.109853 ParticipantFactor-5 5.8068 3.9683 0.125487 0.109294 ParticipantFactor-6 -5.2414 3.8057 0.120346 0.120346 ParticipantFactor-7 14.4266 3.9816 0.125908 0.125908 ParticipantFactor-8 7.3839 3.8769 0.122597 0.122597 ParticipantFactor-9 4.7411 3.8236 0.120912 0.120912 ParticipantFactor-10 -6.0632 3.9646 0.125371 0.125371 ParticipantFactor-11 12.4585 3.9078 0.123576 0.136742 ParticipantFactor-12 3.0253 3.9878 0.126106 0.126106 ParticipantFactor-13 -9.3336 3.8855 0.122871 0.122871 ParticipantFactor-14 6.6023 3.9474 0.124829 0.124829 ParticipantFactor-15 2.4411 3.9526 0.124991 0.117939 ParticipantFactor-16 -2.4974 3.9522 0.124978 0.124978 ParticipantFactor-17 -0.9160 4.0851 0.129181 0.114247 ParticipantFactor-18 -27.8137 3.9291 0.124249 0.123889 ParticipantFactor-19 -1.5412 3.6467 0.115318 0.115318 ParticipantFactor-20 7.2656 4.0284 0.127389 0.127389 ParticipantFactor-21 -7.7216 3.7769 0.119436 0.119436 sig2 380.7215 21.0074 0.664311 0.703139 g_Condition 0.2786 1.0220 0.032320 0.032320 g_ParticipantFactor 0.3078 0.1106 0.003497 0.004191 > chains = posterior(bf, iterations = 1000) 0 % |----|----|----|----|----|----|----|----|----|----| **************************************************| > summary(chains) • Extract parameter estimates • NoPen=43.719 • Lips=45.7805 • Teeth=47.3425

BRMS Family: gaussian Links: mu = identity; sigma = identity Formula: HappinessRating ~ Condition + (1 | ParticipantFactor) Data: FFdata (Number of observations: 693) Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000 Group-Level Effects: ~ParticipantFactor (Number of levels: 21) Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sd(Intercept) 10.18 1.97 7.11 14.84 1018 1.00 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 43.65 2.55 38.64 48.70 1064 1.00 ConditionPenInLips 2.20 1.83 -1.49 5.75 4000 1.00 ConditionPenInTeeth 3.77 1.81 0.13 7.13 4000 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 19.59 0.53 18.59 20.66 4000 1.00 model5b • Extract parameter estimates • NoPen=43.65 • Lips=45.85 • Teeth=47.42

Posteriors BayesFactor plot(chains) • Summarize

Posteriors BRMS plot(model5b) • Summarize

Posterior details • Posterior distributions for BF • Have to look at chains format to extract what you want # Create posterior for Teeth mean TeethBFPosterior = chains[,1] + chains[,4] dev.new() plot(density(TeethBFPosterior))

Posterior details • Posterior distributions for brms model5b post<-posterior_samples(model5b) TeethBRMSPosterior = post$b_Intercept + post$b_ConditionPenInTeeth

Together • Almost identical! > plot(density(TeethBRMSPosterior)) > lines(density(TeethBFPosterior))

Lips

None condition

Bayes Factor from BRMS • BRMS can compute a Bayes Factor from two models • Huge BF in favor of model4 (null model) BFbrms = bayes_factor(model4, model5b) > print(BFbrms) The estimated Bayes factor in favor of x1 over x2 is equal to: 6.939161e+31

BF vs. loo • A Bayes Factor makes sense when you have informative priors that are used to specify a pretty precise model • The default priors used by brms are the opposite of informative priors (by design) • This makes them a poor choice for computing Bayes Factors • Small differences can get blown up in a ratio • Which approach is right? • It depends on what you want to do with your model

What do you want/have? • Are you testing well specified models with informative priors? Do you believe the true model is among the ones you testing? • Maybe go for the Bayes Factor • Are you hoping to predict future data and avoid overfitting? Do you not believe that you can identify the true model? • Maybe go for WAIC or loo. • Personally, I think WAIC or loo is more appropriate for most situations in the social sciences

Posteriors • Both brms and the BayesFactor library provide fairly reasonable default priors • Both provide posterior distributions • Both produce pretty similar estimates of population parameters • They differ in details, and those details can matter some times • Ease of use • Interpretation

Conclusions • Figure out what you want to do for your analysis • That includes knowing your audience • Some people will understand BayesFactors (or think they do), and that can help you communicate your findings

Comparing analyses

Comparing analyses

Presentation Transcript

Bivariate Analyses

Moderator Analyses

Descriptive Statistical Analyses Reliability Analyses

Microarray Analyses

Phylogenetic analyses

Variance Analyses from Invariance Analyses

Functional Analyses

CEM analyses

Analyses

COMPARING

Falls Analyses

iPod analyses

CORRELATIONAL ANALYSES

Technical Analyses

Comparing

Fourier Analyses

Chi Square Analyses: Comparing Frequency Distributions

Comparing

Rhetorical Analyses

Variance Analyses from Invariance Analyses