Let ’ s continue to do a Bayesian analysis

Let’s continue to do a Bayesian analysis Greg Francis PSY 626: Bayesian Statistics for Psychological Science Fall 2018 Purdue University

Visual Search • A classic experiment in perception/attention involves visual search • Respond as quickly as possible whether an image contains a target (a green circle) or not • Vary number of distractors: 4, 16, 32, 64 • Vary type of distractors: feature (different color), conjunctive (different color or shape)

Visual Search • Typical results: For conjunctive distractors, response time increases with the number of distractors

Linear model • Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation • Let’s do it for a single participant • We are basically going through Section 4.4 of the text, but using a new data set and using brms instead of rethinking • Download files from the class web site and follow along in class

Define the model • library(brms) • This loads up the R library we want to use • It has some nice functions for plotting and displaying tables • model1=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2 ) • Defines the linear model • Assigns default prior distributions to each parameter (b1, b2, sigma) • “Improper” uniform priors

Output Family: gaussian Links: mu = identity; sigma = identity Formula: RT_ms ~ NumberDistractors Data: VSdata2 (Number of observations: 20) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 840.42 138.88 574.93 1121.98 2115 1.00 NumberDistractors 41.19 3.81 33.46 48.79 1836 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 374.90 69.17 273.24 543.65 2320 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). • print(summary(model1)) • Prints out a table of estimates of b1, b2, and sigma

HDPI • A useful quantitative characterization of the posterior distribution for a parameter is the Highest Density Posterior Interval (HPDI) > library(broom) > library(coda) > > tidyMCMC(model1$fit, conf.int = TRUE, conf.method = "HPDinterval") term estimate std.error conf.low conf.high 1 b_Intercept 840.4235 138.87583 571.21624 1114.73219 2 b_NumberDistractors 41.1892 3.80958 33.64076 48.94022 3 sigma 374.9036 69.17027 259.08606 514.65371

HPDI vs CI > library(broom) > library(coda) > > tidyMCMC(model1$fit, conf.int = TRUE, conf.method = "HPDinterval") term estimate std.error conf.low conf.high 1 b_Intercept 840.4235 138.87583 571.21624 1114.73219 2 b_NumberDistractors 41.1892 3.80958 33.6407648.94022 3 sigma 374.9036 69.17027 259.08606 514.65371 > fit <- lm(RT_ms ~ NumberDistractors, data = VSdata2) > fit Call: lm(formula = RT_ms ~ NumberDistractors, data = VSdata2) Coefficients: (Intercept) NumberDistractors 832.94 41.38 > confint(fit) 2.5 % 97.5 % (Intercept) 565.75775 1100.13180 NumberDistractors 34.10564 48.66024

HPDI vs CI • Pretty similar, so why bother? Different interpretations: • HPDI is the smallest contiguous set of values of b_NumberDistractors that have a 95% probability • If the model is valid, the priors are appropriate, and so forth • It is a statement about parameter values • CI is the smallest set of values that results from a process that 95% of the time includes the true value of b_NumberDistractors • If the model is valid and so forth • It is a statement about constructing the confidence interval

Significance? • It might be tempting to use HDPI in a way similar to confidence intervals are used • If the HDPI does not include 0, then we have a “significant” result • There are better approaches, which we will develop later • For the moment, we just want to describe the posterior distributions from our Bayesian analysis

Visual Search • Typical results: For conjunctive distractors, response time increases with the number of distractors

Define the model • # Pull out just the trials for the first participant's conjunctive condition (includes Target present and Target absent) • VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction") • # By default brms provides priors that are adapted to fit within the range of your data. Generally, this produces results similar to standard linear regression. Later, we will introduce informative priors • model2 <- brm(RT_ms ~ Target:NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3) • Basically defines the model to treat Target:NumberDistractors as an interaction • # print out summary of model • print(summary(model2))

Output Family: gaussian Links: mu = identity; sigma = identity Formula: RT_ms ~ Target:NumberDistractors Data: VSdata2 (Number of observations: 40) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1; total post-warmup samples = 5400 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 836.53 147.75 545.74 1131.55 4014 1.00 TargetAbsent:NumberDistractors 41.19 4.73 32.07 50.67 3540 1.00 TargetPresent:NumberDistractors 23.81 4.74 14.64 32.98 3857 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 578.31 70.40 460.32 737.97 4301 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). • print(summary(model2)) • Prints out a table of estimates of b1, b2 (2 of them), and sigma

Visual Search • Looking at posteriors

Visual Search • Compare fit to data • > plot(marginal_effects(model2), points = TRUE)

Varying intercepts • Our model so far has one (common) intercept and varying slopes (one for each search condition: target present/absent) • We can have different intercepts (several ways of doing this) • model3 <- brm(RT_ms ~ Target*NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3) • Essentially the same as • model3 <- brm(RT_ms ~ Target + NumberDistractors + Target:NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3)

Output Family: gaussian Links: mu = identity; sigma = identity Formula: RT_ms ~ Target * NumberDistractors Data: VSdata2 (Number of observations: 40) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1; total post-warmup samples = 5400 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 828.33 213.76 390.80 1241.50 2609 1.00 TargetPresent 10.55 300.03 -570.01 592.91 2134 1.00 NumberDistractors 41.32 5.82 29.74 52.90 2506 1.00 TargetPresent:NumberDistractors -17.52 8.18 -33.62 -1.47 2018 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 587.89 70.96 468.71 741.47 4200 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). • print(summary(model3)) • Prints out a table of estimates of b1, b2 (3 of them), and sigma

Visual Search • Looking at posteriors

Visual Search • Compare fit to data • Has to be run from the command line (hit return for next plot) • > plot(marginal_effects(model3), points = TRUE) • Now gives three plots • Main effects • Interaction (full set) • Not much different from single Intercept model • Data specific

Different subject • VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction") • model2 <- brm(RT_ms ~ Target:NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3)

Different subject • VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction") • model3 <- brm(RT_ms ~ Target*NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3)

Computed from 5400 by 40 log-likelihood matrix Estimate SE elpd_loo -313.4 6.3 p_loo 4.5 1.6 looic 626.8 12.6 ------ Monte Carlo SE of elpd_loo is 0.1. Pareto k diagnostic values: Count Pct. Min. n_eff (-Inf, 0.5] (good) 39 97.5% 2523 (0.5, 0.7] (ok) 1 2.5% 253 (0.7, 1] (bad) 0 0.0% <NA> (1, Inf) (very bad) 0 0.0% <NA> All Pareto k estimates are ok (k < 0.7). See help('pareto-k-diagnostic') for details. Model comparison Computed from 5400 by 40 log-likelihood matrix Estimate SE elpd_loo -311.4 5.1 p_loo 5.2 1.5 looic 622.8 10.2 ------ Monte Carlo SE of elpd_loo is 0.1. Pareto k diagnostic values: Count Pct. Min. n_eff (-Inf, 0.5] (good) 39 97.5% 2308 (0.5, 0.7] (ok) 1 2.5% 297 (0.7, 1] (bad) 0 0.0% <NA> (1, Inf) (very bad) 0 0.0% <NA> All Pareto k estimates are ok (k < 0.7). See help('pareto-k-diagnostic') for details. • We might want to see whether a model with a fixed intercept better fits the data than a model that uses a different intercept for different conditions • There are several ways to do it (we discuss them later) • As a taste, brms provides LOO function (Leave-One-Out cross validation) • LOO(model2) • LOO(model3) • compare_ic(LOO(model2), LOO(model3)) • Smaller LOO is better LOOIC SE model2 626.75 12.63 model3 622.81 10.19 model2 - model3 3.95 4.28

Other distributions • We have assumed the data is subject to noise from a normal distribution • Our data measures reaction times, which is often skewed to longer RTs • One alternative distribution is the exgaussian • Three parameters • > model4 <- brm(RT_ms ~ Target*NumberDistractors, family = exgaussian(link = "identity"), data = VSdata2, iter = 2000, warmup = 200, chains = 3)

Output Family: exgaussian Links: mu = identity; sigma = identity; beta = identity Formula: RT_ms ~ Target * NumberDistractors Data: VSdata2 (Number of observations: 40) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1; total post-warmup samples = 5400 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept -608.51 427.86 -1534.21 145.96 5 1.66 TargetPresent 6.30 17.36 -16.47 47.56 5 2.89 NumberDistractors 24.05 16.25 -5.15 54.36 4 1.92 TargetPresent:NumberDistractors -4.85 8.40 -26.84 13.55 14 1.14 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 1808.66 204.00 1468.43 2257.61 36 1.06 beta 56.64 10.81 41.72 82.43 380 1.01 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). Warning messages: 1: The model has not converged (some Rhats are > 1.1). Do not analyse the results! We recommend running more iterations and/or setting stronger priors. 2: There were 5338 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup • print(summary(model4)) • Prints out a table of estimates of b1, b2 (3 of them), and sigma

Visual Search • Looking at posteriors • Bad chains!

Visual Search • Compare fit to data • Has to be run from the command line (hit return for next plot) • > plot(marginal_effects(model4), points = TRUE)

Other distributions • Increase number of iterations • > model4 <- brm(RT_ms ~ Target*NumberDistractors, family = exgaussian(link = "identity"), data = VSdata2, iter = 8000, warmup = 2000, chains = 3)

Output Family: exgaussian Links: mu = identity; sigma = identity; beta = identity Formula: RT_ms ~ Target * NumberDistractors Data: VSdata2 (Number of observations: 40) Samples: 3 chains, each with iter = 8000; warmup = 2000; thin = 1; total post-warmup samples = 18000 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 829.50 208.98 421.10 1237.09 2263 1.00 TargetPresent 11.57 298.93 -582.62 585.91 1679 1.00 NumberDistractors 41.34 5.73 30.04 52.40 2073 1.00 TargetPresent:NumberDistractors -17.61 8.18 -33.91 -1.70 1622 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 575.28 66.72 462.26 723.10 4657 1.00 beta 25.66 10.08 14.87 52.50 2830 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). Warning message: There were 12903 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup • print(summary(model4)) • Prints out a table of estimates of b1, b2 (3 of them), and sigma

Visual Search • Looking at posteriors • Better chains!

Visual Search • Compare fit to data • Has to be run from the command line (hit return for next plot) • > plot(marginal_effects(model4), points = TRUE)

Conclusions • Many types of models • Can compare models (more later) • HPDI vs. CI • Model convergence

Let ’ s continue to do a Bayesian analysis