1 / 38

Let ’ s do a Bayesian analysis

Let ’ s do a Bayesian analysis. Greg Francis. PSY 626: Bayesian Statistics for Psychological Science Fall 2018 Purdue University. Visual Search. A classic experiment in perception/attention involves visual search

amosb
Télécharger la présentation

Let ’ s do a Bayesian analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Let’s do a Bayesian analysis Greg Francis PSY 626: Bayesian Statistics for Psychological Science Fall 2018 Purdue University

  2. Visual Search • A classic experiment in perception/attention involves visual search • Respond as quickly as possible whether an image contains a target (a green circle) or not • https://coglab.cengage.com/?labname=visual_search • Log in with ID=GregTest-145, password=12345678

  3. Visual Search • A classic experiment in perception/attention involves visual search • Respond as quickly as possible whether an image contains a target (a green circle) or not • Vary number of distractors: 4, 16, 32, 64 • Vary type of distractors: feature (different color), conjunctive (different color or shape)

  4. Visual Search • A classic experiment in perception/attention involves visual search • Respond as quickly as possible whether an image contains a target (a green circle) or not • Vary number of distractors: 4, 16, 32, 64 • Vary type of distractors: feature (different color), conjunctive (different color or shape)

  5. Visual Search • A classic experiment in perception/attention involves visual search • Respond as quickly as possible whether an image contains a target (a green circle) or not • Vary number of distractors: 4, 16, 32, 64 • Vary type of distractors: feature (different color), conjunctive (different color or shape) • Measure reaction time: time between onset of image and participant’s response • 5 trials for each of 16 conditions: 4 number of distractors x 2 target (present or absent) x 2 distractor types (feature or conjunctive) = 80 trials

  6. Visual Search • Typical results: For conjunctive distractors, response time increases with the number of distractors

  7. Linear model • Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation • Let’s do it for a single participant • We are basically going through Section 4.4 of the text, but using a new data set and using brms instead of rethinking • Download files from the class web site and follow along in class

  8. Read in data • rm(list=ls(all=TRUE)) # clear all variables • Clear old variables out from R’s memory • graphics.off() • Remove old graphics from the display • VSdata<-read.csv(file="VisualSearch.csv",header=TRUE,stringsAsFactors=FALSE) • Reads in the contents of the file VisualSearch.csv • Uses the contents of the first row as a header, and creates variables by the names of those headers (no spaces in the header names!) • VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$Target=="Absent" & VSdata$DistractorType=="Conjunction") • Creates a new variable that contains just the data for a particular Participant • Only includes data from the Target absent trials • Only includes data for the Conjunction distractors • Check the contents of VSdata2 in the R console • head(VSdata2)

  9. Define the model • library(brms) • This loads up the R library we want to use • It has some nice functions for plotting and displaying tables • model1=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2 ) • Defines the linear model • Assigns default prior distributions to each parameter (b1, b2, sigma) • “Improper” uniform priors

  10. brms • Here, defining the model basically does all the work • This function takes the priors and the data, and computes posterior distributions • The main output is the set of b1, b2, and sigma that have the highest probabilities • Often this is actually probability density rather than probability

  11. Posterior Distribution • Remember, these are distributions of the population parameters!

  12. Posterior Distribution • Remember, these are distributions of the population parameters!

  13. Posterior Distribution • Remember, these are distributions of the population parameters!

  14. brms The main output of the brm( ) calculation is the set of b1, b2, and sigma that have the highest probabilities • Often this is actually probability density rather than probability

  15. Posterior Distributions • It is more complicated because we have three parameters with a joint posterior probability density function • Finding the peak of a multidimensional surface is not a trivial problem • brm calls an algorithm called Stan that does the computations • Involves lots of random sampling to estimate distributions • Things can go wrong sometimes, and the model may not “converge” • Sometimes takes a long time

  16. Estimates Family: gaussian Links: mu = identity; sigma = identity Formula: RT_ms ~ NumberDistractors Data: VSdata2 (Number of observations: 20) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 829.14 138.81 549.44 1099.10 2235 1.00 NumberDistractors 41.53 3.80 34.12 49.02 2047 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 376.46 68.52 269.61 544.69 2162 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). There were 16 warnings (use warnings() to see them) • print(summary(model1)) • Prints out a table of estimates of b1, b2, and sigma

  17. Estimates (2nd run) Family: gaussian Links: mu = identity; sigma = identity Formula: RT_ms ~ NumberDistractors Data: VSdata2 (Number of observations: 20) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 830.07 132.98 570.36 1087.89 2534 1.00 NumberDistractors 41.46 3.63 34.41 48.93 2643 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 372.40 66.54 267.22 531.17 2269 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). There were 16 warnings (use warnings() to see them) • print(summary(model1)) • Prints out a table of estimates of b1, b2, and sigma

  18. Posteriors • plot(model1) • Fuzzy graphs on right indicate convergence of the sampling (these look good)

  19. Working with posteriors • post<-posterior_samples(model1) • Pulls out the sampled values for each parameter > head(post) b_Intercept b_NumberDistractors sigma lp__ 1 762.4510 39.89337 359.6752 -155.3252 2 920.2443 42.11010 371.4696 -155.2212 3 816.8137 38.72261 298.6904 -155.8784 4 820.5475 42.55215 435.3729 -155.2769 5 869.2564 38.18557 398.8995 -155.1636 6 608.1106 43.10367 352.3777 -156.7883

  20. Working with posteriors • Plot posterior of b_NumberDistractors • density(post$b_NumberDistractors) • This is the same plot we got fromplot(model1)

  21. Posterior • You can ask all kinds of questions about predictions and so forth by just using probability • For example, what is the posterior probability that b_NumDistractors is less than 42? > length(post$b_NumberDistractors[post$b_NumberDistractors < 42])/length(post$b_NumberDistractors) [1] 0.562963 > mean(post$b_NumberDistractors) [1] 41.45843

  22. Posterior • You can ask all kinds of questions about predictions and so forth by just using probability • For example, what is the posterior distribution of RT_ms for 35 distractors? RT_at_35 <- post$b_Intercept + post$b_NumberDistractors * 35 dev.new() plot(density(mu_at35))

  23. Posterior • You can ask all kinds of questions about predictions and so forth by just using probability • What is the probability that RT_ms for 35 distractors is greater than 2400? probGT2400 <-length(RT_at_35[RT_at_35 > 2400])/length(RT_at_35) cat("Probability RT_ms for 35 distractors is greater than 2400 = ", probGT2400, "\n") Probability RT_ms for 35 distractors is greater than 2400 = 0.07888889

  24. Questions • What is the probability that predicted RT_ms is greater than 3000 for 42 distractors? • What is the probability that b_Intercept is more than 900? • What is the probability that b_NumberDistractors is less than 35?

  25. Predictions • Automatically predict RT_ms for different values of NumberDistractors • plot(marginal_effects(model1), points=TRUE) • Gray shading indicates 95%of posterior distribution • Called “credible intervals” • This is not the same thingas a 95% confidence interval!

  26. Compare to tradition • Is this any different than traditional linear regression? • The “traditional” bestfit line is • b_Intercept=832.94 • b_NumberDistractors=41.38 • Compare to > mean(post$b_Intercept) [1] 830.0674 > mean(post$b_NumberDistractors) [1] 41.45843

  27. Standard linear regression • What parameter values b1, b2, minimize the prediction error? • This is necessarily just a pair of numbers b1, b2

  28. Bayesian estimates • The mean of the posterior distributions is just one possible characterization of population values • But the posterior distribution contains a lot more information • Uncertainty about the estimates • Uncertainty about the prediction • Compare to confidence intervals

  29. Changing priors > summary(model2) Family: gaussian Links: mu = identity; sigma = identity Formula: RT_ms ~ NumberDistractors Data: VSdata2 (Number of observations: 20) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 1143.58 103.55 947.97 1356.70 2496 1.00 NumberDistractors 30.59 0.99 28.69 32.51 2461 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 449.58 77.59 325.82 627.83 2438 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 30 • We can set a prior • model2=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) ) • print(summary(model2))

  30. Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 30 • We can set a prior • model2=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) ) • plot(marginal_effects(model2), points = TRUE)

  31. Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 30 • We can set a prior • model2=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) ) • plot(model2)

  32. Changing priors > print(summary(model3)) Family: gaussian Links: mu = identity; sigma = identity Formula: RT_ms ~ NumberDistractors Data: VSdata2 (Number of observations: 20) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 456.15 113.62 219.03 687.47 2383 1.00 NumberDistractors 54.35 0.99 52.41 56.23 1417 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 481.06 87.83 343.45 679.90 2112 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55 • We can set a prior • model3=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(50, 1), class = "b")) ) • print(summary(model3))

  33. Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55 • We can set a prior • model3=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 1), class = "b")) ) • plot(marginal_effects(model3), points = TRUE)

  34. Changing priors • Before running the model, we may have useful information about themodel parameters • For example, maybe we know that the slope parameter is usually around 55 • We can set a prior • model2=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 1), class = "b")) ) • plot(model3)

  35. Changing priors > print(summary(model4)) Family: gaussian Links: mu = identity; sigma = identity Formula: RT_ms ~ NumberDistractors Data: VSdata2 (Number of observations: 20) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 782.72 139.06 491.94 1047.84 2171 1.00 NumberDistractors 43.11 3.63 36.15 50.50 2207 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 373.82 68.57 269.35 532.89 2235 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller • We can set a prior • model4=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(50, 10), class = "b")) ) • print(summary(model4))

  36. Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller • We can set a prior • model4=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(50, 10), class = "b")) ) • plot(marginal_effects(model4), points = TRUE)

  37. Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller • We can set a prior • model4=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(50, 10), class = "b")) ) • plot(model4)

  38. Conclusions • That wasn’t so bad • An uninformed prior is nearly the same as standard linear regression • But you get a distribution of population values rather than just point estimates • You can compute and consider probabilities of different ranges of population values • You can compute and consider probabilities of different outcomes • If you have an informed prior, you get quite different results from standard linear regression • How different depends on differences between the priors and the data • Better information leads to better inference!

More Related