Predictive Bayesian Models: Testing and Validation in Psychological Science

Prediction • Greg Francis • PSY 626: Bayesian Statistics for Psychological Science • Fall 2018, Purdue University

Hypothesis tests • Hypothesis tests are commonly used as part of a method to establish scientific “truth” • Is there an effect? • What should I believe? • An alternative approach is to give up on “truth” and instead focus on “prediction” • The question is not “Is there an effect?” or “What should I believe?” • Rather: “How should I behave?” • Follow the data, but do not follow it blindly • Build a quantitative model, but test the model

Model building • Suppose you have two samples and you are interested in the means • Further suppose that the population properties are: • μ1=0, μ2=0.3 • σ1=σ2=1 • Typically, we would draw random samples from each group and run a t-test to determine if we should treat the means as being different for purposes of: • Treatment • Theory • Future work • Prediction

Model building • We typically build the following kind of model • The score for subject k is related to the grand mean, to deviations from the grand mean due to being in group 1 or group 2, and to random noise • This model gives mean values for each group
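The slide's equation did not survive in this transcript. A minimal sketch that matches the verbal description is given here; the symbols (X_ik for the score of subject k in group i, μ for the grand mean, α_i for the group deviation, ε_ik for the noise) and the normal-noise assumption are assumed notation, not taken from the original slide.

```latex
X_{ik} = \mu + \alpha_i + \varepsilon_{ik},
\qquad i \in \{1, 2\},
\qquad \varepsilon_{ik} \sim N(0, \sigma^2)
```

Under this notation the mean value for group i is μ + α_i.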

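Model building • Draw samples (n1=n2) from populations having • μ1=0, μ2=0.3 • σ1=σ2=1 • Construct different models that vary in the estimate of the mean values: • Null model: both groups are assigned the grand mean • Full model: each group is assigned its own sample mean • Hypothesis testing model: use the null model if the t-test does not reject H0, and the full model if it does reject H0 (p<.05)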
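The three models can be sketched in Python as follows. This is an illustration under assumptions, not the code behind the slides: the function name `fit_models`, the dictionary keys, and the use of SciPy's equal-variance two-sample t-test are all choices made here.

```python
import numpy as np
from scipy import stats


def fit_models(x1, x2, alpha=0.05):
    """Return (group-1 estimate, group-2 estimate) for each model.

    Hypothetical helper: the names and the alpha=0.05 default are assumptions.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)

    # Null model: one common mean (the grand mean) for both groups.
    grand = np.concatenate([x1, x2]).mean()
    null = (grand, grand)

    # Full model: each group keeps its own sample mean.
    full = (x1.mean(), x2.mean())

    # Hypothesis testing model: full model if the t-test rejects H0,
    # otherwise the null model.
    p_value = stats.ttest_ind(x1, x2).pvalue
    hypothesis_test = full if p_value < alpha else null

    return {"null": null, "full": full, "hypothesis_test": hypothesis_test}
```

The hypothesis testing model never produces estimates of its own; it only selects between the other two.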

Small samples • 20 experiments • n1=n2=10

Bigger samples • 20 experiments • n1=n2=50

Big samples • 20 experiments • n1=n2=100

Model fit/error • A standard way of judging the quality of a model is by its fit to a data set • One fit measure is root mean squared error • We want a model with low RMSE
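The slide's formula is missing from this transcript. One common form, assumed here, averages the squared error over the two group means, where μ̂_i is a model's estimate for group i and μ_i is the true mean of group i:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{2}\sum_{i=1}^{2}\left(\hat{\mu}_i - \mu_i\right)^2}
```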

Checking model approaches • Draw samples (n1=n2) from populations having • μ1=0,μ2=0.3 • σ1=σ2=1 • Repeat for 10,000 simulated experiments • Compute RMSE for each model and average across experiments • Vary sample size n1=n2
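The simulation steps above can be sketched as follows. This is a rough reconstruction under assumptions: RMSE is taken over the two group means relative to the true means, the t-test is SciPy's equal-variance two-sample test at α=.05, and the names `rmse` and `average_rmse` are hypothetical.

```python
import numpy as np
from scipy import stats


def rmse(estimates, targets):
    """Root mean squared error between estimated and target group means."""
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.sqrt(np.mean((estimates - targets) ** 2)))


def average_rmse(n, mu=(0.0, 0.3), sigma=1.0, n_experiments=10_000,
                 alpha=0.05, seed=0):
    """Average RMSE (relative to the true means) of each model for n1=n2=n."""
    rng = np.random.default_rng(seed)
    totals = {"null": 0.0, "full": 0.0, "hypothesis_test": 0.0}
    for _ in range(n_experiments):
        x1 = rng.normal(mu[0], sigma, n)
        x2 = rng.normal(mu[1], sigma, n)

        # With equal sample sizes the grand mean is the mean of the group means.
        grand = (x1.mean() + x2.mean()) / 2
        null = (grand, grand)
        full = (x1.mean(), x2.mean())

        # Hypothesis testing model: pick the full model only when H0 is rejected.
        reject = stats.ttest_ind(x1, x2).pvalue < alpha
        hypothesis_test = full if reject else null

        for name, est in (("null", null), ("full", full),
                          ("hypothesis_test", hypothesis_test)):
            totals[name] += rmse(est, mu)
    return {name: total / n_experiments for name, total in totals.items()}
```

Calling `average_rmse(n)` for a range of sample sizes, for example 10, 50, and 100, gives one averaged RMSE per model at each n, which is the quantity the comparison slides plot.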

Comparing models • μ2 - μ1=0.3

Comparing models • For small samples, the null model provides the smallest average RMSE • For large samples, the full model provides the smallest average RMSE

Comparing models • There is always a better model (on average) than what is derived by hypothesis testing • Hypothesis testing (on average) leads to over fitting for some small samples (when it rejects) • Hypothesis testing (on average) leads to under fitting for some large samples (when it does not reject)

Bigger effects • Similar for other effect sizes: μ2 - μ1=0.8

Null effects • Similar for other effect sizes: μ2 - μ1=0

Known unknowns • But these simulations are all theoretical • To compute RMSE we need to know the true means • However, we can do something similar if we do not compute RMSE relative to the true means, but relative to test data

Prediction / validation • Suppose I build my models from one set of data, x1i and x2i, and then test them with another set of data, y1i and y2i • Here, we compute RMSE relative to the means from the test data set • You could also compute RMSE relative to individual data points
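A minimal sketch of test-set validation, assuming RMSE is computed over the two group means of the test data. The function name `validation_rmse`, the sample size of 50, and the seed are illustrative choices, and only the null and full models are compared to keep the example short.

```python
import numpy as np


def validation_rmse(estimates, y1, y2):
    """RMSE of a model's two group estimates relative to the test-set means."""
    targets = np.array([np.mean(y1), np.mean(y2)])
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - targets) ** 2)))


rng = np.random.default_rng(1)

# Training data (x) used to build the models; test data (y) used to score them.
x1, x2 = rng.normal(0.0, 1.0, 50), rng.normal(0.3, 1.0, 50)
y1, y2 = rng.normal(0.0, 1.0, 50), rng.normal(0.3, 1.0, 50)

grand = np.concatenate([x1, x2]).mean()
scores = {
    "null": validation_rmse((grand, grand), y1, y2),
    "full": validation_rmse((x1.mean(), x2.mean()), y1, y2),
}

# The model with the smallest test RMSE is the preferred predictor.
best = min(scores, key=scores.get)
```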

Small effect • When μ2 - μ1=0.3

Bigger effect • When μ2 - μ1=0.8

Null effect • When μ2 - μ1=0

Prediction / validation • Differences are easier to see after subtracting the full model RMSE from the other models’ RMSE • Panels: μ2 - μ1=0.8, μ2 - μ1=0.3, μ2 - μ1=0 • The smallest number (biggest negative) indicates the best model (the one with the smallest RMSE)

Prediction / validation • This looks good • At least on average, the RMSE patterns for testing against the means of new data are similar to the RMSE patterns for testing against the true means • If we want to deduce which model best predicts values, we can pick the model that minimizes the test RMSE value • Cost: we have to run the experiment twice • Testing does not require equal sample sizes, but you trade off model development against model testing

Cross validation • We partly avoid that cost by using cross-validation to approximate the test RMSE • Divide the data set x1i and x2i into multiple subsets (a common choice is 10 subsets) • Build your model using all but one of the subsets • Compute RMSE for the left-out subset • Repeat so that each subset is left out once • 10 build and test “folds” • Compute mean RMSE across the subsets
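The fold procedure above can be sketched as follows. Several details are assumptions made here rather than taken from the slides: each group is shuffled and split separately, RMSE is computed against the individual held-out scores, and only the null and full models are compared. The name `cv_rmse` is hypothetical.

```python
import numpy as np


def cv_rmse(x1, x2, k=10, seed=0):
    """Mean held-out RMSE of the null and full models over k folds."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if k < 2 or min(x1.size, x2.size) < k:
        raise ValueError("need k >= 2 and at least k scores in each group")

    # Shuffle each group, then split it into k roughly equal folds.
    rng = np.random.default_rng(seed)
    folds1 = np.array_split(rng.permutation(x1), k)
    folds2 = np.array_split(rng.permutation(x2), k)

    results = {"null": [], "full": []}
    for i in range(k):
        # Build the models from every fold except fold i.
        train1 = np.concatenate([f for j, f in enumerate(folds1) if j != i])
        train2 = np.concatenate([f for j, f in enumerate(folds2) if j != i])
        grand = np.concatenate([train1, train2]).mean()

        # Score both models on the left-out fold.
        held_out = np.concatenate([folds1[i], folds2[i]])
        predictions = {
            "null": np.full(held_out.size, grand),
            "full": np.concatenate([
                np.full(folds1[i].size, train1.mean()),
                np.full(folds2[i].size, train2.mean()),
            ]),
        }
        for name, pred in predictions.items():
            results[name].append(np.sqrt(np.mean((held_out - pred) ** 2)))

    # Average the per-fold RMSE values.
    return {name: float(np.mean(vals)) for name, vals in results.items()}
```

Because the held-out targets are individual scores rather than means, the RMSE values include the noise σ and will be larger than those computed against group means; the ordering of the models is what matters for choosing between them.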

Cross validation • When μ2 - μ1=0.3, 5-fold cross validation

Optional stopping • Actual use: μ2 - μ1=0.3, 10-fold cross validation • Start with n1=n2=10, compute cross-validated RMSE • Add 10 scores and repeat until n1=n2=200

Cross validation • At each step, you should follow the data and use the best model for minimizing RMSE • As the data change, so does your model • You can make an intermediate decision, but still expect it to change • If you have to make a decision with the current data, it makes sense to choose the best model • Note that the best model is not necessarily a good model • You have to judge whether the RMSE is small enough for whatever purpose you have in mind

Prediction / validation • Cross validation and test validation naturally generalize to more complicated models and experimental designs • Interactions, nonlinear models • Details of how to generate validation “folds” can get complicated • It’s mostly a matter of being careful about generating representative folds and not introducing your own bias • No need to use RMSE • Other “cost” functions work in a similar way

Conclusions • Prediction / validation seems like a viable approach • It encourages data accumulation • But it gives up on the idea of establishing “truth” from data • Instead, it focuses on practical uses of data • There are Bayesian methods that have the same goal • They are better if you have useful prior knowledge
