Inferential statistics

Doing inferential statistics = doing pairwise comparisons of models. Every comparison answers a conceptual question, e.g.:
- Is there a relationship between X and Y?
- Is the difference between the experimental conditions statistically significant?

We looked at several models …
Inferential statistics

With each model, we complete two steps:
- Step 1: Make the best predictions, given the information you have.
- Step 2: Compute the total prediction error (sum of squared errors = SSE).

Once we have two or more models, we can make pairwise comparisons. Ex: self-complexity data set.
Model 0 (= the "Stupid Model" = the "Null Model"): swb = B0 + e (here: B0 = 0)
Price: P = 0
Total prediction error: SSE = 438
Error for participant #7: e7 = 3
Model 1 (= the "Basic Model" = the "Mean-Only Model"): swb = b0 + e (here: b0 = 5)
Price: P = 1
Total prediction error: SSE = 88
Error for participant #7: e7 = -2
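Both SSEs can be reproduced in a couple of lines of R (a minimal sketch, assuming the 14 swb scores sit in a data frame d, as in the console excerpts later in the deck):

    # Step 1: best predictions; Step 2: total prediction error (SSE)
    pred0 <- 0               # Model 0 predicts 0 for everyone
    pred1 <- mean(d$swb)     # Model 1 predicts the mean (5) for everyone
    sum((d$swb - pred0)^2)   # SSE for Model 0: 438
    sum((d$swb - pred1)^2)   # SSE for Model 1: 88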
Model Comparison 1

Compact model (Model 0):   C: swb = B0 + e = 0 + e ; P = 0 ; SSE = 438
Augmented model (Model 1): A: swb = b0 + e = 5 + e ; P = 1 ; SSE = 88
Model Comparison 1

C: swb = B0 + e = 0 + e ; P = 0 ; SSE = 438 (Model 0)
A: swb = b0 + e = 5 + e ; P = 1 ; SSE = 88 (Model 1)

Mathematical interpretations: It's worth it to estimate the additional parameter. The parameter b0 is reliably different from zero.
Conceptual interpretation: The subjective well-being scores are on average reliably different from zero. This question is often not meaningful, which is why we now say good-bye to the "Stupid Model".
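In R, this model comparison can be run directly with anova() on two nested lm() fits (a sketch, assuming the same data frame d; anova() reports the two SSEs and the F test for the extra parameter):

    m0 <- lm(swb ~ 0, data = d)   # Null Model: no parameters estimated
    m1 <- lm(swb ~ 1, data = d)   # Mean-Only Model: one parameter, b0
    anova(m0, m1)                 # RSS 438 vs. 88; F test of Model Comparison 1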
Model 1 (= the "Basic Model" = the "Mean-Only Model"): swb = b0 + e (here: b0 = 5)
Price: P = 1
Total prediction error: SSE = 88
For the Mean-Only Model, SSE = SST (sum of squares total), and SSE / (N - 1) = s²
SSE = the part of the variance that we have not (yet) explained
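The link between SSE and the sample variance is easy to verify in R (assuming d as above):

    sst <- sum((d$swb - mean(d$swb))^2)   # SSE of the Mean-Only Model = SST = 88
    sst / (nrow(d) - 1)                   # = s²; its square root is sd(d$swb) = 2.60
    var(d$swb)                            # same value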
Model 2: swb = b0 + b1*comp + e (here: b0 = 2.05 and b1 = 1.18)
b0 = the "intercept" ; b1 = the "slope"
Price: P = 2
Total prediction error: SSE = 58.49
Error for participant #7: e7 = .7
Different models

Model 2: predicted swb = 2.05 + 1.18 * comp

comp:           0     1     2     3     4
predicted swb:  2.05  3.23  4.41  5.59  6.77
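The predicted values in this table are just the regression line evaluated at comp = 0 through 4, e.g. in R:

    round(2.05 + 1.18 * (0:4), 2)   # 2.05 3.23 4.41 5.59 6.77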
Model Comparison 2

Compact model (Model 1):   C: swb = b0 + e = 5 + e ; P = 1 ; SSE = 88
  (equivalently: C: swb = b0 + 0*comp + e = 5 + 0 + e)
Augmented model (Model 2): A: swb = b0 + b1*comp + e = 2.05 + 1.18*comp + e ; P = 2 ; SSE = 58.49
Model Comparison 2

C: SSEC = 88 ; A: SSEA = 58.49

η²p = partial eta squared = PRE = proportional reduction in error = proportion of variance explained
η²p = (SSEC − SSEA) / SSEC = (88 − 58.49) / 88 = 29.51 / 88 = .3353 ≈ .34

Of the total variance (SSEC = 88): 33.53% is explained by comp (29.51); 66.47% remains unexplained (58.49).
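Computed in R from the two SSEs (numbers taken from the slide above):

    sse_c <- 88; sse_a <- 58.49
    (sse_c - sse_a) / sse_c   # PRE = partial eta squared = 0.3353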
Effect sizes

                ηp (= r)   η²p (= PRE)
Small effect     .1          .01
Medium effect    .3          .09
Large effect     .5          .25
Model Comparison 2

C: swb = b0 + e = 5 + e ; P = 1 ; SSE = 88
A: swb = b0 + b1*comp + e = 2.05 + 1.18*comp + e ; P = 2 ; SSE = 58.49

F(1,12) = 6.05 ; t(12) = 2.46 ; p < .03 ; η²p = .34

Mathematical interpretations: It's worth it to estimate the additional parameter. The parameter b1 (the slope) is reliably different from zero.
Conceptual interpretation: There is a statistically significant relationship between subjective well-being and self-complexity. The effect is quite large.
Model Comparison 2

C: swb = 5 + e ; P = 1 ; SSE = 88 (Model 1)
A: swb = 2.05 + 1.18*comp + e ; P = 2 ; SSE = 58.49 (Model 2)

F = [SSR / (PA − PC)] / [SSEA / (N − PA)], where SSR = SSEC − SSEA
PA − PC = "numerator (model) degrees of freedom"
N − PA = "denominator (error) degrees of freedom"
Here: F(1,12) = (29.51 / 1) / (58.49 / 12) = 6.05
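Plugging in the numbers (a sketch; pf() gives the p-value from the F distribution):

    sse_c <- 88; sse_a <- 58.49; p_a <- 2; p_c <- 1; n <- 14
    f <- ((sse_c - sse_a) / (p_a - p_c)) / (sse_a / (n - p_a))   # 6.05
    pf(f, df1 = p_a - p_c, df2 = n - p_a, lower.tail = FALSE)    # ~ .030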
R script:
m2 <- lm(swb ~ comp, data = d)
summary(m2)
lm.sumSquares(m2)   # not base R; provided by the lmSupport package
> m2 <- lm(swb ~ comp, data=d)
> summary(m2)

Call:
lm(formula = swb ~ comp, data = d)

Residuals:
   Min     1Q Median     3Q    Max
-3.472 -1.866  0.000  1.866  3.472

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.0491     1.3366   1.533    0.151
comp          1.1804     0.4797   2.460    0.030 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.208 on 12 degrees of freedom
Multiple R-squared: 0.3353, Adjusted R-squared: 0.2799
F-statistic: 6.054 on 1 and 12 DF, p-value: 0.03001

> lm.sumSquares(m2)
                   SS dR-sqr pEta-sqr df      F p-value
(Intercept)  11.45597 0.1302   0.1638  1 2.3503  0.1512
comp         29.50897 0.3353   0.3353  1 6.0541  0.0300
Error (SSE)  58.49103     NA       NA 12     NA      NA
Total (SST)  88.00000     NA       NA NA     NA      NA

Each row of the output corresponds to a model comparison:
- (Intercept) row: C: swb = B0 + b1*comp + e (B0 = 0) vs. A: swb = b0 + b1*comp + e
- comp row: C: swb = b0 + e vs. A: swb = b0 + b1*comp + e
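The comp row of this output is the same test that anova() performs when the two nested models are compared explicitly (a sketch, assuming d as above):

    m1 <- lm(swb ~ 1, data = d)      # Compact model (Mean-Only)
    m2 <- lm(swb ~ comp, data = d)   # Augmented model
    anova(m1, m2)                    # F(1,12) = 6.054, p = .030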
Write-up: Testing one continuous predictor against zero

We estimated a simple regression model in which we regressed participants' subjective well-being scores on their self-complexity scores. We observed a positive relationship between the two variables, b1 = 1.18, F(1,12) = 6.05 [t(12) = 2.46], p < .03, η²p = .34. [For every unit increase in self-complexity, participants' subjective well-being scores increased by 1.18 units.] As can be seen in Figure 1, the more individuals had a complex representation of the self, the more they reported feeling satisfied with their life.
Different models

Model 2b: What happens if we "center" self-complexity?

> d$compC <- d$comp - mean(d$comp)
> describe(d)
      var  n mean   sd median trimmed  mad  min max range skew kurtosis   se
swb     1 14  5.0 2.60    5.0     5.0 2.97  1.0 9.0   8.0    0    -1.59 0.70
comp    2 14  2.5 1.28    2.5     2.5 1.48  0.2 4.8   4.6    0    -1.00 0.34
compC   4 14  0.0 1.28    0.0     0.0 1.48 -2.3 2.3   4.6    0    -1.00 0.34
Model 2b: swb = b0 + b1*compC + e (here: b0 = 5.00 and b1 = 1.18)
b0 = the "intercept" ; b1 = the "slope"
P = 2 ; SSE = 58.49
Different models

Model 2:  swb = b0 + b1*comp + e  = 2.05 + 1.18*comp + e ;  P = 2 ; SSE = 58.49
Model 2b: swb = b0 + b1*compC + e = 5.00 + 1.18*compC + e ; P = 2 ; SSE = 58.49
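A quick check in R that centering leaves the fit untouched (a sketch, assuming d as above; only the intercept changes):

    d$compC <- d$comp - mean(d$comp)
    m2  <- lm(swb ~ comp,  data = d)
    m2b <- lm(swb ~ compC, data = d)
    c(sum(resid(m2)^2), sum(resid(m2b)^2))   # both SSEs = 58.49
    coef(m2b)                                # intercept 5.00, slope 1.18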
Model Comparison 2b

Compact model:   C: swb = b0 + e = 5 + e ; P = 1 ; SSE = 88
Augmented model: A: swb = b0 + b1*compC + e = 5.00 + 1.18*compC + e ; P = 2 ; SSE = 58.49
Take-home message

Centering an independent variable (subtracting a constant) doesn't change much. In particular, it does not affect the test of the regression coefficient associated with that variable.
Models with a single dichotomous predictor

Ex: data s_complexity 3 inter.dat

> d <- lm.readDat("data s_complexity 3 inter.dat")
> describe(d)
     var  n mean   sd median trimmed  mad min max range skew kurtosis   se
swb    1 14  5.0 2.60    5.0     5.0 2.97 1.0 9.0   8.0    0    -1.59 0.70
comp   2 14  2.5 1.28    2.5     2.5 1.48 0.2 4.8   4.6    0    -1.00 0.34
sex    3 14  1.5 0.52    1.5     1.5 0.74 1.0 2.0   1.0    0    -2.14 0.14

> describeBy(d$swb, d$sex)
group: 1
  var n mean  sd median trimmed  mad min max range skew kurtosis   se
1   1 7 3.43 2.3      3    3.43 1.48   1   8     7 0.89    -0.55 0.87
-------------------------------------------------------------------------
group: 2
  var n mean  sd median trimmed  mad min max range skew kurtosis   se
1   1 7 6.57 1.9      7    6.57 1.48   3   9     6 -0.59    -0.8 0.72
> m2c <- lm(d$swb ~ d$sex)
> summary(m2c)

Call:
lm(formula = d$swb ~ d$sex)

Residuals:
    Min      1Q  Median      3Q     Max
-3.5714 -1.2143  0.0000  0.5714  4.5714

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.2857     1.7833   0.160   0.8754      <- b0
d$sex         3.1429     1.1279   2.787   0.0165 *    <- b1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.11 on 12 degrees of freedom
Multiple R-squared: 0.3929, Adjusted R-squared: 0.3423
F-statistic: 7.765 on 1 and 12 DF, p-value: 0.01645

> lm.sumSquares(m2c)
                     SS dR-sqr pEta-sqr df      F p-value
(Intercept)   0.1142857 0.0013   0.0021  1 0.0257  0.8754
d$sex        34.5714286 0.3929   0.3929  1 7.7647  0.0165
Error (SSE)  53.4285714     NA       NA 12     NA      NA
Total (SST)  88.0000000     NA       NA NA     NA      NA
Models with a single dichotomous predictor

Model 2c: swb = b0 + b1*sex + e (here: b0 = .29 and b1 = 3.14)

sex:            0     1 (men)   2 (women)   3
predicted swb:  .29   3.43      6.57        9.71
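The two observed group means in this table can be reproduced with predict() (a sketch; the model is refit with the data argument so that predict() accepts new data):

    m2c <- lm(swb ~ sex, data = d)
    predict(m2c, newdata = data.frame(sex = c(1, 2)))   # 3.43 (men), 6.57 (women)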
Different models

Model 2c: swb = b0 + b1*sex + e (here: b0 = .29 and b1 = 3.14)
b0 = the "intercept" ; b1 = the "slope"
P = 2 ; SSE = 53.43
Model Comparison 2c

Compact model:   C: swb = b0 + e = 5 + e ; P = 1 ; SSE = 88
Augmented model: A: swb = b0 + b1*sex + e = .29 + 3.14*sex + e ; P = 2 ; SSE = 53.43
In the lm.sumSquares(m2c) output above, the d$sex row corresponds to the model comparison:
C: swb = b0 + e
A: swb = b0 + b1*sex + e
Model Comparison 2c

C: swb = b0 + e = 5 + e ; P = 1 ; SSE = 88
A: swb = b0 + b1*sex + e = .29 + 3.14*sex + e ; P = 2 ; SSE = 53.43

Mathematical interpretations: It's worth it to estimate the additional parameter. The parameter b1 (the slope) is reliably different from zero.
Conceptual interpretation: There is a statistically significant relationship between subjective well-being and sex. The effect is quite large.
Write-up: Testing one dichotomous predictor against zero

We ran an independent samples t-test with subjective well-being as the outcome variable and sex as the predictor variable. The effect of sex was statistically significant, [b1 = 3.14,] t(12) = 2.79 [F(1,12) = 7.77], p < .02, η²p = .39. As can be seen in Figure 1, women (M = 6.57, SD = 1.90) reported higher subjective well-being than men (M = 3.43, SD = 2.30).
Take-home message

We analyze dichotomous predictors just as we analyze continuous predictors. There is no fundamental difference between dichotomous predictors and continuous predictors.
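A quick way to see this equivalence in R (assuming d as above; note that t.test() computes group 1 minus group 2, so the sign of t flips):

    t.test(swb ~ sex, data = d, var.equal = TRUE)   # t(12) = -2.79, p = .0165
    coef(summary(lm(swb ~ sex, data = d)))          # slope t = 2.79, same p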
Different models

Model 2d: What happens if we "center" sex?

> d$sexC[d$sex==1] = -0.5
> d$sexC[d$sex==2] = +0.5
> describe(d)
     var  n mean   sd median trimmed  mad  min max range skew kurtosis   se
swb    1 14  5.0 2.60    5.0     5.0 2.97  1.0 9.0   8.0    0    -1.59 0.70
comp   2 14  2.5 1.28    2.5     2.5 1.48  0.2 4.8   4.6    0    -1.00 0.34
sex    3 14  1.5 0.52    1.5     1.5 0.74  1.0 2.0   1.0    0    -2.14 0.14
sexC   4 14  0.0 0.52    0.0     0.0 0.74 -0.5 0.5   1.0    0    -2.14 0.14
Model 2d: swb = b0 + b1*sexC + e (here: b0 = 5.00 and b1 = 3.14)
b0 = the "intercept" ; b1 = the "slope"
P = 2 ; SSE = 53.43
Different models

Model 2c: swb = b0 + b1*sex + e  = .29 + 3.14*sex + e ;   P = 2 ; SSE = 53.43
Model 2d: swb = b0 + b1*sexC + e = 5.00 + 3.14*sexC + e ; P = 2 ; SSE = 53.43
Take-home message

Centering an independent variable doesn't change much, regardless of whether it is a continuous or a dichotomous independent variable.

We center:
- continuous variables by subtracting the mean:
  d$compC <- d$comp - mean(d$comp)
- dichotomous variables by recoding them into -.5 and +.5:
  d$sexC[d$sex==1] = -0.5
  d$sexC[d$sex==2] = +0.5
Just for fun

η²p = partial eta squared = PRE = proportional reduction in error = proportion of variance explained
Stuff to add (if time permits)
- Scaling a variable
- ANOVA table
- Alternative representation of a dichotomous predictor (two flat lines)