Sequential sums of squares

Sequential sums of squares … or … extra sums of squares

Sequential sums of squares: what are they?
• The reduction in the error sum of squares when one or more predictor variables are added to the regression model.
• Or, the increase in the regression sum of squares when one or more predictor variables are added to the regression model.

Sequential sums of squares: why?
• They can be used to test whether one slope parameter is 0.
• They can be used to test whether a subset (more than one, but fewer than all) of the slope parameters are 0.

Example: Are brain and body size predictive of intelligence?
• Sample of n = 38 college students
• Response (Y): Intelligence, based on the PIQ (performance) scores from the (revised) Wechsler Adult Intelligence Scale
• Predictor (X1): Brain size, based on MRI scans (given as count/10,000)
• Predictor (X2): Height in inches
• Predictor (X3): Weight in pounds

OUTPUT #1

The regression equation is
PIQ = 4.7 + 1.18 MRI

Predictor    Coef  SE Coef      T      P
Constant     4.65    43.71   0.11  0.916
MRI        1.1766   0.4806   2.45  0.019

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   1   2697.1  2697.1  5.99  0.019
Error       36  16197.5   449.9
Total       37  18894.6

OUTPUT #2

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor    Coef  SE Coef      T      P
Constant   111.28    55.87   1.99  0.054
MRI        2.0606   0.5466   3.77  0.001
Height    -2.7299   0.9932  -2.75  0.009

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   2   5572.7  2786.4  7.32  0.002
Residual    35  13321.8   380.6
Total       37  18894.6

Source      DF   Seq SS
MRI          1   2697.1
Height       1   2875.6

OUTPUT #3

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height + 0.001 Weight

Predictor    Coef  SE Coef      T      P
Constant   111.35    62.97   1.77  0.086
MRI        2.0604   0.5634   3.66  0.001
Height     -2.732    1.229  -2.22  0.033
Weight     0.0006   0.1971   0.00  0.998

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   3   5572.7  1857.6  4.74  0.007
Error       34  13321.8   391.8
Total       37  18894.6

Source      DF   Seq SS
MRI          1   2697.1
Height       1   2875.6
Weight       1      0.0

Sequential sums of squares: definition using SSE notation
• SSR(X2|X1) = SSE(X1) - SSE(X1,X2)
• In general, you subtract the error sum of squares due to all of the predictors both left and right of the bar from the error sum of squares due to the predictor to the right of the bar.
• SSR(X2,X3|X1) = SSE(X1) - SSE(X1,X2,X3)

Sequential sums of squares: definition using SSR notation
• SSR(X2|X1) = SSR(X1,X2) - SSR(X1)
• In general, you subtract the regression sum of squares due to the predictor to the right of the bar from the regression sum of squares due to all of the predictors both left and right of the bar.
• SSR(X2,X3|X1) = SSR(X1,X2,X3) - SSR(X1)
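As a quick arithmetic check, the two definitions can be applied to the numbers printed in OUTPUT #1 (MRI only) and OUTPUT #2 (MRI and Height). This is only a sketch: the variable names are made up for illustration, and the inputs are the rounded values from the outputs, so the two routes agree only up to rounding.

```python
# Values copied from OUTPUT #1 (model with X1 = MRI only)
sse_x1 = 16197.5   # error sum of squares, SSE(X1)
ssr_x1 = 2697.1    # regression sum of squares, SSR(X1)

# Values copied from OUTPUT #2 (model with X1 = MRI and X2 = Height)
sse_x1_x2 = 13321.8  # SSE(X1,X2)
ssr_x1_x2 = 5572.7   # SSR(X1,X2)

# SSR(X2|X1) via the SSE definition: reduction in error sum of squares
via_sse = sse_x1 - sse_x1_x2

# SSR(X2|X1) via the SSR definition: increase in regression sum of squares
via_ssr = ssr_x1_x2 - ssr_x1

print(f"SSR(X2|X1) from SSE notation: {via_sse:.1f}")
print(f"SSR(X2|X1) from SSR notation: {via_ssr:.1f}")
```

Both results should match the Seq SS reported for Height in OUTPUT #2 (2875.6), apart from a difference of about 0.1 caused by rounding in the printed output.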

Decomposition of regression sum of squares
In multiple regression, there is more than one way to decompose the regression sum of squares. For example:

OUTPUT #2

The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor    Coef  SE Coef      T      P
Constant   111.28    55.87   1.99  0.054
MRI        2.0606   0.5466   3.77  0.001
Height    -2.7299   0.9932  -2.75  0.009

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   2   5572.7  2786.4  7.32  0.002
Residual    35  13321.8   380.6
Total       37  18894.6

Source      DF   Seq SS
MRI          1   2697.1
Height       1   2875.6

OUTPUT #4

The regression equation is
PIQ = 111 - 2.73 Height + 2.06 MRI

Predictor    Coef  SE Coef      T      P
Constant   111.28    55.87   1.99  0.054
Height    -2.7299   0.9932  -2.75  0.009
MRI        2.0606   0.5466   3.77  0.001

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   2   5572.7  2786.4  7.32  0.002
Error       35  13321.8   380.6
Total       37  18894.6

Source      DF   Seq SS
Height       1    164.0
MRI          1   5408.8

Decomposition of SSR: how?
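The equations for this slide did not survive in the transcript. A plausible reconstruction, based only on the SSR-notation definition given earlier, is that rearranging SSR(X2|X1) = SSR(X1,X2) - SSR(X1) (and the same with the roles of X1 and X2 swapped) gives two decompositions of the same total:

```latex
\begin{align*}
SSR(X_1, X_2) &= SSR(X_1) + SSR(X_2 \mid X_1) \\
              &= SSR(X_2) + SSR(X_1 \mid X_2)
\end{align*}
```

With the PIQ example, entering MRI first gives 2697.1 + 2875.6 = 5572.7, and entering Height first gives 164.0 + 5408.8 = 5572.8, which equals 5572.7 up to rounding in the printed Seq SS values.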

Even more ways to decompose SSR when 3 or more predictors
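The equations for this slide are also missing from the transcript. As a sketch of what the extension to three predictors presumably looks like, applying the same definition repeatedly gives, for example:

```latex
\begin{align*}
SSR(X_1, X_2, X_3) &= SSR(X_1) + SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2) \\
                   &= SSR(X_2) + SSR(X_3 \mid X_2) + SSR(X_1 \mid X_2, X_3) \\
                   &= SSR(X_1) + SSR(X_2, X_3 \mid X_1)
\end{align*}
```

Each ordering of the predictors yields its own chain of one-degree-of-freedom terms, and adjacent terms can be pooled into a multi-degree-of-freedom term as in the last line. These are illustrative choices, not necessarily the ones shown on the original slide.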

Degrees of freedom and regression mean squares
• A sequential sum of squares involving one extra predictor variable has one degree of freedom associated with it: MSR(X2|X1) = SSR(X2|X1) / 1
• A sequential sum of squares involving two extra predictor variables has two degrees of freedom associated with it: MSR(X2,X3|X1) = SSR(X2,X3|X1) / 2

Sequential sums of squares in Minitab
• The SSR is automatically decomposed into one-degree-of-freedom sequential sums of squares, in the order in which the predictor variables are entered into the model.
• To get a sequential sum of squares involving two or more predictor variables, sum the appropriate one-degree-of-freedom sequential sums of squares.

OUTPUT #5

The regression equation is
PIQ = 111 - 2.73 Height + 0.001 Weight + 2.06 MRI

Predictor    Coef  SE Coef      T      P
Constant   111.35    62.97   1.77  0.086
Height     -2.732    1.229  -2.22  0.033
Weight     0.0006   0.1971   0.00  0.998
MRI        2.0604   0.5634   3.66  0.001

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   3   5572.7  1857.6  4.74  0.007
Error       34  13321.8   391.8
Total       37  18894.6

Source      DF   Seq SS
Height       1    164.0
Weight       1    169.5
MRI          1   5239.2
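Outside Minitab, the same order-dependent decomposition can be reproduced by fitting a sequence of nested models and differencing their error sums of squares. The sketch below assumes NumPy is available and uses synthetic data, because the PIQ data set itself is not included here; the helper names `sse` and `sequential_ss` are hypothetical.

```python
import numpy as np

def sse(y, columns):
    """Error sum of squares for a least-squares fit of y on an intercept plus `columns`."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def sequential_ss(y, predictors):
    """Return [(name, seq SS)] for predictors given as (name, array) in order of entry."""
    out, entered = [], []
    previous = sse(y, [])  # intercept-only model: its SSE is the total sum of squares
    for name, x in predictors:
        entered.append(x)
        current = sse(y, entered)
        out.append((name, previous - current))  # reduction in SSE from adding x
        previous = current
    return out

# Synthetic data (NOT the PIQ data): three predictors, two of them correlated
rng = np.random.default_rng(0)
n = 38
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1 + 2 * x1 - x2 + rng.normal(size=n)

order_a = sequential_ss(y, [("x1", x1), ("x2", x2), ("x3", x3)])
order_b = sequential_ss(y, [("x2", x2), ("x3", x3), ("x1", x1)])
ssr_full = sse(y, []) - sse(y, [x1, x2, x3])

for label, result in (("x1, x2, x3", order_a), ("x2, x3, x1", order_b)):
    print(label, [(name, round(ss, 2)) for name, ss in result])
print("SSR(x1, x2, x3):", round(ssr_full, 2))
```

The individual terms change with the order of entry, but in both orders they add up to the same overall regression sum of squares, which mirrors what OUTPUT #3 and OUTPUT #5 show for the PIQ example.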

Testing whether one slope, β1 = βMRI, is 0

Predictor    Coef  SE Coef      T      P
Constant   111.35    62.97   1.77  0.086
Height     -2.732    1.229  -2.22  0.033
Weight     0.0006   0.1971   0.00  0.998
MRI        2.0604   0.5634   3.66  0.001

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   3   5572.7  1857.6  4.74  0.007
Error       34  13321.8   391.8
Total       37  18894.6

Source      DF   Seq SS
Height       1    164.0
Weight       1    169.5
MRI          1   5239.2

Testing whether one slope, β2 = βHT, is 0

Predictor    Coef  SE Coef      T      P
Constant   111.35    62.97   1.77  0.086
MRI        2.0604   0.5634   3.66  0.001
Weight     0.0006   0.1971   0.00  0.998
Height     -2.732    1.229  -2.22  0.033

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   3   5572.7  1857.6  4.74  0.007
Error       34  13321.8   391.8
Total       37  18894.6

Source      DF   Seq SS
MRI          1   2697.1
Weight       1    940.9
Height       1   1934.7

Testing whether one slope, β3 = βWT, is 0

Predictor    Coef  SE Coef      T      P
Constant   111.35    62.97   1.77  0.086
MRI        2.0604   0.5634   3.66  0.001
Height     -2.732    1.229  -2.22  0.033
Weight     0.0006   0.1971   0.00  0.998

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   3   5572.7  1857.6  4.74  0.007
Error       34  13321.8   391.8
Total       37  18894.6

Source      DF   Seq SS
MRI          1   2697.1
Height       1   2875.6
Weight       1      0.0
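The three single-slope outputs above each list the predictor of interest last, so its Seq SS is adjusted for the other two predictors. A short sketch of the resulting F-statistics, using only numbers printed in those outputs (the dictionary layout is just for illustration), shows that each F is approximately the square of the corresponding t-statistic:

```python
mse_full = 391.8  # MSE of the full three-predictor model (34 error DF)

# predictor: (Seq SS when entered last, t-statistic from the coefficient table)
tests = {
    "MRI":    (5239.2, 3.66),   # SSR(X1|X2,X3)
    "Height": (1934.7, -2.22),  # SSR(X2|X1,X3)
    "Weight": (0.0, 0.00),      # SSR(X3|X1,X2)
}

results = {}
for name, (seq_ss_last, t_stat) in tests.items():
    f_stat = (seq_ss_last / 1) / mse_full  # one numerator degree of freedom
    results[name] = (f_stat, t_stat ** 2)
    print(f"{name}: F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```

The small gaps between F and t² come from rounding in the printed output. Because the two statistics are equivalent, the F-test on 1 and 34 degrees of freedom leads to the same P-value as the t-test reported in the coefficient table.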

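Testing one slope βk is 0: why does it work?
Full model: Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + εi
Reduced model (taking k = 1 as the example): Yi = β0 + β2Xi2 + β3Xi3 + εi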

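Testing one slope βk is 0: why does it work? (cont’d)
The general linear test statistic:
F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] ÷ [SSE(F) / dfF]
becomes (taking k = 1 as the example):
F* = [SSR(X1|X2,X3) / 1] ÷ [SSE(X1,X2,X3) / (n - 4)] = MSR(X1|X2,X3) / MSE(X1,X2,X3)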

Testing whether β2 = β3 = 0
Full model: Yi = β0 + β1Xi1 + β2Xi2 + β3Xi3 + εi
Reduced model: Yi = β0 + β1Xi1 + εi

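Testing whether β2 = β3 = 0 (cont’d)
The general linear test statistic:
F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] ÷ [SSE(F) / dfF]
becomes:
F* = [SSR(X2,X3|X1) / 2] ÷ [SSE(X1,X2,X3) / (n - 4)] = MSR(X2,X3|X1) / MSE(X1,X2,X3)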

P-value is: P(F > 3.67) = 1 - 0.9640 = 0.036

Cumulative Distribution Function
F distribution with 2 DF in numerator and 34 DF in denominator

x        P( X <= x )
3.6700   0.9640

Getting P-value for F-statistic in Minitab
• Select Calc >> Probability Distributions >> F…
• Select Cumulative Probability. Use default noncentrality parameter of 0.
• Type in numerator DF and denominator DF.
• Select Input constant. Type in F-statistic. Answer appears in session window.
• P-value is 1 minus the number that appears.
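The F-statistic and P-value for the β2 = β3 = 0 test can also be reproduced without Minitab. The sketch below uses the Seq SS values for Height and Weight (entered after MRI) and the full-model MSE from OUTPUT #3. It relies on a closed form for the F distribution's CDF that holds only when the numerator has exactly 2 degrees of freedom; for other cases a statistics library would be needed. The function name `f_cdf_num_df_2` is hypothetical.

```python
def f_cdf_num_df_2(x, den_df):
    """CDF of an F distribution with 2 numerator DF and `den_df` denominator DF.

    Special case only: P(F <= x) = 1 - (den_df / (den_df + 2x)) ** (den_df / 2).
    """
    return 1.0 - (den_df / (den_df + 2.0 * x)) ** (den_df / 2.0)

# SSR(X2,X3|X1): sum of the one-DF Seq SS for Height and Weight from OUTPUT #3
ssr_x2_x3_given_x1 = 2875.6 + 0.0
mse_full = 391.8   # MSE of the full model
error_df = 34      # n - 4 with n = 38

f_stat = (ssr_x2_x3_given_x1 / 2) / mse_full
cdf = f_cdf_num_df_2(f_stat, error_df)
p_value = 1.0 - cdf  # "1 minus the number that appears"

print(f"F* = {f_stat:.2f}")
print(f"P(X <= F*) = {cdf:.4f}")
print(f"P-value = {p_value:.3f}")
```

The cumulative probability should agree with the 0.9640 that Minitab reports for x = 3.67, giving a P-value of about 0.036.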

Test whether β1 = β3 = 0

Analysis of Variance
Source      DF       SS      MS     F      P
Regression   3   5572.7  1857.6  4.74  0.007
Error       34  13321.8   391.8
Total       37  18894.6

Source      DF   Seq SS
Height       1    164.0
Weight       1    169.5
MRI          1   5239.2
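The calculation for this test is not spelled out in the transcript. One way to carry it out, sketched here under the assumption that β1 and β3 refer to MRI and Weight as in the example setup, is to pool the Seq SS for Weight and MRI (both entered after Height) and compare against the full-model MSE. The P-value again uses the closed form that applies only to an F distribution with 2 numerator degrees of freedom; `f_sf_num_df_2` is a hypothetical helper.

```python
def f_sf_num_df_2(x, den_df):
    """Upper-tail probability P(F > x) for 2 numerator DF (special case only)."""
    return (den_df / (den_df + 2.0 * x)) ** (den_df / 2.0)

# Route 1: sum the one-DF Seq SS for Weight and MRI, entered after Height
ssr_x1_x3_given_x2 = 169.5 + 5239.2

# Route 2: overall SSR minus the SSR for Height alone (should agree up to rounding)
ssr_check = 5572.7 - 164.0

mse_full = 391.8   # MSE of the full model
error_df = 34

f_stat = (ssr_x1_x3_given_x2 / 2) / mse_full
p_value = f_sf_num_df_2(f_stat, error_df)

print(f"SSR(X1,X3|X2) = {ssr_x1_x3_given_x2:.1f} (check: {ssr_check:.1f})")
print(f"F* = {f_stat:.2f}, P-value = {p_value:.4f}")
```

The resulting F-statistic is roughly 6.9 on 2 and 34 degrees of freedom, with a P-value of about 0.003, so at the usual significance levels the hypothesis that both slopes are 0 would be rejected.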
