Statistical Techniques I EXST7005

Statistical Techniques I EXST7005 Other topics - Linear models and Transformations

Course Progression
• Objective - Hypothesis testing background
• Transformation
• Many applications in statistics require converting an observed distribution to a recognized statistical distribution.
• In particular, tests of hypotheses require taking an observed distribution and transforming it to a recognized statistical distribution.

LINEAR MODELS
The simplest form of the linear additive model
• $Y_i = \mu + \varepsilon_i$ for $i = 1, 2, 3, \ldots, N$
• This is the population version of the model, so the term $\mu$ is a constant, the population mean.
• The sample version would use $\bar{Y}$, which is a variable.

LINEAR MODELS (continued)
$Y_i = \mu + \varepsilon_i$ for $i = 1, 2, 3, \ldots, N$
• $\varepsilon_i$ represents the deviations of the observations from the mean. It has a mean of zero, since deviations from the mean sum to zero.
• $e_i$ would be used to represent sample deviations,
• and of course N would be changed to n for a sample.
• $Y_i = \bar{Y} + e_i$ for $i = 1, 2, 3, \ldots, n$
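
The decomposition above can be checked numerically. The following is a minimal sketch, assuming a small population of four values (the same 2, 4, 6, 8 used in the coding examples later in these notes); the variable names are illustrative only.

```python
from statistics import fmean

# Assumed example population (N = 4); not part of this slide.
y = [2, 4, 6, 8]

mu = fmean(y)                       # population mean, the constant in the model
eps = [yi - mu for yi in y]         # deviations of each observation from the mean
total = sum(eps)                    # deviations from the mean sum to zero

# Each observation is recovered as mean + deviation: Y_i = mu + eps_i
rebuilt = [mu + e for e in eps]

print(mu, eps, total, rebuilt)
```

Because the deviations sum to zero, their mean is zero as well, which is the property the model relies on.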

LINEAR MODELS (continued)
• This is a mathematical representation of a population or sample.
• All of the analyses discussed in the Statistical Methods courses have a linear model. The models get more complex as the analysis gets more advanced.
• NOTE THAT THE ERROR TERM IN THIS MODEL IS ADDITIVE.
• Multiplicative models and multiplicative errors exist, but are not covered in basic statistical methods.

LINEAR MODELS (continued)
• Other models we will discuss this semester include
• $Y_i = \mu_i + \varepsilon_i$ for t-tests
• $Y_i = \mu + \tau_i + \varepsilon_i$ for ANOVA, or another form of the t-test
• $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for Simple Linear Regression

CODING and TRANSFORMATIONS
• THEOREMS
• If a constant "a" is added to each observation, then the mean of the data set will increase by "a" units, and the variance and standard deviation will remain unchanged.
• EXAMPLE: Population of size N = 4
• $Y_i = 2, 4, 6, 8$
• $\mu = \sum Y_i / N = 20/4 = 5$
• $\sigma^2 = [\sum Y_i^2 - (\sum Y_i)^2/N]/N = (120 - 100)/4 = 5$
• $\sigma = \sqrt{5} = 2.24$

CODING and TRANSFORMATIONS (continued)
• Now add 10 to each observation.
• EXAMPLE: Population size still N = 4
• $Y_i = 12, 14, 16, 18$
• $\mu = \sum Y_i / N = 60/4 = 15$
• $\sigma^2 = [\sum Y_i^2 - (\sum Y_i)^2/N]/N = (920 - 900)/4 = 5$
• $\sigma = \sqrt{5} = 2.24$
• Notice that the mean increased by 10 and the variance and standard deviation did not change.
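
The addition theorem can be verified with a few lines of Python. This is a minimal sketch using the standard library's `statistics` module, where `pvariance` and `pstdev` are the population (divide by N) versions; the variable names are illustrative.

```python
from statistics import fmean, pstdev, pvariance

y = [2, 4, 6, 8]                    # original population, N = 4
a = 10                              # constant added to every observation

shifted = [yi + a for yi in y]      # 12, 14, 16, 18

mean_before, mean_after = fmean(y), fmean(shifted)
var_before, var_after = pvariance(y), pvariance(shifted)
sd_before, sd_after = pstdev(y), pstdev(shifted)

print("mean:", mean_before, "->", mean_after)      # shifts by a
print("variance:", var_before, "->", var_after)    # unchanged
print("std dev:", sd_before, "->", sd_after)       # unchanged
```

Setting `a` to a negative value shows the subtraction case with no other change to the code.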

CODING and TRANSFORMATIONS (continued)
• NOTE that "a" may be either negative or positive, so we can add or subtract a constant from all values of $Y_i$.
• If we took the values $Y_i = 12, 14, 16, 18$ and subtracted 10 from each value, we would reverse the previous example: the mean would be ten less, and the variance and standard deviation would be unchanged.
• When subtracting, the mean is REDUCED by the value subtracted, and the variance and standard deviation remain unchanged.

CODING and TRANSFORMATIONS (continued)
• Another theorem
• If each observation $Y_i$ is multiplied by a constant "a", then
• the mean of the data set is $a$ times the old mean
• the new variance is $a^2$ times the old variance
• the standard deviation is $|a|$ times the old standard deviation (simply $a$ times when "a" is positive)

CODING and TRANSFORMATIONS (continued)
• EXAMPLE: using the same population as before; N = 4
• $Y_i = 2, 4, 6, 8$
• $\mu = 5$; $\sigma^2 = 5$; $\sigma = 2.24$

CODING and TRANSFORMATIONS (continued)
• Let "a" be 10, so we multiply each observation by 10.
• $Y_i = 20, 40, 60, 80$
• $\mu = \sum Y_i / N = 200/4 = 50$
• which is $a\mu = 10(5) = 50$
• $\sigma^2 = [\sum Y_i^2 - (\sum Y_i)^2/N]/N = (12000 - 10000)/4 = 500$
• which is $a^2\sigma^2 = 10^2(5) = 500$
• $\sigma = \sqrt{500} = 22.4$
• which is $a\sigma = 10(2.24) = 22.4$

CODING and TRANSFORMATIONS (continued)
• NOTE that the multiplier may also be an inverse (i.e. 1/a instead of a), so we can multiply or divide all values of $Y_i$ by any nonzero constant.
• If we took the values $Y_i = 20, 40, 60, 80$ and divided each $Y_i$ by 10, we would reverse the previous example.
• For division by "a", the mean is divided by $a$ (multiplied by 1/10 here), the variance is divided by $a^2$ (multiplied by 1/100), and the standard deviation is divided by $a$ (multiplied by 1/10).
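
The multiplication theorem and its reversal by division can be checked the same way. This is a minimal sketch with the standard library's population statistics; the names `scaled` and `restored` are illustrative.

```python
from statistics import fmean, pstdev, pvariance

y = [2, 4, 6, 8]                        # original population, N = 4
a = 10                                  # multiplier

scaled = [a * yi for yi in y]           # 20, 40, 60, 80

mean_scaled = fmean(scaled)             # a times the old mean
var_scaled = pvariance(scaled)          # a squared times the old variance
sd_scaled = pstdev(scaled)              # a times the old standard deviation (a > 0)

# Dividing by the same constant reverses the coding.
restored = [yi / a for yi in scaled]

print(mean_scaled, var_scaled, sd_scaled, restored)
```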

CODING and TRANSFORMATIONS (continued)
• The TRANSFORMATION operations may be used in combination.
• EXAMPLE: Population of size N = 3
• $Y_i = 10, 20, 30$: $\mu = 20$; $\sigma^2 = 66.67$; $\sigma = 8.16$
• The transformation is "divide by 10 (or multiply by 1/10) and subtract 2": $Y_i' = Y_i/10 - 2$
• $Y_i' = -1, 0, 1$ (much easier to work with)
• $\mu' = \sum Y_i' / N = 0/3 = 0$
• $\sigma'^2 = [\sum Y_i'^2 - (\sum Y_i')^2/N]/N = (2 - 0)/3 = 0.66667$
• $\sigma' = 0.816$

CODING and TRANSFORMATIONS (continued)
• To get back the original values we must reverse the transformation.
• NOTE THAT ORDER IS IMPORTANT.
• Above we 1) divided and then 2) subtracted.
• To reverse this we must 1) add and then 2) multiply.
• $\mu = 10(\mu' + 2) = 10(0 + 2) = 20$

CODING and TRANSFORMATIONS (continued)
• ADDITION AND SUBTRACTION DO NOT AFFECT MEASURES OF DISPERSION, so we need to consider only the division.
• $\sigma^2 = a^2\sigma'^2 = 100(0.66667) = 66.667$
• $\sigma = a\sigma' = 10(0.816) = 8.16$
• Note that there is no addition or subtraction for the measures of dispersion, since they were unaffected by that part of the original transformation.
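
The combined coding and its reversal can be sketched as follows. This is an illustration of the N = 3 example, assuming the coding "divide by 10, then subtract 2"; the variable names are illustrative.

```python
from statistics import fmean, pstdev, pvariance

y = [10, 20, 30]                        # original population, N = 3
a, b = 10, 2                            # divide by a, then subtract b

coded = [yi / a - b for yi in y]        # -1, 0, 1

mu_c = fmean(coded)
var_c = pvariance(coded)
sd_c = pstdev(coded)

# Reverse the coding in the opposite order: add b first, then multiply by a.
mu = a * (mu_c + b)
# Dispersion is unaffected by the subtraction, so only the scaling is undone.
var = a ** 2 * var_c
sd = a * sd_c

print(coded, mu_c, var_c, sd_c)
print(mu, var, sd)
```

Reversing in the wrong order (multiplying first and then adding 2) would give a mean of 2 instead of 20, which is why the order matters.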

OTHER TRANSFORMATIONS
• The logarithmic transformation was mentioned previously.
• $Y_i' = \log(Y_i)$
• If we calculate statistics such as the mean with the log-transformed values and then detransform with the antilog, the result is a "geometric mean".
• $\operatorname{antilog}(\sum Y_i'/n) = e^{\sum \log(Y_i)/n} = GM(Y_i)$ (using natural logarithms)
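
The log and antilog steps can be sketched in Python. This is a minimal illustration, assuming natural logarithms and an example data set of 2, 4, 6, 8 (not given on this slide); `statistics.geometric_mean` is used only as a cross-check.

```python
import math
from statistics import fmean, geometric_mean

# Assumed example data; all values must be positive for the log to exist.
y = [2, 4, 6, 8]

logs = [math.log(yi) for yi in y]       # Y' = log(Y), natural log
log_mean = fmean(logs)                  # mean on the log scale
gm = math.exp(log_mean)                 # antilog returns to the original scale

# Cross-check against the library's geometric mean.
gm_library = geometric_mean(y)

print(gm, gm_library)
```

The geometric mean is never larger than the arithmetic mean of the same positive values, which is 5 for this data.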

OTHER TRANSFORMATIONS (continued)
• HOWEVER, note that we cannot take the logarithm of 0 (zero), so if there are zeros in the data set we must combine two transformations. One common modification is to add 1 to all observations.
• USE $Y_i' = \log(Y_i + 1)$
• When detransforming, be careful to subtract 1 AFTER taking the antilog. Order is important.
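
The order of operations for the shifted log can be sketched as follows. The data values are hypothetical, chosen only because they include a zero; `math.log1p` and `math.expm1` compute log(1 + x) and exp(x) - 1 respectively.

```python
import math
from statistics import fmean

# Hypothetical data containing a zero, so log(Y) alone would fail.
y = [0, 3, 8]

# Forward: add 1, then take the log.
logs = [math.log1p(yi) for yi in y]     # log(Y + 1)
log_mean = fmean(logs)

# Reverse in the opposite order: antilog first, then subtract 1.
detransformed = math.expm1(log_mean)    # exp(mean) - 1

# Wrong order for comparison: subtracting 1 before the antilog.
wrong = math.exp(log_mean - 1)

print(detransformed, wrong)
```

The two results differ, which illustrates why the subtraction has to come after the antilog.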

OTHER TRANSFORMATIONS (continued)
• The same is true for inverses used in calculating the harmonic mean.
• $Y_i' = 1/Y_i$
• If we calculate the mean of the inverse-transformed values and then detransform with the inverse, we get the harmonic mean: $HM = 1/(\sum Y_i'/n) = n/\sum (1/Y_i)$
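
The inverse transformation can be sketched the same way. This is a minimal illustration with an assumed data set of 2, 4, 6, 8 (not given on this slide); `statistics.harmonic_mean` serves only as a cross-check.

```python
from statistics import fmean, harmonic_mean

# Assumed example data; values must be nonzero for the inverse to exist.
y = [2, 4, 6, 8]

inverses = [1 / yi for yi in y]         # Y' = 1 / Y
inv_mean = fmean(inverses)              # mean on the inverse scale
hm = 1 / inv_mean                       # detransform with the inverse

# Cross-check against the library's harmonic mean.
hm_library = harmonic_mean(y)

print(hm, hm_library)
```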

OTHER TRANSFORMATIONS (continued)
The "Z" TRANSFORMATION - we will use this a lot
• It employs the previously discussed transformations in combination: subtract the mean, then divide by the standard deviation.
• $Z_i = (Y_i - \mu)/\sigma$
• This transformation standardizes ANY normal distribution to the standard normal distribution, with $\mu = 0$; $\sigma^2 = 1$; $\sigma = 1$

OTHER TRANSFORMATIONS (continued)
• This is necessary because otherwise there are an infinite number of different normal distributions, with different means and variances.
• EXAMPLE: transform the data for a population of N = 4.
• $Y_i = 2, 4, 6, 8$
• Initially, calculate the mean and variance.
• $\mu = 5$; $\sigma^2 = 5$; $\sigma = 2.24$

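OTHER TRANSFORMATIONS (continued)
• the transformation, $Z_i = (Y_i - \mu)/\sigma$
• $Z_i = (2-5)/2.24, (4-5)/2.24, (6-5)/2.24, (8-5)/2.24 = -1.34, -0.45, 0.45, 1.34$
• $\mu_Z = \sum Z_i / N = 0/4 = 0$
• $\sigma_Z^2 = \sum Z_i^2 / N = 4/4 = 1$
• NOTE: the subtraction of $\mu$ does not affect the variance; only the division by $\sigma$ does, so $\sigma_Z^2 = \sigma^2/\sigma^2 = 1$
• $\sigma_Z = \sqrt{1} = 1$
• We will see a lot more of the Z transformation.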
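
The Z transformation of this example can be reproduced in a few lines. This is a minimal sketch using population statistics from the standard library; the name `z` for the transformed values is illustrative.

```python
from statistics import fmean, pstdev, pvariance

y = [2, 4, 6, 8]                        # population, N = 4

mu = fmean(y)                           # 5.0
sigma = pstdev(y)                       # square root of 5

# Subtract the mean, then divide by the standard deviation.
z = [(yi - mu) / sigma for yi in y]

z_mean = fmean(z)                       # zero, up to rounding error
z_var = pvariance(z)                    # one, up to rounding error

print([round(v, 2) for v in z], z_mean, z_var)
```

The first two Z values are negative because those observations lie below the mean.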
