This course module covers essential concepts in statistical methods, focusing on linear models and transformations. We explore hypothesis testing, the additive model representation (Yi = μ + εi), coding and transformations, and how modifications affect data distribution. Key topics include sample versus population models, the influence of adding or multiplying constants on means and variances, and practical examples demonstrating these principles. Students will learn to apply these techniques to make observed data fit recognized statistical distributions effectively.
Statistical Techniques I EXST7005 Other topics - Linear models and Transformations
Course Progression • Objective - Hypothesis testing Background • Transformation - • Many applications in statistics require modifying an existing distribution to a recognized statistical distribution • Particularly, tests of hypotheses require taking an observed distribution and transforming to a recognized statistical distribution.
LINEAR MODELS • The simplest form of the linear additive model is $Y_i = \mu + \varepsilon_i$ for $i = 1, 2, 3, \ldots, N$ • This is the population version of the model, so the term $\mu$ is a constant, the population mean • The sample version would use $\bar{Y}$, which is a variable.
LINEAR MODELS (continued) • $Y_i = \mu + \varepsilon_i$ for $i = 1, 2, 3, \ldots, N$ • $\varepsilon_i$ represents the deviations of the observations from the mean. It has a mean of zero, since deviations from the mean sum to zero. • $e_i$ would be used to represent sample deviations, • and of course $N$ would be changed to $n$ for a sample: • $Y_i = \bar{Y} + e_i$ for $i = 1, 2, 3, \ldots, n$
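The additive model can be checked numerically. The sketch below is only an illustration: it borrows the population 2, 4, 6, 8 used in later slides, and the variable names are hypothetical.

```python
# Illustrative population (the same values the later slides use).
y = [2, 4, 6, 8]

mu = sum(y) / len(y)             # population mean
eps = [yi - mu for yi in y]      # deviations: eps_i = Y_i - mu
rebuilt = [mu + e for e in eps]  # additive model: Y_i = mu + eps_i

print("mu =", mu)
print("deviations =", eps)
print("sum of deviations =", sum(eps))
```

Each observation is recovered exactly as the mean plus its deviation, and the deviations sum to zero.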
LINEAR MODELS (continued) • This is a mathematical representation of a population or sample. All of the analyses discussed in the Statistical Methods courses have a linear model. The models get more complex as the analysis gets more advanced. • NOTE THAT THE ERROR TERM IN THIS MODEL IS ADDITIVE. • Multiplicative models and multiplicative errors exist, but are not covered in basic statistical methods.
LINEAR MODELS (continued) • Other models we will discuss this semester include • $Y_i = \mu_i + \varepsilon_i$ for t-tests • $Y_i = \mu + \tau_i + \varepsilon_i$ for ANOVA, or another form of the t-test • $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ for Simple Linear Regression
CODING and TRANSFORMATIONS • THEOREMS • If a constant "a" is added to each observation, then the mean of the data set will increase by "a" units, and the variance and standard deviation will remain unchanged • EXAMPLE: Population of size N = 4 • $Y_i$ = 2, 4, 6, 8 • $\mu = \sum Y_i / N = 20/4 = 5$ • $\sigma^2 = [\sum Y_i^2 - (\sum Y_i)^2/N] / N = (120 - 100)/4 = 5$ • $\sigma = \sqrt{5} = 2.24$
CODING and TRANSFORMATIONS (continued) • Now add 10 to each observation • EXAMPLE: Population size still N = 4 • $Y_i$ = 12, 14, 16, 18 • $\mu = \sum Y_i / N = 60/4 = 15$ • $\sigma^2 = [\sum Y_i^2 - (\sum Y_i)^2/N] / N = (920 - 900)/4 = 5$ • $\sigma = \sqrt{5} = 2.24$ • Notice that the mean increased by 10 and the variance and standard deviation did not change.
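The additive-constant theorem can be verified with Python's standard `statistics` module. This is a minimal sketch using the slide's population; `pvariance` and `pstdev` are the population (divide by N) versions, matching the formulas above.

```python
from statistics import mean, pvariance, pstdev

y = [2, 4, 6, 8]                # population from the example
a = 10                          # constant added to every observation
shifted = [yi + a for yi in y]  # 12, 14, 16, 18

# The mean shifts by a; variance and standard deviation do not move.
print("original:", mean(y), pvariance(y), round(pstdev(y), 2))
print("shifted: ", mean(shifted), pvariance(shifted), round(pstdev(shifted), 2))
```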
CODING and TRANSFORMATIONS (continued) • NOTE that "a" may be either negative or positive, so we can add or subtract a constant from all values of $Y_i$. • If we took the values $Y_i$ = 12, 14, 16, 18 and subtracted 10 from each value, we would reverse the previous example: the mean would be ten less (15 - 10 = 5) and the variance and standard deviation would be unchanged. • When subtracting, the mean is REDUCED by the value subtracted and the variance and standard deviation remain unchanged.
CODING and TRANSFORMATIONS (continued) • Another theorem • If each observation $Y_i$ is multiplied by a constant "a", then • the mean of the data set is "a" times the old mean • the new variance is $a^2$ times the old variance • the standard deviation is "a" times the old standard deviation (strictly $|a|$ times, if "a" is negative)
CODING and TRANSFORMATIONS (continued) • EXAMPLE: using the same population as before; N = 4 • $Y_i$ = 2, 4, 6, 8 • $\mu = 5$; $\sigma^2 = 5$; $\sigma = 2.24$
CODING and TRANSFORMATIONS (continued) • Let "a" be 10, so we multiply each observation by 10. • $Y_i$ = 20, 40, 60, 80 • $\mu = \sum Y_i / N = 200/4 = 50$ • which is $a\mu = 10(5) = 50$ • $\sigma^2 = [\sum Y_i^2 - (\sum Y_i)^2/N] / N = (12000 - 10000)/4 = 500$ • which is $a^2\sigma^2 = 10^2(5) = 500$ • $\sigma = \sqrt{500} = 22.4$ • which is $a\sigma = 10(2.24) = 22.4$
CODING and TRANSFORMATIONS (continued) • NOTE that the multiplier may also be an inverse (i.e. 1/a instead of a), so we can multiply or divide all values of $Y_i$ by any nonzero constant • If we took the values $Y_i$ = 20, 40, 60, 80 and divided each $Y_i$ by 10, we would reverse the previous example. • For division by "a" (here a = 10), the mean is divided by "a" (multiplied by 1/10), the variance is divided by $a^2$ (multiplied by 1/100), and the standard deviation is divided by "a" (multiplied by 1/10).
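The multiplicative theorem, and its reversal by division, can be sketched the same way. The code assumes the slide's population and a = 10; it uses `abs(a)` for the standard deviation so the check would also hold for a negative constant.

```python
import math
from statistics import mean, pvariance, pstdev

y = [2, 4, 6, 8]
a = 10
scaled = [a * yi for yi in y]      # 20, 40, 60, 80

print("mean:    ", mean(scaled), "=", a * mean(y))
print("variance:", pvariance(scaled), "=", a**2 * pvariance(y))
print("std dev: ", round(pstdev(scaled), 1), "=", round(abs(a) * pstdev(y), 1))

# Dividing by a reverses the coding and restores the original statistics.
restored = [s / a for s in scaled]
print("restored:", mean(restored), pvariance(restored), round(pstdev(restored), 2))
```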
CODING and TRANSFORMATIONS (continued) • The TRANSFORMATION operations may be used in combination. • EXAMPLE: Population of size N = 3 • $Y_i$ = 10, 20, 30: $\mu = 20$; $\sigma^2 = 66.67$; $\sigma = 8.16$ • The transformation is "divide by 10 (or multiply by 1/10) and subtract 2": $Y_i' = Y_i/10 - 2$ • $Y_i'$ = -1, 0, 1 (much easier to work with) • $\mu' = \sum Y_i' / N = 0/3 = 0$ • $\sigma'^2 = [\sum Y_i'^2 - (\sum Y_i')^2/N] / N = (2 - 0)/3 = 0.66667$ • $\sigma' = 0.816$
CODING and TRANSFORMATIONS (continued) • To get back the original values we must reverse the transformation. • NOTE THAT ORDER IS IMPORTANT. • Above we 1) divided and then 2) subtracted. • To reverse this we must 1) add and then 2) multiply. • $\mu = 10(\mu' + 2) = 10(0 + 2) = 20$
CODING and TRANSFORMATIONS (continued) • ADDITION AND SUBTRACTION DO NOT AFFECT MEASURES OF DISPERSION, so we need consider only the division • $\sigma^2 = a^2\sigma'^2 = 100(0.66667) = 66.667$ • $\sigma = a\sigma' = 10(0.816) = 8.16$ • Note that there is no addition or subtraction for the measures of dispersion since they were unaffected by the original transformation.
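The coding and decoding steps can be traced in a short sketch. It assumes the N = 3 population above and the transformation "divide by 10, then subtract 2"; the variable names are illustrative only.

```python
import math
from statistics import mean, pvariance, pstdev

y = [10, 20, 30]
coded = [yi / 10 - 2 for yi in y]  # divide first, then subtract: [-1.0, 0.0, 1.0]

mu_c, var_c, sd_c = mean(coded), pvariance(coded), pstdev(coded)

# Reverse in the opposite order: add 2 first, then multiply by 10.
mu = 10 * (mu_c + 2)
# Dispersion ignores the subtraction, so only the scaling is undone.
var = 10**2 * var_c
sd = 10 * sd_c

print("coded stats:  ", mu_c, round(var_c, 5), round(sd_c, 3))
print("decoded stats:", mu, round(var, 2), round(sd, 2))
```

Reversing in the wrong order (multiply, then add) would give 10(0) + 2 = 2 for the mean instead of 20, which is why the order matters.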
OTHER TRANSFORMATIONS • The logarithmic transformation was mentioned previously. • $Y_i' = \log(Y_i)$ • If we calculate statistics such as the mean with the log-transformed values, we then detransform with the antilog. • With natural logarithms, $\operatorname{antilog}(\sum Y_i'/n) = e^{\sum \log(Y_i)/n} = GM(Y_i)$ • This results in a "geometric mean"
OTHER TRANSFORMATIONS (continued) • HOWEVER, note that we cannot take the logarithm of 0 (zero), so if there are zeros in the data set we must combine two transformations. One common modification is to add 1 to all observations. • USE $Y_i' = \log(Y_i + 1)$ • When detransforming, be careful to subtract 1 after taking the antilog. Order is important.
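A minimal sketch of the log transformation and its detransformation follows. It assumes natural logarithms; the data set containing a zero (`y0`) is made up for illustration and does not come from the slides.

```python
import math
from statistics import geometric_mean

# Geometric mean: average the logs, then take the antilog.
y = [2, 4, 6, 8]
logs = [math.log(yi) for yi in y]
gm = math.exp(sum(logs) / len(logs))
print("geometric mean:", gm, "library check:", geometric_mean(y))

# Hypothetical data with a zero: use log(Y + 1), then subtract 1 AFTER the antilog.
y0 = [0, 1, 3, 7]
logs1 = [math.log(yi + 1) for yi in y0]
back = math.exp(sum(logs1) / len(logs1)) - 1
print("detransformed mean of log(Y + 1):", back)
```

Subtracting 1 before taking the antilog would give a different and incorrect value, which is the sense in which order is important.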
OTHER TRANSFORMATIONS (continued) • The same is true for inverses used in calculating the harmonic mean • $Y_i' = 1/Y_i$ • If we calculate the mean of the inverse-transformed values, we then detransform with the inverse to get the harmonic mean.
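The inverse transformation can be sketched in the same way. The data below reuse the earlier population 2, 4, 6, 8 purely as an illustration.

```python
import math
from statistics import harmonic_mean

y = [2, 4, 6, 8]
inv = [1 / yi for yi in y]       # transform: Y' = 1/Y
mean_inv = sum(inv) / len(inv)   # mean of the transformed values
hm = 1 / mean_inv                # detransform with the inverse

print("harmonic mean:", hm, "library check:", harmonic_mean(y))
```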
OTHER TRANSFORMATIONS (continued) • The "Z" TRANSFORMATION - we will use this a lot • It employs the previously discussed transformations in combination: subtract the mean, then divide by the standard deviation • $Z_i = (Y_i - \mu)/\sigma$ • This transformation standardizes ANY normal distribution to the standard normal distribution with $\mu = 0$; $\sigma^2 = 1$; $\sigma = 1$
OTHER TRANSFORMATIONS (continued) • This is necessary, because otherwise there are an infinite number of different normal distributions with different means and variances. • EXAMPLE: transform the data for a population of N = 4. • $Y_i$ = 2, 4, 6, 8 • Initially, calculate the mean and variance • $\mu = 5$; $\sigma^2 = 5$; $\sigma = 2.24$
OTHER TRANSFORMATIONS (continued) • The transformation • $Z_i$ = (2-5)/2.24, (4-5)/2.24, (6-5)/2.24, (8-5)/2.24 = -1.34, -0.45, 0.45, 1.34 • $\mu_Z = \sum Z_i / N = 0/4 = 0$ • $\sigma_Z^2 = \sum Z_i^2 / N = 4/4 = 1$ • NOTE: addition and subtraction do not affect the variance, so only the division by $\sigma$ matters here • $\sigma_Z = \sqrt{1} = 1$ • We will see a lot more of the Z transformation.
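The Z transformation example can be reproduced with a few lines of Python. This sketch assumes the population parameters are known (population mean and `pstdev`), as in the slide.

```python
import math
from statistics import mean, pvariance, pstdev

y = [2, 4, 6, 8]
mu, sigma = mean(y), pstdev(y)         # population mean and standard deviation

z = [(yi - mu) / sigma for yi in y]    # subtract the mean, then divide by sigma
print("Z values:", [round(v, 2) for v in z])
print("mean of Z:", mean(z))
print("variance of Z:", round(pvariance(z), 6))
```

The transformed values have mean 0 and variance 1 regardless of the original mean and variance, which is what lets a single standard normal table serve every normal distribution.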