STAT131 Week 3 Lecture 1a: Transformations. Anne Porter
Lecture Outline • Review • Exploring and Displaying data • Making sense of Raw Data • Good Graphics
Measuring Units and Transformations
Task 1: A sample of data collected from students

Height   ShoeSize   Sex   MobilePhone
5’6”     8          f     y
6’       11         m     y
6’6”     15         m     y
180cm    13         m     y
170      9.5        m     y

• What do you notice about this data?
• What went wrong?
Transforming Height • 6 feet = ___ cm
Transforming Height • 6 feet = 6 × 12 inches = 72 inches = 72 × 2.54 cm = 182.88 cm
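The same conversion can be applied to every entry in the Task 1 height column. The following is a minimal Python sketch, not part of the lecture: the function name `height_to_cm` is hypothetical, and it assumes that a bare number such as 170 was meant to be centimetres.

```python
import re

CM_PER_INCH = 2.54
INCHES_PER_FOOT = 12

def height_to_cm(raw):
    """Convert a height such as 5'6", 6', 180cm or 170 to centimetres."""
    # Normalise curly quotes so that 5’6” and 5'6" are treated alike.
    text = raw.strip().replace("’", "'").replace("”", '"')
    # Feet with optional inches, e.g. 6' or 5'6"
    match = re.fullmatch(r"(\d+)'\s*(?:(\d+(?:\.\d+)?)\"?)?", text)
    if match:
        feet = int(match.group(1))
        inches = float(match.group(2)) if match.group(2) else 0.0
        return (feet * INCHES_PER_FOOT + inches) * CM_PER_INCH
    # A number with an optional "cm" suffix.
    # Assumption: a bare number (e.g. 170) is already in centimetres.
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(?:cm)?", text)
    if match:
        return float(match.group(1))
    raise ValueError(f"Unrecognised height: {raw!r}")

heights = ["5’6”", "6’", "6’6”", "180cm", "170"]
heights_cm = [round(height_to_cm(h), 2) for h in heights]
print(heights_cm)
```

Once every height is on one scale, the column can be summarised or plotted; before conversion, the mixed units make any summary meaningless.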
Activity 2 Transformation: Square root
• Given a set of points X, transform each X by taking the square root and mark it on Z
X: 1 4 9 16 25 36
Z:
What does taking the square root do to this data?
Activity 2 Transformation: Square root
• Given a set of points X, transform each X by taking the square root and call it Z
X: 1 4 9 16 25 36
Z: 1 2 3 4 5 6
What does taking the square root do to this data? It draws back a tail of high values.
Activity 2 Transformation: Square root
• Given a set of points Z, transform them by squaring them and mark them on X
Z: 1 2 3 4 5 6
X:
What can we do to spread out a set of data?
Activity 2 Transformation: Square root
• Given a set of points Z
Z: 1 2 3 4 5 6
X: 1 4 9 16 25 36
What can we do to spread out a set of data? Square each value.
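Activity 2 can be checked numerically. This short Python sketch is an illustration rather than lecture material. It takes the square root of each X, compares the gaps between neighbouring points before and after, and then squares Z to recover X.

```python
import math

x = [1, 4, 9, 16, 25, 36]

# Square root transformation: pulls in the long tail of high values.
z = [math.sqrt(v) for v in x]

# Gaps between neighbouring points on each scale.
gaps_x = [b - a for a, b in zip(x, x[1:])]  # widening gaps on the X scale
gaps_z = [b - a for a, b in zip(z, z[1:])]  # equal gaps on the Z scale

# Squaring reverses the transformation and spreads the values out again.
x_again = [v ** 2 for v in z]

print("Z:", z)
print("gaps in X:", gaps_x)
print("gaps in Z:", gaps_z)
print("Z squared:", x_again)
```

On the X scale the gaps grow as the values grow, while on the Z scale they are all equal, which is what "drawing back a tail of high values" looks like in numbers.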
Power transformations Table: Common Transformations (Griffiths, 1998, p. 40)
Lies, damned lies and statistics • Is it cheating, or misinterpreting, when we transform data?
Revisiting Transformations
• Convert units
Why else do we transform data?
• Spread out dense clusters
• Contract values that are widely spaced
• Reduce asymmetry and make numerical values more representative of the data
Revisiting Transformations • Transformations allow us to see data from a different perspective • It is simpler to explain our data in terms of height in cm • ...but there is no reason we cannot measure on other scales, e.g. log(height)… • We may see different things when data is measured differently
Specific transformations • Square root • Square • Logarithm • Add and subtract constants • Multiply and divide by constants • Z scores to standardise data
Ex: Choose your marker • If X and Y are the marks given by two markers, who do you want to mark your work? Why? • What is the mean mark for each?
Ex: Choose your marker • If X and Y are the marks given by two markers, who do you want to mark your work? Why? • What is the mean mark for each? Sum of X = 15, so mean of X = 3. Sum of Y = 25, so mean of Y = 5.
What might you do to fix the problem? • Add 2 marks to everyone marked by X
Ex: Compare Z and Y • What is the median mark for Z? • What is the mean for Z? Shown so far: sum = 25, mean = 5, median of Y = 5.
Ex: Calculate mean • What is the mean for Z? • What is the median mark for both? • Who do you want to mark your work, Y or Z? Why? Sums: 25 and 25. Means: 5 and 5. Median of Z = 5. Median of Y = 5.
Ex: Calculate mean
• Who do you want to mark your work, Y or Z? Why?
• The spread of marks is different
• Good students want the top mark of 11 and will choose Y
• Students low in confidence will take Z, as its lowest mark is 3, not 1
Sums: 25 and 25. Means: 5 and 5. Median of Z = 5. Median of Y = 5.
Transforming by adding and subtracting constants
Changes occur in the
• Mean
• Median
• Quartiles
No changes occur in the
• Range
• Interquartile range
• Variance
• Standard deviation
• Check this by doing the relevant calculations on Z and X
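The check suggested above can be sketched in Python. The lecture's table of marks is not reproduced in this transcript, so the X values below are hypothetical, chosen only so that their mean is 3. Z is X with 2 added to every mark, and the helper name `iqr` is also an assumption.

```python
import statistics

def iqr(values):
    """Interquartile range using the statistics module's default quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

x = [1, 2, 3, 4, 5]          # hypothetical marks, not the lecture's table
z = [v + 2 for v in x]       # add 2 marks to everyone marked by X

measures = {
    "mean": statistics.mean,
    "median": statistics.median,
    "range": lambda v: max(v) - min(v),
    "IQR": iqr,
    "variance": statistics.variance,
    "standard deviation": statistics.stdev,
}

for name, measure in measures.items():
    print(f"{name}: X = {measure(x)}, Z = {measure(z)}")
```

The measures of location (mean, median, quartiles) move up by 2, while the measures of spread are identical for X and Z.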
How might we alter the spread of marks? • Divide or multiply the scores? • We’ll use an easier set of data!
Ex: Calculate standard deviation, variance and range
Z: 4 8 16 10 2
Ex: Calculate standard deviation, variance and range

Z               4    8   16   10    2
Z − mean       −4    0    8    2   −6
(Z − mean)²    16    0   64    4   36

Sum of squared deviations = 120
Range = 16 − 2 = 14
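The calculation can be reproduced step by step in Python. This is a sketch that assumes the sample variance (divisor n − 1), which is consistent with the variance of 7.5 quoted later for y = Z/2. The variable names are illustrative.

```python
import math

z = [4, 8, 16, 10, 2]
n = len(z)

mean_z = sum(z) / n                         # centre of the data
deviations = [v - mean_z for v in z]        # distance of each value from the mean
squared = [d ** 2 for d in deviations]      # squared deviations
sum_squares = sum(squared)                  # sum of squared deviations

variance_z = sum_squares / (n - 1)          # sample variance (divisor n - 1)
sd_z = math.sqrt(variance_z)                # standard deviation
range_z = max(z) - min(z)                   # range

print("mean:", mean_z)
print("deviations:", deviations)
print("squared deviations:", squared)
print("sum of squares:", sum_squares)
print("variance:", variance_z)
print("standard deviation:", round(sd_z, 3))
print("range:", range_z)
```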
Ex: Let y = Z/2 and calculate the mean of y • The mean of Z is 8 and the mean of y is 4: dividing Z by 2 has halved the mean
Ex: Calculate standard deviation of y
y = Z/2: 2 4 8 5 1
Ex: Calculate standard deviation of y

y = Z/2         2    4    8    5    1
y − mean       −2    0    4    1   −3
(y − mean)²     4    0   16    1    9

Sum of squared deviations = 30
Variance = 7.5
Range = 8 − 1 = 7
Ex: Compare Z and y, where y = Z/2
• The standard deviation of y is half the standard deviation of Z
• The variance of y is ¼ of the variance of Z
• The range of y is half the range of Z
Dividing a data set by a positive constant (here, 2)
Changed in the same way:
• Mean is halved
• Median is halved
• Range is halved
• Standard deviation is halved
But for the variance:
• The variance is quartered
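These ratios can be confirmed on the lecture's data set Z = 4, 8, 16, 10, 2. The Python sketch below is illustrative: the helper name `summary` and the constant name `c` are assumptions, and it uses the sample (n − 1) variance and standard deviation.

```python
import statistics

def summary(values):
    """Collect the summary measures discussed in the lecture."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "standard deviation": statistics.stdev(values),
        "variance": statistics.variance(values),
    }

c = 2                               # positive constant to divide by
z = [4, 8, 16, 10, 2]
y = [v / c for v in z]              # y = Z / 2

summary_z = summary(z)
summary_y = summary(y)

# Ratio of each measure for y to the same measure for Z.
ratios = {name: summary_y[name] / summary_z[name] for name in summary_z}
for name, ratio in ratios.items():
    print(f"{name}: y / Z = {ratio:.2f}")
```

Every measure scales by 1/c except the variance, which scales by 1/c² because it is built from squared deviations.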
Multiplying or dividing by a positive constant
Changed in the same way (multiplied or divided by the constant):
• Mean
• Median
• Quartiles
• Standard deviation
But the variance is multiplied or divided by the square of the constant
Z score transformation
• Transforms data so that it has a mean of 0 and a standard deviation of 1
• When two sets of scores are each standardised in this manner, using their own mean and standard deviation, we can compare the standardised scores
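The usual form of this transformation subtracts the mean from each value and divides by the standard deviation. The Python sketch below assumes that form and the sample (n − 1) standard deviation; the function name `z_scores` is hypothetical. It standardises the earlier data set Z and its halved version y.

```python
import statistics

def z_scores(values):
    """Standardise values: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # assumption: sample SD with divisor n - 1
    return [(v - mean) / sd for v in values]

z = [4, 8, 16, 10, 2]
y = [v / 2 for v in z]              # the same marks on a different scale

z_std = z_scores(z)
y_std = z_scores(y)

print("standardised Z:", [round(s, 3) for s in z_std])
print("standardised y:", [round(s, 3) for s in y_std])
print("mean:", round(statistics.mean(z_std), 10))
print("standard deviation:", round(statistics.stdev(z_std), 10))
```

Although Z and y are on different scales, their standardised scores agree, which is what makes standardised scores comparable across sets of marks.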
Introduction to Correlation • Video Unit 13, Correlation • Note the use of standardized scores • Examines correlation as a measure of similarity • We will use the Z score transformation a lot!