Thursday August 29, 2013

Thursday August 29, 2013 The Z Transformation

Today: Z-Scores • First--Upper and lower real limits: • Boundaries of intervals for scores that are represented on a continuous number line • Always half-way between adjacent categories (usually x.5 but may be x.05, x.15, etc. if categories are decimalized) • Mainly important for making accurate histograms & identifying percentile ranks

Today: Z-Scores Any (other) questions from last time?

Topics for today • The Z transformation (Chapter 5) • If we have time, we’ll begin our review of probability (Chapter 6)

The Z transformation If you know the mean and standard deviation of a distribution, you can convert a given score into a Z score or standard score. This score is informative because it tells you where that score falls relative to other scores in the distribution.

Locating a score • Where is our raw score within the distribution? • The natural choice of reference is the mean (since it is usually easy to find). • So we’ll subtract the mean from the score (the result, X - μ, is called a “deviation score”). • The direction of the deviation is given by the negative or positive sign on the deviation score • The distance of the deviation is the absolute value of the deviation score

Locating a score (reference point: mean = 100) • X1 = 162: X1 - 100 = +62 (above the mean) • X2 = 57: X2 - 100 = -43 (below the mean)

Transforming a score • The distance is the value of the deviation score • However, this distance is measured in the units of measurement of the score (such as inches, ounces, Likert rating, etc.). • Convert the score to a standard (neutral) score, in this case a z-score: z = (X - μ) / σ, where X is the raw score, μ is the population mean, and σ is the population standard deviation.

Transforming scores • A z-score specifies the precise location of each X value within a distribution. • Direction: The sign of the z-score (+ or -) signifies whether the score is above the mean or below the mean. • Distance: The numerical value of the z-score specifies the distance from the mean by counting the number of standard deviations between X and μ. • X1 = 162: z = (162 - 100) / 50 = +1.24 • X2 = 57: z = (57 - 100) / 50 = -0.86
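The two conversions on this slide can be checked with a few lines of Python. This is only a sketch: the function name `z_score` is made up for illustration, and the mean of 100 and standard deviation of 50 are taken from the example.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

mu, sigma = 100, 50            # values from the slide example
z1 = z_score(162, mu, sigma)   # positive sign: above the mean
z2 = z_score(57, mu, sigma)    # negative sign: below the mean
print(round(z1, 2), round(z2, 2))  # 1.24 -0.86
```

The sign carries the direction and the magnitude carries the distance, so no separate bookkeeping is needed.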

Transforming a distribution • We can transform all of the scores in a distribution • We can transform any & all observations to z-scores if we know the distribution mean and standard deviation. • We call this transformed distribution a standardized distribution. • Standardized distributions are used to make dissimilar distributions comparable. • e.g., your height and weight • One of the most common standardized distributions is the Z-distribution.

Properties of the z-score distribution (transformation of raw scores with μ = 100, σ = 50) • X = 100 (the mean) → z = 0 • X = 150 (one standard deviation above the mean) → z = +1 • X = 50 (one standard deviation below the mean) → z = -1

Properties of the z-score distribution • Shape - the shape of the z-score distribution will be exactly the same as the original distribution of raw scores. Every score stays in the exact same position relative to every other score in the distribution. • Mean - when raw scores are transformed into z-scores, the mean will always = 0. • The standard deviation - when any distribution of raw scores is transformed into z-scores the standard deviation will always = 1.
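A quick way to convince yourself of the mean and standard deviation properties is to transform a small set of scores and recompute both. The sketch below uses Python's standard `statistics` module; the five raw scores are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

raw = [50, 75, 100, 125, 150]   # hypothetical population of raw scores
mu = mean(raw)                  # population mean
sigma = pstdev(raw)             # population standard deviation

# Transform every score; the order of the scores is unchanged.
z = [(x - mu) / sigma for x in raw]

print(mean(z))     # mean of the z-scores (0, up to rounding error)
print(pstdev(z))   # standard deviation of the z-scores (1, up to rounding error)
```

Because every score is shifted and rescaled by the same two constants, each score keeps its position relative to the others, which is why the shape does not change.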

Self-monitor your understanding • Next, we’ll find out how to convert z-scores back into raw scores. • Before we move on, any questions about z-scores (what they are, how to compute them from raw scores, properties of the z distribution)?

From z to raw score • We can also transform a z-score back into a raw score if we know the mean and standard deviation of the original distribution. • Z = (X - μ) / σ → (Z)(σ) = (X - μ) → X = (Z)(σ) + μ • Example (μ = 100, σ = 50): for Z = -0.60, X = (-0.60)(50) + 100 = 70
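The reverse transformation is the same formula solved for X. A minimal sketch, assuming the same mean (100) and standard deviation (50) as the example; the function name `raw_from_z` is hypothetical.

```python
def raw_from_z(z, mu, sigma):
    """Convert a z-score back to a raw score: X = (Z)(sigma) + mu."""
    return z * sigma + mu

x = raw_from_z(-0.60, 100, 50)
print(round(x, 2))  # 70.0

# Round trip: converting the raw score back gives the original z-score.
print(round((x - 100) / 50, 2))  # -0.6
```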

Let’s try it with our data • To transform data on mothers’ height into standard scores, use the formula bar in Excel to subtract the mean and divide by the standard deviation. • Can also use STANDARDIZE(x, mean, sd) • Show with fathers’ height • Observe how height and shoe size can be more easily compared with standard (z) scores

Z-transformations with SPSS • You can also do this in SPSS. • Use Analyze … Descriptive Statistics … Descriptives … • Check the box that says “save standardized values as variables.”

What if you are dealing with a sample (not a population)? • Use s instead of σ (and M instead of μ) in the formula for Z • To z-transform a population of scores: Z = (X - μ) / σ • To z-transform a sample of scores: Z = (X - M) / s • SPSS produces z-scores based on the sample formula; Excel can do either (STDEV.P or STDEV.S)
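The only difference between the two formulas is which standard deviation goes in the denominator. The sketch below shows both using Python's `statistics` module, where `pstdev` divides by N (population) and `stdev` divides by n - 1 (sample). The helper name `z_transform` and the height values are assumptions made for illustration.

```python
from statistics import mean, pstdev, stdev

def z_transform(scores, sample=False):
    """z-transform a list of scores using the population or sample formula."""
    m = mean(scores)
    sd = stdev(scores) if sample else pstdev(scores)
    return [(x - m) / sd for x in scores]

heights = [60, 62, 64, 66, 68]           # hypothetical heights in inches
z_pop = z_transform(heights)             # population formula (sigma)
z_samp = z_transform(heights, sample=True)  # sample formula (s)

print([round(z, 2) for z in z_pop])
print([round(z, 2) for z in z_samp])
```

Since s is always a little larger than σ for the same scores, the sample z-scores come out slightly closer to zero.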

Other standardized distributions • Original (X): μ = 57, σ = 14 • Z-scores: μ = 0, σ = 1 • Standardized: μ = 50, σ = 10 • Corresponding values (X → z → standardized): 29 → -2 → 30 • 43 → -1 → 40 • 57 → 0 → 50 • 71 → +1 → 60 • 85 → +2 → 70
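Moving a score to a new standardized scale is a two-step process: convert to z using the original mean and standard deviation, then convert that z to the new scale. A sketch of this, using the values from the slide (original μ = 57, σ = 14; new μ = 50, σ = 10); the function name `restandardize` is hypothetical.

```python
def restandardize(x, old_mu, old_sigma, new_mu, new_sigma):
    """Map a raw score onto a new scale with a chosen mean and standard deviation."""
    z = (x - old_mu) / old_sigma      # step 1: raw score -> z-score
    return new_mu + z * new_sigma     # step 2: z-score -> new scale

for x in (29, 43, 57, 71, 85):
    print(x, restandardize(x, 57, 14, 50, 10))
# 29 30.0
# 43 40.0
# 57 50.0
# 71 60.0
# 85 70.0
```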

In-Class Exercise: • Find the standard deviation for the following population of scores: 1, 3, 4, 4, 5, 7, 9 • Find the standard deviation for the following sample of scores: 1, 2, 2, 3, 9, 10 • For a distribution with μ = 40 and σ = 12, find the z-score for each of the following scores: a. X = 36 b. X = 46 c. X = 56 • A population with a mean of μ = 44 and a standard deviation of σ = 6 is standardized to create a new distribution with μ = 50 and σ = 10. • What is the new value for an original score of X = 47? • If the new score is 65, what was the original score?

Probability & the Normal Distribution We have talked about distributions, and how to describe the shape, center, & spread of a distribution. We have learned how to convert a distribution of raw scores into a distribution of z-scores. Next we will review some basic probability concepts. Later, we will see how these apply to scores and distributions. Questions before we move on?

Why do we need to know about probability in this class? • Inferential statistics • Focused on making inferences about a population based on sample data • Probability helps us connect a sample to its population • If we know (or can estimate) population parameters, we can use probability to tell us how likely (or unlikely) it is that a given sample came from the population of interest

Basics of Probability Probability Expected relative frequency of a particular outcome, in a situation in which several different outcomes are possible Outcome Could be the result of a coin toss or experiment, could be obtaining a particular score on a variable of interest

Flipping a coin example (n = 1 flip) • What is the probability of getting a “heads”? • One outcome classified as heads / total of two outcomes = 1/2 = 0.5

Flipping a coin example (n = 2 flips) • What is the probability of getting two “heads”? • Outcomes and number of heads: HH = 2, HT = 1, TH = 1, TT = 0 • # of outcomes = 2^n = 2^2 = 4 • One “2 heads” outcome / four total outcomes = 1/4 = 0.25 • This situation is known as the binomial

Flipping a coin example (n = 2 flips) • What is the probability of getting “at least one heads”? • Outcomes and number of heads: HH = 2, HT = 1, TH = 1, TT = 0 • Three “at least one heads” outcomes / four total outcomes = 3/4 = 0.75

Flipping a coin example (n = 3 flips) • Outcomes and number of heads: HHH = 3, HHT = 2, HTH = 2, HTT = 1, THH = 2, THT = 1, TTH = 1, TTT = 0 • 2^n = 2^3 = 8 total outcomes
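The outcome tables in the coin examples can be generated rather than written out by hand. The sketch below enumerates every sequence of n flips with `itertools.product` and counts heads in each; it assumes a fair coin, so every sequence is equally likely. The function name `heads_distribution` is hypothetical.

```python
from collections import Counter
from itertools import product

def heads_distribution(n):
    """Return {number of heads: probability} for n flips of a fair coin."""
    outcomes = list(product("HT", repeat=n))          # all 2**n sequences
    counts = Counter(o.count("H") for o in outcomes)  # heads per sequence
    total = len(outcomes)
    return {k: counts[k] / total for k in sorted(counts)}

two_flips = heads_distribution(2)
print(two_flips[2])        # p(two heads) = 0.25
print(1 - two_flips[0])    # p(at least one heads) = 0.75
print(heads_distribution(3))
# {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```

"At least one heads" is computed as one minus the probability of no heads, which avoids adding up several cases.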

[Figure: frequency graph of three-coin-flip results. Frequencies shown: HHH = 5, HHT = 5, HTH = 3, THH = 3, HTT = 0, THT = 2, TTH = 2, TTT = 6. By number of heads: 3H = 5, 2H = 11, 1H = 4, 0H = 6]

Connection between probabilities & graphs • We usually have a population of scores that can be displayed in a graph (such as a histogram) • Each portion of the graph represents a different proportion of the population • The proportion is equivalent to the probability of obtaining an individual in that portion of the graph

Example Population with the following scores: 1,1,2,3,3,4,4,4,5,6

Example • What is the probability of obtaining a score greater than 4? • p(X>4) = ?

Example Find the following probabilities: • p(X>2) = ? • p(X>5) = ? • p(X<3) = ?
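For a small population like the one in this example, a probability is just the count of scores that meet the condition divided by the total number of scores. Here is a sketch that works through p(X>4); the helper name `probability` is hypothetical, and `Fraction` is used so the result stays an exact proportion.

```python
from fractions import Fraction

population = [1, 1, 2, 3, 3, 4, 4, 4, 5, 6]   # scores from the example

def probability(scores, condition):
    """Proportion of scores for which condition(score) is true."""
    hits = sum(1 for x in scores if condition(x))
    return Fraction(hits, len(scores))

p_gt_4 = probability(population, lambda x: x > 4)
print(p_gt_4, float(p_gt_4))   # 1/5 0.2
```

The same helper can be used to check your answers to the other questions by changing the condition.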

Check your understanding • We are about to look at the normal distribution and see how probability concepts are related to this specific distribution. • Before we move on, any questions about probability, how to compute it, how it is related to frequency graphs, etc.?

The Normal Distribution • The Normal distribution is a commonly found distribution that is symmetrical and unimodal. • Not all unimodal, symmetrical curves are Normal, so be careful with your descriptions • It is defined by the following equation: Y = (1 / (σ√(2π))) e^(-(x - μ)² / (2σ²)) • The mean, median, and mode are all equal for this distribution.

The Normal Distribution • This equation provides x and y coordinates on the graph of the frequency distribution. • You can plug a given value of x into the formula to find the corresponding y coordinate. • Since the function describes a symmetrical curve, note that the same y (height) is given by two values of x (representing two scores an equal distance above and below the mean)

The Normal Distribution • As the distance between the observed score (x) and the mean increases, the value of the expression (i.e., the y coordinate) decreases. • Thus the frequency of observed scores that are very high or very low relative to the mean is low, and as the difference between the observed score and the mean gets very large, the frequency approaches 0.

The Normal Distribution • As the distance between the observed score (x) and the mean decreases (i.e., as the observed value approaches the mean), the value of the expression (i.e., the y coordinate) increases. • The maximum value of y (i.e., the mode, or the peak in the curve) is reached when the observed score equals the mean; hence mean equals mode.

The Normal Distribution • The integral of the function gives the area under the curve (remember this if you took calculus?) • The distribution is asymptotic, meaning that the tails get closer and closer to the x axis but never touch it. • There is no closed-form solution for the integral, so areas are found numerically (this is why we use a table). • It is still possible to calculate the proportion of the area under the curve represented by a range of x values (e.g., for x values between -1 and 1).
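To make the last few slides concrete, the sketch below codes the standard Normal density formula and approximates the area between two x values by adding up many thin rectangles (a midpoint rule). The function names and the number of rectangles are arbitrary choices for illustration, not part of the course materials.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height (y) of the Normal curve at x."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def area_between(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate the area under the curve from a to b with thin rectangles."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

print(normal_pdf(-1) == normal_pdf(1))   # True: same height on both sides of the mean
print(normal_pdf(0) > normal_pdf(1))     # True: the peak is at the mean
print(round(area_between(-1, 1), 4))     # 0.6827
```

The last line is the familiar "about 68% within one standard deviation of the mean" figure, obtained here by numerical approximation rather than from a table.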

Check your understanding • Next we will see how probability concepts are related to the normal distribution, by learning about the Unit Normal Table. • Before we move on, any questions about the properties of the normal distribution?

The Unit Normal Table (Appendix B) • The normal distribution is often transformed into z-scores. • The Unit Normal Table lets you find the precise proportion of scores (in z-scores) between the mean (Z score of 0) and any other Z score in a Normal distribution • Contains the proportions of the distribution to the left of the corresponding z-scores of a Normal distribution • Because the Normal distribution is symmetrical, the table lists only positive Z scores • Note that for z = 0 (i.e., at the mean), the proportion of scores to the left is .5. Hence, mean = median.

Using the Unit Normal Table • 50%-34%-14% rule (similar to the 68%-95%-99.7% rule): 34.13% of scores fall between z = 0 and z = 1, 13.59% between z = 1 and z = 2, and 2.28% beyond z = 2 • At z = +1: 15.87% (13.59% + 2.28%) of the scores are to the right of the score • 100% - 15.87% = 84.13% to the left
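The percentages on this slide can be reproduced without a printed table. The sketch below uses `NormalDist` from Python's standard `statistics` module, whose `cdf` method returns the proportion of the distribution to the left of a value; the variable names are chosen for illustration.

```python
from statistics import NormalDist

unit = NormalDist()   # unit normal: mean 0, standard deviation 1

left_of_1 = unit.cdf(1)                    # proportion to the left of z = +1
between_0_1 = unit.cdf(1) - unit.cdf(0)    # between the mean and z = +1
between_1_2 = unit.cdf(2) - unit.cdf(1)    # between z = +1 and z = +2
beyond_2 = 1 - unit.cdf(2)                 # right tail beyond z = +2

for p in (between_0_1, between_1_2, beyond_2, left_of_1):
    print(round(100 * p, 2))
# 34.13
# 13.59
# 2.28
# 84.13
```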

Using the Unit Normal Table • Steps for figuring the percentage above or below a particular raw or Z score: 1. Convert the raw score to a Z score (if necessary) 2. Draw the normal curve, mark where the Z score falls on it, and shade in the area for which you are finding the percentage 3. Make a rough estimate of the shaded area’s percentage (using the 50%-34%-14% rule) 4. Find the exact percentage using the unit normal table 5. If needed, subtract the percentage from 100%. 6. Check that the exact percentage is within the range of the estimate from Step 3

SAT Example problems • The population parameters for the SAT are: μ = 500, σ = 100, and it is Normally distributed • Suppose that you got a 630 on the SAT. What percent of the people who take the SAT get your score or lower? • Convert to a z-score: z = (630 - 500) / 100 = 1.30 • From the table: z = 1.30 → .9032 • So 90.32% got your score or lower • That’s 9.68% above this score
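The SAT example can be verified the same way as a table lookup. This sketch again uses `NormalDist` from Python's `statistics` module in place of the printed table, with the mean of 500 and standard deviation of 100 stated in the example; the variable names are assumptions.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)   # population parameters from the example

score = 630
z = (score - sat.mean) / sat.stdev    # step 1: convert the raw score to z
at_or_below = sat.cdf(score)          # proportion at or below the score
above = 1 - at_or_below               # proportion above the score

print(round(z, 2))             # 1.3
print(round(at_or_below, 4))   # 0.9032
print(round(above, 4))         # 0.0968
```

As a rough check, z = 1.3 is a bit more than one standard deviation above the mean, so the answer should be somewhat more than 84%, which it is.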
