Chapter 11 Univariate Data Analysis; Descriptive Statistics
These are summary measurements of a single variable.
• Averages, or measures of central tendency, describe a dataset with a single typical value.
• Three kinds: mean, median, mode.
• Mean: the most common. Sum all the values in a group, then divide by the total number of values in that group (Hint: start by listing the values in columns under headings).
• Weighted Mean: multiply each value by its frequency, sum the products, and divide by the total frequency.
• Median (often abbreviated Med): the midpoint value. The mean is very sensitive to outlier scores that skew the distribution; the median is not. Instructions: order all values and find the middle-most score. That’s the median (with an even number of cases, find the two middle-most values, add them, and divide by two).
• Percentiles: the 50th percentile is the median. A score at the 75th percentile is at or above 75% of the other scores.
• Mode: the most frequent value.
When to use what
• Three kinds of data:
  • Nominal – categorical data (race, region).
  • Ordinal – values are ranked, but not necessarily equal in distance (a scale of 7 values indicating GOP support).
  • Interval – values are equal in distance (income).
• Use the mean for interval data (and sometimes ordinal). Use the mode for nominal data (and sometimes ordinal), especially when generating percentages. Use the median for interval data if you think there are outliers.
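The three averages and the weighted mean described above can be sketched in a few lines of Python. The scores below are made up for illustration (they are not from the chapter); the value 40 is included as an outlier to show how it pulls the mean but not the median.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical scores, invented for this sketch; 40 is a deliberate outlier.
scores = [3, 7, 7, 2, 9, 7, 4, 40]

avg = mean(scores)         # sum of all values divided by the number of values
mid = median(scores)       # even number of cases: average of the two middle values
modes = multimode(scores)  # list of the most frequent value(s)

# Weighted mean: multiply each value by its frequency, sum, divide by total frequency.
freq = Counter(scores)
weighted_mean = sum(value * count for value, count in freq.items()) / sum(freq.values())

print(avg, mid, modes, weighted_mean)
```

Because every score is weighted by how often it occurs, the weighted mean here equals the ordinary mean; the outlier lifts both well above the median.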
II. Variability – how much scores differ from one another.
Which set of scores has greater variability?
Set 1: 8, 9, 5, 2, 1, 3, 1, 9
Set 2: 3, 4, 3, 5, 4, 6, 2, 3
The means are 4.75 for Set 1 and 3.75 for Set 2, which tells us nothing about variability. More precisely, variability is how far scores are from the mean.
III. Computing the Range
Subtract the lowest score from the highest (r = h - l).
What is the range of these scores? 98, 86, 77, 56, 48
Answer: 50 (98 - 48 = 50)
IV. Computing the Standard Deviation
The standard deviation (s) is the average amount of variability in a set of scores (the average distance from the mean).
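As a quick check on the range rule (r = h - l), here is a minimal Python sketch applied to the two sets and the exam scores above. The helper name value_range is an illustrative choice, not something from the chapter.

```python
def value_range(scores):
    """Range: highest score minus lowest score (r = h - l)."""
    return max(scores) - min(scores)

set1 = [8, 9, 5, 2, 1, 3, 1, 9]
set2 = [3, 4, 3, 5, 4, 6, 2, 3]

range1 = value_range(set1)
range2 = value_range(set2)
exam_range = value_range([98, 86, 77, 56, 48])

print(range1, range2, exam_range)
```

Set 1 has the wider range, which matches the impression that its scores are more spread out even though the two means are only one point apart.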
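Formula: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$, where $X$ is each score, $\bar{X}$ is the mean, and $n$ is the number of scores.
Compute s for the following: 5, 8, 5, 4, 6, 7, 8, 8, 3, 6
The mean is 6 and the squared deviations sum to 28, so $s = \sqrt{28/9} \approx 1.76$. An s of 1.76 tells us that each score differs from the mean by an average of 1.76 points.
• Purpose: to compare scores between different distributions, even when the means and standard deviations are different (e.g., men and women).
• The larger the s, the greater the variability.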
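The computation of s for these ten scores can be traced step by step. This sketch assumes the sample formula that divides by n - 1, which reproduces the 1.76 quoted above; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]

n = len(scores)
mean_score = sum(scores) / n                             # step 1: the mean
squared_devs = [(x - mean_score) ** 2 for x in scores]   # step 2: squared distance of each score from the mean
s = sqrt(sum(squared_devs) / (n - 1))                    # step 3: divide by n - 1 (assumed), take the square root

print(round(s, 2))
# statistics.stdev uses the same n - 1 formula, so it serves as a cross-check.
print(round(stdev(scores), 2))
```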
V. Graphing and Tables
Why? They describe data visually and more clearly.
Frequency Distribution (Table 11-4)
• Class Interval Column – divides the scores into categories (0-4, 5-9, etc.). Intervals usually span 2, 5, 10, or 25 data points. Main thing: be consistent!
• Frequency Column – the number of scores within that range or category.
VI. Graphs
A. Histogram – shows the distribution of scores by class interval. Different distributions can be compared on the same histogram. Shows:
• Variability
• Skewness – if the mean is greater than the median, positive skewness; if the median is greater than the mean, negative skewness.
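The frequency distribution described above (a class interval column and a frequency column) can be tallied as follows. The scores and the interval width of 5 are hypothetical choices for this sketch, not the data behind Table 11-4.

```python
from collections import Counter

# Hypothetical scores, invented for illustration.
scores = [1, 3, 4, 5, 7, 8, 9, 12, 13, 14, 14, 17]
width = 5  # every class interval spans the same number of points (be consistent)

# Map each score to the lower bound of its class interval, then count.
counts = Counter((x // width) * width for x in scores)

# Build (class interval, frequency) rows in ascending order.
table = [(f"{low}-{low + width - 1}", counts[low]) for low in sorted(counts)]

for interval, frequency in table:
    # The row of asterisks is a crude text histogram of the same counts.
    print(f"{interval:>6}  {frequency:>2}  {'*' * frequency}")
```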
Skewness
• If the data set is symmetric, the mean equals the median.
• If the data set is skewed to the right, the mean is greater than the median.
• If the data set is skewed to the left, the mean is less than the median.
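The mean-versus-median rule above can be turned into a small check. This is only a sketch of the rule of thumb stated on these slides, not a formal skewness statistic; the function name and the three data sets are invented for illustration.

```python
from statistics import mean, median

def skew_direction(scores):
    """Apply the rule of thumb: compare the mean with the median."""
    m, med = mean(scores), median(scores)
    if m > med:
        return "positive (skewed right)"
    if m < med:
        return "negative (skewed left)"
    return "no skew indicated"

# Hypothetical data sets.
right_tail = [1, 2, 2, 3, 3, 3, 20]       # one large value drags the mean up
left_tail = [1, 18, 18, 19, 19, 19, 20]   # one small value drags the mean down
balanced = [1, 2, 3, 4, 5]

for data in (right_tail, left_tail, balanced):
    print(skew_direction(data))
```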
B. Column Charts – simply show the quantity of each category according to some scale. SCALE IS IMPORTANT (CSPAN drug-use story).
• Bar Charts – same as a column chart, but with the axes reversed.
• Line Chart – used to show trends (e.g., the rise and fall in presidential popularity; see the line chart on page 317).
• Pie Charts – great for proportions (e.g., the percent of the MS budget going to each budget category).
VII. The Normal Curve and Probability Theory
A. Tells us the likelihood of an outcome.
• Tells us the degree of confidence in a finding or outcome (i.e., how sure are we that the observed outcome is due to X rather than random chance, and how likely is it that our research hypothesis is true?).
VIII. Normal Curve or Bell-Shaped Curve Properties (Fig. 11-6)
A. Mean, median, and mode are the same; the curve is NOT skewed.
B. Perfectly symmetrical about the mean (i.e., the two halves fit perfectly together).
C. The tails of the normal curve are asymptotic: they come close to the horizontal axis but never touch it.
Are curves usually normal? Yes, at least approximately, especially with large sets of data (more than 30). Most scores are concentrated in the center and few at the ends (height, intelligence, coin flipping).
IX. Divisions of the Normal Curve (Fig. 11-9)
A. The mean is at the center.
B. Scores along the x-axis correspond to standard deviations.
C. Sections within the bell curve represent the % of cases expected to fall therein. This is geometrically true (these are percentages of the entire normal distribution).
D. For normal distributions (most data sets), practically all scores fall between +3 and -3 sd’s (99.74%). Look at the probabilities of falling in between: 34.13% x 2 = 68.26% of cases fall within -1 to +1 sd’s of the mean.
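The percentages under the curve can be reproduced without a printed table. This sketch builds the standard normal cumulative probability from Python's math.erf; the helper names normal_cdf and share_within are illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Proportion of a normal distribution at or below z standard deviations."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def share_within(k):
    """Proportion of cases between -k and +k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```

The computed shares are about 68.27% and 99.73% for 1 and 3 sd’s. The 68.26% and 99.74% quoted above come from adding table sections that were already rounded to two decimals, so the small difference is rounding, not a disagreement.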
X. Z-scores (standard scores; i.e., the # of standard deviations from the mean)
A. Z-scores allow us to compare distributions with one another because they are standardized in units of standard deviations (raw scores measured on different scales can’t be compared; doing so is nonsensical). Different variables or groups have different means, so their raw scores cannot be compared directly. But z-scores from different groups of data can be compared because they are equivalent (e.g., one unit above or below the mean, respectively).
B. Formula and interpretation
Formula: $z = \frac{X - \bar{X}}{s}$, where $X$ is the raw score, $\bar{X}$ is the mean, and $s$ is the standard deviation.
• Comparing z-scores from different distributions: the raw scores of 12.8 and 64.8 in our data are equal distances from their respective means (z = .4 for both).
• What z-scores represent
A. Z-scores correspond to sections under the curve (percentages under the curve).
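A z-score is the raw score's distance from the mean, measured in standard deviations. The sketch below shows how two very different raw scores can share z = .4. The means and standard deviations used here are hypothetical, chosen only so the arithmetic works out; the slides do not state the actual values for the 12.8 and 64.8 example.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from its mean."""
    return (raw - mean) / sd

# Hypothetical distribution parameters (assumed, not taken from the chapter).
z_a = z_score(12.8, mean=12.0, sd=2.0)
z_b = z_score(64.8, mean=60.0, sd=12.0)

# Round for display; floating-point subtraction leaves tiny residues.
print(round(z_a, 2), round(z_b, 2))
```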
B. These percentages can be read as probabilities of a certain score occurring, as given in the z-score table. Example of what we are saying: “In a distribution with a mean of 100 and standard deviation of 10, what is the probability that any score in the data set will be 110 or above?” The answer = _________.
C. What about a z-score of 1.38? What are the chances that a score will fall between the mean and a z-score of 1.38? _______
• What about above a z-score of 1.38? ____
• What about at or below 1.38? ______
• What about between a z-score of 1 and 2.5? Answer: ______
• Again, we are asking: what is the probability that a score will fall between 1 and 2.5 standard deviations (z’s) above the mean? What about between -1 and 2.5?
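The practice questions above are normally answered with a printed z-score table. As a way to check your answers, this sketch computes the same areas under the curve with math.erf; the helper and variable names are illustrative, and the results may differ from a four-decimal table in the last digit.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Proportion of a normal distribution at or below z standard deviations."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Mean 100, sd 10: a score of 110 has z = (110 - 100) / 10 = 1.
p_110_or_above = 1 - normal_cdf((110 - 100) / 10)

p_mean_to_138 = normal_cdf(1.38) - 0.5        # between the mean and z = 1.38
p_above_138 = 1 - normal_cdf(1.38)            # above z = 1.38
p_at_or_below_138 = normal_cdf(1.38)          # at or below z = 1.38

p_1_to_25 = normal_cdf(2.5) - normal_cdf(1.0)         # between z = 1 and z = 2.5
p_minus1_to_25 = normal_cdf(2.5) - normal_cdf(-1.0)   # between z = -1 and z = 2.5

for label, p in [
    ("110 or above", p_110_or_above),
    ("mean to 1.38", p_mean_to_138),
    ("above 1.38", p_above_138),
    ("at or below 1.38", p_at_or_below_138),
    ("between 1 and 2.5", p_1_to_25),
    ("between -1 and 2.5", p_minus1_to_25),
]:
    print(f"{label}: {100 * p:.2f}%")
```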