Learn about mean, median, and mode as measures of central tendency in statistics. Explore their significance and applications in data analysis. Understand the difference between samples and populations, and how to calculate each measure.
Central Tendency Statistics 2126
Introduction • As useful as histograms and the like are, it would be nice to describe data in terms of central tendency • A single number to describe a sample • BTW, a sample is a subset of the population • We are almost always dealing with samples
Back when I was in first year… • 77 80 83 70 90 • Would be nice to describe how I did in first year with a single number • Well, the one we are all pretty used to is the mean, or arithmetic average • The sum of all of the data points, divided by the number of data points
The Mean • Sort of a balancing point in the data • Simply add up the numbers and divide by the number of observations (n) • X bar is the symbol for the sample mean • We might instead want to consider my first year marks as a population
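As a quick check of the arithmetic, here is a minimal Python sketch that applies the "add up and divide by n" rule to the five first year marks listed above. The variable names are just illustrative choices.

```python
# First year marks from the slide above
marks = [77, 80, 83, 70, 90]

# Mean = sum of the data points / number of data points (n)
n = len(marks)
x_bar = sum(marks) / n

print(x_bar)  # 80.0
```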
For a population • The formula does not change, but the symbol does: μ (mu) instead of x bar • We use statistics for samples • We use parameters for populations • The formula is the same really
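The formulas themselves did not survive in this transcript. In standard textbook notation (an assumption here, not recovered from the slides), with n the sample size and N the population size, the two means would be written as:

```latex
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
\qquad\qquad
\mu = \frac{\sum_{i=1}^{N} x_i}{N}
```

Same recipe in both cases: add everything up and divide by how many there are. Only the symbols differ.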
The mean is not mean • In the population, the mean does not change • The sample, yeah it changes, sample to sample • Parameters do NOT CHANGE
However, the lecture is getting meaner • If you sample from a population you will get different values for x bar each time • We don’t care about samples in the long run, we care about populations • Calculating μ is pretty hard, umm it takes forever • It is done sometimes: elections, the census
Samples vs. populations • A good sample will give you a killer estimate of the population • The census could be done via sampling actually • This is because x bar is an unbiased estimator of μ • On average it is right on target: the overestimates and underestimates balance out
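The idea that x bar bounces around from sample to sample but lands on μ on average can be sketched with a small simulation. Everything here is made up for illustration: the population is 10,000 randomly generated scores, and the sample size (25), number of samples (2,000), and seed are arbitrary choices, not values from the lecture.

```python
import random
import statistics

random.seed(2126)  # arbitrary seed so the run is repeatable

# Hypothetical population: 10,000 made-up scores (not data from the lecture)
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter: fixed for this population

# Draw many samples of size 25 and record x bar for each one
sample_means = [
    statistics.mean(random.sample(population, 25))
    for _ in range(2_000)
]
avg_x_bar = statistics.mean(sample_means)

print(f"mu = {mu:.2f}")
print(f"first five x bars = {[round(m, 1) for m in sample_means[:5]]}")
print(f"average of all x bars = {avg_x_bar:.2f}")
```

Each individual x bar misses μ by a bit, some high and some low, while the average of all of them should sit very close to μ.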
Weighted averages sometimes • Some assignments are worth more than others, for example • There are other measures of central tendency though
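A minimal sketch of a weighted average, assuming a course with three pieces of work. The marks and weights below are hypothetical, invented only to show the calculation.

```python
# Hypothetical marks and weights (made up for illustration)
marks = [70, 85, 90]    # assignment, midterm, final exam
weights = [20, 30, 50]  # percent of the final grade each one is worth

# Weighted mean = sum of (weight * value) / sum of the weights
weighted_mean = sum(w * x for w, x in zip(weights, marks)) / sum(weights)

# Plain mean for comparison: every mark counts equally
plain_mean = sum(marks) / len(marks)

print(weighted_mean)         # 84.5
print(round(plain_mean, 2))  # 81.67
```

The weighted mean is higher here because the best mark happens to carry the most weight.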
The median • No need for a formula here • 50th percentile • Midpoint • Half below, half above
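No formula, but there is a recipe: sort the data and take the middle value, or the average of the two middle values when n is even. A small sketch, assuming a non-empty list of numbers (the function name is just an illustrative choice):

```python
def median(values):
    """Return the midpoint of the data: half below, half above."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: a single middle value
        return ordered[middle]
    # Even n: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([77, 80, 83, 70, 90]))  # 80
print(median([77, 80, 83, 70]))      # 78.5
```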
The mode • The most common observation • Virtually useless • Example 25 25 37 42 25 • The mode is 25 • Tough eh…
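Finding the mode is just counting. The sketch below uses `collections.Counter` from the Python standard library; the helper name `modes` is an illustrative choice, and it returns a list because several values can tie for most common.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return [v for v, c in counts.items() if c == highest]

print(modes([25, 25, 37, 42, 25]))  # [25]
print(modes([1, 2, 3, 4, 5]))       # [1, 2, 3, 4, 5]
```

When every value appears once, every value is a mode, which is part of why the mode is not much help.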
If…. • If we have a unimodal, symmetrical distribution, the median = mean = mode • Say IQ in the population, all measures of central tendency = 100
Normal distribution • You don’t have to get a normal distribution when you have a unimodal, symmetrical distribution • It is probably the most common one though
Why? • Why do we need all of these measures of central tendency? • They all have different properties • The mode is useless… • So let’s move on
Median vs. the Mean • Say you have five numbers • 1 2 3 4 5 • The mean is 3, as is the median • (BTW, the mode is umm well there are 5 of them) • Add another value • 750
Mean vs median in a final all out battle to the death • Now the mean is 127.5 • So adding an extreme value really affects the mean • Median is now umm let’s see • 1 2 3 4 5 750 • 3.5 • cool
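The battle above can be replayed with Python's standard `statistics` module. This sketch uses only the numbers from the slides.

```python
import statistics

values = [1, 2, 3, 4, 5]
mean_before = statistics.mean(values)      # 3
median_before = statistics.median(values)  # 3

# Add the extreme value
with_outlier = values + [750]
mean_after = statistics.mean(with_outlier)      # 127.5
median_after = statistics.median(with_outlier)  # 3.5

print(mean_before, median_before)
print(mean_after, median_after)
```

One extreme value drags the mean from 3 up to 127.5, while the median only moves from 3 to 3.5.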
Median for the win • So sometimes the median is the better choice • Think about say union negotiations • Both sides can talk about "average" salary, one quoting the mean and the other the median • Both are right! • In this case the median is more useful
So the median is useful • Especially when there are outliers that you still want to leave in • When you want to take all of the scores into account though you have to use the mean really • All of our techniques are about means • The median is, pretty much, a dead end statistically
Running out of pithy titles • The mean is most useful for symmetrical distributions • Most distributions we deal with will be like this • Most are pretty much symmetrical, more or less