## Statistics 100 Lecture Set 6


**Re-cap**
• Last day, we looked at a variety of plots
• For categorical variables, the most useful plots were bar charts and pie charts
• We looked at time plots for quantitative variables
• The key is to be able to make a point quickly using graphical techniques

**Re-cap**
• Recall: the distribution of a variable tells us what values it takes and how often it takes these values.

**Histograms**
• Like a bar chart, a histogram displays the main features of an empirical distribution (a data set)
• Histogram
  • Essentially a bar chart of the values in the data
  • Values are usually grouped to reduce the “jitteriness” of the picture
  • The groups are sometimes called “bins”

**Histogram**
• Uses rectangles to show the number (or percentage) of values in each interval
• Y-axis usually displays counts or percentages
• X-axis usually shows the intervals
• The rectangles all have the same width

**Example (discrete data)**
• In a study of productivity, a large number of authors were classified according to the number of articles they published during a particular period of time.

**Example (continuous data)**
• An experiment was conducted to investigate the muzzle velocity of an anti-personnel weapon (King, 1992)
• A sample of size 16 was taken and the muzzle velocity (MPH) recorded

**Constructing a Histogram – continuous data**
• Find the minimum and maximum values of the data
• Divide the range of the data into non-overlapping intervals of equal length
• Count the number of observations in each interval
• The height of each rectangle is the number (or percentage) of observations falling in that interval
• How many intervals (categories)?

**Example**
• An experiment was conducted to investigate the muzzle velocity of an anti-personnel weapon (King, 1992)
• A sample of size 16 was taken and the muzzle velocity (MPH) recorded

**What are the minimum and maximum values?**
• How do we divide up the range of the data?
• What happens if we have too many intervals?
• Too few intervals?
• Suppose we have intervals from 240-250 and 250-260. In which interval is the data point 250 included?

**Interpreting histograms**
• Gives an idea of:
  • The location of the centre of the distribution
  • How spread out the data are
  • The shape of the distribution: symmetric, skewed left, or skewed right; unimodal, bimodal, or multimodal
  • Outliers: striking deviations from the overall pattern

**Example – mid-term 1 grades (2011)**
• Marked out of 34, plus a bonus question (n = 344)

**Example – mid-term 1 grades (2011)**
• Too many bins?

**Example – mid-term 1 grades (2011)**
• Too few bins?

**Example – mid-term 1 grades (2011)**
• Potential outlier?

**Numerical Summaries (Chapter 12)**
• Graphical procedures describe data visually
• Numerical summaries can quickly capture the main features

**Measures of Center**
• We have a sample of size n from some population: $x_1, x_2, \ldots, x_n$
• An important feature of a sample is its central value
• The most common measures of center are the mean and the median

**Sample Mean**
• The sample mean is the average of a set of measurements
• The sample mean:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

**Sample Median**
• We have a set of n measurements, $x_1, x_2, \ldots, x_n$
• The median (M) is the point that divides the sorted data in half
• It can be viewed as the mid-point of the data
• To compute the median:
  • For sample size n, compute position = (n+1)/2
  • If the position is a whole number, M is the value at that position in the sorted data
  • If the position falls between two whole numbers, M is the value halfway between the values at those two positions in the sorted data

**Example**
• Finding the median, M, when n is odd
• Example: Data = 7, 19, 4, 23, 18

**Muzzle Velocity Example**
• Data (n = 16)

**Muzzle Velocity Example**
• Mean:

**Muzzle Velocity Example**
• Median:

**Sample Mean vs. Sample Median**
• Sometimes the sample median is a better measure of center
• The sample median is less sensitive to unusually large or small values, called outliers
• For symmetric distributions, the sample mean and median are roughly equal
• For skewed distributions, they differ: the mean is pulled farther toward the long tail than the median

**Other Measures of Interest**
• Maximum
• Minimum

**Percentiles**
• A percentile of a distribution is a value that cuts off the stated part of the distribution at or below that value, with the rest at or above that value.
• 5th percentile: 5% of the distribution is at or below this value and 95% is at or above it
• 25th percentile: 25% at or below, 75% at or above
• 50th percentile: 50% at or below, 50% at or above
• 75th percentile: 75% at or below, 25% at or above
• 90th percentile: 90% at or below, 10% at or above
• 99th percentile: ___% at or below, ___% at or above

**Percentiles**
• Can be applied to a population or to a sample
• We usually don’t know the population
• Use sample percentiles to estimate population percentiles
• Standardized test results are often reported as percentiles
• Birth statistics are often reported as percentiles
  • First daughter: 10th percentile weight, 25th percentile length, 95th percentile head circumference

**Important Percentiles**
• First quartile (Q1): the 25th percentile
• Second quartile (Q2): the 50th percentile, i.e. the median
• Third quartile (Q3): the 75th percentile

**Computing the quartiles**
• You know how to compute the median
• Q1 =
• Q3 =

**Example**
• Finding the other quartiles
  • For Q1, find the median of all values below M.
  • For Q3, find the median of all values above M.
• Example: 4, 7, 18, 19, 23, M = 18
  • Q1:
  • Q3:
• Example: 4, 7, 12, 18, 19, 23, M = 15
  • Q1:
  • Q3:

**The 5-number summary is often reported:**
• Min, Q1, Q2 (Median), Q3, and Max
• Summarizes both center and spread
• What proportion of the data lie between Q1 and Q3?

**Box-Plot**
• Displays the 5-number summary graphically
• A box is drawn spanning the quartiles
• A line is drawn in the box at the median
• Lines extend from the box to the max. and min. values
• Some programs draw the whiskers only out to 1.5 × IQR above Q3 and below Q1, where IQR = Q3 − Q1

**Can compare distributions using side-by-side box-plots**
• What can you see from the plot?

**Example - Moisture Uptake**
• There is a need to understand the degradation of 3013 containers during long-term storage
• Moisture uptake is considered a key factor in degradation due to corrosion
• Calcination removes moisture
• Calcination temperature requirements were written with very pure materials in mind, but the situation has evolved to include less pure materials, e.g., materials high in salts (chloride salts are of particular concern)
• The calcination temperature may need to be reduced to accommodate salts
• An experiment is to be conducted to see how the calcination temperature affects the mean moisture uptake

**Working Example - Moisture Uptake**
• Experimental procedure:
  • Two calcination temperatures; we wish to compare the mean uptake for each temperature
  • 10 measurements per temperature treatment
  • The temperature treatments are randomly assigned to canisters
  • Response: rate of change in moisture uptake over a 48-hour period (the maximum time to complete packaging)

**Another Common Measure of Spread: Sample Variance**
• Sample variance of n observations:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
• Units are the squared units of the data

**Sample Standard Deviation**
• Sample standard deviation of n observations:
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
• Has the same units as the data

**Exercise**
• Compute the sample standard deviation and variance for the Muzzle Velocity Example

**Comments**
• Variance and standard deviation are most useful when the measure of center is the mean
• As observations become more spread out, does s increase or decrease?
• Both measures are sensitive to outliers
• The 5-number summary is better than the mean and standard deviation for describing (i) skewed distributions and (ii) distributions with outliers

**Comments**
• The standard deviation is zero when all observations are equal
• It measures spread relative to the mean

**More interpretation of s**
• Empirical Rule
• If a distribution is bell-shaped and roughly symmetric, then:
  • About 2/3 (roughly 68%) of the data will lie within ±1s of $\bar{x}$
  • About 95% of the data will lie within ±2s of $\bar{x}$
  • Nearly all of the data will lie within ±3s of $\bar{x}$
• So you can reconstruct a rough picture of the histogram from just two numbers, $\bar{x}$ and s

**Example**
• Mid-term:

**Example**
• Mid-term:
• Empirical rule tells us
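The four steps on the “Constructing a Histogram – continuous data” slide can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `histogram_counts` and the sample values are hypothetical (they are not the muzzle-velocity data), and each interval is taken to be closed on the left and open on the right, so a value on a boundary such as 250 is counted in 250-260. That is one common convention, not the only one; check what your software does.

```python
import math


def histogram_counts(data, start, width):
    """Return (left edge, right edge, count) for equal-width intervals.

    Intervals are closed on the left and open on the right, [a, a + width),
    except that the last interval also includes its right edge so the
    maximum value is always counted.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if start > min(data):
        raise ValueError("start must not exceed the minimum value")
    # Enough intervals to cover the range from `start` up to the maximum.
    n_bins = max(1, math.ceil((max(data) - start) / width))
    counts = [0] * n_bins
    for x in data:
        i = math.floor((x - start) / width)
        counts[min(i, n_bins - 1)] += 1  # a value on the last right edge goes in the last bin
    return [(start + k * width, start + (k + 1) * width, c) for k, c in enumerate(counts)]


# Hypothetical values chosen to show the boundary convention (not the muzzle-velocity data).
sample = [243, 247, 250, 252, 259, 260]
for left, right, count in histogram_counts(sample, start=240, width=10):
    # Rectangle height as a count and as a percentage of all observations.
    print(f"{left}-{right}: {count} ({100 * count / len(sample):.1f}%)")
```

Changing `width` is a quick way to reproduce the “too many bins” and “too few bins” effects from the mid-term example.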
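The sample mean and the (n+1)/2 position rule for the median translate directly into code. A minimal Python sketch; the function names `sample_mean` and `sample_median` are illustrative, and the data are the small examples from the slides, plus one hypothetical outlier to show why the median is less sensitive to unusually large values.

```python
def sample_mean(data):
    """x-bar: the sum of the n observations divided by n."""
    if not data:
        raise ValueError("need at least one observation")
    return sum(data) / len(data)


def sample_median(data):
    """Median M via the position rule: position = (n + 1) / 2 in the sorted data."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    position = (n + 1) / 2        # 1-based position in the sorted data
    whole = int(position)
    if position == whole:         # whole number: M is the value at that position
        return xs[whole - 1]
    # Between two positions: M is halfway between the values at `whole` and `whole + 1`
    return (xs[whole - 1] + xs[whole]) / 2


print(sample_mean([7, 19, 4, 23, 18]))        # 14.2
print(sample_median([7, 19, 4, 23, 18]))      # 18
print(sample_median([4, 7, 12, 18, 19, 23]))  # 15.0

# Hypothetical outlier: replace 23 by 230. The mean moves a lot, the median does not.
print(sample_mean([7, 19, 4, 230, 18]))       # 55.6
print(sample_median([7, 19, 4, 230, 18]))     # 18
```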
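The quartile rule on the slides (Q1 is the median of the values below M, Q3 the median of the values above M), the 5-number summary, and the 1.5 × IQR whisker rule that some box-plot programs use can be sketched as follows. Assumptions: “below M” and “above M” are read as the halves of the sorted list on either side of the median's position (when n is odd, the middle value goes in neither half), and the helper names are hypothetical. Other software uses other quartile definitions, so its Q1 and Q3 can differ slightly from these.

```python
def median_sorted(xs):
    """Median of already-sorted values: the value at position (n + 1) / 2,
    or the average of the two values around it when n is even."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, M, Q3, max) using the quartile rule from the slides."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]            # values before the median's position
    upper = xs[half + n % 2:]    # values after it (middle value skipped when n is odd)
    return xs[0], median_sorted(lower), median_sorted(xs), median_sorted(upper), xs[-1]


def beyond_whiskers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3 (IQR = Q3 - Q1)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


print(five_number_summary([4, 7, 18, 19, 23]))      # (4, 5.5, 18, 21.0, 23)
print(five_number_summary([4, 7, 12, 18, 19, 23]))  # (4, 7, 15.0, 19, 23)
print(beyond_whiskers([4, 7, 18, 19, 23, 60]))      # [60]  (60 is a hypothetical outlier)
```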
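The sample variance, the standard deviation, and the Empirical Rule can be checked numerically. A minimal sketch: `sample_variance` and `sample_sd` (illustrative names) divide the sum of squared deviations by n − 1, the usual sample convention, which Python's standard `statistics.variance` and `statistics.stdev` also use, so the two results are printed side by side. The bell-shaped data are simulated, hypothetical scores (mean 70, SD 10), not the mid-term results.

```python
import math
import random
import statistics


def sample_variance(data):
    """s^2: sum of squared deviations from x-bar, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_sd(data):
    """s: square root of the sample variance, in the same units as the data."""
    return math.sqrt(sample_variance(data))


def empirical_rule_fractions(data):
    """Fraction of observations within 1, 2 and 3 standard deviations of x-bar."""
    xbar = sum(data) / len(data)
    s = sample_sd(data)
    return {k: sum(abs(x - xbar) <= k * s for x in data) / len(data) for k in (1, 2, 3)}


small = [7, 19, 4, 23, 18]
print(sample_variance(small), statistics.variance(small))  # both about 67.7
print(sample_sd(small), statistics.stdev(small))           # both about 8.23

# Simulated, hypothetical bell-shaped scores (mean 70, SD 10).
random.seed(1)
scores = [random.gauss(70, 10) for _ in range(10_000)]
print(empirical_rule_fractions(scores))  # roughly 0.68, 0.95 and 0.997
```

For a small or skewed data set the fractions can be far from 68/95/99.7, which is why the rule is stated only for bell-shaped, roughly symmetric distributions.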