Explore the relationship between density estimates and histograms in statistics and probability, emphasizing resolution levels for data visualization. Understand how different bandwidths affect data representation and interpretation. Learn to create histograms and density plots manually for better comprehension.
 
                
Raoul LePage, Professor, STATISTICS AND PROBABILITY. www.stt.msu.edu/~lepage (click on STT315_F06). Week 11-13-06
WEEK PLAN: Probability Histograms (Sec. 1-5); Data Smoothing (not in your text); Bonus Computer Project
Smoothing data, so you can see it. Plot the average height of normal densities placed at each data value, e.g. {10, 14}. It is like smearing each sample value, as if it were a drop of paint, according to the thickness of a normal density. Each normal integrates to one, as does their average, the “Sample Density Estimate” shown in dark. [Figure: normal densities at data {10, 14} and their average, the density.]
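The averaging described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's own software: the function names are made up here, and the bandwidth of 1.0 is an assumption, since the slide does not say how wide its normal curves are.

```python
import math

def normal_pdf(x, mean, sd):
    """Height at x of the normal density with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def sample_density(x, data, bandwidth):
    """Average height at x of the normal densities centered at each data value."""
    return sum(normal_pdf(x, d, bandwidth) for d in data) / len(data)

data = [10, 14]
bandwidth = 1.0  # ASSUMED sd of each normal; not stated on the slide

# Riemann-sum check that the averaged density still integrates to about one.
step = 0.01
grid = [i * step for i in range(2401)]  # 0 to 24, ten sd beyond each data value
area = sum(sample_density(x, data, bandwidth) for x in grid) * step
print(round(area, 4))
```

The same two functions work for any small data set, so they can be used to check a density drawn by hand.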
The mean of a sample density estimate is equal to the sample mean of its data.
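A short derivation shows why this holds. The notation is introduced here and is not on the slide: h is the sd (bandwidth) of each normal, φ is the standard normal density, and x_1, ..., x_n are the data. Each normal centered at x_i has mean x_i, so averaging the normals averages their means.

```latex
\hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h}\,\phi\!\left(\frac{x - x_i}{h}\right),
\qquad
\int_{-\infty}^{\infty} x\,\hat f_h(x)\,dx
  = \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} x\,\frac{1}{h}\,\phi\!\left(\frac{x - x_i}{h}\right)dx
  = \frac{1}{n}\sum_{i=1}^{n} x_i
  = \bar{x}.
```

The result does not depend on h, so it holds at every resolution.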
Making the densities narrower isolates different parts of the data and reveals more detail. NARROWER TENTS = MORE DETAIL
Closer view of the density by itself, with narrow normal curves. [Figure: density.]
Histograms lump data into categories (the black boxes), which is not as good for continuous data. DENSITY OR HISTOGRAM? [Figure: density and histogram.]
Form of each rectangle comprising a Probability Histogram. Example: A sample of n = 40 finds three data values which are at least 30 but less than 35 (interval [30, 35)). Bin width w = 35 - 30 = 5. Area = w × height = 3/40, so height = 3/(40 × 5) = 0.015. Histograms may radically change their shape in response to minor changes of bin locations or widths. [Figure: three data points (*) in the bin from 30 to 35.]
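The rectangle arithmetic, and the sensitivity to bin location, can be sketched as follows. The helper names are hypothetical, and the five data values and the two bin origins in the second half are chosen here only for illustration.

```python
import math

def histogram_height(count, n, width):
    """Bar height in a probability histogram: area = count / n, height = area / width."""
    area = count / n
    return area / width

# The slide's example: 3 of n = 40 values fall in [30, 35).
height = histogram_height(count=3, n=40, width=35 - 30)
print(round(height, 4))

def bin_counts(data, origin, width):
    """Count data values per bin [left, left + width), keyed by each bin's left edge."""
    counts = {}
    for x in data:
        left = origin + math.floor((x - origin) / width) * width
        counts[left] = counts.get(left, 0) + 1
    return counts

# ILLUSTRATIVE data and bin origins (assumptions, not from this slide).
values = [12, 21, 42, 8, 9]
bins_from_5 = bin_counts(values, origin=5, width=5)
bins_from_8 = bin_counts(values, origin=8, width=5)
print(bins_from_5)
print(bins_from_8)
```

With the same bin width, moving the origin from 5 to 8 moves the value 12 into the same bin as 8 and 9, so the tallest bar changes from two values to three.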
Plot of average heights of 5 tents placed at data {12, 21, 42, 8, 9}. DENSITY FOR {12, 21, 42, 8, 9}. [Figure: the normal densities that smear the data, and the resulting density.]
Narrower tents operate at higher resolution, but they may bring out features that are illusory. IS DETAIL ILLUSORY? Which do we trust? [Figure: a kinkier density and a smoother density.]
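One rough way to see the effect of tent width is to count the peaks of the density at several bandwidths. The sketch below does this on a grid for the data {12, 21, 42, 8, 9}; the three bandwidths, the grid range, and the function names are assumptions made here for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Height at x of the normal density with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def sample_density(x, data, bandwidth):
    """Average height at x of the normal densities centered at each data value."""
    return sum(normal_pdf(x, d, bandwidth) for d in data) / len(data)

def count_modes(data, bandwidth, lo=0.0, hi=50.0, step=0.01):
    """Count local maxima of the density on an evenly spaced grid from lo to hi."""
    n_points = int(round((hi - lo) / step)) + 1
    heights = [sample_density(lo + i * step, data, bandwidth) for i in range(n_points)]
    modes = 0
    for i in range(1, n_points - 1):
        if heights[i] > heights[i - 1] and heights[i] >= heights[i + 1]:
            modes += 1
    return modes

data = [12, 21, 42, 8, 9]
fine_modes = count_modes(data, 0.3)     # ASSUMED narrow bandwidth
coarse_modes = count_modes(data, 40.0)  # ASSUMED very wide bandwidth
for bandwidth in (0.3, 2.0, 40.0):
    print(bandwidth, count_modes(data, bandwidth))
```

At the narrow bandwidth every data value gets its own peak, and at the very wide one the peaks merge into a single hump. Whether the extra peaks reflect the population or only this sample is the question the slide raises.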
Population of N = 500 compared with two samples of n = 30 each. BEWARE OVER-FINE RESOLUTION. POP mean = 32.02, SAM1 mean = 33.03, SAM2 mean = 30.60. The sample means are close, but the densities are not good at fine resolution.
The same two samples of n = 30 each from the population of 500. WE DO BETTER AT COARSE RESOLUTION. How about coarse resolution? SAM1 mean = 33.03, SAM2 mean = 30.60, POP mean = 32.02. The densities are good at coarse resolution.
The same two samples of n = 30 each from the population of 500. HOW ABOUT MEDIUM RESOLUTION? SAM1 mean = 33.03, SAM2 mean = 30.60, POP mean = 32.02. The densities are not good at medium resolution.
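The comparison on these slides can be imitated with a small simulation. Everything about the population below is hypothetical: the slides do not give their data, so a stand-in population of 500 values is drawn from a normal distribution with mean 32 and sd 10, and the seed, bandwidths, and grid are arbitrary choices. The sketch measures how far a sample density sits from the population density by the integrated squared difference, once at a fine and once at a coarse bandwidth.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Height at x of the normal density with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def density(x, data, bandwidth):
    """Average height at x of the normal densities centered at each data value."""
    return sum(normal_pdf(x, d, bandwidth) for d in data) / len(data)

def squared_gap(sample, population, bandwidth, grid, step):
    """Riemann-sum approximation of the integrated squared difference of two densities."""
    return sum(
        (density(x, sample, bandwidth) - density(x, population, bandwidth)) ** 2
        for x in grid
    ) * step

random.seed(0)  # arbitrary seed, only so that reruns agree
# HYPOTHETICAL stand-in population; the slides' actual data are not available.
population = [random.gauss(32.0, 10.0) for _ in range(500)]
sample = random.sample(population, 30)

step = 0.2
grid = [-40.0 + i * step for i in range(751)]  # -40 to 110

fine_gap = squared_gap(sample, population, 1.0, grid, step)    # ASSUMED fine bandwidth
coarse_gap = squared_gap(sample, population, 8.0, grid, step)  # ASSUMED coarse bandwidth
print(fine_gap, coarse_gap)
```

The coarse gap comes out smaller than the fine gap, which matches the slides' point that a sample of 30 tracks the population density better at coarse resolution.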
A sample of only n = 600 from a population of N = 500 million (medium resolution). SAMPLING ONLY 600 FROM 500 MILLION? Is n = 600 a large sample? Sample mean = 32.84, POP mean = 32.02. The means are very close, and the densities are close at medium resolution. [Figure label: population of N = 500,000 with a sample of n = 600.]
A sample of only n = 600 from a population of N = 500 million (fine resolution). SAMPLING ONLY 600 FROM 500 MILLION? Sample mean = 32.84, POP mean = 32.02. The densities are very close at fine resolution. [Figure label: population of N = 500,000 with a sample of n = 600.]
TALKING POINTS
1. A density is controlled by the sd, referred to as bandwidth, of the normal densities used to make it.
1a. You have to be content with the information revealed by the population density at your chosen bandwidth.
1b. Small samples zero in on coarse densities, i.e. those made at large bandwidth, fairly well.
1c. Samples in the hundreds may perform remarkably well, even at fine resolution, i.e. small bandwidth.
2. Histograms are notorious for being unstable for some data. Yet they remain popular. Learn to make them by hand.
3. Learn to make a density for 2 to 4 data values by hand.