Communicating Quantitative Information Diagrams Sampling issue Risk & Communicating Risk Smoking. Hormone Replacement Therapy Dimension Homework: Prepare/Design diagram/chart. Postings
Special Survey • Will come back to topic of Sampling • Accuracy (confidence, error) of tests, surveys depends on • quality of sample • size of sample • My sister asked me: can [young people] identify Einstein? • Use my students / sample of my students • Spring 2008: • students responding to request to take survey in both my classes • Last Spring: quantity was pretty small (14 + 11)
Preview • Margin of error: • Claim that the actual result (for the whole population) is within certain limits (answer plus/minus MoE) • Confidence • confidence that this particular sample is not so unusual as to make the results wrong, where wrong means the actual result is outside the margin of error. • Generally, the means (averages) of samples are distributed normally around the mean of the whole population, and the SD of this distribution is smaller (tighter) than the SD of the distribution of this quantity for the whole population.
Thought experiment… • Want to get average height of people in the class. • Claim: impossible to measure everyone, so use a sample. • Only time to measure 6 people: 62, 72, 68, 66, 69, 71, for a sample mean of 68 • Statistics say that, IF the sample is random, we can be 95% confident that the mean of the whole class is between 61.4 and 74.6 • will spend some more time on this
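As a rough check on the height example, the Python sketch below computes the sample mean and two candidate ranges. The critical value 2.571 (Student's t, 5 degrees of freedom, 95% two-sided) is hardcoded from a standard t table, and all variable names are illustrative assumptions. The mean plus/minus two standard deviations of the six values reproduces the 61.4 to 74.6 range quoted above; a t-based interval for the class mean comes out narrower.

```python
import math
import statistics

heights = [62, 72, 68, 66, 69, 71]  # the six measurements from the slide
n = len(heights)
mean = statistics.mean(heights)

# Range that matches the slide: mean +/- 2 * SD of the six values
# (SD computed with the divide-by-n formula).
sd_of_values = statistics.pstdev(heights)
slide_low = mean - 2 * sd_of_values
slide_high = mean + 2 * sd_of_values

# t-based 95% interval for the mean of the whole class.
T_CRIT_95_DF5 = 2.571  # assumption: value read from a standard t table
std_error = statistics.stdev(heights) / math.sqrt(n)
t_low = mean - T_CRIT_95_DF5 * std_error
t_high = mean + T_CRIT_95_DF5 * std_error

print(f"sample mean: {mean}")
print(f"mean +/- 2 SD: {slide_low:.1f} to {slide_high:.1f}")
print(f"t-based interval for the mean: {t_low:.1f} to {t_high:.1f}")
```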
Thought experiment • Want to know the favorable rating of the President for the whole USA population. • Ask a sample • p (proportion) of the sample views the President favorably. • Want to make a statement with a stated level of confidence (95% or 99%) • Formula for the margin of error, call it E: we can be 95% sure that the proportion of the whole population that is favorable is between p-E and p+E • If we want to be 99% sure, then the formula will give a bigger margin of error, call this F: we can be 99% sure that the proportion of the whole population that is favorable is between p-F and p+F.
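The slide does not say which formula is meant. A minimal sketch, assuming the usual normal-approximation margin of error for a proportion, z * sqrt(p(1-p)/n), shows that the 99% margin F is larger than the 95% margin E. The poll values p = 0.45 and n = 1000 are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Normal-approximation margin of error for a sample proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

p, n = 0.45, 1000  # hypothetical poll: 45% favorable among 1000 people
E = margin_of_error(p, n, 0.95)
F = margin_of_error(p, n, 0.99)

print(f"95% sure: between {p - E:.3f} and {p + E:.3f}")
print(f"99% sure: between {p - F:.3f} and {p + F:.3f}")
```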
Another way • Random sample of size N means that each person in the whole population is equally likely to be in the sample • There are many samples of size N • The results of samples of size N vary • Some samples of size N are very different from the whole population, but most aren't • In most cases, the sample result will be close to the result for the whole population • What do I mean by close? Within the margin of error • What do I mean by most? This refers to the confidence level (19/20, 99/100)
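The "most samples are close" idea can be tried out with a small simulation. This is only a sketch: the true proportion 0.6, the sample size 400, the number of trials, and the 1.96 multiplier for a 95% margin are all assumptions chosen for illustration.

```python
import math
import random

def fraction_within_margin(true_p, n, trials, seed=0):
    """Draw many random samples; return the share whose result is 'close'."""
    rng = random.Random(seed)
    margin = 1.96 * math.sqrt(true_p * (1 - true_p) / n)  # 95% margin of error
    hits = 0
    for _ in range(trials):
        # Each sampled person is favorable with probability true_p.
        favorable = sum(rng.random() < true_p for _ in range(n))
        if abs(favorable / n - true_p) <= margin:
            hits += 1
    return hits / trials

coverage = fraction_within_margin(true_p=0.6, n=400, trials=2000)
print(f"{coverage:.1%} of the samples landed within the margin of error")
```

The printed share should be near 19 out of 20, which is what the 95% confidence level describes.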
Formally… • The averages of samples of size N are normally distributed • Their average (mean) is the mean of the whole population • Their standard deviation is smaller than that of the whole population by a factor of the square root of N • Think of a narrow mountain • To halve (reduce to ½ what it was) the required margin of error, you need to quadruple (*4) the sample size
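A short sketch of the square-root rule, assuming a population standard deviation of 4 and a 95% multiplier of 1.96 (both hypothetical values): the standard deviation of the sample means is sd / sqrt(N), so cutting the margin of error in half takes about four times as many people.

```python
import math

def standard_error(sd, n):
    """Standard deviation of the means of samples of size n."""
    return sd / math.sqrt(n)

def required_sample_size(sd, margin, z=1.96):
    """Smallest n for which z * sd / sqrt(n) is no larger than margin."""
    return math.ceil((z * sd / margin) ** 2)

sd = 4.0  # hypothetical population standard deviation
n_for_1 = required_sample_size(sd, margin=1.0)
n_for_half = required_sample_size(sd, margin=0.5)

print(f"margin 1.0 needs n = {n_for_1}")
print(f"margin 0.5 needs n = {n_for_half}")
print(f"ratio: {n_for_half / n_for_1:.2f}")
```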
Note • The size of the whole population does not enter into these calculations!
Quality of sample • Does not mean: how good you are… in any way. • Does mean: how representative of the general population • For the Einstein question, this means how representative of 'young people today'. Amend that to college students. • The former class was practically all seniors. That class and all since tend to be journalism, history, political science majors… • Students who don't have specific required math & science courses • These factors mean the sample is not representative of the college population!
Quality of sample • Opportunity sample • subjects available to me in my classes. Are they/you typical of 'young people'? (My sister thought yes.) • Response bias • students who took up the offer. Are they/you more likely to 'know Einstein' than those who didn't? • higher level of general curiosity • diligence at obtaining extra credit
Tester reliability • I was generous in categorizing answers as correct. • Two questions considered separately • 23 out of 26 • 24 out of 26
Reporting • Confidence at level alpha that the actual proportion is within the margin of error of the tested proportion • More confident with a larger interval • I am 95% confident (chances are only 1/20 that this is wrong) that the actual proportion that knows Einstein is at least 83%. • I am 99.5% confident (chances are 1/200 that this is wrong) that the actual proportion is at least 78%
Formulas for margin of error • Based on the finding that means of samples are [close to] normally distributed, with a standard deviation that is a function of the tested proportion and the size of the sample. • One-tailed test (just checking one side because the tested proportion is close to 1)
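The slides do not spell the formula out. A minimal sketch, assuming the normal-approximation formula with a one-tailed critical value and the 24-out-of-26 result, gives lower bounds of roughly 83.7% and 78.8%, in line with the "at least 83%" and "at least 78%" statements in the Reporting slide. That the slides used exactly this formula and this count is an assumption.

```python
import math
from statistics import NormalDist

def one_sided_lower_bound(successes, n, confidence):
    """Lower limit for the true proportion, checking only the low side."""
    p = successes / n
    z = NormalDist().inv_cdf(confidence)  # one-tailed critical value
    return p - z * math.sqrt(p * (1 - p) / n)

lower_95 = one_sided_lower_bound(24, 26, 0.95)
lower_995 = one_sided_lower_bound(24, 26, 0.995)

print(f"95% confident the proportion is at least {lower_95:.1%}")
print(f"99.5% confident the proportion is at least {lower_995:.1%}")
```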
Correlation (again) • Two variables • common examples • height & weight • mortality & set of health risk factors (e.g., smoking history) • Are the two correlated? Does the value of one predict [some of] the value of the other?
Linear model • Linear = line. • X and Y (standard names for two variables: values that vary!) • Y = a + b*X • Cases graphed: a = 0, b > 0 and a > 0, b > 0 • Note: negative values of X and/or Y may or may not be valid…
Linear Model • a>0, b<0 • (This will be basis of negative correlation. Still a relationship, but in the negative. As X gets big, Y gets small.)
Cab fare • (Numbers are not right, but the idea is) • $3 to get in • $2 every ¼ of a mile • Y is the fare/total cost (not including tip!) and X is distance, given in miles rounded up to the nearest quarter mile. • Fare = 3 + 2*(miles * 4) • Example: rode ½ a mile. Fare is 3 + 2*2 = 7
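The fare rule can be written as a small Python function. This is a sketch using the slide's illustrative numbers; the function and constant names are made up for the example.

```python
import math

BASE_FARE = 3.0              # $3 to get in
RATE_PER_QUARTER_MILE = 2.0  # $2 for every quarter mile

def cab_fare(miles):
    """Fare (not including tip) for a ride of the given distance in miles."""
    # Distance is rounded up to the nearest quarter mile before charging.
    quarter_miles = math.ceil(miles * 4)
    return BASE_FARE + RATE_PER_QUARTER_MILE * quarter_miles

print(cab_fare(0.5))  # half a mile is 2 quarter miles: 3 + 2*2 = 7.0
```

In terms of Y = a + b*X with X in miles, a is 3 and b is 8, because each mile contains four quarter miles.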
(rough) graph of cab fare Points (0,3), (1,11)
Aside • Units: miles versus quarter miles, miles versus feet versus kilometers versus … need to be understood. Some stories/calculations/experiments succeed or fail based on getting the units right! • space flight that failed due to misunderstanding/lack of agreement on units.
Correlation • Two variables, X and Y. • Make a graph (computer program does not make a graph—you think about a graph) • Process: determine line that would be the best fit • defined as minimizing sum of the squares of the distances from the line ('least squares')
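A minimal sketch of the least-squares idea, using the standard closed-form solution for a straight line Y = a + b*X (distances are measured vertically from each point to the line). The data points are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the best-fit line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how X and Y vary together, divided by how much X varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical measurements
a, b = least_squares_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} * X")
```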
Excel example • List two sets of numbers • Graph using scatter plot • Use =CORREL(B2:B8,C2:C8) • Result: 0.96927
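For readers without a spreadsheet, the same kind of calculation can be sketched in Python using the standard Pearson correlation formula. The seven height and weight pairs below are hypothetical; they are not the numbers behind the 0.96927 result, which the slide does not list.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights = [60, 62, 65, 67, 70, 72, 74]         # hypothetical, inches
weights = [115, 120, 140, 150, 165, 180, 190]  # hypothetical, pounds
r = pearson_r(heights, weights)
print(f"correlation: {r:.5f}")
```

A value near 1 means a strong positive relationship, near -1 a strong negative one, and near 0 little linear relationship.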
Other models • … other relationships: quadratic, log, exponential, etc. • Say you know the deer population at two points in time. Is/will the growth be linear or exponential? • (Graph: population versus time)
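Two data points cannot tell the models apart, which the sketch below illustrates: both projections pass through the same two observations and then diverge. The deer counts (100 at year 0, 200 at year 10) are hypothetical.

```python
def linear_projection(t0, p0, t1, p1, t):
    """Population at time t if growth adds the same amount each year."""
    slope = (p1 - p0) / (t1 - t0)
    return p0 + slope * (t - t0)

def exponential_projection(t0, p0, t1, p1, t):
    """Population at time t if growth multiplies by the same ratio each period."""
    growth_ratio = p1 / p0
    return p0 * growth_ratio ** ((t - t0) / (t1 - t0))

# Hypothetical observations: 100 deer at year 0, 200 deer at year 10.
for year in (10, 20, 30):
    lin = linear_projection(0, 100, 10, 200, year)
    exp = exponential_projection(0, 100, 10, 200, year)
    print(f"year {year}: linear {lin:.0f}, exponential {exp:.0f}")
```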
Caution • Correlation is not cause • coincidental • both caused by other factor • Cause is not….absolute determination. • other factors
Terminology (reprise) • False positive: wrongly say someone/something has the condition. • False negative: wrongly say someone/something does NOT have the condition when, in fact, he or she or it does • Control group: group in experiment that does not have treatment. • Treatment/condition group: group in experiment that does have treatment
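The false positive and false negative definitions can be sketched as a lookup on two yes/no facts: whether the subject really has the condition and what the test said. The function name and the sample results are hypothetical.

```python
from collections import Counter

def classify(has_condition, test_says_positive):
    """Name the outcome of one test result."""
    if test_says_positive:
        return "true positive" if has_condition else "false positive"
    return "false negative" if has_condition else "true negative"

# Hypothetical results: (really has condition, test said positive)
results = [(True, True), (True, False), (False, False),
           (False, True), (False, False), (True, True)]
counts = Counter(classify(has, says) for has, says in results)
print(counts)
```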
Double-blind study • Randomly assign subjects to • treatment • control (may give placebo) • Subject does not know which… • Tester/evaluator does not know which… • See what happens. Time period may be long. • Smoking cannot be studied using a double-blind study!
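Random assignment can be sketched as shuffling the list of subjects and splitting it in half. This is an illustration only; the function name, the seed, and the numbered subjects are assumptions, and a real study would also keep the assignment hidden from subjects and evaluators.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into (treatment, control) groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

treatment, control = assign_groups(range(1, 21), seed=1)  # 20 numbered subjects
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```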
Retrospective study • Of the people who did/have X, ask how many did Y? • Not as reliable. • Also need to study group that do not have X. • 85% of people with lung cancer report that they smoke[d]. • (How many George Burns are there?)
Smoking and Lung Cancer • Strong correlation • more smoking increases chances of lung cancer • smoking comes before the cancer • many different studies • Women's incidence of lung cancer went up when women started smoking • Incidence going down in groups decreasing smoking • Biological evidence • nicotine experiments with animals • lab study of lungs, blood, blood pressure, etc.
What's likely to kill you http://www.reason.com/blog/show/128501.html
Small multiples • Several (many) graphs/diagrams of the same format
Homework • Identify a complex topic (such as health risks, sports records, voting) • multiple dimensions/factors; multiple categories; timeline? geography? • Find a reputable source (more than one source is even better) • Determine critical findings • Determine audience • Design/build diagram (chart, graph, picture) • Bring to class to present AND to turn in. Be professional! • as appropriate, consider examples shown in class: • using the 'small multiple' idea as done for 31 days • spreadsheet, but with pictures & words (tax cuts) • as appropriate, consider charts presented on health risks • This could be the topic for your Project I: paper + charts • DUE in 1 week. • Continue postings.