Creating User Interfaces My site. Sampling Homework: User observation reports due next week
My site • Who are users? • I am open to suggestions.
Sampling • Basic technique when it is impossible or too expensive to measure everything/everybody • Premise: possible to get a random sample, meaning every member of the whole population is equally likely to be in the sample • NOTE: not a substitute for directly monitoring activity on / with the interface
Source • The Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith (HarperResource) • Procedures (formulas) presented without proof, though, hopefully, motivated
Task • Want to know the percentage (proportion) of some large group • adults in USA • television viewers • web users • For a particular thing • think the president is doing a good job • watched specific program • viewed specific commercial • visited specific website
Strategy: Sampling • Ask a small group • phone • solicitation at a mall • Follow-up or prelude to access to webpage • other? • Monitor actions of a small group, group defined for this purpose • Monitor actions of a panel chosen ahead of time • ALL THESE: make the assumption that those in the group are similar to the whole population.
Two approaches • Estimating, with a confidence interval, the proportion p in the general population based on the proportion phat in the sample • Hypothesis testing: H0 (null hypothesis) p = p0 versus Ha p > p0
Estimation process • Construct a sample of size n and determine phat • Ask who they are voting for (for now, let this be a binomial choice) • Use this as the estimate for the actual proportion p. • … but the estimate has a margin of error. This means: the actual value is within a range centered at phat … UNLESS the sample was really strange. • The confidence value specifies what the chances are of the sample being that strange.
Statement • I'm 95% sure that the actual proportion is in the following range…. • phat – m <= p <= phat + m • Notice: if you want to claim more confidence, you need to make the margin bigger.
Image from Cartoon book • You are standing behind a target. • An arrow is shot at the target, at a specific point in the target. The arrow comes through to your side. • You draw a circle (more complex than +/- error) and say: Chances are the target point is in this circle unless the shooter was 'way off'. The shooter would only be way off X percent of the time. (Typically X is 5% or 1%.)
Mathematical basis • The phat values of samples are themselves (approximately) normally distributed… • if sample and p satisfy certain conditions. • Most samples produce phat values that are close to the p value of the whole population. • Only a small number of samples produce values that are way off. • Think of the outliers of a normal distribution
Actual (mathematical) process • Sample size must be this big: can use these techniques when n*p >= 5 and n*(1-p) >= 5 • The phat values are distributed close to a normal distribution with standard deviation sd(p) = sqrt(p*(1-p)/n) • Can estimate this using phat in place of p in the formula! • Choose the level of confidence you want (typically 95% or 99%, that is, a 5% or 1% chance of being wrong). For 95% confidence, look up (or learn by heart) the value 1.96: this is the number of standard deviations such that 95% of values fall within that distance of the mean. So .95 is P(-1.96 <= (p-phat)/sd(p) <= 1.96)
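The conditions and the standard deviation above can be checked numerically. This is a minimal sketch, not code from the book: the function names `normal_approx_ok` and `sd_of_proportion` are made up for illustration, and the sample values (n = 1000, p = .55) are borrowed from the book example used later in these slides. The last lines use Python's standard `statistics.NormalDist` to confirm that about 95% of a normal distribution lies within 1.96 standard deviations of the mean.

```python
import math
from statistics import NormalDist

def normal_approx_ok(n, p):
    """Rule of thumb from the slide: n*p >= 5 and n*(1-p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def sd_of_proportion(n, p):
    """Standard deviation of phat for samples of size n: sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Share of a standard normal distribution within 1.96 standard deviations.
std_normal = NormalDist()
central_share = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

print(normal_approx_ok(1000, 0.55))            # large sample: conditions hold
print(normal_approx_ok(20, 0.1))               # small sample, extreme p: they fail
print(round(sd_of_proportion(1000, 0.55), 4))
print(round(central_share, 4))
```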
Notes • p is less than 1 so (1-p) is positive. • Margin of error decreases as p moves away from .5 in either direction. (Check using Excel.) • If the sample produces a very high (close to 1) or very low (close to 0) value, p * (1-p) gets smaller • (.9)*(.1) = .09; (.8)*(.2) = .16; (.6)*(.4) = .24; (.5)*(.5) = .25
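The same check suggested for Excel can be done in a few lines of Python. This sketch uses a hypothetical helper `spread(p)` for the product p*(1-p) and simply tabulates it for the values listed above, plus their mirror images below .5.

```python
def spread(p):
    """The product p*(1-p) that sits under the square root in sd(p)."""
    return p * (1 - p)

# Tabulate p*(1-p) for values on both sides of .5.
values = (0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9)
table = {p: round(spread(p), 2) for p in values}

for p, product in table.items():
    print(p, product)

# The largest product (and so the largest margin of error) is at p = .5.
widest = max(values, key=spread)
print("largest at p =", widest)
```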
Notes • Need to quadruple the n to halve the margin of error.
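Why quadruple? The sample size n sits under a square root, so multiplying n by 4 divides the margin by 2. A small sketch, assuming the 95% z-score of 1.96 and an illustrative phat of .55 (the helper name `margin` is made up here):

```python
import math

def margin(phat, n, z=1.96):
    """Margin of error: z * sqrt(phat*(1-phat)/n)."""
    return z * math.sqrt(phat * (1 - phat) / n)

m_base = margin(0.55, 1000)
m_quadrupled = margin(0.55, 4000)   # four times the sample size
ratio = m_base / m_quadrupled       # expected to be 2, since sqrt(4) = 2

print(round(m_base, 4), round(m_quadrupled, 4), round(ratio, 2))
```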
Formula • Use a value called the z-score • For 95% confidence, the value is 1.96 • Margin of error m = zscore * sqrt((phat)*(1-phat)/n)
Mechanics • Process is • Gather data (get phat and n) • Choose confidence level • Using table, calculate margin of error. • Book example: 55% (.55 of a sample of 1000) said they backed the politician • sd(phat) = square_root((.55)*(.45)/1000) = .0157 • Multiply by z-score (e.g., 1.96 for 95% confidence) to get margin of error • So p is within the range .550 – (1.96)*(.0157) to .550 + (1.96)*(.0157): .519 to .581, or 51.9% to 58.1%
Example, continued • 51.9% to 58.1% may be rounded to 52% to 58%, or stated as 55% plus or minus 3 percentage points. • What is typically left out is that there is a 1/20 chance that the actual value is NOT in this range.
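The book example can be reproduced step by step. This is only a sketch of the arithmetic on the slide; the variable names are chosen for illustration.

```python
import math

n = 1000       # sample size
phat = 0.55    # proportion in the sample who backed the politician
z = 1.96       # z-score for 95% confidence

sd_phat = math.sqrt(phat * (1 - phat) / n)   # standard deviation of phat
m = z * sd_phat                              # margin of error
low, high = phat - m, phat + m               # 95% confidence interval

print(round(sd_phat, 4))
print(round(low, 3), round(high, 3))
```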
95% confident means • 95/100 probability that this is true • 5/100 chance that this is not true • 5/100 is the same as 1/20, so • There is only a 1/20 chance that this is not true. • Only 1/20 truly random samples would give an answer that deviated more from the real value • ASSUMING NO INTRINSIC QUALITY PROBLEMS • ASSUMING IT IS RANDOMLY CHOSEN
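The 1/20 claim can be illustrated with a simulation. The sketch below is an assumption-laden illustration, not part of the book: it pretends the true proportion is known (p = .55), draws many truly random samples of 1000, builds the 95% interval from each, and counts how often the interval contains the true value. The function name `interval_covers`, the seed, and the number of trials are arbitrary choices.

```python
import math
import random

def interval_covers(p_true, n, z, rng):
    """Draw one random sample of size n and report whether the
    interval phat +/- m contains the true proportion."""
    hits = sum(1 for _ in range(n) if rng.random() < p_true)
    phat = hits / n
    m = z * math.sqrt(phat * (1 - phat) / n)
    return phat - m <= p_true <= phat + m

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 2000
covered = sum(interval_covers(0.55, 1000, 1.96, rng) for _ in range(trials))
coverage = covered / trials

# Should come out close to .95: roughly 1 sample in 20 misses.
print(coverage)
```

The coverage is close to 95% only because every simulated sample here is truly random, which is exactly the assumption flagged in capitals above.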
99% confidence means • [Give fraction positive] • [Give fraction negative]
Why • Confidence intervals given mainly for 95% and 99%?? • History, tradition, doing others required more computing….
Let's ask a question • How many of you watched the last Super Bowl? • Sample is whole class • How many registered to vote? • Sample size is number in class 18 and older • ????
Variation of book problem • Say sample was 300 (not 1000). • sd(phat) = square_root((.55)*(.45)/300) = .0287. The divisor is smaller, so this is a bigger number. The circle around the arrow is larger. The margin is larger because it was based on a smaller sample. • Multiplying by 1.96 gives .056; subtracting from and adding to .55 gives .494 to .606. You/we are 95% sure that the true value is in this range. • Oops: may be better, but may be worse. The fact that the lower end is below .5 is significant for an election!
Exercise • Size of sample is n • Proportion in sample is phat • Confidence level produces a factor called the z-score • Can be anything, but common values are [80%], 90%, 95%, 99% • Use table. For example, 95% value is 1.96; 99% is 2.58 • Calculate margin of error m • m = zscore * sqrt((phat)*(1-phat)/n) • Actual value is >= phat – m and <= phat + m
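A short sketch comparing the two sample sizes makes the election point concrete. The last step is an addition that is not on the slide: it rearranges the margin formula to find the smallest n for which the lower end of the interval stays above .5, assuming phat stays at .55. The names `interval` and `min_n` are illustrative.

```python
import math

def interval(phat, n, z=1.96):
    """Return (low, high) for the interval phat +/- z*sqrt(phat*(1-phat)/n)."""
    m = z * math.sqrt(phat * (1 - phat) / n)
    return phat - m, phat + m

low_300, high_300 = interval(0.55, 300)
low_1000, high_1000 = interval(0.55, 1000)

print(round(low_300, 3), round(high_300, 3))     # lower end falls below .5
print(round(low_1000, 3), round(high_1000, 3))   # lower end stays above .5

# Smallest n with margin below phat - .5, from m < gap  =>  n > (z/gap)^2 * phat*(1-phat)
phat, z = 0.55, 1.96
gap = phat - 0.5
min_n = math.ceil((z / gap) ** 2 * phat * (1 - phat))
print(min_n)
```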
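The exercise can be wrapped in one small function. This is a sketch under stated assumptions: instead of a printed table, it derives the z-score from the confidence level with Python's standard `statistics.NormalDist().inv_cdf`, and the names `z_score` and `confidence_interval` are made up for illustration.

```python
import math
from statistics import NormalDist

def z_score(confidence):
    """z-score leaving (1 - confidence)/2 in each tail of the normal curve."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(phat, n, confidence=0.95):
    """Return (low, high, m) where m = zscore * sqrt(phat*(1-phat)/n)."""
    m = z_score(confidence) * math.sqrt(phat * (1 - phat) / n)
    return phat - m, phat + m, m

# Table values for the common confidence levels.
for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_score(level), 2))

# More confidence means a bigger margin for the same sample.
m_95 = confidence_interval(0.55, 1000, 0.95)[2]
m_99 = confidence_interval(0.55, 1000, 0.99)[2]
print(round(m_95, 3), round(m_99, 3))
```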
Opportunity sample • Common situation • people assigned/asked to have a meter attached to their TVs • people asked to, or who voluntarily sign up to, have a meter (software) installed on their computers • people asked during a Web session to participate in a survey • students in a specific class! • Practice is to determine categories (demographics) and project the sample results for each subpopulation onto the whole population • For example, if the actual population is 52% female and 48% male, and the sample (panel) is 60% male and 40% female, use the population proportions to adjust the result… • But maybe this fact hides a problem with the sample • Has the negative features of any opportunity sample • Are these folks different from others in their (sub)population?
Requirements • Model / Categories must be well-defined and valid • Hispanic versus (Cuban, others) in Florida in 2000 • Need independent analysis of each subpopulation's representation in the general population • The sample sizes are the individual Ns, making the margins of error larger
Adjustment from panel data • Panel of 10: 6 females, 4 males • Population is 52% female and 48% male • Female panelists: 5 liked interface, 1 didn't. Male panelists: 2 liked interface, 2 didn't. • Estimate for whole population (size P) (5/6)* .52 * P + (2/4)*.48* P
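The adjustment is a weighted average: each group's rate of liking the interface is weighted by that group's share of the population rather than its share of the panel. A minimal sketch using the numbers on the slide (the dictionary layout and names are illustrative):

```python
# Panel results per group: (number who liked the interface, panelists in group)
panel = {"female": (5, 6), "male": (2, 4)}

# Known shares of each group in the whole population
population_share = {"female": 0.52, "male": 0.48}

# Unadjusted estimate: just the proportion of the whole panel
liked = sum(l for l, _ in panel.values())
total = sum(t for _, t in panel.values())
raw_estimate = liked / total

# Adjusted estimate: weight each group's rate by its population share
adjusted_estimate = sum(
    (l / t) * population_share[group] for group, (l, t) in panel.items()
)

print(round(raw_estimate, 3), round(adjusted_estimate, 3))
```

Multiplying the adjusted proportion by the population size P gives the estimated count from the slide. With groups of only 6 and 4 people, the margins of error on each group's rate are very large, as the Requirements slide warns.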
Critical part of surveys and survey analysis: • Understand the exact wording of the question. • Understand the definition of the categories of the population. • Don't make assumptions… • Admire Michelle Obama example • Belief in Holocaust example
Usability research • Often aims for qualitative, not quantitative results • Ideas, critical factors • Note: there are fields of study • Non-numeric statistics • Qualitative research • Still necessary to be systematic. • AD: consider taking Statistics!
Homework • Continue work on user observation studies • This is qualitative work