Proportions

Proportions Estimating population proportions Difference of proportions

Review: Setting up a c.i. around a proportion 1. estimate the proportion 2. Take the SD with this formula: s= sqrt(p * (1-p)) 3. Find the s.e. with this formula: s / sqrt(n) 4. Set up the confidence interval with this formula: proportion plus or minus t * s.e.

Example: A teacher offers extra tutoring in math. He takes a sample of 100 students who went through his tutoring sessions and finds that 73 started to get higher marks. He wants to work out the 95% confidence limits for this estimate before he goes to the principal and tells her the program was successful.

Steps 1-2 Step 1: estimate the population proportion: p = 73/100 = 0.73 of the students started to perform better in math Step 2: get the sample standard deviation using this formula: s = sqrt(p * (1-p)) = sqrt(0.73 * 0.27) = 0.444

Step 3: use s to find the standard error: s.e. = s / sqrt(n) = 0.444 / sqrt(100) = 0.044

Build confidence interval Step 4: What are the 95% confidence limits of the proportion? Since n is bigger than 30, the normal curve can be used, so the multiplier is 1.96. Set up a confidence interval using this formula: proportion plus or minus t * s.e. = 0.73 + or - 1.96 * 0.044 = 0.73 + or - 0.087 = 0.64 to 0.82
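The four review steps can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name proportion_ci and the default multiplier z=1.96 (the normal-curve value for 95%) are assumptions made for the example.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Confidence interval around a sample proportion (hypothetical helper)."""
    p = successes / n                  # Step 1: estimate the proportion
    s = math.sqrt(p * (1 - p))         # Step 2: standard deviation of a proportion
    se = s / math.sqrt(n)              # Step 3: standard error
    margin = z * se                    # Step 4: multiplier times s.e.
    return p, s, se, p - margin, p + margin

# Tutoring example: 73 of 100 students improved
p, s, se, lower, upper = proportion_ci(73, 100)
print(f"p={p:.2f}  s={s:.3f}  s.e.={se:.4f}  95% c.i.=({lower:.3f}, {upper:.3f})")
```

Because the code carries full precision through every step, its limits can differ in the last digit from limits computed by hand with rounded intermediate values.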

Testing the difference between 2 groups Chapter 14

Difference of means test Used if someone wants to know if two sample means or proportions are different (statistically) -could both sample means have been drawn from the same population (and the difference is attributed to chance alone) -or are they so different that there is no way they could have been drawn from the same population

Vs. what we’ve already done • So far, we have used single samples, meaning we wanted to see if a single sample with a particular mean could have been drawn from a population with a known or hypothesized mean • Now we use 2 samples

A sample problem to walk through the logic of these types of problems A veterans support agency offers continuing education seminars for veterans. They want to evaluate the effect it has on job placement. The agency has half the veterans take the seminars, and the other half does not. They randomly select 50 who have done the seminars and 50 who have not. They want to evaluate whether the job placement rates are different.

Formulate a research and a null hypothesis The research hypothesis: tests whether one of the sample means is larger or smaller than the other sample mean (difference of means) The null hypothesis: when you fail to reject it, you’re saying that the population means in question are not different (e.g. an after school reading program didn’t raise mean scores)

For this example: H_A: Veterans who take the seminar will have higher job placement rates H_0: Veterans who attend the seminar and those who do not attend the seminar will show no difference in job placement rates

What does it mean to reject the null? -If we fail to reject the null, this is like saying the job placement rates of the two populations show no difference -i.e. the population rates are the same and the seminars don’t lead to higher job placement -If we reject the null, the conclusion is that the job placement rates of the two groups are in fact different

Practice Problem • The career development center wants to see if its new ad campaign is working, so they can decide whether or not to fire their intern and use the money elsewhere. They randomly sample 9 MIT courses and send them weekly reminders about the career center. They randomly sample another 9 courses and send no emails. They then record the mean student visits per sample. • The average visits in courses with no ads is 135 (SD=110) per term • The average visits in courses with ads is 405 (SD=135) per term

Thinking through the problem • The first step is to state the null and alternative hypothesis: • Even if you aren’t asked for them, it helps you think • Ho: the ads have had no effect on visitations to the career center • Ha: the ads have increased visitations to the career center • What you’re conceptualizing here is the difference between the average visitations: • Mu(experiment) – Mu(control) = d (difference)

Perform calculations (that we already know how to do) • The mean and SD were already given: • The average visits in courses with no ads is 135 (SD=110) per term • The average visits in courses with ads is 405 (SD=135) per term • Get the standard error: SD / sqrt(n) • No ads: 110 / sqrt(9) = 36.667 • With ads: 135 / sqrt(9) = 45

New step for difference of means • Calculate the pooled standard error with the following formula: • s.e._d = sqrt(s.e._1^2 + s.e._2^2) • = sqrt(36.667^2 + 45^2) = 58.05

New step 2 for difference of means • Get the t score using the pooled standard error s.e._d. • Why? Because the point of the difference of means test is to see the probability that the groups could have been drawn from the same population and the difference is just by chance • The formula for the t score is: • t = (mean 1 - mean 2) / s.e._d • (135-405)/58.05 = -4.65

Look this t score up in the t table • Find the degrees of freedom (n1+n2-2)=18-2=16 • P=.0001 • Reject null
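The career center calculation can be reproduced with a short Python sketch. It assumes the pooled standard error is the square root of the sum of the two squared standard errors, which is the combination that reproduces the figure on the slide; the function names are hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def difference_of_means_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled standard error and t score for two independent samples.

    Assumes s.e._d = sqrt(s.e._1^2 + s.e._2^2).
    """
    se1 = standard_error(sd1, n1)
    se2 = standard_error(sd2, n2)
    se_d = math.sqrt(se1 ** 2 + se2 ** 2)
    t = (mean1 - mean2) / se_d
    return se_d, t

# Courses without ads (135, SD=110, n=9) vs. courses with ads (405, SD=135, n=9)
se_d, t = difference_of_means_t(135, 110, 9, 405, 135, 9)
df = 9 + 9 - 2
print(f"pooled s.e. = {se_d:.2f}, t = {t:.2f}, df = {df}")
```

The sign of t only reflects which group is subtracted from which; the size of |t| is what gets compared against the t table.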

Can also use a calculator • http://stattrek.com/online-calculator/t-distribution.aspx

Types of difference tests

Before we start • We see the word “variance” a lot in this chapter • The variance is just the standard deviation squared

General types of difference of means tests • Independent samples: you have two samples that are not paired or matched in any way • These samples were obtained using random sampling methods • Example: someone at the IRS picks 2 samples from a database of 250 tax returns • These samples could have equal or unequal variances (more on that later) • Dependent Samples: “before and after test” where each item in one sample is paired with an item in the second sample • Example: an agency selects 20 people with low performance scores and has them do a workshop for a month, the same 20 employees are then tested again after the workshop to see if they improve

Difference of means: Independent samples, unequal variances • If you don’t know what type of difference test you’re doing assume it is this one • This is the most conservative test and the one you see the most in real life studies • Conservative means it is hard to reject the null hypothesis • Why? The standard error calculations take large differences in sample variances (s^2) into account • Sampling error is to blame for unequal variances

Calculating the degrees of freedom • Use this formula (it produces a smaller df, which makes the test more conservative than the n1+n2-2 formula): • df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ] • Remember, the lower the df, the bigger the test statistic needs to be when deciding to reject the null or not

Calculating degrees of freedom • The formula is useful when the number of cases in each sample is different • Or if the number of cases in each sample is small (less than 30) • Example: if sample one has 150 cases and sample 2 only has 20 • The variances will be different • More conservative tests make it harder to commit a type I error

Practice Problem: difference of means, independent samples, unequal variances • The president of MIT wants to know if a new technology program for professors has made them use interactive visual aids more in the classroom. He randomly selects 10 courses where the professors received the training, and 8 courses where the professors have not yet taken it.

State the null and alternative hypotheses • Ho: the use of visual aids with course = use without course • Ha: the use of visual aids with course > use without course

Calculate the standard error • Use s/sqrt(n) • No course: 6.4 / sqrt 8= 2.262 • Course: 6.3 / sqrt 10=1.99

Get the pooled standard error • Use this formula: • s.e._d = sqrt(s.e._1^2 + s.e._2^2) • = sqrt(5.12 + 3.97) = 3.015

Get the t score • Use this formula: • t= (32.7-37.6) / 3.015=-1.625

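Calculate df • Use the unequal-variances formula: df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ] • Numerator: [(6.4 ^2 / 8) + (6.3 ^2 /10)] ^2 = 82 • Denominator: (6.4^2/8)^2/7 + (6.3^2/10)^2/9 = 3.74 + 1.75 • 82/(3.74+1.75) = 14.93 • Round to 15 • (compare to 10+8-2 = 16)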
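The degrees-of-freedom arithmetic is easy to get wrong by hand, so here is a small Python sketch of it. It assumes the formula is the usual unequal-variances (Welch-Satterthwaite) expression with n-1 in the denominator terms, which matches the numerator and denominator worked out on the slide; welch_df is a hypothetical name.

```python
def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the unequal-variances t test.

    Assumed form: (v1 + v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1)),
    where v = s^2 / n for each sample.
    """
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    numerator = (v1 + v2) ** 2
    denominator = v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)
    return numerator / denominator

# No course: s=6.4, n=8; course: s=6.3, n=10
df = welch_df(6.4, 8, 6.3, 10)
print(f"df = {df:.2f}, rounded: {round(df)}, simple formula: {8 + 10 - 2}")
```

Without intermediate rounding the result is about 15.03 rather than 14.93; either way it rounds to 15, one less than the n1+n2-2 value.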

Look up in t table • df=15 • t=-1.625 • For a one-tailed test, p falls between .10 and .05 (roughly .06) • This is above .05, so we fail to reject the null at the .05 level: a difference this large could plausibly arise by chance

Visualized

Difference of means, independent samples, equal variances • Less conservative than the test for unequal variances • Because the unequal-variances test makes for larger standard errors and therefore smaller t scores • You determine whether two sample variances are equal by using the Levene test • We will go over this in our next Stata lab • This is a super common task for students to use Stata for • The Levene test is interpreted as follows: • Null: the two sample variances are equal • Research: the two sample variances are unequal • The test statistic here is the F statistic • Example: F=87.4 at significance .00 -> reject the null, since an F this large would be very unlikely if the two variances were equal

Steps to solve these types of problems • Once you have the means and s1 and s2, calculate a new “pooled” standard deviation using this formula: • s_d = sqrt( ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2) ) • This is nothing but a weighted average of the two sample variances, with the square root taken at the end

Calculate standard error • Convert the pooled standard deviation to a standard error using this formula: • s.e. = s_d * sqrt(1/n1 + 1/n2) • Then get the t statistic using (x bar 1 - x bar 2)/s.e. • Then look it up in the t chart

Example • NOAA scientists sampling fish larvae in New England fisheries have received a bigger budget to buy finer mesh nets for their sampling trips in the spring and in the fall. They believe this net will help them to better survey for fish larvae. Better data means less angry stakeholder assessments. They randomly select 10 boats with old nets and 10 boats with new nets and collect the following information. They want to show that this was money well spent.

State the null and the research hypothesis • Ho: the new net’s catch yield = the old net’s • Ha: the new net’s catch > old net’s catch

Get the pooled standard deviation • Use this formula: s_d = sqrt( ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2) ) • Numerator = 73728 • Denominator = 10+10-2 = 18 • 73728 / 18 = 4096 • S_d = sqrt(4096) = 64

Get the standard error & t statistic • Use this formula: s.e. = s_d * sqrt(1/n1 + 1/n2) • = 64 * sqrt(1/10 + 1/10) = 28.622 • t = (326-526)/28.622 • = -6.99
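A Python sketch of the equal-variances steps follows. Two things in it are assumptions. First, the pooled formulas are taken to be s_d = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)) and s.e. = s_d * sqrt(1/n1 + 1/n2), which reproduce the 64 and 28.622 on the slides. Second, the individual sample standard deviations are not shown in this transcript, so s1 = s2 = 64 is a stand-in chosen only because it gives the stated numerator of 73728.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation for two samples assumed to share a variance."""
    numerator = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2
    return math.sqrt(numerator / (n1 + n2 - 2))

def pooled_se(s_d, n1, n2):
    """Standard error of the difference, from the pooled standard deviation."""
    return s_d * math.sqrt(1 / n1 + 1 / n2)

# ASSUMPTION: s1 and s2 are illustrative; the slide's data table is missing.
s1 = s2 = 64.0
n1 = n2 = 10

s_d = pooled_sd(s1, n1, s2, n2)
se = pooled_se(s_d, n1, n2)
t = (326 - 526) / se          # sample means from the slide
df = n1 + n2 - 2
print(f"s_d = {s_d:.1f}, s.e. = {se:.3f}, t = {t:.2f}, df = {df}")
```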

Look it up in the chart at 18 df • p<.0005 • This means we can reject the null and say with high certainty that the new nets are working. • Let’s extrapolate these results to make claims on government spending in general. • Just kidding.

Difference of means tests: Dependent Samples • This is the “before and after” test, where the befores are paired with the afters • Example: The IRS implemented a training program to reduce the time it takes to process an organization’s tax-exempt status. They took the following data from 10 different regional offices before and after the program:

The remaining steps to solve • ARE ALL PERFORMED ON THE D COLUMN • Your results and the statistical inference you make are on the difference

Step 1: get the standard error • Mean=4.07 • S=4.56 • s.e.= 4.56/ sqrt(10) = 1.44

Get the t score • (4.07-0) / 1.44 • Use 0 because you are seeing if there is a difference between the difference you found (d) and no difference at all 0 • =2.83 • Look this up in the t table at df=9, or use stat calculator:
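For dependent samples the whole test runs on the difference column, so the calculation reduces to a one-sample t test against 0. The sketch below works from the summary figures on the slides (mean difference 4.07, s = 4.56, n = 10), since the before/after table itself is not reproduced here; paired_t is a hypothetical name.

```python
import math

def paired_t(mean_d, sd_d, n, null_value=0.0):
    """Standard error, t score and df for a paired (before/after) test.

    mean_d and sd_d are the mean and standard deviation of the
    per-pair differences; null_value is 0 for "no difference at all".
    """
    se = sd_d / math.sqrt(n)
    t = (mean_d - null_value) / se
    return se, t, n - 1

se, t, df = paired_t(4.07, 4.56, 10)
print(f"s.e. = {se:.2f}, t = {t:.2f}, df = {df}")
```

Carrying the unrounded standard error through gives a t a hair smaller than dividing by the rounded 1.44, which does not change the table lookup at df = 9.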

Difference of proportions • The t test can be used for the difference of 2 sample proportions in the same way that it can be used for differences between sample means

Practice Problem • DUSP wants to know if math camp is working for its new MCP admits. 65% of the incoming class was put through the program. 80 students were sampled from the math camp group; 65 of them passed quant. 40 MCPs were sampled from the group that was exempted from math camp; of those, 29 passed quant. Does math camp work? • Calculate the proportions: • Math camp: 65/80 = 81.2% pass • No math camp: 29/40 = 72.5% pass

Calculate the s • s = sqrt(p*(1-p)) • Those who did math camp: sqrt(.81*.19) = .39 • Those who did not do math camp: sqrt(.725*.275) = .45

Get the s.e. • s / sqrt(n) • Those who did math camp: .39 / sqrt(80) = .044 • Those who did not do math camp: .45 / sqrt(40) = .071
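The math camp steps can be sketched in Python as follows. The transcript stops after the standard errors, so the last two lines (combining the standard errors and forming a t score) are an assumption: they simply reuse the independent-samples recipe from the difference of means slides, on the grounds that the t test is said to work the same way for proportions. The helper name is hypothetical, and the code uses unrounded proportions, so its values can differ slightly from hand-rounded ones.

```python
import math

def proportion_se(successes, n):
    """Sample proportion and its standard error: sqrt(p*(1-p)) / sqrt(n)."""
    p = successes / n
    s = math.sqrt(p * (1 - p))
    return p, s / math.sqrt(n)

p_camp, se_camp = proportion_se(65, 80)   # math camp group
p_none, se_none = proportion_se(29, 40)   # exempted group

# ASSUMPTION: finish the test the same way as for independent sample means.
se_d = math.sqrt(se_camp ** 2 + se_none ** 2)
t = (p_camp - p_none) / se_d
print(f"s.e. camp = {se_camp:.3f}, s.e. no camp = {se_none:.3f}, t = {t:.2f}")
```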
