Module 5

5 Week Modular Course in Statistics & Probability Strand 1 Module 5

Bernoulli Trials • Bernoulli Trials show up in lots of places. There are 4 essential features: • There must be a fixed number of trials, n • The trials must be independent of each other • Each trial has exactly 2 outcomes called success or failure • The probability of success, p, is constant in each trial • Where do we see this occurring? • tossing a coin • looking for defective products rolling off an assembly line • shooting free throws in a basketball game • Whenever we are dealing with a Bernoulli trial there is a discrete random variable X. This • random variable needs to be identified because all probability questions will involve • finding the probability of different values of this variable. For example if you toss a coin n • times, the random variable X could be the number of heads occurring in 3 tosses e.g. X • can take on the values 0, 1, 2, 3. • We will look at three different types of Problems: • 1. calculating the probability of first success after n repeated Bernoulli trials • 2. calculating the probability of k successes in n repeated Bernoulli trials • 3. calculating the probability until the kth success in n trials. Module 5.1

Success/Failure

First Success After n Repeated Bernoulli Trials Module 5.2

k Successes in n Repeated Bernoulli Trials: Binomial Distribution Module 5.3

Module 5.4

Graph of the Distribution 0.35 0.30 0.25 Probability 0.20 0.15 0.10 0.05 0 1 2 3 4 5 6 No. of heads Module 5.5

Module 5.6

Sample Space Module 5.7

Module 5.8

Module 5.9

Module 5.10

Module 5.11

Your turn! 5.1

Normal Distribution Discrete data (golf scores, dice scores) are generally represented by bar charts. In a bar chart we compare the heights of the bars. Continuous data (height, weight, physical characteristics) are represented by histograms. In a histogram we compare the areas of the columns. Histogram Showing the Height of Seedlings Frequency Height (mm) The histogram shows that a large quantity of the data is clustered at the centre. Module 5.12

Probability Area Histogram Showing the Height of Seedlings Frequency Height (mm) If a seedling is chosen at random it has approximately a 77.88% chance of having a height within the yellow area shown on the histogram i.e. between 12 mm and 26 mm Module 5.13

Sampling Distribution If another batch of seedlings were taken the picture might look slightly different. It is likely that all batches will follow a common pattern with most of the data clustered around the centre of the histogram. This pattern is common to most measurements in nature. It peaks in the middle and tails at the beginning and end. To get a perfect model we would need to 1. Increase the sample size to infinity 2. Take measurements to an infinite number of decimal places 3. Have the widths of the columns approach zero This is impossible to achieve. We can only create a mathematical model of it. This model is called the NORMAL DISTRIBUTION. Module 5.14

What does a Normal Distribution curve look like? Module 5.15

Normal Distribution to Standard Normal Distribution Different sets of data have different means and standard deviations but any that are normally distributed have the same bell-shaped normal distribution type of curves. Normal Distribution Curve Standard Normal Curve In order to avoid unnecessary calculations and graphing the scale a Normal Distribution curve is converted to a standard scale called the z score or standard unit scale. Normal Distributions Standard Normal Distribution 4 242 –3 7 –2 254 –1 266 10 0 278 13 290 16 1 19 2 302 3 314 22 Module 5.16

Standard Normal Distribution Module 5.17

Empirical Rule Empirical Rule Part 1 About 68% of the area is within 1 standard deviation of the mean. Empirical Rule Part 2 About 95% of the area is within 2 standard deviations of the mean. Empirical Rule Part 3 About 99.7% of the area is within 3 standard deviations of the mean. 68% 95% 99.7% Module 5.18

Standard Units (z – scores) z – scores define the position of a score in relation to the mean using the standard deviation as a unit of measurement. z – scores are very useful for comparing data points in different distributions. The z – score is the number of standard deviations by which the score departs from the mean. This standardises the distribution. Module 5.19

Why do we standardise? Module 5.20

Reading z – values From Tables Pg. 36 Pg. 37 –3 –2 –1 0 1 2 3 1.31 Module 5.21

Pg. 36 Pg. 37 0 –3 –2 –1 0 1 2 3 1.32 z Module 5.22

Pg. 36 Pg. 37 –3 –2 –1 0 1 2 3 –0.74 0 –z z Module 5.23

Pg. 36 Pg. 37 –1.32 1.29 –3 –2 –1 0 1 2 3 –3 –2 –1 0 1 2 3 –3 –2 –1 0 1 2 3 –1.32 1.29 Module 5.24

Your turn! 5.2 – 5.3

v 47–0.4 741.4 8–3 23–2 38–1 530 681 832 983 Module 5.25

30–3 40–2 50–1 600 701 802 903 30–3 40–2 50–1 600 701 802 903 45–1.5 Module 5.26

751.5 30–3 40–2 50–1 600 701 802 903 30–3 40–2 50–1 600 701 802 903 72.81.28 Module 5.27

Your turn! 5.4 – 5.6

5.4 15.87% • 49 marks z = 1

23 yrsz = –0.75 z = 0.75 5.5 90% • z = 1.28 22.66% Remember the curve is symmetric 22.66%

6.17 mz =2.333 z = 2.88 5.6 0.2% 0.99%

Hypothesis Testing • Often we need to make a decision about a population based on a sample. • Is a coin which is tossed biased if we get a run of 8 heads in 10 tosses? • Assuming that the coin is not biased is called a NULL HYPOTHESIS (H0) • Assuming that the coin is biased is called an ALTERNATIVE HYPOTHESIS (H1) • During a 5 minute period a new machine produces fewer faulty parts than an old machine. • Assuming that the new machine is no better than the old one is called a NULL HYPOTHESIS (H0) • Assuming that the new machine is better than the old one is called an ALTERNATIVE HYPOTHESIS (H1) • Does a new drug for Hay-Fever work effectively? • Assuming that the new drug does not work effectively called a NULL HYPOTHESIS (H0) • Assuming that the new drug does work effectively called an ALTERNATIVE HYPOTHESIS (H1) Module 5.28

Margin of Error for Population Proportions A sample of 60 students in a school were asked to work out how much money they spent on mobile phone calls over the last week. If the mean of this sample was found to be €5∙80. Can we say that the mean amount of money spent by the students in the school (population) was €5∙80? The answer is no, (unless the sample size was the same as the population size), we can’t say for certain. However we could say with a certain degree of confidence, if the sample was large enough and representative then the mean of the sample was approximately equal to the mean of the population. How confident we are is usually expressed as a percentage. We already saw (from the empirical rule) that approximately 95% of the area of a Normal Curve lies within ± 2 standard deviations of the mean. This means that we are 95% certain that the population mean is within ± 2 standard deviations of the sample mean. ± 2 standard deviations is our margin of error and the ± 2 standard deviations depends on the sample size. If n = 1000 the percentage margin of error of ± 3% v Module 5.29

Some Notes on Margin of Error • As the sample size increases the margin of error decreases • A sample of about 50 has a margin of error of about 14% at 95% level of confidence • A sample of about 1000 has a margin of error of about 3% at 95% level of confidence • The size of the population does not matter • If we double the sample size (1000 to 2000) we do not get do not half the margin of error • Margin of error estimates how accurately the results of a poll reflect the “true” feelings of the population Module 5.30

Module 5.31

Module 5.32

Your turn! 5.7

5.7 A survey is carried out on 400 randomly selected students in Munster and the result is that 60% are in favour of Project Maths. The confidence level is cited as 95%. (i) Calculate the margin of error.(ii) A similar Survey was carried out in Leinster among 400 randomly selected students to see if there was any appreciable difference between support for Project Maths in the Munster and Leinster Area, and the results show that 45% of students were in favour of Project Maths. State the Null Hypothesis and would you accept or reject the Null Hypothesis according to this survey? Give a reason for your conclusion. Solutions • (i) 5% = margin of error. (ii) Null hypothesis : There is no difference in the attitude of Leinster students to PM. • According to the results of the survey we fail to accept the null hypothesis as 45% is outside the margin of error of the results for Munster which is from 55% to 65%.

Module 5

Module 5

Presentation Transcript

MODULE 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5

Module 5