Improved Statistics Understanding for Effective Decision-Making

Preamble: OBL 302 Statistics. What seem to be the problems? Students’ view. Enhanced Face to Face, Dec 2009, Mpwapwa

Observations from the course tutor: It seems that some of you DO NOT know: • how to use a scientific calculator to obtain the mean, sum of squares, standard deviation, etc.; • which test should be used for a particular situation (question); • how to do simple calculations, so you fail to arrive at the correct answer even when you know the correct procedure; • that in hypothesis testing you cannot reach a conclusion without comparing the value of the test statistic with the critical value, which must be read from the specific/appropriate table (you are allowed to bring statistical tables into the examination room). IN FACT, many students do not know how to read tables to obtain critical values and hence fail to reach the correct conclusion(s) when answering questions.

Weaknesses among most of my students: • You waste time copying the whole question into your answer scripts. This is not necessary. • You waste time trying to answer questions you are not sure of instead of starting with the questions you are confident about. • It is important to follow instructions. • It is not good practice to attempt a paper when you are not fully conversant with the course.

Depth of material • Course outline • Extended Course outline • Compendium • Study material • Reference materials

OBL 302 Biostatistics, Dr Said M.S. Massomo, FSTES, Morogoro

1.0 Introduction: What is statistics? • There are two meanings: 1. Numerical information. The singular form, a statistic, is one figure; a collection of more than one figure is statistics. 2. A branch of applied mathematics that deals with a collection of methods/techniques for planning experiments, collecting data, and then organising, summarizing, presenting, analysing and interpreting data so as to assist in making more effective decisions.

Introduction cont.. Statistics? • Taught as a core course in most programmes (examples at the OUT). • The main differences between programmes are in the scope of coverage and the examples used. • It is applied mathematics; basic knowledge of simple algebra is sufficient to master the course.

Introduction cont.. Why study Statistics? • Numerical information is everywhere; how do we determine whether the conclusions drawn from it are reasonable? • Decisions affect our daily lives and personal welfare, e.g. drugs and their appropriate dosage. • Studying statistics helps you understand why decisions are made and gives you a better understanding of how they affect you.

Introduction cont.. • You will always be required to make informed decisions. The questions will be: • Is the information adequate or not? • If additional information is needed, will it provide results that are not misleading? • How do you summarise the information in a useful and informative way? • How do you analyse the information? • How do you draw conclusions and make inferences while assessing the risk of an incorrect conclusion?

Introduction cont.. • Biostatistics: also known as biometry, refers to the application of statistics to solve biological problems. What is an experiment? • A planned enquiry/activity designed with the aim of obtaining new information, or of confirming or denying certain previous information.

Course Coverage • Introduction • Describing data: measures of location • Describing data: measures of dispersion • The normal probability distribution • Confidence intervals • Tests of hypothesis: small samples • Tests of hypothesis: large samples • Linear regression and correlation • Chi-square test • ANOVA • Other important notes

Types of Statistics: There are two types of statistics: • descriptive statistics and • inferential statistics. Between them they cover: • planning experiments, • collecting data, • organising data, • summarizing data, • presenting data, • analysing data, • interpreting data, • drawing conclusions.

Introduction cont.. Descriptive statistics: statistical procedures that describe, organise and summarize the main characteristics of sample data. • Organising data: how? • Summarizing data: how? Means, ranges, standard deviations, frequency tables, etc. • Presenting data: histograms, charts, tables, etc.
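The summaries named on this slide can be computed with Python's standard `statistics` and `collections` modules. This is a minimal sketch: the function name `describe` is illustrative, and the litter-size data are hypothetical, used only to show the calls.

```python
import statistics
from collections import Counter

def describe(data):
    """Return common descriptive summaries of a sample."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),  # sample SD, divisor n - 1
        "frequency_table": dict(sorted(Counter(data).items())),
    }

# Hypothetical litter sizes, for illustration only
litter_sizes = [8, 10, 9, 8, 11, 9, 9, 10, 8, 9]
summary = describe(litter_sizes)
for key, value in summary.items():
    print(key, value)
```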

Introduction cont.. Standard Deviation & Standard error of the mean • Calculations • Short cut method • Use of scientific calculators • Implications
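A sketch of the short-cut (calculator) method, assuming the usual sample formula with divisor n − 1: compute Σx and Σx², form the corrected sum of squares Σx² − (Σx)²/n, divide by n − 1 and take the square root. The standard error of the mean is then s/√n. The function names and the data are illustrative only.

```python
import math

def sd_shortcut(data):
    """Sample SD by the short-cut formula:
    s = sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    corrected_ss = sum_x2 - sum_x ** 2 / n  # corrected sum of squares
    return math.sqrt(corrected_ss / (n - 1))

def standard_error(data):
    """Standard error of the mean: s / sqrt(n)."""
    return sd_shortcut(data) / math.sqrt(len(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
print(round(sd_shortcut(data), 4), round(standard_error(data), 4))
```

The short-cut formula is algebraically identical to summing squared deviations from the mean; it is simply easier to do with a calculator's Σx and Σx² memories.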

Comparing Qualities of Measurement Scales (After Dunn, 2001)

Normal Probabilities • Comprehension of this table is vital to success in the course! • There is a table which must be used to look up standard normal probabilities. The z-score is broken into two parts: the whole number and tenths are looked up along the left side, and the hundredths are looked up across the top. The value at the intersection of the row and column is the area under the curve between zero and the z-score looked up. For example, for z = 1.96 take the row 1.9 and the column 0.06; the entry is 0.4750. • Because of the symmetry of the normal distribution, look up the absolute value of any z-score.
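The table value P(0 < Z < z) can also be computed directly, which is a handy way to check a table lookup. A minimal sketch using the identity P(0 < Z < z) = ½·erf(z/√2); `table_area` is a hypothetical helper name.

```python
import math

def table_area(z):
    """Area under the standard normal curve between 0 and |z|,
    i.e. the entry printed in a P(0 < Z < z) table."""
    z = abs(z)  # symmetry: look up the absolute value
    return 0.5 * math.erf(z / math.sqrt(2))

print(round(table_area(1.96), 4))
print(round(table_area(-1.0), 4))
```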

Normal Probabilities • There are several different situations that can arise when asked to find normal probabilities.

Normal Probabilities • This can be shortened into two rules: • If only one z-score is given, use 0.5000 for the second area; otherwise look up both z-scores in the table. • If the two numbers have the same sign, subtract; if they have different signs, add. If there is only one z-score, use the inequality to determine the second sign (< is negative, and > is positive).
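The two rules can be written out as a small sketch, again using the erf identity in place of the printed table. The helper names `table_area`, `prob_between` and `prob_one_z` are illustrative, not from the course notes.

```python
import math

def table_area(z):
    """Table entry: area between 0 and |z| under the standard normal curve."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def prob_between(z1, z2):
    """P(z1 < Z < z2): same sign -> subtract areas, different signs -> add."""
    a1, a2 = table_area(z1), table_area(z2)
    if (z1 >= 0) == (z2 >= 0):
        return abs(a1 - a2)
    return a1 + a2

def prob_one_z(z, inequality):
    """P(Z < z) or P(Z > z). The second area is 0.5000 and its sign
    comes from the inequality ('<' negative, '>' positive)."""
    area = table_area(z)
    second_positive = inequality == ">"
    if (z >= 0) == second_positive:
        return 0.5 - area  # same sign: subtract
    return 0.5 + area      # different signs: add

print(round(prob_one_z(1.0, "<"), 4), round(prob_one_z(1.0, ">"), 4))
print(round(prob_between(-1.0, 2.0), 4), round(prob_between(1.0, 2.0), 4))
```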

Normal Probabilities: Finding z-scores from probabilities • This is more difficult and requires you to use the table inversely: look up the area between zero and the value in the inside part of the table, and then read the z-score from the outside (the margins). • Finally, decide whether the z-score should be positive or negative, based on whether it lies on the left side or the right side of the mean. Remember, z-scores can be negative, but areas or probabilities cannot be.
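Using the table inversely corresponds to the inverse of the normal cumulative distribution function. This sketch assumes Python 3.8+ for `statistics.NormalDist`; `z_from_table_area` is a hypothetical helper that takes the inside-table area and the side of the mean.

```python
from statistics import NormalDist

def z_from_table_area(area, side):
    """Given the area between 0 and z (0 <= area < 0.5), return z,
    made negative when it lies on the left of the mean."""
    if not 0 <= area < 0.5:
        raise ValueError("area between 0 and z must be in [0, 0.5)")
    z = NormalDist().inv_cdf(0.5 + area)
    return -z if side == "left" else z

print(round(z_from_table_area(0.475, "right"), 2))
print(round(z_from_table_area(0.45, "left"), 3))
```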

The Normal Probabilities.. • The values in the table are the areas between zero and the z-score, that is, P(0 < Z < z). • See tables.

Inferential statistics: Inferential statistics extend the scope of descriptive statistics by examining the relationships within a set of data. In particular, inferential statistics enable the researcher to make inferences, that is, conclusions/deductions/judgements, about the population based on the relationships within the sample data. Unlike descriptive statistics, which only describe the sample, inferential statistics make inferences about a population based on sample data.

Population vs Sample • Population: all subjects possessing a common characteristic that is being studied. Example? • Sample: a subgroup or subset of the population. Examples? • Why do we work with samples?

Inferential Statistics: Why do we work with samples? • Cost • Practicability, e.g. destructive sampling • Time constraints • It is possible to draw correct conclusions if sampling is done properly

Inferential statistics... • Random sample: a sample drawn from a population such that each individual in the population has an equal chance of being selected. • Central limit theorem: a theorem which states that, as the sample size increases, the sampling distribution of the sample means becomes approximately normal, whatever the shape of the population distribution (provided its variance is finite). • Sampling error: the difference between a sample statistic and the population parameter that arises because the sample is not a perfect representation of the population.
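The central limit theorem can be illustrated by simulation. This sketch assumes a strongly skewed exponential population with mean 1 and SD 1 (an arbitrary choice for illustration). It draws many samples of a given size and shows that the sample means centre on the population mean, with a spread close to the theoretical 1/√n, which shrinks as n grows. `sample_means` is a hypothetical helper; a fixed seed makes the run repeatable.

```python
import random
import statistics

def sample_means(sample_size, repetitions, seed=1):
    """Means of repeated random samples from an exponential(mean 1) population."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]

for n in (2, 30):
    means = sample_means(n, 2000)
    print(f"n = {n}: mean of means = {statistics.mean(means):.3f}, "
          f"SD of means = {statistics.stdev(means):.3f}, "
          f"theory 1/sqrt(n) = {1 / n ** 0.5:.3f}")
```

Plotting a histogram of `means` for each n would show the shape moving from skewed towards the normal bell curve.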

Inferential statistics: Test of hypotheses. Tests of hypotheses (also called tests of significance) • Definition: a statistical test examines a set of sample data and, on the basis of an expected distribution of the data (e.g. Z, t, F or chi-square) at a specified level of significance, leads to a decision about whether or not to reject the null hypothesis.

Test of hypotheses... • Test statistic = (difference between sample means) / (error term, i.e. the within-group error). • Statistical significance: refers to whether a test detected a reliable difference between two or more groups, that is, one caused by the effect of an independent variable on a dependent measure rather than by chance.

Test of hypotheses.. • A hypothesis: a statement about a population that is subject to testing. A hypothesis may be null or alternative. • Null hypothesis: the statement about a population that is under test, denoted H0. H0 states that there is no difference between means or that there is no effect; it always includes the equal sign (=). • Alternative hypothesis: a statement that is true when H0 is false, denoted H1. H1 determines whether the test is one-tailed (left or right) or two-tailed, and is characterised by an inequality sign (≠, < or >). For example, H0: μ = 2.18 kg against H1: μ > 2.18 kg is a right-tailed test.

Test of hypotheses: Type I and Type II errors • Type I error: rejecting H0 when it is true. This is usually the more serious error. • Type II error: failing to reject (accepting) H0 when it is false, that is, saying it is true when it is false (examples...). • Defendants are usually presumed innocent until proven guilty. The purpose of a court trial is to see whether the null hypothesis of innocence is rejected by the weight of the data (evidence). • Null hypothesis H0: the person is innocent. • Alternative hypothesis H1: the person is guilty. Which is the more serious error: convicting an innocent person (Type I) or letting a guilty person go free (Type II)?

Test of hypotheses... Level of significance (α): the probability of rejecting the null hypothesis when it is true, i.e. the risk of a Type I error we are willing to accept. α = 0.05 and α = 0.01 are common in biological studies. The level of significance is the complement of the level of confidence used in estimation (α = 0.05 corresponds to 95% confidence). If no level of significance is given, use α = 0.05. Note that α is not the same as the p-value: the p-value is the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed, and H0 is rejected when the p-value is less than or equal to α. Test statistic: a value, determined from sample information, used to decide whether to reject the null hypothesis, e.g. a Z, t or F value.
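To see the difference between α and the p-value, here is a minimal sketch for a Z statistic, assuming H0 is true and Z follows the standard normal distribution. `z_p_value` is a hypothetical helper, and z = 2.10 is an arbitrary illustrative value.

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """p-value of an observed z statistic under H0 (Z ~ N(0, 1))."""
    phi = NormalDist().cdf
    if tail == "right":
        return 1 - phi(z)
    if tail == "left":
        return phi(z)
    return 2 * (1 - phi(abs(z)))  # two-tailed

alpha = 0.05                   # chosen before looking at the data
confidence_level = 1 - alpha   # its complement, 0.95
p = z_p_value(2.10)
print(round(p, 4), "reject H0" if p <= alpha else "fail to reject H0")
```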

Test of hypotheses... A critical value: the value(s) which separate the critical region from the non-critical region. • The critical values are determined independently of the sample statistics. • They are read from the appropriate tables of distributions. Critical region: also called the rejection region, the set of all values of the test statistic which would cause us to reject H0. If the test statistic falls in the rejection region, H0 is rejected.
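Critical values of Z can also be computed rather than read from the table. A minimal sketch with `statistics.NormalDist` (Python 3.8+); `critical_z` is an illustrative name. Critical values of t are not available in Python's standard library, so for t tests they should still be read from the t-table (or obtained from a third-party package such as SciPy's `scipy.stats.t.ppf`).

```python
from statistics import NormalDist

def critical_z(alpha=0.05, tail="two"):
    """Critical value(s) of Z bounding the rejection region."""
    std = NormalDist()
    if tail == "two":
        z = std.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "right":
        return std.inv_cdf(1 - alpha)
    if tail == "left":
        return std.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

print(critical_z(0.05))           # two-tailed, approximately (-1.96, 1.96)
print(critical_z(0.05, "right"))  # approximately 1.645
```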

Test of hypotheses... Arriving at a decision: a statement based upon the null hypothesis. It is either ‘reject the null hypothesis’ or ‘fail to reject the null hypothesis’. We NEVER ‘accept’ the null hypothesis. Conclusion: a statement which indicates the level of evidence (sufficient or insufficient) at a specified level of significance, and states whether the original claim is rejected (if it is the null hypothesis) or supported (if it is the alternative).
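The decision rule can be written as a small function. This sketch assumes the caller supplies the critical value read from the appropriate table: a positive value for right-tailed and two-tailed tests, and a negative value for left-tailed tests. A test statistic exactly equal to the critical value is treated here as lying in the rejection region; `decide` is a hypothetical name.

```python
def decide(test_statistic, critical_value, tail):
    """Compare a test statistic with its critical value and return the decision."""
    if tail == "right":
        reject = test_statistic >= critical_value
    elif tail == "left":
        reject = test_statistic <= critical_value
    elif tail == "two":
        reject = abs(test_statistic) >= critical_value
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return "reject H0" if reject else "fail to reject H0"

print(decide(2.5, 1.96, "two"))
print(decide(-2.0, -1.645, "left"))
```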

Coverage OBL 302 part II • Test of hypothesis: Small samples using t test • The t test for small samples • The t test for independent samples • The t test for dependent samples • Test of Hypothesis: Large samples using Z test • Chi-square test • Linear Regression and correlation • ANOVA • Other important notes

Test of hypotheses: Small samples using the t test. Why use the t distribution? The t test is used for small samples (n < 30) because, when the population standard deviation is unknown and must be estimated from fewer than 30 observations, the Z distribution gives unreliable estimates of differences between samples. Remember, the t distribution is flatter (with heavier tails) than the Z distribution, and it approaches the Z distribution as n increases.

The t test.. Application of the t test • The t test was created to deal with small samples when the parameters and variability of the larger parent population are unknown. • The t tests are used to compare one or two sample means, but not more than two means (for more than two means, use ANOVA). • The t test detects a significant difference between means when the • difference is large, • sample standard deviation is small, and/or • sample size is large.

Variation of the t test: 1. Single or one-sample t test • This is used to compare the observed mean of one sample with a hypothesized value assumed to represent a population. • The t and Z tests both use similar formulas: test statistic = (sample mean − hypothesized population mean) / (standard error of the mean). • It tries to answer the question: is it likely that a sample with a given mean could have come from a population with the proposed µ? • It is usually used to determine whether a set of scores or observations deviates from some established pattern. Examples?

Variation of the t test: 1. Single or one-sample t test • If the population standard deviation, σ, is unknown (and the population is approximately normal), then the standardized sample mean follows Student's t distribution, and you use the t-score formula for sample means. • The test statistic is very similar to that for the z-score, except that σ is replaced by s and z is replaced by t. • The critical value is obtained from the t-table. The number of degrees of freedom for this test is n − 1.
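Written out, the one-sample t statistic described above is as follows, with x̄ the sample mean, μ0 the hypothesized population mean, s the sample standard deviation and n the sample size. The z-score version is the same with σ in place of s.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}, \qquad
\text{df} = n - 1
\qquad \text{compare} \qquad
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```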

1. Single or one-sample t test • A poultry farm produces chickens with a mean weight of 2.18 kg at the age of 5 months. The weights are normally distributed. In an effort to increase their weight, a special additive was mixed with the chicken feed. The subsequent weights of a sample of ten five-month-old chickens were (in kg) 2.21, 2.19, 2.17, 2.18, 2.15, 2.20, 2.18, 2.19, 2.20 and 2.20. • At the 0.05 level of significance, determine whether the special additive has increased the weight of the chickens. (26 marks)
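A sketch of the working for this question, assuming H0: μ = 2.18 kg against the right-tailed H1: μ > 2.18 kg (the question asks whether weight has increased). The critical value 1.833 is the t-table entry for df = 9 at α = 0.05, one-tailed; it is typed in by hand because Python's standard library has no t distribution.

```python
import math
import statistics

weights = [2.21, 2.19, 2.17, 2.18, 2.15, 2.20, 2.18, 2.19, 2.20, 2.20]  # kg
mu0 = 2.18           # H0: mu = 2.18 kg
t_critical = 1.833   # t-table, df = 9, alpha = 0.05, one-tailed (right)

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)   # sample SD, divisor n - 1
se = s / math.sqrt(n)           # standard error of the mean
t = (mean - mu0) / se

print(f"mean = {mean:.3f} kg, s = {s:.4f} kg, SE = {se:.4f} kg")
print(f"t = {t:.3f}, df = {n - 1}, critical t = {t_critical}")
print("reject H0" if t >= t_critical else "fail to reject H0")
```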

Variation of the t test: 1. Single or one-sample t test. Conclusion: The hypotheses are H0: μ = 2.18 kg and H1: μ > 2.18 kg (right-tailed, because the question asks whether the weight has increased). Since the value of the test statistic (t ≈ 1.25, df = 9) is less than the one-tailed critical value t(0.05, 9) = 1.833, we fail to reject the null hypothesis: there is insufficient evidence at the 0.05 level that the additive increased the weight. In other words, the sample mean (2.187 kg) is not significantly greater than the population mean (2.18 kg).

2. The t test for independent groups (two-sample test) • Independent samples: samples are independent when they are not related. • Independent samples may or may not have the same sample size. • Designed to detect a significant difference between a control group and an experimental group. • It tries to answer the question: is x̄1 different from x̄2, or could the two sample means have come from identical populations? • Examples: see the Z test for two samples (why?). • The test statistic is very similar to that for the z-score, except that σ is replaced by s and z is replaced by t.
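A minimal sketch of the independent-samples t statistic. It assumes the two populations have equal variances, so the pooled-variance form with df = n1 + n2 − 2 is used (the unequal-variance Welch version is a different formula). `pooled_t` is a hypothetical name, and the control/experimental values are made up for illustration.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Independent-samples t statistic assuming equal population variances.
    Returns (t, degrees_of_freedom)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, n1 + n2 - 2

control = [12.1, 11.8, 12.5, 12.0]   # hypothetical values
experimental = [12.9, 13.1, 12.6]    # hypothetical values; sizes may differ
t, df = pooled_t(control, experimental)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t is then compared with the t-table critical value for df = n1 + n2 − 2.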

3. Dependent samples t test (paired samples t test) • Samples in which the subjects are paired or matched in some way. • Dependent samples must have the same sample size, but it is possible to have the same sample size without being dependent. • A common type of dependent samples is characterised by a measurement, an intervention of some type, then another measurement. In other words, a paired t test is designed to detect measurable change in the average attitude/behaviour of a group from one point in time to another. • It tries to answer the question: is mean one (x̄1) different from mean two (x̄2)? • Involves matching or pairing of observations.
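For dependent (paired) samples the test works on the differences within each pair. A minimal sketch: t = d̄ / (s_d/√n) with df = n − 1, where d = after − before for each subject. `paired_t` is a hypothetical name, and the before/after values are illustrative only.

```python
import math
import statistics

def paired_t(before, after):
    """Paired-samples t statistic on the differences d = after - before.
    Returns (t, degrees_of_freedom)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same size")
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)  # standard error of mean difference
    return statistics.mean(d) / se, n - 1

before = [68, 72, 75, 70, 66]  # hypothetical measurements before intervention
after = [70, 75, 76, 73, 67]   # same subjects after intervention
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```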
