
Hypothesis testing


Presentation Transcript


  1. Hypothesis testing Intermediate Training in Quantitative Analysis Bangkok 19-23 November 2007

  2. Hypothesis testing Hypothesis testing involves: • defining research questions and • assessing whether changes in an independent variable are associated with changes in the dependent variable by conducting a statistical test Dependent and independent variables • Dependent variables are the outcome variables • Independent variables are the predictive/ explanatory variables

  3. Example… • Research question: Is educational level of the mother related to birthweight? • What is the dependent and independent variable? • Research question: Is access to roads related to educational level of mothers? • Now?

  4. Test statistics • To test hypotheses, we rely on test statistics… • Test statistics are simply the result of a particular statistical test. The most common include: • T-tests calculate t-statistics • ANOVAs calculate F-statistics • Correlations calculate the Pearson correlation coefficient

  5. Significant test statistic • Is the relationship observed by chance, or because there actually is a relationship between the variables? • This probability is referred to as a p-value and is expressed as a decimal (e.g. p = 0.05) • If the probability of obtaining the value of our test statistic by chance is less than 5%, then we generally accept the experimental hypothesis as true: there is an effect in the population • Ex: if p = 0.1, what does this mean? Do we accept the experimental hypothesis? • This probability is also referred to as the significance level (sig.)

  6. Hypothesis testing Part 1: Continuous variables Intermediate Training in Quantitative Analysis Bangkok 19-23 November 2007

  7. Topics to be covered in this presentation • T- test • One way analysis of variance (ANOVA) • Correlation • Simple linear regression

  8. Learning objectives By the end of this session, the participant should be able to: • Conduct t-tests • Conduct ANOVA • Conduct correlations • Conduct linear regressions

  9. Hypothesis testing… WFP tests a variety of hypotheses… Some of the most common include: 1. Looking at differences between groups of people (comparisons of means) Ex. Are different livelihood groups likely to have different levels of food consumption? 2. Looking at the relationship between two variables… Ex. Is asset wealth associated with food consumption?

  10. How to assess differences in two means statistically T-tests

  11. T-test A test using the t-statistic that establishes whether two means differ significantly. Independent means t-test: • Used when there are two experimental conditions and different participants have been used in each condition. Dependent or paired means t-test: • Used when there are two experimental conditions and the same participants took part in both conditions of the experiment.

  12. T-test assumptions In order to conduct a t-test, the data must be: • Normally distributed • Measured at the interval level • Made up of independent estimates • Homogeneous in variance (homogeneity of variance)

  13. The independent t-test • The independent t-test compares two means, when those means have come from different groups of people; • This test is the most useful for our purposes

  14. T-test formula Quite simply, the t-test formula is a ratio of the difference between the two means (or averages) to the variability (or dispersion) of the scores. Statistically, this formula is:
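The formula itself appears only as an image on the original slide; a standard form of the independent-samples t-statistic (using the usual textbook notation, not the slide's own symbols) is:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
$$

where $\bar{x}_1, \bar{x}_2$ are the two group means, $s_1^2, s_2^2$ the group variances and $n_1, n_2$ the group sizes.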

  15. Example t-test Difference in weight-for-age z-scores between males and females in Kenya: t-statistic = 5.56

  16. To conduct an independent t-test In SPSS, t-tests are best run using the following steps: • Click on “Analyze” drop down menu • Click on “Compare Means” • Click on “Independent- Sample T-Test…” • Move the independent and dependent variable into proper boxes • Click “OK”
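For readers working outside SPSS, the same test can be run in Python; a minimal sketch using scipy, with hypothetical weight-for-age z-scores rather than the training dataset:

```python
# Independent-samples t-test with scipy (hypothetical data)
from scipy import stats

males = [-1.2, -0.8, -1.5, -0.3, -1.1, -0.9]
females = [-0.4, -0.2, -0.9, 0.1, -0.5, -0.3]

t_stat, p_value = stats.ttest_ind(males, females)  # default assumes equal variances
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```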

  17. One note of caution about independent t-tests It is important to ensure that the assumption of homogeneity of variance (sometimes referred to as homoscedasticity) is met. To do so, look at the column labelled Levene's Test for Equality of Variances. If the Sig. value is less than .05, the assumption of homogeneity of variance has been broken and you should look at the row in the table labelled Equal variances not assumed. If the Sig. value of Levene's test is bigger than .05, you should look at the row in the table labelled Equal variances assumed.
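Outside SPSS, the same check can be made with Levene's test in scipy; a minimal sketch (reusing the hypothetical groups from the sketch above) that switches to the unequal-variance form of the t-test when the assumption is broken:

```python
# Check homogeneity of variance before choosing the t-test variant (hypothetical data)
from scipy import stats

group1 = [-1.2, -0.8, -1.5, -0.3, -1.1, -0.9]
group2 = [-0.4, -0.2, -0.9, 0.1, -0.5, -0.3]

lev_stat, lev_p = stats.levene(group1, group2)
# If Levene's p < .05, variances differ, so use the unequal-variance (Welch) t-test
equal_var = lev_p >= 0.05
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}; t = {t_stat:.2f}, p = {p_value:.3f}")
```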

  18. Testing for homogeneity of variance • Look at the column labelled Sig. : if the value is less than .05 then the means of the two groups are significantly different. • Look at the values of the means to tell you how the groups differ.

  19. What do we do if we want to statistically compare differences among three means? Analysis of variance (ANOVA)

  20. Analysis of Variance (ANOVA) • ANOVAs are similar to t-tests; in fact, an ANOVA conducted to compare two means will give the same answer as a t-test. • ANOVAs, however, produce an F-statistic, which is an omnibus test, i.e. it tells us whether there are any differences among the means but not how (or which) means differ.

  21. Calculating an ANOVA ANOVA formulas: calculating an ANOVA by hand is complicated and knowing the formulas is not necessary… Instead, we will rely on SPSS to calculate ANOVAs…

  22. Example of One-Way ANOVAs Research question: Do mean child malnutrition (GAM) rates differ according to mother’s educational level (none, primary, or secondary/ higher)?

  23. To calculate one-way ANOVAs in SPSS In SPSS, one-way ANOVAs are run using the following steps: • Click on “Analyze” drop down menu • Click on “Compare Means” • Click on “One-Way ANOVA…” • Move the independent (factor) and dependent variable into proper boxes • Click “OK”
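As an alternative to the SPSS menus, a minimal one-way ANOVA sketch in Python with scipy; the three groups of values below are hypothetical malnutrition scores by mother's educational level:

```python
# One-way ANOVA with scipy (hypothetical data)
from scipy import stats

none = [-1.8, -1.2, -2.1, -1.5]
primary = [-1.1, -0.7, -1.3, -0.9]
secondary_or_higher = [-0.4, -0.2, -0.8, -0.5]

f_stat, p_value = stats.f_oneway(none, primary, secondary_or_higher)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```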

  24. Determining where differences exist In addition to determining that differences exist among the means, you may want to know which means differ. There is one type of test for comparing means: • Post hoc tests are run after the experiment has been conducted (when you do not have a specific hypothesis about which means differ).

  25. ANOVA post hoc tests Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple comparisons can determine which means differ. Tukey's post hoc test is among the most popular and is adequate for our purposes… so we will focus on this test…

  26. To calculate Tukey's test in SPSS In SPSS, Tukey's post hoc tests are run using the following steps: • Click on “Analyze” drop down menu • Click on “Compare Means” • Click on “One-Way ANOVA…” • Move the independent and dependent variable into proper boxes • Click on “Post Hoc…” • Check box beside “Tukey” • Click “Continue” • Click “OK”
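If you are working in Python rather than SPSS, statsmodels offers a Tukey HSD test; a minimal sketch with hypothetical malnutrition scores grouped by mother's education:

```python
# Tukey's HSD post hoc test with statsmodels (hypothetical data)
from statsmodels.stats.multicomp import pairwise_tukeyhsd

gam = [-1.8, -1.2, -2.1, -1.5, -1.1, -0.7, -1.3, -0.9, -0.4, -0.2, -0.8, -0.5]
education = ["none"] * 4 + ["primary"] * 4 + ["secondary+"] * 4

result = pairwise_tukeyhsd(endog=gam, groups=education, alpha=0.05)
print(result)  # table of pairwise mean differences with adjusted p-values
```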

  27. Tukey’s post hoc test

  28. Other types of post hoc tests There are many different post hoc tests, characterized by different adjustments to the error rate for each test and for multiple comparisons. If interested, please feel free to investigate further and to try different tests – the SPSS help might provide you some good hints!

  29. Now what if we would like to measure how well two variables are associated with one another? Correlations

  30. Correlations • T-tests and ANOVAs measure differences between means • Correlations explain the strength of the linear relationship between two variables… • Pearson correlation coefficients (r) are the test statistics used to statistically measure correlations

  31. Types of correlations • Positive correlation: two variables are positively correlated if increases (or decreases) in one variable are accompanied by increases (or decreases) in the other variable. • Negative correlation: two variables are negatively correlated if one increases as the other decreases (or vice versa). • No correlation: two variables are not correlated if there is no linear relationship between them. The correlation coefficient runs from -1 (strong negative correlation) through 0 (no correlation) to +1 (strong positive correlation).

  32. Illustrating types of correlations • Perfect positive correlation: test statistic = 1 • Positive correlation: test statistic between 0 and 1 • Perfect negative correlation: test statistic = -1 • Negative correlation: test statistic between -1 and 0

  33. Example for the Kenya data Correlation between children’s weight and height… Is this a positive or negative correlation? In what range would the test statistic fall?

  34. Measuring the strength of a correlation: Pearson’s correlation coefficient The Pearson correlation coefficient (r) is the name of the test statistic. It is measured using the following formula, which looks complicated, so we will rely on SPSS to calculate it…
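The formula is shown only as an image on the original slide; the standard definition of Pearson's r for paired observations $(x_i, y_i)$ is:

$$
r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
$$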

  35. To calculate a Pearson’s correlation coefficient in SPSS In SPSS, correlations are run using the following steps: • Click on “Analyze” drop down menu • Click on “Correlate” • Click on “Bivariate…” • Move the variables that you are interested in assessing the correlation between into the box on the right • Click “OK”
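Outside SPSS, the same coefficient can be computed in Python; a minimal sketch with scipy, where the height and weight values are hypothetical rather than taken from the Kenya data:

```python
# Pearson correlation with scipy (hypothetical data)
from scipy import stats

height_cm = [78, 82, 85, 90, 95, 101, 104]
weight_kg = [9.8, 10.5, 11.0, 12.1, 13.0, 14.2, 14.9]

r, p_value = stats.pearsonr(height_cm, weight_kg)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```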

  36. Example in SPSS… Using SPSS we get Pearson’s correlation (0.932)

  37. Let’s refresh briefly: what does a correlation of 0.932 mean? • What does *** mean?

  38. What if we are interested in defining this relationship further by assessing how change in one variable specifically impacts the other variable? Linear regression

  39. Linear regression • Allows us to statistically model the relationship between variables… • allowing us to determine how a one-unit change in an independent variable specifically impacts the dependent variable

  40. Types of linear regression There are two types of linear regression: • Simple linear regression • Multiple linear regression • Simple linear regression compares two variables, assessing how the independent variable affects the dependent variable (as discussed) • Multiple linear regression is more complicated – it involves assessing the relationship of two variables while taking account of the impact of other variables. • We will focus only on simple linear regression…

  41. The mechanics of simple linear regression… put simply • Linear regression allows us to linearly model the relationship between two variables (in this case x and y), allowing us to predict how one variable would respond given changes in another • Linear regression actually fits the line that best shows the relationship between x and y and provides the equation for this line • Y = a + b x • Y = dependent variable • a = constant coefficient • b = independent variable coefficient • x = independent variable • Using this equation we can predict changes in the dependent variable, given changes in the independent variable

  42. Simple linear regression To illustrate, let’s return to the previous example of wealth index and FCS. Here, the correlation coefficient (0.932) indicates that increases in wealth index are associated with increases in FCS. Conducting a linear regression would allow us to estimate specifically how much FCS increases given increases in units of wealth index

  43. Simple linear regression Regressing FCS by wealth index gives the following output: Using this output, we can build the regression equation… Y = a + b x Y= FCS a= 38.482 b= 14.101 x= wealth index

  44. Compiling the equation… • FCS = 38.482 + 14.101 (wealth index) • What if we wanted to predict the FCS of a household in this population with a wealth index of 0.569? • FCS = 38.482 + 14.101 (0.569) • FCS = 46.50 • What would the predicted FCS of a household be if the wealth index is: • 2.256? • -1.256?
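A small sketch of the same prediction in Python, using the coefficients from the slide (a = 38.482, b = 14.101); the other wealth-index values posed above can be substituted in the same way:

```python
# Predict FCS from the fitted regression equation FCS = a + b * wealth_index
a = 38.482   # constant (intercept) from the slide's output
b = 14.101   # wealth index coefficient from the slide's output

def predict_fcs(wealth_index):
    return a + b * wealth_index

print(f"{predict_fcs(0.569):.1f}")  # about 46.5, matching the worked example above
```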

  45. To calculate a linear regression in SPSS… • In SPSS, linear regressions are run using the following steps: • Click on “Analyze” drop down menu • Click on “Regression” • Click on “Linear…” • Move the independent and dependent variables into the proper boxes • Click “OK”
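A minimal Python equivalent using scipy's linregress; the wealth-index and FCS values below are hypothetical, so the fitted coefficients will not match the slide's output:

```python
# Simple linear regression with scipy (hypothetical data)
from scipy import stats

wealth_index = [-1.2, -0.5, 0.0, 0.4, 0.9, 1.5, 2.1]
fcs = [22.0, 31.5, 38.0, 44.5, 51.0, 60.5, 68.0]

result = stats.linregress(wealth_index, fcs)
print(f"FCS = {result.intercept:.2f} + {result.slope:.2f} * wealth_index")
print(f"r = {result.rvalue:.3f}, p = {result.pvalue:.3f}")
```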

  46. Now… practical exercise!
