Statistical Analysis Methods for Research Questions and Hypotheses

Lecture 4: Statistical Analysis. For use in fall semester 2015. Lecture notes were originally designed by Nigel Halpern. This lecture set may be modified during the semester. Last modified: 4-8-2015

Lecture Aim & Objectives Aim • To investigate methods of statistical analysis Objectives • Research questions & hypotheses • Statistical tests

Introduction • Survey research is all about answering questions • Analytical techniques covered in lecture 3 are used to answer descriptive questions • They were univariate in nature • i.e. only use information from a single variable • Questions are often more complicated & require analysis of 2+ variables • e.g. Do the personal characteristics of shoppers affect customer satisfaction?

Variables • The research question is • Do the personal characteristics of shoppers affect customer satisfaction? • The variables required are • Personal characteristics and satisfaction • Personal characteristics is an independent variable (IV) • Causes change in the DV • Customer satisfaction is the dependent variable (DV) • Influenced by changes in the IV

Independent & Dependent Variables • [Diagram: IV 1 (age), IV 2 (experience) and IV 3 (gender), each linked to the DV (satisfaction)]

Hypotheses • You may then have a hypothesis • A statement used to test a particular proposition • e.g. older shoppers have significantly higher levels of customer satisfaction than younger shoppers • e.g. more experienced shoppers have significantly higher levels of customer satisfaction than less experienced shoppers • e.g. female shoppers have significantly higher levels of customer satisfaction than male shoppers

Null & Alternative Hypotheses • Null hypothesis (H0) • “There is no significant difference or relationship” • Alternative hypothesis (H1) • “There is a significant difference or relationship” • One-tailed is directional • e.g. X is significantly higher than Y • Two-tailed is non-directional • e.g. There is a significant difference between X and Y

Null & Alternative Hypotheses • Hypotheses are normally labelled • H0¹ • H0² • H0³ • H1¹ • H1² • H1³ • etc

Null & Alternative Hypotheses • Null hypothesis • There is no significant difference in customer satisfaction of older versus younger shoppers • Alternative hypothesis • Customer satisfaction is significantly higher for older than younger shoppers (one-tailed) • There is a significant difference in customer satisfaction for older versus younger shoppers (two-tailed) • I’d recommend 2-tailed over 1-tailed
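The difference between a one-tailed and a two-tailed alternative hypothesis can be sketched in Python. This is only an illustration: the satisfaction scores are made up, and it assumes SciPy 1.6 or later, where `ttest_ind` accepts the `alternative` argument.

```python
from scipy.stats import ttest_ind

# Hypothetical satisfaction scores (1-10 scale); not data from the lecture.
older = [7, 8, 6, 9, 8, 7, 9, 8]
younger = [6, 7, 7, 5, 8, 6, 7, 6]

# Two-tailed H1: satisfaction differs between the groups (either direction).
t_stat, p_two = ttest_ind(older, younger, alternative="two-sided")

# One-tailed H1: older shoppers are MORE satisfied than younger shoppers.
_, p_one = ttest_ind(older, younger, alternative="greater")

print(f"t = {t_stat:.3f}")
print(f"two-tailed p = {p_two:.3f}")
print(f"one-tailed p = {p_one:.3f}")
```

When the observed difference lies in the predicted direction, the one-tailed p-value is half the two-tailed one, which is why a one-tailed test reaches significance more easily and should be chosen before looking at the data.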

Your turn….. • Which of the following are IVs and which are DVs? • Sales volume affects profits • Advertising affects volume of customers • Customer service levels affect customer retention • Study time influences exam results • Academic performance is affected by gender • Increases in aviation fuel burn reduce air quality • Older staff work harder

Your turn….. • Which of the following is an H0, which is an H1 (1-tailed / directional), which is an H1 (2-tailed / non-directional)? • Sales volume has a significant positive effect on profits • Advertising has a significant effect on the volume of customers • Customer service levels have a significant effect on customer retention • Study time has a significant positive effect on exam results • There is no significant difference in the academic performance of men versus women • Increases in aviation fuel burn have a significant negative effect on air quality • There is no significant difference in the effort of younger versus older workers

Your Survey • You are expected to develop a research question(s) based on theoretical context • Example • Customer-supplier relationships affect the performance of supply chain networks (Ellinger et al., 1999). This has never been investigated in Norway so this study asks: Do customer-supplier relationships affect the performance of supply chain networks in Norway?

Your Survey • The study by Ellinger et al. (1999) and others such as Jammernegg & Kischka (2005) suggest that frequent customer satisfaction surveys affect performance. • Discussions with industry experts (e.g…..) suggest performance may also be affected by the frequency of meetings with customers and personal visits by senior managers. • This study will investigate the overall effect of the customer-supplier relationship as well as the effect of individual aspects of the customer-supplier relationship • What variables are needed…..?

Your Survey • Variables: • Performance • Customer-supplier relationship • Frequent customer satisfaction surveys • Frequency of meetings with customers • Personal visit by senior managers • How might you create the variables using a survey? • What hypotheses might you use…..?

Statistical Analysis • The significance of each hypothesis is then tested using statistical analysis • The objective is to decide whether the data justify rejecting each null hypothesis (loosely, to ‘prove’ or ‘disprove’ each hypothesis)

What Tests?

Nature of the Question • The way a question is posed suggests different statistical tests • Inferential statistics • Are younger students more satisfied with their course than older students? • Are there differences in levels of satisfaction between younger and older students? • Measures of association • Is there a relationship between age of student and levels of satisfaction?

Inferential Statistics • Chi-square • One Sample t-test • Paired Samples t-test • Independent Samples t-test • One-Way Analysis of Variance (ANOVA)

Chi-square • Crosstabs can provide initial analysis but….. • It is difficult to interpret the data • It does not comment on the significance of any differences • Chi-square can be used to investigate the significance of the difference between observed and expected values • Used for 2 nominal variables • e.g. course enrolments & gender

Chi-square Cross-tabulations may suggest a trend e.g. course enrolments according to gender

Chi-square Procedure SPSS • Analyse • Descriptive statistics • Crosstabs • Select a variable for rows • Select a variable for columns • Statistics • Tick Chi-square • Cells • Tick Expected • Continue • OK

Chi-square Output SPSS • Use Pearson Chi-square • The value is 6.588 • The greater the value, the greater the difference between observed and expected values • Significance of the difference is 0.037 (i.e. 3.7%) • This means that, if there were no real difference, a result at least this extreme would be expected by chance only 3.7% of the time • As 0.037 is below 0.05, we reject the null hypothesis in favour of the alternative hypothesis
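For readers without SPSS, the same kind of test can be sketched in Python with SciPy's `chi2_contingency`. The counts below are invented for illustration and are not the enrolment data behind the lecture's output, so the result will not match the value of 6.588.

```python
from scipy.stats import chi2_contingency

# Hypothetical observed counts (NOT the lecture's data):
# rows = gender (female, male), columns = course (BSc, MSc, PhD)
observed = [
    [30, 20, 10],
    [20, 25, 25],
]

# correction=False requests the plain Pearson chi-square statistic.
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(f"Pearson chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.3f}")
print("Expected counts if gender and course were unrelated:")
print(expected)

# Compare the p-value with the conventional 5% threshold.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(decision)
```

The `expected` array corresponds to the "Expected" cell option ticked in the SPSS Crosstabs procedure.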

Probability - remember this…..? • i.e. working at the 95% confidence level means we accept a 5% risk of reporting a difference or relationship that is really down to chance • A result that meets this level is written as p<0.05

One Sample t-test • Compares the mean of a single sample with the population mean • e.g. a University claims that its BSc graduates have an average starting salary that is ‘significantly’ higher than the national average

One Sample t-test Procedure SPSS • Analyse • Compare means • One Sample t-test • Enter test variable • i.e. graduate salary of the sample • Enter test value • i.e. national average of 2.5mn NOK • OK

One Sample t-test Output SPSS • Sample average of 2.6mn NOK is higher than the national average of 2.5mn NOK (t=1.551) • But the difference is not significant (p=0.135) • Fail to reject the null hypothesis (the alternative is not supported)
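A rough Python equivalent of the one sample t-test uses SciPy's `ttest_1samp`. The salary figures are hypothetical stand-ins chosen to average 2.6, and only the test value of 2.5 is taken from the lecture, so the t and p values will differ from the SPSS output.

```python
from scipy.stats import ttest_1samp

# Hypothetical graduate salaries in mn NOK (NOT the lecture's sample).
salaries = [2.4, 2.7, 2.6, 2.9, 2.3, 2.8, 2.5, 2.6, 2.7, 2.5]

# Test value from the lecture: the national average.
national_average = 2.5

# Two-tailed test of H0: the population mean equals the national average.
t_stat, p_value = ttest_1samp(salaries, national_average)

sample_mean = sum(salaries) / len(salaries)
print(f"sample mean = {sample_mean:.2f}")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("Reject H0: the sample mean differs from the national average")
else:
    print("Fail to reject H0: the difference is not significant")
```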

Paired & Independent Samples t-tests • Compares 2 means • i.e. are they statistically different? • INTERVAL or RATIO variables • Two main situations: • Compare means of 2 variables for whole sample • Paired samples test e.g. average spend on shoes v food • Compare means of 1 variable for 2 sub-groups • Independent samples test e.g. average spend on shoes by men v women

Paired Samples t-test Procedure SPSS Average spend on shoes v food • Analyse • Compare means • Paired Samples t-test • Select the two variables to be compared • OK

Paired Samples t-test Output SPSS • Averages can be compared • t = -0.937 (food spend is lower) • But the difference is not significant (p=0.353, i.e. 35.3%) • Fail to reject the null hypothesis (the alternative is not supported)
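The paired samples test can be sketched in Python with SciPy's `ttest_rel`. The spend figures are invented, with one shoe value and one food value per respondent, and are not the data behind the SPSS output.

```python
from scipy.stats import ttest_rel

# Hypothetical spend per respondent (NOT the lecture's data).
# Position i in both lists belongs to the same person.
shoe_spend = [500, 620, 480, 700, 550, 610, 590, 530]
food_spend = [520, 600, 510, 690, 580, 600, 620, 540]

# The test works on the per-person differences (first list minus second),
# so the sign of t depends on the order in which the variables are passed.
t_stat, p_value = ttest_rel(shoe_spend, food_spend)

mean_diff = sum(s - f for s, f in zip(shoe_spend, food_spend)) / len(shoe_spend)
print(f"mean difference (shoes - food) = {mean_diff:.1f}")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```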

Independent Samples t-test Procedure SPSS Average spend on shoes by men v women • Analyse • Compare means • Independent Samples t-test • Select the test variable • i.e. shoe spend • Select the grouping variable • i.e. gender • Define groups for the grouping variable • i.e. 1 for group 1 and 2 for group 2 – this corresponds to 1 for female and 2 for male • OK

Independent Samples t-test Output SPSS • Averages can be compared • t = 5.862 (female more than male) • Difference is significant (SPSS shows p=0.000, which means p<0.001) • Reject null hypothesis, accept alternative
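The independent samples test can be sketched with SciPy's `ttest_ind`. The two groups below are hypothetical and the example assumes equal variances in both groups, so treat it as an illustration rather than a reproduction of the SPSS result.

```python
from scipy.stats import ttest_ind

# Hypothetical shoe spend by gender (NOT the lecture's data).
female = [900, 850, 950, 1000, 880, 920]
male = [600, 650, 580, 700, 620, 640]

# equal_var=True is the classic Student's t-test (assumes equal variances).
# Use equal_var=False for Welch's test when that assumption is doubtful.
t_stat, p_value = ttest_ind(female, male, equal_var=True)

# A positive t means the first group (female) has the higher mean.
print(f"t = {t_stat:.3f}")
print("p < 0.001" if p_value < 0.001 else f"p = {p_value:.3f}")
```

Reporting very small values as "p < 0.001" avoids the misleading "0.000" that rounded output can show.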

One-Way ANOVA • t-test examines differences between 2 means • Analysis of Variance (ANOVA) examines 3+ means • e.g. average spend on shoes by course (BSc, MSc, PhD) • Examines whether means for each group vary • Labelled as ‘between groups’ in the output

One-Way ANOVA Procedure SPSS Average spend on shoes by course • Analyse • Compare means • One-Way Anova • Select the DV • i.e. shoe spend • Select the factor • i.e. course • OK

One-Way ANOVA Output SPSS • F = 3.966 (variance test statistic) • Differences are significant (p=0.026, i.e. 2.6%) • Reject null hypothesis, accept alternative
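A one-way ANOVA can be sketched in Python with SciPy's `f_oneway`. The three groups are hypothetical shoe-spend figures by course, not the lecture's data, so F and p will not match the output above.

```python
from scipy.stats import f_oneway

# Hypothetical shoe spend by course (NOT the lecture's data).
bsc = [500, 550, 600, 520, 580]
msc = [620, 640, 700, 660, 680]
phd = [560, 600, 590, 610, 640]

# H0: all three group means are equal.
f_stat, p_value = f_oneway(bsc, msc, phd)

print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject H0: at least one group mean differs")
else:
    print("Fail to reject H0: no significant difference between groups")
```

A significant F only says that at least one mean differs. It does not say which groups differ from each other.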

Measures of Association • Correlation Analysis • Linear Regression Analysis • Multiple Regression Analysis

Correlation Analysis • Examines relationship between 2 or more ORDINAL or INTERVAL/RATIO variables • They are ‘CORRELATED’ if they are systematically related • POSITIVELY: as one increases, so does other • NEGATIVELY: as one increases, the other decreases • UN-CORRELATED: no relationship

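Correlation Analysis • Correlation is measured by the correlation co-efficient, ‘r’ • r ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship) • Helps to think of correlation in visual terms • e.g. see next slide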

[Figure: four scatter plots of y against x. One shows r close to 1, one shows r close to -1, and the other two would both have r close to 0.]

Association for Income & Profit

Scatter-plot Procedure SPSS • Graphs • Interactive • Scatterplot • IV for the x-axis • DV for the y-axis • OK

Scatter-plot Output SPSS

Correlation Procedure SPSS • Analyse • Correlate • Bivariate • Add variables to variables list • Tick Pearson’s for interval/ratio data (Spearman’s for ordinal) • OK

Correlation Output SPSS • r = .617 (moderate-strong positive relationship) • Relationship is significant (p=0.001, i.e. 0.1%) • Reject null hypothesis, accept alternative
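The bivariate correlation can be sketched in Python with SciPy's `pearsonr` and `spearmanr`. The income and profit values are made up for illustration, so r will not equal the .617 reported above.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical income and profit figures (NOT the lecture's data).
income = [10, 12, 15, 18, 20, 24, 25, 30]
profit = [2, 3, 3, 5, 4, 6, 7, 8]

# Pearson's r for interval/ratio data.
r, p_r = pearsonr(income, profit)

# Spearman's rho works on ranks, so it suits ordinal data.
rho, p_rho = spearmanr(income, profit)

print(f"Pearson r = {r:.3f} (p = {p_r:.4f})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
```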

Linear Regression Analysis • Correlation shows the strength of a relationship but not how much the DV changes when the IV changes • Regression estimates the likely impact of the IV on the DV (it does not by itself prove causation) • e.g. How many pax would visit AMS if more flights were provided (forecasting) • Regression calculates equation for ‘best fit line’: y = a + bx • a = a constant representing the point where the line crosses the y-axis • b = a co-efficient representing the gradient of the line • y & x = DV & IV

Annual ticket sales & profits data p/region for HiMolde Airlines

xy scatter plot indicates possible linear relationship (or not)

Best Fit Line • Perhaps we wish to predict profit for given values of sales • Profit is the dependent variable (y) • Sales the independent variable (x) • Data seems scattered around a straight line • Then need to find the equation of a ‘best fit’ line: y = a + bx • Profit = ‘a number’ + (‘some other number’ x sales)

Linear Regression Procedure SPSS • Analyse • Regression • Linear • Place DV and IV in relevant box • OK

Linear Regression Output SPSS • Extent to which DV can be predicted by the IV(s) i.e. 66% • Profit = a + b x sales • Effect of IV on DV is significant (p=0.001, i.e. 0.1%)
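Fitting the best fit line y = a + bx can be sketched in Python with SciPy's `linregress`. The sales and profit values are hypothetical and are not the HiMolde Airlines data, so the fitted numbers are illustrative only.

```python
from scipy.stats import linregress

# Hypothetical sales (x, the IV) and profit (y, the DV) per region.
# These are NOT the HiMolde Airlines figures from the lecture.
sales = [10, 20, 30, 40, 50, 60]
profit = [3, 5, 6, 9, 10, 13]

fit = linregress(sales, profit)

a = fit.intercept           # where the line crosses the y-axis
b = fit.slope               # gradient of the line
r_squared = fit.rvalue ** 2 # share of variation in profit explained by sales

print(f"Profit = {a:.3f} + {b:.3f} x sales")
print(f"R squared = {r_squared:.3f}, p = {fit.pvalue:.4f}")

# Use the fitted line to predict profit for a new sales value.
new_sales = 70
predicted_profit = a + b * new_sales
print(f"Predicted profit at sales = {new_sales}: {predicted_profit:.2f}")
```

Predictions are most trustworthy for sales values close to the observed range. Extrapolating far beyond it assumes the straight-line pattern continues.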
