Univariate and Bivariate Analysis (Source: W.G Zikmund, B.J Babin, J.C Carr and M. Griffin, Business Research Methods, 8th Edition, U.S, South-Western Cengage Learning, 20
Types of Statistical Analysis • Univariate Statistical Analysis • Tests of hypotheses involving only one variable. • Testing of statistical significance • Bivariate Statistical Analysis • Tests of hypotheses involving two variables. • Multivariate Statistical Analysis • Statistical analysis involving three or more variables or sets of variables.
Statistical Analysis: Key Terms • Hypothesis • Unproven proposition: a supposition that tentatively explains certain facts or phenomena. • An assumption about the nature of the world. • Null Hypothesis • Statement that no difference exists between the sample and the population (the status quo). • Alternative Hypothesis • Statement that indicates the opposite of the null hypothesis.
Significance Levels and p-values • Significance Level • A critical probability associated with a statistical hypothesis test: the largest probability of wrongly concluding that an observed value differs from some statistical expectation that the researcher is willing to accept (commonly 0.05 or 0.01). • p-value • Probability value, or the observed or computed significance level. • p-values are compared to significance levels to test hypotheses. • Lower p-values indicate stronger evidence against the null hypothesis; the null hypothesis is rejected when the p-value falls below the significance level.
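The comparison of a p-value with a significance level can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a two-sided z-test on a standard normal statistic, and the function names and the example z values are hypothetical, not taken from the textbook.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    # P(|Z| >= |z|) for Z ~ N(0, 1), via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

def reject_null(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is below alpha."""
    return p_value < alpha

# Hypothetical test statistics, chosen only for illustration.
p_far = two_sided_p_value(2.5)    # observed value far from the expectation
p_near = two_sided_p_value(0.8)   # observed value close to the expectation

print(p_far, reject_null(p_far))
print(p_near, reject_null(p_near))
```

The statistic far from the expectation yields the smaller p-value and is the one that leads to rejecting the null hypothesis at the 0.05 level.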
Type I and Type II Errors • Type I Error • An error caused by rejecting the null hypothesis when it is true. • Practically, a Type I error occurs when the researcher concludes that a relationship or difference exists in the population when in reality it does not exist. • “There really are no monsters under the bed.”
Type I and Type II Errors (cont’d) • Type II Error • An error caused by failing to reject the null hypothesis when the alternative hypothesis is true. • Practically, a Type II error occurs when a researcher concludes that no relationship or difference exists when in fact one does exist. • “There really are monsters under the bed.”
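A small simulation can make the two error types concrete. The sketch below is an assumption-laden illustration rather than anything from the textbook: it uses a two-sided z-test with a known standard deviation of 1, a sample size of 30, a hypothetical true mean of 0.3 for the "null is false" case, and fixed random seeds.

```python
import math
import random

def z_test_rejects(sample, null_mean=0.0, sigma=1.0, z_crit=1.96):
    """Two-sided z-test with known sigma; True if the null is rejected."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

def rejection_rate(true_mean, trials=2000, n=30, seed=0):
    """Share of simulated samples in which the null (mean = 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_rejects(sample):
            rejections += 1
    return rejections / trials

# Null is true (the mean really is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# Null is false (the mean is 0.3): every failure to reject is a Type II error.
type_2_rate = 1.0 - rejection_rate(true_mean=0.3, seed=1)

print(type_1_rate, type_2_rate)
```

With a critical value of 1.96, the Type I rate should land near the 0.05 significance level, while the Type II rate depends on how far the true mean is from the null value and on the sample size.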
Choosing the Appropriate Statistical Technique • Choosing the correct statistical technique requires considering: • Type of question to be answered • Number of variables involved • Level of scale measurement
Parametric versus Nonparametric Tests • Parametric Statistics • Involve numbers with known, continuous distributions. • Appropriate when: • Data are interval or ratio scaled. • Sample size is large. • Nonparametric Statistics • Appropriate when the variables being analyzed do not conform to any known or continuous distribution.
Bivariate Analysis - Introduction • Measures of Association • Refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables. • The chi-square (χ²) test provides information about whether two or more less-than-interval variables are interrelated. • Correlation analysis is most appropriate for interval or ratio variables. • Regression can accommodate either less-than-interval or interval independent variables, but the dependent variable must be continuous.
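As a rough illustration of the chi-square test of association mentioned above, the statistic can be computed directly from a contingency table of counts. The table below is hypothetical data invented for this sketch, and the function name is an assumption; the critical value is the usual tabled value for 1 degree of freedom at the 0.05 level.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = two respondent groups, columns = yes / no.
observed_counts = [[30, 10],
                   [20, 40]]

chi2, df = chi_square_statistic(observed_counts)
CRITICAL_VALUE_DF1_05 = 3.841  # standard chi-square table, df = 1, alpha = 0.05

print(chi2, df, chi2 > CRITICAL_VALUE_DF1_05)
```

A statistic above the critical value suggests that the two categorical variables are related in the population; it says nothing about the direction or cause of that relationship.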
Bivariate Analysis—Common Procedures for Testing Association
Simple Correlation Coefficient • Correlation coefficient • A statistical measure of the covariation, or association, between two at-least interval variables. • Covariance • Extent to which two variables are associated systematically with each other.
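Simple Correlation Coefficient • Correlation coefficient (r) • Ranges from +1 to -1 • Perfect positive linear relationship = +1 • Perfect negative (inverse) linear relationship = -1 • No correlation = 0 • Correlation coefficient for two variables (X, Y): $r_{xy} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \, \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$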
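The correlation coefficient can be computed from its product-moment definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below assumes small hypothetical data lists; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_perfect_positive = pearson_r(x, [2 * v + 1 for v in x])
r_perfect_negative = pearson_r(x, [-v for v in x])
print(r, r_perfect_positive, r_perfect_negative)
```

The two extra calls illustrate the endpoints of the range: an exact increasing straight line gives +1 and an exact decreasing straight line gives -1.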
Correlation, Covariance, and Causation • When two variables covary, they display concomitant variation. • This systematic covariation does not in and of itself establish causality. • e.g., Rooster’s crow and the rising of the sun • Rooster does not cause the sun to rise.
Correlation Analysis of Number of Hours Worked in Manufacturing Industries with Unemployment Rate
Coefficient of Determination • Coefficient of Determination (R²) • A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable that is accounted for by knowing the value of another variable. • Measures that part of the total variance of Y that is accounted for by knowing the value of X.
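One way to see why the squared correlation measures "variance accounted for" is to compare it with the share of the variation in Y that a fitted straight line explains. The sketch below is illustrative only: the data are hypothetical and the function names are assumptions.

```python
def squared_correlation(xs, ys):
    """Square of the Pearson correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

def explained_share(xs, ys):
    """1 - SSE/SST for the least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the line, and total variation in Y.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_squared = squared_correlation(x, y)
share = explained_share(x, y)
print(r_squared, share)
```

For a single predictor the two quantities agree, which is why R² can be read as the proportion of the variance in Y accounted for by X.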
Correlation Matrix • Correlation matrix • The standard form for reporting correlation coefficients for more than two variables. • Statistical Significance • The procedure for determining statistical significance is the t-test of the significance of a correlation coefficient.
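The t-test for a correlation coefficient uses the statistic t = r * sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. A minimal sketch follows; the values r = 0.5 and n = 30 are hypothetical, the function name is an assumption, and the critical value is read from a standard t table rather than computed.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample correlation and sample size.
sample_r = 0.5
sample_n = 30

t_stat = correlation_t_statistic(sample_r, sample_n)
degrees_of_freedom = sample_n - 2
T_CRITICAL_DF28_05 = 2.048  # standard t table, two-tailed, alpha = 0.05, df = 28

print(t_stat, degrees_of_freedom, abs(t_stat) > T_CRITICAL_DF28_05)
```

The same correlation can be significant in a large sample and not in a small one, because the statistic grows with the square root of n - 2.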
Regression Analysis • Simple (Bivariate) Linear Regression • A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable. • The Regression Equation (Y = α + βX) • Y = the continuous dependent variable • X = the independent variable • α = the Y intercept (where the regression line crosses the Y axis) • β = the slope coefficient (rise over run)
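The intercept and slope of the regression equation are usually estimated by ordinary least squares. The sketch below assumes that estimation method and uses hypothetical data; the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates (alpha, beta) for Y = alpha + beta * X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    beta = sxy / sxx                 # slope: change in Y per unit change in X
    alpha = mean_y - beta * mean_x   # intercept: predicted Y when X = 0
    return alpha, beta

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

alpha, beta = fit_line(x, y)
predicted_at_6 = alpha + beta * 6
print(alpha, beta, predicted_at_6)
```

Once the two estimates are in hand, a prediction for any X is a single evaluation of the equation, as in the last line of the example.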
The Regression Equation • Parameter Estimate Choices • β is indicative of the strength and direction of the relationship between the independent and dependent variable. • α (Y intercept) is a fixed point that is considered a constant (the expected value of Y when X = 0). • Standardized Regression Coefficient (β) • Estimated coefficient of the strength of relationship between the independent and dependent variables. • Expressed on a standardized scale where higher absolute values indicate stronger relationships (in simple regression the range is from -1 to 1).
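In a simple regression, the standardized coefficient can be obtained by rescaling the raw slope by the ratio of the standard deviations of X and Y, and it then coincides with the correlation coefficient. The sketch below illustrates this under those assumptions, with hypothetical data and illustrative function names.

```python
import math
import statistics

def raw_slope(xs, ys):
    """Least-squares slope of ys on xs, in the original units."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def standardized_slope(xs, ys):
    """Slope expressed in standard-deviation units of X and Y."""
    return raw_slope(xs, ys) * statistics.stdev(xs) / statistics.stdev(ys)

def pearson_r(xs, ys):
    """Pearson correlation, used here only for comparison."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b = raw_slope(x, y)
std_beta = standardized_slope(x, y)
r = pearson_r(x, y)
print(b, std_beta, r)
```

The raw slope depends on the measurement units of X and Y, whereas the standardized value does not, which is what makes it comparable across variables.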