Correlation Research & Inferential Statistics

Correlation Research & Inferential Statistics by Atheer L. Khamoo & Wissam A. Askar

Table of Contents • Definition • Purpose • Independent and dependent variables • Scatter plot • Correlation coefficient • Range of correlation coefficient • Types of correlational study

The Goal of Correlational Research The goal of correlational research is to find out whether one or more variables can predict other variables. Correlational research allows us to find out which variables may be related. However, the fact that two things are related or correlated does not mean there is a causal relationship. It is important to distinguish between correlation and causation: two things can be correlated without there being a causal relationship.

Independent and Dependent Variables • Independent variable: a variable that can be controlled or manipulated. • Dependent variable: a variable that cannot be controlled or manipulated directly. Its values are predicted from the independent variable.

Example • The independent variable in this example is the number of hours studied. • The grade the student receives is the dependent variable. • The grade the student receives depends upon the number of hours he or she studies. • Are these two variables related?

Definition of 'Pearson Coefficient' • A type of correlation coefficient that represents the relationship between two variables that are measured on the same interval or ratio scale.

Pearson Correlation Coefficient • A number between –1.0 and +1.0 • Describes the relationship between 2 variables: • direction (+ positive or – negative) • strength (from 0 to ±1) • r = +1.0: a perfect positive linear relationship • r = –1.0: a perfect negative linear relationship • r = 0: there is no linear relationship

Scatter Plot • The independent and dependent variables can be plotted on a graph called a scatter plot. • By convention, the independent variable is plotted on the horizontal x-axis. • The dependent variable is plotted on the vertical y-axis.

Correlation Coefficient • The correlation coefficient computed from the sample data measures the strength and direction of a relationship between two variables. • The correlation coefficient ranges from –1 to +1 and is denoted by r.

Positive and Negative Correlations • A positive relationship exists when both variables increase or decrease at the same time (weight and height). • A negative relationship exists when one variable increases and the other variable decreases, or vice versa (strength and age).

Plotting correlations • Each data point on the scatter plot indicates the score on both variables. • GPA / Study hours per week • 3.0 / 18 • 3.5 / 21 • 2.4 / 12 • 1.8 / 10 • 2.7 / 11 • 5 data points, one for each student
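
As an illustration, the five GPA and study-hour pairs above can be fed into the usual Pearson formula (the sum of the products of deviations, divided by the square root of the product of the summed squared deviations). This is only a sketch; the function name `pearson_r` is our own and does not come from the slides.

```python
from math import sqrt

# Data from the slide: one (GPA, study hours per week) pair per student.
gpa = [3.0, 3.5, 2.4, 1.8, 2.7]
hours = [18, 21, 12, 10, 11]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]  # deviations from the mean of x
    dy = [yi - mean_y for yi in y]  # deviations from the mean of y
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Independent variable (hours) first, dependent variable (GPA) second.
r = pearson_r(hours, gpa)
print(f"r = {r:.2f}")  # roughly 0.90: a strong positive correlation
```

With only five students the estimate is very unstable, so the value should be read as a description of this small sample, not as evidence about students in general.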

Range of correlation coefficient • In the case of an exact positive linear relationship, the value of r is +1. • In the case of a strong positive linear relationship, the value of r will be close to +1.

Range of correlation coefficient • In the case of an exact negative linear relationship, the value of r is –1. • In the case of a strong negative linear relationship, the value of r will be close to –1.

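Range of correlation coefficient • When there is no linear relationship, the value of r will be close to 0. This can happen even when the variables are strongly related in a nonlinear way (for example, a symmetric U-shaped curve), because r measures only the linear part of a relationship.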
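
A small sketch of how r behaves for nonlinear relationships, using made-up x values and our own helper `pearson_r` (neither is from the slides). A symmetric U-shaped curve gives r = 0 even though y is completely determined by x, while a steadily increasing curve can still give a high r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

x = [-3, -2, -1, 0, 1, 2, 3]      # hypothetical values, symmetric around 0

y_u_shape = [xi ** 2 for xi in x]  # U-shaped: falls, then rises
y_cubic = [xi ** 3 for xi in x]    # curved, but always increasing

r_u_shape = pearson_r(x, y_u_shape)
r_cubic = pearson_r(x, y_cubic)

print(f"U-shaped curve: r = {r_u_shape:.2f}")
print(f"Cubic curve:    r = {r_cubic:.2f}")
```

This is why a scatter plot should be inspected before interpreting r: a value near 0 rules out a linear trend, not a relationship of every kind.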

Types of Correlational Studies: 1. Naturalistic Observation 2. The Survey Method 3. Archival Research

1. Naturalistic Observation Involves observing and recording the variables of interest in the natural environment without interference or manipulation by the experimenter.

Advantages of naturalistic observation • Gives the experimenter the opportunity to view the variable of interest in a natural setting. • Can offer ideas for further research. • May be the only option if lab experimentation is not possible.

Disadvantages of Naturalistic Observation • Can be time consuming and expensive. • Does not allow for scientific control of variables. • Experimenters cannot control extraneous variables. • Subjects may be aware of the observer and may act differently as a result.

2. The Survey Method • Surveys and questionnaires are among the most common methods used in psychological research. In this method, a random sample of participants completes a survey, test, or questionnaire that relates to the variables of interest. Random sampling is a vital part of ensuring the generalizability of the survey results.

Advantages of the survey method • It’s fast, cheap, and easy. Researchers can collect large amounts of data in a relatively short amount of time. • More flexible than some other methods.

Disadvantages of the Survey Method: • Can be affected by an unrepresentative sample or poor survey questions. • Participants can affect the outcome. Some participants try to please the researcher, lie to make themselves look better, or have mistaken memories.

3. Archival Research • Archival research is performed by analyzing studies conducted by other researchers or by looking at historical patient records. For example, researchers recently analyzed the records of soldiers who served in the Civil War to learn more about PTSD ("The Irritable Heart").

Advantages of Archival Research: • The experimenter cannot introduce changes in participant behavior. • Enormous amounts of data provide a better view of trends, relationships, and outcomes. • Often less expensive than other study methods. Researchers can often access data through free archives or records databases.

Disadvantages of Archival Research: • The researchers have no control over how the data were collected. • Important data may be missing from the records. • Previous research may be unreliable.

Inferential Statistics • One use of statistics is to make inferences or judgments about a larger population based on the data collected from a small sample drawn from that population. • Inferential statistics are used to draw conclusions about a population by examining the sample. • Statistical inference is a procedure by means of which you estimate parameters (characteristics of populations) from statistics (characteristics of samples).

Estimation • Estimations are based on the laws of probability and are best estimates rather than absolute facts. In making any inferences, a certain degree of error is involved. Inferential statistics can be used to test hypotheses about populations on the basis of observations of a sample drawn from the population.

Rationale of sampling • The inductive method involves making observations and then drawing conclusions from these observations. • Samples must be representative if you are to be able to generalize with reasonable confidence from the sample to the population. • An unrepresentative sample is termed a biased sample.

Types of sampling 1. Probability sampling: sample selection in which the elements are drawn by chance procedures. 2. Nonprobability sampling: methods of selection in which the elements are not chosen by chance procedures.

The types of probability sampling 1. Simple random sampling 2. Stratified sampling 3. Cluster sampling 4. Systematic sampling

1. Simple random sampling It comprises the following steps: 1. Define the population. 2. List all members of the population. 3. Select the sample by employing a procedure where sheer chance determines which members on the list are drawn for the sample.
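
The three steps can be sketched in a few lines. The population of 200 student IDs, the sample size of 20, and the fixed seed are all hypothetical choices made only so that the example is small and repeatable.

```python
import random

# Steps 1 and 2: define the population and list all of its members.
# (Hypothetical list of 200 student IDs.)
population = [f"student_{i:03d}" for i in range(1, 201)]

# Step 3: let chance alone decide which members are drawn.
rng = random.Random(42)  # fixed seed only so the sketch is repeatable
sample = rng.sample(population, k=20)  # draws without replacement

print(sample[:5])
```

Because `random.Random.sample` draws without replacement, every member has the same chance of selection and no member can appear twice.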

2. Stratified sampling Used when the population consists of a number of subgroups, or strata, that may differ in the characteristics being studied. A sample is drawn from every stratum so that each subgroup is represented.
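
A minimal sketch of stratified sampling, in which a simple random sample is drawn from every stratum. It assumes proportional allocation (each stratum contributes the same fraction of its members), which is one common choice and is not something the slides specify. The population and the name `stratified_sample` are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Hypothetical population of (person_id, stratum) pairs with unequal strata.
population = [(i, "freshman") for i in range(600)] + \
             [(i, "senior") for i in range(600, 1000)]

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # proportional allocation (assumption)
        sample.extend(rng.sample(members, k))
    return sample

sample = stratified_sample(population, lambda unit: unit[1], 0.10, rng)
counts = {name: sum(1 for _, s in sample if s == name)
          for name in ("freshman", "senior")}
print(counts)
```

Every stratum appears in the sample in the same proportion as in the population, which a simple random sample would only achieve on average.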

3. Cluster sampling • With cluster sampling, the researcher divides the population into separate groups, called clusters. Then, a simple random sample of clusters is selected from the population. • The main difference between stratified and cluster sampling is that in stratified sampling all the strata need to be sampled. In cluster sampling, one first selects a number of clusters at random and then either samples within each selected cluster or conducts a census of each selected cluster. Usually, not all clusters are included.
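
The two variants described above (a census of each selected cluster, or a further random sample within each selected cluster) can be sketched as follows. The ten schools of 30 pupils each, and the numbers drawn, are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Hypothetical population: 10 schools (clusters) of 30 pupils each.
clusters = {
    f"school_{c}": [f"school_{c}_pupil_{p}" for p in range(30)]
    for c in range(10)
}

# First stage: a simple random sample of clusters, not of individuals.
chosen = rng.sample(sorted(clusters), k=3)

# Variant A: conduct a census of every chosen cluster.
census = [pupil for c in chosen for pupil in clusters[c]]

# Variant B: draw a random sub-sample inside every chosen cluster.
subsample = [pupil for c in chosen for pupil in rng.sample(clusters[c], k=10)]

print(chosen, len(census), len(subsample))
```

Seven of the ten schools contribute nobody to the sample, which is the contrast with stratified sampling, where every stratum is sampled.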

4. Systematic sampling A common way of selecting members for a sample using systematic sampling is to divide the total number of units in the population by the desired number of units in the sample. The result is the sampling interval k, and every k-th unit is then selected. For example, if you wanted to select a group of 1,000 people from a population of 50,000 using systematic sampling, you would select every 50th person, since 50,000/1,000 = 50.
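
The 50,000 and 1,000 example can be sketched directly. Starting at a randomly chosen position within the first interval is a common practice that we assume here; the slides do not say where to begin counting.

```python
import random

population = list(range(1, 50_001))  # 50,000 numbered people
sample_size = 1_000

k = len(population) // sample_size   # sampling interval: 50,000 / 1,000 = 50

rng = random.Random(7)               # fixed seed only so the sketch is repeatable
start = rng.randrange(k)             # random start in the first interval (assumption)
sample = population[start::k]        # then every 50th person

print(k, len(sample), sample[:3])
```

If the list has a periodic pattern whose period matches the interval, systematic sampling can give a biased sample, so the ordering of the list deserves a check.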

Nonprobability sampling 1. Convenience sampling: units are selected based on easy access or availability, that is, the available cases are used for the study. It is regarded as the weakest of all sampling procedures. 2. Purposive sampling (also referred to as judgment sampling): elements judged to be typical, or representative, are chosen from the population. 3. Quota sampling: involves selecting typical cases from diverse strata of a population. The quotas are based on known characteristics of the population to which you wish to generalize.

Random Assignment • When the primary goal of a study is to compare the outcomes of two treatments on the same dependent variable, random assignment is used. Here a chance procedure, such as a table of random numbers, is used to divide the available subjects into groups.
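
A minimal sketch of random assignment. Instead of a table of random numbers it uses a pseudo-random shuffle, which plays the same role of letting chance decide group membership. The 20 subjects and the two group names are hypothetical.

```python
import random

# Hypothetical pool of available subjects.
subjects = [f"subject_{i}" for i in range(1, 21)]

rng = random.Random(3)   # fixed seed only so the sketch is repeatable
shuffled = subjects[:]   # copy, so the original list is left untouched
rng.shuffle(shuffled)    # chance decides the order

# Split the shuffled list in half: one half per treatment.
half = len(shuffled) // 2
treatment_a = shuffled[:half]
treatment_b = shuffled[half:]

print(treatment_a)
print(treatment_b)
```

Random assignment divides subjects who are already in the study into groups. It is a different step from random sampling, which decides who enters the study in the first place.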

The size of the sample (fundamentals) How large should a sample be? Other things being equal, a larger sample is more likely to be representative of the population than a smaller sample. However, the most important characteristic of a sample is its representativeness, not its size. A random sample of 200 is better than a random sample of 100, but a random sample of 100 is better than a biased sample of 2,500,000.

The concept of sampling error The researcher has observed only a sample and not the entire population. Sampling error is “the difference between a population parameter and a sample statistic”.
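
A small simulation can make sampling error concrete. It builds a hypothetical population of 10,000 test scores, repeatedly draws random samples, and records how far each sample mean falls from the population mean. All numbers (mean 70, standard deviation 10, sample sizes 25 and 400) are assumptions chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(11)  # fixed seed only so the sketch is repeatable

# Hypothetical population of 10,000 test scores.
population = [rng.gauss(70, 10) for _ in range(10_000)]
mu = mean(population)    # the population parameter

def sampling_error(n):
    """Sample statistic minus population parameter for one random sample."""
    sample = rng.sample(population, n)
    return mean(sample) - mu

# Typical size of the error for small and for large random samples.
errors_small = [abs(sampling_error(25)) for _ in range(200)]
errors_large = [abs(sampling_error(400)) for _ in range(200)]

print(f"average |error|, n = 25:  {mean(errors_small):.2f}")
print(f"average |error|, n = 400: {mean(errors_large):.2f}")
```

The error is rarely exactly zero, but for random samples it tends to shrink as the sample size grows.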

Hypothesis Testing A statistical hypothesis is an assumption about a population parameter. This assumption may or may not be true. Hypothesis testing refers to the formal procedures used by statisticians to accept or reject statistical hypotheses.

There are two types of statistical hypotheses. • Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance. The null always says there is no relationship or difference. H0 (null) is that mean1 = mean2, meaning the mean scores are equal OR the difference between the mean scores is zero.

Alternative hypothesis (H1) • It states that there is a difference between the two groups, that is, a statistically significant difference in the population. • Types of alternative hypothesis: • Non-directional • Directional

Two types of errors can result from a hypothesis test. • Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha, and is often denoted by α. • Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta, and is often denoted by β. The probability of not committing a Type II error is called the power of the test. • Type I: reject a true null; Type II: accept (fail to reject) a false null.
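
The meaning of α can be checked by simulation. In the sketch below the null hypothesis is true by construction (both groups come from the same population), so every rejection is a Type I error, and the rejection rate should land near α. It assumes a two-sided z-test with a known standard deviation of 1, along with α = 0.05, n = 30 per group, and 2,000 repetitions; none of these choices come from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff, about 1.96
n = 30         # observations per group
trials = 2000  # number of simulated studies

def rejects_null():
    """Simulate one study in which H0 (mean1 = mean2) is actually true."""
    group_1 = [rng.gauss(0, 1) for _ in range(n)]
    group_2 = [rng.gauss(0, 1) for _ in range(n)]
    # z statistic for a difference in means with known sigma = 1 (assumption).
    z = (mean(group_1) - mean(group_2)) / sqrt(2 / n)
    return abs(z) > z_crit

type_i_rate = sum(rejects_null() for _ in range(trials)) / trials
print(f"share of true nulls rejected: {type_i_rate:.3f}")
```

Lowering α makes Type I errors rarer but, for a fixed sample size, makes Type II errors more likely, which is the trade-off behind choosing a significance level.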

State level of significance • Level of significance = risk of rejecting a TRUE null hypothesis • α = probability you will reject a true null hypothesis (e.g., a 1% chance) • 1 – α = probability you will not reject a true null hypothesis (e.g., 99%)

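Degrees of Freedom • The number of degrees of freedom (df) is the number of observations free to vary around a constant parameter. To illustrate the general concept: when the mean of n observations is fixed, the deviations from that mean must sum to zero, so only n – 1 of them are free to vary and the last one is determined by the others (df = n – 1).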
