
Research Design and Analysis








  1. Research Design and Analysis Jan B. Engelmann, Ph.D. Department of Psychiatry and Behavioral Sciences Emory University School of Medicine Contact: jan.engelmann@emory.edu

  2. Brief review: central tendency and spread

  3. Brief review

  4. Inferential statistics • Descriptive statistics are useful for describing data. • But they will not tell us whether the difference between two means is statistically significant or merely due to chance. • A difference may simply be due to sampling error and may not be reproducible. • To test whether we are observing a true effect of our treatment on behavior, we need inferential statistics, such as: • t-tests • ANOVA • Correlation/regression.

  5. Correlation

  6. Associations are important • We all try to make connections between events in the world. • This is an important task our brain accomplishes by way of forming associations. • These associations are the basis of many behaviors that ensure our survival. • E.g. coming close to someone with illness increases our chances of getting ill. • If you touch a stove you will burn yourself. • Prediction of your partner’s mood state when certain events occur.

  7. Correlation • Correlation refers to whether the relationship between 2 variables is positive, negative or zero. • The degree of association between two variables can be statistically measured. • The Pearson product-moment correlation coefficient. • The Pearson r is a statistic that quantifies the extent to which two variables X and Y are associated, and whether their association is positive, negative or zero. • It assesses the degree to which X and Y vary together (covary). • Linear relatedness between X and Y.

  8. Naming Conventions • Correlations are based on pairs of variables, X and Y. • The X variable is the independent (predictor) variable. • The Y variable is the dependent (criterion) variable. • Changes in X predict changes in Y. • A word of caution: correlation does not imply causation. • More on this later.

  9. An example: extraversion • Extraverts are individuals who show sociable characteristics: • They are outgoing in their demeanor, are drawn to new people, and seek new experiences. • Introverts are on the other end of this continuum: • They are generally shy and less sociable, especially in novel settings. • We are interested in developing a new scale of extraversion. • In it, we ask our participants how they would react in various social situations. • How could we test the validity of our measure?

  10. An example: extraversion • One week after completing the questionnaire, participants take part in a staged social interaction: a party. • We measure how many people participants choose to interact with. • We expect a certain relationship between our personality scores and observed social behavior: • We expect a positive correlation. • A positive correlation would indicate that our inventory is predictive of real-world interactions.

  11. Types of relationships • Positive relationship = as the value of X increases (decreases), the corresponding value of Y increases (decreases). • X and Y values change in the same direction. • Negative relationship = as the value of X increases (decreases), the corresponding value of Y decreases (increases). • X and Y values change in opposite directions.

  12. Types of relationships • Positive correlation (+)

  13. Types of relationships • Negative correlation (-)

  14. Types of relationships • Zero correlation (0) • Strength of correlation: r varies between -1 and +1

  15. Calculating r • The sums of squares are integral to the vast majority of inferential statistical tests. • The Pearson r is no exception: r = Σ(X − X̄)(Y − Ȳ) / √(SSX · SSY), where SSX = Σ(X − X̄)² is the sum of squares for X and SSY = Σ(Y − Ȳ)² is the sum of squares for Y.

  16. The cool thing is: • The numerator is the sum of cross-products (the covariation of X and Y): another SS-like quantity indicating how X and Y vary together. • The denominator combines the SS for X and for Y: how X and Y vary separately. • You can do this now! • You learned all the steps yesterday. • Here is the equation in simplified (computational) form: r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]} • Let's try it: data set Extroversion at webpage (a Python sketch follows below).
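To make the computation concrete, here is a minimal Python sketch that computes r from the sums of squares and the sum of cross-products. The extraversion scores (X) and party-interaction counts (Y) below are made-up illustrative values, not the Extroversion data set from the course webpage.

```python
import math

# Hypothetical data: extraversion scores (X) and number of people
# each participant interacted with at the party (Y). Illustrative only.
X = [12, 15, 9, 20, 17, 11, 14, 18]
Y = [3, 5, 2, 8, 6, 3, 4, 7]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sum of cross-products (how X and Y vary together) and sums of squares
# (how X and Y vary separately).
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
ss_x = sum((x - mean_x) ** 2 for x in X)
ss_y = sum((y - mean_y) ** 2 for y in Y)

r = sp_xy / math.sqrt(ss_x * ss_y)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")
```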

  17. Magnitude of r • |r| ≥ 0.8 – Very strong • 0.6 – 0.79 – Strong • 0.4 – 0.59 – Moderate • 0.2 – 0.39 – Weak • < 0.2 – Very weak • Coefficient of determination = r² • Indicates the proportion of variance in one variable that can be accounted for by the other variable. • E.g. r = 0.6 gives r² = 0.36: 36% of the variance in Y is accounted for by X.

  18. A note on causality • Correlation does not imply causation! • Example: a study on contraceptive use in Taiwan. • Researchers found that the single best predictor of contraceptive use was the number of electrical appliances owned. • The third-variable problem: socioeconomic status and education level lead to more income that can be spent on toasters and such. • So a correlation between X and Y leaves three explanations open: • X can cause a change in Y • Y can cause a change in X • A third variable can cause a change in both X and Y. • In other words, correlational research cannot establish the direction of the effect, nor rule out third variables.

  19. Hypothesis testing

  20. Hypothesis testing • Hypothesis testing is the process by which decisions are made concerning the values of parameters. • The decision to be made is whether our results are due to small chance fluctuations alone, or whether differences are in fact present. • E.g. does the stress of divorce lead to behavioral problems in children? • A sample of 5 children from divorced households was drawn and tested on the Achenbach Youth Self-Report scale. • A mean score of 56 was obtained from our sample. • We know the population mean is 50. • Is this difference large enough to conclude that stress from divorce produces behavioral problems? • What is the likelihood of obtaining a sample mean of 56 if our sample was drawn from the population of normal children?

  21. The sampling distribution of a statistic • Inferential statistics rely heavily on sampling distributions. • A sampling distribution is the distribution of a statistic over repeated sampling from a population. • It tells us what values to expect for a particular statistic under predefined conditions. • Because we often work with means, we focus here on the sampling distribution of the mean. • The sampling distribution of the mean is the distribution of sample means over repeated sampling from one population.

  22. What is the sampling distribution of the mean? • Typically, sampling distributions are derived mathematically. • Conceptually, it is the distribution of means of an infinite number of random samples (with a specified N) drawn under specified conditions. • Conditions: population mean = 50, standard deviation = 10. • Samples are repeatedly drawn from the same population and a distribution is plotted. • For each sample, the mean is calculated and recorded. • After doing this a large number of times, we will be able to draw the sampling distribution of the mean. • We will find that this distribution is Gaussian (bell-shaped) and normally distributed. • We will also find that most observations cluster around the population mean.

  23. Illustration of the sampling distribution • Repeated samples are drawn from a population with μ = 50 and σ = 10; the mean of each sample is plotted to build up the sampling distribution. (Diagram in the original slides.)
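As a rough illustration of the repeated-sampling idea (not part of the original slides), the sketch below simulates the sampling distribution of the mean for the conditions named above, μ = 50 and σ = 10; the sample size of 5 is an assumption carried over from the divorce-stress example.

```python
import random
import statistics

random.seed(1)

MU, SIGMA = 50, 10      # population parameters from the slide
N = 5                   # assumed sample size (divorce-stress example)
N_SAMPLES = 10_000      # "a large number" of repeated samples

# Repeatedly draw samples from the same normal population and record each mean.
sample_means = [
    statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(N_SAMPLES)
]

# The means cluster around the population mean, and their spread
# (the standard error) is roughly sigma / sqrt(N) = 4.47.
print("mean of sample means:", round(statistics.mean(sample_means), 2))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 2))
```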

  24. Sampling distribution of the mean • From the sampling distribution of the mean, we can tell that certain values are very likely, while others are highly unlikely. • E.g. most values cluster around the population mean. • Such values are likely to be obtained in any given sample. • Others are at the tails of the distribution. • These are very unlikely sample means in any given sample. • So, the sampling distribution tells us what values to expect in our sample IF we in fact obtained a sample from this population. • We can even assign an exact probability to obtaining a certain sample mean if we sampled from a given population.

  25. Hypothesis testing revisited • Now that we know about sampling distributions, we can do some hypothesis testing. • How can we test the likelihood of obtaining a sample mean of 56, if this sample was drawn from a population with a mean of 50? • 56 surely is larger than 50, but is this difference significant? • If we did in fact sample from a population with a mean of 50, the probability of obtaining a sample mean of 56 or larger is 0.09. • About 9% of the time we would get such a sample mean. • Likely the same population. • How about a sample mean of 62? • The probability of that is 0.0037. • Likely a different population. • (A sketch reproducing these probabilities follows below.)
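The two probabilities quoted above can be reproduced with a short sketch, assuming the population standard deviation of 10 and the sample size of 5 from the earlier example, and taking the one-tailed probability of a mean at least as large as the one observed; SciPy is assumed to be available, even though the course itself works in Excel and SPSS.

```python
from math import sqrt
from scipy.stats import norm  # assumes SciPy is installed

MU, SIGMA, N = 50, 10, 5          # null-hypothesis population and sample size
se = SIGMA / sqrt(N)              # standard error of the mean

for sample_mean in (56, 62):
    z = (sample_mean - MU) / se
    p = norm.sf(z)                # one-tailed P(mean >= observed) under Ho
    print(f"mean = {sample_mean}: z = {z:.2f}, p = {p:.4f}")
```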

  26. Hypothesis testing steps • 1. Set up a research hypothesis. • Children under the stress of divorce are more likely to exhibit behavior problems than normal children. • The sample was not drawn from a population of normal children. • 2. Set up a null hypothesis (Ho). • Children under the stress of divorce are equally likely to exhibit behavioral problems as normal children. • The sample was drawn from a population of “normal” children, whose parameters we know. • 3. Obtain a random sample of children under stress.

  27. Hypothesis testing steps • 4. Obtain the sampling distribution of the mean assuming that the Ho is true. • 5. Given the sampling distribution, calculate the probability of obtaining a sample mean at least as large as the observed value. • 6. On the basis of this probability, reject or accept the Ho. • Accept Ho = sample was drawn from normal population. • Reject Ho = sample was drawn from particularly stressed population.

  28. Why test the null hypothesis • 1. Fisher: “We can never prove something to be true, but we can prove something to be false.” • Observing 4000 cows with one head does not prove the statement “Every cow has one head” right. • Finding one cow with two heads, however, proves it wrong. • 2. Practicality: In experiments, we typically want to show that some value (treatment) is greater or smaller than another (control). • Research hypotheses are typically not specific enough. • E.g. the population mean of children stressed by divorce could be 62, 63, or greater or smaller than that. We do not know. • Having a specific null hypothesis remedies this problem and allows us to obtain a sampling distribution.

  29. Sampling distributions of test statistics • Test statistics are the results of statistical tests, such as: • t tests (t), ANOVAs (F), correlations (r), etc. • These tests are specific statistical procedures used to infer the statistical significance of results. • They have their own sampling distributions, which are obtained and used in the same way as the sampling distribution of the mean. • We will never actually construct these by repeated sampling – they are derived mathematically.

  30. Decision making (type I and II errors) • The decision we are making, no matter the statistic, is whether to accept or reject Ho. • Once we know the conditional probability of obtaining our sample under the null, we can make such a decision. • The decision to reject depends on the significance level. • If this probability is < 0.05 we reject Ho. • α-level = 0.05. • This means that, when Ho is in fact true, we erroneously reject it 5% of the time we conduct this experiment (Type I error). • This standard has been adopted as sufficiently unlikely by behavioral scientists. • But we can also erroneously fail to reject Ho (Type II error).

  31. The t-test

  32. Good things come from beer • Student's t test. • The t test was developed by William Gosset ("Student") at the Guinness brewery as a method to improve beer quality. • There are 3 applications of t tests: • 1. One-sample t test: comparison of a sample mean to a population mean. • 2. Independent-samples t test: comparison of sample means from control and experimental groups. • 3. Paired-samples t test: employed in repeated-measures designs to test the effect of a treatment.

  33. Assumptions underlying the t test • 1. The population the sample data are drawn from is normally distributed. • 2. Data are randomly sampled from the population, so that we can generalize back to the population. • 3. Data need to be on an interval/ratio scale so that we can calculate the mean. • 4. Samples need to have equal variances (for the independent-samples test).
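If you want to check the normality and equal-variance assumptions in software, one common approach (not something prescribed in the slides) is the Shapiro-Wilk test for normality and Levene's test for equal variances; the two groups below are made-up scores.

```python
from scipy.stats import shapiro, levene  # assumes SciPy is installed

# Made-up scores for a control and an experimental group.
control = [48, 52, 50, 47, 55, 51, 49, 53]
treatment = [56, 60, 54, 58, 61, 57, 55, 59]

# Shapiro-Wilk: a small p-value suggests a departure from normality.
print("normality (control):  ", shapiro(control))
print("normality (treatment):", shapiro(treatment))

# Levene: a small p-value suggests the group variances are unequal.
print("equal variances:      ", levene(control, treatment))
```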

  34. What the t test does • The t test assesses whether a measurable difference exists between means. • Absence of a difference is indicated by a t statistic of 0. • Both positive and negative t values are possible, indicating which mean is larger. • How great is the difference between sample means relative to the standard error? • In its general form: t = (difference between means) / (standard error).

  35. T test • The t test basically contrasts the difference between 2 means to the standard error. • The standard error is a measure of our sampling error. • Based on the type of comparison we are making, the error term changes. • So, the t test provides a numerical value of the extent (ratio) to which the effect of our manipulation exceeds the amount of error we can expect from sampling.

  36. The error term revisited • The error term represents a combined index of the average deviation from the mean behavior within each group. • Referred to as within-group variability. • It reflects 2 sources of error: • 1. Random error (sampling error/individual differences). • 2. Experimental error, which is due to failures on the experimenter's part.

  37. If the difference between the means exceeds our estimated sampling (random) error, we conclude that the difference is due to our manipulation, not chance. • If the difference does not exceed the estimated random error, we conclude that our findings are due to chance.

  38. One-sample t test • One-sample t tests are used to compare an observed mean with a hypothesized value assumed to represent the population. • Typically employed to test whether observed scores deviate from an already established pattern. • t = (X̄ − μ) / (s / √N), where s is the sample standard deviation. Degrees of freedom: df = N − 1.
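A minimal sketch of the one-sample t test, using the population mean of 50 from the divorce-stress example; the five Achenbach scores are invented placeholders, not the actual sample.

```python
from scipy.stats import ttest_1samp  # assumes SciPy is installed

# Invented Achenbach Youth Self-Report scores for 5 children (illustrative only).
scores = [54, 58, 61, 52, 55]

# Compare the sample mean against the known population mean of 50.
result = ttest_1samp(scores, popmean=50)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f} (df = {len(scores) - 1})")
```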

  39. Independent-samples t test • Most common, because it allows us to verify whether a difference between two group means exists. • Did our experimental manipulation have an effect? • This type of test is employed to analyze between-subjects designs. • Did the independent variable create an observable change in the behavior measured by the dependent variable? • t = (X̄1 − X̄2) / sX̄1−X̄2, where the denominator is the standard error of the difference between the means. Degrees of freedom: df = (N1 + N2) − 2.
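A matching sketch of the independent-samples t test, with equal variances assumed as in the assumptions slide; both groups are made-up dependent-variable scores.

```python
from scipy.stats import ttest_ind  # assumes SciPy is installed

# Made-up dependent-variable scores for a between-subjects design.
control = [12, 15, 11, 14, 13, 16, 12, 15]
experimental = [17, 19, 16, 20, 18, 17, 21, 19]

# equal_var=True matches the equal-variance assumption listed earlier.
result = ttest_ind(control, experimental, equal_var=True)
df = len(control) + len(experimental) - 2
print(f"t = {result.statistic:.2f}, df = {df}, p = {result.pvalue:.4f}")
```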

  40. Conceptual model of t test • Between groups variation = the difference of interest • E.g. the difference between 2 means. • Within groups variation = summary of error sources. • This almost always involves the calculation of SS.

  41. Let us do some statistics • Data set handout. • Problem 1: • 1. Use Excel to conduct t test. • 2. Use SPSS to conduct t test. • 3. Plot results in Excel and SPSS.

  42. Our experiment

  43. What we tested • The effect of level of optimism on the efficacy of a simple mood induction. • What did we do? • How do we test this? • Measures we obtained: • 1. Pre- and post-induction mood. • 2. LOT-R • What is that? • What we want to test: • 1. Was the mood induction effective? • 2. Did personality have an effect on efficacy? • (A paired-samples sketch follows below.)
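Question 1 maps onto the paired-samples t test from slide 32: compare each participant's pre- and post-induction mood. A minimal sketch with invented mood ratings (the real analysis would be run on the class data in Excel or SPSS):

```python
from scipy.stats import ttest_rel  # assumes SciPy is installed

# Invented pre- and post-induction mood ratings for the same 8 participants.
pre_mood = [4, 5, 3, 6, 4, 5, 4, 3]
post_mood = [6, 7, 5, 7, 6, 6, 5, 5]

# Paired-samples t test: was the mood induction effective?
result = ttest_rel(pre_mood, post_mood)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f} (df = {len(pre_mood) - 1})")
```

Question 2 could then be addressed by relating LOT-R optimism scores to each participant's pre-to-post mood change, for example with the Pearson r from earlier in the deck.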
