Experimental Design

Experimental Design - The basics Richard Preziosi

How to formulate hypotheses • Where do you start? • What is a hypothesis? • Stating a hypothesis • Generating predictions • Statistical hypotheses (different!) • Only after completing this process will you be able to decide what data to collect

Hypotheses: Where do you start? • Start by stating your research question • E.g. ‘Why are male and female humans different sizes?’ • Your question may easily produce more than one hypothesis, that’s fine.

Hypothesis: • A hypothesis is a clear statement articulating a plausible candidate explanation for observations • It should be constructed in such a way as to allow gathering of data that can be used to either refute or support the candidate explanation

Stating a Hypothesis: • Phrase your hypothesis as a possible answer to your research question. • E.g. ‘Male and female size differ because males grow faster than females’

Generating predictions • These are the testable statements that follow logically from your hypothesis • E.g. ‘males have a faster growth rate than females’

Statistical hypotheses • Predictions should lead you to testable statistical hypotheses • Note that the hypothesis of interest in statistics is the one where nothing is different (the null hypothesis) • A clearly stated null hypothesis will generally lead you to the correct statistical test • E.g. ‘There is no difference in the growth rate of males and females’
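As a minimal sketch of how the null hypothesis above could be tested, the following uses a permutation test on the difference in mean growth rate. The growth-rate values, the function name `permutation_test`, and the choice of a permutation test are all assumptions made for illustration; the slides do not prescribe a particular test.

```python
import random

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and the proportion of label shuffles
    whose absolute difference is at least as large as the observed one.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under the null hypothesis the labels are exchangeable
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical growth rates (cm per year); invented for illustration only.
male_growth = [5.1, 5.6, 4.9, 5.8, 5.4, 5.7, 5.2, 5.5]
female_growth = [4.6, 4.9, 4.4, 5.0, 4.7, 4.5, 4.8, 4.3]

observed_diff, p_value = permutation_test(male_growth, female_growth)
print(f"difference = {observed_diff:.2f}, p = {p_value:.4f}")
```

A small p-value here would count as evidence against the null hypothesis of no difference in growth rate.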

Question → Hypothesis → Predictions → Statistical (Null) hypothesis

Pitfalls of generating predictions • Weak tests • Indirect measures • Non-useful outcomes • Your tests must satisfy the devil’s advocate (e.g. reviewers or examiners)

Weak test • Consider the hypothesis: Students enjoy the course in radiation training more than the workshop in experimental design • Prediction: Students will get better grades in radiation training than in experimental design • This is a weak test (prediction) because other explanations are equally likely AND because we have used an indirect measure (grades as a measure of enjoyment)

Non-useful outcomes • These are hypotheses that may well prove interesting if true but are uninformative if false

Satisfying skeptics • Reviewers will look for logical flaws in your experiments. You do not want to finish your paper with: • ‘My results indicate that mechanism A determines apoptosis rates. Although mechanism B could also produce the same response I believe that mechanism A is the important one’ • This will earn you a review of the form • ‘This study provides no clear evidence to distinguish between mechanisms A and B. The authors need to redesign their study and start again. Recommendation, reject this manuscript’

Pilot Studies and Preliminary Data • May be observational or mini-experiments • Ensures sensible questions • Can you observe the phenomenon? • Practice and validate techniques • Minimize training effects on data • Recognize logistical constraints • Standardization across observers • Allows tuning of design and statistics • Assessment of sample sizes (power) • Test run of statistical analysis

Experimental Manipulation Vs. Natural Variation • In manipulative studies you change an aspect of the system and measure effects on traits of interest (the majority of lab studies and agricultural studies) • In correlational studies you measure associations between traits of interest, often assuming one is influencing the other (many environmental and most human studies) • Consider the hypothesis ‘Long tail streamers seen in many species of birds have evolved to make males more attractive to females’

Correlational study using Natural Variation • In the bird tail length example we could • Measure the tails of males at the beginning of the breeding season • Observe the number of matings each male has • Do statistics to determine if there is a relationship between tail length and number of matings • Results showing a relationship would support our hypothesis • Results not showing a relationship would go against our hypothesis
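The "do statistics" step could, for instance, be a Pearson correlation between tail length and number of matings. The sketch below assumes a linear relationship is of interest; the data values and the helper name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical measurements for six males; invented for illustration only.
tail_length_cm = [10, 12, 14, 16, 18, 20]
matings = [1, 2, 2, 3, 5, 5]

r = pearson_r(tail_length_cm, matings)
print(f"r = {r:.3f}")
```

A coefficient near zero would go against the hypothesis; a clearly positive one would support it, subject to the third-variable and reverse-causation caveats that apply to any correlational study.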

Manipulative study • In the bird tail length example we could have 4 groups of birds • Results showing that males with artificially long tails had more mates support our hypothesis • Results showing that males with reduced tails had fewer mates also support our hypothesis • A comparison of group 1 males with the unmanipulated males acts as a control comparison

Arguments for correlational studies • Often less work (but larger sample sizes usually needed) • Deals with real levels of biological variation (manipulations may take things outside naturally occurring limits) • Requires less handling of organisms (important if there are constraints like stress to animals or endangered species) • Manipulative studies may produce unintended effects (e.g. flight ability in the bird example or epistatic effects in knockouts) • Manipulation may not be possible • May provide a baseline study for manipulative experiments

Arguments for manipulative studies (really, against correlational studies) • Third variables • Reverse causation • These can be BIG problems if they occur

Third Variables • Third variables occur when there is an apparent link between A and B but in fact there is no direct link or mechanism. Instead both A and B depend on C, the third variable. • This means that patterns in correlational studies are just that, correlations. • Remember, correlation does not imply causation

Third Variables - an example • In the bird tail length example let’s say that we do see a correlation between tail length and number of mates • Suppose that females are actually attracted to territories, not males, but that males on better territories can grow larger tails • The third variable here is territory quality: it drives both tail length and number of mates and produces an ‘apparent’ relationship.
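The territory example can be illustrated with a small simulation. In this sketch, tail length and number of mates each depend only on territory quality plus independent noise; there is no direct link between them. The model, the parameter values, and the use of a partial correlation to remove the effect of territory are assumptions for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

def partial_r(r_ab, r_ac, r_bc):
    """Correlation between A and B after removing the linear effect of C."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

rng = random.Random(42)
n_males = 2000

# Simulated (standardized) values: territory quality drives both traits.
territory = [rng.gauss(0, 1) for _ in range(n_males)]
tail_length = [t + rng.gauss(0, 1) for t in territory]
n_mates = [t + rng.gauss(0, 1) for t in territory]

r_tail_mates = pearson_r(tail_length, n_mates)
r_partial = partial_r(
    r_tail_mates,
    pearson_r(tail_length, territory),
    pearson_r(n_mates, territory),
)
print(f"raw r = {r_tail_mates:.2f}, r controlling for territory = {r_partial:.2f}")
```

The raw correlation is clearly positive even though tail length has no effect on mates in the simulation, and it largely disappears once territory quality is accounted for. In a real study the third variable is often unmeasured, which is why this check is not always available.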

Third Variables - Two famous examples • Fisher suggested that the link between smoking and cancer was correlational not causative and that another factor, perhaps stress, led people both to smoke and develop cancer. • Fewer women postgrads marry than women in the population as a whole. This relationship is presumably due to some other correlated factor (third variable)

Reverse causation • This occurs when A is assumed to cause B but in fact B causes A; it is another risk of assuming that ‘correlation implies causation’ • In some cases this can be ruled out based on other data or common sense • In the bird example it is unlikely that the number of mates for a male has any effect on tail length measured at the start of the mating season.

Reverse causation - a famous example • There is a correlation between the number of storks nesting in chimneys and the number of children in a house (old data from Holland) • Although storks bringing babies makes a nice story the causation is likely reversed • Larger families tend to live in larger houses with more chimneys, and hence more opportunities for storks to nest.

Variation, replication and sampling • Variation among individuals • Replication and the experimental unit • Pseudoreplication

Variation among individuals • Variation among individuals is a given for most biological systems • In any experiment we are concerned with variation in the Response or Dependent Variable • Variation in the response variable can be divided into: • Variation explained by experimental factors (IV) • Variation not explained by experimental factors (AKA error variation, random variation, noise) • In most studies we are interested in reducing noise and, hopefully, increasing explained variation
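The split into explained and unexplained variation can be made concrete with sums of squares, as in a one-way analysis of variance. This is a minimal sketch with invented data; the function name `partition_variation` and the group labels are hypothetical.

```python
def partition_variation(groups):
    """Split total variation in a response into explained and error parts.

    groups maps a treatment label to the list of responses in that treatment.
    Returns (ss_total, ss_explained, ss_error) as sums of squared deviations.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_explained = 0.0  # variation among treatment means
    ss_error = 0.0      # variation among individuals within a treatment (noise)
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_explained += len(values) * (group_mean - grand_mean) ** 2
        ss_error += sum((v - group_mean) ** 2 for v in values)
    return ss_total, ss_explained, ss_error

# Hypothetical responses; invented for illustration only.
responses = {"control": [4.0, 5.0, 6.0], "treated": [7.0, 8.0, 9.0]}
ss_total, ss_explained, ss_error = partition_variation(responses)
print(ss_total, ss_explained, ss_error)
```

The explained and error components add up to the total, so anything that reduces noise raises the share of variation attributable to the experimental factors.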

Variation among individuals • Single measurements from each treatment do not allow us to distinguish between noise and effect • Make sure you have a sufficient number of individuals that experience the same manipulation • These individuals that receive the same manipulation are called replicates • ‘What is the experimental unit?’

Pseudoreplication • This occurs when there is confusion between treatments, replicates and blocks. • Consider an experiment comparing the effect of a toxicant on fish behaviour. • Let’s say the toxicant is prepared in a batch and drip fed into the treatment tanks (water is drip fed into the control tanks) • Are the replicates: • Each fish in a tank? • Each tank? • Each set of tanks on a common drip? • Each batch of toxicant? • Don’t expect a simple answer; the answer is in the biology, not in the statistics
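If, on biological grounds, the tank is judged to be the experimental unit, one common way to avoid treating fish as replicates is to reduce each tank to a single value before analysis. The sketch below assumes that judgement has been made; the tank IDs, treatments and activity scores are invented.

```python
from statistics import mean

# Hypothetical activity scores per fish, grouped by tank.
tanks = {
    "tank_1": ("control", [12.1, 11.8, 12.5]),
    "tank_2": ("control", [11.2, 11.9, 11.5]),
    "tank_3": ("toxicant", [9.8, 10.1, 9.5]),
    "tank_4": ("toxicant", [10.4, 10.0, 10.6]),
}

# Counting fish would suggest 12 replicates in total.
n_fish = sum(len(scores) for _, scores in tanks.values())

# Reducing each tank to its mean gives one value per experimental unit.
tank_means = {}
for tank_id, (treatment, scores) in tanks.items():
    tank_means.setdefault(treatment, []).append(mean(scores))

replicates_per_treatment = {t: len(m) for t, m in tank_means.items()}
print(n_fish, replicates_per_treatment)
```

Here the analysis would proceed on two values per treatment rather than six. If the shared drip or the toxicant batch were the real unit, the same reduction would have to be applied at that level instead.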

Common sources of Pseudoreplication • Shared enclosures • Common environments • Relatedness • Pseudoreplicated stimulus • Non-independence of group behaviour • Pseudoreplicated measurements over time • Species comparisons • Sometimes pseudoreplication is unavoidable

Random sampling • Proper random sampling means that each individual has an equal chance of being allocated to each treatment group • The problem with non-random treatment of samples is that any bias in assignment of individuals or systematic pattern to ‘errors’ may bias your results • True random samples almost always require the use of computers or random number tables

Random assignment and treatment • Random means not only random assignment but also random treatment • Let’s say that you are examining the effect of rhizosphere bacteria on plant growth. • Not only should each plant have an equal opportunity of being assigned to the bacterial or non-bacterial (control) group; all other aspects of the process should be random as well. • Plants should be planted in equivalent compost (possibly in random order) • Plants should be randomly allocated to growth chambers and perhaps positions in chambers
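Random allocation of this kind is easy to do with a pseudo-random number generator. The following is a minimal sketch for the plant example, assuming 12 plants and equal group sizes; the function name `assign_treatments`, the plant labels and the seeds are hypothetical.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly allocate units to treatments in equal-sized groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * group_size:(i + 1) * group_size]
        for i, treatment in enumerate(treatments)
    }

plants = [f"plant_{i:02d}" for i in range(1, 13)]
groups = assign_treatments(plants, ["bacteria", "control"], seed=1)

# A second, independent shuffle gives the order of positions in the growth chamber.
chamber_order = list(plants)
random.Random(2).shuffle(chamber_order)

print(groups)
print(chamber_order)
```

Fixing the seed makes the allocation reproducible, which helps when the assignment has to be documented or audited later.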

Haphazard sampling • Haphazard does not mean Random • A haphazard sample is based on personal assignment by the experimenter in a fashion that they believe is random • Often severely biased even if the experimenter is consciously trying to take a random sample • Consider trying to ‘randomly’ select mice from a bucket or ‘randomly’ pipetting out aliquots of a cell culture • True random samples usually involve setting up experimental units BEFORE assigning treatments • BUT this is not always possible; use common sense (or blind assignment)

Self selection • This is a real problem with survey or poll data • The subset of a population that responds to surveys is rarely a random sample and thus may bias your results • By all means use surveys to inform your research BUT be very suspicious of anything but general conclusions

Pitfalls of Random Sampling • Make sure that the randomization procedure you use does what you intend • Randomise the order of collecting data - learning effects • Random samples Vs. Representative samples - don’t let computers do your thinking for you

Sample size - how many replicates • Too few replicates can be a disaster - too many can be a crime! • Always use educated guesswork - i.e. look at similar experiments by previous workers and determine what worked. • Pay attention to differences between the studies • Formal power analysis - do if possible!!! • Requires that you have some guess of variation among replicates • Requires that you have an idea of how big of a treatment effect you can expect (or require) • Requires that you know what statistical test you will use
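The three requirements for a formal power analysis map directly onto the inputs of a sample size formula. As a sketch, the function below uses the standard normal approximation for a two-sided comparison of two group means, n per group = 2((z_(1-alpha/2) + z_power) * sd / effect)^2. The function name and the default alpha and power are assumptions; an exact calculation based on the t distribution gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided, two-group comparison.

    effect: smallest difference between group means worth detecting
    sd:     guessed standard deviation among replicates within a group
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# An effect as large as one standard deviation, then half that size.
print(n_per_group(effect=1.0, sd=1.0))
print(n_per_group(effect=0.5, sd=1.0))
```

Halving the expected effect roughly quadruples the required sample size, which is why the guess about effect size matters so much.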

Sample size - Resource Equation Model • Can be used for complex studies or when variation among individuals is unknown • Only appropriate for quantitative data • Gives conservative estimates of sample size so more appropriate for: • Large effect size (e.g. lab rather than clinical) • Testing for significant effects rather than estimating parameters • E = N - T - B • N is the total number of individuals minus 1 • T is the number of treatments minus 1 • B is the number of blocks minus 1 • E is the error df and should be between 10 and 20 • In some cases E should be larger (see Festing et al.)
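The resource equation is simple enough to check by hand, but a small helper makes it easy to try different designs. This sketch follows the definitions on the slide; the function names and the example design (20 individuals, 4 treatments, no blocking) are hypothetical.

```python
def resource_equation_df(n_individuals, n_treatments, n_blocks=1):
    """Error degrees of freedom E = N - T - B from the resource equation.

    A design without blocking is treated as a single block, so B = 0.
    """
    N = n_individuals - 1
    T = n_treatments - 1
    B = n_blocks - 1
    return N - T - B

def in_suggested_range(error_df, low=10, high=20):
    """True if the error df falls in the 10 to 20 range given on the slide."""
    return low <= error_df <= high

# 20 individuals, 4 treatments, no blocking: E = 19 - 3 - 0 = 16.
e = resource_equation_df(n_individuals=20, n_treatments=4)
print(e, in_suggested_range(e))
```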

Sample size optimization (Festing et al.)

Controls • This is the reference against which the results of an experimental manipulation can be compared • Thus your control group should be identical to your treatment group in everything except the treatment itself • Simple concept, common mistake • If the predictions and statistical hypotheses have been constructed well then the control group will be obvious • Lack of a control group makes an experiment pointless

Types of Controls • Negative control - unmanipulated • Positive control - manipulated but not treated (vehicle control, sham procedure control) • Concurrent control - run at the same time as the treatment group • Historic control - based on previous data (be certain that individuals are identical except for the treatment)

Blind Procedures • Designed to remove the perception that unconscious bias might taint results • Particularly useful when response variables are measured in a subjective way • Blind Procedure - the person measuring does not know what treatment has been applied • Double Blind - neither the subject nor the person measuring knows the treatment assigned (human studies)

When controls are not needed (or allowed) • In medical or veterinary studies controls may be an ethical issue. Historical controls can be used, but give careful consideration to criticisms • When sets of treatments are being compared (e.g. effect of two drugs on rat behaviour)

Factorial experiments • 2 group comparison (t-test) design • Treatment and control compared • 1 factor design • Control and several levels of treatment compared • 2 factor design • More than one treatment considered simultaneously • Allows estimation of both main effects AND the interaction between them
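For a 2 factor design with two levels per factor, main effects and the interaction can be read straight off the four cell means. The sketch below uses hypothetical factors Food (low, high) and Strain (A, B) with invented cell means, and defines the interaction as the difference between the food effect in strain B and the food effect in strain A; other scalings of the interaction contrast are also in use.

```python
# Hypothetical mean response in each Food x Strain cell.
cell_means = {
    ("low", "A"): 10.0,
    ("high", "A"): 14.0,
    ("low", "B"): 12.0,
    ("high", "B"): 22.0,
}

def main_effect_food(m):
    """Average response at high food minus average response at low food."""
    return (m[("high", "A")] + m[("high", "B")]) / 2 - (m[("low", "A")] + m[("low", "B")]) / 2

def main_effect_strain(m):
    """Average response of strain B minus average response of strain A."""
    return (m[("low", "B")] + m[("high", "B")]) / 2 - (m[("low", "A")] + m[("high", "A")]) / 2

def interaction(m):
    """Food effect in strain B minus food effect in strain A."""
    return (m[("high", "B")] - m[("low", "B")]) - (m[("high", "A")] - m[("low", "A")])

food_effect = main_effect_food(cell_means)
strain_effect = main_effect_strain(cell_means)
food_by_strain = interaction(cell_means)
print(food_effect, strain_effect, food_by_strain)
```

A non-zero interaction means the effect of food depends on strain, something two separate single-factor experiments could not reveal.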

Main effects and interactions • Effects marked X in each example (Food, Strain, Interaction): X - - • X X - • X X X • X X X • X X X


Completely randomized designs Vs. Blocking • Completely Randomized designs are usually simple • Completely Randomized designs assume small among individual variation • If among individual variation can be attributed to a known factor then you can BLOCK by that factor, reduce error variation and increase your signal to noise ratio (=clearer results)


Advantages of blocking • Blocking is commonly used to remove effects of • Space • Time • Individual characters that can be ranked • Continuous characters that affect among-individual variation can be used as covariates to remove effects and improve signal to noise ratio
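One common way to block by space or time is a randomized complete block design, in which every treatment appears once in every block and the order is randomized separately within each block. The sketch below assumes that design; the block names (shelves), treatment names and seed are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Return a treatment order for each block, shuffled independently per block."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomize positions within this block only
        layout[block] = order
    return layout

shelves = ["shelf_1", "shelf_2", "shelf_3"]
doses = ["control", "low_dose", "high_dose"]
layout = randomized_block_design(shelves, doses, seed=3)

for shelf, order in layout.items():
    print(shelf, order)
```

Because each shelf holds every treatment, differences among shelves can be separated from treatment effects in the analysis rather than adding to the error variation.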

The most common design errors • Ad hoc designs • Inappropriate control/treatment groups • Sample sizes too large or too small • Failure to use blocking • Lab animal studies: failure to use isogenic strains when GxE unimportant
