Statistics in Genetics: Combining Probabilities & Population Analysis

eatworms.swmed.edu/~leon • leon@eatworms.swmed.edu

Basic Statistics
• Combining probabilities
• Samples and populations
• Four useful statistics:
  • The mean, or average.
  • The median, or 50% value.
  • Standard deviation.
  • Standard Error of the Mean (SEM).
• Three distributions:
  • The binomial distribution.
  • The Poisson distribution.
  • The normal distribution.
• Four tests:
  • The chi-squared goodness-of-fit test.
  • The chi-squared test of independence.
  • Student's t-test.
  • The Mann-Whitney U-test.

Combining probabilities
• The probability that all of several independent events occur is the product of the individual event probabilities.
• The probability that one of several mutually exclusive events occurs is the sum of the individual event probabilities.

Combining probabilities
• When you throw a pair of dice, what is the probability of getting 11?
• When you throw five dice, what is the probability that at least one shows a 6?
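Both dice questions can be checked with a short sketch that applies the two rules above and, for the first question, confirms the answer by brute-force enumeration. Exact fractions are used so there is no rounding. For the five-dice question, the sketch takes the complement ("no die shows a 6"), because the five events "die i shows a 6" are independent but not mutually exclusive, so simply adding 1/6 five times would be wrong.

```python
from fractions import Fraction
from itertools import product

# Two dice: enumerate all 36 equally likely outcomes and count those summing to 11.
outcomes = list(product(range(1, 7), repeat=2))
p_11_enum = Fraction(sum(1 for a, b in outcomes if a + b == 11), len(outcomes))

# Same answer from the rules: (5, 6) and (6, 5) are mutually exclusive (add),
# and each requires two independent dice to land right (multiply).
p_11_rules = Fraction(1, 6) * Fraction(1, 6) + Fraction(1, 6) * Fraction(1, 6)

# Five dice, at least one 6: the complement of "no die shows a 6".
# Each die independently avoids 6 with probability 5/6, so multiply.
p_at_least_one_six = 1 - Fraction(5, 6) ** 5

print(p_11_enum, p_11_rules)  # 1/18 1/18
print(p_at_least_one_six, round(float(p_at_least_one_six), 3))  # 4651/7776 0.598
```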

Populations and samples
• What proportion of the population is female?
• Abstract populations: what does a mouse weigh?
• Population characteristics:
  • Central tendency: mean, median
  • Dispersion: standard deviation

Four sample statistics
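The formulas for this slide did not survive in the transcript. As a minimal sketch, the four statistics can be computed with Python's standard `statistics` module; the data values below are invented for illustration. Note that `statistics.stdev` uses the n - 1 (sample) denominator, and the SEM is the standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Hypothetical measurements, invented for illustration (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # central tendency: arithmetic average
median = statistics.median(data)  # central tendency: the 50% value
sd = statistics.stdev(data)       # dispersion: sample SD (n - 1 denominator)
sem = sd / math.sqrt(len(data))   # precision of the mean: SD / sqrt(n)

print(f"mean={mean} median={median} sd={sd:.3f} sem={sem:.3f}")
```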

Standard deviation and SEM
• Use standard deviation to describe how much variation there is in a population.
  • Example: income, if you're interested in how much income varies within the US population.
• Use SEM to say how accurate your estimate of a population mean is.
  • Example: measurement of β-gal activity from a 2-hybrid test.
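The difference between the two becomes clear when the sample grows. The sketch below assumes a hypothetical normal population (mean 100, SD 15) and draws samples of increasing size with a fixed random seed: the SD stays near the population value at every size, while the SEM shrinks roughly as 1/sqrt(n), because a larger sample pins down the mean more precisely without making the population any less variable.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible


def sd_and_sem(sample):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(sample)
    return sd, sd / math.sqrt(len(sample))


# Assumed population for illustration: normal with mean 100 and SD 15.
results = {}
for n in (10, 100, 1000):
    sample = [random.gauss(100, 15) for _ in range(n)]
    results[n] = sd_and_sem(sample)
    sd, sem = results[n]
    print(f"n={n:5d}  SD={sd:6.2f}  SEM={sem:5.2f}")
```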

Sample stats: recommendations
• When you report an average, report it as mean ± SEM.
  • Same for error bars in graphs.
  • In the figure caption or the table heading or somewhere, say explicitly that that's what you're reporting.
• Use the median for highly skewed data.
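A small sketch of both recommendations, using invented, deliberately skewed data (nine moderate values and one very large one). The helper `mean_sem` is hypothetical; it formats a result as mean ± SEM. With skewed data the single large value drags the mean far above every typical observation, while the median stays representative.

```python
import math
import statistics


def mean_sem(sample):
    """Format a sample as 'mean ± SEM' (say in the caption that this is SEM)."""
    m = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return f"{m:.1f} ± {sem:.1f}"


# Invented, deliberately skewed data (e.g. incomes in thousands).
incomes = [31, 35, 38, 40, 42, 45, 47, 50, 52, 900]
print("mean ± SEM:", mean_sem(incomes))
print("median:", statistics.median(incomes))
```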

Three distributions
• The binomial distribution
  • When you count how many of a sample of fixed size have a certain characteristic.
• The Poisson distribution
  • When you count how many times something happens, and there is no upper limit.
• The normal distribution
  • When you measure something that doesn't have to be an integer, or when you average several continuous measurements.

The binomial distribution
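The plot for this slide is missing from the transcript. As a sketch, the binomial probability of exactly k "successes" among n independent trials, each with probability p, is C(n, k) p^k (1 - p)^(n - k). The example (number of girls among four children, assuming p = 1/2) is invented for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k of n independent trials succeed), each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Invented illustration: number of girls among 4 children, assuming p = 1/2.
dist = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(dist)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```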

The Poisson distribution
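As a sketch of the formula behind this slide: if events occur at random with mean count λ, the probability of exactly k events is e^(-λ) λ^k / k!. Unlike the binomial, k has no upper limit, so any finite list of probabilities sums to slightly less than 1. The mean λ = 2 below is an assumption chosen for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(exactly k events) when events occur at random with mean count lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 2.0  # assumed mean count, e.g. colonies per plate (illustrative only)
probs = [poisson_pmf(k, lam) for k in range(10)]
print([round(p, 3) for p in probs[:5]])

# The mean of the distribution equals lam (summing far enough into the tail).
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
```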

The normal distribution
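The slide's figure is missing here too. As a sketch, Python's `statistics.NormalDist` gives the familiar areas under the standard normal curve: about 68% of values lie within one standard deviation of the mean, and about 95% within 1.96 standard deviations.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, SD 1

within_1sd = z.cdf(1) - z.cdf(-1)           # about 68%
within_1_96sd = z.cdf(1.96) - z.cdf(-1.96)  # about 95%
print(round(within_1sd, 4), round(within_1_96sd, 4))
```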

Hypothesis testing

A genetic mapping problem

The experiment
• Look at the SSR genotype of 40 e/e kids.
• If about 1/4 are /, the SSR is probably unlinked.
• If the number of / is much less than 1/4, the SSR is probably linked.
• We're going to figure out how to make the decision in advance, before we see the results.

Expected results if unlinked
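The graph for this slide is not in the transcript. If the SSR is unlinked, the number of the 40 e/e kids falling in the counted SSR class should follow a binomial distribution with n = 40 and p = 1/4. The sketch below tabulates that distribution as a crude text histogram; it peaks at n·p = 10, with a standard deviation of sqrt(n·p·(1 - p)), about 2.74.

```python
from math import comb

N, P = 40, 0.25  # 40 e/e kids; 1/4 expected in the counted SSR class if unlinked


def binomial_pmf(k, n=N, p=P):
    """P(exactly k of n kids fall in the counted class)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


expected = {k: binomial_pmf(k) for k in range(N + 1)}
mean = sum(k * pr for k, pr in expected.items())  # equals N * P = 10
sd = (N * P * (1 - P)) ** 0.5                      # about 2.74

# Text histogram of the most likely counts.
for k in range(21):
    print(f"{k:2d} {expected[k]:.4f} " + "#" * round(200 * expected[k]))
```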

Is the SSR linked?
• We want to know if the SSR is linked to the epilepsy gene.
• What would your answer be if:
  • 10/40 kids were /?
  • 0/40 kids were /?
  • 5/40 kids were /?
• Need a way to set the cut-off.

Type I errors
• Suppose that in reality, the SSR and the epilepsy gene are unlinked.
• Still, by chance, the number of / in our sample may fall below the cut-off.
• We would then decide incorrectly that the genes were linked.
• This is a type I error.

What’s the probability of a type I error () if we cut off at 5?

Probability of a type I error
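The probability asked for on the previous slide can be computed directly from the binomial with n = 40 and p = 1/4. This sketch follows the rule stated on the type I error slide (decide "linked" when the count falls below the cut-off), so a cut-off of 5 means rejecting when the count is 4 or fewer, and α is P(count ≤ 4), about 0.016. If "cut off at 5" is read instead as rejecting at 5 or fewer, the relevant value is `alpha_one_tailed(6)`, about 0.043. The helper names are hypothetical.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def alpha_one_tailed(cutoff, n=40, p=0.25):
    """Type I error rate when we decide 'linked' for any count below cutoff."""
    return binomial_cdf(cutoff - 1, n, p)


for cutoff in (4, 5, 6, 7):
    print(cutoff, round(alpha_one_tailed(cutoff), 4))
```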

Some terminology
• The hypothesis that nothing special is going on is the null hypothesis, H0.
• A type I error is the rejection of a true null hypothesis.
• The probability of a type I error is called α, or the level of significance.

Levels of significance
• "Statistically significant," if nothing more precise is added, means significant at P ≤ 5%.
• "Highly significant" is less universal, but typically means P ≤ 1%.
• The other level worth distinguishing is P ≤ 0.1%.
• Recommendation: stick with these levels; don't report ridiculously low probabilities.
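One way to follow this recommendation mechanically is a small helper that reports only the most stringent of the three conventional levels a P value clears, instead of printing the raw number. The function name and label wording are hypothetical, chosen for illustration.

```python
def significance_label(p):
    """Report the most stringent conventional level that P clears.

    Hypothetical helper: deliberately avoids printing very small raw P values.
    """
    if p <= 0.001:
        return "P ≤ 0.1%"
    if p <= 0.01:
        return "P ≤ 1% (highly significant)"
    if p <= 0.05:
        return "P ≤ 5% (statistically significant)"
    return "not significant (P > 5%)"


for p in (1e-12, 0.004, 0.03, 0.2):
    print(p, significance_label(p))
```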

How many tails?
• The test I have just described is a one-tailed test, because we were only interested in the possibility that the frequency of / was less than ¼.
• More commonly, you want to test whether an observation is either less than or greater than a predicted value.
• In that case you need two cut-offs, a lower one and an upper one.
• The probability of a type I error will then be the sum of the probability of too low a number and the probability of too high a number.

Two tails of the binomial

The two-tailed test
• Typically we put half of the probability (2.5%) in each tail.
• Our decision rule will be to reject if n ≤ 4 or if n ≥ 16.
• This is called a two-tailed test.
• Recommendation: if you are at all uncertain, do a two-tailed test.
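The type I error rate of this two-tailed rule is the sum of the two tail probabilities of the binomial with 40 kids and p = 1/4. Because the binomial is discrete, the tails cannot generally be made exactly 2.5% each; the sketch below simply computes what the slide's rule gives (roughly 1.6% in the lower tail and 2.6% in the upper tail, about 4.2% in total, under 5%).

```python
from math import comb


def pmf(k, n=40, p=0.25):
    """Binomial probability of exactly k kids in the counted class."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Decision rule from the slide: reject H0 if n <= 4 or n >= 16.
lower_tail = sum(pmf(k) for k in range(0, 5))    # P(n <= 4)
upper_tail = sum(pmf(k) for k in range(16, 41))  # P(n >= 16)
alpha = lower_tail + upper_tail
print(round(lower_tail, 4), round(upper_tail, 4), round(alpha, 4))
```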

Statistical tests
• Chi-squared goodness-of-fit test:
  • Test whether a single measurement from a binomial matches a theoretical value.
  • Test whether two Poisson distributions have equal means (by testing whether one measurement is 50% of the sum).
• Chi-squared test of independence:
  • Test whether two binomial distributions have the same probability p.
• Student's t test:
  • Test whether two normal distributions have equal means.
• Mann-Whitney U test:
  • Test whether two samples come from distributions with the same location. Can be used with any continuous distribution.
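The Poisson trick works because, if two Poisson counts have equal means, then given their total each count is binomial with p = 1/2. A minimal sketch, with invented counts of 30 and 50: compare each count with half the total in a chi-squared goodness-of-fit test. With two classes there is 1 degree of freedom, and a chi-squared variable with 1 df is the square of a standard normal, so its upper-tail P value is `math.erfc(math.sqrt(chi2 / 2))`.

```python
import math


def chi2_1df_p(chi2):
    """Upper-tail P value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def compare_poisson_counts(c1, c2):
    """Test equal Poisson means by checking that c1 is about half of c1 + c2."""
    expected = (c1 + c2) / 2
    chi2 = (c1 - expected) ** 2 / expected + (c2 - expected) ** 2 / expected
    return chi2, chi2_1df_p(chi2)


chi2, p = compare_poisson_counts(30, 50)  # invented counts
print(round(chi2, 2), round(p, 4))
```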

Test on the probability of a binomial variable
• You looked at N things (people in the room, for instance) and counted the number n that matched some criterion (female, for instance).
• The null hypothesis is that n is binomial with probability p0 (some definite value that you predict based on theory).
• Chi-squared goodness-of-fit test.
• Example: progeny classes from a genetic cross.
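A sketch of the goodness-of-fit test for the genetic-cross example, using invented F2 counts (290 dominant, 110 recessive) and a null hypothesis of a 3:1 ratio. The statistic is the sum of (observed - expected)² / expected over the classes. The P value shortcut below is valid only for two classes (1 degree of freedom); with k classes the test has k - 1 degrees of freedom and needs a general chi-squared tail function, such as `scipy.stats.chi2.sf`.

```python
import math


def chi2_goodness_of_fit(observed, expected_props):
    """Pearson chi-squared statistic for observed counts vs predicted proportions."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_1df_p(chi2):
    """P value for 1 degree of freedom (two classes): erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Invented F2 progeny counts; H0 is a 3:1 dominant:recessive ratio.
chi2 = chi2_goodness_of_fit([290, 110], [0.75, 0.25])
p = chi2_1df_p(chi2)
print(round(chi2, 3), round(p, 3))
```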

Tests of independence
• Use when you have measured two binomial variates and want to test whether the p of the two distributions is the same.
• Chi-squared test of independence.
• For instance, suppose we want to know whether the proportion of biologists who are women differs from the proportion of doctors who are women. We count some biologists and some doctors and find that 24/61 biologists are women (39%), but 36/72 doctors are women (50%). A chi-squared test tells us whether this difference is significant. (It turns out it isn't even close.)
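The biologists-versus-doctors example can be checked with a short sketch. It arranges the slide's counts as a 2x2 table (rows: biologists, doctors; columns: women, men) and uses the standard shortcut formula for Pearson's chi-squared on a 2x2 table, without a continuity correction. A 2x2 table has 1 degree of freedom, so the P value is again `math.erfc(math.sqrt(chi2 / 2))`. The result (chi-squared about 1.5, P about 0.2) agrees with "not even close" to significance.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: biologists, doctors. Columns: women, men (counts from the slide).
chi2 = chi2_2x2(24, 61 - 24, 36, 72 - 36)
p = math.erfc(math.sqrt(chi2 / 2))  # 1 degree of freedom for a 2x2 table
print(round(chi2, 2), round(p, 2))
```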

Student's t test on the means of normal variables
• Use this when you have two sample averages and want to know whether they differ.
• For instance, maybe you have weighed mice that are homozygous for a gene knockout and their heterozygous siblings. The homozygotes weigh less, a common sign that they're unhealthy in some way, and you want to know whether the difference is significant.
• This test assumes that weight (or at least the average of several weights) is normally distributed.
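A minimal sketch of the two-sample Student's t test for the mouse example, using only the standard library. The weights are invented for illustration. The t statistic uses the pooled variance (the classic equal-variance form), with na + nb - 2 degrees of freedom. Because the standard library has no t distribution, the two-sided P value is obtained by integrating the t density numerically with Simpson's rule; in practice a library routine such as `scipy.stats.ttest_ind` does the same job.

```python
import math
import statistics


def students_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson; even steps)."""
    h = abs(t) / steps
    s = t_pdf(0, df) + t_pdf(abs(t), df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    return max(0.0, 1 - 2 * s * h / 3)


het = [25.1, 24.3, 26.0, 25.5, 24.8, 25.9]  # invented weights (g), heterozygotes
hom = [22.9, 23.5, 24.1, 22.4, 23.8, 23.0]  # invented weights (g), homozygotes
t, df = students_t(het, hom)
p = two_sided_p(t, df)
print(round(t, 2), df, p)
```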

The Mann-Whitney U test
• Used under almost exactly the same circumstances as the t-test. For instance, you could use it to compare mouse weights.
• Doesn't compare averages; compares the positions of the entire distributions.
• This test makes NO ASSUMPTIONS about the shape of the underlying distributions (it only needs independent observations that can be ranked).
• Probably the most useful of all statistical tests.
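A sketch of the Mann-Whitney U test on the same invented mouse weights used for the t test sketch. U counts, over all pairs with one mouse from each group, how often the first group's value is larger (ties count one half). For small samples the exact two-sided P value can be found by a permutation argument: under the null hypothesis every way of splitting the pooled values into groups of the original sizes is equally likely, so P is the fraction of splits whose U is at least as far from its expected value n1·n2/2 as the observed one. This brute-force enumeration is only practical for small groups; `scipy.stats.mannwhitneyu` handles larger ones.

```python
from itertools import combinations


def u_statistic(a, b):
    """Number of pairs (x from a, y from b) with x > y; ties count 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_exact(a, b):
    """U for sample a, and its exact two-sided permutation P value."""
    pooled = list(a) + list(b)
    n1, n2 = len(a), len(b)
    centre = n1 * n2 / 2
    u_obs = u_statistic(a, b)
    extreme = total = 0
    # Try every way of assigning n1 of the pooled values to the first group.
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(ga, gb) - centre) >= abs(u_obs - centre) - 1e-12:
            extreme += 1
    return u_obs, extreme / total


het = [25.1, 24.3, 26.0, 25.5, 24.8, 25.9]  # invented weights (g)
hom = [22.9, 23.5, 24.1, 22.4, 23.8, 23.0]
u, p = mann_whitney_exact(het, hom)
print(u, round(p, 4))
```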

THINK
