Histograms…contd. • A histogram helps us study the shape of a frequency distribution. • For example, we expect that the length (or weight) of most organisms follows the familiar bell shape. • The bell shape gives us biological information about the variable for the organism.
Histograms…contd. • However, a given histogram can only help us obtain information that is specific to the sample used. • If we can fit a curve to the histogram, we can then not only infer values for data-points within the range of the sample but also outside the range.
Interpolation and Extrapolation • If length (x) = 3.75 cm, what is the weight (y) expected to be? • If length (x) = 3.75 cm, weight (y) = $0.0113 \times (3.75)^{3.3409}$ = 0.9351 g [Figure: length-weight plot with x = 3.75 cm marked "?"]
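The prediction above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_weight` is hypothetical, and the coefficients are the ones printed on the slide, which are presumably rounded, so the last digits of the result may differ slightly from a value computed with unrounded coefficients.

```python
# Length-weight curve from the slide: weight = a * length**b.
# A and B are the coefficients as printed on the slide (presumably rounded).
A = 0.0113
B = 3.3409


def expected_weight(length_cm: float, a: float = A, b: float = B) -> float:
    """Return the expected weight (g) for a given length (cm)."""
    return a * length_cm ** b


weight_at_3_75 = expected_weight(3.75)
print(f"Expected weight at 3.75 cm: {weight_at_3_75:.4f} g")
```

The same function works for lengths inside the sampled range (interpolation) and outside it (extrapolation), although predictions outside the range rest on the assumption that the fitted curve still holds there.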
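Probability of inclusion in distribution
[Figure: histogram of fry lengths; x-axis: Fry length class (cm); bar labels give the number of fry in each class, 100 fry in total]
Fry length class (cm)    Number of fry
1.1-1.2                  1
1.2-1.3                  4
1.3-1.4                  9
1.4-1.5                  12
1.5-1.6                  15
1.6-1.7                  18
1.7-1.8                  15
1.8-1.9                  12
1.9-2.0                  9
2.0-2.1                  4
2.1-2.2                  1
• What is the chance that a random fry that you pick is in the class 1.6-1.7? = 18/100 = 0.18 = 18%
• What is the chance that a random fry that you pick is in the classes from 1.4 to 1.9? = (12+15+18+15+12)/100 = 72/100 = 0.72 = 72%
• What is the chance that a random fry that you pick is in the class 1.1-1.2? = 1/100 = 0.01 = 1%
• What is the chance that a random fry you pick is in the length range 1.15-1.27? (This range does not coincide with the class limits, so the frequency table alone cannot answer it.)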
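As a rough sketch, the class probabilities can be computed directly from the frequency table. The counts below are read off the histogram labels, and the helper name `chance_in_range` is made up for this illustration. It only handles ranges that line up with whole classes.

```python
# Frequency table read off the fry-length histogram (100 fry in total).
# Keys are (lower, upper) class limits in cm; values are counts.
fry_counts = {
    (1.1, 1.2): 1, (1.2, 1.3): 4, (1.3, 1.4): 9, (1.4, 1.5): 12,
    (1.5, 1.6): 15, (1.6, 1.7): 18, (1.7, 1.8): 15, (1.8, 1.9): 12,
    (1.9, 2.0): 9, (2.0, 2.1): 4, (2.1, 2.2): 1,
}


def chance_in_range(counts, lower, upper):
    """Relative frequency of the whole classes lying inside [lower, upper]."""
    tol = 1e-9  # guards against floating-point noise in the class limits
    total = sum(counts.values())
    inside = sum(n for (lo, hi), n in counts.items()
                 if lo >= lower - tol and hi <= upper + tol)
    return inside / total


print(chance_in_range(fry_counts, 1.6, 1.7))  # one class
print(chance_in_range(fry_counts, 1.4, 1.9))  # five adjacent classes
print(chance_in_range(fry_counts, 1.1, 1.2))  # the smallest class
```

A range such as 1.15-1.27 contains no whole class, so this helper cannot answer it. That gap is what a fitted curve is meant to fill.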
Probability distribution • Mathematical equations are fit to frequency distributions or histograms. • They are called probability distributions.
Note that all the relative frequencies in a frequency table must add up to 1 or 100%. • In other words, the total area under the probability distribution curve = 1.
Some common probability distributions • Continuous distributions: • The Normal distribution • The t distribution (Student’s t distribution) • The F distribution • The chi-square (χ²) distribution • The Gamma distribution
Some common probability distributions…contd. • Discrete distributions: • The Binomial distribution • The Hypergeometric distribution • The Negative Binomial distribution • The Poisson distribution
The Normal (Gaussian) Distribution • Perhaps the distribution model that most commonly fits frequency distributions of continuous variables. • Familiar symmetric, bell-shaped curve • What does this shape mean, biologically? • Many other distributions, including discrete distributions, approximate the normal distribution under certain conditions
The Normal distributions • The location and shape of the normal probability curve are completely defined by just two parameters and two constants: $f(Y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(Y-\mu)^2}{2\sigma^2}}$ • Parameters: μ = true mean of the variable, and σ = true standard deviation of the variable • Constants: π ≈ 3.14159, and e ≈ 2.71828 [Figure: normal probability curve; x-axis: Variable; y-axis: Probability]
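A minimal sketch of the normal curve written out in terms of the two parameters and two constants named above. The function name `normal_pdf` and the parameter values are illustrative assumptions, not values from the slides. The result is cross-checked against `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at y for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((y - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Hypothetical parameter values, chosen only to exercise the formula.
mu, sigma = 10.0, 2.0
height_at_mean = normal_pdf(mu, mu, sigma)

# Cross-check against the standard-library implementation.
reference = NormalDist(mu, sigma).pdf(mu)
print(height_at_mean, reference)
```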
Let us fit a normal distribution • Data • Steps: • Construct a frequency table for the data and draw a histogram. • Obtain the f(Y) values for the midpoint of each class, based on the formula for the normal distribution. • Superimpose the f(Y) values for the midpoints on the histogram. Note: This is only an approximate fitting of the normal curve to the frequency table.
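The steps above can be sketched as follows. The data set used on this slide is not shown, so the sketch assumes the fry-length frequency table from the earlier histogram slide. The mean and standard deviation are estimated from the grouped data, which is itself an approximation. The f(Y) values are multiplied by (number of observations × class width) so that they are on the same scale as the histogram counts.

```python
import math

# ASSUMPTION: this slide's data are not shown; the fry-length table from the
# earlier histogram slide is reused here purely for illustration.
counts = [1, 4, 9, 12, 15, 18, 15, 12, 9, 4, 1]
class_width = 0.1
midpoints = [round(1.15 + class_width * i, 2) for i in range(len(counts))]
n = sum(counts)

# Grouped-data estimates of the mean and standard deviation.
mean = sum(m * c for m, c in zip(midpoints, counts)) / n
variance = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts)) / (n - 1)
sd = math.sqrt(variance)


def normal_pdf(y, mu, sigma):
    """Normal curve height f(Y) at y."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


# f(Y) at each class midpoint, rescaled to expected counts per class.
fitted = [n * class_width * normal_pdf(m, mean, sd) for m in midpoints]

print(f"mean = {mean:.3f} cm, sd = {sd:.3f} cm")
for m, observed, expected in zip(midpoints, counts, fitted):
    print(f"{m:.2f}  observed = {observed:3d}  fitted = {expected:6.2f}")
```

Plotting `fitted` against `midpoints` on top of the histogram gives the superimposed curve described in the last step.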
Shapes of the normal distribution [Figure: normal curves; x-axis: Variable; y-axis: Relative frequency (probability)]
The Normal distributions…contd. • There isn’t just one normal distribution; there is a normal distribution for each combination of values of μ and σ. • The value of μ decides the location of the distribution and the value of σ decides the shape of the distribution.
Location and Shape of the Normal distribution [Figure: two panels, one with the same μ and different σ, the other with the same σ and different μ; x-axis: Value of variable]
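Useful properties of the normal distribution • μ ± σ contains 68.27% of the items • μ ± 2σ contains 95.45% of the items • μ ± 3σ contains 99.73% of the items [Figure: normal curve; successive one-σ bands on each side of the mean hold 34.13%, 13.59% and 2.14% of the area]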
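These percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is made up for this illustration. The standard normal curve is used, since the proportions do not depend on the particular μ and σ.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k: float) -> float:
    """Proportion of a normal population lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"mean ± {k} sd contains {100 * coverage(k):.2f}% of the items")
```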
Useful properties of the normal distribution…contd. • 50% of the items fall within μ ± 0.674σ • 95% of the items fall within μ ± 1.960σ • 99% of the items fall within μ ± 2.576σ [Figure: normal curve with the central 50%, 95% and 99% areas marked]
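Going the other way, from a central proportion to its cut-off, needs the inverse of the cumulative distribution. A small sketch, again with the standard-library `NormalDist`; `central_cutoff` is an illustrative name, not something from the slides.

```python
from statistics import NormalDist


def central_cutoff(proportion: float) -> float:
    """Return z such that the given central proportion lies within ±z standard deviations."""
    upper_tail = (1.0 - proportion) / 2.0  # area left over in each tail
    return NormalDist().inv_cdf(1.0 - upper_tail)


for p in (0.50, 0.95, 0.99):
    print(f"{100 * p:.0f}% of the items fall within ±{central_cutoff(p):.3f} sd of the mean")
```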
These properties are useful only if • It is known that the variable follows the normal distribution – not usually a serious problem • The true mean and standard deviation are known for the population – unfortunately almost never true. • Even if they are known, we then need to obtain the cut-off values (as in previous slide) for each variable (because there is a different normal distribution for each variable). • Of what use are these properties then?
The Standard Normal Distribution • As mentioned before, there is an infinite number of normal distributions, one for each combination of values of μ and σ. • It would indeed be very tedious if the cut-off values had to be computed for each distribution. • Fortunately, there is another property of the normal distribution that allows us to standardize it.
The Standard Normal Distribution…contd. • If μ and σ are known, then it is possible to compute the following: $Z = \frac{Y_i - \mu}{\sigma}$ • This quantity, known as the standard normal deviate, gives the distance of an observation, Yi, from the mean, in terms of the standard deviation. • Thus, there is a change of units: Z is measured in standard deviations rather than in the units of the variable.
The Standard Normal Distribution…contd. • Z, known as the standard normal deviate, is also normally distributed, but with μ = 0 and σ = 1. • This distribution is called the Standard Normal Distribution (SND). • The SND can be generated from any variable with any μ and σ, as long as the variable is normally distributed.
Standard Normal Distribution [Figure: standard normal curve centred at μ = 0, with the Z axis running from −∞ to +∞]
The Standard Normal Distribution…contd. • For example, let us assume that five different labs work on fruit flies in a university, and the fruit flies in each collection are of different sizes. For one of the collections (ours), we know that the population mean wing length is μ = 4.55 mm and the standard deviation is σ = 0.39 mm. • Furthermore, we know that the variable follows the normal distribution. • Then an individual wing length of 4.1 mm gives Z = (4.1 − 4.55)/0.39 = −1.1538, i.e., it lies 1.1538 standard deviations below the mean.
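The standardization in this example takes one line of code. A minimal sketch, with the hypothetical function name `standard_normal_deviate` and the lab-collection parameters quoted above:

```python
def standard_normal_deviate(y: float, mu: float, sigma: float) -> float:
    """Distance of observation y from the mean mu, in units of sigma."""
    return (y - mu) / sigma


# Lab-collection parameters from the example (wing length in mm).
z = standard_normal_deviate(4.1, mu=4.55, sigma=0.39)
print(f"Z = {z:.4f}")
```

A negative Z means the observation lies below the mean; an observation equal to the mean gives Z = 0.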
The Standard Normal Distribution…contd. • A typical question can be: • You have found a fly in the cafeteria and you are not sure if it belongs to our lab collection. • You know the values of the following parameters for the lab collection: μ = 4.55 mm and σ = 0.39 mm
The Standard Normal Distribution…contd. • You measure the wing length of the fly (let’s assume that wing length is an important distinguishing character) and find it to be 3.7 mm. • Let’s assume that the chances of a fly escaping the collection are NOT related to its wing-size. • In other words, every fly has an equal chance of escaping the collection.
The Standard Normal Distribution…contd. • If the fly had a wing length of 4.55 mm, the logical conclusion would be that it likely is from the collection, because that wing length is representative of the flies in the collection. • But it is 3.7 mm. • That is, it is rather smaller than the mean. • So we ask: how likely is a fly with a wing length of 3.7 mm to belong to our collection?
The Standard Normal Distribution…contd. • Of course, the only information we may have is the parameters, μ = 4.55 mm and σ = 0.39 mm, and that the variable follows a normal distribution. • With this information, we know that we can find the area under the normal curve between any two given cut-off points. • Recall the graphs (next slide)
The Standard Normal Distribution…contd. • μ ± σ contains 68.27% of the items; 50% of the items fall within μ ± 0.674σ • μ ± 2σ contains 95.45% of the items; 95% of the items fall within μ ± 1.960σ • μ ± 3σ contains 99.73% of the items; 99% of the items fall within μ ± 2.576σ [Figure: the two normal curves from the earlier slides, with band areas 34.13%, 13.59%, 2.14% and central areas 50%, 95%, 99%]
The Standard Normal Distribution…contd. • We know that 3.7 mm is less than the known parametric mean, 4.55 mm. • But is it so low as to make it unlikely to belong to the collection? • So we ask: what proportion of flies in the collection have a wing length of 3.7 mm or lower?
The Standard Normal Distribution…contd. [Figure: normal curve centred at 4.55 mm; the area we are interested in is the lower tail, at 3.7 mm and below]
The Standard Normal Distribution…contd. • But how do we find that area if we do not have the original distribution? • Fortunately, we have all the information we need: • Variable follows normal distribution • Parametric mean = 4.55mm, and • Parametric standard deviation = 0.39mm
The Standard Normal Distribution…contd. • With that information, we can obtain the Z value for our fly: $Z = \frac{Y_i - \mu}{\sigma} = \frac{3.7 - 4.55}{0.39} = -2.18$
The Standard Normal Distribution…contd. • Because the standard normal distribution is unique, the area between any two points on the X-axis has already been computed and tabulated. • These tables are available in any statistics textbook.
The Standard Normal Distribution…contd. • Look up the standard normal tables with the value −2.18. • We know that the area under the curve from 0 to −∞ is 0.5. • The table gives us a value of 0.4854 as the area from 0 to −2.18. • Therefore, the area from −2.18 to −∞ is 0.5 − 0.4854 = 0.0146. • That is, approximately 1.5% of the flies from the collection are expected to have a wing length as small as 3.7 mm (Z = −2.18) or lower. [Figure: standard normal curve marked at Z = −2.18 (wing length = 3.7 mm); the area given in the table lies between 0 and −2.18, and the area needed is the tail below −2.18]
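The table lookup can be reproduced in software. The sketch below mirrors the two steps on the slide (area from 0 to Z, then subtraction from 0.5) and compares the result with the cumulative area computed directly. It uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 4.55, 0.39  # lab-collection parameters from the slides (mm)
wing_length = 3.7       # the fly found in the cafeteria (mm)

# Z rounded to two decimals, as it would be for a printed table.
z = round((wing_length - MU) / SIGMA, 2)
std_normal = NormalDist()  # mean 0, standard deviation 1

# Table-style route: area between 0 and |Z|, then subtract from 0.5.
area_0_to_z = std_normal.cdf(abs(z)) - 0.5
lower_tail = 0.5 - area_0_to_z

# Direct route: cumulative area below Z.
lower_tail_direct = std_normal.cdf(z)

print(f"Z = {z}")
print(f"area from 0 to Z  = {area_0_to_z:.4f}")
print(f"area below Z      = {lower_tail:.4f} (direct: {lower_tail_direct:.4f})")
```

Both routes give the same tail area, because the curve is symmetric about 0.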
The Standard Normal Distribution…contd. • The logic in making a decision about the question of whether the fly belongs to the lab collection is simple.
The Standard Normal Distribution…contd. • We don’t know which population the fly belongs to. • Even in our collection, flies with wing length ≤ 3.7 mm are rare (~1.5%). • So is it reasonable to accept that an unknown fly of that wing length belongs to our collection?
The Standard Normal Distribution…contd. • General rule of thumb: If the probability is < 0.05 (i.e., < 5%), then you conclude that the chances that an observation belongs to that distribution are low. • Here the probability is about 0.015, so we conclude that the fly is unlikely to belong to our lab collection.
Some exercises • Show schematic diagrams of the area of interest, and the probabilities associated with the following: • What if the wing length of the fly was 5.16 mm? 5.26 mm? 4.55 mm? 3.63 mm? 2.38 mm? • What proportion of flies in the collection have wing lengths between 3 mm and 4 mm? 5 mm and 6 mm? 3 mm and 7 mm?
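One possible way to check answers to these exercises after drawing the diagrams by hand is sketched below. The helper names are hypothetical, and the collection parameters (μ = 4.55 mm, σ = 0.39 mm) are the ones quoted in the worked example. The demonstration uses only the 3.7 mm fly from the worked example, so the exercises themselves are left open.

```python
from statistics import NormalDist

# Wing-length distribution of the lab collection (mm), from the worked example.
collection = NormalDist(mu=4.55, sigma=0.39)


def proportion_at_or_below(length_mm: float) -> float:
    """Area under the curve to the left of length_mm."""
    return collection.cdf(length_mm)


def proportion_at_or_above(length_mm: float) -> float:
    """Area under the curve to the right of length_mm."""
    return 1.0 - collection.cdf(length_mm)


def proportion_between(lower_mm: float, upper_mm: float) -> float:
    """Area under the curve between two wing lengths."""
    return collection.cdf(upper_mm) - collection.cdf(lower_mm)


# Worked example from the slides: flies with wing length 3.7 mm or lower.
print(f"{proportion_at_or_below(3.7):.4f}")
```

For a wing length above the mean, the upper-tail helper gives the relevant area; for the range questions, `proportion_between` does.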