## Probability Distributions


**Histograms…contd.**

- A histogram helps us study the shape of a frequency distribution.
- For example, we expect the length (or weight) of most organisms to follow the familiar bell shape.
- The bell shape gives us biological information about the variable for the organism.

**Histograms…contd.**

- However, a given histogram can only give us information that is specific to the sample used.
- If we can fit a curve to the histogram, we can infer values not only for data points within the range of the sample (interpolation) but also outside that range (extrapolation).

**Interpolation and Extrapolation**

- If length (x) = 3.75 cm, what is the weight (y) expected to be?
- Using the curve fitted to the length-weight data, $y = 0.0113\,x^{3.3409}$: if $x = 3.75$ cm, then $y = 0.0113 \times 3.75^{3.3409} \approx 0.935$ g.

**Probability of inclusion in distribution**

*[Figure: histogram of 100 fry; x-axis: fry length class (cm), classes of width 0.10 cm from 1.10 to 2.20; y-axis: frequency]*

- What is the chance that a random fry you pick is in the length range 1.15–1.27 cm?
- What is the chance that a random fry you pick is in the class 1.6–1.7? = 18/100 = 0.18 = 18%
- What is the chance that a random fry you pick is in the range 1.4–1.9? = (12 + 15 + 18 + 15 + 12)/100 = 72/100 = 0.72 = 72%
- What is the chance that a random fry you pick is in the class 1.1–1.2? = 1/100 = 0.01 = 1%

**Probability distributions**

- Mathematical equations that are fitted to frequency distributions or histograms are called probability distributions.

**Note**

- All the relative frequencies in a frequency table must add up to 1 (100%).
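The arithmetic on these slides can be checked with a short Python sketch. The class frequencies below are an assumption: they are reconstructed from the worked answers (1 fry in 1.1–1.2, 18 in 1.6–1.7, 12 + 15 + 18 + 15 + 12 in 1.4–1.9) and the axis labels, assuming a symmetric histogram of 100 fry. The helper names `prob_in_classes` and `weight_from_length` are hypothetical.

```python
# Assumed frequencies for 100 fry in 0.1-cm length classes, reconstructed from the
# worked answers and assuming a symmetric histogram. Key = class lower bound (cm).
FREQS = {1.1: 1, 1.2: 4, 1.3: 9, 1.4: 12, 1.5: 15, 1.6: 18,
         1.7: 15, 1.8: 12, 1.9: 9, 2.0: 4, 2.1: 1}
WIDTH = 0.1
N = sum(FREQS.values())  # total number of fry (100)


def prob_in_classes(lo, hi):
    """Relative frequency of fry in the classes lying entirely within [lo, hi] cm.

    Only whole classes are counted, so a range such as 1.15-1.27 cannot be
    answered exactly from the table.
    """
    eps = 1e-9  # tolerance so floating-point class bounds compare as intended
    count = sum(f for low, f in FREQS.items()
                if low >= lo - eps and low + WIDTH <= hi + eps)
    return count / N


def weight_from_length(length_cm, a=0.0113, b=3.3409):
    """Weight (g) predicted by the fitted length-weight curve y = a * x**b."""
    return a * length_cm ** b


if __name__ == "__main__":
    print(prob_in_classes(1.6, 1.7))   # 0.18
    print(prob_in_classes(1.4, 1.9))   # 0.72
    print(prob_in_classes(1.1, 1.2))   # 0.01
    print(sum(FREQS.values()) / N)     # 1.0: relative frequencies add up to 1
    print(round(weight_from_length(3.75), 3))  # 0.935
```

Counting whole classes mirrors how the slide reads probabilities directly off the histogram bars.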
- In other words, the total area under the probability distribution curve = 1.

**Some common probability distributions**

- Continuous distributions:
  - The Normal distribution
  - The t distribution (Student's t distribution)
  - The F distribution
  - The chi-square (χ²) distribution
  - The Gamma distribution

**Some common probability distributions…contd.**

- Discrete distributions:
  - The Binomial distribution
  - The Hypergeometric distribution
  - The Negative Binomial distribution
  - The Poisson distribution

**The Normal (Gaussian) Distribution**

- Perhaps the distribution model that most commonly fits the frequency distributions of continuous variables.
- Familiar symmetric, bell-shaped curve.
- What does this shape mean, biologically?
- Many other distributions, including discrete distributions, approximate the normal distribution under certain conditions.

**The Normal distributions**

*[Figure: normal probability curve; x-axis: variable, y-axis: probability]*

- The location and shape of the normal probability curve are completely defined by just two parameters and two constants:
  - Parameters: μ, the true mean of the variable, and σ, the true standard deviation of the variable.
  - Constants: π ≈ 3.14159 and e ≈ 2.71828.
- These appear in the formula for the normal distribution:

$$f(Y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(Y-\mu)^2}{2\sigma^2}}$$

**Let us fit a normal distribution**

- Steps, given the data:
  1. Construct a frequency table for the data and draw a histogram.
  2. Obtain the f(Y) values for the midpoints of each class from the formula for the normal distribution.
  3. Superimpose the f(Y) values for the midpoints on the histogram.
- Note: This is only an approximate fitting of the normal curve to the frequency table.

**Shapes of the normal distribution**

*[Figure: shapes of the normal distribution; x-axis: variable, y-axis: relative frequency (probability)]*

**The Normal distributions…contd.**

- There isn't just one normal distribution: there is a normal distribution for each combination of values of μ and σ.
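The three fitting steps can be sketched in Python. Two choices here are assumptions rather than the slide's method: μ and σ are estimated from the grouped data (class midpoints weighted by frequency, divisor n), and because f(Y) is a density, each value is multiplied by n × class width so it sits on the same scale as a frequency histogram. The fry frequencies in the usage example are assumed values reconstructed from the histogram answers; `fit_normal_to_table` is a hypothetical name.

```python
import math


def normal_pdf(y, mu, sigma):
    """Normal density f(Y) with mean mu and standard deviation sigma."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def fit_normal_to_table(midpoints, freqs, width):
    """Approximate normal fit to a frequency table.

    Estimates mu and sigma from the grouped data (class midpoints weighted by
    their frequencies, divisor n), then returns the expected frequency of each
    class, n * width * f(midpoint), for superimposing on the histogram.
    """
    n = sum(freqs)
    mu = sum(m * f for m, f in zip(midpoints, freqs)) / n
    var = sum(f * (m - mu) ** 2 for m, f in zip(midpoints, freqs)) / n
    sigma = math.sqrt(var)
    expected = [n * width * normal_pdf(m, mu, sigma) for m in midpoints]
    return mu, sigma, expected


if __name__ == "__main__":
    # Assumed fry data: 11 classes of width 0.1 cm starting at 1.1 cm.
    freqs = [1, 4, 9, 12, 15, 18, 15, 12, 9, 4, 1]
    midpoints = [round(1.15 + 0.1 * i, 2) for i in range(len(freqs))]
    mu, sigma, expected = fit_normal_to_table(midpoints, freqs, 0.1)
    print(f"mu = {mu:.3f} cm, sigma = {sigma:.3f} cm")
    for m, f, e in zip(midpoints, freqs, expected):
        print(f"{m:.2f}  observed {f:2d}  expected {e:5.1f}")
```

Comparing the observed and expected columns class by class is the numerical counterpart of superimposing the curve on the histogram.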
- The value of μ decides the location of the distribution, and the value of σ decides its shape (spread).

**Location and shape of the Normal distribution**

*[Figure: two panels, same σ with different μ, and same μ with different σ; x-axis: value of variable]*

**Useful properties of the normal distribution**

*[Figure: normal curve divided at μ ± σ, μ ± 2σ and μ ± 3σ; areas of 34.13%, 13.59% and 2.14% on each side of the mean]*

- μ ± σ contains 68.27% of the items
- μ ± 2σ contains 95.45% of the items
- μ ± 3σ contains 99.73% of the items

**Useful properties of the normal distribution…contd.**

*[Figure: central areas of 50%, 95% and 99% under the normal curve]*

- 50% of the items fall between μ ± 0.674σ
- 95% of the items fall between μ ± 1.960σ
- 99% of the items fall between μ ± 2.576σ

**These properties are useful only if**

- It is known that the variable follows the normal distribution (not usually a serious problem).
- The true mean and standard deviation of the population are known (unfortunately, almost never true).
- Even if they are known, we then need to obtain the cut-off values (as above) for each variable, because there is a different normal distribution for each variable.
- Of what use are these properties, then?

**The Standard Normal Distribution**

- As mentioned before, there is an infinite number of normal distributions, one for each pair of values of μ and σ.
- It would be very tedious if the cut-off values had to be computed for each distribution.
- Fortunately, another property of the normal distribution allows us to standardize it.

**The Standard Normal Distribution…contd.**

- If μ and σ are known, then it is possible to compute the following:

$$Z = \frac{Y_i - \mu}{\sigma}$$

- This quantity, known as the standard normal deviate, gives the distance of an observation, $Y_i$, from the mean in units of the standard deviation.
- Thus, there is a change of units.

**The Standard Normal Distribution…contd.**

- Z, the standard normal deviate, is also normally distributed, but with μ = 0 and σ = 1.
- This distribution is called the Standard Normal Distribution (SND).
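The percentages above can be reproduced without tables. This sketch assumes Python's `math.erf` to evaluate the standard normal cumulative area Φ(z); because standardizing maps μ ± kσ onto −k to k, the proportion depends only on k. Function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Phi(z): area under the standard normal curve from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standard_normal_deviate(y, mu, sigma):
    """Z = (Y - mu) / sigma: distance of Y from the mean in standard-deviation units."""
    return (y - mu) / sigma


def proportion_within(k):
    """Proportion of any normal population lying in mu ± k*sigma.

    Standardizing turns mu ± k*sigma into -k..k, so the answer does not depend
    on mu or sigma.
    """
    return std_normal_cdf(k) - std_normal_cdf(-k)


if __name__ == "__main__":
    for k in (1, 2, 3, 0.674, 1.960, 2.576):
        print(f"mu ± {k} sigma contains {100 * proportion_within(k):.2f}% of the items")
```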
- The SND can be generated from any variable with any μ and σ, as long as the variable is normally distributed.

*[Figure: Standard Normal Distribution centred at μ = 0, with negative Z values to the left and positive Z values to the right]*

**The Standard Normal Distribution…contd.**

- For example, let us assume that five different labs in a university work on fruit flies, and the fruit flies in each collection are of different sizes. For one of the collections (ours), we know that the population mean wing length is μ = 4.55 mm and the standard deviation is σ = 0.39 mm.
- Furthermore, we know that the variable follows the normal distribution.
- Then an individual wing length of 4.1 mm is $Z = (4.1 - 4.55)/0.39 = -1.1538$ standard deviations from the mean, i.e., 1.1538 standard deviations below it.

**The Standard Normal Distribution…contd.**

- A typical question might be:
  - You have found a fly in the cafeteria and you are not sure whether it belongs to our lab collection.
  - You know the values of the following parameters for the lab collection: μ = 4.55 mm and σ = 0.39 mm.

**The Standard Normal Distribution…contd.**

- You measure the wing length of the fly (let's assume that wing length is an important distinguishing character) and find it to be 3.7 mm.
- Let's assume that the chances of a fly escaping the collection are NOT related to its wing size.
- In other words, every fly has an equal chance of escaping the collection.

**The Standard Normal Distribution…contd.**

- If the fly had a wing length of 4.55 mm, the logical conclusion would be that it is likely from the collection, because that wing length is representative of the flies in the collection.
- But it is 3.7 mm.
- That is, it is rather smaller than the mean.
- So we ask: how likely is it that a fly with a wing length of 3.7 mm belongs to our collection?

**The Standard Normal Distribution…contd.**

- Of course, the only information we may have is the parameters, μ = 4.55 mm and σ = 0.39 mm, and the fact that the variable follows a normal distribution.
- With this information, we know that we can find the area under the normal curve between any two given cut-off points.
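A minimal sketch of the change of units for the fruit-fly collection, in both directions. The parameter values come from the slides; `to_z` and `from_z` are hypothetical helper names.

```python
MU_WING = 4.55     # parametric mean wing length of our collection (mm)
SIGMA_WING = 0.39  # parametric standard deviation (mm)


def to_z(y, mu=MU_WING, sigma=SIGMA_WING):
    """Wing length (mm) -> standard normal deviate Z."""
    return (y - mu) / sigma


def from_z(z, mu=MU_WING, sigma=SIGMA_WING):
    """Standard normal deviate Z -> wing length (mm)."""
    return mu + z * sigma


if __name__ == "__main__":
    z = to_z(4.1)
    print(f"4.1 mm is {z:.4f} standard deviations from the mean")
    print(f"Z = {z:.4f} corresponds back to {from_z(z):.2f} mm")
```

Going back with `from_z` makes the point that Z is just the same measurement expressed in standard-deviation units.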
- Recall the useful properties of the normal distribution:

**The Standard Normal Distribution…contd.**

*[Figure: recap of the areas under the normal curve: 34.13%, 13.59% and 2.14% on each side of the mean; central areas of 50%, 95% and 99%]*

- μ ± σ contains 68.27% of the items; 50% of the items fall between μ ± 0.674σ
- μ ± 2σ contains 95.45% of the items; 95% of the items fall between μ ± 1.960σ
- μ ± 3σ contains 99.73% of the items; 99% of the items fall between μ ± 2.576σ

**The Standard Normal Distribution…contd.**

- We know that 3.7 mm is less than the known parametric mean, 4.55 mm.
- But is it so low as to make it unlikely to belong to the collection?
- So we ask: what proportion of the collection has a wing length of 3.7 mm or lower?

**The Standard Normal Distribution…contd.**

*[Figure: normal curve centred at 4.55 mm, with the lower tail at and below 3.7 mm shaded: "We are interested in this area"]*

**The Standard Normal Distribution…contd.**

- But how do we find that area if we do not have the original distribution?
- Fortunately, we have all the information we need:
  - The variable follows the normal distribution
  - Parametric mean = 4.55 mm, and
  - Parametric standard deviation = 0.39 mm

**The Standard Normal Distribution…contd.**

- With that information, we can obtain the Z value for our fly:

$$Z = \frac{Y_i - \mu}{\sigma} = \frac{3.7 - 4.55}{0.39} \approx -2.18$$

**The Standard Normal Distribution…contd.**

- Because there is only one standard normal distribution, the area between any two points on the X-axis has already been computed.
- These areas are tabulated in standard normal tables, available in any statistics textbook.

**The Standard Normal Distribution…contd.**

*[Figure: standard normal curve at Z = −2.18 (wing length = 3.7 mm); area given in the table: from −2.18 to 0; area needed: from −∞ to −2.18]*

- Look up the standard normal table with the value −2.18.
- We know that the area under the curve from −∞ to 0 is 0.5.
- The table gives 0.4854 as the area from −2.18 to 0.
- Therefore, the area from −∞ to −2.18 is 0.5 − 0.4854 = 0.0146.
- That is, approximately 1.5% of the flies in the collection are expected to have a wing length of 3.7 mm (Z = −2.18) or lower.

**The Standard Normal Distribution…contd.**

- The logic in deciding whether the fly belongs to the lab collection is simple.

**The Standard Normal Distribution…contd.**

- We don't know which population the fly belongs to.
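The table lookup can be imitated in Python. This sketch assumes a table that, like the one described above, lists the area between 0 and |z|, and it rounds Z to two decimals as one would before looking it up; `table_area` is a hypothetical name and `math.erf` stands in for the printed table.

```python
import math

MU_WING, SIGMA_WING = 4.55, 0.39  # parametric mean and SD of wing length (mm)


def std_normal_cdf(z):
    """Phi(z): area under the standard normal curve from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def table_area(z):
    """Area between 0 and |z|, the quantity tabulated in many standard normal tables."""
    return std_normal_cdf(abs(z)) - 0.5


if __name__ == "__main__":
    z = round((3.7 - MU_WING) / SIGMA_WING, 2)  # rounded to 2 decimals, as for a table lookup
    area_0_to_z = table_area(z)
    lower_tail = 0.5 - area_0_to_z               # area from -infinity to z
    print(f"Z = {z}, table area = {area_0_to_z:.4f}, lower tail = {lower_tail:.4f}")
```

The subtraction `0.5 - area_0_to_z` is the same step as on the slide; calling `std_normal_cdf(z)` directly gives the lower tail in one step.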
- Even in our collection, flies with a wing length ≤ 3.7 mm are rare (~1.5%).
- So is it reasonable to accept that an unknown fly with that wing length belongs to our collection?

**The Standard Normal Distribution…contd.**

- General rule of thumb: if the probability is < 0.05 (i.e., < 5%), then you conclude that the chances that an observation belongs to that distribution are low.
- So we conclude that the fly is unlikely to belong to our lab collection.

**Some exercises**

- Show schematic diagrams of the area of interest, and the probabilities associated with the following:
  - What if the wing length of the fly were 5.16 mm? 5.26 mm? 4.55 mm? 3.63 mm? 2.38 mm?
  - What proportion of flies in the collection have wing lengths between 3 mm and 4 mm? Between 5 mm and 6 mm? Between 3 mm and 7 mm?
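The rule of thumb and the interval questions can be sketched as below. Assumptions to note: the tail is taken on the same side of the mean as the observation (one-sided, as in the worked 3.7 mm example), Z is not rounded, so results can differ slightly from a two-decimal table, and all function names are hypothetical.

```python
import math

MU_WING, SIGMA_WING = 4.55, 0.39  # parametric mean and SD of wing length (mm)
ALPHA = 0.05                      # rule-of-thumb cut-off from the slide


def cdf(y, mu=MU_WING, sigma=SIGMA_WING):
    """Proportion of the collection with wing length <= y (normal model)."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def proportion_between(a, b):
    """Proportion of the collection with wing length between a and b (mm)."""
    return cdf(b) - cdf(a)


def tail_probability(y):
    """Proportion at least as far from the mean as y, on the same side as y (assumed one-sided)."""
    return cdf(y) if y <= MU_WING else 1.0 - cdf(y)


def unlikely_member(y, alpha=ALPHA):
    """Rule of thumb: an observation is unlikely to belong if its tail probability < alpha."""
    return tail_probability(y) < alpha


if __name__ == "__main__":
    p = tail_probability(3.7)
    print(f"P(wing length <= 3.7 mm) = {p:.4f}; unlikely member: {unlikely_member(3.7)}")
```

The same calls, for example `tail_probability(5.16)` or `proportion_between(3.0, 4.0)`, can be used to check answers to the exercises after working them by hand with the table.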