Probability distribution functions
Probability distribution functions • Normal distribution • Lognormal distribution • Mean, median and mode • Tails • Extreme value distributions
Normal (Gaussian) distribution • Normal density function • What does the figure tell us about the values of the CDF?
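The density formula on this slide did not survive extraction. For reference, the standard textbook form of the normal density and its CDF is sketched below; the symbols \mu (mean) and \sigma (standard deviation) are assumed here, since the slide's own notation is not visible.

```latex
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
F(x) = \int_{-\infty}^{x} f(t)\,dt .
\]
```

Because the density is symmetric about \mu, the CDF equals 0.5 at x = \mu and satisfies F(\mu - d) = 1 - F(\mu + d). This is presumably the kind of observation the question about the figure is after.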
More on the normal distribution • P = normcdf(X,MU,SIGMA) returns the cdf of the normal distribution with mean MU and standard deviation SIGMA, evaluated at the values in X. The size of P is the common size of X, MU and SIGMA. • normcdf(1)=0.8413. • 1-normcdf(6)= 9.8659e-010 • If X is normally distributed, Y=aX+b is also normally distributed. What would be the mean and standard deviation of Y? • Notation
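One way to explore the question about Y=aX+b is a quick numerical check. The standard result is that Y has mean a*MU+b and standard deviation |a|*SIGMA. The sketch below compares these with sample statistics; the values of mu, sigma, a and b are illustrative assumptions, not taken from the slides.

```matlab
% Assumed illustrative values (not from the slides)
mu = 2;  sigma = 0.5;          % mean and standard deviation of X
a  = -3; b = 1;                % linear transformation Y = a*X + b

rng(1);                        % fix the seed so the run is repeatable
x = mu + sigma*randn(1, 1e6);  % samples of X
y = a*x + b;                   % samples of Y

mean_theory = a*mu + b;        % expected mean of Y
std_theory  = abs(a)*sigma;    % expected standard deviation of Y

fprintf('mean: sample %.3f, theory %.3f\n', mean(y), mean_theory);
fprintf('std : sample %.3f, theory %.3f\n', std(y),  std_theory);
```

Note the absolute value: a negative a flips the distribution but cannot make the standard deviation negative.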
Estimating mean and standard deviation • Given a sample from a normally distributed variable, the sample mean is the best linear unbiased estimator of the true mean. • For the variance, the sample variance $s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ is the best unbiased estimator, but its square root $s$ is not an unbiased estimate of the standard deviation:
x=randn(5,10000);
s=std(x);
mean(s)      % 0.9463
s2=s.^2;
mean(s2)     % 1.0106
Lognormal distribution • If ln(X) has a normal distribution, X has a lognormal distribution. That is, if X is normally distributed, exp(X) is lognormally distributed. • Notation: • Probability density function (PDF) • Mean and variance
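The formulas on this slide were lost in extraction. The standard expressions are sketched below, assuming \mu and \sigma denote the mean and standard deviation of ln(X); the slide's own notation is not recoverable, so these symbols are an assumption.

```latex
\[
f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right),
\qquad x > 0,
\]
\[
\mathrm{E}[X] = e^{\mu+\sigma^2/2},
\qquad
\mathrm{Var}[X] = \left(e^{\sigma^2}-1\right)e^{2\mu+\sigma^2}.
\]
```

With \mu = 0 and \sigma = 1 these give a mean of exp(0.5) and a variance of exp(1)*(exp(1)-1), which are the values used in the tail comparison later in the slides.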
Mean, mode and median • Mode (highest point) • Median (50% of samples)
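The slide's formulas are missing. Assuming it refers to the lognormal distribution of the preceding slide, the standard closed forms are mode = exp(mu - sigma^2), median = exp(mu) and mean = exp(mu + sigma^2/2). A minimal sketch that evaluates them for mu = 0, sigma = 1 and cross-checks them numerically with Statistics Toolbox functions:

```matlab
mu = 0; sigma = 1;                 % parameters of ln(X), as in logncdf(x,0,1)

mean_ln   = exp(mu + sigma^2/2);   % closed-form mean
median_ln = exp(mu);               % closed-form median
mode_ln   = exp(mu - sigma^2);     % closed-form mode

% Numerical cross-checks
m_check   = lognstat(mu, sigma);              % mean from the toolbox
med_check = logninv(0.5, mu, sigma);          % median from the inverse CDF
xx = linspace(0.01, 3, 3000);
[~, i] = max(lognpdf(xx, mu, sigma));         % mode as the peak of the PDF
mode_check = xx(i);

fprintf('mean   %.4f (check %.4f)\n', mean_ln,   m_check);
fprintf('median %.4f (check %.4f)\n', median_ln, med_check);
fprintf('mode   %.4f (check %.4f)\n', mode_ln,   mode_check);
```

For a right-skewed distribution like this one, the ordering is mode < median < mean.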
Light and heavy tails • Normal distribution has a light tail. Six sigma is equivalent to .999999999 (nine nines) safety. • Lognormal is heavy tailed: with ln(X) distributed as N(0,1), the mean plus six standard deviations corresponds to a probability of only 0.9963.
m=exp(0.5)                % m = 1.6487
v=exp(1)*(exp(1)-1)       % v = 4.6708
sig=sqrt(v)               % sig = 2.1612
sig6=m+6*sig              % sig6 = 14.6159
logncdf(sig6,0,1)         % 0.9963
Fitting distribution to data • Typically fit to CDF.
Empirical CDF [F,X] = ecdf(Y) calculates the Kaplan-Meier estimate of the cumulative distribution function (cdf), also known as the empirical cdf. Y is a vector of data values. F is a vector of values of the empirical cdf evaluated at X. [F,X,FLO,FUP] = ecdf(Y) also returns lower and upper confidence bounds for the cdf. These bounds are calculated using Greenwood's formula, and are not simultaneous confidence bounds. ecdf(...) without output arguments produces a plot of the empirical cdf. Use the data cursor to read precise values from the plot.
Example
x=lognrnd(0,1,1,20); ecdf(x)
hold on
x=lognrnd(0,1,1,10000); ecdf(x)
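To see how far a 20-point empirical CDF can sit from the underlying distribution, it may help to overlay the true CDF and a fitted one. The sketch below uses lognfit as a convenient stand-in for a fit; note that lognfit estimates the parameters from the logarithms of the data rather than by fitting the CDF curve directly, so it is not necessarily the fitting method the slides have in mind.

```matlab
rng(0);                                   % repeatable sample
x = lognrnd(0, 1, 1, 20);                 % small sample, as in the example
[F, X] = ecdf(x);                         % empirical CDF

parmhat = lognfit(x);                     % parmhat(1) = mu, parmhat(2) = sigma

xx = linspace(0.01, max(x), 200);
stairs(X, F); hold on
plot(xx, logncdf(xx, 0, 1));                          % true CDF
plot(xx, logncdf(xx, parmhat(1), parmhat(2)), '--');  % fitted CDF
legend('empirical', 'true', 'fitted', 'Location', 'southeast');
hold off
```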
Extreme value distributions • No matter what distribution you sample from, the mean of the sample tends to be normally distributed as sample size increases (what mean and standard deviation?) • Similarly, the minimum (or maximum) of a sample tends to follow distributions from other families. • Even though there is an infinite number of distributions, there are only three extreme value distributions. • Type I (Gumbel) derived from normal. • Type II (Frechet) e.g. maximum daily rainfall • Type III (Weibull) weakest link failure
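The parenthetical question about the sample mean has a standard answer: for a parent distribution with mean mu and standard deviation sigma, the mean of n independent samples has mean mu and standard deviation sigma/sqrt(n). A minimal check, assuming a uniform(0,1) parent and illustrative values of n and N that are not from the slides:

```matlab
rng(2);                        % repeatable run
n = 50;  N = 10000;            % assumed sample size and number of repetitions
x = rand(n, N);                % uniform(0,1) parent: clearly not normal
xbar = mean(x);                % one sample mean per column

mu_parent    = 0.5;            % mean of uniform(0,1)
sigma_parent = sqrt(1/12);     % standard deviation of uniform(0,1)

fprintf('mean of sample means: %.4f (theory %.4f)\n', mean(xbar), mu_parent);
fprintf('std  of sample means: %.4f (theory %.4f)\n', std(xbar), sigma_parent/sqrt(n));
```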
Example
x=5-0.3*randn(10,1000);
minx=min(x);
hist(minx);
ecdf(minx)
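As a possible follow-up to this example, the minima can be fitted with a Type I distribution and compared with the empirical CDF. The sketch assumes the Statistics Toolbox functions evfit and evcdf, which use the form of the Type I distribution suited to minima. With only 10 values per sample the agreement is approximate, since convergence to the limiting distribution is slow for a normal parent.

```matlab
rng(3);                                   % repeatable run
x = 5 - 0.3*randn(10, 1000);              % 1000 samples of size 10
minx = min(x);                            % minimum of each sample

parmhat = evfit(minx);                    % parmhat(1) = location, parmhat(2) = scale

[F, X] = ecdf(minx);                      % empirical CDF of the minima
xx = linspace(min(minx), max(minx), 200);
stairs(X, F); hold on
plot(xx, evcdf(xx, parmhat(1), parmhat(2)), '--');   % fitted Type I CDF
legend('empirical', 'fitted Type I', 'Location', 'northwest');
hold off
```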
Gumbel distribution • PDF and CDF • Mean, median, mode and variance
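The formulas for this slide were lost. As a reference sketch, the commonly quoted form for the distribution of maxima is given below, with location \mu and scale \beta as assumed symbols; the slide may have used the form for minima or different notation.

```latex
\[
F(x) = \exp\!\left(-e^{-z}\right),
\qquad
f(x) = \frac{1}{\beta}\exp\!\left(-\left(z + e^{-z}\right)\right),
\qquad
z = \frac{x-\mu}{\beta},
\]
\[
\text{mean} = \mu + \gamma\beta,
\quad
\text{median} = \mu - \beta\ln(\ln 2),
\quad
\text{mode} = \mu,
\quad
\text{variance} = \frac{\pi^2\beta^2}{6},
\]
```

where \gamma \approx 0.5772 is the Euler-Mascheroni constant. The form for minima is the mirror image: F(x) = 1 - \exp(-e^{z}), with mean \mu - \gamma\beta, median \mu + \beta\ln(\ln 2), and the same mode and variance.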
Weibull distribution • Probability distribution • Used to describe the distribution of strength or fatigue life in brittle materials (weakest link connection) • If it describes time to failure, then • k<1 indicates that failure rate decreases with time, • k=1 indicates constant rate, • k>1 indicates increasing rate. • Useful for other phenomena like wind speed distribution. • Can add a 3rd parameter by replacing x by x-c.
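The effect of the shape parameter k on the failure rate can be checked numerically. In the standard two-parameter form with an assumed scale symbol lambda, the density is f(x) = (k/lambda)*(x/lambda)^(k-1)*exp(-(x/lambda)^k) for x >= 0, and the failure (hazard) rate f/(1-F) reduces to (k/lambda)*(x/lambda)^(k-1). The sketch below evaluates the hazard with wblpdf and wblcdf; lambda = 1 and the time grid are illustrative assumptions.

```matlab
lambda = 1;                         % assumed scale parameter
t  = linspace(0.1, 3, 50);          % times at which to evaluate the failure rate
ks = [0.5 1 2];                     % shape parameters: k<1, k=1, k>1

H = zeros(numel(ks), numel(t));     % one row of hazard values per shape
for i = 1:numel(ks)
    k = ks(i);
    % hazard = pdf / (1 - cdf); wblpdf and wblcdf take (x, scale, shape)
    H(i,:) = wblpdf(t, lambda, k) ./ (1 - wblcdf(t, lambda, k));
    fprintf('k = %.1f: h(0.1) = %.3f, h(3) = %.3f\n', k, H(i,1), H(i,end));
end
```

The printed end-point values fall for k = 0.5, stay at 1 for k = 1, and rise for k = 2, matching the three cases listed on the slide.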