
EE3J2 Data Mining Lecture 10 Statistical Modelling Martin Russell


Presentation Transcript


  1. EE3J2 Data Mining, Lecture 10: Statistical Modelling. Martin Russell. EE3J2 Data Mining

  2. Objectives • To review basic statistical modelling • To review the notion of a probability distribution • To review the notion of a probability density function • To introduce mixture densities • To introduce the multivariate Gaussian density

  3. Discrete variables • Suppose that Y is a random variable which can take any value in a discrete set X = {x1, x2, …, xM} • Suppose that y1, y2, …, yN are samples of the random variable Y • If cm is the number of times that yn = xm, then an estimate of the probability that Y takes the value xm is given by: P(Y = xm) ≈ cm / N

  4. Discrete Probability Mass Function • Symbol: 1, 2, 3, 4, 5, 6, 7, 8, 9 (Total) • Num. Occurrences: 120, 231, 90, 87, 63, 57, 156, 203, 91 (1098)
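The counts-to-probabilities estimate from the previous slide can be sketched in Python (a minimal illustration, not part of the original slides, using the counts from the table above):

```python
# Estimate a discrete probability mass function from occurrence counts.
# The counts are those in the table above (symbols 1-9, 1098 samples in all).
counts = {1: 120, 2: 231, 3: 90, 4: 87, 5: 63, 6: 57, 7: 156, 8: 203, 9: 91}

total = sum(counts.values())
pmf = {symbol: c / total for symbol, c in counts.items()}

# The estimated probabilities always sum to 1 by construction.
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```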

  5. Continuous Random Variables • In most practical applications the data are not restricted to a finite set of values – they can take any value in N-dimensional space • Simply counting the number of occurrences of each value is no longer a viable way of estimating probabilities… • …but there are generalisations of this approach which are applicable to continuous variables – these are referred to as non-parametric methods

  6. Continuous Random Variables • An alternative is to use a parametric model • In a parametric model, probabilities are defined by a small set of parameters • Simplest example is a normal, or Gaussian, model • A Gaussian probability density function (PDF) is defined by two parameters – its mean μ and variance σ²

  7. Gaussian PDF • ‘Standard’ 1-dimensional Gaussian PDF: • mean μ = 0 • variance σ² = 1

  8. Gaussian PDF • P(a ≤ x ≤ b): the area under the PDF between a and b
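For intuition, the shaded-area probability P(a ≤ x ≤ b) under the standard Gaussian can be computed from the error function. This is a sketch using only the Python standard library; the function names are illustrative, not from the slides:

```python
import math

def std_gauss_cdf(x):
    """Cumulative distribution of the standard Gaussian (mean 0, variance 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_gauss_prob(a, b):
    """P(a <= x <= b): the area under the standard Gaussian PDF between a and b."""
    return std_gauss_cdf(b) - std_gauss_cdf(a)

# Roughly 68% of the probability mass lies within one standard deviation.
within_one_sd = std_gauss_prob(-1.0, 1.0)
```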

  9. Gaussian PDF • For a 1-dimensional Gaussian PDF p with mean μ and variance σ²: p(y) = (1 / √(2πσ²)) · exp(−(y − μ)² / (2σ²)) • The constant 1/√(2πσ²) ensures the area under the curve is 1; the exponential term defines the ‘bell’ shape
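The 1-dimensional Gaussian PDF translates directly into code (a minimal sketch; `gauss_pdf` is an illustrative name, not from the slides):

```python
import math

def gauss_pdf(y, mean, var):
    """1-D Gaussian PDF with the given mean and variance.
    The 1/sqrt(2*pi*var) constant makes the area under the curve 1;
    the exponential term gives the bell shape."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
```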

  10. =0.1 =10.0 =1.0 =5.0 More examples EE3J2 Data Mining

  11. Fitting a Gaussian PDF to Data • Suppose y = y1, …, yn, …, yN is a set of N data values • Given a Gaussian PDF p with mean μ and variance σ², define: p(y | μ, σ²) = p(y1) × p(y2) × … × p(yN) • How do we choose μ and σ² to maximise this probability?

  12. Fitting a Gaussian PDF to Data • (Figure: two candidate Gaussians over the same data – one a good fit, one a poor fit)

  13. Maximum Likelihood Estimation • Define the best fitting Gaussian to be the one such that p(y | μ, σ²) is maximised • Terminology: • p(y | μ, σ²), thought of as a function of y, is the probability (density) of y • p(y | μ, σ²), thought of as a function of μ, σ², is the likelihood of μ, σ² • Maximising p(y | μ, σ²) with respect to μ, σ² is called Maximum Likelihood (ML) estimation of μ, σ²

  14. ML estimation of μ, σ² • Intuitively: • The maximum likelihood estimate of μ should be the average value of y1, …, yN (the sample mean) • The maximum likelihood estimate of σ² should be the variance of y1, …, yN (the sample variance) • This turns out to be true: p(y | μ, σ²) is maximised by setting: μ = (1/N) Σn yn and σ² = (1/N) Σn (yn − μ)²
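The ML estimates above can be sketched as a short function (illustrative code, not from the slides; note the variance divides by N, the ML estimate, rather than the unbiased N−1 form):

```python
def ml_gaussian_fit(samples):
    """Maximum-likelihood fit of a 1-D Gaussian: returns (mean, variance),
    i.e. the sample mean and the divide-by-N sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((y - mean) ** 2 for y in samples) / n
    return mean, var

mean, var = ml_gaussian_fit([1.0, 2.0, 3.0, 4.0])
```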

  15. Multi-modal distributions • In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve • For example, if the data arises from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults) • These peaks are the modes of the distribution, and the distribution is called multi-modal

  16. Gaussian Mixture PDFs • Gaussian Mixture PDFs, or Gaussian Mixture Models (GMMs), are commonly used to model multi-modal or other non-Gaussian distributions • A GMM is just a weighted average of several Gaussian PDFs, called the component PDFs • For example, if p1 and p2 are Gaussian PDFs, then p(y) = w1 p1(y) + w2 p2(y) defines a 2-component Gaussian mixture PDF

  17. Gaussian Mixture - Example • 2-component mixture model • Component 1: μ = 0, σ² = 0.1 • Component 2: μ = 2, σ² = 1 • w1 = w2 = 0.5

  18. Example 2 • 2-component mixture model • Component 1: μ = 0, σ² = 0.1 • Component 2: μ = 2, σ² = 1 • w1 = 0.2, w2 = 0.8

  19. Example 3 • 2-component mixture model • Component 1: μ = 0, σ² = 0.1 • Component 2: μ = 2, σ² = 1 • w1 = 0.2, w2 = 0.8

  20. Example 4 • 5 component Gaussian mixture PDF

  21. Gaussian Mixture Model • In general, an M-component Gaussian mixture PDF is defined by: p(y) = Σm wm pm(y), where each pm is a Gaussian PDF and Σm wm = 1, wm ≥ 0
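The general M-component definition can be sketched as a small function (illustrative names, not from the slides, reusing the 1-D Gaussian PDF formula from earlier slides):

```python
import math

def gauss_pdf(y, mean, var):
    """1-D Gaussian component PDF."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(y, weights, means, variances):
    """M-component Gaussian mixture PDF: a weighted average of the component
    PDFs. The weights are assumed non-negative and to sum to 1."""
    return sum(w * gauss_pdf(y, m, v) for w, m, v in zip(weights, means, variances))

# The 2-component example from slide 16: p(y) = w1*p1(y) + w2*p2(y)
density = gmm_pdf(0.0, [0.5, 0.5], [0.0, 2.0], [0.1, 1.0])
```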

  22. Estimating the parameters of a Gaussian mixture model • A Gaussian Mixture Model with M components has: • M means: μ1, …, μM • M variances: σ²1, …, σ²M • M mixture weights: w1, …, wM • Given a set of data y = y1, …, yN, how can we estimate these parameters? • I.e. how do we find a maximum likelihood estimate of μ1, …, μM, σ²1, …, σ²M, w1, …, wM?

  23. Parameter Estimation • If we knew which component each sample yn came from, then parameter estimation would be easy: • Set μm to be the average value of the samples which belong to the mth component • Set σ²m to be the variance of the samples which belong to the mth component • Set wm to be the proportion of samples which belong to the mth component • But we don’t know which component each sample belongs to

  24. Solution – the E-M algorithm • Guess initial values for the parameters • For each n, calculate the probabilities λn = P(mth component | yn) – a measure of how much yn ‘belongs to’ the mth component • Use these probabilities to re-estimate the parameters, weighting each sample by how much it belongs to the mth component, e.g. μm = Σn λn yn / Σn λn • REPEAT
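One iteration of the E-M loop described above can be sketched as follows. This is a minimal 1-D illustration, not the slides' own code; `em_step` and `gauss_pdf` are illustrative names:

```python
import math

def gauss_pdf(y, mean, var):
    """1-D Gaussian component PDF."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_step(data, weights, means, variances):
    """One E-M iteration for a 1-D Gaussian mixture.
    E-step: lam[n][m] = P(component m | y_n), via Bayes' theorem.
    M-step: re-estimate each component's weight, mean and variance,
    using the lambdas as soft counts."""
    M, N = len(weights), len(data)
    # E-step: responsibilities for every (sample, component) pair
    lam = []
    for y in data:
        joint = [weights[m] * gauss_pdf(y, means[m], variances[m]) for m in range(M)]
        total = sum(joint)
        lam.append([j / total for j in joint])
    # M-step: soft-count weighted re-estimation
    new_w, new_mu, new_var = [], [], []
    for m in range(M):
        nm = sum(lam[n][m] for n in range(N))                      # soft count
        mu = sum(lam[n][m] * data[n] for n in range(N)) / nm
        var = sum(lam[n][m] * (data[n] - mu) ** 2 for n in range(N)) / nm
        new_w.append(nm / N)
        new_mu.append(mu)
        new_var.append(var)
    return new_w, new_mu, new_var
```

In practice this step is repeated until the likelihood stops improving; as the next slide notes, it converges to a local (not necessarily global) optimum.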

  25. The E-M algorithm • (Figure: the likelihood p(y | parameters) plotted against the parameter set; successive E-M estimates climb from the initial guess towards a local optimum)

  26. E-M Algorithm • Let’s just look at estimation of the mean μ of a single component of a GMM • In fact, λn = P(mth component | yn) • In other words, λn is the probability of the mth component given the data point yn

  27. E-M continued • From Bayes’ theorem: P(m | yn) = wm pm(yn) / Σk wk pk(yn) • The numerator is the mth weight times the mth Gaussian component evaluated at yn; the denominator sums the same quantity over all components

  28. Example – initial model • (Figure: two components m1 and m2 and a data point y6, with responsibilities P(m1 | y6) = λ1 and P(m2 | y6) = λ2)

  29. Example – after 1st iteration of E-M

  30. Example – after 2nd iteration of E-M

  31. Example – after 4th iteration of E-M

  32. Example – after 10th iteration of E-M

  33. Multivariate Gaussian PDFs • All PDFs so far have been 1-dimensional • They take scalar values • But most real data will be represented as D-dimensional vectors • The vector equivalent of a Gaussian PDF is called a multivariate Gaussian PDF

  34. Multivariate Gaussian PDFs • (Figure: a 1-dimensional Gaussian PDF alongside a 2-dimensional multivariate Gaussian PDF, shown as contours of equal probability)

  35. Multivariate Gaussian PDFs • (Figure: further comparison of 1-dimensional and multivariate Gaussian PDFs)

  36. Multivariate Gaussian PDF • The parameters of a multivariate Gaussian PDF are: • The (vector) mean μ • The (vector) variance σ² • The covariance matrix Σ
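For the 2-dimensional case the multivariate Gaussian PDF can be written out explicitly. This is a sketch with a hand-inverted 2x2 covariance matrix (`mvn_pdf_2d` is an illustrative name; real code would use a linear-algebra library):

```python
import math

def mvn_pdf_2d(y, mean, cov):
    """2-D multivariate Gaussian PDF for a 2x2 covariance matrix:
    p(y) = exp(-0.5 * (y-mu)^T Sigma^{-1} (y-mu)) / (2*pi*sqrt(det Sigma))."""
    (a, b), (c, d) = cov
    det = a * d - b * c                       # determinant of the 2x2 covariance
    inv = [[d / det, -b / det], [-c / det, a / det]]   # its inverse
    dx = [y[0] - mean[0], y[1] - mean[1]]     # offset from the mean vector
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))
```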

  37. Multivariate Gaussian PDFs • Multivariate Gaussian PDFs are commonly used in pattern processing and data mining • Vector data is often not unimodal, so we use mixtures of multivariate Gaussian PDFs • The E-M algorithm works for multivariate Gaussian mixture PDFs

  38. Summary • Basic statistical modelling • Probability distributions • Probability density functions • Gaussian PDFs • Gaussian mixture PDFs and the E-M algorithm • Multivariate Gaussian PDFs
