
Pattern Recognition and Machine Learning






Presentation Transcript


  1. Pattern Recognition and Machine Learning, Chapter 12: Continuous Latent Variables. Lars Kasper, December 15th 2010

  2. Relation To Other Topics. Last weeks: approximate inference. Today: back to data preprocessing • Data representation/feature extraction • “Model-free” analysis • Dimensionality reduction • The matrix. Link: we also have a (particularly easy) model of the underlying state of the world whose parameters we want to infer from the data.

  3. Take-home TLAs (three-letter acronyms). Although the chapter is termed “Continuous Latent Variables”, we mainly deal with PCA (Principal Component Analysis), ICA (Independent Component Analysis), and factor analysis. General motivation/theme: “What is interesting about my data – but hidden (latent)? … And what is just noise?”

  4. Importance Sampling ;-) Publications concerning fMRI and (PCA or ICA or factor analysis). Source: ISI Web of Knowledge, Dec 13th, 2010.

  5. Importance Sampling: fMRI Used for fMRI analysis, e.g. software package FSL: “MELODIC” MELODIC Tutorial: 2nd principal component (eigenimage) and corresponding time series of a visual block stimulation

  6. Motivation: Low intrinsic dimensionality • Generating hand-written digit samples by translating and rotating one example 100 times • High-dimensional data (100 × 100 pixels) • Low degrees of freedom (1 rotation angle, 2 translations)

  7. Roadmap for today

  8. Heuristic PCA: Projection View. 2D data projected onto a 1D line. • How do we simplify or compress our data (make it low-dimensional) without losing actual information? • Dimensionality reduction by projecting onto a linear subspace

  9. Heuristic PCA: Dimensionality Reduction • Advantages: • Reduced amount of data • Might be easier to reveal structure within the data (pattern recognition, data visualization)

  10. Heuristic PCA: Maximum Variance View. We want to reduce the dimensionality of our data space via a linear projection, but we still want to keep the projected samples as different as possible. A good measure for this difference is the data covariance, expressed by the matrix $\mathbf{S} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n - \bar{\mathbf{x}})(\mathbf{x}_n - \bar{\mathbf{x}})^T$. Note: this expresses the covariance between different data dimensions, not between data points. We now aim to maximize the variance of the projected data in the projection space spanned by the basis vectors $\mathbf{u}_i$.

  11. Maximum Variance View: The Maths. Maximum variance formulation of the 1D projection with projection vector $\mathbf{u}_1$: maximize the projected variance $\mathbf{u}_1^T \mathbf{S}\,\mathbf{u}_1$ subject to $\mathbf{u}_1^T\mathbf{u}_1 = 1$. This constrained optimization (via a Lagrange multiplier) leads to the best projector being an eigenvector of $\mathbf{S}$, the data covariance matrix: $\mathbf{S}\mathbf{u}_1 = \lambda_1\mathbf{u}_1$, with maximum projected variance equal to the maximum eigenvalue, $\mathbf{u}_1^T\mathbf{S}\mathbf{u}_1 = \lambda_1$.

  12. Heuristic PCA: Conclusion. By induction we obtain the general PCA result for maximizing the variance of the data in the projected dimensions: the projection vectors shall be the eigenvectors corresponding to the largest eigenvalues of the data covariance matrix. These vectors are called the principal components.
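As a concrete illustration of this result, here is a minimal NumPy sketch (not from the slides; the toy data and the function name `pca` are illustrative assumptions) that diagonalises the sample covariance matrix and projects onto the eigenvectors with the largest eigenvalues:

```python
import numpy as np

def pca(X, n_components):
    """Heuristic PCA: project onto the leading eigenvectors of the sample covariance."""
    x_mean = X.mean(axis=0)                         # data mean
    S = np.cov(X - x_mean, rowvar=False)            # D x D sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)            # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components]
    U = eigvecs[:, order]                           # principal components as columns
    Z = (X - x_mean) @ U                            # projected (low-dimensional) data
    return Z, U, eigvals[order]

# Example: 2D data projected onto its first principal component
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=500)
Z, U, lam = pca(X, 1)
print(U, lam)                                       # projection vector and maximum projected variance
```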

  13. Heuristic PCA: Minimum Error Formulation. By projecting, we want to lose as little information as possible, i.e. keep the projected data points as similar to the raw data as possible. Therefore we minimize the mean squared error $J = \frac{1}{N}\sum_{n=1}^{N}\lVert\mathbf{x}_n - \tilde{\mathbf{x}}_n\rVert^2$ with respect to the projection vectors. This leads to the same result as in the maximum variance formulation: the projection vectors shall be the eigenvectors corresponding to the largest eigenvalues of the data covariance matrix.
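The equivalence of the two formulations can be checked numerically. The following sketch (illustrative toy data, reusing the conventions above) projects 3D data onto its two leading components and compares the mean squared reconstruction error with the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0, 0], np.diag([4.0, 2.0, 0.5]), size=2000)

x_mean = X.mean(axis=0)
S = np.cov(X - x_mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)                # ascending order
U = eigvecs[:, -2:]                                 # keep the 2 largest components
X_rec = x_mean + (X - x_mean) @ U @ U.T             # project and reconstruct

J = np.mean(np.sum((X - X_rec) ** 2, axis=1))       # mean squared reconstruction error
print(J, eigvals[:-2].sum())                        # equal up to the 1/N vs 1/(N-1) normalisation
```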

  14. Example: Eigenimages

  15. Eigenimages II. Christopher DeCoro, http://www.cs.princeton.edu/cdecoro/eigenfaces/
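To make the eigenimage idea concrete, this sketch (random arrays stand in for a real face dataset, purely as an illustrative assumption) vectorises a stack of images, performs PCA via an SVD of the centred data, and reshapes the leading eigenvectors back into images:

```python
import numpy as np

n_images, h, w = 200, 32, 32
rng = np.random.default_rng(2)
faces = rng.random((n_images, h * w))               # each row: one flattened image
mean_face = faces.mean(axis=0)

# The right singular vectors of the centred data are the eigenvectors of its covariance
_, _, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
eigenimages = Vt[:16].reshape(16, h, w)             # first 16 eigenimages, ready for plotting
```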

  16. Dimensionality Reduction

  17. Roadmap for today

  18. Probabilistic PCA: A Synthesizer's View • Latent prior $p(\mathbf{z}) = \mathcal{N}(\mathbf{z}\mid\mathbf{0},\mathbf{I})$: a standardised normal distribution, i.e. independent latent variables with zero mean & unit variance • Likelihood $p(\mathbf{x}\mid\mathbf{z}) = \mathcal{N}(\mathbf{x}\mid\mathbf{W}\mathbf{z}+\boldsymbol{\mu},\sigma^2\mathbf{I})$: a spherical Gaussian, i.e. identical independent noise in each of the data dimensions • Prior predictive or marginal distribution of data points: $p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu},\mathbf{W}\mathbf{W}^T+\sigma^2\mathbf{I})$
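The synthesizer's view translates almost literally into sampling code. In this sketch the mapping W, the mean mu and the noise level sigma are arbitrary illustrative choices; latent variables are drawn from a standard normal, mapped linearly and corrupted by spherical noise, so the sample covariance approaches W W^T + sigma^2 I:

```python
import numpy as np

rng = np.random.default_rng(3)
D, M, N = 5, 2, 5000                    # data dimension, latent dimension, number of samples
W = rng.normal(size=(D, M))             # linear mapping from latent to data space (assumed)
mu = np.zeros(D)
sigma = 0.1                             # std of the spherical observation noise (assumed)

Z = rng.normal(size=(N, M))                            # z ~ N(0, I)
X = Z @ W.T + mu + sigma * rng.normal(size=(N, D))     # x | z ~ N(Wz + mu, sigma^2 I)

# The marginal covariance of x should be close to W W^T + sigma^2 I
print(np.abs(np.cov(X, rowvar=False) - (W @ W.T + sigma**2 * np.eye(D))).max())
```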

  19. Probabilistic PCA: ML Solution. Same as in heuristic PCA: $\mathbf{W}_{ML} = \mathbf{U}_M(\mathbf{L}_M - \sigma^2\mathbf{I})^{1/2}\mathbf{R}$, with $\mathbf{U}_M$ the matrix of the first $M$ eigenvectors of the data covariance matrix, $\mathbf{L}_M$ the diagonal matrix of the corresponding eigenvalues, and $\mathbf{R}$ an arbitrary rotation: the solution is only specified up to a rotation in latent space.
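A possible implementation of this closed-form solution (assuming an (N, D) data array X, latent dimensionality M < D, and the arbitrary rotation R taken as the identity) could look like this:

```python
import numpy as np

def ppca_ml(X, M):
    """Closed-form ML estimate for probabilistic PCA (rotation R chosen as identity)."""
    S = np.cov(X - X.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    sigma2 = eigvals[M:].mean()                          # ML noise variance: mean of discarded eigenvalues
    U_M, L_M = eigvecs[:, :M], np.diag(eigvals[:M])
    W_ml = U_M @ np.sqrt(L_M - sigma2 * np.eye(M))       # U_M (L_M - sigma^2 I)^(1/2)
    return W_ml, sigma2
```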

  20. Recap: The EM Algorithm. The Expectation-Maximization algorithm determines the maximum likelihood solution for our model parameters iteratively. This is advantageous compared to a direct eigenvector decomposition if $M \ll D$, i.e. if we have considerably fewer latent variables than data dimensions, such as a projection onto a very low-dimensional space for data visualization.

  21. EM Algorithm: Expectation Step. We consider the complete-data likelihood $p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\mu},\mathbf{W},\sigma^2)$; maximizing the marginal likelihood $p(\mathbf{X}\mid\boldsymbol{\mu},\mathbf{W},\sigma^2)$ instead would need an integration over latent space. E-step: the posterior distribution of the latent variables is updated and used to calculate the expected value of the complete-data log likelihood with respect to $p(\mathbf{Z}\mid\mathbf{X})$, keeping the current estimates of $\mathbf{W}$ and $\sigma^2$ fixed.

  22. EM Algorithm: Maximization Step. M-step: the calculated expectation is now maximized with respect to $\mathbf{W}$ and $\sigma^2$, keeping the posterior distribution of the latent variables estimated in the E-step fixed.

  23. EM Algorithm for ML-PCA: spring analogy. Green dots: data points, always fixed. Expectation: the red rod is fixed; the cyan connections of the blue springs move, obeying the spring forces. Maximization: the cyan connections are fixed; the red rod moves, obeying the spring forces.
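Putting the two steps together, a compact sketch of the EM iteration for probabilistic PCA might look as follows (the (N, D) data layout, the random initialisation and the iteration count are illustrative assumptions; the updates implement the E- and M-steps described above):

```python
import numpy as np

def ppca_em(X, M, n_iter=100):
    """EM for probabilistic PCA on an (N, D) data array with M latent dimensions."""
    N, D = X.shape
    Xc = X - X.mean(axis=0)                          # centred data
    rng = np.random.default_rng(0)
    W, sigma2 = rng.normal(size=(D, M)), 1.0         # arbitrary initialisation

    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables, parameters held fixed
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))
        Ez = Xc @ W @ Minv                           # E[z_n] for all n, shape (N, M)
        Ezz = N * sigma2 * Minv + Ez.T @ Ez          # sum_n E[z_n z_n^T]

        # M-step: maximise the expected complete-data log likelihood, posterior held fixed
        W_new = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum((Ez @ W_new.T) * Xc)
                  + np.trace(Ezz @ W_new.T @ W_new)) / (N * D)
        W = W_new

    return W, sigma2
```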

  24. Roadmap for today

  25. Bayesian PCA: Finding the Real Dimension. Maximum likelihood vs. Bayesian PCA: estimated projection matrix $\mathbf{W}$ for an $M$-dimensional latent variable model and synthetic data generated from a latent model of lower intrinsic dimensionality. Estimating the effective dimensionality: introducing hyperparameters $\alpha_i$ for the columns of $\mathbf{W}$ and marginalizing suppresses the superfluous latent dimensions.

  26. Roadmap for today

  27. Factor Analysis: A Non-Spherical PCA. $p(\mathbf{x}\mid\mathbf{z}) = \mathcal{N}(\mathbf{x}\mid\mathbf{W}\mathbf{z}+\boldsymbol{\mu},\boldsymbol{\Psi})$ with $\boldsymbol{\Psi}$ diagonal. The noise is still independent and Gaussian. Controversy: do the factors (the dimensions of $\mathbf{z}$) have an interpretable meaning? Problem: the posterior is invariant w.r.t. rotations of $\mathbf{W}$.

  28. Independent Component Analysis (ICA). $\mathbf{x} = \mathbf{W}\mathbf{z}$ with $p(\mathbf{z}) = \prod_j p(z_j)$: still a linear model of independent components. No data noise component, since dim(latent space) = dim(data space). Explicitly non-Gaussian $p(\mathbf{z})$; otherwise no separation of the mixing coefficients in $\mathbf{W}$ from the latent variables $\mathbf{z}$ would be possible (rotational symmetry). Maximization of non-Gaussianity/independence via different criteria, e.g. kurtosis and skewness, or minimization of mutual information.
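As an illustration (not part of the slides), the following sketch mixes two uniform, hence non-Gaussian, sources with a hypothetical mixing matrix A and unmixes them with scikit-learn's FastICA, while PCA merely decorrelates the observed mixtures:

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.default_rng(4)
S = rng.uniform(-1, 1, size=(2000, 2))              # independent, non-Gaussian sources
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                           # assumed mixing matrix
X = S @ A.T                                          # observed mixtures

S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)   # recovers sources up to scale/order
Z_pca = PCA(n_components=2).fit_transform(X)                        # decorrelates, but does not unmix
```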

  29. ICA vs PCA. Unsupervised methods: no class labels! ICA rewards bi-modality of the projected distribution; PCA rewards maximum variance of the projected elements. Figure: 1st independent component (ICA) vs. 1st principal component (PCA).

  30. Summary

  31. Relation To Other Topics. Today: back to data preprocessing. Whitening via the covariance matrix (covariance => identity). Data representation/feature extraction. “Model-free” analysis? Well, no: we have seen the model assumptions in probabilistic PCA. Dimensionality reduction via projection onto the basis vectors carrying the most variance/leaving the smallest error, at least for linear models, not for kernel PCA. The matrix.

  32. Kernel PCA • Instead of the sample covariance matrix, we now consider a covariance matrix in feature space, $\mathbf{C} = \frac{1}{N}\sum_{n=1}^{N}\phi(\mathbf{x}_n)\phi(\mathbf{x}_n)^T$ • As always, the kernel trick of not computing in the high-dimensional feature space works, because the covariance matrix only needs scalar products of the $\phi(\mathbf{x}_n)$

  33. Kernel PCA – Example: Gaussian Kernel • Kernel PCA does not enable dimensionality reduction in the original data space • The image of data space under the feature map $\phi$ is a manifold in feature space, not a linear subspace • Kernel PCA projects onto linear subspaces of feature space whose elements are linear combinations of the $\phi(\mathbf{x}_n)$ • These elements typically do not lie on the manifold, so their pre-images will not exist in data space
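A small sketch of kernel PCA with a Gaussian kernel (the bandwidth gamma and the helper function are illustrative assumptions) shows that only the kernel matrix of feature-space scalar products is ever needed, never the feature map itself:

```python
import numpy as np

def gaussian_kernel_pca(X, n_components, gamma=1.0):
    """Project the training points onto the leading kernel principal components."""
    N = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)                                   # Gaussian kernel matrix
    one_n = np.ones((N, N)) / N
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n             # centring in feature space
    eigvals, eigvecs = np.linalg.eigh(K_c)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]              # descending order
    # Projections of the training points onto the kernel principal components
    return eigvecs[:, :n_components] * np.sqrt(np.maximum(eigvals[:n_components], 0.0))
```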
