
Generative (Bayesian) modeling


Presentation Transcript


  1. Generative (Bayesian) modeling 04/04/2016

  2. Slides by (credit to): David M. Blei, Andrew Y. Ng, Michael I. Jordan, Ido Abramovich, L. Fei-Fei, P. Perona, J. Sivic, B. Russell, A. Efros, A. Zisserman, B. Freeman, Tomasz Malisiewicz, Thomas Huffman, Tom Landauer and Peter Foltz, Melanie Martin, Hsuan-Sheng Chiu, Haiyan Qiao, Jonathan Huang. Thank you!

  3. Generative modeling • unsupervised learning • … beyond clustering • How can we describe/model the world for the computer? • Bayesian networks!

  4. Bayesian networks • Directed acyclic graphs (DAG) whose nodes represent random variables • Arcs represent (directed) dependence between random variables

  5. Bayesian networks • Filled nodes: observable variables • Empty nodes: hidden (not observable) variables [Diagram: hidden node z_i with observed word nodes w_i1 … w_i4]

  6. Collapsed (plate) notation of Bayesian networks • A frame (plate) indicates that the enclosed nodes are replicated • E.g. N features and M instances: [Plate diagram: hidden z_i with observed words w_i1 … w_iN, plates of size N and M]
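
A minimal sketch of what such a plate diagram encodes: one hidden topic z per document and N conditionally independent words given z, so the joint probability is p(z) · Π_n p(w_n | z). The topic count, vocabulary size and probability tables below are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 5                                    # assumed number of topics and vocabulary size
p_z = np.full(K, 1.0 / K)                      # prior over the hidden topic z
p_w_given_z = rng.dirichlet(np.ones(V), K)     # one word distribution per topic, shape (K, V)

def joint(z, words):
    """p(z, w_1..w_N) = p(z) * prod_n p(w_n | z) for one document."""
    return p_z[z] * np.prod(p_w_given_z[z, words])

doc = np.array([0, 2, 2, 4])                   # word indices of one toy document (N = 4)
print([joint(z, doc) for z in range(K)])       # joint probability for each possible topic
```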

  7. Generative (Bayesian) modeling Find the parameters of the given model which explain/reconstruct the observed data [Diagram: Model („generative story”) ↔ DATA]

  8. Model „Generative story” • Model = Bayesian network • The structure of the network is given by the human engineer • The form of the nodes’ distribution (conditioned on their parents) is given as well • The parameters of the distributions have to be estimated from data

  9. Parameter estimation in Bayesian networks – only observable variables • A Bayesian network assumes that each variable depends (directly) only on its parents → parameter estimation at each node can be carried out separately • Maximum Likelihood (or Bayesian estimation)
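
A hedged illustration of this point: when every variable is observed, maximum-likelihood estimation factorises over the nodes and reduces to counting at each node separately. The toy observations and names below are assumptions, not data from the slides.

```python
import numpy as np

K, V = 2, 4                                        # assumed number of topics and vocabulary size
z_obs = np.array([0, 0, 1, 1, 1])                  # observed topic of each document
w_obs = [np.array([0, 1, 1]), np.array([0, 0]),    # observed word indices per document
         np.array([2, 3]), np.array([3, 3, 2]), np.array([2])]

# ML estimate of p(z): relative frequency of each topic
p_z = np.bincount(z_obs, minlength=K) / len(z_obs)

# ML estimate of p(w | z): word counts per topic, normalised row-wise
p_w_given_z = np.zeros((K, V))
for z, words in zip(z_obs, w_obs):
    p_w_given_z[z] += np.bincount(words, minlength=V)
p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)

print(p_z)
print(p_w_given_z)
```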

  10. Expectation-Maximisation (EM) • The extension of Maximum Likelihood parameter estimation if hidden variables are present • We search for the parameter vector Φ which maximises the likelihood of the joint of observable variables X and hidden ones Z

  11. Expectation-Maximisation (EM) • Iterative algorithm. Step l: • (E)xpectation step: estimate the values of Z (calculate expected values) using Φ^l • (M)aximization step: maximum likelihood estimation using Z

  12. EM example There are two coins. We toss them together but we can observe only the sum of the heads: h(0)=4, h(1)=9, h(2)=2. What are the biases of the coins, Φ1=P1(H) and Φ2=P2(H)?

  13. EM example • A single hidden variable z: what proportion of the h(1)=9 single-head tosses comes from the first coin? • Initialisation: Φ1^0=0.2, Φ2^0=0.5 • E-step

  14. EM example M-step
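
The E- and M-step formulas for this example were shown as figures on slides 13 and 14; the sketch below is a reconstruction of the same two-coin problem (observed sums of heads h(0)=4, h(1)=9, h(2)=2, initial Φ1=0.2, Φ2=0.5), where the hidden quantity is, for each single-head toss, which coin produced the head. Treat it as an illustrative sketch rather than the slides' exact derivation.

```python
h = {0: 4, 1: 9, 2: 2}                   # observed counts of the sum of heads
n = sum(h.values())                      # total number of tosses of the coin pair
phi1, phi2 = 0.2, 0.5                    # initial guesses, as on slide 13

for step in range(100):
    # E-step: probability that coin 1 was the head, given that exactly one head was seen
    q = phi1 * (1 - phi2) / (phi1 * (1 - phi2) + (1 - phi1) * phi2)
    # M-step: expected number of heads of each coin divided by the number of tosses
    phi1_new = (h[2] + q * h[1]) / n
    phi2_new = (h[2] + (1 - q) * h[1]) / n
    if abs(phi1_new - phi1) + abs(phi2_new - phi2) < 1e-10:
        break
    phi1, phi2 = phi1_new, phi2_new

print(phi1, phi2)                        # estimated biases Φ1, Φ2
```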

  15. Text classification/clustering • E.g. recognition of documents’ topic or clustering images based on their content • „Bag-of-words model” • The term-document matrix:

  16. Image Bag-of-”words”

  17. N documents: D={d1, …, dN} • The dictionary consists of M words: W={w1, …, wM} • The term-document matrix has size N × M and contains the number of occurrences of each word in each document
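
A minimal sketch of building the N × M term-document matrix described above; the toy documents and the naive whitespace tokenisation are assumptions.

```python
import numpy as np

docs = ["bank loan money bank", "river stream bank river", "money loan loan"]
tokenised = [d.split() for d in docs]                 # naive whitespace tokenisation
vocab = sorted({w for doc in tokenised for w in doc}) # the M dictionary words
index = {w: j for j, w in enumerate(vocab)}

X = np.zeros((len(docs), len(vocab)), dtype=int)      # N documents x M words
for i, doc in enumerate(tokenised):
    for w in doc:
        X[i, index[w]] += 1                           # count word occurrences

print(vocab)
print(X)
```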

  18. Drawbacks of the bag-of-words model • Word order is ignored • Synonymy: we refer to a concept (object) by multiple words, e.g. tired–sleepy → low recall • Polysemy: most words have multiple senses, e.g. bank, chips → low precision

  19. Document clustering – unigram model • Let’s assign a „topic” to each document • The topics are hidden variables

  20. Generative story of the „unigram model” • How are documents generated? • „Draw” a topic (and a document length) • For each word position, draw a word according to the topic’s distribution [Diagram: TOPIC → word … word]

  21. Unigram model [Diagram: hidden z_i with observed words w_i1 … w_i4] For each of the M documents: • Draw a topic z • Draw each word (independently of the others) from a multinomial distribution conditioned on z

  22. EM for clustering • E-step • M-step
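
The E- and M-step formulas on this slide were figures; below is a hedged numpy sketch of EM for the mixture-of-unigrams (one topic per document) clustering model. The toy count matrix, the number of topics and the variable names (pi, beta, gamma) are assumptions, not the slides' own notation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[2, 1, 1, 0, 0],               # toy N x M term-document count matrix
              [0, 0, 1, 2, 2],
              [1, 2, 0, 0, 0]])
N, M = X.shape
K = 2                                        # assumed number of topics/clusters
pi = np.full(K, 1.0 / K)                     # p(z)
beta = rng.dirichlet(np.ones(M), K)          # p(w | z), one row per topic

for _ in range(100):
    # E-step: responsibilities gamma[d, z] = p(z | document d), computed in log space
    log_post = np.log(pi) + X @ np.log(beta).T
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: maximum-likelihood updates given the soft assignments
    pi = gamma.sum(axis=0) / N
    beta = gamma.T @ X
    beta = beta / beta.sum(axis=1, keepdims=True)

print(gamma.round(3))                        # soft cluster membership of each document
```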

  23. pLSA – Probabilistic Latent Semantic Analysis • We assign a distribution over topics to each document • Topics still have a distribution over words • The distributions found are interpretable

  24. Relation to clustering… • A document can belong to multiple clusters • We’re interested in the distribution of topics rather than pushing each doc into a cluster → more flexible

  25. Generative story of pLSA • How are documents generated? • Generate the document’s topic distribution • For each word position, draw a topic from the document’s topic distribution • Draw a word according to that topic’s distribution [Diagram: TOPIC distribution → topic … topic → word … word]

  26. Example [Figure: two topics and two documents. TOPIC 1 puts high probability on money, loan, bank; TOPIC 2 on river, stream, bank. DOCUMENT 1 mixes the topics with weights .8 and .2, DOCUMENT 2 with weights .3 and .7; each word token in the documents is labelled with the topic (1 or 2) it was drawn from.]

  27. Parameter estimation (model fitting, training) [Figure: the same two documents, but now the topic label of every word token, the topic–word distributions and the document–topic weights are all unknown („?”) and must be estimated from the text.]

  28. pLSA [Figure: the observable data (term-document counts) is decomposed into topic distributions for documents and term distributions over topics. Slide credit: Josef Sivic]

  29. Latent semantic analysis (LSA)

  30. Generative story of pLSA • How are documents generated? • Generate the document’s topic distribution • For each word position, draw a topic from the document’s topic distribution • Draw a word according to that topic’s distribution [Diagram: TOPIC distribution → topic … topic → word … word]

  31. pLSA model For each document d and each word position: • Draw a topic from the document’s topic distribution • Draw a word according to that topic’s distribution [Diagram: d → z_d1 … z_d4 → w_d1 … w_d4]

  32. pLSA for images (example) [Plate diagram: d → z → w with plates of size N and D; example visual word: “eye”. Sivic et al. ICCV 2005]

  33. pLSA – parameter estimation

  34. pLSA – E-step What is the expected value of the hidden variables (topics z) if the parameter values are fixed?

  35. pLSA – M-step We use the values of the hidden variables p(z|d,w)
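
The update equations on slides 34–35 were shown as figures; the numpy sketch below implements the standard pLSA EM updates (E-step: p(z|d,w) ∝ p(z|d)·p(w|z); M-step: re-estimate p(w|z) and p(z|d) from the counts n(d,w)). The toy count matrix, the number of topics and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dw = np.array([[2, 1, 1, 0, 0],              # toy term-document counts n(d, w)
                 [0, 0, 1, 2, 2],
                 [1, 2, 0, 0, 0]], dtype=float)
D, W = n_dw.shape
K = 2                                           # assumed number of topics
p_z_d = rng.dirichlet(np.ones(K), D)            # p(z | d), shape (D, K)
p_w_z = rng.dirichlet(np.ones(W), K)            # p(w | z), shape (K, W)

for _ in range(100):
    # E-step: p(z | d, w) for every (d, w) pair, shape (D, W, K)
    joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
    p_z_dw = joint / joint.sum(axis=2, keepdims=True)
    # M-step: weight the posteriors by the observed counts and renormalise
    weighted = n_dw[:, :, None] * p_z_dw        # n(d, w) * p(z | d, w)
    p_w_z = weighted.sum(axis=0).T              # new p(w | z), shape (K, W)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = weighted.sum(axis=1)                # new p(z | d), shape (D, K)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

print(p_z_d.round(3))                           # topic distribution of each document
```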

  36. EM algorithm • It can converge to a local optimum • Stopping condition?
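
One common answer to the stopping-condition question, sketched here as an assumption rather than the slides' prescription: monitor the data log-likelihood, which EM never decreases, and stop when it improves by less than a tolerance or after a fixed number of iterations.

```python
import numpy as np

def plsa_log_likelihood(n_dw, p_z_d, p_w_z):
    """sum_{d,w} n(d,w) * log p(w|d), where p(w|d) = sum_z p(z|d) p(w|z)."""
    p_w_given_d = p_z_d @ p_w_z                 # shape (D, W)
    return float(np.sum(n_dw * np.log(p_w_given_d + 1e-12)))

def has_converged(prev_ll, ll, tol=1e-6):
    """Stop once the log-likelihood improvement falls below `tol`."""
    return prev_ll is not None and ll - prev_ll < tol

# Inside the EM loop of the pLSA sketch above, one would call per iteration:
#   ll = plsa_log_likelihood(n_dw, p_z_d, p_w_z)
#   if has_converged(prev_ll, ll):
#       break
#   prev_ll = ll
```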

  37. Approximate inference • The exact E-step in huge networks is infeasible • There are approaches which are fast but perform only approximate inference (= E-step) rather than exact inference • The most popular approximate inference method is sampling: • Draw samples according to the Bayesian network • The average of the samples can be used as the expected values of the hidden variables

  38. Markov Chain Monte Carlo method (MCMC) • MCMC is an approximate inference method • The samples are not independent of each other but are generated one by one, each based on the previous sample (they form a chain) • Gibbs sampling is the most famous MCMC method: • The next sample is generated by fixing all but one variable and drawing a value for the non-fixed one conditioned on the others
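
A minimal Gibbs-sampling sketch for a toy joint distribution over two binary variables: each step fixes one variable and redraws the other from its conditional, exactly the scheme described above. The joint table, seed, chain length and burn-in are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
joint = np.array([[0.30, 0.10],      # assumed target p(x, y); rows = x, columns = y
                  [0.15, 0.45]])

x, y = 0, 0                          # arbitrary initial state of the chain
samples = []
for t in range(20000):
    # resample x from p(x | y), keeping y fixed
    p_x = joint[:, y] / joint[:, y].sum()
    x = rng.choice(2, p=p_x)
    # resample y from p(y | x), keeping x fixed
    p_y = joint[x, :] / joint[x, :].sum()
    y = rng.choice(2, p=p_y)
    if t >= 1000:                    # discard burn-in samples
        samples.append((x, y))

counts = np.zeros((2, 2))
for x, y in samples:
    counts[x, y] += 1
print(counts / counts.sum())         # empirical frequencies ≈ the joint table
```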

  39. Outlook

  40. Drawbacks of pLSA • The model has to be re-estimated from scratch if a new document arrives • The number of parameters grows with the number of instances (documents) • d is just an index, so it does not fit well into the generative story

  41. [Timeline figure: 1990, 1999, 2003]

  42. [Figure: 2010]

  43. An even more complex task • Recognise objects inside images without any supervision • After training, the model can be applied to unknown images as well

  44. Summary • Generative (Bayesian) modeling enables us to define a description/model of the world of any complexity (clustering is only one example of it) • The EM algorithm is a general tool for solving parameter estimation problems in which latent variables are incorporated
