
Probabilistic Inference in PRISM



  1. Probabilistic Inference in PRISM. Taisuke Sato, Tokyo Institute of Technology

  2. Problem
  • Statistical machine learning is a labor-intensive process: a {modeling → learning → evaluation}* loop of trial and error
  • Pain of deriving and implementing model-specific learning algorithms and model-specific probabilistic inference for each model
  [Figure: Model 1, Model 2, ..., Model n, each requiring its own learning algorithm EM1, EM2, ..., EMn, alongside EM, VB, MCMC, ...]

  3. Our solution
  • Develop a high-level modeling language that offers universal learning and inference methods applicable to every model
  [Figure: Model 1, Model 2, ..., Model n are all written in one modeling language, which connects them to generic EM, VB, MCMC, ...]
  • The user concentrates on modeling; the rest (learning and inference) is taken care of by the system

  4. PRISM (http://sato-www.cs.titech.ac.jp/prism/)
  • Logic-based high-level modeling language
  [Figure: probabilistic models (Bayesian networks, HMMs, PCFGs, new models, ...) are written in PRISM; the PRISM system supplies the learning methods EM/MAP, VT, VB, VB-VT, MCMC]
  • Its generic inference/learning methods subsume standard algorithms such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks

  5. Basic ideas
  • Semantics
    • program = Turing machine + probabilistic choice + Dirichlet prior
    • denotation = a probability measure over possible worlds
  • Propositionalized probability computation (PPC)
    • programs are written at the predicate logic level
    • probability computation is done at the propositional logic level
  • Dynamic programming for PPC
    • proof search generates a directed graph (explanation graph)
    • probabilities are computed from bottom to top in the graph
  • Discriminative use
    • generatively define a model by a PRISM program and discriminatively use it for better prediction performance

  6. ABO blood type program

  values(gene,[a,b,o],[0.5,0.2,0.3]).     % msw(gene,a) is true with prob. 0.5

  btype(X):- gtype(Gf,Gm), pg_table(X,[Gf,Gm]).

  pg_table(X,GT):-
      ( (X=a;X=b), (GT=[X,o];GT=[o,X];GT=[X,X])
      ; X=o,  GT=[o,o]
      ; X=ab, (GT=[a,b];GT=[b,a]) ).

  gtype(Gf,Gm):- msw(gene,Gf), msw(gene,Gm).   % probabilistic primitives simulate gene
                                               % inheritance from father (left) and mother (right)

  [Figure: a father with genotype a/b and a mother with genotype a/o produce children with blood types AB, A, B, ...]
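
  Forward sampling (listed on the summary slide) can be tried directly on this program. A minimal sketch of such a session, assuming the program is saved as blood.psm as in the later sessions and using PRISM's built-in sample/1 (outputs vary from run to run):

  | ?- prism(blood)          % load and compile blood.psm
  | ?- sample(btype(X))      % draw one blood type according to the switch probabilities
  | ?- sample(gtype(F,M))    % or sample a genotype pair directly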

  7. Propositionalized probability computation
  • Explanation graph for btype(a), explaining how btype(a) is proved by the probabilistic choices made by msw atoms:

      btype(a)   <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
      gtype(a,a) <=> msw(gene,a) & msw(gene,a)
      gtype(a,o) <=> msw(gene,a) & msw(gene,o)
      gtype(o,a) <=> msw(gene,o) & msw(gene,a)

  • Sum-product computation of probabilities in a bottom-up manner, using the probabilities assigned to msw atoms, yields P(btype(a)) = 0.55 (spelled out below)
  • The explanation graph is acyclic, so dynamic programming (DP) is possible
  • PPC + DP subsumes forward-backward, belief propagation, and inside-outside computation
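
  Spelled out, the bottom-up sum-product computation over the explanation graph above, with the switch probabilities 0.5, 0.2, 0.3 for a, b, o, is:

  P(gtype(a,a)) = 0.5 × 0.5 = 0.25
  P(gtype(a,o)) = 0.5 × 0.3 = 0.15
  P(gtype(o,a)) = 0.3 × 0.5 = 0.15
  P(btype(a))   = 0.25 + 0.15 + 0.15 = 0.55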

  8. Learning
  • A program defines a joint distribution P(x,y|θ), where x is hidden and y is observed
    • e.g. P(msw(gene,a),..., btype(a),... | θa,θb,θo) where θa+θb+θo = 1
  • Learning θ from observed data y by maximizing
    • P(y|θ) → MLE/MAP
    • P(x*,y|θ), where x* = argmax_x P(x,y|θ) → VT (Viterbi training)
  • From a Bayesian point of view, a program defines the marginal ∫ P(x,y|θ,α) dθ, with a Dirichlet prior over θ (hyperparameters α)
  • We wish to compute
    • the predictive distribution ∫ P(x|y,θ,α) dθ
    • the marginal likelihood P(y|α) = Σx ∫ P(x,y|θ,α) dθ
  • Both need approximation
    • Variational Bayes (VB) → VB, VB-VT
    • MCMC → Metropolis-Hastings
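
  To connect these formulas to the blood type program: the observation y = btype(a) has the hidden explanations x ∈ {gtype(a,a), gtype(a,o), gtype(o,a)} shown in the explanation graph on slide 7, so

  P(btype(a)|θ) = Σx P(x, btype(a)|θ) = θa·θa + θa·θo + θo·θa = θa² + 2·θa·θo

  With θa = 0.5 and θo = 0.3 this is the 0.55 computed earlier; MLE adjusts θa, θb, θo to maximize the product of such probabilities over the observed data, which is what learn/1 does in the sessions below.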

  9. Sample session 1 - explanation graph and probability computation (built-in predicates)

  | ?- prism(blood)
  loading::blood.psm.out
  | ?- show_sw
  Switch gene: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)
  | ?- probf(btype(a))
  btype(a) <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
  gtype(a,a) <=> msw(gene,a) & msw(gene,a)
  gtype(a,o) <=> msw(gene,a) & msw(gene,o)
  gtype(o,a) <=> msw(gene,o) & msw(gene,a)
  | ?- prob(btype(a),P)
  P = 0.55
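
  For comparison, querying another blood type exercises the ab branch of pg_table; the explanation graph and probability below are reconstructed from the program and the 0.5/0.2/0.3 switch values, not copied from an actual session:

  | ?- probf(btype(ab))
  btype(ab) <=> gtype(a,b) v gtype(b,a)
  gtype(a,b) <=> msw(gene,a) & msw(gene,b)
  gtype(b,a) <=> msw(gene,b) & msw(gene,a)
  | ?- prob(btype(ab),P)
  P = 0.2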

  10. Sample session 2 - MLE and Viterbi inference

  | ?- D=[btype(a),btype(a),btype(ab),btype(o)], learn(D)
  Exporting switch information to the EM routine ... done
  #em-iters: 0(4) (Converged: -4.965121886)
  Statistics on learning:
      Graph size: 18
      Number of switches: 1
      Number of switch instances: 3
      Number of iterations: 4
      Final log likelihood: -4.965121886
  | ?- prob(btype(a),P)
  P = 0.598211
  | ?- viterbif(btype(a))
  btype(a) <= gtype(a,a)
  gtype(a,a) <= msw(gene,a) & msw(gene,a)
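
  After learning, the same built-ins from session 1 can be used to inspect the updated model; show_sw and prob appear in the sessions above, while viterbi/2 (probability of the best explanation) is assumed from the PRISM manual rather than shown in the slides:

  | ?- show_sw                 % switch gene now carries the EM-learned parameters
  | ?- prob(btype(ab),P)       % probability of another blood type under the learned parameters
  | ?- viterbi(btype(a),P)     % probability of the most likely explanation (assumed built-in)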

  11. Sample session 3 - Bayesian inference by MCMC

  | ?- D=[btype(a),btype(a),btype(ab),btype(o)],
       marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]),
       marg_exact(D,LogM)
  VFE = -5.54836
  ELM = -5.48608
  LogM = -5.48578
  | ?- D=[btype(a),btype(a),btype(ab),btype(o)],
       predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
       print_graph(E,[lr('<=')])
  btype(a) <= gtype(a,a)
  gtype(a,a) <= msw(gene,a) & msw(gene,a)

  12. Summary
  • PRISM = probabilistic Prolog for statistical machine learning
    • forward sampling
    • exact probability computation
    • parameter learning: MLE/MAP, VT
    • Bayesian inference: VB, VB-VT, MCMC
    • Viterbi inference
    • model score (BIC, Cheeseman-Stutz, VFE)
    • smoothing
  • Current version: 2.1
