
Probabilistic Inference in PRISM



  1. Probabilistic Inference in PRISM. Taisuke Sato, Tokyo Institute of Technology

  2. Problem
  • Statistical machine learning is a labor-intensive process: a {modeling → learning → evaluation}* loop of trial and error
  • Pain of deriving and implementing model-specific learning algorithms and model-specific probabilistic inference for each model
  [Figure: Model 1, Model 2, ..., Model n, each requiring its own learning algorithm EM1, EM2, ..., EMn, alongside EM, VB, MCMC, ...]

  3. Our solution
  • Develop a high-level modeling language that offers universal learning and inference methods applicable to every model
  [Figure: Model 1, Model 2, ..., Model n are all written in one modeling language, which connects them to generic EM, VB, MCMC, ...]
  • The user concentrates on modeling; the rest (learning and inference) is taken care of by the system

  4. PRISM (http://sato-www.cs.titech.ac.jp/prism/)
  • Logic-based high-level modeling language
  [Figure: probabilistic models (Bayesian networks, HMMs, PCFGs, new models, ...) are written in PRISM; the PRISM system supplies the learning methods EM/MAP, VT, VB, VB-VT, MCMC]
  • Its generic inference/learning methods subsume standard algorithms such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks

  5. Basic ideas
  • Semantics
    • program = Turing machine + probabilistic choice + Dirichlet prior
    • denotation = a probability measure over possible worlds
  • Propositionalized probability computation (PPC)
    • programs are written at the predicate logic level
    • probability computation is done at the propositional logic level
  • Dynamic programming for PPC
    • proof search generates a directed graph (explanation graph)
    • probabilities are computed from bottom to top in the graph
  • Discriminative use
    • generatively define a model by a PRISM program and discriminatively use it for better prediction performance

  6. ABO blood type program

  values(gene,[a,b,o],[0.5,0.2,0.3]).     % msw(gene,a) is true with prob. 0.5

  btype(X):- gtype(Gf,Gm), pg_table(X,[Gf,Gm]).

  pg_table(X,GT):-
      ( (X=a;X=b), (GT=[X,o];GT=[o,X];GT=[X,X])
      ; X=o,  GT=[o,o]
      ; X=ab, (GT=[a,b];GT=[b,a]) ).

  gtype(Gf,Gm):- msw(gene,Gf), msw(gene,Gm).   % probabilistic primitives simulate gene
                                               % inheritance from father (left) and mother (right)

  [Figure: a father with genotype a/b and a mother with genotype a/o produce children with blood types AB, A, B, ...]
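
  Forward sampling (listed on the summary slide) can be tried directly on this program. A minimal sketch of such a session, assuming the program is saved as blood.psm as in the later sessions and using PRISM's built-in sample/1 (outputs vary from run to run):

  | ?- prism(blood)          % load and compile blood.psm
  | ?- sample(btype(X))      % draw one blood type according to the switch probabilities
  | ?- sample(gtype(F,M))    % or sample a genotype pair directly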

  7. Propositionalized probability computation
  • Explanation graph for btype(a), explaining how btype(a) is proved by the probabilistic choices made by msw atoms:

      btype(a)   <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
      gtype(a,a) <=> msw(gene,a) & msw(gene,a)
      gtype(a,o) <=> msw(gene,a) & msw(gene,o)
      gtype(o,a) <=> msw(gene,o) & msw(gene,a)

  • Sum-product computation of probabilities in a bottom-up manner, using the probabilities assigned to msw atoms, yields P(btype(a)) = 0.55 (spelled out below)
  • The explanation graph is acyclic, so dynamic programming (DP) is possible
  • PPC + DP subsumes forward-backward, belief propagation, and inside-outside computation
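
  Spelled out, the bottom-up sum-product computation over the explanation graph above, with the switch probabilities 0.5, 0.2, 0.3 for a, b, o, is:

  P(gtype(a,a)) = 0.5 × 0.5 = 0.25
  P(gtype(a,o)) = 0.5 × 0.3 = 0.15
  P(gtype(o,a)) = 0.3 × 0.5 = 0.15
  P(btype(a))   = 0.25 + 0.15 + 0.15 = 0.55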

  8. Learning
  • A program defines a joint distribution P(x,y|θ), where x is hidden and y is observed
    • e.g. P(msw(gene,a),..., btype(a),... | θa,θb,θo) where θa+θb+θo = 1
  • Learning θ from observed data y by maximizing
    • P(y|θ) → MLE/MAP
    • P(x*,y|θ), where x* = argmax_x P(x,y|θ) → VT (Viterbi training)
  • From a Bayesian point of view, a program defines the marginal ∫ P(x,y|θ,α) dθ, with a Dirichlet prior over θ (hyperparameters α)
  • We wish to compute
    • the predictive distribution ∫ P(x|y,θ,α) dθ
    • the marginal likelihood P(y|α) = Σx ∫ P(x,y|θ,α) dθ
  • Both need approximation
    • Variational Bayes (VB) → VB, VB-VT
    • MCMC → Metropolis-Hastings
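
  To connect these formulas to the blood type program: the observation y = btype(a) has the hidden explanations x ∈ {gtype(a,a), gtype(a,o), gtype(o,a)} shown in the explanation graph on slide 7, so

  P(btype(a)|θ) = Σx P(x, btype(a)|θ) = θa·θa + θa·θo + θo·θa = θa² + 2·θa·θo

  With θa = 0.5 and θo = 0.3 this is the 0.55 computed earlier; MLE adjusts θa, θb, θo to maximize the product of such probabilities over the observed data, which is what learn/1 does in the sessions below.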

  9. Sample session 1 - explanation graph and probability computation (built-in predicates)

  | ?- prism(blood)
  loading::blood.psm.out
  | ?- show_sw
  Switch gene: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)
  | ?- probf(btype(a))
  btype(a) <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
  gtype(a,a) <=> msw(gene,a) & msw(gene,a)
  gtype(a,o) <=> msw(gene,a) & msw(gene,o)
  gtype(o,a) <=> msw(gene,o) & msw(gene,a)
  | ?- prob(btype(a),P)
  P = 0.55
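
  For comparison, querying another blood type exercises the ab branch of pg_table; the explanation graph and probability below are reconstructed from the program and the 0.5/0.2/0.3 switch values, not copied from an actual session:

  | ?- probf(btype(ab))
  btype(ab) <=> gtype(a,b) v gtype(b,a)
  gtype(a,b) <=> msw(gene,a) & msw(gene,b)
  gtype(b,a) <=> msw(gene,b) & msw(gene,a)
  | ?- prob(btype(ab),P)
  P = 0.2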

  10. Sample session 2 - MLE and Viterbi inference

  | ?- D=[btype(a),btype(a),btype(ab),btype(o)], learn(D)
  Exporting switch information to the EM routine ... done
  #em-iters: 0(4) (Converged: -4.965121886)
  Statistics on learning:
      Graph size: 18
      Number of switches: 1
      Number of switch instances: 3
      Number of iterations: 4
      Final log likelihood: -4.965121886
  | ?- prob(btype(a),P)
  P = 0.598211
  | ?- viterbif(btype(a))
  btype(a) <= gtype(a,a)
  gtype(a,a) <= msw(gene,a) & msw(gene,a)
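
  After learning, the same built-ins from session 1 can be used to inspect the updated model; show_sw and prob appear in the sessions above, while viterbi/2 (probability of the best explanation) is assumed from the PRISM manual rather than shown in the slides:

  | ?- show_sw                 % switch gene now carries the EM-learned parameters
  | ?- prob(btype(ab),P)       % probability of another blood type under the learned parameters
  | ?- viterbi(btype(a),P)     % probability of the most likely explanation (assumed built-in)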

  11. Sample session 3 - Bayesian inference by MCMC

  | ?- D=[btype(a),btype(a),btype(ab),btype(o)],
       marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]),
       marg_exact(D,LogM)
  VFE = -5.54836
  ELM = -5.48608
  LogM = -5.48578
  | ?- D=[btype(a),btype(a),btype(ab),btype(o)],
       predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
       print_graph(E,[lr('<=')])
  btype(a) <= gtype(a,a)
  gtype(a,a) <= msw(gene,a) & msw(gene,a)

  12. Summary
  • PRISM = probabilistic Prolog for statistical machine learning
    • forward sampling
    • exact probability computation
    • parameter learning: MLE/MAP, VT
    • Bayesian inference: VB, VB-VT, MCMC
    • Viterbi inference
    • model score (BIC, Cheeseman-Stutz, VFE)
    • smoothing
  • Current version: 2.1
