

Lifted Message Passing*
Youssef El Massaoudi, Fabian Hadiji, Scott Sanner, Sriraam Natarajan, Babak Ahmadi
The Center for Language and Speech Processing @ JHU, Baltimore, USA, Sep. 7, 2010
(*) Many thanks to many people for making their slides publicly available.




Presentation Transcript


  1. Lifted Message Passing* Youssef El Massaoudi, Fabian Hadiji, Scott Sanner, Sriraam Natarajan, Babak Ahmadi. The Center for Language and Speech Processing @ JHU, Baltimore, USA, Sep. 7, 2010. (*) Many thanks to many people for making their slides publicly available.

  2. Rorschach Test

  3. Etzioni’s Rorschach Test for Computer Scientists

  4. Moore’s Law?

  5. Storage Capacity?

  6. Number of Facebook Users?

  7. Number of Scientific Publications?

  8. Number of Web Pages?

  9. Number of Actions?

  10. Computing 2020: Science in an Exponential World. “The amount of scientific data is doubling every year” [Szalay, Gray; Nature 440, 413-414 (23 March 2006)]. How to deal with millions of images? How to deal with millions of inter-related research papers? How to accumulate general knowledge automatically from the Web? How to deal with billions of shared users’ perceptions stored at massive scale? How do we realize the vision of social search?

  11. Artificial Intelligence in an Exponential World [Fergus et al. 30(11) 2008; Halevy et al., IEEE Intelligent Systems, 24 2009]. Most effort has gone into the modeling part. How much can the data itself help us to solve a problem? Machine Learning = Data + Model; AI = Structured (Data + Model + Reasoning). • The real world is structured in terms of objects and relations. • Relational knowledge can reveal additional correlations between variables of interest. • Abstraction allows one to compactly model general knowledge and to move to complex inference.

  12. http://www.cs.washington.edu/research/textrunner/ [Etzioni et al. ACL08]: TextRunner extracts (Object, Relation, Object) triples, with uncertainty attached. “Programs will consume, combine, and correlate everything in the universe of structured information and help users reason over it.” [S. Parastatidis et al., Communications of the ACM 52(12):33-37]

  13. So, the Real World is Complex and Uncertain: • Information overload • Incomplete information • Contradictory information • Many sources and modalities • Rapid change. How can computer systems handle these?

  14. (First-order) Logic handles Complexity. E.g., the rules of chess (which is a tiny problem): 1 page in first-order logic, ~100,000 pages in propositional logic, ~10^38 pages as an atomic-state model. daughter-of(cecily,john), daughter-of(lily,tom), … Many types of entities, relations between them, arbitrary knowledge. [Timeline diagram: explicit enumeration (atomic, 5th C. B.C.) → logic with true/false (propositional, 19th C.) → first-order/relational]

  15. Probability handles Uncertainty: sensor noise, human error, inconsistencies, unpredictability. [Timeline diagram, continued: probability enters in the 17th-20th C., alongside the progression from explicit enumeration to propositional to first-order/relational logic]

  16. Will Traditional AI Scale? “Scaling up the environment will inevitably overtax the resources of the current AI architecture.” [Same timeline diagram as slides 14-15]

  17. Statistical Relational AI (StarAI*). (*) First StarAI workshop at AAAI10, co-chaired with S. Russell, L. Kaelbling, A. Halevy, S. Natarajan, and L. Mihalkova. Natural domain modeling: objects, properties, relations; compact, natural models; properties of entities can depend on properties of related entities; generalization over a variety of situations. Let's deal with uncertainty, objects, and relations jointly. StarAI … unifies logical and statistical AI (SAT, planning, probability, statistics, learning, logic, graphs, trees, search, CV, robotics, …), … has solid formal foundations, … is of interest to many communities.

  18. StarAI / SRL Key Dimensions • Logical language: first-order logic, Horn clauses, frame systems • Probabilistic language: Bayesian networks, Markov networks, PCFGs • Type of learning: generative / discriminative; structure / parameters; knowledge-rich / knowledge-poor • Type of inference: MAP / marginal; full grounding / partial grounding / lifted

  19. Markov Logic Networks [Richardson, Domingos MLJ 62(1-2):107-136, 2006]. Suppose we have constants alice, bob, and p1. The ground network then contains the atoms co_author(bob,alice), co_author(alice,bob), co_author(alice,alice), co_author(bob,bob), author(p1,alice), author(p1,bob), smart(alice), smart(bob), high_quality(p1), accepted(p1).
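To make the grounding step concrete, here is a minimal Python sketch that enumerates the ground instances of one weighted formula over the slide's constants. The formula author(P,X) ∧ smart(X) ⇒ high_quality(P) and its weight are illustrative assumptions, not the exact model from the talk:

```python
from itertools import product

# Hypothetical weighted formula (template), for illustration only:
#   1.5 : author(P, X) ^ smart(X) => high_quality(P)
# The constants below are the ones from the slide.
people = ["alice", "bob"]
papers = ["p1"]
w = 1.5   # assumed weight

def ground_formula():
    """Enumerate all ground instances of the formula template."""
    for paper, person in product(papers, people):
        yield (w, f"author({paper},{person}) ^ smart({person}) => high_quality({paper})")

# Each ground atom becomes a node of the ground Markov network; each
# ground formula contributes a factor that multiplies a world's weight
# by exp(w) whenever the formula holds in that world.
for weight, clause in ground_formula():
    print(weight, clause)
```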

  20. ILP = Machine Learning + Logic Programming [Muggleton, De Raedt JLP96]. Find a set of general rules, e.g. mutagenic(X) :- atom(X,A,c), charge(X,A,0.82) and mutagenic(X) :- atom(X,A,n), ... Examples E: pos(mutagenic(m1)), neg(mutagenic(m2)), pos(mutagenic(m3)), ... Background Knowledge B: molecule(m1), atom(m1,a11,c), atom(m1,a12,n), bond(m1,a11,a12), charge(m1,a11,0.82), ...; molecule(m2), atom(m2,a21,o), atom(m2,a22,n), bond(m2,a21,a22), charge(m2,a21,0.82), ...

  21. Example ILP Algorithm: FOIL [Quinlan MLJ 5:239-266, 1990]. Start from the most general clause (:- true) and greedily specialize it, guided by some objective function, e.g. the percentage of covered positive examples. Candidate one-literal bodies with their coverages: :- atom(X,A,c) (0.5, 0.7); :- atom(X,A,n) (0.6, 0.3); :- atom(X,A,f) (0.4, 0.6). Refined candidates: :- atom(X,A,c), bond(A,B) (0.8); :- atom(X,A,n), charge(A,0.82) (0.6); … The search yields rules such as mutagenic(X) :- atom(X,A,n), charge(A,0.82) and mutagenic(X) :- atom(X,A,c), bond(A,B), as sketched below.
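As a toy illustration of the greedy refinement loop, here is a FOIL-style sketch in Python. The cover predicate is assumed given, and the plain positives-minus-negatives score is a simplification of FOIL's actual information-gain heuristic:

```python
# Toy FOIL-style greedy clause search. `cover(body, example)` is an
# assumed black box testing whether a clause body covers an example.
def foil_like(positives, negatives, candidate_literals, cover, max_len=3):
    body = []                        # start from the most general clause ":- true"
    while len(body) < max_len:
        def score(lit):
            refined = body + [lit]
            pos = sum(cover(refined, e) for e in positives)
            neg = sum(cover(refined, e) for e in negatives)
            return pos - neg         # keep positives covered, drop negatives
        body.append(max(candidate_literals, key=score))
        if not any(cover(body, e) for e in negatives):
            break                    # clause consistent with negatives: stop
    return body
```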

  22. Probabilistic ILP aka SRL [De Raedt, K ALT04]: traverses the hypothesis space à la ILP, but replaces ILP’s 0-1 covers relation by a “smooth”, probabilistic one in [0,1], e.g. mutagenic(X) :- atom(X,A,n), charge(A,0.82) covers an example with probability 0.882 rather than 0 or 1.

  23. Today we can … • … learn probabilistic relational models automatically from millions of inter-related objects • … generate optimal plans and learn to act optimally in uncertain environments involving millions of objects and relations among them • … perform lifted probabilistic inference, avoiding explicit state enumeration by manipulating first-order state representations directly • … exploit shared factors to speed up message-passing algorithms for relational inference, but also for classical propositional inference such as solving SAT problems

  24. Distributions can naturally be represented as factor graphs. • Each circle denotes a (random) variable, each box denotes a factor (potential). • There is an edge between a circle and a box if the variable is in the domain/scope of the factor. The represented product of factors is unnormalized!
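For concreteness, a bare-bones factor-graph structure in Python; the class names and the toy potential table are my own choices, not from the talk:

```python
from dataclasses import dataclass, field

# Circles are variables, boxes are factors; an edge links a factor to
# every variable in its scope. The product of all factor tables is an
# unnormalized distribution.
@dataclass
class Factor:
    scope: tuple   # variable names, e.g. ("popular", "attends_p1")
    table: dict    # assignment tuple -> potential value

@dataclass
class FactorGraph:
    variables: dict = field(default_factory=dict)   # name -> domain
    factors: list = field(default_factory=list)

    def neighbors(self, var):
        return [f for f in self.factors if var in f.scope]

g = FactorGraph()
g.variables["popular"] = (0, 1)
g.variables["attends_p1"] = (0, 1)
g.factors.append(Factor(("popular", "attends_p1"),
                        {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}))
```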

  25. Factor Graphs from Graphical Models

  26. Variable Elimination: sum out non-query variables one by one. Example: inviting n people to a workshop, with factors φ1(popular, attends(p_i)) and φ2(attends(p_i), series) for every invitee p_i. Summing out attends(p1) gives Σ_attends(p1) φ1(popular, attends(p1)) · φ2(attends(p1), series) = ψ(popular, series), and so on for p2, …, pn. Time is linear in the number of invitees n, but this does not exploit the symmetries encoded in the structure of the model.
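A small sketch of that elimination pattern with made-up potentials; only the one-invitee-at-a-time loop, linear in n, is the point (for large n one would work in log space to avoid overflow):

```python
import itertools

# Toy potentials phi1(popular, attends) and phi2(attends, series);
# the numbers are illustrative, not from the talk.
phi1 = {(p, a): 1.0 + p * a for p in (0, 1) for a in (0, 1)}
phi2 = {(a, s): 1.0 + 0.5 * a * s for a in (0, 1) for s in (0, 1)}

n = 20
psi = {(p, s): 1.0 for p in (0, 1) for s in (0, 1)}
for _ in range(n):   # one elimination per invitee: O(n)
    for p, s in itertools.product((0, 1), repeat=2):
        psi[(p, s)] *= sum(phi1[(p, a)] * phi2[(a, s)] for a in (0, 1))
print(psi)           # unnormalized potential over (popular, series)
```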

  27. Parameterized Factors (Parfactors) [Pfeffer et al. 1999; Poole 2003; de Salvo Braz et al. 2005]: ∀X. φ1(popular, attends(X)) and ∀X. φ2(attends(X), series) stand for all n ground factor pairs at once. So, let’s exploit the symmetries revealed by the relational model. This is called lifted inference!

  28. First-Order Variable Elimination [Poole 2003; de Salvo Braz et al. 2005]: sum out all attends(X) variables at once. Eliminating ∀X from φ1(popular, attends(X)) and φ2(attends(X), series) yields ψ(popular, series)^n. Time is constant in n.
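On the same toy potentials as above, the lifted shortcut is a one-liner: the per-invitee sum is computed once and raised to the n-th power, reproducing the previous sketch's result in time constant in n:

```python
# Lifted elimination on the same toy model: all attends(X) variables sit
# in identical factors, so one elimination, exponentiated, replaces n.
phi1 = {(p, a): 1.0 + p * a for p in (0, 1) for a in (0, 1)}
phi2 = {(a, s): 1.0 + 0.5 * a * s for a in (0, 1) for s in (0, 1)}
n = 20
lifted = {(p, s): sum(phi1[(p, a)] * phi2[(a, s)] for a in (0, 1)) ** n
          for p in (0, 1) for s in (0, 1)}
print(lifted)   # matches the propositional psi up to float rounding
```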

  29. Symmetry Within Factors [Milch, Zettlemoyer, Haims, K, Kaelbling AAAI08]: counting formulas, e.g. φ(overflow, #X[attends(X)]). • Values of a counting formula are histograms counting how many objects X yield each possible value of attends(X). • For n = 50 there are only n+1 histograms, e.g. [50, 0], [49, 1], …, [0, 50]. • Factor size is now 2 × (n+1): linear in n, versus 2 × 2^n for the naïve factor representation.
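A sketch of the counting representation, with an assumed potential that depends on the histogram only through the number k of attendees:

```python
from math import comb

# Counting-formula sketch: instead of a table over all 2**n joint values
# of attends(p1..pn), keep a factor over (overflow, k), where k is the
# histogram entry "number of X with attends(X) = true". The potential
# function is an illustrative assumption.
n = 50

def potential(overflow, k):
    return (1.0 + k) if overflow else 1.0   # overflow likelier with more attendees

counting_factor = {(o, k): potential(o, k)   # 2 * (n + 1) entries,
                   for o in (0, 1)           # not 2 * 2**n
                   for k in range(n + 1)}

# Each histogram [n-k, k] stands for comb(n, k) ground assignments, so
# ground sums are recovered with binomial multiplicities:
Z_overflow = sum(comb(n, k) * potential(1, k) for k in range(n + 1))
```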

  30. Example: Competing Workshops, with parfactors ∀W ∀X. φ(hot(W), attends(X)) over workshops w1, …, wm and people p1, …, pn, plus the start/series factors. We can’t sum out attends(X) without joining all the hot(W) variables. Instead, create a counting formula #W[hot(W)], then sum out attends(X) at the lifted level. Conversion to counting formulas creates new opportunities for lifted elimination.

  31. Results: Competing Workshops. These exact inference approaches are rather complex, so far do not easily scale to realistic domains, and hence have only been applied to rather small artificial problems. What about approximate inference approaches?

  32. How do you spend your spare time? YouTube-like media portals have changed the way users access media content on the Internet. Every day, millions of people visit social media sites such as Flickr, YouTube, and Jumpcut, among others, to share their photos and videos, while others enjoy themselves by searching, watching, commenting, and rating the photos and videos; what your friends like will bear great significance for you.

  33. How do you efficiently broadcast information?

  34. Content Distribution using Stochastic Policies [Bickson et al. WDAS04]: large (distributed) networks, so Bickson et al. propose to use (loopy) belief propagation.

  35. The Sum-Product Algorithm aka Belief Propagation • An iterative process in which neighboring variables “talk” to each other, passing messages such as: “I (variable x3) think that you (variable x2) belong in these states with various likelihoods…”

  36. Loopy Belief Propagation • After enough iterations, this series of conversations is likely to converge to a consensus that determines the marginal probabilities of all the variables. • Sum-Product/BP: (1) update messages until convergence, (2) compute single-node marginals. • Variants exist for solving SAT problems, systems of linear equations, and matching problems, and for arbitrary distributions (based on sampling). There are a lot of shared factors, so use lifted belief propagation.
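For concreteness, here is a minimal loopy sum-product sketch on a small pairwise model with a single loop; the three-node triangle and its potentials are illustrative assumptions:

```python
import numpy as np

# Loopy sum-product on a pairwise model: binary variables, unary
# potentials phi, pairwise potentials psi on edges. Message update:
#   m[i->j](x_j) = sum_{x_i} phi_i(x_i) * psi_ij(x_i, x_j)
#                  * prod_{k in N(i) \ {j}} m[k->i](x_i)
edges = [(0, 1), (1, 2), (2, 0)]                    # a small loop
phi = {i: np.array([1.0, 1.5]) for i in range(3)}
psi = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in edges}

nbrs = {i: [] for i in range(3)}
for i, j in edges:
    nbrs[i].append(j); nbrs[j].append(i)

def pair_pot(i, j):                                 # table indexed [x_i, x_j]
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

msg = {(i, j): np.ones(2) for i in nbrs for j in nbrs[i]}
for _ in range(50):                                 # (1) update messages until convergence
    new = {}
    for (i, j) in msg:
        pre = phi[i] * np.prod([msg[(k, i)] for k in nbrs[i] if k != j], axis=0)
        m = pair_pot(i, j).T @ pre
        new[(i, j)] = m / m.sum()                   # normalize for numerical stability
    msg = new

for i in nbrs:                                      # (2) compute single-node marginals
    b = phi[i] * np.prod([msg[(k, i)] for k in nbrs[i]], axis=0)
    print(i, b / b.sum())
```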

  37. Lifted Belief Propagation [Singla, Domingos AAAI08; K, Ahmadi, Natarajan UAI09] (software: http://www-kd.iai.uni-bonn.de/index.php?page=software_details&id=16). Counting shared (identical) factors can result in great efficiency gains for (loopy) belief propagation. Shared factors appear more often than you think in relevant real-world problems.

  38. Social Network Analysis

  39. Social Network Analysis

  40. Social Network Analysis

  41. Step 1: Compression. http://www-kd.iai.uni-bonn.de/index.php?page=software_details&id=16
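A compact sketch of the compression step as color passing; this simplifies the actual algorithm (initial colors are assumed to come from local potentials and evidence, colors are any mutually comparable labels, and argument-position details are ignored):

```python
# Color passing: variables and factors start with colors encoding their
# local potentials/evidence; colors are then repeatedly refined by the
# colors of their neighbors. Nodes that end up with identical colors
# send and receive identical BP messages, so they can be merged into
# supernodes of the compressed graph.
def color_pass(var_color, fac_scope, fac_color, iters=10):
    for _ in range(iters):
        # factors absorb the colors of the variables in their scope ...
        new_fac = {f: (fac_color[f], tuple(var_color[v] for v in scope))
                   for f, scope in fac_scope.items()}
        # ... and variables absorb the multiset of their factors' colors
        new_var = {v: (var_color[v],
                       tuple(sorted(new_fac[f] for f, s in fac_scope.items() if v in s)))
                   for v in var_color}
        fac_color, var_color = new_fac, new_var
    return var_color, fac_color   # equal color <=> same supernode
```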

  42. Step 2: Modified Belief Propagation. http://www-kd.iai.uni-bonn.de/index.php?page=software_details&id=16
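The key change in the modified step is that identical messages are counted rather than recomputed: if c identical neighbors send the same ground message m to a supernode, their product collapses to m raised to the power c. A hypothetical helper illustrating just that collapse:

```python
import numpy as np

def supernode_belief(local_phi, incoming):
    """incoming: list of (message, count) pairs from super-factors."""
    b = np.asarray(local_phi, dtype=float)
    for m, c in incoming:
        b = b * np.asarray(m, dtype=float) ** c   # c identical edges at once
    return b / b.sum()

# e.g. 100 identical friends, each sending the same message [0.6, 0.4]:
print(supernode_belief([1.0, 1.0], [([0.6, 0.4], 100)]))
```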

  43. Lifted Factored Frontier: 20 people over 10 time steps. Max number of friends: 5. Cancer never observed. Time step randomly selected.

  44. Lower Bound on Model Counts of CNF • BPCount [Kroc et al. 08] • BP used to estimate marginals • Provable bound

  45. Model Counting Satisfied by Lifted Message Passing?

  46. Lifted Satisfiability [Hadiji, K, Ahmadi StarAI10] • Warning and survey propagation can also be lifted • Enables lifted treatment of both probabilistic and deterministic knowledge

  47. Lifted Linear Equations [O. Shental et al. ISIT-08], Lifted Matching [Bayati et al., Trans. on Information Theory 54(3):1241-1251, 2008; Bayati et al. ICDM09], and ongoing work. • NetAlignBP, Min-Sum, … can also be lifted • Gaussian Belief Propagation can also be lifted • Enables lifted PageRank, HITS, Kalman filters, … (Image due to David Gleich, Stanford.)
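For the linear-equations case, here is a small Gaussian BP sketch for solving A x = b in the spirit of [Shental et al. ISIT-08]; the example system, and the diagonal dominance that GaBP's convergence relies on, are assumptions:

```python
import numpy as np

# Gaussian BP for A x = b, with A symmetric (and here diagonally
# dominant, which suffices for convergence). Each directed edge carries
# a precision message P and a mean message mu.
def gabp(A, b, iters=100):
    n = len(b)
    N = {i: [j for j in range(n) if j != i and A[i, j] != 0] for i in range(n)}
    P = {(i, j): 0.0 for i in range(n) for j in N[i]}    # precision messages
    mu = {(i, j): 0.0 for i in range(n) for j in N[i]}   # mean messages
    for _ in range(iters):
        for i in range(n):
            for j in N[i]:
                # node i's belief, excluding what j itself contributed
                prec = A[i, i] + sum(P[(k, i)] for k in N[i] if k != j)
                mean_num = b[i] + sum(P[(k, i)] * mu[(k, i)] for k in N[i] if k != j)
                P[(i, j)] = -A[i, j] ** 2 / prec
                mu[(i, j)] = mean_num / A[i, j]
    # marginal means = solution of the linear system
    return np.array([(b[i] + sum(P[(k, i)] * mu[(k, i)] for k in N[i]))
                     / (A[i, i] + sum(P[(k, i)] for k in N[i])) for i in range(n)])

A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(gabp(A, b), np.linalg.solve(A, b))   # the two should agree
```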

  48. Content Distribution (Gnutella): Lifted BP vs. BP

  49. Message Errors to the Rescue! Ihler et al. 05: BP message errors decay along paths. LBP may spuriously assume some nodes send and receive different messages and, hence, produce a pessimistic lifted network. So make use of decaying message errors already at lifting time.

  50. Informed Lifted Belief Propagation [El Massaoudi, K, Ahmadi, Hadiji AAAI10]: iterate between refining the lifting and running modified BP.
