Probabilistic Models of Relational Data
Presentation Transcript


  1. Probabilistic Models of Relational Data Daphne Koller, Stanford University. Joint work with: Ben Taskar, Pieter Abbeel, Lise Getoor, Eran Segal, Nir Friedman, Avi Pfeffer, Ming-Fai Wong

  2. Why Relational? • The real world is composed of objects that have properties and are related to each other • Natural language is all about objects and how they relate to each other • “George got an A in Geography 101”

  3. Attribute-Based Worlds Smart students get A’s in easy classes: Smart_Jane & easy_CS101 → GetA_Jane_CS101; Smart_Mike & easy_Geo101 → GetA_Mike_Geo101; Smart_Jane & easy_Geo101 → GetA_Jane_Geo101; Smart_Rick & easy_CS221 → GetA_Rick_CS221 • World = assignment of values to attributes / truth values to propositional symbols

  4. Object-Relational Worlds ∀x,y (Smart(x) & Easy(y) & Take(x,y) → Grade(A,x,y)) • World = relational interpretation: • Objects in the domain • Properties of these objects • Relations (links) between objects

  5. Why Probabilities? • All universals are false: “Smart students get A’s in easy classes” • True universals are rarely useful: “Smart students get either A, B, C, D, or F” (almost) • “The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful … Therefore the true logic for this world is the calculus of probabilities …” James Clerk Maxwell

  6. Probable Worlds • Probabilistic semantics: • A set of possible worlds • Each world associated with a probability • Example worlds over (course difficulty, student intelligence, grade): (hard, smart, A), (hard, weak, A), (easy, smart, A), (easy, weak, A), (hard, smart, B), (hard, weak, B), (easy, smart, B), (easy, weak, B), (hard, smart, C), (hard, weak, C), (easy, smart, C), (easy, weak, C)

  7. Representation: Design Axes • Two axes: world state (attributes, objects, sequences) × epistemic state (categorical, probabilistic) • Categorical: propositional logic & CSPs (attributes); first-order logic & relational databases (objects); automata & grammars (sequences) • Probabilistic: Bayesian nets & Markov nets (attributes); n-gram models, HMMs & prob. CFGs (sequences)

  8. Outline • Bayesian Networks • Representation & Semantics • Reasoning • Probabilistic Relational Models • Collective Classification • Undirected discriminative models • Collective Classification Revisited • PRMs for NLP

  9. Bayesian Networks • nodes = variables, edges = direct influence • Example network: Difficulty and Intelligence are parents of Grade; Intelligence is a parent of SAT; Grade is a parent of Letter; the CPD P(G|D,I) gives a distribution over grades A, B, C for each parent combination • Graph structure encodes independence assumptions: Letter conditionally independent of Intelligence given Grade

  10. BN Semantics • conditional independencies in BN structure + local probability models = full joint distribution over domain • Compact & natural representation: • nodes have ≤ k parents ⇒ 2^k·n vs. 2^n params • parameters natural and easy to elicit
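Written out for the five-variable network on this slide (D = Difficulty, I = Intelligence, G = Grade, S = SAT, L = Letter), the "+ =" equation is just the chain rule restricted to each node's parents:

```latex
P(D, I, G, S, L) \;=\; P(D)\, P(I)\, P(G \mid D, I)\, P(S \mid I)\, P(L \mid G)
```

For instance, with binary D, I, S, L and a three-valued G, this factorization needs 1 + 1 + 8 + 2 + 3 = 15 free parameters, versus 2·2·3·2·2 − 1 = 47 for an unstructured joint.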

  11. Reasoning using BNs • “Probability theory is nothing but common sense reduced to calculation.” Pierre Simon Laplace • Full joint distribution specifies answer to any query: P(variable | evidence about others)

  12. BN Inference • BN inference is NP-hard • Inference can exploit graph structure: • Graph separation ⇒ conditional independence • Do separate inference in parts • Results combined over interface • Complexity: exponential in largest separator • Structured BNs allow effective inference • Exact inference in dense BNs is intractable (a brute-force baseline is sketched below)
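For scale, here is the brute-force baseline that separator-based methods improve on: enumerate the full joint of the five-variable student network and sum out everything but the query variable. A minimal sketch; all CPD numbers are invented for illustration.

```python
import itertools

# Toy CPDs for the Difficulty/Intelligence/Grade/SAT/Letter network.
# All numbers are illustrative, not from the talk.
p_D = {'easy': 0.6, 'hard': 0.4}
p_I = {'weak': 0.7, 'smart': 0.3}
p_G = {('easy', 'weak'):  {'A': 0.30, 'B': 0.40, 'C': 0.30},  # P(G | D, I)
       ('easy', 'smart'): {'A': 0.90, 'B': 0.08, 'C': 0.02},
       ('hard', 'weak'):  {'A': 0.05, 'B': 0.25, 'C': 0.70},
       ('hard', 'smart'): {'A': 0.50, 'B': 0.30, 'C': 0.20}}
p_S = {'weak':  {'low': 0.95, 'high': 0.05},                  # P(S | I)
       'smart': {'low': 0.20, 'high': 0.80}}
p_L = {'A': {'strong': 0.90, 'weak': 0.10},                   # P(L | G)
       'B': {'strong': 0.60, 'weak': 0.40},
       'C': {'strong': 0.01, 'weak': 0.99}}

def joint(d, i, g, s, l):
    """Chain rule over the BN structure."""
    return p_D[d] * p_I[i] * p_G[(d, i)][g] * p_S[i][s] * p_L[g][l]

def p_intelligence_given_letter(l):
    """P(I | L = l), summing out D, G, S from the full joint."""
    scores = {i: sum(joint(d, i, g, s, l)
                     for d, g, s in itertools.product(p_D, 'ABC',
                                                      ('low', 'high')))
              for i in p_I}
    z = sum(scores.values())
    return {i: v / z for i, v in scores.items()}

print(p_intelligence_given_letter('strong'))
```

The cost of this enumeration is exponential in the total number of variables; the separator-based decomposition on the slide is exponential only in the largest separator.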

  13. Approximate BN Inference • Belief propagation is an iterative message passing algorithm for approximate inference in BNs • Each iteration (until “convergence”): • Nodes pass “beliefs” as messages to neighboring nodes • Cons: • Limited theoretical guarantees • Might not converge • Pros: • Linear time per iteration • Works very well in practice, even for dense networks
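The per-iteration update is easy to state concretely. Below is a minimal sketch of synchronous belief propagation, written for a toy three-node loopy pairwise Markov network for brevity (the graph and potentials are invented; the talk's networks are far larger):

```python
import numpy as np

n_states = 2
edges = [(0, 1), (1, 2), (2, 0)]                    # a 3-node loop
unary = {v: np.array([0.6, 0.4]) for v in range(3)} # node potentials
pairwise = np.array([[0.9, 0.1],
                     [0.1, 0.9]])                   # favors agreement

neighbors = {v: set() for v in range(3)}
for i, j in edges:
    neighbors[i].add(j); neighbors[j].add(i)
# messages[(i, j)] = message from node i to node j
messages = {(i, j): np.ones(n_states)
            for i in range(3) for j in neighbors[i]}

for _ in range(50):                                 # iterate until "convergence"
    new = {}
    for (i, j) in messages:
        # product of node potential and incoming messages, except from j
        prod = unary[i].copy()
        for k in neighbors[i] - {j}:
            prod *= messages[(k, i)]
        msg = pairwise.T @ prod                     # sum over node i's states
        new[(i, j)] = msg / msg.sum()               # normalize for stability
    messages = new

# approximate marginals ("beliefs")
for v in range(3):
    b = unary[v].copy()
    for k in neighbors[v]:
        b *= messages[(k, v)]
    print(v, b / b.sum())
```

Each sweep costs time linear in the number of edges, which is what makes the method attractive for the large relational networks later in the talk.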

  14. Outline • Bayesian Networks • Probabilistic Relational Models • Language & Semantics • Web of Influence • Collective Classification • Undirected discriminative models • Collective Classification Revisited • PRMs for NLP

  15. Bayesian Networks: Problem • Bayesian nets use propositional representation • Real world has objects, related to each other • Ground variables like Intell_Jane, Diffic_CS101, Grade_Jane_CS101, Intell_George, Grade_George_CS101, Diffic_Geo101, Grade_George_Geo101 repeat the same Intelligence/Difficulty/Grade pattern, and these “instances” are not independent

  16. Probabilistic Relational Models • Combine advantages of relational logic & BNs: • Natural domain modeling: objects, properties, relations • Generalization over a variety of situations • Compact, natural probability models • Integrate uncertainty with relational model: • Properties of domain entities can depend on properties of related entities • Uncertainty over relational structure of domain

  17. St. Nordaf University (example world) • Profs. Jones and Smith each have a Teaching-ability, and each Teaches a course (Welcome to Geo101, Welcome to CS101) • Each course has a Difficulty • Students George and Jane each have an Intelligence • Each Registered/In-course link carries a Grade and a Satisfaction

  18. Relational Schema • Specifies types of objects in domain, attributes of each type of object & types of relations between objects • Classes and attributes: Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), Registration (Grade, Satisfaction) • Relations: Teach (Professor, Course), Take (Student, Registration), In (Registration, Course)

  19. Probabilistic Relational Models • Universals: probabilistic patterns hold for all objects in class • Locality: represent direct probabilistic dependencies • Links define potential interactions, e.g., Reg.Grade depends on Student.Intelligence and Course.Difficulty [K. & Pfeffer; Poole; Ngo & Haddawy]

  20. PRM Semantics • Instantiated PRM ⇒ BN • variables: attributes of all objects • dependencies: determined by links & PRM • (Figure: grounding over Profs. Jones and Smith, courses Geo101 and CS101, and students George and Jane)
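A minimal sketch of that instantiation step, using the running example's classes (all names are illustrative; a real PRM engine would also attach the shared CPDs to each ground node):

```python
# Unroll a PRM template into ground BN nodes: one variable per attribute
# of each object; parents follow the relational links. Toy sketch only.

students = ['George', 'Jane']
courses = ['Geo101', 'CS101']
registrations = [('George', 'Geo101'), ('George', 'CS101'), ('Jane', 'CS101')]

nodes, parents = [], {}
nodes += [f'Intelligence({s})' for s in students]
nodes += [f'Difficulty({c})' for c in courses]
for s, c in registrations:
    grade = f'Grade({s},{c})'
    nodes.append(grade)
    # template dependency: Reg.Grade <- Student.Intelligence, Course.Difficulty
    parents[grade] = [f'Intelligence({s})', f'Difficulty({c})']

for n in nodes:
    print(n, '<-', parents.get(n, []))
```

Note how the same two template parents generate every Grade node's parent set; this sharing is what lets one PRM generalize over any number of students and courses.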

  21. The Web of Influence • (Figure: observed grades, e.g., an A and a C in Geo101 and CS101, propagate through shared courses and students, shifting beliefs over course difficulty (easy / hard) and student intelligence (low / high))

  22. Outline • Bayesian Networks • Probabilistic Relational Models • Collective Classification & Clustering • Learning models from data • Collective classification of webpages • Undirected discriminative models • Collective Classification Revisited • PRMs for NLP

  23. Learning PRMs • (Figure: a Learner combines a Relational Database (Student, Course, Reg tables) with expert knowledge to produce a PRM) [Friedman, Getoor, K., Pfeffer]

  24. Learning PRMs • Parameter estimation: • Probabilistic model with shared parameters • Grades for all students share same model • Can use standard techniques for max-likelihood or Bayesian parameter estimation • Structure learning: • Define scoring function over structures • Use combinatorial search to find high-scoring structure
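For the parameter-estimation bullet, a minimal sketch of shared-parameter maximum likelihood: every registration row contributes to the one shared CPD P(Grade | Difficulty, Intelligence), so the estimates are just normalized counts. The data below is invented.

```python
from collections import Counter

data = [  # (difficulty, intelligence, grade), one row per registration
    ('easy', 'smart', 'A'), ('easy', 'weak', 'B'),
    ('hard', 'smart', 'B'), ('hard', 'weak', 'C'),
    ('easy', 'smart', 'A'), ('hard', 'weak', 'C'),
]

counts = Counter(data)                                # joint counts
parent_counts = Counter((d, i) for d, i, _ in data)   # parent-config counts

# Max-likelihood CPD: count / parent count, shared across all students
cpd = {(d, i, g): c / parent_counts[(d, i)]
       for (d, i, g), c in counts.items()}
print(cpd[('easy', 'smart', 'A')])  # 1.0 in this toy sample
```

Because all objects of a class share one CPD, even a modest database yields many effective samples per parameter, which is what makes these estimates stable.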

  25. Web ⇒ KB • (Figure: typed entities Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project, connected by Advisor-of, Member, and Project-of links) [Craven et al.]

  26. Web Classification Experiments • WebKB dataset • Four CS department websites • Bag of words on each page • Links between pages • Anchor text for links • Experimental setup • Trained on three universities • Tested on fourth • Repeated for all four combinations

  27. Standard Classification • Naïve Bayes model of a Page: Category is the parent of Word1 … WordN • Categories: faculty, course, project, student, other • Example page words: professor, department, extract, information, computer, science, machine, learning, … • (Chart, error scale 0 to 0.35: words-only model)

  28. Exploiting Links • Add the anchor text of incoming links as evidence: Category is also a parent of LinkWord1 … LinkWordN • Example link text: “working with Tom Mitchell …” • (Chart, error scale 0 to 0.35: words only vs. link words)

  29. Collective Classification • Model From-Page.Category, To-Page.Category, and Link.Exists jointly with each page’s words • Classify all pages collectively, maximizing the joint label probability • Approx. inference: belief propagation • (Chart, error scale 0 to 0.35: words only vs. link words vs. collective) [Getoor, Segal, Taskar, Koller]

  30. Learning w. Missing Data: EM [Dempster et al. 77] • Learn P(Registration.Grade | Course.Difficulty, Student.Intelligence) when Difficulty (easy / hard) and Intelligence (low / high) are unobserved • (Figure: bipartite graph of Students and Courses with observed grades A, B, C)
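A minimal EM sketch under a deliberately simplified assumption: Intelligence is the only hidden variable (no Difficulty), each student has several observed grades, and we alternate posterior inference over the hidden variable with expected-count re-estimation. Everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
grades = ['A', 'B', 'C']
students = [['A', 'A', 'B'], ['C', 'C', 'B'], ['A', 'B', 'A'], ['C', 'B', 'C']]
obs = [[grades.index(g) for g in s] for s in students]

p_I = np.array([0.5, 0.5])               # P(Intelligence): low / high
p_G = rng.dirichlet(np.ones(3), size=2)  # P(Grade | Intelligence), shape 2x3

for _ in range(100):
    # E-step: posterior over each student's hidden intelligence
    post = np.zeros((len(obs), 2))
    for n, gs in enumerate(obs):
        like = p_I * np.prod(p_G[:, gs], axis=1)
        post[n] = like / like.sum()
    # M-step: re-estimate from expected counts
    p_I = post.sum(axis=0) / len(obs)
    new_G = np.zeros_like(p_G)
    for n, gs in enumerate(obs):
        for g in gs:
            new_G[:, g] += post[n]
    p_G = new_G / new_G.sum(axis=1, keepdims=True)

print('P(I):', p_I.round(2))
print('P(G|I):', p_G.round(2))
```

The PRM setting on the slide is the same loop at larger scale: the E-step is probabilistic inference over all hidden attributes, and the M-step re-estimates the shared CPDs from expected counts.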

  31. Discovering Hidden Types • Internet Movie Database: http://www.imdb.com

  32. Discovering Hidden Types • Add a hidden Type attribute to the Actor, Director, and Movie classes; Movie attributes: Rating, Genres, #Votes, Year, MPAA Rating [Taskar, Segal, Koller]

  33. Discovering Hidden Types (learned clusters) • Directors: Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola | Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher • Movies: Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson | Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October • Actors: Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger | Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman …

  34. Outline • Bayesian Networks • Probabilistic Relational Models • Collective Classification & Clustering • Undirected Discriminative Models • Markov Networks • Relational Markov Networks • Collective Classification Revisited • PRMs for NLP

  35. Directed Models: Limitations • Acyclicity constraint limits expressive power: e.g., two objects linked to by a student are probably not both professors • Acyclicity forces modeling of all potential links: network size O(N²), inference is quadratic • Generative training: trained to fit all of data, not to maximize accuracy • Solution: undirected models • Allow arbitrary patterns over sets of objects & links • Influence flows over existing links, exploiting link graph sparsity: network size O(N) • Allow discriminative training: max P(labels | observations) [Lafferty, McCallum, Pereira]

  36. Markov Networks • (Figure: a network over Alice, Betty, Chris, Dave, Eve, with a compatibility potential ψ(A, B, C) over the Alice-Betty-Chris clique) • Graph structure encodes independence assumptions: Chris conditionally independent of Eve given Alice & Dave
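The semantics behind the compatibility potentials, stated as the standard Gibbs distribution (a textbook formulation rather than anything specific to this slide): the joint is a normalized product of clique potentials,

```latex
P(x_1, \ldots, x_n) \;=\; \frac{1}{Z} \prod_{c \,\in\, \mathrm{cliques}} \psi_c(\mathbf{x}_c),
\qquad
Z \;=\; \sum_{\mathbf{x}} \prod_{c} \psi_c(\mathbf{x}_c)
```

Unlike CPDs, the potentials need not be probabilities; the partition function Z absorbs the normalization, which is also why acyclicity is never an issue.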

  37. Relational Markov Networks • Universals: probabilistic patterns hold for all groups of objects • Locality: represent local probabilistic dependencies • Sets of links give us possible interactions, e.g., a template potential over the Grades of two Registrations whose Students are in the same Study Group [Taskar, Abbeel, Koller ‘02]

  38. RMN Semantics • Instantiated RMN ⇒ MN • variables: attributes of all objects • dependencies: determined by links & RMN • (Figure: grounding over students George, Jane, and Jill via the Geo and CS study groups for Geo101 and CS101)

  39. Outline • Bayesian Networks • Probabilistic Relational Models • Collective Classification & Clustering • Undirected Discriminative Models • Collective Classification Revisited • Discriminative training of RMNs • Webpage classification • Link prediction • PRMs for NLP

  40. Learning RMNs • Maximize L = log P(Grades, Intelligence | Difficulty) • Parameter estimation is not closed form • Convex problem ⇒ unique global maximum • (Figure: one template potential ψ(Reg1.Grade, Reg2.Grade) shared across all instantiated grade pairs; Difficulty easy / hard, Intelligence low / high, Grades A, B, C)
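Since the model is log-linear in its potential weights, the gradient for this optimization has the standard form (a textbook identity, included for orientation rather than taken from the slide): empirical feature counts minus expected counts under the current model,

```latex
\frac{\partial L}{\partial w_k}
  \;=\; f_k(\mathbf{y}, \mathbf{x})
  \;-\; \mathbb{E}_{\,\mathbf{y}' \sim P_{\mathbf{w}}(\cdot \,\mid\, \mathbf{x})}\!\big[ f_k(\mathbf{y}', \mathbf{x}) \big]
```

The expectation ranges over joint label assignments, so each gradient step itself requires inference in the instantiated network, typically approximated with the belief propagation described earlier.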

  41. Flat Models • Logistic regression: P(Category | Words), with the page words and link words of each Page as independent features
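A minimal sketch of such a flat classifier: multinomial logistic regression trained by gradient ascent on the conditional log-likelihood, over an invented bag-of-words matrix (a real system would add regularization and sparse features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(20, 50)).astype(float)  # word counts per page
y = rng.integers(0, 5, size=20)                      # faculty/course/... labels
n_classes = 5

W = np.zeros((50, n_classes))
for _ in range(200):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                # softmax P(Category|Words)
    onehot = np.eye(n_classes)[y]
    grad = X.T @ (onehot - P) / len(y)               # empirical - expected
    W += 0.1 * grad                                  # gradient ascent step

print('train accuracy:', ((X @ W).argmax(axis=1) == y).mean())
```

Note the gradient has the same empirical-minus-expected form as the RMN gradient above; the flat model is the special case where each page is predicted independently.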

  42. Exploiting Links • Joint features over From-Page.Category, To-Page.Category, the pages’ words, and the Link’s words • 42.1% relative reduction in error compared to the generative approach

  43. More Complex Structure • (Figure: templates connecting Students, Faculty, and Courses pages via their category variables and word features W1 … Wn)

  44. Collective Classification: Results • 35.4% relative reduction in error compared to the strong flat approach

  45. Scalability • WebKB data set size: 1300 entities, 180K attributes, 5800 links • Network size per school: • Directed model: 200,000 variables, 360,000 edges • Undirected model: 40,000 variables, 44,000 edges • The gap in training time decreases substantially when some training data is unobserved or we want to model hidden variables • (Chart: training and classification times, directed vs. undirected models)

  46. Predicting Relationships • Even more interesting are the relationships between objects • e.g., verbs are almost always relationships • (Figure: predict link types such as Advisor-of and Member between Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project)

  47. Flat Model • Predict each link’s relation type Rel from the From-Page and To-Page words plus the LinkWords • Rel values: NONE, advisor, instructor, TA, member, project-of

  48. Flat Model • (Figure only: the flat model instantiated over the candidate links)

  49. Collective Classification: Links • Add the pages’ Category variables to the relation model, so link types and page categories are predicted jointly

  50. Link Model • (Figure only: the link model instantiated over the candidate links)
