Probabilistic Models of Relational Data
Presentation Transcript


  1. Probabilistic Models of Relational Data Daphne Koller, Stanford University. Joint work with: Ben Taskar, Pieter Abbeel, Lise Getoor, Eran Segal, Nir Friedman, Avi Pfeffer, Ming-Fai Wong

  2. Why Relational? • The real world is composed of objects that have properties and are related to each other • Natural language is all about objects and how they relate to each other • “George got an A in Geography 101”

  3. Attribute-Based Worlds Smart students get A’s in easy classes: Smart_Jane & easy_CS101 → GetA_Jane_CS101; Smart_Mike & easy_Geo101 → GetA_Mike_Geo101; Smart_Jane & easy_Geo101 → GetA_Jane_Geo101; Smart_Rick & easy_CS221 → GetA_Rick_CS221 • World = assignment of values to attributes / truth values to propositional symbols

  4. Object-Relational Worlds ∀x,y (Smart(x) & Easy(y) & Take(x,y) → Grade(A,x,y)) • World = relational interpretation: • Objects in the domain • Properties of these objects • Relations (links) between objects

  5. Why Probabilities? • All universals are false: “Smart students get A’s in easy classes” • True universals are rarely useful: “Smart students get either A, B, C, D, or F” (almost) • “The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful … Therefore the true logic for this world is the calculus of probabilities …” James Clerk Maxwell

  6. Probable Worlds • Probabilistic semantics: • A set of possible worlds • Each world associated with a probability • Example worlds over (course difficulty, student intelligence, grade): (hard, smart, A), (hard, weak, A), (easy, smart, A), (easy, weak, A), (hard, smart, B), (hard, weak, B), (easy, smart, B), (easy, weak, B), (hard, smart, C), (hard, weak, C), (easy, smart, C), (easy, weak, C)

  7. Representation: Design Axes • Two axes: world state (attributes, objects, sequences) × epistemic state (categorical, probabilistic) • Categorical: propositional logic & CSPs (attributes); first-order logic & relational databases (objects); automata & grammars (sequences) • Probabilistic: Bayesian nets & Markov nets (attributes); n-gram models, HMMs & prob. CFGs (sequences)

  8. Outline • Bayesian Networks • Representation & Semantics • Reasoning • Probabilistic Relational Models • Collective Classification • Undirected discriminative models • Collective Classification Revisited • PRMs for NLP

  9. Bayesian Networks • nodes = variables, edges = direct influence • Example network: Difficulty and Intelligence are parents of Grade; Intelligence is a parent of SAT; Grade is a parent of Letter; the CPD P(G|D,I) gives a distribution over grades A, B, C for each parent combination • Graph structure encodes independence assumptions: Letter conditionally independent of Intelligence given Grade

  10. BN Semantics • conditional independencies in BN structure + local probability models = full joint distribution over domain • Compact & natural representation: • nodes have ≤ k parents ⇒ 2^k·n vs. 2^n params • parameters natural and easy to elicit
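Written out for the five-variable network on this slide (D = Difficulty, I = Intelligence, G = Grade, S = SAT, L = Letter), the "+ =" equation is just the chain rule restricted to each node's parents:

```latex
P(D, I, G, S, L) \;=\; P(D)\, P(I)\, P(G \mid D, I)\, P(S \mid I)\, P(L \mid G)
```

For instance, with binary D, I, S, L and a three-valued G, this factorization needs 1 + 1 + 8 + 2 + 3 = 15 free parameters, versus 2·2·3·2·2 − 1 = 47 for an unstructured joint.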

  11. Reasoning using BNs • “Probability theory is nothing but common sense reduced to calculation.” Pierre Simon Laplace • Full joint distribution specifies answer to any query: P(variable | evidence about others)

  12. BN Inference • BN inference is NP-hard • Inference can exploit graph structure: • Graph separation ⇒ conditional independence • Do separate inference in parts • Results combined over interface • Complexity: exponential in largest separator • Structured BNs allow effective inference • Exact inference in dense BNs is intractable (a brute-force baseline is sketched below)
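For scale, here is the brute-force baseline that separator-based methods improve on: enumerate the full joint of the five-variable student network and sum out everything but the query variable. A minimal sketch; all CPD numbers are invented for illustration.

```python
import itertools

# Toy CPDs for the Difficulty/Intelligence/Grade/SAT/Letter network.
# All numbers are illustrative, not from the talk.
p_D = {'easy': 0.6, 'hard': 0.4}
p_I = {'weak': 0.7, 'smart': 0.3}
p_G = {('easy', 'weak'):  {'A': 0.30, 'B': 0.40, 'C': 0.30},  # P(G | D, I)
       ('easy', 'smart'): {'A': 0.90, 'B': 0.08, 'C': 0.02},
       ('hard', 'weak'):  {'A': 0.05, 'B': 0.25, 'C': 0.70},
       ('hard', 'smart'): {'A': 0.50, 'B': 0.30, 'C': 0.20}}
p_S = {'weak':  {'low': 0.95, 'high': 0.05},                  # P(S | I)
       'smart': {'low': 0.20, 'high': 0.80}}
p_L = {'A': {'strong': 0.90, 'weak': 0.10},                   # P(L | G)
       'B': {'strong': 0.60, 'weak': 0.40},
       'C': {'strong': 0.01, 'weak': 0.99}}

def joint(d, i, g, s, l):
    """Chain rule over the BN structure."""
    return p_D[d] * p_I[i] * p_G[(d, i)][g] * p_S[i][s] * p_L[g][l]

def p_intelligence_given_letter(l):
    """P(I | L = l), summing out D, G, S from the full joint."""
    scores = {i: sum(joint(d, i, g, s, l)
                     for d, g, s in itertools.product(p_D, 'ABC',
                                                      ('low', 'high')))
              for i in p_I}
    z = sum(scores.values())
    return {i: v / z for i, v in scores.items()}

print(p_intelligence_given_letter('strong'))
```

The cost of this enumeration is exponential in the total number of variables; the separator-based decomposition on the slide is exponential only in the largest separator.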

  13. Approximate BN Inference • Belief propagation is an iterative message passing algorithm for approximate inference in BNs • Each iteration (until “convergence”): • Nodes pass “beliefs” as messages to neighboring nodes • Cons: • Limited theoretical guarantees • Might not converge • Pros: • Linear time per iteration • Works very well in practice, even for dense networks
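The per-iteration update is easy to state concretely. Below is a minimal sketch of synchronous belief propagation, written for a toy three-node loopy pairwise Markov network for brevity (the graph and potentials are invented; the talk's networks are far larger):

```python
import numpy as np

n_states = 2
edges = [(0, 1), (1, 2), (2, 0)]                    # a 3-node loop
unary = {v: np.array([0.6, 0.4]) for v in range(3)} # node potentials
pairwise = np.array([[0.9, 0.1],
                     [0.1, 0.9]])                   # favors agreement

neighbors = {v: set() for v in range(3)}
for i, j in edges:
    neighbors[i].add(j); neighbors[j].add(i)
# messages[(i, j)] = message from node i to node j
messages = {(i, j): np.ones(n_states)
            for i in range(3) for j in neighbors[i]}

for _ in range(50):                                 # iterate until "convergence"
    new = {}
    for (i, j) in messages:
        # product of node potential and incoming messages, except from j
        prod = unary[i].copy()
        for k in neighbors[i] - {j}:
            prod *= messages[(k, i)]
        msg = pairwise.T @ prod                     # sum over node i's states
        new[(i, j)] = msg / msg.sum()               # normalize for stability
    messages = new

# approximate marginals ("beliefs")
for v in range(3):
    b = unary[v].copy()
    for k in neighbors[v]:
        b *= messages[(k, v)]
    print(v, b / b.sum())
```

Each sweep costs time linear in the number of edges, which is what makes the method attractive for the large relational networks later in the talk.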

  14. Outline • Bayesian Networks • Probabilistic Relational Models • Language & Semantics • Web of Influence • Collective Classification • Undirected discriminative models • Collective Classification Revisited • PRMs for NLP

  15. Bayesian Networks: Problem • Bayesian nets use propositional representation • Real world has objects, related to each other • Ground variables like Intell_Jane, Diffic_CS101, Grade_Jane_CS101, Intell_George, Grade_George_CS101, Diffic_Geo101, Grade_George_Geo101 repeat the same Intelligence/Difficulty/Grade pattern, and these “instances” are not independent

  16. Probabilistic Relational Models • Combine advantages of relational logic & BNs: • Natural domain modeling: objects, properties, relations • Generalization over a variety of situations • Compact, natural probability models • Integrate uncertainty with relational model: • Properties of domain entities can depend on properties of related entities • Uncertainty over relational structure of domain

  17. St. Nordaf University (example world) • Profs. Jones and Smith each have a Teaching-ability, and each Teaches a course (Welcome to Geo101, Welcome to CS101) • Each course has a Difficulty • Students George and Jane each have an Intelligence • Each Registered/In-course link carries a Grade and a Satisfaction

  18. Relational Schema • Specifies types of objects in domain, attributes of each type of object & types of relations between objects • Classes and attributes: Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), Registration (Grade, Satisfaction) • Relations: Teach (Professor, Course), Take (Student, Registration), In (Registration, Course)

  19. Probabilistic Relational Models • Universals: probabilistic patterns hold for all objects in class • Locality: represent direct probabilistic dependencies • Links define potential interactions, e.g., Reg.Grade depends on Student.Intelligence and Course.Difficulty [K. & Pfeffer; Poole; Ngo & Haddawy]

  20. PRM Semantics • Instantiated PRM ⇒ BN • variables: attributes of all objects • dependencies: determined by links & PRM • (Figure: grounding over Profs. Jones and Smith, courses Geo101 and CS101, and students George and Jane)
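A minimal sketch of that instantiation step, using the running example's classes (all names are illustrative; a real PRM engine would also attach the shared CPDs to each ground node):

```python
# Unroll a PRM template into ground BN nodes: one variable per attribute
# of each object; parents follow the relational links. Toy sketch only.

students = ['George', 'Jane']
courses = ['Geo101', 'CS101']
registrations = [('George', 'Geo101'), ('George', 'CS101'), ('Jane', 'CS101')]

nodes, parents = [], {}
nodes += [f'Intelligence({s})' for s in students]
nodes += [f'Difficulty({c})' for c in courses]
for s, c in registrations:
    grade = f'Grade({s},{c})'
    nodes.append(grade)
    # template dependency: Reg.Grade <- Student.Intelligence, Course.Difficulty
    parents[grade] = [f'Intelligence({s})', f'Difficulty({c})']

for n in nodes:
    print(n, '<-', parents.get(n, []))
```

Note how the same two template parents generate every Grade node's parent set; this sharing is what lets one PRM generalize over any number of students and courses.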

  21. The Web of Influence • (Figure: observed grades, e.g., an A and a C in Geo101 and CS101, propagate through shared courses and students, shifting beliefs over course difficulty (easy / hard) and student intelligence (low / high))

  22. Outline • Bayesian Networks • Probabilistic Relational Models • Collective Classification & Clustering • Learning models from data • Collective classification of webpages • Undirected discriminative models • Collective Classification Revisited • PRMs for NLP

  23. Learning PRMs • (Figure: a Learner combines a Relational Database (Student, Course, Reg tables) with expert knowledge to produce a PRM) [Friedman, Getoor, K., Pfeffer]

  24. Learning PRMs • Parameter estimation: • Probabilistic model with shared parameters • Grades for all students share same model • Can use standard techniques for max-likelihood or Bayesian parameter estimation • Structure learning: • Define scoring function over structures • Use combinatorial search to find high-scoring structure
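For the parameter-estimation bullet, a minimal sketch of shared-parameter maximum likelihood: every registration row contributes to the one shared CPD P(Grade | Difficulty, Intelligence), so the estimates are just normalized counts. The data below is invented.

```python
from collections import Counter

data = [  # (difficulty, intelligence, grade), one row per registration
    ('easy', 'smart', 'A'), ('easy', 'weak', 'B'),
    ('hard', 'smart', 'B'), ('hard', 'weak', 'C'),
    ('easy', 'smart', 'A'), ('hard', 'weak', 'C'),
]

counts = Counter(data)                                # joint counts
parent_counts = Counter((d, i) for d, i, _ in data)   # parent-config counts

# Max-likelihood CPD: count / parent count, shared across all students
cpd = {(d, i, g): c / parent_counts[(d, i)]
       for (d, i, g), c in counts.items()}
print(cpd[('easy', 'smart', 'A')])  # 1.0 in this toy sample
```

Because all objects of a class share one CPD, even a modest database yields many effective samples per parameter, which is what makes these estimates stable.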

  25. Web ⇒ KB • (Figure: typed entities Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project, connected by Advisor-of, Member, and Project-of links) [Craven et al.]

  26. Web Classification Experiments • WebKB dataset • Four CS department websites • Bag of words on each page • Links between pages • Anchor text for links • Experimental setup • Trained on three universities • Tested on fourth • Repeated for all four combinations

  27. Standard Classification • Naïve Bayes model of a Page: Category is the parent of Word1 … WordN • Categories: faculty, course, project, student, other • Example page words: professor, department, extract, information, computer, science, machine, learning, … • (Chart, error scale 0 to 0.35: words-only model)

  28. Exploiting Links • Add the anchor text of incoming links as evidence: Category is also a parent of LinkWord1 … LinkWordN • Example link text: “working with Tom Mitchell …” • (Chart, error scale 0 to 0.35: words only vs. link words)

  29. Collective Classification • Model From-Page.Category, To-Page.Category, and Link.Exists jointly with each page’s words • Classify all pages collectively, maximizing the joint label probability • Approx. inference: belief propagation • (Chart, error scale 0 to 0.35: words only vs. link words vs. collective) [Getoor, Segal, Taskar, Koller]

  30. Learning w. Missing Data: EM [Dempster et al. 77] • Learn P(Registration.Grade | Course.Difficulty, Student.Intelligence) when Difficulty (easy / hard) and Intelligence (low / high) are unobserved • (Figure: bipartite graph of Students and Courses with observed grades A, B, C)
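A minimal EM sketch under a deliberately simplified assumption: Intelligence is the only hidden variable (no Difficulty), each student has several observed grades, and we alternate posterior inference over the hidden variable with expected-count re-estimation. Everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
grades = ['A', 'B', 'C']
students = [['A', 'A', 'B'], ['C', 'C', 'B'], ['A', 'B', 'A'], ['C', 'B', 'C']]
obs = [[grades.index(g) for g in s] for s in students]

p_I = np.array([0.5, 0.5])               # P(Intelligence): low / high
p_G = rng.dirichlet(np.ones(3), size=2)  # P(Grade | Intelligence), shape 2x3

for _ in range(100):
    # E-step: posterior over each student's hidden intelligence
    post = np.zeros((len(obs), 2))
    for n, gs in enumerate(obs):
        like = p_I * np.prod(p_G[:, gs], axis=1)
        post[n] = like / like.sum()
    # M-step: re-estimate from expected counts
    p_I = post.sum(axis=0) / len(obs)
    new_G = np.zeros_like(p_G)
    for n, gs in enumerate(obs):
        for g in gs:
            new_G[:, g] += post[n]
    p_G = new_G / new_G.sum(axis=1, keepdims=True)

print('P(I):', p_I.round(2))
print('P(G|I):', p_G.round(2))
```

The PRM setting on the slide is the same loop at larger scale: the E-step is probabilistic inference over all hidden attributes, and the M-step re-estimates the shared CPDs from expected counts.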

  31. Discovering Hidden Types • Internet Movie Database: http://www.imdb.com

  32. Discovering Hidden Types • Add a hidden Type attribute to the Actor, Director, and Movie classes; Movie attributes: Rating, Genres, #Votes, Year, MPAA Rating [Taskar, Segal, Koller]

  33. Discovering Hidden Types (learned clusters) • Directors: Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola | Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher • Movies: Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson | Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October • Actors: Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger | Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman …

  34. Outline • Bayesian Networks • Probabilistic Relational Models • Collective Classification & Clustering • Undirected Discriminative Models • Markov Networks • Relational Markov Networks • Collective Classification Revisited • PRMs for NLP

  35. Directed Models: Limitations • Acyclicity constraint limits expressive power: e.g., two objects linked to by a student are probably not both professors • Acyclicity forces modeling of all potential links: network size O(N²), inference is quadratic • Generative training: trained to fit all of data, not to maximize accuracy • Solution: undirected models • Allow arbitrary patterns over sets of objects & links • Influence flows over existing links, exploiting link graph sparsity: network size O(N) • Allow discriminative training: max P(labels | observations) [Lafferty, McCallum, Pereira]

  36. Markov Networks • (Figure: a network over Alice, Betty, Chris, Dave, Eve, with a compatibility potential ψ(A, B, C) over the Alice-Betty-Chris clique) • Graph structure encodes independence assumptions: Chris conditionally independent of Eve given Alice & Dave
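The semantics behind the compatibility potentials, stated as the standard Gibbs distribution (a textbook formulation rather than anything specific to this slide): the joint is a normalized product of clique potentials,

```latex
P(x_1, \ldots, x_n) \;=\; \frac{1}{Z} \prod_{c \,\in\, \mathrm{cliques}} \psi_c(\mathbf{x}_c),
\qquad
Z \;=\; \sum_{\mathbf{x}} \prod_{c} \psi_c(\mathbf{x}_c)
```

Unlike CPDs, the potentials need not be probabilities; the partition function Z absorbs the normalization, which is also why acyclicity is never an issue.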

  37. Relational Markov Networks • Universals: probabilistic patterns hold for all groups of objects • Locality: represent local probabilistic dependencies • Sets of links give us possible interactions, e.g., a template potential over the Grades of two Registrations whose Students are in the same Study Group [Taskar, Abbeel, Koller ‘02]

  38. RMN Semantics • Instantiated RMN ⇒ MN • variables: attributes of all objects • dependencies: determined by links & RMN • (Figure: grounding over students George, Jane, and Jill via the Geo and CS study groups for Geo101 and CS101)

  39. Outline • Bayesian Networks • Probabilistic Relational Models • Collective Classification & Clustering • Undirected Discriminative Models • Collective Classification Revisited • Discriminative training of RMNs • Webpage classification • Link prediction • PRMs for NLP

  40. Learning RMNs • Maximize L = log P(Grades, Intelligence | Difficulty) • Parameter estimation is not closed form • Convex problem ⇒ unique global maximum • (Figure: one template potential ψ(Reg1.Grade, Reg2.Grade) shared across all instantiated grade pairs; Difficulty easy / hard, Intelligence low / high, Grades A, B, C)
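Since the model is log-linear in its potential weights, the gradient for this optimization has the standard form (a textbook identity, included for orientation rather than taken from the slide): empirical feature counts minus expected counts under the current model,

```latex
\frac{\partial L}{\partial w_k}
  \;=\; f_k(\mathbf{y}, \mathbf{x})
  \;-\; \mathbb{E}_{\,\mathbf{y}' \sim P_{\mathbf{w}}(\cdot \,\mid\, \mathbf{x})}\!\big[ f_k(\mathbf{y}', \mathbf{x}) \big]
```

The expectation ranges over joint label assignments, so each gradient step itself requires inference in the instantiated network, typically approximated with the belief propagation described earlier.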

  41. Flat Models • Logistic regression: P(Category | Words), with the page words and link words of each Page as independent features
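A minimal sketch of such a flat classifier: multinomial logistic regression trained by gradient ascent on the conditional log-likelihood, over an invented bag-of-words matrix (a real system would add regularization and sparse features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(20, 50)).astype(float)  # word counts per page
y = rng.integers(0, 5, size=20)                      # faculty/course/... labels
n_classes = 5

W = np.zeros((50, n_classes))
for _ in range(200):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                # softmax P(Category|Words)
    onehot = np.eye(n_classes)[y]
    grad = X.T @ (onehot - P) / len(y)               # empirical - expected
    W += 0.1 * grad                                  # gradient ascent step

print('train accuracy:', ((X @ W).argmax(axis=1) == y).mean())
```

Note the gradient has the same empirical-minus-expected form as the RMN gradient above; the flat model is the special case where each page is predicted independently.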

  42. Exploiting Links • Joint features over From-Page.Category, To-Page.Category, the pages’ words, and the Link’s words • 42.1% relative reduction in error compared to the generative approach

  43. More Complex Structure • (Figure: templates connecting Students, Faculty, and Courses pages via their category variables and word features W1 … Wn)

  44. Collective Classification: Results • 35.4% relative reduction in error compared to the strong flat approach

  45. Scalability • WebKB data set size: 1300 entities, 180K attributes, 5800 links • Network size per school: • Directed model: 200,000 variables, 360,000 edges • Undirected model: 40,000 variables, 44,000 edges • The gap in training time decreases substantially when some training data is unobserved or we want to model hidden variables • (Chart: training and classification times, directed vs. undirected models)

  46. Predicting Relationships • Even more interesting are the relationships between objects • e.g., verbs are almost always relationships • (Figure: predict link types such as Advisor-of and Member between Tom Mitchell (Professor), Sean Slattery (Student), and the WebKB Project)

  47. Flat Model • Predict each link’s relation type Rel from the From-Page and To-Page words plus the LinkWords • Rel values: NONE, advisor, instructor, TA, member, project-of

  48. Flat Model • (Figure only: the flat model instantiated over the candidate links)

  49. Collective Classification: Links • Add the pages’ Category variables to the relation model, so link types and page categories are predicted jointly

  50. Link Model • (Figure only: the link model instantiated over the candidate links)
