Scaling Textual Inference to the Web

Scaling Textual Inference to the Web • Stefan Schoenmackers, Oren Etzioni, and Daniel S. Weld • Presented by Kristine Monteith • CS 652 - 5/8/09

The Problem • Lots of information on the web, but answers to questions aren’t always stated explicitly • Query: “What vegetables help prevent osteoporosis?” • Not going to find “Kale prevents osteoporosis” • Need to infer this from: • kale is a vegetable • kale contains calcium • calcium helps prevent osteoporosis

Overview • HOLMES Architecture (performs textual inference) • Scaling Inference to the Web • Experimental Results • Related Work

The HOLMES Architecture • Information from Knowledge Bases, e.g. IsHighIn(kale, calcium), Prevents(calcium, osteoporosis) • Inference Rules, e.g. Prevents(X,Z) :- IsHighIn(X,Y) ^ Prevents(Y,Z) • Queries, e.g. query(X) :- IS-A(X,vegetable) ^ Prevents(X,osteoporosis)
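As a rough illustration of how the three inputs fit together, the following Python sketch runs the kale example through a naive forward-chaining loop. It is only a minimal sketch: the tuple encoding of facts and the function names (`apply_prevents_rule`, `forward_chain`, `query_vegetables_preventing`) are assumptions made for this example, and it ignores the probabilistic side of HOLMES entirely.

```python
# Facts are (relation, arg1, arg2) tuples taken from the slide's examples.
facts = {
    ("IS-A", "kale", "vegetable"),
    ("IsHighIn", "kale", "calcium"),
    ("Prevents", "calcium", "osteoporosis"),
}

def apply_prevents_rule(known):
    """Prevents(X,Z) :- IsHighIn(X,Y) ^ Prevents(Y,Z)."""
    derived = set()
    for rel1, x, y in known:
        if rel1 != "IsHighIn":
            continue
        for rel2, y2, z in known:
            if rel2 == "Prevents" and y2 == y:
                derived.add(("Prevents", x, z))
    return derived

def forward_chain(known):
    """Apply the rule repeatedly until no new facts appear."""
    known = set(known)
    while True:
        new = apply_prevents_rule(known) - known
        if not new:
            return known
        known |= new

def query_vegetables_preventing(known, disease):
    """query(X) :- IS-A(X,vegetable) ^ Prevents(X,disease)."""
    vegetables = {x for rel, x, y in known if rel == "IS-A" and y == "vegetable"}
    return sorted(x for x in vegetables if ("Prevents", x, disease) in known)

closure = forward_chain(facts)
answers = query_vegetables_preventing(closure, "osteoporosis")
print(answers)
```

The derived fact Prevents(kale, osteoporosis) is never stated in the input; it only exists after the rule has fired, which is the point of the slide's example.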

Partial proof tree (DAG) for the query “What vegetables help prevent osteoporosis?”

Incremental Expansion • Exact probabilistic inference is NP-complete • To deal with this, HOLMES • Uses approximate methods (loopy belief propagation) • Relies on focused queries to keep probabilistic inference manageable • Creates networks incrementally (searches for additional proof trees and updates the network if there is more time) • Exploits standard Datalog optimizations (e.g. only expands proofs of recently added nodes)
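The last bullet describes what Datalog systems usually call semi-naive evaluation. The sketch below shows the idea on a transitive Part-Of relation, using the Seattle example from the business queries. The function name `transitive_closure_seminaive` and the pair encoding are assumptions for illustration, not the HOLMES implementation.

```python
def transitive_closure_seminaive(edges):
    """Compute the transitive closure of a binary relation.

    Each round joins only the facts derived in the previous round (the
    "delta") against the base edges, instead of re-joining everything.
    """
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    closure = set(edges)
    delta = set(edges)
    while delta:
        new_facts = set()
        for a, b in delta:
            for c in successors.get(b, ()):
                if (a, c) not in closure:
                    new_facts.add((a, c))
        closure |= new_facts
        delta = new_facts  # only these facts are expanded next round
    return closure

part_of = {("Seattle", "Washington"), ("Washington", "USA")}
closure = transitive_closure_seminaive(part_of)
print(sorted(closure))
```

Because older facts are never re-expanded, the work per round is proportional to the number of newly added facts rather than to the size of the whole fact set.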

Markov Logic Inference Rules 1. Observed relations are likely to be true: • R(X,Y) :- ObservedInCorpus(X, R, Y) 2. Synonym substitution preserves meaning: • RTR(X’,Y) :- RTR(X,Y) ^ Synonym(X, X’) • RTR(X,Y’) :- RTR(X,Y) ^ Synonym(Y, Y’) 3. Generalizations preserve meaning: • RTR(X’,Y) :- RTR(X,Y) ^ IS-A(X, X’) • RTR(X,Y’) :- RTR(X,Y) ^ IS-A(Y, Y’) 4. Transitivity of Part Meronyms: • RTR(X,Y’) :- RTR(X,Y) ^ Part-Of(Y, Y’) where RTR matches ‘* in’ (e.g., ‘born in’).

Scaling Inference to the Web • To scale textual inference to the Web, inference time has to grow linearly with the size of the corpus • Assumptions: • Number of ground assertions |A| grows linearly with the size of the corpus (true for assertions extracted by TextRunner) • Size of every proof tree is bounded by some constant m (seems to be true in practice; could be enforced by terminating the search for proof trees at a certain depth) • Need to show that constructing proof trees takes O(|A|) time

Constructing proof trees in O(|A|) time • Using function-free Horn clauses means that logical inference can be done in polynomial time • Still not good enough to scale to the Web • Need to ensure two more things: • Number of different types of proofs doesn’t grow too quickly (e.g. a fixed number of rules results in a constant number of first-order search trees) • Number of tuples participating in each relation doesn’t grow too quickly

Approximately Pseudo-Functional

Experimental Results • Uses two knowledge bases: • TextRunner (183 million ground assertions from 117 million web pages) • WordNet (159 thousand manually created IS-A, Part-Of, and Synonym assertions) • Twenty queries in three domains • Geography • Business • Nutrition

Geography Queries • “Who was born in one of the following countries?” • Q(X) :- BornIn(X,{country}) • Possible countries: France, Germany, China, Thailand, Kenya, Morocco, Peru, Colombia, Guatemala • Example: • Ground assertion: BornIn(Alberto Fujimori, Lima) • Background knowledge: LocatedIn(Lima, Peru) • New conclusion: BornIn(Alberto Fujimori, Peru)
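The Fujimori example is an instance of the transitivity rule from the Markov Logic Inference Rules slide, with LocatedIn playing the role of Part-Of. A minimal sketch of that single rule follows. The helper name `lift_locations` is hypothetical, and testing the relation name with `endswith("In")` is an assumption that stands in for the '* in' pattern.

```python
def lift_locations(assertions, located_in):
    """Apply R(X,Y') :- R(X,Y) ^ LocatedIn(Y,Y') until nothing new appears."""
    containers = {}
    for place, container in located_in:
        containers.setdefault(place, set()).add(container)

    known = set(assertions)
    frontier = list(assertions)
    while frontier:
        rel, x, y = frontier.pop()
        if not rel.endswith("In"):  # assumption: stands in for the '* in' pattern
            continue
        for bigger in containers.get(y, ()):
            fact = (rel, x, bigger)
            if fact not in known:
                known.add(fact)
                frontier.append(fact)  # newly derived facts may be lifted again
    return known

assertions = {("BornIn", "Alberto Fujimori", "Lima")}
located_in = {("Lima", "Peru")}
result = lift_locations(assertions, located_in)
print(sorted(result))
```

With these inputs the result contains both the original assertion and the new conclusion BornIn(Alberto Fujimori, Peru), which is what lets the query for Peru return an answer that no single page states.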

Business Queries • Which companies are acquiring software companies? • Q(X) :- Acquired(X, Y) ^ Develops(Y, ‘software’) • This query tests HOLMES’s ability to scalably join a large number of assertions from multiple pages. • Which companies are headquartered in the USA? • Q(X) :- HeadquarteredIn(X, ‘USA’) ^ IS-A(X, ‘company’) • Join on HeadquarteredIn and IS-A • Transitive inference: • Seattle is PartOf Washington, which is PartOf the USA • Microsoft IS-A software company, which IS-A company
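The first query is a join of two relations on the shared variable Y. One way to evaluate it in time linear in the number of assertions is a hash join, sketched below. The company names are invented placeholders, not data from the experiments, and the function name `companies_acquiring_software` is hypothetical.

```python
def companies_acquiring_software(acquired, develops):
    """Q(X) :- Acquired(X, Y) ^ Develops(Y, 'software'), as a hash join."""
    # Build side: one pass over Develops, keeping only software developers.
    software_makers = {y for y, product in develops if product == "software"}
    # Probe side: one pass over Acquired, looking each target up in the set.
    return sorted({x for x, y in acquired if y in software_makers})

# Placeholder assertions; in HOLMES these would come from many different pages.
acquired = [
    ("AcmeCorp", "TinySoft"),
    ("BigCo", "FarmCo"),
    ("AcmeCorp", "ByteWorks"),
]
develops = [
    ("TinySoft", "software"),
    ("ByteWorks", "software"),
    ("FarmCo", "tractors"),
]

buyers = companies_acquiring_software(acquired, develops)
print(buyers)
```

Each relation is scanned once and lookups go through a hash set, so the cost grows with the number of assertions rather than with the product of the two relation sizes.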

Nutrition Queries • “What foods prevent disease?” • Q(X, {disease}) :- Prevents(X, {disease}) ^ IS-A(X, {food}) • Possible foods: fruit, vegetable, grain • Possible diseases: anemia, scurvy, or osteoporosis.

Effect of Inference on Recall • Baseline: Number of query answers derived from information explicitly stated in the Knowledge Bases (TextRunner and WordNet) • Inference increases the number of query answers by 102% for the Geography domain, and considerably more for the other two domains

Prevalence of APF Relations • Examined 500 binary relations selected randomly from TextRunner’s assertions • Largest two relations had over 1.25 million unique instances • 52% of the relations had more than 10,000 instances • For each relation, found the smallest value Kmin such that the relation was APF with degree Kmin • 80% of relations were APF with degree less than 496
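The slides do not spell out the APF definition, so the sketch below works under a simplifying assumption: a relation has degree K if every first-argument value maps to at most K distinct second-argument values. That is a strict, exception-free reading; the "approximately" in APF suggests some exceptions are tolerated, which this sketch does not model. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def max_fanout(pairs):
    """Smallest K such that every x maps to at most K distinct y values."""
    values = defaultdict(set)
    for x, y in pairs:
        values[x].add(y)
    return max((len(ys) for ys in values.values()), default=0)

def fraction_with_degree_below(relations, threshold):
    """Share of relations whose degree is strictly below the threshold."""
    degrees = [max_fanout(pairs) for pairs in relations.values()]
    return sum(d < threshold for d in degrees) / len(degrees)

# Toy relations; the real study sampled 500 relations from TextRunner.
relations = {
    "BornIn": [("a", "x"), ("b", "x")],
    "Acquired": [("c", "p"), ("c", "q"), ("c", "r")],
}

degrees = {name: max_fanout(pairs) for name, pairs in relations.items()}
share = fraction_with_degree_below(relations, 3)
print(degrees, share)
```

A statistic like "80% of relations were APF with degree less than 496" would correspond to calling `fraction_with_degree_below(relations, 496)` over the sampled relations, under the assumption stated above.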

Related Work • Van Durme and Schubert (2008) • Use highly expressive representations (e.g. negation, temporal information) • HOLMES is less expressive but more scalable • Open-domain Question-Answering Systems • Attempt to find individual documents or sentences containing the answer • HOLMES can infer from multiple texts, but is not well suited to answering more abstract or open-ended questions • Statistical Relational Learning • Techniques for combining logical and probabilistic inference • HOLMES uses more restrictive inference rules, but again is more scalable

Conclusions 1. We introduce and evaluate the HOLMES system, which leverages KBMC methods in order to scale a class of TI methods to the Web. 2. We define the notion of Approximately Pseudo-Functional (APF) relations and prove that, for APF relations, HOLMES’s inference time increases linearly with the size of the input corpus. We show empirically that APF relations appear to be prevalent in our Web corpus and that HOLMES’s runtime does scale linearly with the size of its input, taking only a few CPU minutes when run over 183 million distinct ground assertions. 3. We present experiments demonstrating that, for a set of queries in the domains of geography, business, and nutrition, HOLMES substantially improves the quality of answers (measured by AuC) relative to a “no inference” baseline.

Questions???
