Probabilistic RDF



  1. Probabilistic RDF. Octavian Udrea (1), V.S. Subrahmanian (1), Zoran Majkić (2). (1) University of Maryland, College Park; (2) University “La Sapienza”, Rome, Italy

  2. Motivation
     • Not all information on the Web is easily expressible in “classic” models (i.e., relational)
     • RDF extraction from text
     • STORY is the first, very successful prototype
     • Need to extend RDF with temporal and uncertainty components
     • Goal: build a logical model of RDF with uncertainty and provide query algorithms

  3. The Probabilistic RDF idea
     • An RDF theory is a set of triples (subject, property, value):
         • (USA hasCapital Washington DC)
         • (Washington DC hasPopulation 500,000)
     • Probabilistic RDF extends this model with uncertainty over the set of values:
         • (USA hasCapital {(Washington DC, 0.95), (State of Washington, 0.05)})
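The two kinds of triple on this slide can be pictured as small data structures. This is only a minimal sketch in Python; the class names `Triple` and `PTriple` and the `most_likely` helper are illustrative assumptions and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Triple:
    """A plain RDF triple (subject, property, value)."""
    subject: str
    prop: str
    value: str


@dataclass
class PTriple:
    """A probabilistic triple: one subject and property, with a probability
    attached to each candidate value (hypothetical layout)."""
    subject: str
    prop: str
    values: Dict[str, float]

    def most_likely(self) -> Triple:
        # Pick the candidate value that carries the highest probability.
        best = max(self.values, key=self.values.get)
        return Triple(self.subject, self.prop, best)


# The example from the slide.
capital = PTriple("USA", "hasCapital",
                  {"Washington DC": 0.95, "State of Washington": 0.05})
print(capital.most_likely())
```

Keeping the value distribution in a dictionary makes the later constraints (for instance, that the probabilities sum to at most 1) easy to check.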

  4. Probabilistic RDF example. Extracted based on www.wrongdiagnosis.com

  5. Probabilistic RDF example

  6. Probabilistic RDF example

  7. Probabilistic RDF example

  8. Probabilistic RDF syntax
     • Schema uncertainty:
         • (c subClassOf (C, δ))
         • $\sum_{d \in C} \delta(d) \le 1$
     • Class-instance uncertainty:
         • (x rdf:type (C, δ))
         • $\sum_{d \in C} \delta(d) \le 1$
     • Instance-based uncertainty:
         • (x p (Y, δ))
         • $\sum_{y \in Y} \delta(y) \le 1$
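All three forms of uncertainty share the same constraint: the probabilities assigned by δ over the value set sum to at most 1. A minimal Python sketch of that check follows. The function name `is_valid_delta`, the tolerance parameter, and the extra requirement that each individual probability lies in [0, 1] are assumptions added for illustration.

```python
import math
from typing import Dict


def is_valid_delta(delta: Dict[str, float], tol: float = 1e-9) -> bool:
    """Return True if delta looks like a valid pRDF value distribution."""
    # Assumption: each individual probability must lie in [0, 1].
    if any(p < 0.0 or p > 1.0 for p in delta.values()):
        return False
    # The constraint stated on the slide: the sum is at most 1.
    # A small tolerance absorbs floating-point rounding.
    return math.fsum(delta.values()) <= 1.0 + tol


print(is_valid_delta({"Washington DC": 0.95, "State of Washington": 0.05}))
print(is_valid_delta({"a": 0.7, "b": 0.4}))
```

Note that the sum may be strictly less than 1, which leaves room for values that the theory does not mention.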

  9. Probabilistic RDF syntax
     • Sanity requirements
         • $(c\ \mathrm{subClassOf}\ (C_1, \delta_1)) \wedge (c\ \mathrm{subClassOf}\ (C_2, \delta_2)) \Rightarrow (C_1 = C_2 \wedge \delta_1 = \delta_2) \vee C_1 \cap C_2 = \emptyset$
         • The same applies to the other types of uncertainty
     • Transitive properties
         • Simple inferential capability
         • Examples: associatedWith, controlledBy
     • P-path:
         • A set of triples connected by transitive properties
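The sanity requirement says that two probabilistic triples with the same subject and property must either be identical or range over disjoint value sets. One way this could be checked is sketched below in Python. The tuple layout, the function name `violates_sanity`, and the sample class names are hypothetical.

```python
from typing import Dict, List, Tuple

# (subject, property, {value: probability}); an illustrative layout.
PTriple = Tuple[str, str, Dict[str, float]]


def violates_sanity(theory: List[PTriple]) -> bool:
    """Return True if two triples with the same subject and property
    overlap in their value sets without being identical."""
    seen: Dict[Tuple[str, str], List[Dict[str, float]]] = {}
    for subject, prop, delta in theory:
        earlier = seen.setdefault((subject, prop), [])
        for other in earlier:
            overlap = set(delta) & set(other)
            # Dict equality covers both the value set and the probabilities.
            if overlap and delta != other:
                return True
        earlier.append(delta)
    return False


ok = [("Flu", "subClassOf", {"Disease": 0.9}),
      ("Flu", "subClassOf", {"Infection": 0.8})]
bad = [("Flu", "subClassOf", {"Disease": 0.9}),
       ("Flu", "subClassOf", {"Disease": 0.5, "Infection": 0.3})]
print(violates_sanity(ok), violates_sanity(bad))
```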

  10. Example p-path

  11. P-path semantics and t-norms
     • We cannot generally assume independence between triples on a transitive path
         • Flu, AcuteBronchitis, Pneumonia
     • T-norms are used to express the user’s knowledge of the relationship between triples
         • $\otimes$ is associative and commutative
         • $0 \otimes x = 0$, $1 \otimes x = x$
         • $x \le y,\ z \le w \Rightarrow x \otimes z \le y \otimes w$
     • P-path probability: the t-norm applied to the individual probabilities on the path
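To make the role of the t-norm concrete, the sketch below defines three standard t-norms (product, minimum, and Łukasiewicz) and folds one of them over the probabilities on a path. Only the product t-norm is named in the slides; the other two are well-known examples added for comparison, and the function names are illustrative.

```python
from functools import reduce
from typing import Callable, Iterable

TNorm = Callable[[float, float], float]


def t_product(x: float, y: float) -> float:
    """Product t-norm: appropriate when the triples are independent."""
    return x * y


def t_min(x: float, y: float) -> float:
    """Minimum t-norm: the largest possible t-norm."""
    return min(x, y)


def t_lukasiewicz(x: float, y: float) -> float:
    """Lukasiewicz t-norm: max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)


def path_probability(probs: Iterable[float], tnorm: TNorm) -> float:
    # 1 is the identity of every t-norm, so it is a safe starting value.
    return reduce(tnorm, probs, 1.0)


# Probabilities on the path Flu -> Acute Bronchitis -> Pneumonia.
path = [0.7, 0.65]
for name, tnorm in [("product", t_product), ("min", t_min),
                    ("Lukasiewicz", t_lukasiewicz)]:
    print(name, round(path_probability(path, tnorm), 3))
```

Because a t-norm is associative, folding it pairwise along the path gives the same result regardless of how the path is split.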

  12. Example p-path (Flu, associatedWith, (Pneumonia, 0.455)) w.r.t. the product t-norm

  13. pRDF semantics
     • A world W is a set of simple triples (with no probabilities)
     • An interpretation I associates a probability with each world
     • I satisfies a pRDF theory if:
         • For each (s, p, (V, δ)) and each v in V, $\delta(v) \le \sum_{W \ni (s,p,v)} I(W)$, where the sum ranges over the worlds W that contain (s, p, v)
         • The same applies to paths w.r.t. a given t-norm
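The satisfaction condition for individual triples can be checked directly when the set of worlds is small enough to enumerate. The Python sketch below covers only that triple condition, not the path condition, and its type aliases and function name are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
World = FrozenSet[Triple]
PTriple = Tuple[str, str, Dict[str, float]]


def satisfies(interp: Dict[World, float], theory: List[PTriple],
              tol: float = 1e-9) -> bool:
    """Check delta(v) <= sum of I(W) over the worlds W containing (s, p, v)."""
    for s, p, delta in theory:
        for v, prob in delta.items():
            mass = sum(pr for world, pr in interp.items() if (s, p, v) in world)
            if prob > mass + tol:
                return False
    return True


theory = [("USA", "hasCapital",
           {"Washington DC": 0.95, "State of Washington": 0.05})]
w_dc = frozenset({("USA", "hasCapital", "Washington DC")})
w_state = frozenset({("USA", "hasCapital", "State of Washington")})

print(satisfies({w_dc: 0.95, w_state: 0.05}, theory))  # enough mass on each value
print(satisfies({w_dc: 0.5, w_state: 0.5}, theory))    # too little mass on Washington DC
```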

  14. pRDF semantics
     • A theory is consistent iff it has a satisfying interpretation
     • Every pRDF theory is consistent
     • Entailment: T entails T’ iff every satisfying interpretation of T satisfies T’
     • Closure of a theory: the entire set of triples entailed by the theory
         • Maximal w.r.t. the probability values

  15. pRDF fixpoint semantics
     • The closure operator Δ adds exactly one entailed triple at each step
         • (Flu associatedWith (Acute Bronchitis, 0.7)) and (Acute Bronchitis associatedWith (Pneumonia, 0.65)) yield (Flu associatedWith (Pneumonia, 0.455)) w.r.t. the product t-norm
     • Δ has a fixpoint, which is the closure of the theory
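The fixpoint idea can be illustrated for a single transitive property. The Python sketch below is a simplification, not the authors' operator Δ: it may add several triples per pass rather than exactly one, it keeps the highest probability found for each pair, and it stops when a full pass changes nothing. The edge-dictionary layout and the function name `closure` are assumptions.

```python
from typing import Callable, Dict, Tuple

Edge = Tuple[str, str]  # (subject, value) for one transitive property


def closure(edges: Dict[Edge, float],
            tnorm: Callable[[float, float], float]) -> Dict[Edge, float]:
    """Iterate to a fixpoint, keeping the best probability per (subject, value)."""
    closed = dict(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), p1 in list(closed.items()):
            for (b2, c), p2 in list(closed.items()):
                if b != b2 or a == c:
                    continue  # not a chain, or it would create a self-loop
                p = tnorm(p1, p2)
                if p > closed.get((a, c), 0.0):
                    closed[(a, c)] = p
                    changed = True
    return closed


edges = {("Flu", "AcuteBronchitis"): 0.7,
         ("AcuteBronchitis", "Pneumonia"): 0.65}
result = closure(edges, lambda x, y: x * y)  # product t-norm
print(round(result[("Flu", "Pneumonia")], 3))
```

The loop terminates because a t-norm never returns more than either argument, so going around a cycle cannot raise a stored probability.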

  16. pRDF query processing
     • We will consider only simple queries: a triple with a variable term
         • Example: (? associatedWith Pneumonia 0.4)
         • What is associated with Pneumonia with probability above 0.4?
     • Simple method:
         • Compute the closure
         • Select any triple in the closure that matches the query
         • VERY expensive computationally
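The selection step of the simple method amounts to pattern matching over an already computed closure. The Python sketch below assumes the closure is available as a list of (subject, property, value, probability) tuples and uses `None` for the variable term; both conventions, and the function name `select`, are illustrative. The closure entries are the three triples from the Flu example.

```python
from typing import List, Optional, Tuple

ClosedTriple = Tuple[str, str, str, float]


def select(closure: List[ClosedTriple], s: Optional[str], p: Optional[str],
           v: Optional[str], threshold: float) -> List[ClosedTriple]:
    """Return closure triples that match the pattern (None is a variable)
    and whose probability is above the threshold."""
    return [t for t in closure
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (v is None or t[2] == v)
            and t[3] > threshold]


closed = [("Flu", "associatedWith", "AcuteBronchitis", 0.7),
          ("AcuteBronchitis", "associatedWith", "Pneumonia", 0.65),
          ("Flu", "associatedWith", "Pneumonia", 0.455)]

# The query (? associatedWith Pneumonia 0.4).
answers = select(closed, None, "associatedWith", "Pneumonia", 0.4)
print([t[0] for t in answers])
```

The selection itself is cheap; the cost of the simple method lies in materialising the full closure before any query is asked.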

  17. pRDF query processing
     • Set of algorithms for answering simple queries and conjunctions:
         • pRDF_Subject, pRDF_Property, …, pRDF_conjunction
     • Central idea:
         • Apply Δ only in those directions that yield tuples relevant to the query
         • Cut off path computations when the threshold can no longer be reached
         • min(current_probability, threshold)
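The central idea can be illustrated with a backward search from the query's value that abandons a path as soon as its probability drops to the threshold or below. The Python sketch below is not the pRDF_Subject algorithm from the talk, whose details the slides do not give. It is a generic pruned search under the same principle, and the function name and the extra `Cold` edge are hypothetical.

```python
from typing import Callable, Dict, Tuple

Edge = Tuple[str, str]  # (subject, value) for one transitive property


def subjects_above(edges: Dict[Edge, float], target: str, threshold: float,
                   tnorm: Callable[[float, float], float]) -> Dict[str, float]:
    """Find subjects whose best path to target has probability above threshold."""
    best: Dict[str, float] = {}
    stack = [(target, 1.0)]  # (node, probability of the path from node to target)
    while stack:
        node, prob = stack.pop()
        for (src, dst), p in edges.items():
            if dst != node or src == target:
                continue
            q = tnorm(p, prob)
            if q <= threshold:
                continue  # prune: extending a path can never raise its probability
            if q > best.get(src, 0.0):
                best[src] = q
                stack.append((src, q))
    return best


edges = {("Flu", "AcuteBronchitis"): 0.7,
         ("AcuteBronchitis", "Pneumonia"): 0.65,
         ("Cold", "Flu"): 0.5}  # hypothetical extra edge, pruned below
found = subjects_above(edges, "Pneumonia", 0.4, lambda x, y: x * y)
print(sorted(found))
```

The pruning is sound for any t-norm because the result of a t-norm is never greater than either argument, so a path that has fallen to the threshold cannot recover.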

  18. Experimental results
     • Implementation
         • Java, 1700 LOC
         • Disk-based storage for pRDF theories
     • Synthetically generated datasets
         • According to varying underlying distributions
     • Datasets extracted from Web sources

  19. Experimental questions
     • Does the underlying distribution affect query running time?
     • From a practical point of view, which are the “fastest” types of queries?
     • How does running time vary with the number of atoms in a conjunction?
     • What other theory-dependent factors affect running time?
         • Theory width
         • Number of properties

  20. Query running time (Poisson)

  21. Query running time (Zipf)

  22. Conjunctive queries running time

  23. Dependence on property width

  24. Number of properties

  25. Take-away points
     • RDF syntax with uncertainty
     • Model-theoretic and fixpoint semantics for pRDF
     • Efficient query algorithms for pRDF

  26. The end http://om.umiacs.umd.edu/ Thank you! Questions & comments
