
What’s Wrong With current Semantic Web Reasoning (and how to fix it)


Presentation Transcript


  1. What’s Wrong With current Semantic Web Reasoning (and how to fix it)

  2. This talk (and this workshop) • Current state of Web Reasoning? • What's wrong with it? • What are we going to do about it? • LarKC: one large new effort to do something about it

  3. What’s wrong with current SemWeb reasoning methods ?

  4. Characteristics of current Semantic Web reasoning • centralised, algorithmic, boolean, deterministic • examples of current attempts at scalability: • identify subsets of OWL • OWL DL • OWL DLP • OWL Horst • identify alternative semantics for OWL • e.g. LP-style semantics • scalability by muscle-power

  5. Scalability by muscle power

  6. Moving in the right direction: New BigOWLIM (OntoText, Sirma) • 4 switchable inference modes (owl-max, owl-horst, rdfs, optimised rdfs, none) • custom rules for definable semantics • < 100 ms query performance on a billion triples (but 34 hrs upload) • http://www.ontotext.com/owlim/OWLIMPres.pdf

  7. Why we need “something different” Gartner (May 2007, G00148725): "By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies” • Semantic Technologies at Web Scale? • 20% of 30 billion pages @ 1000 triples per page = 6 trillion triples • 30 billion and 1000 are underestimates; imagine 6 years from now… • data-integration and semantic search at web-scale? • Inference will have to become distributed, heuristic, approximate, probabilistic — not centralised, algorithmic, boolean, deterministic

  8. Why we need “something different” • Problem: pharmaceutical R&D in early clinical development is stagnating • FDA white paper Innovation or Stagnation (March 2004): “developers have no choice but to use the tools of the last century to assess this century's candidate solutions.” “industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs” • Example queries: • Q1 (Chemistry): “Show me all liver toxicity associated with compounds with similar structure” • Q2 (Genetics): “Show me all liver toxicity associated with the target or the pathway” • Q3 (Literature): “Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population” • Combined (Q1 + Q2 + Q3): “Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.” • Current NCBI: linking but no inference

  9. Why we need “something different” • Our cities face many challenges: Is public transportation where the people are? Which landmarks attract more people? Where are people concentrating? Where is traffic moving? • Urban Computing is the ICT way to address them

  10. What’s wrong with current Semantic Web Reasoning • Properties of current inference techniques: Based on logic as guiding paradigm: • Exact • Abrupt • Expensive

  11. Current inference is exact • “yes” or “no” • not: “almost”, “not by a long shot”, “yes, except for a few”, etc. (think of subClassOf) • This was OK, as long as ontologies were clean: • hand-crafted • well-designed • carefully populated • well maintained • etc.

  12. Current inference is exact • But current ontologies are sloppy (and will be increasingly so): • made by non-experts • made by machines: • scraping from file-hierarchies, mail-folders, todo-lists & phone-books on PDAs • machine learning from examples

  13. Sloppy ontologies need sloppy inference

  14. Sloppy ontologies need sloppy inference “almost subClassOf”

  15. Combined ontologies need sloppy inference • Mapping ontologies is almost always messy: post-doc ≈ young-researcher is only “almost equal”
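The “almost subClassOf” and “almost equal” relations of the last two slides can be made concrete with a simple instance-overlap measure. A minimal sketch — the function name, the threshold, and the purely extensional definition are illustrative assumptions, not any SemWeb standard:

```python
def almost_subclass_of(instances_a, instances_b, threshold=0.9):
    """A is 'almost subClassOf' B when at least `threshold` of A's
    known instances are also instances of B. (Illustrative sketch.)"""
    if not instances_a:
        return True  # vacuously true, as in classical subsumption
    overlap = len(instances_a & instances_b) / len(instances_a)
    return overlap >= threshold

# Hypothetical mapping example: 3 of 4 known post-docs are young researchers.
postdocs = {"alice", "bob", "carol", "dave"}
young_researchers = {"alice", "bob", "carol", "eve"}
almost_subclass_of(postdocs, young_researchers, threshold=0.7)  # True (0.75)
almost_subclass_of(postdocs, young_researchers, threshold=0.9)  # False
```

“Almost equal” would then be the check in both directions.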

  16. Properties of current inference techniques Based on logic as guiding paradigm: • Exact → approximate • Abrupt • Expensive

  17. Current inference is abrupt • nothing ……………….. yes! • we want gradual answers: • anytime computation • agent can decide how good is good enough (human or machine) • deadline computation • pay for quality • load balancing
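Anytime and deadline computation can be sketched as a reasoner that keeps refining its answer set until a time budget runs out, so the caller — human or machine — decides how good is good enough. A hypothetical sketch; the candidate ordering, scoring function, and 0.5 cutoff are assumptions:

```python
import time

def anytime_query(candidates, score, budget_s):
    """Anytime sketch: accumulate acceptable answers until the deadline,
    then return whatever has been found so far (a gradual answer)."""
    deadline = time.monotonic() + budget_s
    best = []
    for c in candidates:  # e.g. ordered by a cheap relevance heuristic
        if time.monotonic() > deadline:
            break         # deadline reached: the partial answer is returned
        if score(c) > 0.5:
            best.append(c)
    return best

# With a generous budget, all candidates get scored.
anytime_query([1, 2, 3], lambda c: c / 3, budget_s=1.0)  # [2, 3]
```

A smaller budget simply returns a shorter, cheaper prefix of the same answer — the “pay for quality” idea.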

  18. Current inference is expensive • approximate answers are cheap • gradual answers are arbitrarily cheap (WYPIWYG: what you pay is what you get)

  19. Properties of current inference techniques Based on logic as guiding paradigm: • Exact → approximate • Abrupt → gradual • Expensive → cheap

  20. What’s wrong with current Semantic Web Reasoning • obsession with worst-case asymptotic complexity

  21. Who cares about decidability? • Decidability ≈ completeness: a guarantee to find an answer, or to tell you it doesn’t exist, given enough run-time & memory • Sources of incompleteness: • incompleteness of the input data • insufficient run-time to wait for the answer • Completeness is unachievable in practice anyway, regardless of the completeness of the algorithm

  22. Who cares about undecidability? • Undecidability ≠ always guaranteed not to find an answer • Undecidability = not always guaranteed to find an answer • Undecidability may be harmless in many cases; in all cases that matter

  23. Who cares about complexity? • worst-case: may be exponentially rare • asymptotic • ignores constants

  24. What to do instead? • No good framework for “average case” complexity • 2nd best: do more experimental performance profiles with realistic data

  25. What’s wrong with current Semantic Web Reasoning • obsession with worst-case asymptotic complexity • not even a good framework for "average" complexity • obsession with recall & precision

  26. Need for approximation • Trade-off recall for precision or vice versa • security: prefer recall • medicine: prefer precision • Trade-off both for speed • Logician’s nightmare: drop soundness & completeness!

  27. A logician’s nightmare (Dieter Fensel) [chart: precision (soundness) vs. recall (completeness), positioning logic, IR, and the Semantic Web]
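The precision/recall pair the chart plots can be quantified in a few lines: precision corresponds to soundness (every derived answer is correct) and recall to completeness (every correct answer is derived). A sketch with illustrative names:

```python
def precision_recall(retrieved, relevant):
    """Precision = fraction of returned answers that are correct;
    recall = fraction of correct answers that are returned."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 1.0
    recall = tp / len(relevant) if relevant else 1.0
    return precision, recall

# A sound-but-incomplete reasoner: everything it returns is right,
# but it finds only half the answers.
precision_recall({"a", "b"}, {"a", "b", "c", "d"})  # (1.0, 0.5)
```

An IR system would typically sit at the other corner: high recall, lower precision.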

  28. What’s wrong with current Semantic Web Reasoning • obsession with worst-case asymptotic complexity • no good framework for "average" complexity • obsession with recall & precision • no good framework for “good enough” • separation of reasoning and search

  29. Integrating Search with Reasoning • Search retrieves axioms (a hasType b, b subClassOf c) • Reasoning derives the conclusion
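The slide’s example can be sketched as a tiny forward chainer: search supplies the axioms, and one rule (type propagation through subClassOf) derives the conclusion. Everything here is illustrative — a real system would use an RDF store and the RDFS/OWL entailment rules:

```python
def infer_types(triples):
    """From (a, hasType, b) and (b, subClassOf, c), derive (a, hasType, c).
    Iterates to a fixpoint, so subClassOf chains propagate too."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p1, y) in list(facts):
            for (y2, p2, z) in list(facts):
                if p1 == "hasType" and p2 == "subClassOf" and y == y2:
                    new = (x, "hasType", z)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

kb = {("a", "hasType", "b"), ("b", "subClassOf", "c")}
infer_types(kb)  # now also contains ("a", "hasType", "c")
```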

  30. Summary of analysis • Based on logic, which is strict, abrupt, expensive • Obsession with complexity • Obsession with traditional soundness/completeness & recall/precision • No recognition that different use-cases need different performance trade-offs

  31. The Large Knowledge Collider

  32. Goals of LarKC  • Scaling to infinity • by giving up soundness & completeness • by switching between reasoning and search • Reasoning pipeline • by plugin architecture • Large computing platform • by cluster computing • by wide-area distribution

  33. Scaling to infinity Possible approaches: • Markov Logic (probability in the logic, judging truth of a formula): • adds a learnable weight to each FOL formula, specifying a probability distribution over Herbrand interpretations (possible worlds) • weighted RDF graphs (probability as a heuristic, judging relevance of a formula): • weighted activation spreading (for selection), • followed by classical inference over the selected subgraph • model sampling (probability in the logic): sampling the space of all truth assignments, driven by probability of model • and others
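The second approach — weighted activation spreading for selection, followed by classical inference over the selected subgraph — can be sketched as follows. The decay factor, threshold, and adjacency-list graph encoding are illustrative assumptions, not the LarKC design:

```python
def spreading_activation(graph, seeds, decay=0.5, threshold=0.1):
    """Spread activation from query seeds over weighted edges; keep
    only nodes whose activation stays above a threshold. A classical
    reasoner would then run on this selected subgraph only."""
    activation = {n: 1.0 for n in seeds}
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        for neighbour, weight in graph.get(node, []):
            a = activation[node] * weight * decay
            if a > activation.get(neighbour, 0.0):
                activation[neighbour] = a
                if a > threshold:
                    frontier.append(neighbour)  # keep spreading
    return {n for n, a in activation.items() if a >= threshold}

# Hypothetical weighted RDF graph: node -> [(neighbour, edge weight), ...]
g = {"q": [("x", 0.9), ("y", 0.2)], "x": [("z", 0.8)]}
spreading_activation(g, {"q"})  # selects q, x and z; y barely makes the cut
```

Selection is thus heuristic and lossy — which is exactly the trade of completeness for scale argued above.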

  34. Goals of LarKC  • Scaling to infinity • by giving up soundness & completeness • by switching between reasoning and search • Reasoning pipeline • by plugin architecture • Large computing platform • by cluster computing • by wide-area distribution

  35. What is the Large Knowledge Collider? Plug-in architecture: • Retrieve • Abstract • Select • Reason • Decide

  36. What is the Large Knowledge Collider • Integrating Reasoning and Search • dynamic, web-scale, and open-world • in a pluggable architecture • Combining consortium competence: • IR, Cognition • ML, Ontologies • Statistics, ML, Cognition, DB • Logic, DB, Probabilistic Inference • Economics, Decision Theory

  37. Goals of LarKC • Scaling to infinity • by giving up soundness & completeness • by switching between reasoning and search • Reasoning pipeline • by plugin architecture • Large computing platform • by cluster computing • by wide-area distribution 

  38. Two parallel implementations • Medium-size tight cluster parallel computing • ≈ O(10²) nodes • fully available • fully reliable • (almost) no bandwidth restrictions • Large-scale wide-area distributed computing • ≈ O(10⁴) nodes • unpredictable, unreliable, very limited bandwidth • Thinking@home (cf. SETI@home, folding@home)

  39. How & when will others get access to the results • Public releases of LarKC platform • Public APIs enabling others to develop plug-ins • Create Early Access Group • Encourage uptake through Challenge Tasks • Encourage participation through Thinking@home • World Health Org. use-case is public domain data • Give access to best practice through contributions to W3C SWBPD, SWEO, HCLS

  40. Who will build this?

  41. Timing • Start in April ’08 • First prototype after 1 year • Limited open access after 2 years • Open access after 2.5 years • Open APIs, competition • First demonstrators after 2.5 years • Run-time 3.5 years

  42. Most important results? • “An implemented configurable platform for large-scale semantic computing, together with a library of plug-ins and APIs enabling development by others, the practicality of which is shown in three demonstrated deployments in medical research, drug development and urban computing using mobile data.” • Open to the community: come and play with us!
