
Knowledge Access Semantic technology for KM

ACAI 05 SEKT Summer School on Knowledge Technology. Knowledge Access: Semantic technology for KM. John Davies, BT Research, john.nj.davies@bt.com. Overview: Introduction to the Semantic Web; Language stack; Semantic Search and Browse; Knowledge Sharing.



Presentation Transcript


  1. ACAI 05 SEKT SUMMER SCHOOL ON KNOWLEDGE TECHNOLOGY • Knowledge Access: Semantic technology for KM • John Davies, BT Research, john.nj.davies@bt.com

  2. Overview • Introduction to the Semantic Web • Language stack • Semantic Search and Browse • Knowledge Sharing • Natural Language Generation & Summarisation • Knowledge Delivery via Device Independence • Quiz!

  3. Limitations of the Web today Machine-to-human, not machine-to-machine

  4. The Semantic Web • allowing information to be shared and processed • adding context and structure • Tim Berners-Lee: “an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” • An open platform

  5. Semantic Web “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation.” [Berners-Lee et al., 2001]

  6. Semantic Web history (timeline labels, in the order transcribed) • “W3C Semantic Web standardization: work on Web Ontology Language (OWL)” • “W3C standardization of Semantic Web starts: work on Resource Description Framework (RDF), work on RDF Schema (RDFS)” • “Research projects on Web ontologies start: EU: On-To-Knowledge (01/00) and US (DARPA): DAML (07/00)” • “Web data transfer larger than FTP data transfer” • “Kifer, Lausen, Wu, Logical foundations of object-oriented and frame-based languages” • “A. Borgida, On the relative expressiveness of description logics and predicate logic” • “W3C standardization of XML starts” • ... • 10.2.2004: Resource Description Framework (RDF) and Web Ontology Language (OWL) become W3C recommendations [Source: http://www.zakon.org/robert/internet/timeline/]

  7. Semantic Web Layers Entailment of the Implicit Explicit Semantics Relational Distributed Data Data Exchange

  8. Where we are Today: the Syntactic Web [Hendler & Miller 02]

  9. i.e. the Syntactic Web is… • A place where • computers do the presentation (easy) and • people do the linking and interpreting (hard). • Why not get computers to do more of the hard work? [Goble 03]

  10. Hard Work using the Syntactic Web… • Complex queries involving background knowledge • Find information about “animals that use sonar but are not either bats, dolphins or whales” (e.g. Barn Owl) • Locating information in data repositories • Travel enquiries • Prices of goods and services • Results of human genome experiments • Delegating complex tasks to web “agents” • Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English

  11. Motivation – Knowledge Management Knowledge workers are overwhelmed with information: • from intranets, emails, external newslines … • but may still lack the information required They need information identified: • by semantics, not just keywords • by their interests and their task context • in a form appropriate to their current physical context • mobile phone, PDA, blackberry, laptop, …

  12. Knowledge access • context-aware tools for access to semantically-annotated knowledge • search, browse, share, summarise • integrated into day-to-day business processes • automatic knowledge delivery based on current context • activity, location, device, interests • support multiple end-user devices

  13. XML is a first step • Semantic markup • HTML → layout • use bold font • Insert an image here • XML → content • this part of the document is the product price • this document describes a telecommunications service

  14. XML <play> <title>The Life and Death of King John</title> <DramatisPersonae> <persona>The Earl of PEMBROKE</persona> <persona>The Earl of ESSEX</persona> …… </DramatisPersonae> <Stagedir>SCENE England, the Court.</Stagedir> <act>Act 1 <scene>Scene I. <speech> <speaker>John</speaker> <line>Now, Chatillon, what would France with us?</line> </speech> …

  15. QuizXML • Standard search engine • WWW pages indexed • maps keywords to WWW pages • QuizXML • A finer-grained index • maps keywords to documents and the XML tags in which they occur
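
The finer-grained index described on this slide can be sketched roughly as follows. This is an illustrative sketch only, not the QuizXML implementation: the function name `build_tag_index`, the document id `king_john.xml`, and the simple whitespace tokenisation are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_tag_index(docs):
    """Map each lowercased keyword to the set of (doc_id, tag) pairs where it occurs.

    Hypothetical helper: a plain search engine would map keyword -> doc_id only.
    """
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            # Only the element's own leading text is indexed in this sketch.
            for word in (elem.text or "").split():
                index[word.strip(".,?!").lower()].add((doc_id, elem.tag))
    return index


# A well-formed cut-down version of the play fragment from the XML slide.
docs = {
    "king_john.xml": (
        "<play><title>The Life and Death of King John</title>"
        "<speech><speaker>John</speaker>"
        "<line>Now, Chatillon, what would France with us?</line></speech></play>"
    ),
}

index = build_tag_index(docs)
john_hits = sorted(index["john"])
print(john_hits)
```

With an index of this shape, a query can distinguish "john" as a speaker from "john" in a title, which a document-level keyword index cannot do.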

  16. QuizXML demo

  17. XML is a first step • Metadata (with limitations) • within documents, not across documents • prescriptive, not descriptive • No commitment on vocabulary and modelling primitives (subclass, instance, etc) <vehicle> <car>ford <engine>xyz123-4</engine> <model>mondeo</model> </car> </vehicle> • RDF and ontologies are the next step

  18. What are Ontologies? • Ontologies provide a shared and common understanding of a domain (medicine, finance, …) • a shared specification of a conceptualisation • ‘Concept map’ • A simple example - Yahoo • Business&Economy > Finance > Banking • for WWW, defined using RDF(S) & OWL

  19. Taxonomies (diagram) • Animals • Vertebrates: Reptiles, Mammals • Invertebrates: Insects, Arachnids • …..

  20. Contractor advises funds Ontology of People and their Roles Employee Expert Analyst Manager Programme Mgr Project Mgr

  21. Structure of an Ontology Typically two distinct components: • Names for important concepts and relationships in the domain • Elephant is a concept whose members are a kind of animal • Herbivore is a concept whose members are those animals who eat only plants • Background knowledge/constraints on the domain • Adult_Elephants weigh at least 2,000 kg • No individual can be both a Herbivore and a Carnivore

  22. Why develop an ontology? • Define web resources more precisely and make them amenable to machine processing • Make domain assumptions explicit • Easier to change domain assumptions • Easier to understand and update legacy data • Separate domain and operational knowledge • Re-use separately • A community reference for applications • To share a consistent understanding of what information means

  23. Ontologies - Some Examples • General purpose ontologies: • The Upper Cyc Ontology, http://www.cyc.com/cyc-2-1/index.html • IEEE Standard Upper Ontology, http://suo.ieee.org/ • Domain and application-specific ontologies: • RDF Site Summary RSS, http://groups.yahoo.com/group/rss-dev/files/schema.rdf • Dublin Core, http://dublincore.org/ • UMLS, http://www.nlm.nih.gov/research/umls/ • Open Biological Ontologies: http://obo.sourceforge.net/ • FOAF – www.foaf.org • Ontologies in a wider sense • Agrovoc, http://www.fao.org/agrovoc/ • UNSPSC, http://eccma.org/unspsc/ • DAML.org library http://www.daml.org/

  24. Ontology and Logic • Reasoning over ontologies • Inferencing capabilities • X is author of Y ⇒ Y is written by X • X co-wrote D; Y co-wrote D ⇒ X and Y collaborate • Cars are a kind of vehicle; Vehicles have 2 or more wheels ⇒ Cars have 2 or more wheels
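
The first two inference rules on this slide can be illustrated with a small forward-chaining loop over triples. This is a minimal sketch under stated assumptions: the predicate names (`authorOf`, `writtenBy`, `coWrote`, `collaboratesWith`) and the individuals `alice`, `bob`, and `paper1` are invented for the example and are not taken from any particular ontology or reasoner.

```python
def infer(facts):
    """Naive forward chaining: apply both rules until no new facts appear."""
    facts = set(facts)
    while True:
        new = set()
        # Rule 1: X authorOf Y  =>  Y writtenBy X
        for s, p, o in facts:
            if p == "authorOf":
                new.add((o, "writtenBy", s))
        # Rule 2: X coWrote D and Y coWrote D (X != Y)  =>  X collaboratesWith Y
        for x, p1, d1 in facts:
            for y, p2, d2 in facts:
                if p1 == p2 == "coWrote" and d1 == d2 and x != y:
                    new.add((x, "collaboratesWith", y))
        if new <= facts:  # fixpoint reached: nothing new was derived
            return facts
        facts |= new


facts = {
    ("twain", "authorOf", "tom_sawyer"),
    ("alice", "coWrote", "paper1"),  # hypothetical individuals
    ("bob", "coWrote", "paper1"),
}
closure = infer(facts)
print(sorted(closure - facts))
```

Production reasoners use far more efficient strategies; the loop above only shows what "deriving implicit facts from explicit ones" means.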

  25. RDF and RDF-S • W3C standards • RDF-S defines the ontology • classes and their properties and relationships • There are books and authors. Authors write books. • RDF defines the instances of these classes and their properties • Mark Twain is an author • Mark Twain wrote “Adventures of Tom Sawyer” • “Adventures of Tom Sawyer” is a book

  26. An example RDF Schema: annotation of WWW resources and semantic links (diagram) • Schema (RDFS): hasWritten, with domain Writer and range Book; FamousWriter subClassOf Writer • Data (RDF): /twain.com/mark (type FamousWriter, DoB “25/12/68”) hasWritten books.com/ISBN00010475 (type Book)

  27. RDF hasName (‘http://www.famouswriters.org/twain/mark’, “Mark Twain”) hasWritten (‘http://www.famouswriters.org/twain/mark’, ‘http://www.books.org/ISBN00001047582’) title (‘http://www.books.org/ISBN00001047582’, “The Adventures of Tom Sawyer”) XML version: <rdf:Description rdf:about="http://www.famouswriters.org/twain/mark"> <s:hasName>Mark Twain</s:hasName> <s:hasWritten rdf:resource="http://www.books.org/ISBN00001047582"/> </rdf:Description>

  28. QuizRDF • Searching RDF-annotated web resources

  29. RDF metadata annotations Data (WWW document) Annotation (metadata) Lost information • Subjective • One of several interpretations • Not exhaustive RDF

  30. RDF as an Enrichment Text Annotation RDF Text

  31. Precision and recall - the IR dilemma • Trade-off between precision and recall • recall - the fraction of relevant documents that were found • precision - the fraction of found documents that were relevant • Holy grail: high precision & high recall • QuizRDF offers both • separately • closely-coupled
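
The two measures defined on this slide can be computed as below. This is a small illustrative sketch; the function name `precision_recall` and the example URI sets are assumptions made for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 documents are truly relevant.
retrieved = {"URI1", "URI3", "URI7", "URI9"}
relevant = {"URI1", "URI3", "URI5"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Returning more documents tends to raise recall and lower precision, which is the trade-off the slide refers to.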

  32. Indexing: data model

  33. Multidimensional Indexing • “Traditional” search engine indexing • term → {documents} • “employee” → {URI1, URI3, URI9} • “miller” → {URI3, URI7} • QuizRDF indexing • <literal,class,property> → {URIs} • <“george”, Employee, first_name> → {URI2} • <“miller”, Employee, last_name> → {URI1, URI3} • <“miller”, Employee, > → {URI1, URI3, URI7}
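
One way to picture the <literal,class,property> index is as a dictionary keyed by tuples, with `None` standing for an unspecified property. This is a sketch under assumptions, not the QuizRDF data structure: the record layout, the function name `build_index`, and the property `nickname` attached to URI7 are invented so that the example reproduces the sets shown on the slide.

```python
from collections import defaultdict


def build_index(records):
    """Index (uri, class, property, literal) records by <literal, class, property>."""
    index = defaultdict(set)
    for uri, cls, prop, literal in records:
        key = literal.lower()
        index[(key, cls, prop)].add(uri)
        index[(key, cls, None)].add(uri)  # entry with the property left open
    return index


records = [
    ("URI2", "Employee", "first_name", "George"),
    ("URI1", "Employee", "last_name", "Miller"),
    ("URI3", "Employee", "last_name", "Miller"),
    ("URI7", "Employee", "nickname", "Miller"),  # hypothetical property
]
index = build_index(records)

print(sorted(index[("miller", "Employee", "last_name")]))
print(sorted(index[("miller", "Employee", None)]))
```

Leaving the property open widens the result set, while fixing it narrows the match to one role of the literal within the class.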

  34. QuizRDF demo

  35. Two Retrieval Channels (browser interface) • RDF channel (RQL query): precise, machine readable, subjective, incomplete, higher precision • Text channel (keyword query): original content, “complete”, imprecise, higher recall

  36. Contribution • Combination of • User familiar keyword search • More precise RDF querying • Data and metadata as complementary • Low threshold, high ceiling • Works on non-RDF information • Exploits RDF where it exists • Integrates browsing and querying • Fits users’ info seeking behavior

  37. Conclusions about RDF(S) • Next step up from plain XML: • (small) ontological commitment to modeling primitives • possible to define domain vocabulary • limited reasoning • subsumption, but no transitivity, symmetry, … • limited expressive power • no cardinality constraints, equality, disjointness, …
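
The subsumption reasoning that RDF(S) does support can be sketched as two closure rules: `subClassOf` is transitive, and an instance of a class is also an instance of its superclasses (rules rdfs11 and rdfs9 in the RDF semantics). The sketch below is illustrative only; `Writer` and `FamousWriter` follow the earlier schema slide, while the class `Person` and the identifier `twain` are assumptions added for the example.

```python
def rdfs_closure(triples):
    """Apply subClassOf transitivity and type inheritance until a fixpoint is reached."""
    triples = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in triples if p == "subClassOf"]
        for a, b in sub:
            # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
            for b2, c in sub:
                if b == b2:
                    new.add((a, "subClassOf", c))
            # rdfs9: x type A, A subClassOf B  =>  x type B
            for s, p, o in triples:
                if p == "type" and o == a:
                    new.add((s, "type", b))
        if new <= triples:
            return triples
        triples |= new


schema_and_data = {
    ("FamousWriter", "subClassOf", "Writer"),
    ("Writer", "subClassOf", "Person"),  # hypothetical superclass
    ("twain", "type", "FamousWriter"),
}
closure = rdfs_closure(schema_and_data)
print(sorted(closure - schema_and_data))
```

What RDF(S) lacks is a way for a schema author to declare that an arbitrary property is transitive or symmetric, or to state cardinality and disjointness; those are among the features OWL adds.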

  38. Web Ontology Language Requirements Desirable features identified for Web Ontology Language: • Extends existing Web standards • Such as XML, RDF, RDFS • Easy to understand and use • Should be based on familiar KR idioms • Formally specified • Of “adequate” expressive power • Possible to provide automated reasoning support

  39. OWL Language • OWL is based on the Description Logics (DL) knowledge representation formalism • OWL (DL) benefits from many years of DL research: • Well defined semantics • Formal properties well understood (complexity, decidability) • Known reasoning algorithms • Implemented systems (highly optimised) • Three species of OWL • OWL Full – maximum expressivity, undecidable • OWL DL – based on SHIQ DL, decidable • OWL Lite – subset of OWL DL, most efficient reasoning

  40. Why OWL? • OWL = Web Ontology Language • Owl’s superior intelligence is known throughout the Hundred Acre Wood, as are his talents for Writing, Spelling, other Educated and Special tasks. • "My spelling is Wobbly. It's good spelling, but it Wobbles, and the letters get in the wrong places."

  41. QuizOWL!

  42. Re-cap • XML, RDF, OWL language stack • Increasingly sophisticated search • QuizXML • subdocument searching • QuizRDF • browsing by concept and across relations • searching on metadata and full-text • Next steps in semantic search • identification of named entities within documents • Exploitation of world knowledge • KIM (Ontotext)

  43. The KIM Platform • A platform offering services and infrastructure for: • (semi-) automatic semantic annotation • ontology population • semantic indexing and retrieval of content • query and navigation • Based on an Information Extraction technology • Aim: to underpin Semantic Web applications • by providing a metadata generation technology • in a standard, consistent, and scalable framework

  44. Ontologies http://proton.semanticweb.org/ • PROTON - a light-weight upper-level ontology; • 250 NE classes; • 100 relations and attributes; • covers mostly NE classes, and to a smaller degree general concepts;

  45. Ontologies II

  46. KIM World KB • Aims to cover the most popular entities in the world • Entities of general importance … like the ones that appear in the news … • KIM “knows about”: • Organizations, all important sorts of: business, international, political, government, sport, academic… • Specific people, (e.g. Politicians) • Locations: countries, regions, cities, roads, etc.

  47. KIM World KB: Content • Collected from various sources, like geographical and business intelligence gazetteers. • KIM also learns from documents indexed • via GATE information extraction • KB scale (RDF statements) • Small KB: 444,086 explicit; 1,014,409 after inference • Full KB: 2,248,576 explicit; 5,200,017 after inference

  48. KIM Scaling on Data • The Semantic Repository is based on Sesame/OWLIM. • Our practical tests demonstrate a perfect performance on top of: • 1.2M entity descriptions: • about 15M explicit statements; • above 30M statements after forward chaining. • Full-text indexing with Lucene: • 0.5M docs, retrieval in milliseconds

  49. Semantic Annotation

  50. Simple Usage: Highlight, Hyperlink, and …
