URI Identity Management for Semantic Web Data Integration and Linkage


Presentation Transcript


  1. URI Identity Management for Semantic Web Data Integration and Linkage
     Afraz Jaffri, Hugh Glaser, Ian Millard
     Electronics and Computer Science, University of Southampton

  2. Presentation Outline
     • Linked Data
     • URI Multiplicity
     • The Problem of Coreference
     • URI Identity Management Approaches
     • The Problem with owl:sameAs
     • The Consistent Reference Service (CRS)
     • CRS Architecture
     • A CRS Application: The RKB Explorer
     • Summary and Future Work
     SSWS07 - Vilamoura, Portugal

  3. Linked Data
     • DBpedia has URIs for approximately 2 million entities
     • Linked datasets contain many overlapping entities
     • A single entity can have a number of URIs
     • Entities are linked using owl:sameAs
     • Example:
       <http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159>

  4. More Linked Data
     • http://www.rkbexplorer.com
     • Contains URIs for more than 10 million entities
     • Data relating to people, projects, papers and institutions
     • A single entity has a number of URIs (even within the same repository)
     • Entities are linked using CRSs

  5. URI Multiplicity
     URIs for ‘Spain’:
     • http://dbpedia.org/resource/Spain
     • http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain
     • http://sws.geonames.org/2510769
     • http://www4.wiwiss.fu-berlin.de/eurostat/resource/countries/Espa%C3%B1a
     URIs for ‘Hugh Glaser’:
     • http://acm.rkbexplorer.com/rdf/resource-P112732
     • http://citeseer.rkbexplorer.com/rdf/resource-CSP109020
     • http://citeseer.rkbexplorer.com/rdf/resource-CSP109013
     • http://citeseer.rkbexplorer.com/rdf/resource-CSP109011
     • http://citeseer.rkbexplorer.com/rdf/resource-CSP109002
     • http://dblp.rkbexplorer.com/rdf/resource-27de9959
     • http://europa.eu/People/#person-0ff816fa
     • http://resist.ecs.soton.ac.uk/wiki/User:hugh_glaser
     • http://www.ecs.soton.ac.uk/info/#person-00021

  6. It’s all about Identity
     Tom Anderson: http://www4.wiwiss.fu-berlin.de/dblp/resource/person/109074
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/dac/MorettiHNCKABDF01>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ftcs/SaeedLA91>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ftrtft/LemosSA92>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/hybrid/AndersonLFS92>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/iccbss/AndersonFRR03>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/iciap/TruccoARI05>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/icnp/ElySWSA01>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ifip/AndersonRR04>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/sc/BorchersASW95>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/seaai/AndersonH98>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/srds/Anderson86>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/words/AndersonFRR05>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/bell/LiuBFSRA04>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/cj/LemosSA92>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/Anderson01>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/Anderson03>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/ZorianASTI96>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/software/LemosSA95>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/ton/SavageWKA01>
     • is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/tse/AndersonBHM85>
     • is dblp:editor of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/sigcomm/2006>
     People and affiliations shown on the slide:
     • University of California, Berkeley
     • Professor, Heriot-Watt University
     • Tom Andersen - University of Denmark
     • Vice President O-in Design Automation inc. USA
     • Professor, University of Newcastle
     • Lucent Technologies, Illinois
     • University of Washington

  7. Coreference in Information Science
     • The problem of coreference has existed for many years
     • Physical libraries disambiguate authors through date of birth
     • Digital libraries still have the problem of author disambiguation
     • Problems are caused by variations in naming schemes, e.g. ‘Glaser, H.’, ‘H. Glaser’, ‘Glaser, Hugh’, ‘H. Glazer’

  8. Coreference in Databases
     • The coreference problem is referred to as ‘Record Linkage’
     • Matching entities between records is similar to matching entities between datasets
     • Database linkage is easier due to the imposed schema
     • A formal theory of Record Linkage was proposed by Fellegi & Sunter (1969)
     • It uses coded agreements between each field (property) to give the probability of record (instance) equivalence
     • It can be adapted for use on the Semantic Web
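The field-agreement idea on this slide can be illustrated with a minimal sketch in Python. It is not taken from the presentation: the field names, the m/u probabilities and the two thresholds are hypothetical values chosen only to show how per-field agreement can be combined into a single match weight in the Fellegi and Sunter style.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the slides):
#   m = P(field agrees | the two records describe the same entity)
#   u = P(field agrees | the two records describe different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "email": (0.90, 0.001),
    "affiliation": (0.80, 0.10),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the log2 likelihood ratios of agreement/disagreement over all fields."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing field contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, upper=5.0, lower=0.0):
    """Three-way decision; the thresholds are assumptions for illustration."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"
```

Records that agree on every field accumulate a large positive weight and are classified as a link, while records that disagree everywhere fall below the lower threshold. On the Semantic Web the "fields" would be properties of an instance rather than columns of a table.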

  9. Coreference on the Semantic Web
     • Coreference on the Semantic Web is defined as the situation where two or more URIs are used for a single non-information resource
     • URI usage can change with context
     • Non-information resources are hard to define precisely
     • Examples:
       • ‘Hugh Glaser’ at Southampton vs. ‘Hugh Glaser’ at Imperial
       • ‘Harry Potter and the Order of the Phoenix’ in Hardback vs. Softback (ISBN: 978-0747561071 vs. 978-0747551003)

  10. URI Identity Management Approaches
     • Use a centralised naming authority to issue URIs for every entity in the world
     • Let everyone create their own URIs and link them to ‘official’ URIs (using owl:sameAs)
     • Let everyone create their own URIs and register them at a centralised repository
     • Let everyone create their own URIs and let them be managed by many decentralised repositories
     • In all of the above, encourage reuse and linking as far as possible

  11. The Problem with owl:sameAs
     • owl:sameAs was designed for a specific purpose
     • Resources linked with owl:sameAs have the same identity, i.e. the subject and object are exactly the same resource
     • owl:sameAs has been misused for Linking Open Data (LOD)
     • Linking can occur between two very different resources, e.g. Tom Anderson
     • Reasoning with LOD will have unintended consequences

  12. Example
     <rdf:Description rdf:about="#URI-1">
       <vcard:FN>Hugh Glaser</vcard:FN>
       <vcard:EMAIL>hg@ecs.soton.ac.uk</vcard:EMAIL>
       <vcard:ROLE>Reader</vcard:ROLE>
     </rdf:Description>

     <rdf:Description rdf:about="#URI-2">
       <vcard:FN>Hugh Glaser</vcard:FN>
       <vcard:EMAIL>hg1@soton.ac.uk</vcard:EMAIL>
       <vcard:ROLE>Lecturer</vcard:ROLE>
     </rdf:Description>

     Assert: <URI-1> owl:sameAs <URI-2>
     Query: SELECT ?x WHERE { <URI-1> vcard:EMAIL ?x }
     Returns: hg1@soton.ac.uk, hg@ecs.soton.ac.uk
     Which email belongs to which role?
     Using owl:sameAs means that both URIs become indistinguishable, even though they may refer to different entities according to the context in which they are used.
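The effect described on this slide can be reproduced with a small sketch in plain Python. It is only an illustration: the set of tuples stands in for an RDF store, and `apply_same_as` is a hypothetical helper that mimics one consequence of owl:sameAs reasoning (statements about either subject become statements about both). A real reasoner does considerably more.

```python
# The six statements from the slide, as (subject, predicate, object) tuples.
TRIPLES = {
    ("URI-1", "vcard:FN", "Hugh Glaser"),
    ("URI-1", "vcard:EMAIL", "hg@ecs.soton.ac.uk"),
    ("URI-1", "vcard:ROLE", "Reader"),
    ("URI-2", "vcard:FN", "Hugh Glaser"),
    ("URI-2", "vcard:EMAIL", "hg1@soton.ac.uk"),
    ("URI-2", "vcard:ROLE", "Lecturer"),
}


def apply_same_as(triples, a, b):
    """Hypothetical helper: copy every statement about a onto b and vice versa."""
    merged = set(triples)
    for s, p, o in triples:
        if s == a:
            merged.add((b, p, o))
        elif s == b:
            merged.add((a, p, o))
    return merged


def objects(triples, subject, predicate):
    """Rough equivalent of SELECT ?x WHERE { <subject> predicate ?x }."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


before = objects(TRIPLES, "URI-1", "vcard:EMAIL")
merged = apply_same_as(TRIPLES, "URI-1", "URI-2")
after = objects(merged, "URI-1", "vcard:EMAIL")
```

Before the assertion, `before` holds a single address. Afterwards, `after` holds both addresses and `objects(merged, "URI-1", "vcard:ROLE")` holds both roles, so nothing in the merged data says which email goes with which role.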

  13. Our Approach
     • Data (knowledge) providers publish data (knowledge)
     • Resources from one provider cannot be guaranteed to be the same as resources from another provider
     • Knowledge will be published and made dereferenceable at the domain that the publisher has control over
     • URIs will be constructed from the domain name of the publisher’s site
     • An intermediate service groups URIs of resources that may be the same
     • This knowledge is made available upon dereferencing the URI of a resource

  14. The Consistent Reference Service (CRS)
     • Can be seen as a conventional Knowledge Base
     • Contains knowledge about the URIs in a repository
     • URIs referring to the same resource are grouped together in ‘Bundles’
     • A Bundle has properties:
       • coref:hasEquivalentReference: the URIs in a bundle are grouped together using this predicate
       • coref:hasCanonicalReference: one URI in a bundle can be made the canonical representation, i.e. the preferred URI
       • coref:updatedOn: the date of the last update to the bundle

  15. Example of a Bundle
     @prefix coref: <http://www.resist.ecs.soton.ac.uk/ontology/coref#> .
     @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

     <http://citeseer.rkbexplorer.com/crs/coref#bundle1>
         a coref:Bundle ;
         coref:hasCanonicalReference <http://citeseer.rkbexplorer.com/rdf/resource-CSP109002> ;
         coref:hasEquivalentReference
             <http://citeseer.rkbexplorer.com/rdf/resource-CSP109011> ,
             <http://citeseer.rkbexplorer.com/rdf/resource-CSP109020> ,
             <http://citeseer.rkbexplorer.com/rdf/resource-CSP109013> ,
             <http://citeseer.rkbexplorer.com/rdf/resource-CSP109002> .
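One way to picture a bundle in application code is the following minimal Python sketch. The `Bundle` class and its method names are hypothetical and are not part of the CRS; the three attributes simply mirror the coref:hasCanonicalReference, coref:hasEquivalentReference and coref:updatedOn properties described in the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Bundle:
    """Hypothetical in-memory mirror of a coref:Bundle."""
    canonical: Optional[str] = None                      # coref:hasCanonicalReference
    equivalents: Set[str] = field(default_factory=set)   # coref:hasEquivalentReference
    updated_on: Optional[date] = None                    # coref:updatedOn

    def add(self, uri: str, today: Optional[date] = None) -> None:
        """Add a URI to the bundle and record the update date."""
        self.equivalents.add(uri)
        self.updated_on = today or date.today()

    def preferred(self, uri: str) -> str:
        """Return the canonical URI for a member; leave other URIs unchanged."""
        if uri in self.equivalents and self.canonical is not None:
            return self.canonical
        return uri


PREFIX = "http://citeseer.rkbexplorer.com/rdf/resource-"
bundle1 = Bundle(canonical=PREFIX + "CSP109002")
for suffix in ("CSP109011", "CSP109020", "CSP109013", "CSP109002"):
    bundle1.add(PREFIX + suffix)
```

With this structure, `bundle1.preferred(PREFIX + "CSP109020")` yields the canonical `...CSP109002` URI, while a URI outside the bundle is returned unchanged. Unlike owl:sameAs, the original URIs and their separate descriptions stay intact.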

  16. CRS Architecture
     [Diagram. Labels: Application; RESOLVE; RETRIEVE; KB (RDF); CRS (RDF); formats Text/Html and RDF/XML]
     • Non-Information Resource: http://southampton.rkbexplorer.com/id/person-00021
     • Information Resource: http://southampton.rkbexplorer.com/data/person-00021
     • Information Resource: http://southampton.rkbexplorer.com/description/person-00021
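The RESOLVE step in the diagram can be sketched as a small Python function that maps the non-information resource URI to one of the two information resources. The transcript does not say which format is served at which URI, so the mapping used here (RDF/XML at /data/, HTML at /description/) is an assumption, and `resolve` is a hypothetical name.

```python
BASE = "http://southampton.rkbexplorer.com"


def resolve(id_uri: str, accept: str) -> str:
    """Pick the information resource for a non-information resource URI.

    Assumption (not confirmed by the slides): clients asking for RDF/XML are
    sent to /data/, and all other clients are sent to /description/.
    """
    prefix = BASE + "/id/"
    if not id_uri.startswith(prefix):
        raise ValueError("not a non-information resource URI: " + id_uri)
    local_name = id_uri[len(prefix):]
    wants_rdf = "application/rdf+xml" in accept
    target = "data" if wants_rdf else "description"
    return f"{BASE}/{target}/{local_name}"
```

A server would typically answer a request for the /id/ URI with a redirect to the URI returned by `resolve`, so that the thing itself and the documents describing it keep distinct URIs.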

  17. Finding all Equivalences
     • Finding all equivalences (bundles) is up to the application
     • This is a separate activity from coreferencing a single data source
     • Services such as Sindice can perform this function for free
     • To perform the equivalence closure, just follow the crs:hasCRS links
     • Scalability is ensured by not including all possible bundles in every CRS
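The equivalence closure mentioned on this slide amounts to a graph traversal. The Python sketch below assumes each CRS can be queried for the URIs it considers equivalent to a given URI; the two dictionaries are hypothetical stand-ins for such services, and the way the URIs are split between them is invented for illustration (the URIs themselves come from the 'Hugh Glaser' list earlier in the talk).

```python
from collections import deque

# Hypothetical stand-ins for two CRSs: each maps a URI to the URIs it groups with it.
CRS_LOOKUPS = {
    "citeseer": {
        "http://citeseer.rkbexplorer.com/rdf/resource-CSP109002": {
            "http://citeseer.rkbexplorer.com/rdf/resource-CSP109011",
            "http://dblp.rkbexplorer.com/rdf/resource-27de9959",
        },
    },
    "dblp": {
        "http://dblp.rkbexplorer.com/rdf/resource-27de9959": {
            "http://acm.rkbexplorer.com/rdf/resource-P112732",
        },
    },
}


def equivalence_closure(start, crs_lookups):
    """Breadth-first search over every CRS until no new equivalent URI appears."""
    seen = {start}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        for lookup in crs_lookups.values():
            for other in lookup.get(uri, ()):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


closure = equivalence_closure(
    "http://citeseer.rkbexplorer.com/rdf/resource-CSP109002", CRS_LOOKUPS
)
```

Here the ACM URI is reached only through the DBLP CRS, which shows why no single CRS needs to hold every bundle: the application assembles the full set by following links between services.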

  18. The RKB Explorer: A CRS Application
     • The Resilience Knowledge Base Explorer displays communities of practice for people, projects and publications from the RKB
     • Uses multiple CRSs to disambiguate people and publications
     • One CRS per knowledge base ensures scalability
     • Multiple SPARQL queries
     • Look yourself up!
     • www.rkbexplorer.com/explorer

  19. Future Work
     • Equivalence mining is a difficult task that requires multiple algorithms
     • Adding policies to determine the trust level of a CRS
     • Establishing the authority of a CRS over a KB
     • Establishing performance metrics
     • Collaborating with the LOD community for wide-scale deployment
     • Formalising the linking methodology

  20. Summary
     • Coreference exists in many disciplines and will exist on the Semantic Web
     • The equivalence of non-information resources depends on context
     • The semantics of owl:sameAs do not fit with the current usage in Linked Data
     • The CRS is a solution that is being deployed on a large knowledge-based infrastructure
     • It’s my knowledge, so let me name it!

  21. Questions?