
Logics for Data and Knowledge Representation

Logics for Data and Knowledge Representation. Introduction to Semantic Web. Fausto Giunchiglia Feroz Farazi. Semantic Web. Definitions.


Presentation Transcript


  1. Logics for Data and Knowledge Representation: Introduction to Semantic Web. Fausto Giunchiglia, Feroz Farazi

  2. Semantic Web: Definitions • An extension of the WWW, in which information is given well-defined meaning, better enabling computers and people to work in cooperation [T. Berners-Lee et al., 2001] • A new form of Web content that is computer comprehensible will open up a revolution of new possibilities [T. Berners-Lee et al., 2001] • An alternative approach to represent Web content in a machine-processable way, and to use intelligent techniques to take advantage of these representations [G. Antoniou and F. van Harmelen, 2004] • An extra abstraction layer, a so-called semantic layer, to be built on top of the Web [F. Giunchiglia et al., 2010]

  3. Semantic Web Keys • Semantics • Data and documents are assigned semantics • Semantics are codified as metadata • Logic • Logic as a tool for expressing knowledge and semantics • Ontology • A set of terms and semantic relations among them • For example, ZIP code and postal code are equivalent • Language and Vocabulary • Semantic Web Languages (e.g., RDF and OWL) • Standard Vocabularies (e.g., Dublin Core and FOAF)

  4. World Wide Web • An enormous collection of data and documents • Any kind • Mixed • Keeps growing • Open to all • Suffers from some well-known limitations in information • Searching • Extracting • Maintaining • Unveiling • Even with all these limitations and features, it is quite useful and interesting • Nevertheless, for a better user experience we want to build a more integrated and consistent Web

  5. Dumb Web to Smart Web [D. Allemang and J. Hendler, 2008] • Suppose you are planning a vacation to the major excavation region of Heraklion on the island of Crete • You find a list of hotels by location • The list shows that Aldemar, a hotel chain you know, has a branch there • Unfortunately, you do not see it on Aldemar’s website • What would you call it? Dumb? • Here, by dumb we mean inconsistent • Suppose you are planning a conference trip to Crete • You find many branches of Aldemar in the surroundings of the conference venue • You want to know which one is nearest (minimum walking distance) • You can find many mapping sites (e.g., Google Maps) that compute the distance between addresses given as input • You are the one spending time copying and pasting addresses into the site. Can we make it any better?

  6. Dumb Web to Smart Web: Why • Suppose you want to know the municipalities in the Autonomous Province of Trento • the municipalities in the province of Trento were reorganized in 2010 • they were reduced from 223 to 217 • many sites still list the former figure instead of the latter • because the information is hard-coded in the HTML pages, or retrieved from the databases of the authorities and rendered on the Web • in a way meant for human consumption only • not for machines, which prevents other parties from picking up changes automatically • Considering all of the above, what should we build to get a smart Web? • Smart applications or a smart Web infrastructure?

  7. Smart Web Applications • The Web is overwhelmed with smart applications, and new ones appear every day • Great advances have been made in implementing ideas once considered very hard or impossible • To name a few applications • Search engines’ matches are non-trivial and seem deep and intuitive • Commerce sites make intelligent recommendations based on customer purchase patterns • Mapping sites can plan routes and provide detailed information about geography • What role can the Web infrastructure play? • All these smart applications are only as smart as the data provided to them • Inconsistent data will lead to dumb results even from smart applications • The Web infrastructure needs to be improved to support better consistency of the data, so that smart applications can perform to their potential

  8. Smarter Web • A Web with an infrastructure that enhances the whole Web experience by • enabling connections among data • letting users connect data to smart Web applications • not surprising us with inconsistencies • In the case of the Aldemar hotel branch in the major excavation region of Heraklion, we need coordination • between the Aldemar site and the site listing hotels by location, at the level of data • that would help update the list when there is a change in the location of hotels • In the mapping site scenario, we would like the site to understand • the data from the conference site and the hotel sites • without requiring human intervention in copying and pasting

  9. Semantic Data and Web of Data • Semantic data is computer-understandable data • e.g., representing the hotels as real-world entities and their addresses as attributes in Semantic Web languages using standard vocabularies • e.g., representing each municipality of Trento as part_meronym of the province (entity-entity connectivity within a dataset) • The Semantic Web is a web of interconnected datasets where • one data element can point to another (through URIs), rather than one webpage pointing to another, forming a web of data • the Web infrastructure provides a data model in which a single entity can be distributed over the Web • the coherence of the data model is part of the Web infrastructure
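
The idea of a single entity distributed over the Web can be sketched in a few lines of Python. This is only an illustration: the URIs, attribute names, and values are hypothetical, and plain tuples stand in for a real Semantic Web language.

```python
from collections import defaultdict

# Hypothetical URIs; both datasets use the same identifier for the hotel.
HOTEL = "http://example.org/hotel/aldemar-heraklion"
CITY = "http://example.org/place/heraklion"

# Statements published by a (hypothetical) hotel chain site.
chain_site = [
    (HOTEL, "name", "Aldemar Heraklion"),
    (HOTEL, "locatedIn", CITY),  # the value is a URI pointing to another entity
]

# Statements published by a (hypothetical) hotel listing site.
listing_site = [
    (HOTEL, "starRating", "5"),
    (CITY, "name", "Heraklion"),
]


def merge(*datasets):
    """Group statements from all datasets by the URI they describe."""
    entities = defaultdict(dict)
    for dataset in datasets:
        for subject, attribute, value in dataset:
            entities[subject].setdefault(attribute, []).append(value)
    return dict(entities)


web_of_data = merge(chain_site, listing_site)
hotel = web_of_data[HOTEL]
# Follow the URI-valued attribute to reach the linked entity.
city = web_of_data[hotel["locatedIn"][0]]
print(hotel["name"][0], "is in", city["name"][0])
```

Because both sites use the same URI, the merged description of the hotel contains attributes from both sources without any manual copying.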

  10. Linked Data • The Linked Data approach forms the basis of data publishing guidelines that pinpoint how data from government, public and private sectors can be made more valuable for consumers • The Linked Data approach came up with • a set of principles • the star rating system • Principles • the use of HTTP URIs as the identifiers of things (concepts, entities and attributes) • the provision of meaningful content published in RDF for each such URI reference • the production of navigable content via links
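
Two of these principles can be checked mechanically on a small dataset. The sketch below is a rough illustration under assumed names: `is_http_uri` and `outgoing_links` are hypothetical helpers, the example.org and example.com URIs are placeholders, and the second principle (serving RDF for each URI) is not checked because it would need network access.

```python
from urllib.parse import urlparse


def is_http_uri(value):
    """Return True if value is an absolute HTTP(S) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def outgoing_links(triples, own_prefix):
    """Return object URIs that point outside the dataset's own namespace."""
    return sorted(
        {o for _, _, o in triples if is_http_uri(o) and not o.startswith(own_prefix)}
    )


# Hypothetical dataset: (subject, predicate, object) statements.
dataset = [
    ("http://example.org/hotel/1", "http://example.org/vocab/locatedIn",
     "http://example.com/place/heraklion"),
    ("http://example.org/hotel/1", "http://example.org/vocab/name", "Aldemar"),
]

# Principle 1: things (subjects and predicates here) are named with HTTP URIs.
print(all(is_http_uri(s) and is_http_uri(p) for s, p, _ in dataset))
# Principle 3: the dataset links to content published elsewhere.
print(outgoing_links(dataset, "http://example.org/"))
```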

  11. Linked Open Data • The star rating system rates published data on a scale from 1-star to 5-star • Getting 1-star requires publishing data on the Web with an open license regardless of format, e.g., datasets can be published as images; this is also called Open Data • Producing 2-star data requires the Open Data to be made available in a structured format (e.g., Excel, which is proprietary) so that it becomes machine readable • Producing 3-star data requires non-proprietary formats, e.g., CSV or TSV, on top of the previous rating levels • Getting 4-star requires publishing data using W3C open standards, e.g., RDF • Achieving 5-star, the highest level in the rating spectrum, demands establishing links to RDF datasets published by others • A dataset that reaches 5-star is also called Linked Open Data
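
Since each level builds on the previous ones, the rating can be computed by counting how many requirements hold in order. The following Python sketch assumes a hypothetical `Dataset` record with one boolean per requirement; the field names are illustrative and not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    open_license: bool             # 1 star: on the Web under an open license
    structured: bool = False       # 2 stars: structured, machine readable
    non_proprietary: bool = False  # 3 stars: non-proprietary format (CSV, TSV)
    w3c_standard: bool = False     # 4 stars: W3C open standards (RDF)
    links_to_others: bool = False  # 5 stars: links to RDF datasets by others


def star_rating(d: Dataset) -> int:
    """Count requirements met in order; stop at the first one that fails."""
    checks = [
        d.open_license,
        d.structured,
        d.non_proprietary,
        d.w3c_standard,
        d.links_to_others,
    ]
    stars = 0
    for requirement_met in checks:
        if not requirement_met:
            break
        stars += 1
    return stars


scanned_image = Dataset(open_license=True)
spreadsheet = Dataset(open_license=True, structured=True)
linked_rdf = Dataset(True, True, True, True, True)
print(star_rating(scanned_image), star_rating(spreadsheet), star_rating(linked_rdf))
```

Stopping at the first unmet requirement reflects the cumulative nature of the scale: RDF published without an open license would still earn no stars in this sketch.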

  12. Linked Entities A World of Entities Entitypedia

  13. What is an entity? We organize our world (ground) knowledge around entities • Entities are objects that are important enough in our everyday life to be referred to by name • Each entity has its own metadata (e.g., name, latitude, longitude, height, …) • Each entity is in relation with many other entities (e.g., the Eiffel Tower is located in Paris, Fausto is a friend of Raffaella) • There are relatively “few” commonsense entity types (person, …, event) • There are many application/focus-dependent entities (artifacts, maths, …) [Picture: Eiffel Tower]

  14. Entitypedia – the key ideas • Clear separation between the • knowledge (about entities/instances) and the • language (classes/concepts) used to express the knowledge • Knowledge as very carefully designed (2) • Lattice of entity types (attributes, relations, services) • … unifying most (all?) standards (de jure, de facto) (Dublin Core, FOAF, Facebook, …) • Language as very carefully designed (1) • Linguistic resource (WordNet + (Corelex + homographs) + multiple NLs) • … + a faceted domain knowledge organization infrastructure, developed using the analytico-synthetic approach (extending Library Science PMEST/DEPA frameworks) • Direct linear time encoding into RDF/DL (3) • but (!) with fine-tuned, very fast data structures (for search, entity matching, …) • (Relatively) large scale bootstrapping + continuous evolution (4) • via system-sourcing and crowd-sourcing (under study now) • Data certification (5) • … via quality certification pipeline (under study now)

  15. Natural language and formal language • Different languages and terminology • The same concept can be expressed in different ways in the same language and across languages (e.g., AUTOMOBILE, CAR, MACCHINA)

  16. Formal language: domains • DERA domains (D for Domain) organize the (formal concept) language into any number of domains (“any area of knowledge, chosen subjectively, that we want to reason or communicate about”). Examples: medicine, music, pop music, people, movies, skiing, my garden, … • Inspired by Ranganathan’s faceted approach • Following precise design principles (analytico-synthetic approach) • Organize entities as classes of similar objects • Independent of the specific chosen domains • Lattice of (overlapping) domains • Top level domain = upper level ontology [Figure: a fragment of the Space Domain, with nodes LOCATION, MONUMENT, EIFFEL TOWER, COLOSSEUM, BODY OF WATER, GARDA LAKE, MISSISSIPPI RIVER, AMAZON RIVER]

  17. Formal language: Facets • A DERA domain contains any number of facets (a facet is a hierarchy of terms, each denoting an atomic concept, often corresponding to an NL multiword) • A DERA facet is of one of three types (E for Entity, R for Relation, A for Attribute) • Entity: see picture (classes of entities and entities) • Relation: far, near, east, …, with roles playing the double role of entity and relation • Attribute: qualities/quantities (high, low, 23 m), descriptive attributes (“India is a democratic country”) [Figure: a fragment of an entity facet in the Space Domain, with nodes LOCATION, MONUMENT, EIFFEL TOWER, COLISEUM, BODY OF WATER, GARDA LAKE, MISSISSIPPI RIVER, AMAZON RIVER]

  18. User interface

  19. Knowledge • A set of entity types, each entity type defined in terms of: • Attributes (e.g., height, latitude) • Relations (e.g., locatedIn, friend) • Services (e.g., computeAge, computeFoFs, computeInverseRelation, …) • Many (categories of) meta-attributes (e.g., mandatory, identifying, permanent, timespan, provenance, …) • Entity types organized in a lattice • coherent with the domain lattice • With an ordering on <attributes, relations, services> but also subsumption, value ranges, … • Entities: • A name and a URI • Etype <attributes, relations, services> plus free • One reference etype and many induced etypes

  20. Knowledge services • CRUD on entities • EntitySearch(“metadata of E1”) (* useful in NER *) • EntityMatch(E1, E2) • Etypes(“some element of an entity”) • Extension(etype) (* same as search(etype) *) • Navigate(E1, R) (* Navigate(Fausto, Friends) *) • Distance(E1, E2, R) (* Distance(Fausto, Obama, Friend) *) • … • … many etype- and application-dependent services
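
The Navigate and Distance services can be sketched over a small in-memory relation store. The slide does not define their semantics, so this Python sketch assumes that Navigate returns the entities one step away along a relation and that Distance counts the minimum number of such steps (breadth-first search). The friendship data is hypothetical apart from the Fausto and Raffaella pair mentioned earlier.

```python
from collections import deque

# Hypothetical relation store: relation name -> entity -> related entities.
relations = {
    "friend": {
        "Fausto": {"Raffaella"},
        "Raffaella": {"Fausto", "Anna"},
        "Anna": {"Raffaella"},
    }
}


def navigate(entity, relation):
    """Return the entities reachable from `entity` in one step along `relation`."""
    return relations.get(relation, {}).get(entity, set())


def distance(source, target, relation):
    """Return the minimum number of `relation` hops, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, hops = queue.popleft()
        for neighbor in navigate(current, relation):
            if neighbor == target:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


print(navigate("Fausto", "friend"))
print(distance("Fausto", "Anna", "friend"))
print(distance("Fausto", "Obama", "friend"))
```

Returning `None` for an unreachable target only says that no path is recorded in this store, not that none exists.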

  21. Entity type lattice

  22. Some examples of etypes • ENTITY: Name String[ ]; Description SString[ ]; Part Of <Entity>; Homepage URL[ ]; Start Moment; End Moment; Duration Duration • EVENT extends ABSTRACT ENTITY: Participant <Person>[ ] | <Organization>[ ]; Location <Location>; Status Enum<SString>; … • PHYSICAL ENTITY extends ENTITY: Height float; Length float; Width float; Weight float • LOCATION extends PHYSICAL ENTITY: Latitude float; Longitude float; Altitude float; …
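
The "extends" chain from ENTITY to PHYSICAL ENTITY to LOCATION maps naturally onto class inheritance. The Python sketch below is one possible rendering, not the Entitypedia implementation: it models only the attributes whose types are unambiguous on the slide, leaves out Start, End, and Duration, and uses illustrative values for the sample entities.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Entity:
    name: List[str] = field(default_factory=list)
    description: List[str] = field(default_factory=list)
    part_of: Optional["Entity"] = None
    homepage: List[str] = field(default_factory=list)


@dataclass
class PhysicalEntity(Entity):
    height: Optional[float] = None
    length: Optional[float] = None
    width: Optional[float] = None
    weight: Optional[float] = None


@dataclass
class Location(PhysicalEntity):
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    altitude: Optional[float] = None


def attribute_names(etype):
    """List the attributes of an etype, inherited ones first."""
    return [f.name for f in fields(etype)]


# Illustrative entities; the coordinates are approximate.
italy = Location(name=["Italy"])
trento = Location(name=["Trento"], part_of=italy, latitude=46.07, longitude=11.12)
print(attribute_names(Location))
print(trento.part_of.name[0])
```

A more specific etype inherits every attribute of the etypes above it, which is the ordering on attributes that the lattice relies on.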

  23. Example of entities • Ulm (CITY) part-of Germany (COUNTRY) • Albert Einstein (SCIENTIST) birth place Ulm • Albert Einstein spouse Mileva Maric (PERSON) • Albert Einstein affiliation ETH Zurich (UNIVERSITY)

  24. A critical issue: dot-objects • Some entities have a clear inherent polysemy (Pustejovsky) • Depending on the situation, either one aspect or the other (typically the physical or the abstract aspect) of the entity is emphasized. This generates polysemy in language. • Since it depends on the situation, it would be wrong to permanently disambiguate it one way or the other • We need a systematic way to represent these entities • Example: ETH Zurich is both a UNIVERSITY (as organization) and a UNIVERSITY (as building)

  25. Encoding into RDF • Choose (sub)domain • E facet translates into TBOX concept subsumption axioms (e.g., river LG “body of water”) • R facet translates into TBOX role subsumption (e.g., parentOf MG fatherOf) • A facet translates into TBOX subsumption (e.g., angularDistance MG latitude) • Entity properties translate into ABOX axioms (e.g., livesIn(Fausto, Trento)) • Here LG reads “less general than” and MG reads “more general than” • NOTE: Used only for interoperability, open data, …; reasoning is done on native data structures as specific-purpose services
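
The translation steps on this slide can be sketched as a function that emits one N-Triples line per axiom. This sketch makes two assumptions that the slides do not confirm: concept subsumption is written with `rdfs:subClassOf` and role subsumption with `rdfs:subPropertyOf`. The `http://example.org/` namespace and the helper names are hypothetical.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/"  # hypothetical namespace for the domain terms


def uri(local_name):
    """Wrap a term in angle brackets under the example namespace."""
    return "<" + EX + local_name.replace(" ", "_") + ">"


def less_general(narrower, broader, kind):
    """Emit one TBOX subsumption axiom; kind is 'concept' or 'role'."""
    predicate = "subClassOf" if kind == "concept" else "subPropertyOf"
    return f"{uri(narrower)} <{RDFS}{predicate}> {uri(broader)} ."


def assertion(relation, subject, obj):
    """Emit one ABOX axiom relating two entities."""
    return f"{uri(subject)} {uri(relation)} {uri(obj)} ."


tbox = [
    less_general("river", "body of water", "concept"),    # E facet
    less_general("fatherOf", "parentOf", "role"),         # R facet
    less_general("latitude", "angularDistance", "role"),  # A facet
]
abox = [assertion("livesIn", "Fausto", "Trento")]

for line in tbox + abox:
    print(line)
```

Each facet entry produces exactly one line, so the encoding is linear in the size of the input.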

  26. Features of a Semantic Web • A radically new way of thinking about representing information for better results and better management • The Web is characterized by the AAA slogan (Anyone can say Anything about Any topic) • On the Semantic Web, any individual has to be allowed to contribute a piece of data about some entity that can be linked to the information from other sources • This requirement • was taken into account while designing RDF • has the consequence that there could always be something more to know (something new that someone will express): the Open World Assumption
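
The practical effect of the Open World Assumption is that a missing statement is unknown rather than false. A small Python sketch of the contrast, using a hypothetical set of known statements:

```python
# Hypothetical set of statements collected so far.
known = {("Fausto", "livesIn", "Trento")}


def holds_closed_world(statement):
    """Closed world: anything not stated is taken to be false."""
    return statement in known


def holds_open_world(statement):
    """Open world: anything not stated is unknown, represented here as None."""
    return True if statement in known else None


query = ("Fausto", "friendOf", "Raffaella")
print(holds_closed_world(query))
print(holds_open_world(query))

# Someone else contributes the statement later; nothing known is contradicted.
known.add(query)
print(holds_open_world(query))
```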

  27. RDF • RDF (Resource Description Framework) • A language for representing data in the Semantic Web • a simple data model for making statements • the capability to perform inference on the statements • Data model in RDF • The data model in RDF is a graph data model • An edge together with the two nodes it connects forms a triple • Triple elements are subject, predicate and object • RDF representation • URIs to identify subjects, objects and predicates • Objects can also be literals
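
The graph data model can be sketched in Python as a list of (subject, predicate, object) triples, where the object is either a URI or a literal. The class names, the `http://example.org/` namespace, and the `match` helper are assumptions made for illustration. The serializer writes N-Triples-style lines but does not escape special characters in literals.

```python
from typing import NamedTuple, Union


class URI(str):
    """An identifier for a subject, predicate, or object."""


class Literal(str):
    """A plain value; used only in the object position."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    object: Union[URI, Literal]


def to_ntriples(t):
    """Serialize one triple; literals are quoted, URIs are bracketed."""
    obj = f'"{t.object}"' if isinstance(t.object, Literal) else f"<{t.object}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


EX = "http://example.org/"  # hypothetical namespace
EINSTEIN = URI(EX + "Albert_Einstein")
ULM = URI(EX + "Ulm")

graph = [
    Triple(EINSTEIN, URI(EX + "birthPlace"), ULM),
    Triple(ULM, URI(EX + "partOf"), URI(EX + "Germany")),
    Triple(EINSTEIN, URI(EX + "name"), Literal("Albert Einstein")),
]

for triple in match(graph, subject=EINSTEIN):
    print(to_ntriples(triple))
```

Each triple is one edge of the graph: the subject and object are its nodes and the predicate is its label.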

  28. References • T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American 284, 34–43, May 2001. • G. Antoniou and F. van Harmelen. A Semantic Web Primer (Cooperative Information Systems). MIT Press, Cambridge, MA, USA, 2004. • F. Giunchiglia, F. Farazi, L. Tanca, and R. De Virgilio. The semantic web languages. In Semantic Web Information Management: A Model-Based Perspective, R. De Virgilio, F. Giunchiglia, and L. Tanca (Eds.), Springer, 2009. • D. Allemang and J. Hendler. Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL. Morgan Kaufmann/Elsevier, Amsterdam, NL, 2008. • T. Berners-Lee. Linked Data. Design Issues for the World Wide Web, W3C, http://www.w3.org/DesignIssues/LinkedData.html, 2006.
