Introduction to the Semantic Web

Vincenzo Maltese, Fausto Giunchiglia
University of Trento, LDKR course
These slides are inspired by the book A Semantic Web Primer by G. Antoniou and F. van Harmelen (and related slides).

Roadmap
• The current Web
• From the current Web to the Semantic Web
• Semantic Web technologies
• A layered approach
• Semantic Web applications

The Current Web

The current Web
• An enormous collection of data and documents
  • Any kind of material
  • Mixed together
  • Keeps growing
  • Open to all
• Mostly to be directly used by people
  • Unstructured content (text, images, videos)
  • Semi-structured content (tables)
• Typical usage: keyword search, navigation
  • Search by hand
  • Consumption by reading
  • Navigation by clicking
• No meaning of terms

A Web for direct usage by people: navigation

A Web for direct usage by people: keyword search

Problems of Keyword-Based Search Engines
• High recall, low precision
• Results are highly sensitive to vocabulary
• Results are single Web pages
• Human involvement is necessary to interpret and combine results
• Results of Web searches are not readily accessible by other software tools
[Figure labels: CITY, CITY, PERSON, FACILITY, PERSON]
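
The "high recall, low precision" problem can be made concrete with a small calculation. The sketch below is only illustrative: the document identifiers and the counts are made up, and the helper name precision_recall is an assumption, not something from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved vs. relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: fraction of the retrieved items that are relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of the relevant items that were retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 relevant pages exist, the engine returns 10 pages
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case every relevant page is found (recall 1.0), but more than half of the results are noise (precision 0.4), which is why a person still has to sift through the list.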

A Web for direct usage by people: encyclopedias

A Web for direct usage by people: encyclopedias
[Figures: Wikipedia infoboxes; Wikipedia categories]
• The meaning of Web content is not machine-accessible
• Lack of semantics

From the current Web to the Semantic Web

The Semantic Web vision
Represent Web content in a form that can be processed by machines, so that intelligent services can be automatically developed and combined.
• An extension of the WWW, in which information is given well-defined meaning, better enabling computers and people to work in cooperation [T. Berners-Lee et al., 2001]
• A new form of Web content that is computer comprehensible will open up a revolution of new possibilities [T. Berners-Lee et al., 2001]
• An alternative approach to represent Web content in a machine-processable way, and to use intelligent techniques to take advantage of these representations [G. Antoniou and F. van Harmelen, 2004]
• An extra abstraction layer, a so-called semantic layer, to be built on top of the Web [F. Giunchiglia et al., 2010]

Example: arrange a trip to Crete
Suppose you are planning a vacation to the major excavation region of Heraklion on the island of Crete:
• You use a search engine
• You find a list of hotels by location
• In the list you find that a hotel of your favorite hotel chain is there
• Unfortunately, you do not see it on the main website of the hotel chain (failure)
Can we do it any better?
Suppose you are planning a conference trip to Crete:
• You use a search engine
• You find many branches of your favorite hotel chain in the surroundings of the conference venue
• You want to know which one is nearest (minimum walking distance)
• You use Google Maps to find out, but you need to copy and paste the addresses from the websites of the hotel and of the conference venue (manual effort)
[D. Allemang & J. Hendler, 2008]

Example: find answers
How many municipalities are there in Trentino, and what are they?
• Information is hard-coded in HTML pages
• Information cannot be directly processed by machines
• Information is hidden in authorities’ databases
• Different sources may provide different information, which is not easy to keep aligned
Can we do it any better?

Contributing fields
• Knowledge Representation and Reasoning (KRR)
  • Representing knowledge
  • Reasoning about known facts
• Knowledge Organization (KO)
  • Classifying documents
  • Supporting information retrieval
• Knowledge Management (KM)
  • Acquiring, accessing, and maintaining knowledge within an organization
  • Key activity of large businesses: internal knowledge as an intellectual asset
  • It is particularly important for international, geographically dispersed organizations
  • Most information is currently available in a poorly structured form (e.g. text, audio, video)
• …

The importance of KM
Why? Gartner predicts that, by 2017, 33 percent of Fortune 100 organizations will experience an information crisis, due to their inability to effectively value, govern and trust their enterprise information.

Evolving KM

Benefits of the new paradigm
• Data
  • Better understanding of the content and reduced ambiguity/inconsistency
  • Enabling connections among data
  • Semantics as a standard and interoperability
  • Reduced cost of data reuse
• Applications
  • Enabling smart agent-based applications
  • Automatic information interpretation
  • Automatic recommendation and negotiation systems
  • Automatic translations
  • Provenance and reputation computation and updates
  • Enforcing privacy policies
  • Better coordination across different applications
  • Reduced development costs
  • Reduced human effort
• Example: Ride Sharing (SmartSociety)
• Example: the ESSENCE training network

Semantic Web technologies

Semantic Web key technologies
Data and documents are given explicit semantics.
• Explicit Metadata
  • Properties are codified as explicit metadata (e.g. XML, JSON)
  • Standard Vocabularies (e.g., Dublin Core, FOAF)
  • Semantic Web Languages (e.g., RDF, OWL)
• Ontologies
  • Formal language and vocabulary
  • A set of terms and semantic relations between them
• Logic and inference
  • Logic as a tool for expressing knowledge and semantics
• Agents
  • Artificial agents that reason and act automatically

HTML
• Web content is mainly formatted for human readers rather than artificial agents
• HTML is the predominant language in which Web pages are written
• The vocabulary describes the presentation layer
    <h1>Agilitas Physiotherapy Centre</h1>
    Welcome to the home page of the Agilitas Physiotherapy Centre.
    Do you feel pain? Have you had an injury? Let our staff
    Lisa Davenport, Kelly Townsend (our lovely secretary)
    and Steve Matthews take care of your body and soul.
    <h2>Consultation hours</h2>
    Mon 11am - 7pm<br>
    Tue 11am - 7pm<br>
    Wed 3pm - 7pm<br>
    Thu 11am - 7pm<br>
    Fri 11am - 3pm<p>
    But note that we do not offer consultation during the weeks of the
    <a href=". . .">State Of Origin</a> games.
• PROBLEM: Artificial agents are not able to reason and act on HTML

Explicit metadata
• Metadata is data about data
• Metadata captures (part of) the meaning of data
• The Semantic Web does not rely on text-based manipulation, but rather on machine-processable metadata
• Here the vocabulary describes metadata
• This representation is far easier for machines to process
    <company>
      <treatmentOffered>Physiotherapy</treatmentOffered>
      <companyName>Agilitas Physiotherapy Centre</companyName>
      <staff>
        <therapist>Lisa Davenport</therapist>
        <therapist>Steve Matthews</therapist>
        <secretary>Kelly Townsend</secretary>
      </staff>
    </company>
• PROBLEM: Still the meaning is not explicit
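
To show why the tagged representation is easier for a machine to process, here is a minimal sketch that reads the XML fragment from this slide with Python's standard xml.etree.ElementTree module. The variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The XML fragment from the slide, unchanged apart from layout
COMPANY_XML = """
<company>
  <treatmentOffered>Physiotherapy</treatmentOffered>
  <companyName>Agilitas Physiotherapy Centre</companyName>
  <staff>
    <therapist>Lisa Davenport</therapist>
    <therapist>Steve Matthews</therapist>
    <secretary>Kelly Townsend</secretary>
  </staff>
</company>
"""

root = ET.fromstring(COMPANY_XML.strip())

# Tags let a program pick out exactly the pieces it needs
company_name = root.findtext("companyName")
therapists = [e.text for e in root.findall("./staff/therapist")]

print(company_name)
print(therapists)
```

The program can now list the therapists without any text heuristics, but it still treats "therapist" as an opaque tag name: nothing tells it that a therapist is a kind of staff member or a person, which is the remaining problem stated on the slide.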

Ontologies
• An ontology is an explicit specification of a shared conceptualization [Gruber, 1993]
• Terms denote important concepts (classes of objects) of the domain
• Relations are defined between these terms: ontologies are often thought of as directed graphs
• By providing a common formal terminology and understanding of a given domain of interest, an ontology allows for automation (logical inference), supports reuse, and favors interoperability across applications and people

Kinds of ontologies
• Informal representations
  • User classification
  • Web directories
  • Business catalogs
• Progressively formal
  • Enumerative (e.g. DDC)
  • Knowledge Organization Systems
  • Faceted Classifications
• Formal ontologies
  • Expressed in a formal logic language and represented using formal specifications (e.g., OWL)
Ontologies differ according to their purpose and semantics [Uschold and Gruninger, 2004]

Additional elements in ontologies
• Relations
  • e.g. X teaches Y
  • e.g. X friend of Y
• Attributes
  • e.g. X height is 1.85 m
  • e.g. X age is 45
• Value restrictions
  • e.g. only faculty members can teach courses
  • e.g. the range of the attribute height goes from 0 to 3 m
• Disjointness statements
  • e.g. faculty members and administrative staff are disjoint
• Logical relationships between objects
  • e.g. every department must include at least 10 faculty members
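
The value restrictions and disjointness statements above can be sketched as simple checks over a handful of facts. This is a validation-style illustration under stated assumptions: the individuals (alice, bob, carol), the course names and the function check are all hypothetical, and an OWL reasoner working under the open world assumption would often infer new facts (e.g. that bob is a faculty member) rather than report a violation.

```python
# Hypothetical individuals and the classes they are asserted to belong to
classes = {
    "alice": {"FacultyMember"},
    "bob": {"AdministrativeStaff"},
    "carol": {"FacultyMember", "AdministrativeStaff"},
}
teaches = [("alice", "logic101"), ("bob", "databases")]
height_m = {"alice": 1.85, "bob": 3.4}

DISJOINT = [("FacultyMember", "AdministrativeStaff")]
HEIGHT_RANGE = (0.0, 3.0)  # range of the attribute height, in metres


def check(classes, teaches, height_m):
    """Return a list of human-readable constraint violations."""
    violations = []
    # Value restriction: only faculty members can teach courses
    for person, course in teaches:
        if "FacultyMember" not in classes.get(person, set()):
            violations.append(f"{person} teaches {course} but is not a faculty member")
    # Disjointness: nobody may belong to both classes of a disjoint pair
    for a, b in DISJOINT:
        for person, cls in classes.items():
            if a in cls and b in cls:
                violations.append(f"{person} belongs to disjoint classes {a} and {b}")
    # Value restriction on an attribute: height must lie within the range
    low, high = HEIGHT_RANGE
    for person, h in height_m.items():
        if not low <= h <= high:
            violations.append(f"{person} has height {h} m outside [{low}, {high}]")
    return violations


violations = check(classes, teaches, height_m)
for v in violations:
    print(v)
```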

The Role of Ontologies on the Web
• By providing a common terminology and understanding of a given domain of interest, ontologies:
  • overcome differences in terminology (vocabulary control) and support learning
  • support knowledge organization (indexing, search and navigation of information)
  • support reuse and favor semantic interoperability across applications and people
• If the terminology is formal (TBox), they allow for automation (logical inference)
• Ontologies are useful for improving the accuracy of Web search
  • Search engines can look for pages that refer to a precise concept in an ontology (indexing, concept search)
  • Web search can exploit the generalization/specialization relations between concepts (query expansion)
  • If a query fails to find any relevant documents, the search engine may suggest a more general query to the user
  • If too many answers are retrieved, the search engine may suggest some specializations to the user (categorization)
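
Query expansion and query generalization over a concept hierarchy can be sketched in a few lines. The hierarchy below is a made-up example, and the function names (narrower, expand_query, generalize) are assumptions chosen for illustration.

```python
# Hypothetical hierarchy: each concept maps to its more general concept
BROADER = {
    "physiotherapy": "therapy",
    "massage": "therapy",
    "sports massage": "massage",
    "therapy": "health service",
}


def narrower(concept):
    """Return all direct and indirect specializations of a concept."""
    result = []
    for child, parent in BROADER.items():
        if parent == concept:
            result.append(child)
            result.extend(narrower(child))
    return result


def expand_query(concept):
    """Query expansion: search for the concept and all its specializations."""
    return [concept] + narrower(concept)


def generalize(concept):
    """Suggest a more general concept when a query finds nothing (or None)."""
    return BROADER.get(concept)


print(expand_query("therapy"))
print(generalize("sports massage"))
```

A search for "therapy" would then also match pages indexed under "physiotherapy" or "sports massage", while an empty result for "sports massage" could trigger the suggestion to try "massage".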

Web Ontology Languages
• RDF and RDF Schema
  • RDF is a data model for objects and relations between them
  • RDF Schema is a vocabulary description language
  • RDF Schema describes properties and classes of RDF resources
  • RDF Schema provides semantics for generalization hierarchies of properties and classes
• OWL
  • OWL is a richer ontology language
  • It supports relations between classes, including disjointness
  • It supports cardinality (e.g. exactly one)
  • It supports richer typing of properties
  • It supports characteristics (meta-properties) of properties (e.g., symmetry)
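
The RDF data model can be pictured as a set of (subject, predicate, object) triples. The sketch below keeps such triples in a plain Python set, offers a wildcard pattern match, and prints them in an N-Triples-like form. It does not use an RDF library; the http://example.org/ resources and the helper names are hypothetical, while the rdf:type and rdfs:subClassOf URIs are the standard W3C ones.

```python
EX = "http://example.org/"  # hypothetical namespace for the example
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

triples = {
    # RDF: facts about objects and relations between them
    (EX + "lisa", RDF_TYPE, EX + "Therapist"),
    (EX + "kelly", RDF_TYPE, EX + "Secretary"),
    (EX + "lisa", EX + "worksFor", EX + "agilitas"),
    # RDF Schema: a small generalization hierarchy of classes
    (EX + "Therapist", RDFS_SUBCLASSOF, EX + "StaffMember"),
    (EX + "Secretary", RDFS_SUBCLASSOF, EX + "StaffMember"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def to_ntriples(store):
    """Serialize URI-only triples, one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(store))


print(to_ntriples(triples))
print(match(triples, p=RDF_TYPE))
```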

Logical inference
• When a formal language (logic) is used, automated reasoners can deduce (infer) conclusions from the given knowledge
• Logic can also be used by intelligent agents for making decisions and selecting courses of action
• Logic is more general than ontologies
• Logic can be used to uncover ontological knowledge that is implicitly given
• It can also help uncover unexpected relationships and inconsistencies
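
How a reasoner uncovers implicit knowledge and inconsistencies can be sketched with two RDFS-style rules applied until nothing new is derived (forward chaining). This is a toy illustration, not a real reasoner: the facts, the short predicate names "type" and "subClassOf", and the functions infer and find_inconsistencies are assumptions.

```python
def infer(facts):
    """Apply two rules repeatedly until no new fact is derived."""
    facts = set(facts)
    while True:
        new = set()
        for a, p1, b in facts:
            for c, p2, d in facts:
                if b != c:
                    continue
                # Rule 1: subClassOf is transitive
                if p1 == "subClassOf" and p2 == "subClassOf":
                    new.add((a, "subClassOf", d))
                # Rule 2: an instance of a class is an instance of its superclasses
                if p1 == "type" and p2 == "subClassOf":
                    new.add((a, "type", d))
        if new <= facts:
            return facts
        facts |= new


def find_inconsistencies(facts, disjoint_pairs):
    """Return (individual, class_a, class_b) for each violated disjointness."""
    types = {}
    for s, p, o in facts:
        if p == "type":
            types.setdefault(s, set()).add(o)
    return sorted(
        (s, a, b)
        for s, cls in types.items()
        for a, b in disjoint_pairs
        if a in cls and b in cls
    )


facts = {
    ("lisa", "type", "Therapist"),
    ("Therapist", "subClassOf", "StaffMember"),
    ("StaffMember", "subClassOf", "Person"),
    ("acme", "type", "Organization"),
    ("acme", "type", "Therapist"),  # a modelling mistake, hidden at first sight
}
disjoint = [("Person", "Organization")]

closure = infer(facts)
print(("lisa", "type", "Person") in closure)
print(find_inconsistencies(closure, disjoint))
```

Before inference the disjointness check finds nothing, because "acme" is never stated to be a Person. Only after the implicit facts are derived does the conflict between Person and Organization surface.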

Software Agents
A personal agent on the Semantic Web will be able to:
• receive some tasks and preferences from the person
• seek information from Web sources
• communicate with other agents
• compare information about user requirements and preferences
• make certain choices
• give answers to the user
Software agents work autonomously and proactively.

A layered approach

The Semantic Web Stack(s)
• The development of the Semantic Web proceeds in steps
• Each step builds a layer on top of another
• Principles of “downward compatibility” and “upward partial understanding”
• Two alternative stacks are currently in place

The layers (from top to bottom)
• Trust: digital signatures, recommendations, rating agencies, …
• Proof: proof generation, exchange, validation
• Logic: enhances ontology languages further (application-specific)
• Ontology: more expressive than RDF Schema; OWL is the current Semantic Web standard
• RDF + RDF Schema: RDF is the basic data model for facts; RDF Schema is a simple ontology language
• XML: the syntactic basis
• URIs (Uniform Resource Identifiers): universal identifiers for referencing resources

Semantic Web Applications

Semantic Data and Web of Data
• The Semantic Web is a web of interconnected datasets where:
  • one data element can point to another (through URIs), rather than one web page pointing to another, forming a Web of data (rather than a Web of pages)
  • the Web infrastructure provides a data model supporting a scenario in which a single entity can be referred to over the Web
  • the coherence of the data model is part of the Web infrastructure

Linked Data
The Linked Data approach forms the basis of data publishing guidelines that specify how data from government, public and private sectors can be made more valuable to consumers.
Principles:
• the use of HTTP URIs as the identifiers of things (concepts, entities and attributes)
• the provision of meaningful content published in RDF for each such URI reference
• the production of navigable content via links
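
The three principles can be illustrated with a simulated Web of data: each HTTP URI identifies a thing, looking it up returns triples about that thing, and objects that are themselves URIs are links to follow. Everything here is hypothetical (the example.org and example.com URIs, the hotel name, the WEB dictionary); a real client would issue HTTP requests and parse RDF instead of reading a dictionary.

```python
# Simulated Web: "dereferencing" a URI returns the triples published for it
WEB = {
    "http://example.org/hotels/h1": [
        ("http://example.org/hotels/h1", "name", "Hotel Knossos"),
        ("http://example.org/hotels/h1", "locatedIn", "http://example.com/places/heraklion"),
    ],
    "http://example.com/places/heraklion": [
        ("http://example.com/places/heraklion", "name", "Heraklion"),
        ("http://example.com/places/heraklion", "partOf", "http://example.com/places/crete"),
    ],
    "http://example.com/places/crete": [
        ("http://example.com/places/crete", "name", "Crete"),
    ],
}


def dereference(uri):
    """Stand-in for an HTTP lookup that returns data about the URI."""
    return WEB.get(uri, [])


def crawl(start_uri):
    """Follow links breadth-first and collect every reachable triple."""
    seen, queue, collected = set(), [start_uri], []
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in dereference(uri):
            collected.append((s, p, o))
            # An object that is itself an HTTP URI is a link to more data
            if o.startswith("http://"):
                queue.append(o)
    return collected


data = crawl("http://example.org/hotels/h1")
names = [o for _, p, o in data if p == "name"]
print(names)
```

Starting from a single hotel URI, the crawler reaches data published by a different (simulated) source about the city and the island, which is the navigation across datasets that the third principle asks for.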


The 5-star rating system
• 5 stars: links to other RDF open datasets
• 4 stars: W3C open format (e.g. RDF)
• 3 stars: non-proprietary format (e.g. CSV)
• 2 stars: structured format
• 1 star: published on the Web with an open license, regardless of format
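
Since each star level builds on the previous ones, the rating can be read as a simple cumulative check. The following is a minimal sketch under that assumption; the function name star_rating and its boolean parameters are illustrative, not part of any official tool.

```python
def star_rating(on_web_open_license, structured, non_proprietary,
                w3c_open_format, linked_to_other_data):
    """Return 0-5 stars; a level counts only if all lower levels are met."""
    criteria = [
        on_web_open_license,   # 1 star
        structured,            # 2 stars
        non_proprietary,       # 3 stars
        w3c_open_format,       # 4 stars
        linked_to_other_data,  # 5 stars
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file published on the Web under an open license
print(star_rating(True, True, True, False, False))
# An RDF dataset with links to other open datasets
print(star_rating(True, True, True, True, True))
```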

Open Government Data
• As part of their daily activities, governmental departments produce, manage and store large volumes of authentic and interesting data
• Why open the data?
  • great economic value
  • strong potential for supporting innovation
  • transparency and participation
  • improving organizational and communication efficiency
  • support for data-centric applications
• Not all of this data can be made publicly available, because of constraints such as:
  • privacy issues
  • intellectual property rights
  • national security concerns

Summary

Features of the Semantic Web
• The Web is characterized by the AAA Slogan: Anyone can say Anything about Any topic
• The Semantic Web is a radical new way of thinking about a better representation of information with embedded meaning
• The Semantic Web is still characterized by the AAA Slogan, where anyone can contribute a piece of data about some entity that can be linked via URIs to other sources
• This requirement is at the basis of Web languages and follows an Open World Assumption
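
The practical effect of the Open World Assumption shows up when a question is asked about something that no source has stated. The sketch below contrasts a closed-world answer with an open-world one; the facts and the function names are illustrative assumptions.

```python
# Facts that some source has published so far (anyone may add more later)
facts = {
    ("trento", "type", "Municipality"),
    ("rovereto", "type", "Municipality"),
}


def ask_closed_world(facts, triple):
    """Closed world: whatever is not stated is taken to be false."""
    return triple in facts


def ask_open_world(facts, triple, negative_facts=frozenset()):
    """Open world: a missing statement is unknown, not false."""
    if triple in facts:
        return "true"
    if triple in negative_facts:
        return "false"
    return "unknown"


question = ("pergine", "type", "Municipality")
print(ask_closed_world(facts, question))
print(ask_open_world(facts, question))
```

Under the closed-world reading the answer is simply False, whereas the open-world reading leaves room for another source to contribute the missing statement later, in line with the AAA slogan.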

References
• T. Berners-Lee, J. Hendler, & O. Lassila (2001, May). The Semantic Web. Scientific American, 284, 34–43.
• T. Gruber (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199–220.
• G. Antoniou & F. van Harmelen (2004). A Semantic Web Primer (Cooperative Information Systems). MIT Press, Cambridge MA, USA.
• M. Uschold & M. Gruninger (2004). Ontologies and semantics for seamless connectivity. SIGMOD Rec., 33(4), 58–64.
• F. Giunchiglia, F. Farazi, L. Tanca, & R. De Virgilio (2009). The semantic web languages. In Semantic Web Information Management: A Model-Based Perspective, Springer.
• D. Allemang & J. Hendler (2008). Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL. Morgan Kaufmann Elsevier, Amsterdam, NL.
• T. Berners-Lee (2006). Linked Data. Design Issues for the World Wide Web - W3C, http://www.w3.org/DesignIssues/LinkedData.html.
• T. Heath & C. Bizer (2011). Linked Data: Evolving the Web into a Global Data Space. Morgan and Claypool.
