Towards Semantic Web engineering

Towards Semantic Web engineering. Multichannel publishing, 3/12/2009, Olli Alm

Outline
Part 1:
• Semantic Web
• Ontology
• RDF languages
• Querying and reasoning SW data
Part 2:
• Modelling SW data
• SW data processing
• Case examples
• Summary

Outline

Part 1

Outline: part 1

Semantic Web
• The vision: WWW with intelligent machines (Tim Berners-Lee)
• In practice: a set of languages and techniques for knowledge processing, modelling and representation
• W3C activity group: standards, specifications, recommendations, tools (www.w3.org)

Semantic Web
• “The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.” (from the W3C SW activity statement)
• 1) common formats for integration of data
• 2) a language for recording how the data relates to real world objects

Semantic Web • The layer cake of the Semantic Web technologies

Semantic Web
• MVC & the XML movement on the web: separate the data model from its representation
• The Semantic Web: a “unified” data model for representing (real world) data, to be utilized in any representation
• What if we could…
  - …represent any kind of (real world) data?
  - …represent data in a unified way?
  - …just take and reuse open data in our application?
  - …integrate data easily from diverse sources?

Semantic Web
• The Semantic Web:
  - A branch of Artificial Intelligence?
  - Symbolic AI: old ideas in a new form?
  - Machine intelligence: symbolic representation of the facts
• “Symbolic AI (or Classical AI) is the branch of artificial intelligence research that concerns itself with attempting to explicitly represent human knowledge in a declarative form (i.e. facts and rules).”*


Semantic Web
• The Semantic Web:
  - Explicit representation: an ontology
  - Not just an explicit representation but also a shared one


Semantic Web • The Semantic Web: shared conceptualization? (the linked data project)

Semantic Web
• The Semantic Web: shared conceptualization
  - everything is connected
  - everything is referable (URIs)
  - a distributed set of statements (ontologies) as the basis of our world model
• Ontology language(s):
  - a tool for identifying resources
  - a tool for stating facts about resources (= statements)
  - a tool for sharing and integrating statements
  - a tool for reasoning over the data, e.g. acquiring new statements with deductive reasoning
• In the SW world, the term “ontology languages” refers to RDF-based languages such as RDFS and OWL (and OWL 2).

Ontology
“OWL Full can be viewed as an extension of RDF, while OWL Lite and OWL DL can be viewed as extensions of a restricted view of RDF. Every OWL (Lite, DL, Full) document is an RDF document, and every RDF document is an OWL Full document, but only some RDF documents will be a legal OWL Lite or OWL DL document.”*

RDF
An example of RDF data (in XML serialization): person info
  <foaf:Person rdf:about="#me" xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <foaf:name>Dan Brickley</foaf:name>
    <foaf:homepage rdf:resource="http://danbri.org/" />
    <foaf:img rdf:resource="/images/me.jpg" />
  </foaf:Person>

RDF
An example of RDF data (in Turtle / TTL serialization): person info
  <http://mynamespace.fi#me>
      rdf:type foaf:Person ;
      foaf:name "Dan Brickley" ;
      foaf:homepage <http://danbri.org> ;
      foaf:img <http://danbri.org/images/me.jpg> .

RDF
An example of RDF data (in Turtle / TTL serialization): web page info
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix dc: <http://purl.org/dc/elements/1.1/> .
  @prefix exterms: <http://www.example.org/terms/> .
  <http://www.example.org/index.html>
      exterms:creation-date "August 16, 1999" ;
      dc:language "en" ;
      dc:creator <http://www.example.org/staffid/85740> .


RDF
• An example of RDF data (graph representation): web page info
• The graph-like nature of RDF:
  - resources / objects are nodes
  - properties / attributes are edges*
* Properties are also resources (on the meta level) and can be represented as nodes in the graph (why is that?)

RDF
• RDF (Resource Description Framework) is…
  - a statement language (logic)
  - a statement = a triple
• A triple has three parts: 1) subject, 2) predicate and 3) object
• Example from the Friend-Of-A-Friend schema (FOAF), written as subject, then predicate and object pairs:
  <http://mynamespace.fi#me>
      rdf:type foaf:Person ;
      foaf:name "Dan Brickley" ;
      foaf:homepage <http://danbri.org> ;
      foaf:img <http://danbri.org/images/me.jpg> .
• Triple says: “me is a (type of) person”
• Triple says: “me is called ‘Dan Brickley’”
• Triple says: “me has a homepage danbri.org”
• The set of triples forms a graph that interlinks resources with each other! (here: 4 triples, all with the subject #me)
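The triple structure above can be sketched in a few lines of Python. This is only an illustration, not an RDF library: the `match` helper and the constant names are made up for this sketch, and literals and URIs are both stored as plain strings.

```python
# Hypothetical constants and helper; only the four statements come from the slide.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://mynamespace.fi#me"

# Each statement is a (subject, predicate, object) tuple.
triples = {
    (ME, RDF + "type", FOAF + "Person"),
    (ME, FOAF + "name", "Dan Brickley"),
    (ME, FOAF + "homepage", "http://danbri.org"),
    (ME, FOAF + "img", "http://danbri.org/images/me.jpg"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# All statements whose subject is #me, and the value of foaf:name.
about_me = match(triples, s=ME)
names = [o for (_, _, o) in match(triples, s=ME, p=FOAF + "name")]
print(len(about_me), names)
```

Because every statement has the same three-part shape, a single pattern-matching function is enough to ask any question about the graph.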


RDF
• URI
  - in RDF, everything has a unique identifier, a URI (Uniform Resource Identifier)
  - a URI is like a URL without a link: it is not always clickable
  - in the SW, URLs can be and are used as URIs
  - (don’t confuse with URNs, IRIs or PURLs)
  <http://mynamespace.fi#me>
      rdf:type foaf:Person ;
      foaf:name "Dan Brickley" ;
      foaf:homepage <http://danbri.org> ;
      foaf:img <http://danbri.org/images/me.jpg> .
• Dan Brickley is identified by http://mynamespace.fi#me
• foaf:name is an abbreviation for the URI http://xmlns.com/foaf/0.1/name (a property defined in the foaf namespace)

RDF
• URI
  - For consistency, URIs should not change often (or at all)
  - (should the URI change if the “identity” or “essence” of the resource changes?)
  - A URI identifies an object, but that doesn’t mean that different URIs refer to different resources
• In the Web Ontology Language (OWL), we can state that two different URIs refer to the same object:
  <rdf:Description rdf:about="#William_Jefferson_Clinton">
    <owl:sameAs rdf:resource="#BillClinton"/>
  </rdf:Description>
• (the opposite is also possible: with owl:differentFrom we can state that two resources are distinct from each other)

RDFS
• RDFS (RDF Schema)
• Divides the world into universals (classes) and particulars (individuals / instances) → TYPING
• E.g. “Lassie is a dog” =
  @prefix sws: <http://www.metropolia.fi/~ollial/2009/11#> .
  sws:lassie
      rdf:type sws:dog ;
      foaf:name "Lassie" .
• Classes have subclasses:
  sws:dog
      rdf:type rdfs:Class ;
      rdfs:subClassOf sws:animal .
• (Transitive) reasoning in RDFS:
  1) Lassie is a dog
  2) A dog is a kind of animal
  → Lassie is an animal

OWL
• OWL (Web Ontology Language)
• Extends RDFS to express
  - relations between classes and between instances
  - property types: literals vs. objects
    → literal (datatype) property: foaf:name = “Olli”
    → object property: foaf:knows <http://someone/somewhere>
  - property characteristics that support reasoning (e.g. functional, transitive)
  - computability / complexity levels for the model
• Three sublanguages: OWL Full, OWL DL, OWL Lite

Reasoning in Symbolic AI
• (The theory behind) ontology languages is (more or less) based on the following assumptions:
• 1) Logic is expressive (like a natural language): we can model our domain / world by defining a set of statements that hold (in our world). (The state of affairs is the main concern; objects are secondary.)
• 2) Language corresponds to the world: if we use a strong and expressive language, we can model real-world phenomena deeply and consistently, and assume that our model corresponds to the world.
• 3) Reason out the information: we can now deduce new (world) information (in the form of statements) by drawing inferences from the set of statements.

Reasoning in Ontologies / open world
• In addition to the logic-as-a-language correspondence theories, the logic behind ontologies follows open-world semantics:
  - Our model may not contain all the relevant information
  - If something is stated, it is true, BUT
  - If something is not described, the machine does not know the answer!
• An example: the statement in the ontology is “Lassie is a dog”
  A) The question: “Is Lassie a dog?”
     Closed world semantics → TRUE
     Open world semantics → TRUE
  B) The question: “Is Lassie a cat?”
     Closed world semantics → FALSE
     Open world semantics → Don’t know
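The difference between the two semantics can be sketched as two small Python functions. The function names and the tuple encoding are assumptions made for this illustration; `None` stands for “don’t know”. The sketch only covers the absence of a statement: a real OWL reasoner can still answer FALSE when the model states, for example, that dogs and cats are disjoint.

```python
# The only statement in our toy model.
facts = {("Lassie", "is_a", "dog")}


def ask_closed_world(facts, statement):
    """Closed world: anything that is not stated is taken to be false."""
    return statement in facts


def ask_open_world(facts, statement):
    """Open world: anything that is not stated is unknown (None)."""
    return True if statement in facts else None


for question in [("Lassie", "is_a", "dog"), ("Lassie", "is_a", "cat")]:
    print(question, ask_closed_world(facts, question), ask_open_world(facts, question))
```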

Practical reasoning in Ontologies
• We load our data (e.g. an RDF/XML file) into the reasoning machine (e.g. Jena).
• We switch the inference engine on and define its level (e.g. reason out the transitive closures).
• Now we can query the model and also get the statements generated by the reasoner.
• The data (1):
  “Lassie is a dog”, “Dog is a mammal”, “Mammal is an animal”
• Transitive closure inference (2):
  reason out the is-a relations; if there are related instances, add the new facts for those instances.
• The deduced data (3):
  “Lassie is a dog”, “Dog is a mammal”, “Mammal is an animal”, “Lassie is a mammal”, “Lassie is an animal”
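Steps (1) to (3) can be sketched without any reasoner library as a fixpoint computation over is-a pairs. This is a simplification, not what Jena does internally: like the slide, it treats “instance of” and “subclass of” as one is-a relation, and the function name `infer_is_a` is made up for the sketch. Note that it also derives “Dog is an animal”, which the slide does not list.

```python
def infer_is_a(statements):
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing new appears."""
    closed = set(statements)
    while True:
        new = {
            (x, z)
            for (x, y) in closed
            for (w, z) in closed
            if y == w
        } - closed
        if not new:
            return closed
        closed |= new


# (1) The data
data = {("Lassie", "dog"), ("dog", "mammal"), ("mammal", "animal")}

# (2) Transitive closure inference, (3) the deduced data
deduced = infer_is_a(data)
for subject, cls in sorted(deduced - data):
    print(f"{subject} is a {cls}")
```

The repeated full scan makes the cost visible: every round compares all pairs of statements, which is one reason inference over large models is expensive.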


Practical reasoning in Ontologies
• OWL: reasoning with properties
• Transitive properties:
  P(x,y) AND P(y,z) → P(x,z)
  An example: locatedIn(Punavuori, Helsinki) AND locatedIn(Helsinki, Uusimaa) → locatedIn(Punavuori, Uusimaa)
• Symmetric properties:
  P(x,y) → P(y,x)
  An example: isFriendOf(Olli, Matti) → isFriendOf(Matti, Olli)
• Functional properties:
  P(x,y) AND P(x,z) → y = z (~every object has at most one value for P)
  An example: hasFather(Olli, Frank) AND hasFather(Olli, Paul) → Frank = Paul
• This means: we can define certain “implication patterns” in our model and use them for processing data. Instead of having only the “static” data, new data is generated based on the “implications”.
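The three implication patterns can be written out as plain Python functions over sets of (x, y) pairs. This is a minimal sketch with made-up function names, not the behaviour of any particular OWL reasoner. In particular, the functional rule here only reports which values must be equal; it does not merge them.

```python
from itertools import combinations


def transitive(pairs):
    """P(x,y) AND P(y,z) -> P(x,z), repeated until no new pairs appear."""
    closed = set(pairs)
    while True:
        new = {(x, z) for (x, y) in closed for (w, z) in closed if y == w} - closed
        if not new:
            return closed
        closed |= new


def symmetric(pairs):
    """P(x,y) -> P(y,x)."""
    return set(pairs) | {(y, x) for (x, y) in pairs}


def functional_equalities(pairs):
    """P(x,y) AND P(x,z) -> y = z; returns the inferred equalities as frozensets."""
    values = {}
    for x, y in pairs:
        values.setdefault(x, set()).add(y)
    return {
        frozenset(pair)
        for vs in values.values()
        for pair in combinations(sorted(vs), 2)
    }


located_in = transitive({("Punavuori", "Helsinki"), ("Helsinki", "Uusimaa")})
friends = symmetric({("Olli", "Matti")})
same_father = functional_equalities({("Olli", "Frank"), ("Olli", "Paul")})
print(located_in, friends, same_father)
```

The last result mirrors the slide: under open-world semantics two fathers are not a contradiction, they are evidence that Frank and Paul are the same individual.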

Reasoning and processing data
• In addition to inferencing in the model, we can process the data in more traditional ways:
  - Build a procedural program for processing the data
  - Use a specific rule language for processing
  - Query the data with a specific RDF query language, e.g. SPARQL (RQL, RUL, RDQL, …)
• The best solution depends on the nature of the problem: e.g. reasoning with the inference engine is usually an expensive solution (= takes a lot of time)

SPARQL query language
• SPARQL: a W3C recommendation
• The current de facto query language for RDF
• Plays much the same role for RDF as SQL does for relational databases:
  SELECT, WHERE, ORDER BY (why is FROM missing?)
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?name ?mbox
  WHERE {
    ?x foaf:name ?name .
    ?x foaf:mbox ?mbox
  }

SPARQL query language
• A second example, restricting the results with FILTER:
  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  PREFIX ns: <http://example.org/ns#>
  SELECT ?title ?price
  WHERE {
    ?x ns:price ?price .
    FILTER (?price < 30.5)
    ?x dc:title ?title .
  }
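To see what a query engine has to do for the FILTER query, here is a hand-coded Python equivalent. The book data and the `ex:` identifiers are invented for the illustration, and the helper assumes each resource has at most one price. A real SPARQL engine would plan and execute the same join generically.

```python
DC = "http://purl.org/dc/elements/1.1/"
NS = "http://example.org/ns#"

# Made-up sample data: two books with a title and a price each.
triples = [
    ("ex:book1", DC + "title", "SPARQL Tutorial"),
    ("ex:book1", NS + "price", 42),
    ("ex:book2", DC + "title", "The Semantic Web"),
    ("ex:book2", NS + "price", 23),
]


def select_title_price(graph, max_price):
    """Join the ns:price and dc:title patterns on ?x, keeping prices below max_price."""
    # Pattern 1: ?x ns:price ?price (assumes one price per resource)
    prices = {s: o for (s, p, o) in graph if p == NS + "price"}
    rows = []
    # Pattern 2: ?x dc:title ?title, joined on ?x, then FILTER (?price < max_price)
    for s, p, o in graph:
        if p == DC + "title" and s in prices and prices[s] < max_price:
            rows.append((o, prices[s]))
    return rows


rows = select_title_price(triples, 30.5)
print(rows)
```

The hand-written version works, but every new question needs new code; the declarative query states the same thing in a few lines.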

SPARQL query language
• SPARQL: why?
  - A clear representation for data queries (instead of coding them by hand)
  - A good query engine implementation → fast data retrieval?
  - Implemented in many development libraries
• What can you not do with SPARQL (1.0)?
  - Update data? (extension: SPARQL Update, later included in SPARQL 1.1)
  - Do recursive queries, e.g. “get all the superclasses of the dog” (SPARQL 1.1 later added property paths for this)
• Procedural example:
  x = dog
  while (x has a superclass) {
    add the superclass to the result set
    x = superclass
  }
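The procedural example can be made runnable as follows. The pseudocode assumes a single chain of superclasses; this sketch uses a queue so that a class may have several superclasses. The class hierarchy, including `pet`, is made up for the illustration.

```python
from collections import deque

# Hypothetical hierarchy: each class maps to its direct superclasses.
subclass_of = {
    "dog": ["mammal", "pet"],
    "mammal": ["animal"],
    "pet": ["animal"],
    "animal": [],
}


def all_superclasses(cls, subclass_of):
    """Collect every direct and indirect superclass of cls, breadth first."""
    found = []
    queue = deque(subclass_of.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent in found:
            continue  # already reached through another path
        found.append(parent)
        queue.extend(subclass_of.get(parent, []))
    return found


print(all_superclasses("dog", subclass_of))
```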

Part 2

Outline: part 2

Data modeling for the Semantic Web
• When modelling things in ontologies, we can use an “object-oriented” approach:
  - Try to define the domain
  - Model the objects that exist in the domain and the relations between them
• In the modelling task, we define
  - the metadata schema as usual (~database schema / objects of the domain)
  - in addition, the ‘domain ontologies’ or ‘domain vocabularies’ we are using

Data modeling for the Semantic Web
• Metadata schema
  - Defines the primary objects (classes) to model: books, cars, persons, …
  - Defines the properties for objects: title, author, edition, no. of pages, ISBN, genre, …
• Properties have either literal values or object values
  - Literal / DatatypeProperty: name, title, street address, isbn, hasGenre(?)
  - Object property: hasFriend, isLocated, hasAuthor, hasGenre(?)
• For “similar” objects, you can use inheritance (subclassing!)
  - woman is a person, person is an agent, agent is an entity…

Data modeling for the Semantic Web
• Metadata schema: defining properties for a class (in RDFS / OWL)
  # class definition
  myNS:book
      rdf:type owl:Class .
  # property definitions
  myNS:title
      rdf:type owl:DatatypeProperty ;
      rdfs:domain myNS:book ;
      rdfs:range xsd:string .
  myNS:isbn
      rdf:type owl:DatatypeProperty ;
      rdfs:domain myNS:book ;
      rdfs:range xsd:string .
  myNS:author
      rdf:type owl:ObjectProperty ;
      rdfs:domain myNS:book ;
      rdfs:range myNS:author .


Data modeling for the Semantic Web
• Metadata schema: defining properties for a class (in RDFS / OWL)
  - rdfs:domain → the objects that have this property
  - rdfs:range → the suitable values for the property
• Ontology languages are “schemaless” in the sense that you can assign any property to any object (open world assumption).
• Reasoning on rdfs:domain:
  myNS:hasTail
      rdf:type owl:ObjectProperty ;
      rdfs:domain myNS:donkey .
  myNS:matti
      rdf:type myNS:person ;
      myNS:hasTail myNS:tail001 .
• “If it has a tail, it is a donkey!” After reasoning, the model contains:
  myNS:matti
      rdf:type myNS:person ;
      rdf:type myNS:donkey ;
      myNS:hasTail myNS:tail001 .
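The domain inference above can be sketched as a single rule over a set of triples. The prefixed names are kept as plain strings and the function name `apply_domain_rule` is made up for the sketch. It shows that rdfs:domain does not reject the statement about Matti; it adds a type instead.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

graph = {
    ("myNS:hasTail", RDFS_DOMAIN, "myNS:donkey"),
    ("myNS:matti", RDF_TYPE, "myNS:person"),
    ("myNS:matti", "myNS:hasTail", "myNS:tail001"),
}


def apply_domain_rule(graph):
    """If P has domain C and (s, P, o) is stated, infer (s, rdf:type, C)."""
    domains = {(s, o) for (s, p, o) in graph if p == RDFS_DOMAIN}
    inferred = {
        (s, RDF_TYPE, cls)
        for (s, p, _) in graph
        for (prop, cls) in domains
        if p == prop
    }
    return graph | inferred


reasoned = apply_domain_rule(graph)
print(sorted(reasoned - graph))
```

Used this way, domain and range are inference rules rather than validation constraints, which is worth keeping in mind when choosing them for a schema.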

Data modeling for the Semantic Web
• Domain vocabularies: reusing domain knowledge
• In our schema, we can refer to “external” ontologies that define some domain of discourse.
• The idea:
  - you don’t have to reinvent the wheel
  - saves time and money
  - easy data integration (connected data)
  - (and you can always extend the domain vocabulary)
• In practice:
  1) refer to / fetch / download the ontology
  2) set the ranges of your schema properties to values from the ontology
  3) use the domain vocabulary to describe your resources

Data modeling for the Semantic Web • Domain vocabularies: reusing domain knowledge • Case study: ONKI ontology service: www.yso.fi • User interface, web services for utilizing domain vocabularies

Data modeling for the Semantic Web

Data modeling for the Semantic Web
• Domain vocabularies: reusing domain knowledge
• Example domains:
  - Classification schemes
  - Geographical information (place + coordinates + relations)
  - YSO (General Finnish Upper Ontology, Yleinen Suomalainen Ontologia)
  - DBpedia (information extracted from Wikipedia)
  - Author databases (Getty ULAN)

Data modeling for the Semantic Web
• Domain vocabularies: reusing domain knowledge
• In addition to domain vocabularies, the reuse of schema definitions is also encouraged!
• Why?
  → allows data integration based on the properties
  → existing metadata schemas may provide well thought-out, mature solutions for modelling
• Example schemas:
  - Dublin Core, simple DC
  - SKOS (for thesauri and concept scheme modelling)
  - FOAF (Friend-of-a-Friend: social connections)

Processing the Ontology data
• Although the RDF data may initially be distributed, it (usually) has to be stored in one place for reasoning / processing.
  → ontology repositories, usually built on an RDBMS (triple stores: a few big tables with columns for subject, predicate and object)
  → repositories are usually quite slow compared with a plain RDBMS (WHY?)
• RDF data (graph data) is strongly interconnected: the whole model has to be in memory or in a DB for processing.
  → e.g. streaming / SAX-like processing is usually not possible
• Many Semantic Web applications are concerned with processing or analyzing 1) subsumption hierarchies OR 2) connections between the resources
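The “one big table” layout can be tried out with Python’s built-in `sqlite3` module. The table and column names are assumptions for this sketch, not the schema of any particular triple store. It also hints at one plausible answer to the WHY above: every additional triple pattern in a question becomes another self-join on the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("sws:lassie", "rdf:type", "sws:dog"),
        ("sws:lassie", "foaf:name", "Lassie"),
        ("sws:dog", "rdfs:subClassOf", "sws:animal"),
    ],
)

# "Which superclasses does Lassie's class have?" needs two triple patterns,
# so the table is joined with itself once.
rows = conn.execute(
    """
    SELECT t2.object
    FROM triples AS t1
    JOIN triples AS t2 ON t1.object = t2.subject
    WHERE t1.subject = ?
      AND t1.predicate = 'rdf:type'
      AND t2.predicate = 'rdfs:subClassOf'
    """,
    ("sws:lassie",),
).fetchall()
print(rows)
```

In a conventional schema the same question would typically be a lookup in a dedicated, indexed table; here every question goes through the same generic table.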
