Introduction to Semantic Web: What? Why? How? So far? Next? Frank van Harmelen, AI Department, Vrije Universiteit Amsterdam. Creative Commons License: allowed to share & remix, but must attribute & non-commercial.
Who am I • Frank van Harmelen • Prof in AI at Vrije Universiteit Amsterdam • Knowledge Representation • Early Semantic Web Projects (> 1999) • Co-designed OWL • Tech advisor of Aduna (Sesame) • Scientific Director of LarKC (Large Knowledge Collider) • I know nothing about image analysis…
Who are you? • Who knows roughly what the Semantic Web is? • Who has heard of RDF & OWL? • Who has studied RDF & OWL? • Who has used RDF & OWL? • Who expects ever to use RDF & OWL? • Who is a logician? • Who is a KR researcher? • Who is a Web researcher? • Who is an image researcher?
General idea of the Semantic Web
General idea of Semantic Web • Make current web more machine accessible (currently all the intelligence is in the user) • Motivating use-cases: • search • personalisation • semantic linking • data integration • web services • ...
General idea of Semantic Web • Make current web more machine accessible (currently all the intelligence is in the user) • Do this by: • Making data and meta-data available on the Web in machine-understandable form (formalised) • Structuring the data and meta-data in ontologies • These are non-trivial design decisions. Alternative would be:
What’s wrong with the Web? [Figure: linked web pages: a web page in English about Frank, another web page about Frank, a page about the Vrije Universiteit, a page about LarKC and a page about Stefano] • Linked web pages, written by people, written for people, used only by people... • Many of these pages already come from data, usable by computers! • Linked data: usable by computers, useful for people! • But we can’t link the data....
Semantic Web: "Web of Data" (Tim Berners-Lee) • expose data on the web (“facts”) in interoperable form (RDF) • expose knowledge on the web with interoperable semantics (ontologies, RDF Schema, OWL) • apply lightweight inference for • interoperability • query answering • search • unexpected reuse • …
Not just data, also knowledge • All of this: • Low-expressivity logic (RDF) • That allows some inference: property inheritance, domain/range inference • Some of this: • Medium-expressivity logic (OWL) • That allows more inference: (in)equality, number restrictions, datatypes
Desideratum: On the Web of Data, anyone can say anything about anything • Need for total decoupling of • data • vocabulary • meta-data [Figure: data x, vocabulary T and meta-data [<x> IsOfType <T>], e.g. <village>, with different owners & locations]
Two versions of the Semantic Web story: • V1: Semantic Web = annotated Web; 1 & 2 are embedded in text & images on the Web • V2: Semantic Web = Web of Data; 1 & 2 live in dedicated repositories (triple stores) [Figure: meta-data [<x> IsOfType <T>] about x, e.g. <village>, with different owners & locations]
Machine-accessible meaning (what it’s like to be a machine): META-DATA [Figure: meta-data graph linking <drug>, <treatment>, <disease>, <symptoms>, <name> and <drug administration> via relations such as alleviates and IS-A]
What is meta-data? • it's just data • it's data describing other data • it's meant for machine consumption
Required are: • one or more standard vocabularies • so search engines, producers and consumers all speak the same language • a standard syntax • so meta-data can be recognised as such • lots of resources with meta-data attached
Bluffer’s Guide to RDF & RDF Schema
Bluffer’s Guide to RDF • Express relations between things: subject, predicate, object • Results in a labelled network (“graph”) • All labels are actually web addresses (URIs) • You can “ping” any label and find out more • Bits of the graph can live at physically different locations & have different owners [Figure: example graph with subject, predicate and object marked; nodes Frank, x, y and MIT; edges AuthorOf and publishedBy]
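A minimal sketch of the graph model, assuming triples are Python tuples of URI strings under a hypothetical `http://example.org/` namespace. The exact edges of the original figure are not recoverable, so the three triples below are an assumed reading. The sketch shows why distributed ownership is cheap: merging two owners' fragments is plain set union, and the shared URI for `x` connects them.

```python
# Sketch only: an RDF graph as a set of (subject, predicate, object) triples.
# All URIs are hypothetical; the edges are an assumed reading of the figure.
EX = "http://example.org/"

# Two fragments of one graph, published by different (hypothetical) owners.
graph_owner_a = {
    (EX + "Frank", EX + "AuthorOf", EX + "x"),
    (EX + "Frank", EX + "AuthorOf", EX + "y"),
}
graph_owner_b = {
    (EX + "x", EX + "publishedBy", EX + "MIT"),
}

# Every node and edge label is a global URI, so merging is plain set union:
# the shared URI EX + "x" joins the two fragments into one labelled graph.
merged = graph_owner_a | graph_owner_b


def outgoing(graph, node):
    """Return the sorted (predicate, object) edges leaving `node`."""
    return sorted((p, o) for (s, p, o) in graph if s == node)


if __name__ == "__main__":
    for predicate, obj in outgoing(merged, EX + "Frank"):
        print(predicate, obj)
```

Neither owner needs to know about the other: the link exists only because both use the same identifier.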
Bluffer’s Guide to RDF Schema • types for subjects & objects & predicates • types organised in a hierarchy • inheritance of properties [Figure: type hierarchy (person, man, author, artifact, book, publisher) over the example graph Frank, x, y, MIT with AuthorOf and publishedBy edges]
So what’s special about RDF(S)? • statements about an identifier can be distributed:
    <owl:Thing rdf:ID="CENTRAL-COAST"/>
    <owl:Thing rdf:about="#CENTRAL-COAST">
      <rdf:type rdf:resource="#CALIFORNIA-REGION"/>
    </owl:Thing>
• no unique name assumption • no closed world assumption • Remember: web-style decoupling
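A toy sketch of the last two bullets, assuming triples are held in a Python set and identifiers are plain strings (the `rdfs:label` literal, `NEVADA-REGION` and `CentralCoast` are hypothetical). The helpers `ask` and `could_be_same` are illustrative, not part of any RDF library. They show only the behaviour: a missing triple yields "unknown" rather than "false", and two different names are not assumed to denote different things.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    UNKNOWN = "unknown"  # open world: absence never yields "false"


# Statements about the same identifier, published by two different sources.
source_1 = {("CENTRAL-COAST", "rdfs:label", "Central Coast")}  # hypothetical
source_2 = {("CENTRAL-COAST", "rdf:type", "CALIFORNIA-REGION")}
facts = source_1 | source_2


def ask(triple, graph):
    """Open-world lookup: an absent triple is UNKNOWN, never FALSE."""
    return Answer.TRUE if triple in graph else Answer.UNKNOWN


def could_be_same(a, b, graph):
    """No unique name assumption: two names may denote one individual
    unless the graph explicitly states owl:differentFrom."""
    return ((a, "owl:differentFrom", b) not in graph
            and (b, "owl:differentFrom", a) not in graph)


if __name__ == "__main__":
    print(ask(("CENTRAL-COAST", "rdf:type", "CALIFORNIA-REGION"), facts))
    print(ask(("CENTRAL-COAST", "rdf:type", "NEVADA-REGION"), facts))
```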
Remember: • Need for total decoupling of • data • vocabulary • meta-data [Figure: meta-data [<x> IsOfType <T>], e.g. <village>, with different owners & locations]
RDF(S) have a (very small) formal semantics • Defines what other statements are implied by a given set of RDF(S) statements • Ensures mutual agreement on minimal content between parties without further contact • In the form of “entailment rules” • Very simple to compute (and not explosive in practice)
RDF(S) semantics: examples • Aspirin isOfType Painkiller + Painkiller subClassOf Drug ⇒ Aspirin isOfType Drug • aspirin alleviates headache + alleviates range symptom ⇒ headache isOfType symptom
RDF(S) semantics • X R Y + R domain T ⇒ X IsOfType T • X R Y + R range T ⇒ Y IsOfType T • T1 SubClassOf T2 + T2 SubClassOf T3 ⇒ T1 SubClassOf T3 • X IsOfType T1 + T1 SubClassOf T2 ⇒ X IsOfType T2 • Semantics = predictable inference
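A minimal sketch of these rules as forward chaining, assuming triples are Python tuples and using the slide's predicate names (`isOfType`, `subClassOf`, `domain`, `range`) instead of the official `rdf:type`/`rdfs:` URIs. The hypothetical function `rdfs_closure` applies all four rules (the last one concluding X IsOfType T2) until nothing new appears; the aspirin example from the earlier slide serves as test data.

```python
# Sketch: forward chaining of the four RDF(S) rules, using the slide's
# predicate names rather than the official rdf:type / rdfs:* URIs.


def rdfs_closure(triples):
    """Apply the four rules repeatedly until no new triple appears."""
    closure = set(triples)
    while True:
        sub = {(a, b) for (a, p, b) in closure if p == "subClassOf"}
        domain = {(r, t) for (r, p, t) in closure if p == "domain"}
        range_ = {(r, t) for (r, p, t) in closure if p == "range"}
        typed = {(x, t) for (x, p, t) in closure if p == "isOfType"}

        new = set()
        for (x, r, y) in closure:
            # X R Y + R domain T => X isOfType T
            new |= {(x, "isOfType", t) for (r2, t) in domain if r2 == r}
            # X R Y + R range T => Y isOfType T
            new |= {(y, "isOfType", t) for (r2, t) in range_ if r2 == r}
        # T1 subClassOf T2 + T2 subClassOf T3 => T1 subClassOf T3
        new |= {(t1, "subClassOf", t3)
                for (t1, t2) in sub for (t2b, t3) in sub if t2 == t2b}
        # X isOfType T1 + T1 subClassOf T2 => X isOfType T2
        new |= {(x, "isOfType", t2)
                for (x, t1) in typed for (t1b, t2) in sub if t1 == t1b}

        if new <= closure:  # fixpoint: nothing new was derived
            return closure
        closure |= new


facts = {
    ("Aspirin", "isOfType", "Painkiller"),
    ("Painkiller", "subClassOf", "Drug"),
    ("aspirin", "alleviates", "headache"),
    ("alleviates", "range", "symptom"),
}
inferred = rdfs_closure(facts)
```

The loop always terminates: the rules only recombine terms already in the graph and never invent new ones, which is one reason this kind of inference stays manageable in practice.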
OWL: things RDF Schema can’t do • equality • enumeration • number restrictions • Single-valued/multi-valued • Optional/required values • inverse, symmetric, transitive • boolean algebra • Union, complement • …
Layered language • OWL Lite: • Classification hierarchy • Simple constraints • OWL DL: • Maximal expressiveness • While maintaining decidability • Standard formalisation • OWL Full: • Very high expressiveness • Losing decidability • Non-standard formalisation • All syntactic freedom of RDF (self-modifying) [Figure: nested layers Lite ⊂ DL ⊂ Full, illustrating syntactic and semantic layering]
Language Layers • OWL Lite: • RDF Schema: (sub)classes, individuals, (sub)properties, domain, range • conjunction • (in)equality • cardinality 0/1 • datatypes • inverse, transitive, symmetric • someValuesFrom • allValuesFrom • OWL DL adds: • hasValue • Negation • Disjunction • Full cardinality • Enumerated types • OWL Full adds: • meta-classes etc.
Backward compatibility with RDF
    <owl:Class rdf:ID="City">
      <rdfs:subClassOf rdf:resource="#GeographicEntity"/>
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#ruler"/>
          <owl:allValuesFrom rdf:resource="#Mayor"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>
• OWL agents understand everything…
Backward compatibility with RDF
    <owl:Class rdf:ID="City">
      <rdfs:subClassOf rdf:resource="#GeographicEntity"/>
      <daml:subClassOf>
        <daml:Restriction>
          <daml:onProperty rdf:resource="#ruler"/>
          <daml:toClass rdf:resource="#Mayor"/>
        </daml:Restriction>
      </daml:subClassOf>
    </owl:Class>
• OWL agents understand everything… • …others still understand the most important aspects
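A minimal sketch of the backward-compatibility argument, using only Python's standard `xml.etree.ElementTree`. The hypothetical function `rdfs_view` plays an agent that knows RDF Schema but not OWL: from the OWL version of the City class it recovers the named `rdfs:subClassOf` link, and can only record the nested `owl:Restriction` as an unknown superclass. This is not a real RDF/XML parser; it handles just this one document shape, and turning `rdf:ID` into `#City` is a simplification of base-URI resolution.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="City">
    <rdfs:subClassOf rdf:resource="#GeographicEntity"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#ruler"/>
        <owl:allValuesFrom rdf:resource="#Mayor"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def rdfs_view(xml_text):
    """What an RDFS-only agent can extract: named rdfs:subClassOf links.
    A superclass it cannot interpret (the OWL restriction) stays opaque."""
    root = ET.fromstring(xml_text)
    triples = []
    for cls in root:  # each top-level class description
        subject = "#" + cls.get(f"{{{RDF}}}ID")  # simplified: no base URI
        for link in cls.findall(f"{{{RDFS}}}subClassOf"):
            target = link.get(f"{{{RDF}}}resource")
            triples.append((subject, "rdfs:subClassOf",
                            target if target else "<unknown>"))
    return triples


if __name__ == "__main__":
    for triple in rdfs_view(DOC):
        print(triple)
```

The RDFS-only agent still learns that every City is a GeographicEntity; it merely misses the constraint that a city's ruler must be a Mayor.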
OWL also has a formal semantics • Defines what other statements are implied by a given set of statements • Ensures mutual agreement on content (both minimal and maximal) between parties without further contact • Can be used for integrity/consistency checking • Hard to compute (and rarely/sometimes/always explosive in practice)
OWL semantics: minimal • vanGogh isOfType Impressionist + Impressionist subClassOf Painter ⇒ vanGogh isOfType Painter • vanGogh painter-of sunflowers + painter-of domain Painter ⇒ vanGogh isOfType Painter
OWL semantics: maximal • vanGogh isOfType Impressionist + Impressionist disjointFrom Cubist ⇒ NOT: vanGogh isOfType Cubist • painted-by has-cardinality 1 + sunflowers painted-by vanGogh + Picasso different-individual-from vanGogh ⇒ NOT: sunflowers painted-by Picasso
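A small consistency checker illustrating the maximal semantics, under simplifying assumptions: triples are Python tuples, the predicate names are the slide's (`disjointFrom`, `has-cardinality`, `different-individual-from`), and `has-cardinality 1` is read as "at most one value per subject". The hypothetical function `violations` reports only the two kinds of conflict shown on the slide; a real OWL reasoner does much more.

```python
def violations(triples):
    """Return human-readable conflicts for the two slide examples."""
    problems = []

    # Disjoint classes: no individual may be an instance of both.
    disjoint = {(a, b) for (a, p, b) in triples if p == "disjointFrom"}
    disjoint |= {(b, a) for (a, b) in disjoint}  # disjointness is symmetric
    types = {}
    for (x, p, t) in triples:
        if p == "isOfType":
            types.setdefault(x, set()).add(t)
    for x, ts in sorted(types.items()):
        for (a, b) in sorted(disjoint):
            if a < b and a in ts and b in ts:  # report each pair once
                problems.append(f"{x} is in disjoint classes {a} and {b}")

    # Cardinality 1: two values clash only if they are known to differ.
    single = {r for (r, p, n) in triples if p == "has-cardinality" and n == 1}
    different = {(a, b) for (a, p, b) in triples
                 if p == "different-individual-from"}
    different |= {(b, a) for (a, b) in different}
    for r in sorted(single):
        values = {}
        for (x, p, y) in triples:
            if p == r:
                values.setdefault(x, set()).add(y)
        for x, ys in sorted(values.items()):
            for y1 in sorted(ys):
                for y2 in sorted(ys):
                    if y1 < y2 and (y1, y2) in different:
                        problems.append(
                            f"{x} has two different values for {r}: {y1}, {y2}")
    return problems


kb = {
    ("vanGogh", "isOfType", "Impressionist"),
    ("Impressionist", "disjointFrom", "Cubist"),
    ("painted-by", "has-cardinality", 1),
    ("sunflowers", "painted-by", "vanGogh"),
    ("Picasso", "different-individual-from", "vanGogh"),
}
```

Because OWL makes no unique name assumption, two values of `painted-by` only clash when they are stated to be different; without that statement a reasoner would instead conclude that both names denote the same individual.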
Remember: Required are • standard vocabularies • a standard syntax • lots of resources with meta-data attached
Ontologies: real-life examples • handcrafted • music: CDnow (2410/5), MusicMoz (1073/7) • biomedical: SNOMED (200k), GO (15k), Emtree (45k+190k) • systems biology • ranging from lightweight (Yahoo, UNSPSC, Open Directory (400k)) to heavyweight (Cyc (300k)) • ranging from small (METAR) to large (UNSPSC)
Biomedical ontologies (a few...) • MeSH • Medical Subject Headings, National Library of Medicine • 22,000 descriptors • EMTREE • Commercial (Elsevier), drugs and diseases • 45,000 terms, 190,000 synonyms • UMLS • Integrates 100 different vocabularies • SNOMED • 200,000 concepts, College of American Pathologists • Gene Ontology • 15,000 terms in molecular biology • NCBI Cancer Ontology: • 17,000 classes (about 1M definitions)
Remember: Required are • standard vocabularies • a standard syntax • lots of resources with meta-data attached
Who makes the meta-data? • Don’t throw away what we already have: • Databases (Amazon.com) • Navigation structures • Meta-data in documents • Office, Acrobat, MP3, jpg • As a spin-off of what we already do • MIT Media Lab photo annotator • Automated analysis • Text, Images, Video
Linked Data/Semantic Web • Identification • Uniform Resource Identifier (URI) • Global identifier (NB: persistent!) • Looks like a URL; is often an Internationalized Resource Identifier (IRI) • Description • Resource Description Framework (RDF) • RDF Schema (RDFS) • Simple Knowledge Organization System (SKOS) • Web Ontology Language (OWL) • Querying • RDF triple stores • SPARQL Query Language
What does RDF look like? • The data model is a (directed) graph • Every data item is a ‘resource’ with a URI as its identifier • Every property is a binary relation: • a ‘triple’ • Between resources: <subjectURI, predicateURI, objectURI> • Between a resource and a ‘literal’: <subjectURI, predicateURI, “literal value”>
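To make the data model concrete, here is a sketch with two tiny hypothetical classes, `IRI` and `Literal`, so that a resource and a literal with the same text stay distinct. The example namespace and the triples about Amsterdam and Rotterdam are illustrative only. The hypothetical function `match` answers a single triple pattern with wildcards, roughly what one triple pattern in a SPARQL query does; it is not a SPARQL implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str  # a resource identifier


@dataclass(frozen=True)
class Literal:
    value: str  # a plain data value, never a resource


EX = "http://example.org/"  # hypothetical namespace

graph = {
    # <subjectURI, predicateURI, objectURI>: a relation between resources
    (IRI(EX + "Amsterdam"), IRI(EX + "locatedIn"), IRI(EX + "Netherlands")),
    (IRI(EX + "Rotterdam"), IRI(EX + "locatedIn"), IRI(EX + "Netherlands")),
    # <subjectURI, predicateURI, "literal value">: a resource and a literal
    (IRI(EX + "Amsterdam"), IRI(EX + "name"), Literal("Amsterdam")),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    like a variable in a single SPARQL triple pattern."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


if __name__ == "__main__":
    for s, _, _ in match(graph, p=IRI(EX + "locatedIn"), o=IRI(EX + "Netherlands")):
        print(s.value)
```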
Why is this a Web of data? • Globally unique identifiers • Reuse of identifiers in other datasets • For data: (two sources say something about ‘Amsterdam’) • For schema: (two sources each use the same concept ‘City’) • This reuse builds “links” between datasets
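A sketch of how identifier reuse creates links, assuming each dataset is a set of URI triples. All `example.org` URIs and the three dataset names are hypothetical; `shared_identifiers` simply intersects the URIs that two datasets use. Reuse of a data identifier (Amsterdam) and of a schema identifier (City) both show up as shared URIs.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
AMS = "http://example.org/id/Amsterdam"     # hypothetical data identifier
CITY = "http://example.org/schema/City"     # hypothetical schema identifier

# Three independently published (hypothetical) datasets.
tourism = {(AMS, "http://tourism.example.org/hasEvent",
            "http://tourism.example.org/event/42")}
statistics = {(AMS, RDF_TYPE, CITY),
              ("http://stats.example.org/city/7", RDF_TYPE, CITY)}
transport = {("http://transport.example.org/Utrecht", RDF_TYPE, CITY)}


def identifiers(triples):
    """All URIs used in subject, predicate or object position."""
    return {term for triple in triples for term in triple
            if term.startswith(("http://", "https://"))}


def shared_identifiers(a, b):
    """URIs reused by both datasets: each one links the two."""
    return identifiers(a) & identifiers(b)


if __name__ == "__main__":
    print(sorted(shared_identifiers(tourism, statistics)))   # data reuse
    print(sorted(shared_identifiers(statistics, transport)))  # schema reuse
```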
Linked Open Data cloud: already many billions of facts & rules [Figure: LOD cloud, covering any CD ever recorded (almost); life-science databases; basic facts on every country on the planet; hierarchical dictionaries (UK, FR, NL); common sense rules & facts (100,000s); scientific bibliographies; names of artists & art works (10,000s); geographic names (millions); an encyclopedia] • May ’09 estimate: > 4.2 billion triples + 140 million interlinks • It gets bigger every month
And remember: not just data • All of this: • Low-expressivity logic (RDF/RDFS) • That allows some inference: property inheritance, domain/range inference • Some of this: • Medium-expressivity logic (OWL) • That allows more inference: (in)equality, number restrictions, datatypes
Semantic Web News Quiz • Google • Reuters • New York Times • Microsoft • Zemanta • Obama Government • BBC (music, worldcup, wildlife) • BestBuy.com • Facebook