RDF

RDF Part II

Reification
• We can make statements about RDF statements themselves. This can be used to annotate information.
• In science, it is common to quote someone, or to provide provenance or date-stamp information, such as who conducted a certain experiment or simulation and when it was done.
• Explicit reification, which is used in database modeling, is also used in RDF to write more sophisticated statements about other statements using built-in vocabulary.
• This is done by first making a reified model of the statement, with type, subject, predicate, and object properties.
• We make a new resource to represent the entire statement.

RDF Reification vocabulary
• Reification is done in RDF by using the following qualified names to describe the statement: rdf:Statement (the class of resources that are statements), and the rdf:subject, rdf:predicate, and rdf:object properties.
• For example, to say that "Bill Fritz says that the Dinwoody Formation formed in the Triassic", we first assign a qualified name to the statement, such as q:n1, and then use it in the four reification triples (the reification quad):
  q:n1 rdf:type rdf:Statement ;
       rdf:subject strat:Dinwoody ;
       rdf:predicate strat:formed-in ;
       rdf:object time:Triassic .
  Person:BillFritz s:says q:n1 .
• That is, q:n1 is an rdf:Statement whose subject, predicate, and object are given by the three qualified names, and Dr. Fritz made this statement.
• Here the statement resource is named (q:n1); a blank node (bnode) could be used in its place.

[Figure: graph of the reified statement. The statement node has rdf:type rdf:Statement, rdf:subject strat:Dinwoody, rdf:predicate strat:formed-in, and rdf:object time:Triassic, and is attributed to Bill Fritz through "says".]

Alternative way to reify it
• Bill Fritz says that the Dinwoody Formation formed in the Triassic:
  Fritz says S
  S rdf:type rdf:Statement
  S rdf:subject DinwoodyFormation
  S rdf:predicate formedIn
  S rdf:object Triassic
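The reification steps above can be sketched in a few lines of Python. This is only an illustration, assuming a triple is a plain tuple of strings; the helper name `reify` and the joined spelling `Person:BillFritz` are assumptions made for the example, not part of any RDF library.

```python
# Minimal sketch: a triple is a (subject, predicate, object) tuple of strings.

def reify(statement_id, triple):
    """Return the four triples that describe `triple` as an rdf:Statement."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]

# The statement being talked about (names follow the slide's example).
claim = ("strat:Dinwoody", "strat:formed-in", "time:Triassic")

# Build the reification quad, then attribute the statement to a person.
graph = reify("q:n1", claim)
graph.append(("Person:BillFritz", "s:says", "q:n1"))  # hypothetical identifier

for triple in graph:
    print(" ".join(triple), ".")
```

Note that the graph holds only the description of the claim and its attribution; the claim triple itself is not asserted, which is what lets us record who said something without stating that it is true.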

SPARQL
• SPARQL (pronounced "sparkle") is the standard query language for RDF.
• SPARQL uses variables for the subject, predicate, or object of an RDF triple.
• Queries are built from parts called "triple patterns", in which a variable is written as a name preceded by a question mark (?), e.g., ?x.

SPARQL Queries, Example
• Which epoch precedes the Miocene? (Oligocene)
  ?x time:precedes time:Miocene .
• Which minerals are part-of granite? (quartz, feldspars, micas)
  ?y petr:part-of petr:Granite .
• Pollutants pollute which aquifer?
  hydro:Pollutant hydro:pollute ?z .
• The SPARQL engine needs the ontologies (in this case Time, Petrology, and Hydrogeology) to return answers to these queries.

Graph Pattern Query
• A graph pattern query (given within { } braces) is one with a set of triple patterns.
• For example, the following two questions:
  Which orogeny deformed (tect: namespace) the Tertiary system (strat: namespace)?
  The Zagros orogeny (tect: namespace) formed (struc: namespace) which mountain range?
• The set of two triple patterns is given in N3 as:
  { ?orogeny tect:deformed strat:TertiarySystem .
    tect:ZagrosOrogeny struc:formed ?MtRange . }
• For these queries to work, all the triple patterns must match nodes and edges of the ontologies in these namespaces!
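To make the idea of matching a graph pattern concrete, here is a small Python sketch that matches a list of triple patterns against a list of triples. It is not a SPARQL engine; the functions `match_pattern` and `query` and the three data triples are hypothetical, chosen only to mirror the two patterns on the slide.

```python
def match_pattern(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`; return None on mismatch."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def query(graph, patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions

# Hypothetical triples, for illustration only.
graph = [
    ("tect:ZagrosOrogeny", "tect:deformed", "strat:TertiarySystem"),
    ("tect:ZagrosOrogeny", "struc:formed", "struc:ZagrosMountains"),
    ("tect:AlpineOrogeny", "struc:formed", "struc:Alps"),
]
patterns = [
    ("?orogeny", "tect:deformed", "strat:TertiarySystem"),
    ("tect:ZagrosOrogeny", "struc:formed", "?MtRange"),
]
results = query(graph, patterns)
print(results)
```

Every pattern in the braces must match for a solution to be returned, which is why a triple missing from the data (or a wrong namespace) yields an empty result.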

Inferencing
• The Semantic Web languages allow explicit expression of the relationships between classes of objects, e.g.:
  strat:Triassic partOf strat:Mesozoic
• Compared to databases, which require programming to derive data from complex hierarchical structures, these languages allow smarter integration and connection of data, making the data easier to query and use.

What is Inferencing?
• The Semantic Web languages provide "inferencing", meaning that we can derive related, unstated information from a set of stated information.
• The mechanisms for inference are provided by language constructs such as rdfs:subClassOf, which make "inference-based semantics" possible.
• Through inferencing, we should be able to query a broader (general) term (e.g., FaultRock) and get information about the narrower (specialized) subclass terms that extend it, e.g., Mylonite subClassOf FaultRock.
• If we know that FaultRock isA Rock, Rock isA Solid, and Solid isNot Liquid, then we can infer that Mylonite isA Solid and Mylonite isNot Liquid.
• Note: isNot is modeled by saying that Liquid disjointWith Solid.

…
• RDF Schema (RDFS), on which the Web Ontology Language (OWL) builds, gives formal meaning to constructs such as rdfs:Class and rdfs:subClassOf.
• It follows from the language that if C is a subClassOf C’, then every member x of class C is also a member of class C’.
• For example, if the Idaho batholith is a Batholith (IdahoBatholith rdf:type Batholith), and Batholith rdfs:subClassOf IgneousBody, then IdahoBatholith rdf:type IgneousBody.
• So, if we search for igneous bodies in general, we may be offered information about the narrower Batholith term, and data about the Idaho batholith may be provided.
[Figure: class C, containing x, nested inside class C’, containing y.]

Type Propagation Rule
• The "type propagation rule" defines the meaning of the statement C subClassOf C’:
  IF   ?C rdfs:subClassOf ?C’ .
  AND  ?x rdf:type ?C .
  THEN ?x rdf:type ?C’ .
• That is, if C isA C’, and x is an instance of C, then x is an instance of C’.
[Figure: class C, containing x, nested inside class C’, containing y.]
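The IF/AND/THEN rule can be read as a small forward-chaining loop. The sketch below is one way to write it in Python, assuming triples are tuples of strings; `propagate_types` is a hypothetical helper, and the data reuse the Idaho batholith example without prefixes.

```python
def propagate_types(triples):
    """Apply the type propagation rule until no new rdf:type triple appears.

    Returns only the inferred (previously unasserted) triples.
    """
    known = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in known if p == "rdfs:subClassOf"]
        type_pairs = [(s, o) for s, p, o in known if p == "rdf:type"]
        for x, c in type_pairs:
            for sub, sup in subclass_pairs:
                candidate = (x, "rdf:type", sup)
                # IF ?C subClassOf ?C' AND ?x type ?C THEN ?x type ?C'
                if c == sub and candidate not in known:
                    known.add(candidate)
                    changed = True
    return known - set(triples)

asserted = [
    ("Batholith", "rdfs:subClassOf", "IgneousBody"),
    ("IdahoBatholith", "rdf:type", "Batholith"),
]
new_triples = propagate_types(asserted)
print(new_triples)
```

Repeating the loop until nothing changes is what carries a type up a chain of several subclass links, and it also covers multiple subclassing, since every superclass of C yields its own inferred triple.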

Example for inference
• Suppose all porphyritic textures are igneous textures, and all igneous textures are textures.
• Applying predicate logic:
  If x is a porphyritic texture, then x is an igneous texture: PorphyriticTexture(x) → IgneousTexture(x)
  If x is an igneous texture, then x is a texture: IgneousTexture(x) → Texture(x)
• Given the following two instances:
  IgneousTexture(IgneousTexture1) and PorphyriticTexture(PorphyriticTexture1)
• we infer the following unasserted facts:
  IgneousTexture(PorphyriticTexture1)
  Texture(IgneousTexture1)
  Texture(PorphyriticTexture1)
[Figure: hierarchy Texture, IgneousTexture, PorphyriticTexture, with the instances IgneousTexture1 and PorphyriticTexture1.]

Multiple Subclassing
• RDF, RDFS, and the Web Ontology Language (OWL), which builds on them, provide formal constraints on the meaning of their constructs, which makes inferencing from combinations of terms possible.
• As in object-oriented programming (OOP) languages, multiple subclassing (inheritance) exists in RDFS.
• If A subClassOf B and A subClassOf C, then an instance (individual) x of A is an instance of both B and C (which follows from the type propagation rule).
[Figure: class A, with instance x, as a subclass of both B and C; a second panel is labeled Brittle, Ductile, and Semibrittle.]

Benefits of Inference Rules
• Inference-based semantics is very powerful for integrating heterogeneous data provided by autonomous, distributed sources on the Web, and for making the distributed data useful.
• Inference rules make data constrained by the OWL constructs more useful because RDFS and OWL inferencing query engines, which know the inference rules, infer unasserted information (during a query) from the directly asserted triples in the RDF store.

Assume the triple store contains two asserted RDF triples:
  struc:FaultRock rdfs:subClassOf petr:Rock .
  struc:Mylonite rdf:type struc:FaultRock .
• Suppose the following SPARQL triple pattern queries the triple store to find things that are of type Rock, which is defined in the "petr" namespace:
  ?x rdf:type petr:Rock .
• No asserted triple has struc:Mylonite as subject with predicate rdf:type and object petr:Rock, so without inference the pattern matches nothing (struc:FaultRock is a subclass of petr:Rock, not an instance of it). An RDFS inference query engine nevertheless returns the inferred result:
  ?x = struc:Mylonite
[Figure: hierarchy Rock, FaultRock, Mylonite.]
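The Rock query can also be answered at query time, without storing any inferred triples. The following Python sketch assumes the two asserted triples from the slide; `subclasses_of` and `instances_of` are hypothetical helpers, not functions of a real query engine.

```python
def subclasses_of(cls, triples):
    """Return `cls` plus every class that reaches it through rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def instances_of(cls, triples, infer=True):
    """Answer the pattern `?x rdf:type cls`, with or without RDFS inference."""
    classes = subclasses_of(cls, triples) if infer else {cls}
    return sorted({s for s, p, o in triples if p == "rdf:type" and o in classes})

store = [
    ("struc:FaultRock", "rdfs:subClassOf", "petr:Rock"),
    ("struc:Mylonite", "rdf:type", "struc:FaultRock"),
]

asserted_only = instances_of("petr:Rock", store, infer=False)
with_inference = instances_of("petr:Rock", store, infer=True)
print(asserted_only, with_inference)
```

Here the query is widened to the subclasses of petr:Rock before matching, so struc:Mylonite is found even though the store never gains a new triple.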

Inferred Triples
• Inference engines, applying their set of inference rules, return unasserted, inferred triples from asserted triples.
• The inferred triples may or may not be saved in the triple store; they may be generated only at query time.

Example
• The following diagram shows the hierarchy of the pyroxene minerals in the min:Mineralogy ontology.
• This means that Diopside isA Pyroxene, Pyroxene isA Silicate, and Silicate isA Mineral.

Inferred Triples
• Given the following asserted triples:
  min:Diopside rdf:type min:Pyroxene .
  min:Pyroxene rdfs:subClassOf min:Silicate .
  min:Silicate rdfs:subClassOf min:Mineral .
• we can derive the following inferred triples, the first from the transitivity of rdfs:subClassOf and the other two from the type propagation rule:
  min:Pyroxene rdfs:subClassOf min:Mineral .
  min:Diopside rdf:type min:Silicate .
  min:Diopside rdf:type min:Mineral .

RDF and Relational Database
• Every statement in RDF is like the value in a cell of a database table, which requires three values for its complete representation:
  a row identifier (subject, s)
  a column identifier (predicate, p)
  the value in the table cell (object, o)
• Note: for a 3x3 table, we have 9 triples!
• Recall that we refer to the "subject-predicate-object" statement as a "triple".

Triples: Building blocks for RDF
• Subject (S) is the thing about which we are making the statement. In this case it is the record, i.e., the row.
• Predicate (P) is the property of the subject entity in the row. In this case it is the column or field.
• Object (O) is the value of the property, i.e., the value in the cell.

Data Federation
• RDF is designed for federation of data of any kind (database, spreadsheet, XML) originating from multiple sources.
• These data can be converted into a set of triples and put in the RDF data store (federated graph), ready to be queried.
• In the RDF triple "Course instructor Babaie", Course is the subject, instructor is the predicate, and Babaie is the object (the value for the instructor).
[Figure: Subject (Course) linked by Predicate (instructor) to Object (Babaie).]

Directed Graph
• An RDF store commonly has more than one triple referring to the same subject (S), i.e., one s, many o's.
• This translates to one row (i.e., record) of a relational database table with multiple fields (columns).
• This leads to the "directed graph", which shows triples as "edges" (labeled by predicates) radiating from one subject "node" to different object nodes.
• The picture is shown for one row only!
[Figure: subject node s with edges p1, p2, p3 to object nodes o1, o2, o3.]
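A short Python sketch shows how triples that share a subject fan out into labeled edges, and how those edges read back as one table row. The names s, p1 to p3, and o1 to o3 come from the slide's figure; the function `outgoing_edges` is a hypothetical helper.

```python
from collections import defaultdict

def outgoing_edges(triples):
    """Group triples by subject: each subject maps to its (predicate, object) edges."""
    edges = defaultdict(list)
    for subject, predicate, obj in triples:
        edges[subject].append((predicate, obj))
    return dict(edges)

triples = [
    ("s", "p1", "o1"),
    ("s", "p2", "o2"),
    ("s", "p3", "o3"),
]

edges = outgoing_edges(triples)
# The edges leaving one subject node correspond to one table row:
# column name (predicate) -> cell value (object).
row = dict(edges["s"])
print(edges)
print(row)
```

The dictionary view assumes each predicate appears once per subject, as in a table row; a graph is more general and may hold several objects for the same subject and predicate.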

Sample Table
[Figure: Sample table with columns p1, p2, p3 and rows S1, S2. The directed graph is shown only for sample N235; its labels are SampleID N235, lithology, basalt, type, powder, purpose, K-Ar dating, and Investigator takes.]

URI (Uniform Resource Identifier)
• Merging a distributed group of directed graphs requires matching the nodes in each graph.
• Even if nodes in different graphs have the same name, it is not guaranteed that the nodes refer to the same resource!
• To make matching of the nodes possible, we use the URI (Uniform Resource Identifier), which is a superset of the URL (every URL is a URI, but not the other way around).
• A URI is a global identifier for a resource (it can carry information such as protocol, server name, port number, and file name), which is required for global networking.
• A URI refers to either a Web name or a location, whereas a URL refers only to a Web location.

URI Prefix
• Nodes from two graphs can be merged if they have the same URI.
• We use a prefix to stand for the long URI strings, e.g., "geochem" and "struc" can be the prefixes for the Geochemistry and Structural Geology ontologies, which may have the URIs:
  http://www.usgs.org/ontologies/Geochemistry.owl#
  http://www.usgs.org/ontologies/StructuralGeology.owl#
• If the Geochemistry ontology has a class called Analysis and the Structural Geology ontology has a class called Foliation, we designate them as:
  geochem:Analysis
  struc:Foliation
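Prefix expansion and merging by URI can be sketched as follows. The two namespace URIs are the ones given on the slide; the second prefix table (gc), the property names appliesTo and uses, and the MassSpectrometer node are made up for illustration.

```python
def expand(qname, prefixes):
    """Replace the prefix of a qualified name with its full namespace URI."""
    prefix, _, local_name = qname.partition(":")
    return prefixes[prefix] + local_name

def expand_graph(triples, prefixes):
    """Expand every term of every triple, returning a set of full-URI triples."""
    return {tuple(expand(term, prefixes) for term in triple) for triple in triples}

# Two sources that use different prefixes for the same Geochemistry namespace.
prefixes_a = {
    "geochem": "http://www.usgs.org/ontologies/Geochemistry.owl#",
    "struc": "http://www.usgs.org/ontologies/StructuralGeology.owl#",
}
prefixes_b = {"gc": "http://www.usgs.org/ontologies/Geochemistry.owl#"}

graph_a = [("geochem:Analysis", "geochem:appliesTo", "struc:Foliation")]  # hypothetical
graph_b = [("gc:Analysis", "gc:uses", "gc:MassSpectrometer")]             # hypothetical

# Merging is a set union once every node is identified by its full URI.
merged = expand_graph(graph_a, prefixes_a) | expand_graph(graph_b, prefixes_b)
subjects = {subject for subject, _, _ in merged}
print(subjects)
```

Because both sources resolve Analysis to the same URI, the merged graph has a single Analysis node carrying the edges from both graphs, regardless of which prefix each source chose.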

Default Namespace
• If there is only one (default) namespace, we write a colon followed by the class name (e.g., :Fracture).
• OWL, RDF, RDFS, and XSD each have their own standard namespace.
• Thus, rdf:type is a typing construct in the rdf namespace. Here are some more examples:
  struc:Fold rdf:type struc:Structure
  geochem:oxidize rdf:type rdf:Property

Relational database tables and RDF
• Each row in a relational table represents a single record.
• Each record maps to an individual entity.
• This means that each row should have a unique URI, which in the database is represented by the unique identifier (the ID column, i.e., the primary key).

Relational Database to RDF Graph
• The best practice is to design a URI for the table, with a prefix for its namespace:
  xmlns:geochem="http://www.gsi.ir/ontologies/geochemistry.owl#"
  so that the table's URI, http://www.gsi.ir/ontologies/geochemistry.owl#Sample, is written geochem:Sample.
• We identify each row by concatenating the table name (Sample) with the ID of the row, for example, geochem:Sample1, geochem:Sample2, etc.
• To make the fields unique as well, we concatenate the table name (Sample) with the column name, e.g., geochem:Sample_lithology, geochem:Sample_type, geochem:Sample_purpose.

Example for RDB to RDF
• Notice that, during conversion of a relational table to RDF, each cell in the table converts into one RDF triple.
• The table in the next slide has 7 rows and 5 columns, which leads to 35 triples.
• Note: Only triples for two samples are shown!

[Table: geochem:Sample, with columns geochem:Sample_number, geochem:Sample_location, geochem:Sample_analysis, and geochem:Sample_lithology, and rows geochem:Sample1 through geochem:Sample7 (cell values not shown).]

Relational database (RDB) to RDF
• Fields (columns) of the table become properties (predicates): geochem:Sample_number, geochem:Sample_location, etc.
• Each row provides the subject, for example, geochem:Sample1, geochem:Sample2, etc.
• The following table shows part of the RDF graph of the previous Sample table in the Geochemistry database:

RDF triples for the Sample table in the Geochemistry database (only 2 samples shown!)

In this case, the objects are not class (object) resources; they are literal values (i.e., strings). The type of each individual (i.e., each row) is the table (in this case, Sample). These types are also given in the RDF graph.
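The table-to-RDF conversion described in these slides can be sketched in Python as follows. The naming scheme (row URI from table name plus ID, property from table name plus column name, one rdf:type triple per row) follows the slides; the function `table_to_triples`, the column name id, and all cell values are assumptions made for the example.

```python
PREFIX = "geochem:"  # stands for the ontology namespace used on the slides

def table_to_triples(table_name, id_column, rows):
    """Convert each row to one rdf:type triple plus one triple per cell."""
    triples = []
    for row in rows:
        # Row URI: table name concatenated with the row's ID.
        subject = f"{PREFIX}{table_name}{row[id_column]}"
        triples.append((subject, "rdf:type", f"{PREFIX}{table_name}"))
        for column, value in row.items():
            # Property URI: table name concatenated with the column name.
            predicate = f"{PREFIX}{table_name}_{column}"
            triples.append((subject, predicate, f'"{value}"'))  # literal value
    return triples

# Hypothetical rows with 5 columns each.
rows = [
    {"id": 1, "number": "N235", "location": "Site A",
     "analysis": "K-Ar dating", "lithology": "basalt"},
    {"id": 2, "number": "N236", "location": "Site B",
     "analysis": "XRF", "lithology": "granite"},
]

triples = table_to_triples("Sample", "id", rows)
cell_triples = [t for t in triples if t[1] != "rdf:type"]
for triple in triples:
    print(" ".join(triple), ".")
```

With 2 rows and 5 columns the sketch yields 10 cell triples plus 2 type triples; by the same count, a table of 7 rows and 5 columns gives 35 cell triples.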
