Logics for Data and Knowledge Representation. SPARQL Protocol and RDF Query Language (SPARQL). Feroz Farazi.
SPARQL
• A language for expressing queries that retrieve information from data represented in RDF [SPARQL Spec.]
• A query language with the capability to search for graph patterns [SPARQL Spec.]
• SPARQL queries often contain
  • a basic graph pattern: a set of triple patterns, each made of a subject, a predicate and an object
  • RDF terms, any of which may be replaced by variables
• Result of the query
  • a subgraph of the RDF data graph that matches the pattern; RDF terms from that subgraph are bound to the variables

Terminologies
• RDF Terms: let I be the set of all IRIs, L the set of all RDF literals and B the set of all blank nodes in an RDF graph. The set of all RDF terms of the graph is T = I ∪ L ∪ B
• RDF Dataset: D = {G, (I1, G1), (I2, G2), …, (Ik, Gk)}, where G is the default graph and each (Ii, Gi), i = 1 to k, is a named graph
  • An RDF dataset always contains a default graph, which does not have a name
  • It contains zero or more named graphs
  • Each named graph is identified by an IRI

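The dataset definition above can be pictured with a small data structure. This is only a sketch in plain Python, not an RDF library: the class name `RDFDataset`, its fields and the IRI `<http://example.org/g1>` are hypothetical names chosen for illustration, and triples are modelled as tuples of strings.

```python
from dataclasses import dataclass, field

@dataclass
class RDFDataset:
    # G: the single default graph, which has no name
    default_graph: set = field(default_factory=set)
    # (Ii, Gi): zero or more named graphs, keyed by their IRI
    named_graphs: dict = field(default_factory=dict)

    def graph(self, iri=None):
        """Return the default graph, or the named graph identified by iri."""
        return self.default_graph if iri is None else self.named_graphs[iri]

ds = RDFDataset()
ds.default_graph.add((":paper1", ":title", '"Semantic Matching"'))
# Hypothetical graph IRI, used only for this example
ds.named_graphs["<http://example.org/g1>"] = {("_:a", ":name", '"Tim Berners-Lee"')}

print(len(ds.graph()), len(ds.named_graphs))
```

A dataset with no named graphs is still valid in this model, which mirrors the "zero or more named graphs" condition.
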
Terminologies
• Query Variable: a query variable v ∈ V, where V is an infinite set of variables and V ∩ T = ∅
• Triple Pattern: a triple pattern P ∈ (T ∪ V) × (I ∪ V) × (T ∪ V)
• Solution Mapping: a partial function M: V → T, where V is the set of query variables and T is the set of all RDF terms
• Solution Sequence: a list of solutions, possibly unordered. It may contain zero, one or more solutions.

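To make the triple pattern and solution mapping definitions concrete, here is a minimal sketch in plain Python. It assumes a toy encoding that is not part of the slides: variables are strings starting with "?", every other string stands for an RDF term, and a solution mapping is a dict from variables to terms. The helper names `is_var` and `match_triple` are hypothetical.

```python
def is_var(x):
    # Toy convention: variables are strings that start with "?"
    return isinstance(x, str) and x.startswith("?")

def match_triple(pattern, triple, mapping=None):
    """Return an extended solution mapping, or None if the triple does not match."""
    mapping = dict(mapping or {})
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding
            if mapping.setdefault(p, t) != t:
                return None
        elif p != t:
            # A constant term in the pattern must equal the term in the data
            return None
    return mapping

triple = (":paper1", ":title", '"Semantic Matching"')
print(match_triple((":paper1", ":title", "?title"), triple))
print(match_triple((":paper1", ":creator", "?title"), triple))  # None
```

The mapping is partial: only variables that occur in the pattern receive a binding.
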
Terminologies
• Solution Sequence Modifiers: (i) Order By (ii) Projection (iii) Distinct (iv) Reduced (v) Offset (vi) Limit
• Others: IRI (Internationalized Resource Identifier), lexical form, language tag (e.g., en, it), datatype IRI (e.g., xsd:boolean)
• IRIs and URIs
  • URIs are limited to a subset of the ASCII character set
  • IRIs can also include Unicode characters (Universal Character Set)
• ASK: tests whether a query pattern has a solution. It replies with yes or no.

Query
• SELECT: this query form returns the RDF terms bound to the variables
• Dataset:
  :paper1 :creator "Fausto Giunchiglia" .
  Query Expression:
  SELECT ?author WHERE { :paper1 :creator ?author . }
  Query Result:
  author
  "Fausto Giunchiglia"
• Dataset:
  :paper1 :title "Semantic Matching" .
  Query Expression:
  SELECT ?title WHERE { :paper1 :title ?title . }
  Query Result:
  title
  "Semantic Matching"

Query: Multiple Matches
• Dataset:
  _:a :name "Tim Berners-Lee" .
  _:a :homepage <http://www.w3.org/People/Berners-Lee/> .
  _:b :name "Fausto Giunchiglia" .
  _:b :homepage <http://disi.unitn.it/~fausto/> .
  Query Expression:
  SELECT ?name ?homepage WHERE { ?x :name ?name . ?x :homepage ?homepage }
  Query Result:
  name                    homepage
  "Tim Berners-Lee"       <http://www.w3.org/People/Berners-Lee/>
  "Fausto Giunchiglia"    <http://disi.unitn.it/~fausto/>

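The two triple patterns in this query share the variable ?x, so their solutions have to agree on it. A rough sketch of that join step in plain Python follows; it assumes the solutions of each pattern have already been computed and are given as dicts, and the helper names `compatible` and `join` are hypothetical, not taken from any SPARQL library.

```python
def compatible(m1, m2):
    # Two mappings are compatible if they agree on every shared variable
    return all(m2[v] == t for v, t in m1.items() if v in m2)

def join(left, right):
    # Merge every compatible pair of solution mappings
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]

# Solutions of "?x :name ?name" and "?x :homepage ?homepage" on the slide's dataset
names = [
    {"?x": "_:a", "?name": "Tim Berners-Lee"},
    {"?x": "_:b", "?name": "Fausto Giunchiglia"},
]
homepages = [
    {"?x": "_:a", "?homepage": "<http://www.w3.org/People/Berners-Lee/>"},
    {"?x": "_:b", "?homepage": "<http://disi.unitn.it/~fausto/>"},
]

rows = join(names, homepages)
# Projection: keep only the variables named in the SELECT clause
for r in rows:
    print(r["?name"], r["?homepage"])
```

Pairs that disagree on ?x, such as _:a's name with _:b's homepage, are dropped, which is why the result has two rows rather than four.
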
Query: RDF Literals Matching
• Dataset:
  :x :name "Tim Berners-Lee"@en .
  :y :name "Fausto Giunchiglia"@en .
  Query Expression 1:
  SELECT ?u WHERE { ?u :name "Tim Berners-Lee" }
  Query Result:
  u
  (no rows)
  This query has 0 solutions because, without the language tag, the literal in the query does not match the literal in the dataset.
  Query Expression 2:
  SELECT ?u WHERE { ?u :name "Tim Berners-Lee"@en }
  Query Result:
  u
  :x
  This query has 1 solution because, with the language tag included, ?u is bound to :x.

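The reason the untagged literal fails to match can be sketched by treating a literal as a pair of lexical form and language tag. The `Literal` type and the `subjects_with_name` helper below are hypothetical illustrations in plain Python, and the sketch ignores datatypes and the case-insensitivity of language tags.

```python
from typing import NamedTuple, Optional

class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None  # None models a literal without a language tag

# The :name values from the slide's dataset
data = {
    ":x": Literal("Tim Berners-Lee", "en"),
    ":y": Literal("Fausto Giunchiglia", "en"),
}

def subjects_with_name(wanted):
    # A literal matches only if both lexical form and language tag are equal
    return [s for s, name in data.items() if name == wanted]

print(subjects_with_name(Literal("Tim Berners-Lee")))        # []
print(subjects_with_name(Literal("Tim Berners-Lee", "en")))  # [':x']
```
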
Building RDF Graphs
• CONSTRUCT: this query form returns an RDF graph
• Dataset:
  _:a :creator "Tim Berners-Lee" .
  _:b :creator "Fausto Giunchiglia" .
  Query Expression:
  CONSTRUCT { ?x :name ?name } WHERE { ?x :creator ?name }
  Query Result:
  _:c :name "Tim Berners-Lee" .
  _:d :name "Fausto Giunchiglia" .
• In this dataset, :creator stands for the Dublin Core (dc) creator metadata
• In the query, :name stands for the FOAF name metadata
• The query builds a graph with the FOAF name attribute, which was not available in the source dataset

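One way to picture CONSTRUCT is as template instantiation: each solution of the WHERE clause is substituted into the template triples. The sketch below is a simplified illustration in plain Python, with the hypothetical helper `construct` and the assumption that variables are strings starting with "?". It keeps the original blank node labels, whereas a real engine may relabel them, as the _:c and _:d in the slide's result suggest.

```python
def construct(template, solutions):
    """Instantiate every template triple with every solution mapping."""
    graph = set()
    for sol in solutions:
        for triple in template:
            inst = tuple(sol.get(term, term) for term in triple)
            # Skip triples that still contain an unbound variable
            if not any(term.startswith("?") for term in inst):
                graph.add(inst)
    return graph

# Solutions of WHERE { ?x :creator ?name } on the slide's dataset
solutions = [
    {"?x": "_:a", "?name": '"Tim Berners-Lee"'},
    {"?x": "_:b", "?name": '"Fausto Giunchiglia"'},
]
graph = construct([("?x", ":name", "?name")], solutions)
for s, p, o in sorted(graph):
    print(s, p, o, ".")
```
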
RDF Term Restrictions
• FILTER: solutions are restricted to those whose RDF terms satisfy the filter expression
• Dataset:
  _:a :creator "Tim Berners-Lee" .
  _:a :age 52 .
  _:b :creator "Fausto Giunchiglia" .
  _:b :age 53 .
  Query Expression:
  SELECT ?author WHERE { ?x :creator ?author . FILTER regex(?author, "Tim") }
  Query Result:
  author
  "Tim Berners-Lee"
• The above query can be made case-insensitive by adding the "i" flag to the filter: FILTER regex(?author, "tim", "i")
  Query Expression:
  SELECT ?author ?age WHERE { ?x :creator ?author . ?x :age ?age . FILTER (?age > 52) }
  Query Result:
  author                  age
  "Fausto Giunchiglia"    53

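FILTER can be read as a predicate applied to each solution. The following sketch mimics the slide's filters on already-computed solutions using Python's `re` module as a stand-in; SPARQL's regex follows XPath/XQuery regular expression syntax, so the two are not identical in general.

```python
import re

# Solutions of { ?x :creator ?author . ?x :age ?age } on the slide's dataset
solutions = [
    {"?author": "Tim Berners-Lee", "?age": 52},
    {"?author": "Fausto Giunchiglia", "?age": 53},
]

# FILTER (?age > 52)
older = [s for s in solutions if s["?age"] > 52]

# FILTER regex(?author, "Tim")
tim = [s for s in solutions if re.search("Tim", s["?author"])]

# FILTER regex(?author, "tim", "i")
tim_i = [s for s in solutions if re.search("tim", s["?author"], re.IGNORECASE)]

print([s["?author"] for s in older])  # ['Fausto Giunchiglia']
print([s["?author"] for s in tim_i])  # ['Tim Berners-Lee']
```
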
Querying Optional Pattern
• OPTIONAL: binds variables to RDF terms and includes them in the solution when the matching data is available; solutions without that data are still returned
• Dataset:
  _:a :creator "Tim Berners-Lee" .
  _:a :age 52 .
  _:a :homepage <http://www.w3.org/People/Berners-Lee/> .
  _:b :creator "Fausto Giunchiglia" .
  _:b :age 53 .
  Query Expression:
  SELECT ?author ?homepage WHERE { ?x :creator ?author . OPTIONAL { ?x :homepage ?homepage } }
  Query Result:
  author                  homepage
  "Tim Berners-Lee"       <http://www.w3.org/People/Berners-Lee/>
  "Fausto Giunchiglia"
• It is a left-associative operator
• Why do we need it? Not all entities have the same set of attributes

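OPTIONAL behaves like a left join over solution mappings: a solution from the required part is extended when the optional part matches and kept unchanged otherwise. A minimal sketch in plain Python, with the hypothetical helper name `left_join` and without filter conditions inside the optional part:

```python
def left_join(left, right):
    """Extend each left solution with compatible right solutions, else keep it as is."""
    out = []
    for m1 in left:
        extended = [
            {**m1, **m2}
            for m2 in right
            # Compatible: equal values on all shared variables
            if all(m1[v] == m2[v] for v in m1.keys() & m2.keys())
        ]
        out.extend(extended or [m1])
    return out

# Solutions of the required pattern and of the optional pattern
authors = [
    {"?x": "_:a", "?author": "Tim Berners-Lee"},
    {"?x": "_:b", "?author": "Fausto Giunchiglia"},
]
homepages = [{"?x": "_:a", "?homepage": "<http://www.w3.org/People/Berners-Lee/>"}]

rows = left_join(authors, homepages)
for r in rows:
    # An unbound ?homepage is shown as an empty cell
    print(r["?author"], r.get("?homepage", ""))
```
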
ORDER BY Clause
• ORDER BY: a facility to order a solution sequence
• Dataset:
  _:a :creator "Tim Berners-Lee" .
  _:a :age 52 .
  _:b :creator "Fausto Giunchiglia" .
  _:b :age 53 .
  Query Expression:
  SELECT ?author WHERE { ?x :creator ?author ; :age ?age } ORDER BY ?author DESC(?age)
  Query Result:
  author
  "Fausto Giunchiglia"
  "Tim Berners-Lee"
• The solutions are sorted by ?author in ascending order (the default); ties are broken by ?age in descending order

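The effect of ORDER BY ?author DESC(?age) can be approximated with a composite sort key. This sketch assumes plain Python string and integer ordering, which only roughly corresponds to SPARQL's ordering rules for RDF terms.

```python
rows = [
    {"?author": "Tim Berners-Lee", "?age": 52},
    {"?author": "Fausto Giunchiglia", "?age": 53},
]

# Ascending by author, then descending by age (negated to reverse the order)
ordered = sorted(rows, key=lambda r: (r["?author"], -r["?age"]))

print([r["?author"] for r in ordered])  # ['Fausto Giunchiglia', 'Tim Berners-Lee']
```
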
DISTINCT and REDUCED Modifiers
• DISTINCT: removes duplicates from a solution sequence
  Dataset:
  _:b :creator "Fausto Giunchiglia" .
  _:b :age 53 .
  _:c :creator "Fausto Giunchiglia" .
  _:c :age 37 .
  Query Expression:
  SELECT DISTINCT ?creator WHERE { ?x :creator ?creator }
  Query Result:
  creator
  "Fausto Giunchiglia"
• REDUCED: permits duplicates to be removed, without requiring it
  Query Expression:
  SELECT REDUCED ?creator WHERE { ?x :creator ?creator }
  Each solution appears in the result at least once and at most as many times as it would appear without removing duplicates

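The difference between the two modifiers can be sketched on a projected solution sequence. In this plain Python illustration each solution is reduced to a single string for brevity, and `is_valid_reduced` is a hypothetical helper that checks the cardinality condition stated above rather than anything an engine exposes.

```python
from collections import Counter

# ?creator values before any duplicate removal
rows = ["Fausto Giunchiglia", "Fausto Giunchiglia"]

# DISTINCT: every solution exactly once, first occurrence order preserved
distinct = list(dict.fromkeys(rows))

def is_valid_reduced(original, reduced):
    """Each solution must occur at least once and no more often than originally."""
    before, after = Counter(original), Counter(reduced)
    return before.keys() == after.keys() and all(
        1 <= after[s] <= before[s] for s in before
    )

print(distinct)                          # ['Fausto Giunchiglia']
print(is_valid_reduced(rows, rows))      # True: removing nothing is allowed
print(is_valid_reduced(rows, distinct))  # True: removing all duplicates is allowed
```
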
OFFSET and LIMIT Clauses
• OFFSET: shows the elements of the solution sequence starting after a specified number of solutions. If the number is zero, there is no effect.
  Dataset:
  _:b :creator "Fausto Giunchiglia" .
  _:b :age 53 .
  _:c :creator "Tim Berners-Lee" .
  _:c :age 52 .
  Query Expression:
  SELECT ?author WHERE { ?x :creator ?author } ORDER BY ?author OFFSET 1
  Query Result:
  author
  "Tim Berners-Lee"
• LIMIT: puts an upper bound on the number of solutions returned
  Query Expression:
  SELECT ?author
  WHERE { ?x :creator ?author }
  ORDER BY ?author
  LIMIT 1
  OFFSET 1
  Query Result:
  author
  "Tim Berners-Lee"

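On an ordered solution sequence, OFFSET and LIMIT amount to taking a slice. A small sketch in plain Python, where `slice_solutions` is a hypothetical helper and the solutions are reduced to their ?author value:

```python
def slice_solutions(rows, offset=0, limit=None):
    """Skip the first `offset` solutions, then return at most `limit` of them."""
    end = None if limit is None else offset + limit
    return rows[offset:end]

# ORDER BY ?author on the slide's dataset
ordered = sorted(["Tim Berners-Lee", "Fausto Giunchiglia"])

print(slice_solutions(ordered, offset=1))           # ['Tim Berners-Lee']
print(slice_solutions(ordered, offset=1, limit=1))  # ['Tim Berners-Lee']
print(slice_solutions(ordered, offset=0))           # unchanged sequence
```

Without ORDER BY the sequence order is not guaranteed, so the slice a query returns may differ between runs or engines.
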
Relational vs RDF queries [D. Allemang and J. Hendler, 2008]
• Relational queries consist of (among others):
  • Relational algebra of joins
  • Foreign key references
• RDF queries consist of (among others):
  • (Logical) statements in triple form
  • Unification variables, which are used to connect graph patterns
• A relational query:
  • Produces a new database table that is a combination (partial or complete) of two or more input tables
• An RDF query:
  • Produces a subset of the input RDF graph
  • Simplifies some issues of table-based queries; for example, there is no need for a subquery construct

References
• SPARQL Spec.: SPARQL Query Language for RDF. W3C Recommendation, 2008.
• D. Allemang and J. Hendler. Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL. Morgan Kaufmann Elsevier, Amsterdam, NL, 2008.
• R. de Virgilio, F. Giunchiglia and L. Tanca. Semantic Web Information Management: A Model-Based Perspective. Springer, 2009.