Semantic Web Query Processing with Relational Databases

Artem Chebotko, artem@cs.wayne.edu, Department of Computer Science, Wayne State University

Outline • The Semantic Web • RDF • SPARQL • Relational Storage of RDF data • SPARQL-to-SQL Translation • Relational Nested Optional Join

My Web page as seen by a Human

My Web page as seen by a Computer

My Web page with Semantics
    <foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">
      <foaf:name>Artem Chebotko</foaf:name>
      <foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />
      <foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />
      <foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu" />
    </foaf:Person>

The Semantic Web • A Web of data (vs. a Web of documents) • … machine-processable/readable data • Framework for integration and combination of data from various sources • Data reuse across application, organization, and community boundaries

The Semantic Web “Stack”

RDF • RDF (Resource Description Framework) provides a common framework for representing resources and the relations among them. Anything can be a resource (e.g., a person, a file, etc.). • RDF provides a data model and a syntax
    <foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">
      <foaf:name>Artem Chebotko</foaf:name>
      <foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />
      <foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />
      <foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu" />
    </foaf:Person>

RDF Model • An RDF statement is a triple that consists of a subject, a predicate, and an object. • Namespace: foaf="http://xmlns.com/foaf/0.1/"
    <foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">
      <foaf:name>Artem Chebotko</foaf:name>
      <foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />
      <foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />
      <foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu" />
    </foaf:Person>

RDF Model • RDF's graph model: RDF models statements as nodes and edges in a graph.
    http://www.cs.wayne.edu/~artem/ID --foaf:name--> "Artem Chebotko"
    http://www.cs.wayne.edu/~artem/ID --foaf:workplaceHomepage--> http://www.cs.wayne.edu
    http://www.cs.wayne.edu/~artem/ID --foaf:homepage--> http://www.cs.wayne.edu/~artem
    http://www.cs.wayne.edu/~artem/ID --foaf:img--> http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg
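The triple and graph views above can be made concrete with a small sketch. It assumes a hypothetical in-memory representation (a Python list of subject, predicate, object tuples) and a hypothetical helper `objects`; it follows the four edges drawn in the graph and is not tied to any RDF library.

```python
# Hypothetical representation: one (subject, predicate, object) tuple per RDF statement.
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.cs.wayne.edu/~artem/ID"

triples = [
    (ME, FOAF + "name", "Artem Chebotko"),
    (ME, FOAF + "homepage", "http://www.cs.wayne.edu/~artem"),
    (ME, FOAF + "img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
    (ME, FOAF + "workplaceHomepage", "http://www.cs.wayne.edu"),
]


def objects(graph, subject, predicate):
    """Return every object reached from `subject` along an edge labelled `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


homepages = objects(triples, ME, FOAF + "homepage")
print(homepages)
```

Each tuple is one edge of the graph: the subject and object are nodes, and the predicate is the edge label.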

SPARQL • SPARQL is an RDF query language • Graph pattern matching • Basic graph patterns, optional graph patterns, etc.
Query 1: Find the homepage URL of Artem Chebotko
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?url
    FROM <my-foaf-data.rdf>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }
Result 1: ?url is bound to the value "http://www.cs.wayne.edu/~artem"

SPARQL Query 2: Find both the homepage and weblog of Artem Chebotko
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?url ?log
    FROM <my-foaf-data.rdf>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      ?someone foaf:weblog ?log .
    }
Result 2: no solution is returned (neither ?url nor ?log is bound), because every triple pattern must match and the data contains no foaf:weblog statement

SPARQL Query 3: Find (1) the homepage of Artem Chebotko and (2) his weblog if this information is available
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?url ?log
    FROM <my-foaf-data.rdf>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL { ?someone foaf:weblog ?log . }
    }
Result 3: ?url is bound to "http://www.cs.wayne.edu/~artem" and ?log is unbound

SPARQL • Basic semantics of OPTIONAL patterns • The evaluation of an OPTIONAL clause is not required to succeed; if it fails, the variables that it would have bound stay unbound and no value is returned for them in the SELECT clause. • Semantics of shared variables • In general, a shared variable must be bound to the same value wherever it occurs. A variable can be shared across subject, predicate, and object positions, in any combination. • More complicated semantics follows …

SPARQL • Semantics of parallel OPTIONAL patterns • While the failure of the evaluation of an OPTIONAL clause does not block the evaluation of a following parallel OPTIONAL clause, the success of the evaluation of an OPTIONAL clause obligates the same variables in the following parallel OPTIONAL clauses to be bound to the same values.

SPARQL Query 4: Find (1) the homepage of Artem Chebotko, (2) his weblog if this information is available, and (3) his workplace homepage if this information is available
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?url ?log ?work
    FROM <my-foaf-data.rdf>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL { ?someone foaf:weblog ?log . }
      OPTIONAL { ?someone foaf:workplaceHomepage ?work . }
    }
Result 4: What if the second clause were OPTIONAL { ?someone foaf:workplaceHomepage ?log . } instead? …

SPARQL • Semantics of nested OPTIONAL patterns • Before an OPTIONAL clause is evaluated, all containing basic graph patterns or OPTIONAL clauses must have succeeded.

SPARQL Query 5: Find (1) the homepage of Artem Chebotko, (2) his weblog if this information is available, and (3) his workplace homepage if this information is available and the weblog is available
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?url ?log ?work
    FROM <my-foaf-data.rdf>
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL {
        ?someone foaf:weblog ?log .
        OPTIONAL { ?someone foaf:workplaceHomepage ?work . }
      }
    }
Result 5: ?url is bound to "http://www.cs.wayne.edu/~artem"; ?log is unbound, and therefore ?work is unbound as well
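The difference between plain, parallel, and nested OPTIONAL clauses in Queries 3 to 5 can be traced with a small evaluator. This is a minimal sketch, not a SPARQL engine: `match_bgp`, `compatible`, and `optional` are hypothetical helpers, solutions are plain dictionaries, and OPTIONAL is modelled as a left join of solution sets, which agrees with the operational description on the slides for these example queries.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.cs.wayne.edu/~artem/ID"
GRAPH = [
    (ME, FOAF + "name", "Artem Chebotko"),
    (ME, FOAF + "homepage", "http://www.cs.wayne.edu/~artem"),
    (ME, FOAF + "workplaceHomepage", "http://www.cs.wayne.edu"),
]


def match_bgp(graph, patterns):
    """Evaluate a basic graph pattern; terms starting with '?' are variables."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for mu in solutions:
            for triple in graph:
                candidate = dict(mu)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # A shared variable must keep the value it already has.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


def compatible(mu1, mu2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def optional(left, right):
    """Keep every left solution; extend it with a right solution when one is compatible."""
    result = []
    for mu1 in left:
        extensions = [{**mu1, **mu2} for mu2 in right if compatible(mu1, mu2)]
        result.extend(extensions or [mu1])
    return result


base = match_bgp(GRAPH, [("?someone", FOAF + "name", "Artem Chebotko"),
                         ("?someone", FOAF + "homepage", "?url")])
weblog = match_bgp(GRAPH, [("?someone", FOAF + "weblog", "?log")])
work = match_bgp(GRAPH, [("?someone", FOAF + "workplaceHomepage", "?work")])

query3 = optional(base, weblog)                  # one OPTIONAL clause
query4 = optional(optional(base, weblog), work)  # parallel OPTIONAL clauses
query5 = optional(base, optional(weblog, work))  # nested OPTIONAL clauses
print(query3, query4, query5, sep="\n")
```

In the parallel form the workplace homepage is still found although the weblog is missing; in the nested form the inner clause is never reached, so `?work` stays unbound.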

Relational Storage of RDF data • The increasing amount of RDF data on the Web highlights the need for its efficient and effective management. • Using relational database technology as a basis for storing and querying RDF data is a reasonable choice, as this technology is well understood and known to perform well.

Relational Storage of RDF data • The simplest schema: a single table, Triples, with the columns subject, predicate, and object • More complicated (and more efficient) storage schemas are possible
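As an illustration of the single-table schema, here is a minimal sketch that uses Python's built-in sqlite3 module. SQLite stands in for the relational database, and the TEXT column types and the use of prefixed names such as foaf:name as stored values are assumptions made for this example.

```python
import sqlite3

ME = "http://www.cs.wayne.edu/~artem/ID"

conn = sqlite3.connect(":memory:")
# One row per RDF statement; the column types are an assumption.
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO Triples VALUES (?, ?, ?)",
    [
        (ME, "foaf:name", "Artem Chebotko"),
        (ME, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
        (ME, "foaf:img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
        (ME, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
    ],
)

count = conn.execute("SELECT COUNT(*) FROM Triples").fetchone()[0]
names = conn.execute(
    "SELECT object FROM Triples WHERE predicate = 'foaf:name'"
).fetchall()
print(count, names)
```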

SPARQL-to-SQL Translation • Problem: Relational databases "know" SQL, but not SPARQL • Solution: translate SPARQL queries into equivalent SQL queries in order to access RDF data stored in a relational database • Algorithm BGPtoSQL translates a SPARQL basic graph pattern to its SQL equivalent • Algorithm SPARQLtoSQL translates SPARQL queries with arbitrarily complex optional graph patterns

BGPtoSQL • Basic idea: • Step 1: • Assign a unique table alias to every triple pattern • E.g., t1 and t2 • Construct the FROM clause to contain all the table aliases
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }

    FROM Triples t1, Triples t2

BGPtoSQL • Step 2: • Construct the SELECT clause to contain every relational attribute that corresponds to a distinct variable
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }

    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2

BGPtoSQL • Step 3: • Construct the WHERE clause to restrict attribute values to the corresponding URIs and literals
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }

    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2
    WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko'
      AND t2.predicate = 'foaf:homepage'

BGPtoSQL • Step 4: • Create an inverted list for variables • Finish the WHERE clause: attributes that correspond to shared variables must have the same values
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
    }

    SELECT t1.subject AS someone, t2.object AS url
    FROM Triples t1, Triples t2
    WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko'
      AND t2.predicate = 'foaf:homepage'
      AND t1.subject = t2.subject
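The four steps can be sketched as a short string-building function. This is an illustrative reading of the slides rather than the published BGPtoSQL algorithm: the function name `bgp_to_sql`, the tuple representation of triple patterns, and the quoting of constants are assumptions, and a pattern without any variable is not handled.

```python
COLUMNS = ("subject", "predicate", "object")


def bgp_to_sql(patterns):
    """Translate a list of (s, p, o) triple patterns into one SQL query string."""
    # Step 1: one table alias per triple pattern, all listed in the FROM clause.
    aliases = [f"t{i}" for i in range(1, len(patterns) + 1)]
    from_clause = ", ".join(f"Triples {alias}" for alias in aliases)

    inverted = {}     # inverted list: variable name -> attributes where it occurs
    conditions = []
    for alias, pattern in zip(aliases, patterns):
        for column, term in zip(COLUMNS, pattern):
            attribute = f"{alias}.{column}"
            if term.startswith("?"):
                inverted.setdefault(term[1:], []).append(attribute)
            else:
                # Step 3: restrict the attribute to the URI or literal.
                escaped = term.replace("'", "''")
                conditions.append(f"{attribute} = '{escaped}'")

    # Step 2: project the first attribute of every distinct variable.
    select_clause = ", ".join(
        f"{attributes[0]} AS {variable}" for variable, attributes in inverted.items()
    )

    # Step 4: attributes of a shared variable must hold the same value.
    for attributes in inverted.values():
        conditions += [f"{attributes[0]} = {other}" for other in attributes[1:]]

    sql = f"SELECT {select_clause} FROM {from_clause}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


sql = bgp_to_sql([
    ("?someone", "foaf:name", "Artem Chebotko"),
    ("?someone", "foaf:homepage", "?url"),
])
print(sql)
```

For the two-pattern example the function produces the same SELECT, FROM, and WHERE clauses as the slides, on a single line.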

SPARQLtoSQL • Step 1: • Translate all BGPs to SQL with BGPtoSQL. • E.g., q1, q2, q3, q4
    SELECT ?url ?log ?topic
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL {
        ?someone foaf:weblog ?log .
        OPTIONAL { ?url foaf:topic ?topic . }
      }
      OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
    }

SPARQLtoSQL • Step 2: • Join the 'relations' (q1, q2, q3, q4) in the order in which their corresponding graph patterns appear in the query • LEFT OUTER JOIN
    SELECT ?url ?log ?topic
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL {
        ?someone foaf:weblog ?log .
        OPTIONAL { ?url foaf:topic ?topic . }
      }
      OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
    }

    Q = SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
        FROM (q1) r1 LEFT OUTER JOIN (q2) r2 ON (r1.someone = r2.someone)

SPARQLtoSQL
    SELECT ?url ?log ?topic
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL {
        ?someone foaf:weblog ?log .
        OPTIONAL { ?url foaf:topic ?topic . }
      }
      OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
    }

    Q = SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
        FROM (Q) r11 LEFT OUTER JOIN (q3) r22
          ON (r11.url = r22.url AND r11.log IS NOT NULL)

SPARQLtoSQL
    SELECT ?url ?log ?topic
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL {
        ?someone foaf:weblog ?log .
        OPTIONAL { ?url foaf:topic ?topic . }
      }
      OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
    }

    Q = SELECT r111.someone AS someone, r111.url AS url,
               COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
        FROM (Q) r111 LEFT OUTER JOIN (q4) r222
          ON (r111.someone = r222.someone AND (r111.log = r222.log OR r111.log IS NULL))

SPARQLtoSQL • Step 3: • Project only the required attributes (variables)
    SELECT ?url ?log ?topic
    WHERE {
      ?someone foaf:name "Artem Chebotko" .
      ?someone foaf:homepage ?url .
      OPTIONAL {
        ?someone foaf:weblog ?log .
        OPTIONAL { ?url foaf:topic ?topic . }
      }
      OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
    }

    SELECT r.url AS url, r.log AS log, r.topic AS topic
    FROM (Q) r

SPARQLtoSQL • Almost complete query (need to replace q1, q2, q3, q4)
    SELECT r.url AS url, r.log AS log, r.topic AS topic
    FROM (
      SELECT r111.someone AS someone, r111.url AS url,
             COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
      FROM (
        SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
        FROM (
          SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
          FROM (q1) r1 LEFT OUTER JOIN (q2) r2 ON (r1.someone = r2.someone)
        ) r11 LEFT OUTER JOIN (q3) r22
          ON (r11.url = r22.url AND r11.log IS NOT NULL)
      ) r111 LEFT OUTER JOIN (q4) r222
        ON (r111.someone = r222.someone AND (r111.log = r222.log OR r111.log IS NULL))
    ) r
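To see the nested query run end to end, the sketch below substitutes hand-written SQL for q1 to q4 and executes the result with Python's built-in sqlite3 module. SQLite is a stand-in for the database, the Triples table stores prefixed names as plain strings, and the third inserted row is hypothetical data added only so that the last OPTIONAL clause finds a match.

```python
import sqlite3

ME = "http://www.cs.wayne.edu/~artem/ID"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", [
    (ME, "foaf:name", "Artem Chebotko"),
    (ME, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    # Hypothetical triple, not part of the data shown on the slides.
    (ME, "http://www.example.org/blog", "http://www.example.org/artem-blog"),
])

# q1..q4: one SQL query per basic graph pattern, written in BGPtoSQL style.
q1 = ("SELECT t1.subject AS someone, t2.object AS url FROM Triples t1, Triples t2 "
      "WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko' "
      "AND t2.predicate = 'foaf:homepage' AND t1.subject = t2.subject")
q2 = ("SELECT t1.subject AS someone, t1.object AS log FROM Triples t1 "
      "WHERE t1.predicate = 'foaf:weblog'")
q3 = ("SELECT t1.subject AS url, t1.object AS topic FROM Triples t1 "
      "WHERE t1.predicate = 'foaf:topic'")
q4 = ("SELECT t1.subject AS someone, t1.object AS log FROM Triples t1 "
      "WHERE t1.predicate = 'http://www.example.org/blog'")

sql = f"""
SELECT r.url AS url, r.log AS log, r.topic AS topic
FROM (
  SELECT r111.someone AS someone, r111.url AS url,
         COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
  FROM (
    SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
    FROM (
      SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
      FROM ({q1}) r1 LEFT OUTER JOIN ({q2}) r2 ON (r1.someone = r2.someone)
    ) r11 LEFT OUTER JOIN ({q3}) r22
      ON (r11.url = r22.url AND r11.log IS NOT NULL)
  ) r111 LEFT OUTER JOIN ({q4}) r222
    ON (r111.someone = r222.someone AND (r111.log = r222.log OR r111.log IS NULL))
) r
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

With this data the weblog clause fails, so `log` is NULL after the first join; the last join then accepts the blog value through the `r111.log IS NULL` branch, and COALESCE returns it. The topic stays NULL because the nested clause is guarded by `r11.log IS NOT NULL`.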

Experimental Study • Dataset: WordNet, 700,000+ triples • Translation algorithms are very efficient and scalable. • For example, SPARQLtoSQL translated queries with fewer than 50 OPTIONAL clauses, each containing one triple pattern, in less than 0.001 sec., regardless of the clause tree layout • The evaluation of most sample queries in Oracle was unsatisfactory (on the order of seconds), mainly because of the simple relational schema. • Note that this does not imply that the algorithms are not practical. SPARQLtoSQL does not directly depend on a particular database schema as long as the BGPtoSQL stub for the database is provided, which we believe is a reasonable expectation from existing RDF storage systems.

Experimental Study • The evaluation of sample queries in the in-memory relational database showed much better results. • In these experiments, we were able to try different implementations of the left outer join based on nested-loops, sort-merge and simple hash methods.

Relational Nested Optional Join

New Example

New Example • Retrieve: • (1) every graduate student in the RDF graph; • (2) the student's advisor if this information is available; • (3) the student's coadvisor if this information is available and if the student's advisor has been successfully retrieved in the previous step. • In other words, the query returns students and as many advisors as possible; there is no point in returning a coadvisor for a student who has no advisor.

Motivation: Computation Waste with LOJ

Nested Optional Join • A novel relational operator to translate nested optional patterns • An alternative to the left outer join • Joins Twin Relations (base relation + optional relation) • A base relation: tuples that have a potential to satisfy a join condition if used in a nested optional join. • An optional relation: tuples that are guaranteed to fail a join condition if used in a nested optional join.

SPARQL-to-SQL Translation with NOJ

Nested Optional Join • NOJ vs. LOJ • the NOJ processes the tuples that are guaranteed to be NULL-padded very efficiently, in linear time • the NOJ does not require the NOT NULL check to return correct results • NOJ algorithms • nested-loops NOJ algorithm NL-NOJ • sort-merge NOJ algorithm SM-NOJ • simple hash NOJ algorithm SH-NOJ
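A nested-loops flavour of the operator can be sketched as follows. This is one plausible reading of the twin-relation definitions on the slides, not the NL-NOJ algorithm from the technical report: the function `nl_noj`, the dictionary representation of tuples, and the student, advisor, and coadvisor data are all hypothetical.

```python
def nl_noj(twin, right, condition, right_attrs):
    """Join a twin relation (base, optional) with `right`; return a new twin relation."""
    base, optional = twin
    pad = dict.fromkeys(right_attrs)   # NULL padding for the right-hand attributes
    new_base, new_optional = [], []
    for r in base:
        matches = [{**r, **s} for s in right if condition(r, s)]
        if matches:
            new_base.extend(matches)            # may still match in a nested join
        else:
            new_optional.append({**r, **pad})   # can no longer match from here on
    # Tuples already known to fail are padded in one linear pass,
    # without comparisons and without a NOT NULL check.
    new_optional.extend({**r, **pad} for r in optional)
    return new_base, new_optional


def same_student(r, s):
    return r["student"] == s["student"]


students = [{"student": "s1"}, {"student": "s2"}]
advisors = [{"student": "s1", "advisor": "a1"}]
coadvisors = [{"student": "s1", "coadvisor": "c1"},
              {"student": "s2", "coadvisor": "c2"}]

twin = (students, [])   # initially every tuple is in the base relation
twin = nl_noj(twin, advisors, same_student, ["advisor"])
base, optional = nl_noj(twin, coadvisors, same_student, ["coadvisor"])
print(base + optional)
```

Student s2 has a coadvisor in the data but no advisor, so after the first join its tuple sits in the optional relation and is only padded in the second join; the coadvisor c2 is never looked up.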

Nested Optional Join • Queries with joins with low selectivity factors (<0.0002)

Nested Optional Join • For in-memory evaluation, by join selectivity factor (JSF): • JSF <= 0.005: SH-NOJ • JSF >= 0.8: NL-NOJ • 0.005 < JSF < 0.8: SM-NOJ
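Read as a selection rule, the thresholds above amount to a three-way switch. The sketch below assumes that reading; the function name `choose_noj_algorithm` is hypothetical, and the thresholds are copied from the slide.

```python
def choose_noj_algorithm(jsf):
    """Pick an in-memory NOJ algorithm from the join selectivity factor (JSF)."""
    if jsf <= 0.005:
        return "SH-NOJ"   # simple hash
    if jsf >= 0.8:
        return "NL-NOJ"   # nested loops
    return "SM-NOJ"       # sort-merge


choices = [choose_noj_algorithm(jsf) for jsf in (0.0002, 0.1, 0.9)]
print(choices)
```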

Possible Future Work • Extending our work to support other SPARQL constructs, such as UNION, FILTER, etc. • Adding intelligence to our SPARQL-to-SQL translation to support the nested optional join. • Investigating possible optimizations of parallel optional graph patterns. • Defining the relational algebra for SPARQL with the support of nested and parallel optional joins. • … and more

References • Artem Chebotko, Mustafa Atay, Shiyong Lu, and Farshad Fotouhi. "Extending Relational Databases with a Nested Optional Join for Efficient Semantic Web Query Processing". Technical Report TR-DB-052006-CLJF, Department of Computer Science, Wayne State University, November 2006. • Artem Chebotko, Shiyong Lu, Hasan M. Jamil, and Farshad Fotouhi. "Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns". Technical Report TR-DB-052006-CLJF, Department of Computer Science, Wayne State University, May 2006.

Acknowledgements Dr. Shiyong Lu, Dr. Farshad Fotouhi, Dr. Hasan Jamil, Dr. Mustafa Atay, Oracle DBA Shwetal Joshi

Questions? Thank you!
