
A Path-based Relational RDF Database

A Path-based Relational RDF Database. A. Matono, T. Amagasa, M. Yoshikawa, S. Uemura. ADC 2005. SNU IDB Lab., Hyewon Lim, January 9th, 2009.


Presentation Transcript


  1. A Path-based Relational RDF Database A. Matono, T. Amagasa, M. Yoshikawa, S. Uemura ADC 2005 SNU IDB Lab. Hyewon Lim January 9th, 2009

  2. Contents • Introduction • An Overview of RDF • Related Work and the Differences with Our Work • Path-based Approach for Storing RDF Data in Relational Databases • Performance Evaluation • Conclusions

  3. Introduction (1/8) • Quality and quantity of metadata • Semantic Web makes it possible to perform high-level processes • Reasoning, deduction, semantic searches • Metadata • Described by Resource Description Framework (RDF) • RDF describes data and their semantics

  4. Introduction (2/8) • The specification defines an RDF model and RDF syntax • RDF model • Statements describe a relationship between a pair of terms • A set of statements • Represent metadata whose structure is a directed graph

  5. Introduction (3/8) • RDF is commonly used as a format to describe various types of metadata • Typical usage: describe large-scale metadata • Wordnet (35MB), Gene Ontology (365MB), Open Directory Project (2GB) • To handle such data efficiently, RDF DBs that can manage massive RDF data are essential

  6. Introduction (4/8) • One naïve approach is to use XML DBs • Any RDF data can be serialized as XML data • This approach is impractical • The semantic structure of RDF data differs from the syntactic structure of its XML serialization • The semantics cannot be stored in XML DBs

  7. Introduction (5/8) • Another way: utilize relational DBs or Berkeley DB • Several RDF DBs have been proposed • Such conventional RDF DBs can be classified into two groups • 1. The relational schema is designed based on the RDF schema • Cannot handle RDF data that have no accompanying RDF schema • 2. RDF DBs store RDF data in terms of triples

  8. Introduction (6/8) • Problems of processing large RDF data using conventional RDF databases • Ability to handle RDF schema • Queries that use RDF schema information are an important class of RDF queries • The second group does not make any distinction between schema information and instance data • The first group can process such queries • Poor performance in processing path queries • Need to perform a join operation for each path step

  9. Introduction (7/8) • We propose a path-based relational RDF DB • The relational schema is designed to be independent of RDF schema information, and • Designed to make the distinction between schema information and instance data • Can handle schemaless RDF data as well as RDF data with schema • Extract all reachable path expressions for each resource, and store them • To improve performance for path queries • No need to perform join operations

  10. Introduction (8/8) • Steps • Classify every statement into categories according to the type of predicate • Construct subgraphs for each category • Store the subgraphs into distinct relational tables • Apply appropriate techniques for representing the semantics of each subgraph • Restrict the structure of each subgraph to a DAG

  11. An Overview of RDF (1/4) • RDF • A foundation for representing and manipulating metadata on Web resources • Usable as long as the location of a Web resource is identifiable in terms of a URI • Statements represent binary relationships between two distinct (or identical) resources • RDF data are modeled as a directed graph • Nodes and arcs represent resources and relationships • Example: “This paper is authored by Akiyoshi MATONO.” • As a statement: (www.matono.net/paper, authored, “Akiyoshi MATONO”)
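
The statement on this slide can be sketched as a (subject, predicate, object) triple, with the directed graph held as an adjacency list. This is only an illustration; the helper name `build_graph` is hypothetical and does not come from the paper.

```python
from collections import defaultdict

# The slide's example statement as a (subject, predicate, object) triple.
# The predicate label "authored" is copied from the slide's figure.
statements = [
    ("www.matono.net/paper", "authored", "Akiyoshi MATONO"),
]


def build_graph(triples):
    """Return an adjacency list: subject -> list of (predicate, object) arcs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(statements)
```

Each subject is a node, and each (predicate, object) pair is a labeled arc leaving it.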

  12. An Overview of RDF (2/4) • RDF Schema • A specification for defining schematic information of RDF data • We can define: • Classes (rdfs:Class) as types of resources • Properties of a class (rdf:Property) • Domains (rdfs:domain) and ranges (rdfs:range) of the properties • Inheritance relationships (rdfs:subClassOf, rdfs:subPropertyOf) among classes or properties • Types (rdf:type)

  13. An Overview of RDF (3/4) • Using RDF and RDF Schema, we can represent complex information

  14. An Overview of RDF (4/4) • Classifying RDF data • Large size • Wordnet, ODP, and Gene Ontology • Created mainly for systematic organization of data resources • Do not contain cycles • Simple structure • Small size • RSS, FOAF, and Dublin Core • Used as metadata of images or Web pages

  15. Related Work and the Differences with Our Work (1/3) • Several RDF DBs have been proposed • Most use relational DBs or Berkeley DB as their underlying data storage • Approaches using RDB • Flat approach: stores all statements in a single relational table • Schema approach: creates relational tables for the classes and properties defined in the RDF schema information, storing resources according to their classes/properties • Approaches using BDB • Create three hash tables • Keys: subjects, predicates, objects

  16. Related Work and the Differences with Our Work (2/3) • Problems of the conventional approaches • Flat and hash approaches • Difficult to perform schema queries • They do not make any distinction between schema information and resource descriptions • Schema approach • Able to process queries about RDF schema • Cannot handle RDF data without RDF schema information • The relational schema is designed based on the RDF schema • Costly to maintain as the schema evolves • None of the three approaches is sufficiently capable of processing path-based queries

  17. Related Work and the Differences with Our Work (3/3) • In conventional RDF databases, • statement-based queries can be processed efficiently • RDF data is decomposed into a large number of statements • When processing a path-based query • Require a number of join operations according to the steps in the path expression
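
The join cost described above can be sketched with a flat statement table in SQLite. This is a toy illustration, not the schema of Jena2 or of any other system: the table name `statement`, its columns, and the sample resources are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flat approach: every statement is one row of a single table.
conn.execute("CREATE TABLE statement (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO statement VALUES (?, ?, ?)",
    [
        ("#picasso", "#paints", "#guernica"),
        ("#guernica", "#title", "Guernica"),
        ("#rodin", "#sculpts", "#thinker"),
        ("#thinker", "#title", "The Thinker"),
    ],
)

# A two-step path (#paints, then #title) needs one self-join of the table.
# Every additional step in the path expression adds another self-join.
rows = conn.execute(
    """
    SELECT s2.object
    FROM statement AS s1
    JOIN statement AS s2 ON s1.object = s2.subject
    WHERE s1.predicate = '#paints' AND s2.predicate = '#title'
    """
).fetchall()
titles = [row[0] for row in rows]
```

With only four rows the join is trivial, but the number of joins grows with the length of the path expression, which is the cost the path-based approach aims to avoid.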

  18. Path-based Approach for Storing RDF Data in Relational Databases - Subgraph extraction from RDF graph (1/2) • When storing RDF data, the system • Parses the RDF data • Generates its own RDF graph • Decomposes the graph into five subgraphs according to the type of predicate • Class Inheritance (CI) graphs: rdfs:subClassOf • Property Inheritance (PI) graphs: rdfs:subPropertyOf • Type (T) graphs: rdf:type • Domain-Range (DR) graphs: rdfs:domain, rdfs:range • Generic (G) graphs
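
The decomposition by predicate type can be sketched as a lookup from predicate IRI to subgraph category, with everything else falling into the generic graph G. The function name `split_into_subgraphs` and the sample triples are hypothetical; only the five categories and their predicates come from the slide.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicate IRI -> subgraph category, following the slide's five categories.
CATEGORY_BY_PREDICATE = {
    RDFS + "subClassOf": "CI",
    RDFS + "subPropertyOf": "PI",
    RDF + "type": "T",
    RDFS + "domain": "DR",
    RDFS + "range": "DR",
}


def split_into_subgraphs(triples):
    """Group (subject, predicate, object) triples by subgraph category."""
    subgraphs = {"CI": [], "PI": [], "T": [], "DR": [], "G": []}
    for triple in triples:
        # Any predicate not listed above belongs to the generic graph G.
        category = CATEGORY_BY_PREDICATE.get(triple[1], "G")
        subgraphs[category].append(triple)
    return subgraphs


# Hypothetical sample data for illustration only.
sample = [
    ("#Painter", RDFS + "subClassOf", "#Artist"),
    ("#paints", RDFS + "domain", "#Painter"),
    ("#picasso", RDF + "type", "#Painter"),
    ("#picasso", "#paints", "#guernica"),
]
subgraphs = split_into_subgraphs(sample)
```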

  19. Path-based Approach for Storing RDF Data in Relational Databases - Subgraph extraction from RDF graph (2/2) • Advantages of dividing an RDF graph • Store RDF data into distinct relational tables • Design the relational schema to be independent of RDF schema information • Structures of the resulting subgraphs are less complex than the original RDF graph • Opportunity to apply a suitable technique for representing each subgraph by considering its graph structure

  20. Path-based Approach for Storing RDF Data in Relational Databases- Path expressions (1/3) • Most queries of RDF data • Queries to detect subgraphs matching a given graph • Queries to detect a set of nodes which can be reached via given path expressions • These queries are represented in path expressions • Storage based on path expressions • Decrease in the number of join operations

  21. Path-based Approach for Storing RDF Data in Relational Databases - Path expressions (2/3) • Path expressions are stored not for the entire RDF graph • Only for graph G, to which path-based queries are frequently posed • Graphs CI and PI should be stored by a scheme that can detect ancestor-descendant relationships • Queries for RDF data use path expressions consisting of arcs • Store arc paths in a relational table

  22. Path-based Approach for Storing RDF Data in Relational Databases - Path expressions (3/3) • Arc path • DAG g, node set V(g), arc set E(g) • A finite sequence of arcs • (v_0, v_1), (v_1, v_2), …, (v_{k-2}, v_{k-1}), (v_{k-1}, v_k) • The path expression of the arc path • l(v_0, v_1), l(v_1, v_2), …, l(v_{k-2}, v_{k-1}), l(v_{k-1}, v_k) • Absolute arc path • An arc path whose source node is a root
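
The definitions above can be sketched as a traversal that, for every node of a DAG, collects the path expressions of all absolute arc paths ending at that node. This is a minimal sketch, not the paper's algorithm: the function name, the sample arcs, and the "/" separator between labels are assumptions made for illustration.

```python
def absolute_path_expressions(arcs):
    """arcs: list of (source, label, target) forming a DAG.

    Returns {node: set of path expressions of absolute arc paths ending there}.
    A root is a node with no incoming arc.
    """
    outgoing = {}
    nodes, targets = set(), set()
    for source, label, target in arcs:
        outgoing.setdefault(source, []).append((label, target))
        nodes.update((source, target))
        targets.add(target)
    roots = nodes - targets
    result = {node: set() for node in nodes}

    def walk(node, labels):
        # Extend the current label sequence along every outgoing arc.
        for label, target in outgoing.get(node, []):
            path = labels + (label,)
            result[target].add("/".join(path))  # "/" is an assumed separator
            walk(target, path)

    for root in roots:
        walk(root, ())
    return result


# Hypothetical sample graph.
arcs = [
    ("#picasso", "#paints", "#guernica"),
    ("#guernica", "#title", "Guernica"),
    ("#picasso", "#name", "Pablo"),
]
expressions = absolute_path_expressions(arcs)
```

Because the input is acyclic, the recursion terminates. Storing each node with its precomputed expressions is what lets a path query become a string match instead of a chain of joins.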

  23. Path-based Approach for Storing RDF Data in Relational Databases- Extended interval numbering scheme for DAGs (1/2) • Interval numbering scheme • Detect ancestor-descendant relationships between two nodes in a tree • We use it to detect inheritance relationships between classes or properties • Extend the scheme to apply it to DAGs

  24. Path-based Approach for Storing RDF Data in Relational Databases - Extended interval numbering scheme for DAGs (2/2) • The relationship between two nodes can be verified by a subsumption test • v is an ancestor of u iff pre(v) < pre(u) ∧ post(u) < post(v) • v is a parent of u if, in addition, depth(u) - depth(v) = 1 • [Figure: example node labels. v nodes: (2, 5, 1), (5, 4, 2); u nodes: (6, 3, 3), (4, 1, 3)]
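
The subsumption test can be sketched on a plain tree, assigning each node a (pre, post, depth) label during a depth-first traversal. The slides do not spell out how the scheme is extended to DAGs, so this sketch covers only the tree case; the class names and function names are hypothetical.

```python
def number_tree(children, root):
    """Assign (pre, post, depth) to every node by a depth-first traversal."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, depth):
        counters["pre"] += 1
        pre = counters["pre"]  # preorder rank: assigned on entry
        for child in children.get(node, []):
            visit(child, depth + 1)
        counters["post"] += 1  # postorder rank: assigned on exit
        labels[node] = (pre, counters["post"], depth)

    visit(root, 1)
    return labels


def is_ancestor(labels, v, u):
    """v is an ancestor of u iff pre(v) < pre(u) and post(u) < post(v)."""
    pre_v, post_v, _ = labels[v]
    pre_u, post_u, _ = labels[u]
    return pre_v < pre_u and post_u < post_v


def is_parent(labels, v, u):
    """v is a parent of u if v is an ancestor exactly one level above u."""
    return is_ancestor(labels, v, u) and labels[u][2] - labels[v][2] == 1


# Hypothetical class hierarchy.
children = {
    "Resource": ["Artist", "Artifact"],
    "Artist": ["Painter", "Sculptor"],
    "Artifact": ["Painting"],
}
labels = number_tree(children, "Resource")
```

Once the labels are stored, an inheritance check is two integer comparisons, with no traversal of the hierarchy at query time.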

  25. Path-based Approach for Storing RDF Data in Relational Databases- Proposed relational schema (1/2) • Designed relational schema for storing RDF data based on the subgraphs

  26. Path-based Approach for Storing RDF Data in Relational Databases- Proposed relational schema (2/2) • Storage example of the RDF data

  27. Path-based Approach for Storing RDF Data in Relational Databases - Query Processing • Examples • Find the title of something painted by someone: SELECT r.resourceName FROM path AS p, resource AS r WHERE p.pathID = r.pathID AND p.pathexp = '#title<#paints' • Find the names of the classes that are direct subclasses of http://www.w3.org/2000/01/rdf-schema#Resource: SELECT c1.className FROM class AS c, class AS c1 WHERE c.pre < c1.pre AND c.post > c1.post AND c.depth = c1.depth - 1 AND c.className = 'http://www.w3.org/2000/01/rdf-schema#Resource'
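
The two queries on this slide can be run against a toy database to see their shape. The table and column names (`path`, `resource`, `class`, `pathID`, `pathexp`, `resourceName`, `className`, `pre`, `post`, `depth`) are taken from the slide's SQL; the column types, the sample rows, and the use of SQLite are assumptions, since the slides with the full relational schema did not survive as text. The class literal below uses the standard `rdfs:Resource` IRI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
    CREATE TABLE resource (resourceName TEXT, pathID INTEGER);
    CREATE TABLE class (className TEXT, pre INTEGER, post INTEGER, depth INTEGER);
    """
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO path VALUES (?, ?)",
    [(1, "#paints"), (2, "#title<#paints")],
)
conn.executemany(
    "INSERT INTO resource VALUES (?, ?)",
    [("#guernica", 1), ("Guernica", 2)],
)
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?, ?)",
    [
        ("http://www.w3.org/2000/01/rdf-schema#Resource", 1, 6, 1),
        ("#Artist", 2, 3, 2),
        ("#Painter", 3, 1, 3),
        ("#Artifact", 5, 5, 2),
    ],
)

# Path query: a lookup on the stored path expression, with no per-step join.
titles = [row[0] for row in conn.execute(
    "SELECT r.resourceName FROM path AS p, resource AS r "
    "WHERE p.pathID = r.pathID AND p.pathexp = '#title<#paints'"
)]

# Schema query: interval containment plus a depth difference of one.
one_level_below = sorted(row[0] for row in conn.execute(
    "SELECT c1.className FROM class AS c, class AS c1 "
    "WHERE c.pre < c1.pre AND c.post > c1.post AND c.depth = c1.depth - 1 "
    "AND c.className = 'http://www.w3.org/2000/01/rdf-schema#Resource'"
))
```

In this toy data the second query returns the classes exactly one level below `rdfs:Resource`; `#Painter` is excluded because it sits two levels down.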

  28. Performance Evaluation • Compared the processing time of our approach with that of Jena2 • Jena2: based on the flat approach • Could not evaluate the performance of schema-based queries • No RDF data with schema information that is large enough for our experiments exists on the Web • Environment • Athlon 1.4 GHz CPU, 1GB memory, Gentoo Linux 1.4, PostgreSQL 7.4.3

  29. Performance Evaluation - Schema-based Queries (1/3) • Basic schema queries • Find immediate children (or parents) of a given class (or property) • Find inheritance relationships between two given classes (or properties) • Find the classes that are the domain (or range) of a given property • Querying the meta-schema • Find all resources, that is, instances of “rdfs:Resource” • Find all literals

  30. Performance Evaluation - Schema-based Queries (2/3) • Querying type information • Find the set of instances of a given class • Find the set of statements using a given property • When the above queries are processed, there are two cases: • The answer is obtained by a single access to data storage, or by multiple accesses

  31. Performance Evaluation - Schema-based Queries (3/3) • The ability of each approach for schema-based queries • Our approach is efficient because of the interval numbering scheme • In meta-schema queries, if the RDF graph includes many multiple paths, the redundancy increases

  32. Performance Evaluation - Path-based Queries (1/2) • Dataset requirements • Sufficient size to see scalability • The G graph of the data does not contain any cycles • The G graph of the data contains long absolute path expressions • Use the Gene Ontology

  33. Performance Evaluation- Path-based Queries (2/2) • Experiment results

  34. Conclusions • We can handle schemaless RDF data • We can process schema-based queries using the interval numbering scheme • For path-based queries • Achieved high performance • To reduce the number of join operations, we stored RDF data based on path expressions • Future work • Investigate query-processing techniques • Query language, query transformation, and query optimization for RDF data
