RDF Databases By: Chris Halaschek
Outline • Motivation / Requirements • Storage Issues • Sesame • General Introduction • Architecture • Scalability • RQL Introduction • Demo • Future Directions
Motivation • Having metadata available is not enough • Need tools to process, transform, and reason with the information • Need a way to store the metadata and interact with it
Requirements • Scalable • Good performance • Useful query language
Storage Issues • How to store the data? • In relational database as tables • Querying requires many joins…costly • Triples • Native graph structure • Querying requires graph traversals…need efficient algorithms
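Here is a minimal sketch of why relational storage of triples leads to joins. It assumes a single generic `triples(s, p, o)` table in SQLite. That is a simplification and not Sesame's actual schema, which spread data over multiple tables. The resources such as `ex:picasso` are hypothetical. Following a two-edge path takes one self-join, and each further edge adds another join over the same table.

```python
import sqlite3

# Hypothetical data in one generic (subject, predicate, object) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:picasso", "cult:paints", "ex:guernica"),
        ("ex:guernica", "cult:technique", "oil on canvas"),
        ("ex:rembrandt", "cult:paints", "ex:nightwatch"),
        ("ex:nightwatch", "cult:technique", "oil on canvas"),
    ],
)

# Painter -> painting -> technique: a two-edge path needs one self-join.
rows = conn.execute(
    """
    SELECT t1.s, t1.o, t2.o
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'cult:paints' AND t2.p = 'cult:technique'
    ORDER BY t1.s
    """
).fetchall()

for painter, painting, technique in rows:
    print(painter, painting, technique)
```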
Sesame - Introduction • Open source RDF Schema-based repository and querying facility • Developed as a research prototype by Aidministrator Nederland bv • NLnet Foundation sponsors its further development as open source software
Sesame - Introduction • Can handle RDF data in XML-serialized RDF and N-Triples format • Can extract the contents of a Sesame repository in XML-serialized RDF, N-Triples, and N3 format
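N-Triples writes one statement per line: subject, predicate and object, followed by a period. The sketch below shows that line shape. The function name `to_ntriples`, the tuple layout and the example URIs under `http://example.org/` are hypothetical. The escaping is deliberately partial.

```python
def to_ntriples(statements):
    """Serialize (subject, predicate, object, object_is_uri) tuples as N-Triples.

    Assumptions: subject and predicate are absolute URIs; literal objects are
    plain strings. Only backslash, double quote and newline are escaped here,
    a subset of what a full N-Triples writer handles.
    """
    lines = []
    for s, p, o, object_is_uri in statements:
        if object_is_uri:
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples([
    ("http://example.org/picasso", "http://www.icom.com/schema.rdf#paints",
     "http://example.org/guernica", True),
    ("http://example.org/guernica", "http://www.icom.com/schema.rdf#title",
     "Guernica", False),
]))
```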
Repository • Many storage options thanks to the Repository Abstraction Layer (RAL) • DBMS – relational, object-relational, etc. • Existing RDF stores • RDF files • RDF network services
Repository Abstraction Layer (RAL) • Interface that translates RDF-specific methods into calls to a specific DBMS • Defined by an RDF API • The developers created their own set of interfaces rather than adopting or extending the existing RDF API proposal • The existing API targeted a main-memory model • Theirs offers specific operations that support RDF Schema semantics (e.g. subsumption reasoning)
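Here is a minimal sketch of the RAL idea. Client code talks to RDF-level methods, and each backend supplies its own storage. Subsumption reasoning is written once against the interface. The class and method names (`RepositoryAbstractionLayer`, `add_statement`, `get_statements`, `is_subclass_of`) are hypothetical and are not Sesame's Java API.

```python
from abc import ABC, abstractmethod


class RepositoryAbstractionLayer(ABC):
    """Hypothetical RAL-style interface: RDF-level methods, storage-specific backends."""

    @abstractmethod
    def add_statement(self, s, p, o) -> None: ...

    @abstractmethod
    def get_statements(self, s=None, p=None, o=None) -> list: ...

    def is_subclass_of(self, sub, sup) -> bool:
        """Subsumption check built only on get_statements (reflexive and transitive)."""
        seen, frontier = set(), [sub]
        while frontier:
            cls = frontier.pop()
            if cls == sup:
                return True
            if cls in seen:
                continue
            seen.add(cls)
            frontier.extend(o for _, _, o in self.get_statements(cls, "rdfs:subClassOf", None))
        return False


class InMemoryRepository(RepositoryAbstractionLayer):
    """One possible backend; an SQL-backed class would implement the same two methods."""

    def __init__(self):
        self._triples = []

    def add_statement(self, s, p, o):
        self._triples.append((s, p, o))

    def get_statements(self, s=None, p=None, o=None):
        # None acts as a wildcard for that position.
        return [
            t for t in self._triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
        ]


repo = InMemoryRepository()
repo.add_statement("cult:Cubist", "rdfs:subClassOf", "cult:Painter")
repo.add_statement("cult:Painter", "rdfs:subClassOf", "cult:Artist")
print(repo.is_subclass_of("cult:Cubist", "cult:Artist"))  # True
```

Because `is_subclass_of` only calls `get_statements`, swapping the backend does not change the reasoning code. That is the separation the RAL is meant to provide.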
RAL Continued • Several of Sesame’s functional modules are clients of the RAL • Problems: • Data must be read from the repository on every request – performance decreases • Solution – selectively cache data in memory • For small repositories, all data can be cached
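Selective caching could look roughly like the following sketch. `CountingBackend` is a hypothetical stand-in for a DBMS-backed repository that counts its reads. `CachingLayer` memoizes results per (S, P, O) pattern, or loads everything once when `cache_all=True`, which matches the small-repository case. The sketch assumes the repository is not modified while cached, so there is no invalidation.

```python
def matches(t, s, p, o):
    """True if triple t fits the pattern; None acts as a wildcard."""
    return (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)


class CountingBackend:
    """Stand-in for a DBMS-backed repository that counts how often it is read."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.reads = 0

    def get_statements(self, s=None, p=None, o=None):
        self.reads += 1
        return [t for t in self.triples if matches(t, s, p, o)]


class CachingLayer:
    """Memoizes backend reads per (s, p, o) pattern, or caches everything if cache_all."""

    def __init__(self, backend, cache_all=False):
        self.backend = backend
        self.pattern_cache = {}
        # Small repositories: load all statements once, up front.
        self.all_triples = backend.get_statements() if cache_all else None

    def get_statements(self, s=None, p=None, o=None):
        if self.all_triples is not None:
            return [t for t in self.all_triples if matches(t, s, p, o)]
        key = (s, p, o)
        if key not in self.pattern_cache:
            self.pattern_cache[key] = self.backend.get_statements(s, p, o)
        return list(self.pattern_cache[key])  # copy so callers cannot alter the cache


backend = CountingBackend([("ex:picasso", "cult:paints", "ex:guernica")])
cache = CachingLayer(backend)
cache.get_statements(p="cult:paints")
cache.get_statements(p="cult:paints")
print(backend.reads)  # 1: the second call was served from the cache
```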
Functional Modules • Interact with RAL • RQL query module • Evaluates RQL queries • RDF administration module • Allows uploading RDF data and schema information, as well as deleting information • RDF export module • Allows extraction of schema and/or data from repository
RQL Query Module • Proposed RQL: • Developed within the European IST project C-Web • Follow-up project by ICS at FORTH, in Greece • Adopts the syntax of OQL • Sesame’s implementation of RQL is slightly different from the proposed RQL • Better compliance with W3C specifications • Support for optional domain and range restrictions • Queries are translated into sets of calls to the RAL • Note: Also supports RDQL – based on SquishQL
Admin Module • Main functions: • Add RDF data/schema information • Clear repository • Retrieves information from an RDF(S) source and parses it using the SiRPAC RDF parser • Parser delivers information to the admin module in statement form – (S,P,O) • Module checks statements for consistency and then inserts data
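The upload path (parse into (S,P,O) statements, check them, insert them) is sketched below under clear assumptions. A toy regular expression for simple N-Triples lines stands in for SiRPAC, which parses RDF/XML. The consistency rule, that the predicate must be a property declared in the schema, is hypothetical and is not Sesame's actual check.

```python
import re

# Toy pattern for one N-Triples line whose object is a URI or a plain literal.
NT_LINE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.\s*$')


def parse_statements(text):
    """Yield (S, P, O) tuples, skipping blank and comment lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = NT_LINE.match(line)
        if m is None:
            raise ValueError(f"cannot parse: {line}")
        s, p, o_uri, o_lit = m.groups()
        yield (s, p, o_uri if o_uri is not None else o_lit)


def upload(text, store, known_properties):
    """Check each statement, insert the accepted ones, and return the rejected ones.

    Hypothetical consistency rule: the predicate must be a declared property.
    """
    rejected = []
    for s, p, o in parse_statements(text):
        if p in known_properties:
            store.add((s, p, o))
        else:
            rejected.append((s, p, o))
    return rejected


store = set()
data = """
<http://example.org/picasso> <http://www.icom.com/schema.rdf#paints> <http://example.org/guernica> .
<http://example.org/guernica> <http://example.org/unknownProperty> "x" .
"""
rejected = upload(data, store, {"http://www.icom.com/schema.rdf#paints"})
print(len(store), len(rejected))  # 1 1
```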
RDF Export Module • Exports the contents of a repository formatted in XML-serialized RDF • Supplies a basis for using Sesame in combination with other RDF tools
Communication with Sesame • Multiple options for various contexts • HTTP • RMI • SOAP • Intermediaries between the functional modules and their clients
Sesame - Scalability • Performance Tests • Uploaded and queried a collection of nouns from WordNet – 400,000 RDF statements • Performed on a Sun UltraSPARC 5, 256 MB RAM • Used Java Servlets running on a web server to communicate over HTTP • PostgreSQL version 7.1.2 repository
Scalability Continued • Uploading nouns • 94 minutes • 71 statements per second • Querying was much slower than expected • Due to distributed storage over multiple tables • Retrieving data required doing many joins
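The upload rate follows from the WordNet test figures, 400,000 statements in 94 minutes:

```latex
\frac{400\,000\ \text{statements}}{94\ \text{min} \times 60\ \text{s/min}}
  = \frac{400\,000\ \text{statements}}{5\,640\ \text{s}}
  \approx 70.9\ \text{statements/s}
```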
Sesame’s Future • Migration of Sesame to alternate repositories to boost performance • DAML+OIL support
RQL Introduction • Museum schema example
RQL - Syntax • Query typically built upon three clauses • Select • Projection over query results • From • Bind variables to specific locations in graph model • Where • Optional – constraint on values of variables in the from clause
RQL - Example select X, @P from {X} @P {Y} where Y like "Pablo" • X and Y are bound to nodes • @P is bound to a connecting edge; the @ prefix signifies that the variable is bound to properties • The $ prefix signifies classes • http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
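As a rough model of how the three clauses work together, the sketch below evaluates this query over a plain list of triples. The from clause enumerates bindings, the where clause filters them, and the select clause projects the result. The data and property names (`cult:fname`, `cult:lname`) are hypothetical. Treating `like` as a case-sensitive substring test is an assumption.

```python
# Hypothetical triples: (subject, property, object); objects may be literals.
TRIPLES = [
    ("ex:picasso", "cult:fname", "Pablo"),
    ("ex:picasso", "cult:lname", "Picasso"),
    ("ex:picasso", "cult:paints", "ex:guernica"),
    ("ex:rembrandt", "cult:fname", "Rembrandt"),
]


def select_x_p_where_y_like(triples, text):
    """Mimic: select X, @P from {X} @P {Y} where Y like "<text>"."""
    results = []
    for x, p, y in triples:        # from clause: each triple binds X, @P, Y
        if text in y:              # where clause: assumed substring semantics for `like`
            results.append((x, p))  # select clause: project X and @P
    return results


print(select_x_p_where_y_like(TRIPLES, "Pablo"))
```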
RQL - Namespaces • In RDF, nodes and edges are identified by URIs • Can be very long • Namespace abbreviation mechanism • Extra clause • using namespace cult = http://www.icom.com/schema.rdf# • Simply type: cult:paints
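Namespace expansion itself is a simple prefix lookup. Here is a minimal sketch with a hypothetical helper `expand` and a mapping that mirrors the `using namespace` clause.

```python
def expand(qname, namespaces):
    """Expand an abbreviated 'prefix:local' name to a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown namespace prefix in {qname!r}")
    return namespaces[prefix] + local


# Mirrors: using namespace cult = http://www.icom.com/schema.rdf#
NAMESPACES = {"cult": "http://www.icom.com/schema.rdf#"}
print(expand("cult:paints", NAMESPACES))  # http://www.icom.com/schema.rdf#paints
```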
RQL – Path Expressions • Specify a linear path through the graph select PAINTER, PAINTING, TECH from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH} using namespace cult = http://www.icom.com/schema.rdf# • http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
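In `{PAINTER} cult:paints {PAINTING}. cult:technique {TECH}`, the end node of the first edge becomes the start of the second edge, so the path behaves like a join on PAINTING. Here is a sketch over hypothetical triples. The data and the `path_query` helper are assumptions.

```python
# Hypothetical data; ex:unlabelled has no technique, so it drops out of the path.
TRIPLES = [
    ("ex:picasso", "cult:paints", "ex:guernica"),
    ("ex:guernica", "cult:technique", "oil on canvas"),
    ("ex:rembrandt", "cult:paints", "ex:nightwatch"),
    ("ex:nightwatch", "cult:technique", "oil on canvas"),
    ("ex:rembrandt", "cult:paints", "ex:unlabelled"),
]


def path_query(triples, first, second):
    """Mimic: {A} first {B}. second {C}, returning (A, B, C) rows."""
    # Index the second edge by its start node so B can be looked up directly.
    by_subject = {}
    for s, p, o in triples:
        if p == second:
            by_subject.setdefault(s, []).append(o)
    return [
        (a, b, c)
        for a, p, b in triples
        if p == first
        for c in by_subject.get(b, [])
    ]


for row in path_query(TRIPLES, "cult:paints", "cult:technique"):
    print(row)
```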
RQL – Querying Schema • Retrieving the class of a resource select X, $X, Y from {X : $X} cult:paints {Y} using namespace cult = http://www.icom.com/schema.rdf# • Variable $X is matched to the class of the resource value of X • http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
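One way to picture `{X : $X}` is as an extra lookup of each resource's `rdf:type` statements. In this hypothetical sketch only asserted types are returned. A schema-aware engine may also return classes implied by `rdfs:subClassOf`.

```python
RDF_TYPE = "rdf:type"

# Hypothetical instance data.
TRIPLES = [
    ("ex:picasso", RDF_TYPE, "cult:Cubist"),
    ("ex:picasso", "cult:paints", "ex:guernica"),
    ("ex:rembrandt", RDF_TYPE, "cult:Painter"),
    ("ex:rembrandt", "cult:paints", "ex:nightwatch"),
]


def select_x_class_y(triples, prop):
    """Mimic: select X, $X, Y from {X : $X} prop {Y}, using asserted rdf:type only."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, []).append(o)
    # One result row per (edge, class of its subject) combination.
    return [
        (x, cls, y)
        for x, p, y in triples
        if p == prop
        for cls in types.get(x, [])
    ]


for row in select_x_class_y(TRIPLES, "cult:paints"):
    print(row)
```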
RQL – Querying Schema • Constraining resources to a schema select X, Y from {X : cult:Cubist } cult:paints {Y} using namespace cult = http://www.icom.com/schema.rdf#
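Constraining X to `cult:Cubist` amounts to an instance check against the schema. This sketch assumes that instances of subclasses also count, so a `cult:Painter` constraint would also match Cubists. The schema, instance data and helper names are hypothetical.

```python
# Hypothetical schema (class -> direct superclasses) and instance data.
SUBCLASS_OF = {
    "cult:Cubist": ["cult:Painter"],
    "cult:Painter": ["cult:Artist"],
}
INSTANCE_OF = {"ex:picasso": ["cult:Cubist"], "ex:rembrandt": ["cult:Painter"]}
PAINTS = [("ex:picasso", "ex:guernica"), ("ex:rembrandt", "ex:nightwatch")]


def superclasses(cls):
    """cls plus every class reachable through subclass links."""
    result, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(SUBCLASS_OF.get(c, []))
    return result


def is_instance(x, cls):
    """True if any asserted class of x is cls or one of its subclasses."""
    return any(cls in superclasses(c) for c in INSTANCE_OF.get(x, []))


def select_paints_constrained(cls):
    """Mimic: select X, Y from {X : cls} cult:paints {Y}."""
    return [(x, y) for x, y in PAINTS if is_instance(x, cls)]


print(select_paints_constrained("cult:Cubist"))  # [('ex:picasso', 'ex:guernica')]
```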
RQL – Standard Functions • Class (also Property) • subClassOf (also subPropertyOf) • typeOf • In all of the above, use ^ to return only direct descendants (e.g. subClassOf^( cult:Painter ) )
RQL – subClassOf • Example: select X, @P, Y from {X} @P {Y} where X in subClassOf^( cult:Painter ) using namespace cult = http://www.icom.com/schema.rdf#
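The difference between `subClassOf` and `subClassOf^` can be sketched as all descendants versus direct children in the class hierarchy. The schema below is hypothetical and assumes each class has a single parent. The sketch also excludes the class itself from the result, which may not match RQL exactly.

```python
# Hypothetical schema: child class -> direct superclass.
PARENT = {
    "cult:Cubist": "cult:Painter",
    "cult:Impressionist": "cult:Painter",
    "cult:Painter": "cult:Artist",
    "cult:Sculptor": "cult:Artist",
}


def sub_class_of(cls, direct_only=False):
    """Classes below cls: all descendants, or only direct children (the ^ form)."""
    children = sorted(c for c, p in PARENT.items() if p == cls)
    if direct_only:
        return children
    result = []
    for child in children:
        result.append(child)
        result.extend(sub_class_of(child))  # recurse for deeper descendants
    return sorted(result)


print(sub_class_of("cult:Artist", direct_only=True))  # ['cult:Painter', 'cult:Sculptor']
print(sub_class_of("cult:Artist"))
```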
RQL – Advanced Queries • Set Operators • Union, Intersection, Difference • Logical Operators • Domain and Range Constraints • Comprehensive List: http://sesame.aidministrator.nl/publications/rql-tutorial.html
Future of RDF Databases • Standard query language • Improved storage structures • Native graph model
References / Links • Sesame: http://sesame.aidministrator.nl/ • NLnet Foundation: http://www.nlnet.nl/ • Original Specifications of RQL: http://139.91.183.30:9090/RDF/RQL