
RDF storages and indexes

Enterprise Integration – Semantic Web. Maciej Janik. September 1, 2005.


Presentation Transcript


  1. RDF storages and indexes Enterprise Integration – Semantic Web Maciej Janik September 1, 2005

  2. Outline • RDF storages • Jena • Sesame • Redland • Brahms • Indexing RDF • difference from DB indexing • what to index • examples of index types

  3. Storages • Jena • Implemented in Java • Supports RDF, RDFS and OWL • In-memory and persistent storage (Oracle, MySQL, PostgreSQL) • RDQL query language • Reasoning/inference engine • Optimization for common statement patterns: grouping of properties • Powerful, but slow and memory-intensive
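One way to read "grouping of properties" is a property table, as described for Jena2 by Wilkinson et al.: properties that usually occur together on the same subject become columns of one row per subject, and all other statements fall back to a generic triple table. The Python sketch below illustrates only that idea. It is not Jena's API or schema, and the property set `GROUPED` is hypothetical and assumed single-valued.

```python
from collections import defaultdict

# Hypothetical example data: (subject, predicate, object) strings.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
]

# Properties assumed (for illustration) to co-occur and be single-valued.
GROUPED = ("foaf:name", "foaf:mbox")


def build_property_table(triples, grouped):
    """Split triples into one wide row per subject for the grouped
    properties, plus a generic triple table for everything else."""
    table = defaultdict(lambda: dict.fromkeys(grouped))  # missing cells are None
    rest = []
    for s, p, o in triples:
        if p in grouped:
            table[s][p] = o  # a second value would overwrite: single-valued assumption
        else:
            rest.append((s, p, o))
    return dict(table), rest


table, rest = build_property_table(TRIPLES, GROUPED)
```

A query for a subject's name and mailbox then reads one row instead of joining two triple lookups. The price is empty (None) cells for missing values and a separate path for multi-valued properties.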

  4. Storages • Sesame • Implemented in Java • Modules (HTTP/SOAP handler, admin, query, export, Repository Abstraction Layer) • Persistent RDF store • traditional DBMS or dedicated RDF triple storage • Database-independent • Scalable architecture • Node-centric approach • Fast and efficient for a Java implementation

  5. Storages • Redland, together with Rasqal and Raptor • Modular approach • Redland: storage for RDF triples only, plus a low-level API • Implemented in pure C for portability • Rich API and bindings to other languages • Rasqal: RDF query module (RDQL, SPARQL) • Raptor: a very fast RDF parser • Average performance

  6. Storages • Brahms (from the LSDIS lab) • Read-only main-memory storage for RDF • reads RDF and saves an optimized snapshot • Written in C++, optimized for speed • additional bindings to Java • Full indexing of subject-predicate-object • Uses Raptor as the RDF parser • Rich low-level API for graph manipulation • Very fast and memory-efficient • SPARQL support still pending

  7. Brahms • Separation of different resource types: • InstanceNode, Literal, SchemaClass, SchemaProperty • Statements • InstanceStatement (instance – property – instance) • LiteralStatement (instance – property – literal) • TypeOfStatement (instance – type – class) • Taxonomy for classes and properties • Iterators deal with only one type of resource • the instance search algorithm wastes no time checking for literal or type relations
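To make the separation concrete, here is a minimal Python sketch that assumes a hypothetical `TypedStore` with one list per statement kind and a caller-supplied `is_literal` flag (a real store would get this from the parser). The names are illustrative and not Brahms's C++ API.

```python
from dataclasses import dataclass, field


@dataclass
class TypedStore:
    """Keeps each statement kind in its own list (hypothetical layout)."""
    instance: list = field(default_factory=list)  # (instance, property, instance)
    literal: list = field(default_factory=list)   # (instance, property, literal)
    type_of: list = field(default_factory=list)   # (instance, class)

    def add(self, s, p, o, *, is_literal=False):
        # Route each statement by kind once, at load time.
        if p == "rdf:type":
            self.type_of.append((s, o))
        elif is_literal:
            self.literal.append((s, p, o))
        else:
            self.instance.append((s, p, o))

    def neighbours(self, node):
        # Scans only instance-to-instance statements. Literal and type
        # statements live elsewhere, so no per-statement kind check is needed.
        return [o for s, _, o in self.instance if s == node]


store = TypedStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:alice", "foaf:name", "Alice", is_literal=True)
store.add("ex:alice", "rdf:type", "foaf:Person")
print(store.neighbours("ex:alice"))  # ['ex:bob']
```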

  8. Indexing of RDF • RDF = graph • traditional DB indexes may not be sufficient • XML cannot be indexed directly like a relational DB • Indexing may take advantage of the tree structure: • depth of a node • common path from the root • convert each path to a string expression • precalculate the path tree • Simple indexes on statements may also be powerful

  9. What to index? • Most straightforward approach: statements, subject –[predicate] object • Possibilities: • Single (Brahms): SPO SOP OSP OPS PSO POS • Double (Redland): SOP SPO POS
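A "double" index keys on two statement parts and returns the third. The minimal Python sketch below reads the slide's double indexes SOP, SPO and POS as SO→P, SP→O and PO→S, and uses plain dictionaries in place of hash tables. The attribute names `sp2o`, `po2s` and `so2p` are illustrative.

```python
from collections import defaultdict


class DoubleIndex:
    """Three hash indexes, each keyed by two parts of a statement
    and returning the set of values for the third part."""

    def __init__(self):
        self.sp2o = defaultdict(set)  # (subject, predicate) -> objects
        self.po2s = defaultdict(set)  # (predicate, object)  -> subjects
        self.so2p = defaultdict(set)  # (subject, object)    -> predicates

    def add(self, s, p, o):
        self.sp2o[(s, p)].add(o)
        self.po2s[(p, o)].add(s)
        self.so2p[(s, o)].add(p)

    # .get() avoids inserting empty entries into the defaultdicts.
    def objects(self, s, p):
        return self.sp2o.get((s, p), set())

    def subjects(self, p, o):
        return self.po2s.get((p, o), set())

    def predicates(self, s, o):
        return self.so2p.get((s, o), set())


idx = DoubleIndex()
idx.add("ex:alice", "foaf:knows", "ex:bob")
idx.add("ex:carol", "foaf:knows", "ex:bob")
```

Each pattern with two bound terms costs one hash lookup. A pattern with only one bound term, such as all statements that use a given predicate, is not served directly by these three hashes and needs a scan or an extra index.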

  10. Single indexes in Brahms [design]

  11. Power of single indexes • Full indexing of statements • SPO, SOP, PSO, POS, OSP, OPS • indexes for each type of statement (InstanceStatements, LiteralStatements ...) • fast check whether a given resource is connected to another, or uses a given property, by binary search • merge of 2-hop path elements in linear time • All RDF storages are based on simple indexes and their extensions
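The minimal Python sketch below shows the two operations on this slide. It assumes each of the six orderings is kept as a sorted list of permuted tuples, which is an in-memory stand-in and not Brahms's actual layout. A connectivity check becomes a binary search. A 2-hop path becomes a merge of two index slices that are already sorted on the shared middle node.

```python
from bisect import bisect_left
from itertools import permutations

# The six orderings: spo, sop, pso, pos, osp, ops.
ORDERS = ["".join(p) for p in permutations("spo")]


class SortedIndexes:
    def __init__(self, triples):
        # One sorted list of permuted (deduplicated) tuples per ordering.
        self.idx = {
            order: sorted({tuple(t["spo".index(c)] for c in order) for t in triples})
            for order in ORDERS
        }

    def has_prefix(self, order, prefix):
        """Binary search: does any entry of `order` start with `prefix`?"""
        rows = self.idx[order]
        i = bisect_left(rows, prefix)
        return i < len(rows) and rows[i][: len(prefix)] == prefix

    def _slice(self, order, first):
        # Contiguous run of rows whose first component equals `first`.
        rows = self.idx[order]
        lo = hi = bisect_left(rows, (first,))
        while hi < len(rows) and rows[hi][0] == first:
            hi += 1
        return rows[lo:hi]

    def two_hop(self, p1, p2):
        """All (x, y, z) with x -p1-> y and y -p2-> z, via a merge join on y."""
        left = self._slice("pos", p1)   # (p1, y, x), sorted by y
        right = self._slice("pso", p2)  # (p2, y, z), sorted by y
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            yl, yr = left[i][1], right[j][1]
            if yl < yr:
                i += 1
            elif yl > yr:
                j += 1
            else:
                # Find the runs sharing this y on both sides and pair them up.
                i2, j2 = i, j
                while i2 < len(left) and left[i2][1] == yl:
                    i2 += 1
                while j2 < len(right) and right[j2][1] == yl:
                    j2 += 1
                out += [(x, yl, z) for _, _, x in left[i:i2] for _, _, z in right[j:j2]]
                i, j = i2, j2
        return out


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]
ix = SortedIndexes(triples)
```

The merge visits each row of the two slices once. Its cost is therefore linear in their size plus the number of results produced.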

  12. Schema vs. Instances [Brahms] • Schema is small compared to instances • Instance to taxonomy • know or check the type of the instance • Taxonomy index (classes and properties) • direct subtypes/supertypes • all ancestors/descendants • dynamically built index of instances for a given type and all its subtypes
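Below is a minimal Python sketch of such a taxonomy index. It takes `rdfs:subClassOf` pairs and `rdf:type` pairs as input, and the class and method names are hypothetical. Direct subtypes and supertypes are stored. Transitive ancestors and descendants are computed by graph traversal. The instances of a type plus all its subtypes are collected on first request and then cached.

```python
from collections import defaultdict


def _closure(graph, start):
    """All nodes reachable from `start`; terminates even on cycles."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


class Taxonomy:
    def __init__(self, subclass_of, type_of):
        self.children = defaultdict(set)  # class -> direct subclasses
        self.parents = defaultdict(set)   # class -> direct superclasses
        for sub, sup in subclass_of:
            self.children[sup].add(sub)
            self.parents[sub].add(sup)
        self.direct = defaultdict(set)    # class -> instances typed with it directly
        for inst, cls in type_of:
            self.direct[cls].add(inst)
        self._cache = {}                  # class -> instances incl. subclasses

    def descendants(self, cls):
        return _closure(self.children, cls)

    def ancestors(self, cls):
        return _closure(self.parents, cls)

    def instances_of(self, cls):
        # Built on first request, then reused (no invalidation on updates).
        if cls not in self._cache:
            result = set(self.direct.get(cls, ()))
            for sub in self.descendants(cls):
                result |= self.direct.get(sub, set())
            self._cache[cls] = result
        return self._cache[cls]


tax = Taxonomy(
    subclass_of=[("ex:Student", "ex:Person"), ("ex:PhDStudent", "ex:Student")],
    type_of=[("ex:alice", "ex:PhDStudent"), ("ex:bob", "ex:Person")],
)
```

Caching without invalidation is only safe because the store is read-only, as Brahms is described on slide 6.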

  13. Tree-based index • Idea is based on the Patricia trie • Index should scale with the growth of data • Path together with leaf is encoded into a string -> the Index Fabric, „A Fast Index for Semistructured Data” - Brian F. Cooper et al.
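The minimal Python sketch below shows the path-compression idea behind a Patricia trie. It is a character-level radix trie in which each edge carries a whole substring, so chains of single-child nodes collapse into one edge. This is a simplification, since PATRICIA proper works on bits and stores skip positions. The keys reuse the encoded paths from the Index Fabric example for illustration.

```python
class RadixNode:
    def __init__(self):
        self.edges = {}        # first character -> (edge label, child node)
        self.terminal = False  # True if a stored key ends here
        self.value = None


def _common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class RadixTrie:
    """Compressed trie: each edge carries a whole substring."""

    def __init__(self):
        self.root = RadixNode()

    def insert(self, key, value):
        node = self.root
        while key:
            entry = node.edges.get(key[0])
            if entry is None:
                leaf = RadixNode()
                leaf.terminal, leaf.value = True, value
                node.edges[key[0]] = (key, leaf)
                return
            label, child = entry
            common = _common_prefix_len(label, key)
            if common < len(label):
                # Split the edge where the label and the key diverge.
                mid = RadixNode()
                mid.edges[label[common]] = (label[common:], child)
                node.edges[key[0]] = (label[:common], mid)
                child = mid
            node, key = child, key[common:]
        node.terminal, node.value = True, value

    def with_prefix(self, prefix):
        """Yield (key, value) for every stored key that starts with `prefix`."""
        node, consumed = self.root, ""
        while prefix:
            entry = node.edges.get(prefix[0])
            if entry is None:
                return
            label, child = entry
            if label.startswith(prefix):  # prefix ends inside this edge
                node, consumed = child, consumed + label
                break
            if not prefix.startswith(label):
                return
            node, consumed, prefix = child, consumed + label, prefix[len(label):]
        yield from self._collect(node, consumed)

    def _collect(self, node, acc):
        if node.terminal:
            yield acc, node.value
        for label, child in node.edges.values():
            yield from self._collect(child, acc + label)


trie = RadixTrie()
for i, key in enumerate(["A alpha", "A B beta", "A B C gamma"]):
    trie.insert(key, i)
```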

  14. Index fabrics • Index is used to accelerate path expressions, mainly queries that ask for a root-to-leaf path • Idea of prefix encoding • xml: <A>alpha<B>beta<C>gamma</C></B></A> • paths: <A>alpha ; <A><B>beta ; <A><B><C>gamma • encoded: A alpha ; A B beta ; A B C gamma • infix (not common): A alpha B beta C gamma • Convert each path to a string for fast searches • Replace tags with ‘non-terminal’ characters (as in automata)
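The minimal Python sketch below applies prefix encoding to the slide's XML example, under three assumptions. Each tag is replaced by one reserved "designator" character, here a Unicode private-use code point that is assumed never to occur in the data. Only element text is indexed. A sorted list with binary search stands in for the layered trie structure of the actual Index Fabric.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_left


def root_to_leaf_paths(xml_text):
    """(tags, text) for every element with non-blank text; tail text is ignored."""
    paths = []

    def walk(elem, tags):
        tags = tags + [elem.tag]
        if elem.text and elem.text.strip():
            paths.append((tags, elem.text.strip()))
        for child in elem:
            walk(child, tags)

    walk(ET.fromstring(xml_text), [])
    return paths


def encode(paths):
    """Replace each tag with one reserved character and append the leaf text."""
    designators = {}

    def d(tag):
        return designators.setdefault(tag, chr(0xE000 + len(designators)))

    keys = ["".join(d(t) for t in tags) + text for tags, text in paths]
    return sorted(keys), designators


def prefix_search(sorted_keys, prefix):
    """Keys starting with `prefix` are contiguous in sorted order."""
    i = bisect_left(sorted_keys, prefix)
    hits = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        hits.append(sorted_keys[i])
        i += 1
    return hits


keys, des = encode(root_to_leaf_paths("<A>alpha<B>beta<C>gamma</C></B></A>"))
hits = prefix_search(keys, des["A"] + des["B"])  # everything under the path A/B
```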

  15. Indexing of graphs: Backbone (image: http://www.aisee.com/)

  16. Indexing of graphs: Tree-type, prefixes, tries (image: http://www.aisee.com/)

  17. Indexing of graphs: Path templates • 1-index, 2-index, T-index • „Index Structures for Path Expressions” - Tova Milo, Dan Suciu
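Below is a minimal Python sketch of the 1-index idea, assuming a graph given as labeled edges `(source, label, target)`. Nodes are grouped into extents by backward bisimulation, which is roughly how Milo and Suciu's 1-index approximates "same set of incoming label paths". The sketch uses a naive refine-until-stable loop. An efficient implementation would use a partition-refinement algorithm such as Paige-Tarjan. The function name and example graph are hypothetical.

```python
def one_index(nodes, edges):
    """Partition `nodes` into extents of backward-bisimilar nodes.
    `edges` holds (source, label, target) triples."""
    incoming = {n: [] for n in nodes}
    for src, label, dst in edges:
        incoming[dst].append((label, src))

    block = {n: 0 for n in nodes}  # start with a single block
    while True:
        # Signature: current block plus the (label, block) pairs of incoming
        # edges. Including the current block makes each round a refinement.
        ids, new_block = {}, {}
        for n in nodes:
            sig = (block[n], frozenset((lab, block[s]) for lab, s in incoming[n]))
            new_block[n] = ids.setdefault(sig, len(ids))
        stable = len(ids) == len(set(block.values()))
        block = new_block
        if stable:
            break

    extents = {}
    for n in nodes:
        extents.setdefault(block[n], set()).add(n)
    return sorted(extents.values(), key=sorted)


nodes = ["r", "x1", "x2", "y1", "y2", "z"]
edges = [
    ("r", "a", "x1"), ("r", "a", "x2"),
    ("x1", "b", "y1"), ("x2", "b", "y2"),
    ("r", "b", "z"),
]
extents = one_index(nodes, edges)
```

Here z ends up separate from y1 and y2 because its only incoming path from r is b rather than a.b. A query for the path a.b can therefore be answered from the extent {y1, y2} alone.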

  18. Indexing of graphs: Landmarks (image: http://www.aisee.com/)
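The slide only names landmarks. One common way to use them, shown here as a swapped-in technique rather than the presentation's own, is to precompute BFS distances from a few landmark nodes. The distance between any two nodes can then be bounded with the triangle inequality, without searching the graph. This is a minimal Python sketch that assumes an unweighted, undirected graph stored as an adjacency dict. All names are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def landmark_tables(adj, landmarks):
    # Precomputed once: one distance table per landmark.
    return {lm: bfs(adj, lm) for lm in landmarks}


def distance_bounds(tables, u, v):
    """Bounds on d(u, v) from the triangle inequality (undirected graph):
    |d(L,u) - d(L,v)| <= d(u,v) <= d(L,u) + d(L,v) for each landmark L."""
    lower, upper = 0, float("inf")
    for dist in tables.values():
        if u in dist and v in dist:
            lower = max(lower, abs(dist[u] - dist[v]))
            upper = min(upper, dist[u] + dist[v])
    return lower, upper


# A path graph a-b-c-d-e, stored as an undirected adjacency dict.
adj = {}
for x, y in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]:
    adj.setdefault(x, []).append(y)
    adj.setdefault(y, []).append(x)

print(distance_bounds(landmark_tables(adj, ["a"]), "b", "d"))       # (2, 4)
print(distance_bounds(landmark_tables(adj, ["a", "c"]), "b", "d"))  # (2, 2)
```

Adding landmark c tightens the bounds to the exact distance. Most of the design effort goes into choosing landmarks that are well spread over the graph.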

  19. Indexing of graphs • Indexing semistructured data • index fabric - encoding, multilayered • common prefixes - trie structure • backbone - highways between points • landmarks - county division • path templates - precalculated expressions • clustering - grouping by theme access • Indexing such data is NOT easy; the solution depends on how you want to search the graph

  20. References • Beckett, D., „The Design and Implementation of the Redland RDF Application Framework” • Cooper, B. F., et al., „A Fast Index for Semistructured Data” • Janik, M. and Kochut, K., „BRAHMS: A WorkBench RDF Store and High Performance Memory System for Semantic Association Discovery” • Milo, T. and Suciu, D., „Index Structures for Path Expressions” • Wilkinson, K., et al., „Efficient RDF Storage and Retrieval in Jena2” • Jena - http://jena.sourceforge.net/ • Raptor - http://librdf.org/raptor/ • Redland - http://librdf.org/ • Sesame - http://www.openrdf.org/
