
Storing XML using native storage

Storing XML using native storage. Presented by Molato Badr, supervised by Dr. H. Haddouti. Introduction: XML is used more and more frequently, which drives the development of systems that store and query XML data efficiently. Research to improve system performance covers indexing paths and optimizing XML queries.


Presentation Transcript


  1. Storing XML using native storage Presented by Molato Badr Supervised by Dr. H.Haddouti

  2. Introduction
     • XML is used more and more frequently, which drives the development of systems that store and query XML data efficiently
     • Research to improve system performance:
        • Indexing paths
        • Optimizing XML queries
     • The storage configuration of XML data on disk affects the efficiency of an XML data management system

  3. Outline
     • A definition of native storage
     • Several native storage strategies
     • Comparison with DBMS storage

  4. Native storage?
     • Based on XML data models such as the Document Object Model (DOM)
     • NXD: a native XML database is simply a database for storing and accessing XML using XML

  5. NXDs
     • An NXD defines a (logical) model for an XML document, and stores and retrieves documents according to that model.
     • It has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
     • Documents go in and documents come out. Thus an NXD may not actually be a standalone database at all.
     • An NXD is intended to help the developer by providing robust storage and manipulation of XML documents.
     • NXDs manage collections of documents, allowing you to query and manipulate those documents as a set.

  6. Native storage strategies
     • Schema-independent strategies:
        • Subtree-based strategy (Natix)
        • Document-based strategy (Apache Xindice system)
        • Element-based strategy (TIMBER): each element node is a record
     • OrientStore: two schema-guided storage strategies:
        • Element-Based Clustering (EBC)
        • Logical Partition-Based Clustering (LPC)

  7. Subtree-based strategy (Natix)
     • Natix (University of Mannheim, Germany): semantically partitions a large document into subtrees based on its tree structure
     • Stores each subtree in one record (the atomic unit of storage)
     • Proxy nodes connect subtrees stored in different records
     • Provides primitives to read, write, insert and delete elements
     • The record size need not be statically configured; it can be a dynamic value that adapts to the size and structure of the document at runtime
     • The original tree is reconstructed by replacing proxies with their subtrees
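
The partitioning and reconstruction steps on this slide can be illustrated with a small sketch. It is not Natix's actual split algorithm: the greedy "move the largest child out" rule, the `max_nodes` limit, and the names `Node`, `Proxy`, `partition` and `reconstruct` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    """An XML element in the logical tree (text and attributes omitted)."""
    tag: str
    children: List[Union["Node", "Proxy"]] = field(default_factory=list)


@dataclass
class Proxy:
    """Placeholder pointing at the record that holds a detached subtree."""
    record_id: int


def size(node: Node) -> int:
    """Nodes stored in this record; a proxy occupies one slot."""
    return 1 + sum(size(c) if isinstance(c, Node) else 1 for c in node.children)


def partition(root: Node, max_nodes: int) -> Dict[int, Node]:
    """Split the tree into records; record 0 always holds the root."""
    records: Dict[int, Node] = {}
    next_id = [1]

    def shrink(node: Node) -> Node:
        # Work bottom-up on a copy, so the caller's tree is left untouched.
        out = Node(node.tag, [shrink(c) for c in node.children])
        while size(out) > max_nodes:
            # Only subtrees larger than one node save space when detached.
            candidates = [c for c in out.children
                          if isinstance(c, Node) and size(c) > 1]
            if not candidates:
                break  # cannot shrink further; accept an oversized record
            biggest = max(candidates, key=size)
            rid = next_id[0]
            next_id[0] += 1
            records[rid] = biggest
            idx = next(i for i, c in enumerate(out.children) if c is biggest)
            out.children[idx] = Proxy(rid)
        return out

    records[0] = shrink(root)
    return records


def reconstruct(records: Dict[int, Node], record_id: int = 0) -> Node:
    """Rebuild the original tree by replacing every proxy with its subtree."""
    def expand(item: Union[Node, Proxy]) -> Node:
        if isinstance(item, Proxy):
            return expand(records[item.record_id])
        return Node(item.tag, [expand(c) for c in item.children])
    return expand(records[record_id])


doc = Node("doc", [
    Node("chapter", [Node("sec"), Node("sec"), Node("sec")]),
    Node("chapter", [Node("sec"), Node("sec")]),
    Node("title"),
])
records = partition(doc, max_nodes=4)
```

With this input, the two `chapter` subtrees end up in their own records and the root record keeps two proxies plus the `title` element; `reconstruct(records)` yields a tree equal to `doc`.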

  8. Document-based strategy (Apache Xindice system)
     • No mapping to a relational schema is required
     • Stores documents in tokenized form
     • Provides quick fragment retrieval
     • Supports optimized XML querying

  9. Document-based strategy (Apache Xindice system), cont.
     • The basic unit of data is a Document
     • Sets of Documents are Collections
     • Collections may contain Collections
     • Think of it as a file system for XML
     • Collections may be indexed
     • Collections may maintain XMLObjects
     • XMLObjects are like stored procedures
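
The "file system for XML" idea can be sketched as nested collections that each hold whole documents. This is an in-memory illustration only and does not use the Xindice API; the `Collection` class and its methods are hypothetical, and queries rely on the limited path syntax of Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


class Collection:
    """Hypothetical in-memory stand-in for a collection of XML documents."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.documents: Dict[str, ET.Element] = {}     # key -> parsed document
        self.children: Dict[str, "Collection"] = {}    # nested collections

    def create_collection(self, name: str) -> "Collection":
        """Return the child collection with this name, creating it if needed."""
        if name not in self.children:
            self.children[name] = Collection(name)
        return self.children[name]

    def insert(self, key: str, xml_text: str) -> None:
        """Store a whole document under a key; the document is the unit of storage."""
        self.documents[key] = ET.fromstring(xml_text)

    def resolve(self, path: str) -> "Collection":
        """Walk a slash-separated path, much like a directory lookup."""
        node = self
        for part in filter(None, path.split("/")):
            node = node.children[part]
        return node

    def query(self, path_expr: str) -> List[ET.Element]:
        """Evaluate a path expression against every document in this collection."""
        hits: List[ET.Element] = []
        for doc in self.documents.values():
            hits.extend(doc.findall(path_expr))
        return hits


root = Collection("db")
books = root.create_collection("library").create_collection("books")
books.insert("b1", "<book><title>Native XML</title></book>")
books.insert("b2", "<book><title>Databases</title></book>")
titles = [e.text for e in root.resolve("/library/books").query("title")]
```

Here `titles` collects the `title` text of every document in the `books` collection, which is the sense in which documents are queried "as a set".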

  10. Element-based strategy (TIMBER)

  11. Element-based strategy (TIMBER)
     • Built on Shore (which is responsible for disk management)
     • Takes an XML document as input and produces a parse tree as output
     • Takes each node of this parse tree as it is produced and transforms it into an internal representation
     • Stores it in Shore as an atomic unit of storage
     • Each node corresponds to an element; child nodes correspond to sub-elements
     • All attributes of an element node are grouped into a single node, stored as a child node of that element
     • The content of an element node is pulled out into a child node
     • Mixed content: each piece of text is pulled out into a separate child node
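
The node-to-record mapping described above can be sketched with Python's standard `xml.etree.ElementTree` parser. This is a rough illustration, not TIMBER's internal representation: the `Record` layout, the `kind` labels and the sequential `node_id` numbering are assumptions, and Shore is replaced by a plain list.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    node_id: int
    parent_id: Optional[int]
    kind: str       # "element", "attributes" or "content"
    label: str      # tag name for elements, empty otherwise
    value: object   # dict for attributes, str for content, None for elements


def to_records(xml_text: str) -> List[Record]:
    """Turn every parse-tree node into one record (one atomic storage unit)."""
    records: List[Record] = []

    def emit(parent_id: Optional[int], kind: str, label: str, value: object) -> int:
        rec = Record(len(records), parent_id, kind, label, value)
        records.append(rec)
        return rec.node_id

    def visit(elem: ET.Element, parent_id: Optional[int]) -> None:
        me = emit(parent_id, "element", elem.tag, None)
        if elem.attrib:
            # All attributes are grouped into a single child node.
            emit(me, "attributes", "", dict(elem.attrib))
        if elem.text and elem.text.strip():
            # Element content is pulled out into its own child node.
            emit(me, "content", "", elem.text.strip())
        for child in elem:
            visit(child, me)
            if child.tail and child.tail.strip():
                # Mixed content: each text run becomes a separate child node.
                emit(me, "content", "", child.tail.strip())

    visit(ET.fromstring(xml_text), None)
    return records


records = to_records(
    '<book year="2001" lang="en"><title>XML</title>intro <b>bold</b> outro</book>'
)
```

For this input the `book` element gets one attributes child, two element children (`title`, `b`) and two separate content children (`intro`, `outro`), in document order.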

  12. Schema-guided strategy (OrientStore)
     • EBC (Element-Based Clustering): similar to the element-based strategy, but clusters the element records so that records with the same schemaNodeID are stored together.
     • LPC (Logical Partition-Based Clustering): partitions the schema graph into semantic blocks.
     • A semantic block describes a relatively integrated logical unit.

  13. EBC (Element-Based Clustering)
     • For example, all the title elements are clustered together, along with their text values.
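
A minimal sketch of element-based clustering follows. It assumes, purely for illustration, that a schemaNodeID can be approximated by the root-to-element tag path; the function name and the sample document are likewise hypothetical and not taken from OrientStore.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_by_schema_node(xml_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group element records by schema node, here approximated by tag path."""
    clusters: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def visit(elem: ET.Element, path: str) -> None:
        schema_node_id = f"{path}/{elem.tag}"   # assumed stand-in for schemaNodeID
        clusters[schema_node_id].append((elem.tag, (elem.text or "").strip()))
        for child in elem:
            visit(child, schema_node_id)

    visit(ET.fromstring(xml_text), "")
    return dict(clusters)


xml_text = (
    "<vendor><name>n</name>"
    "<book><title>t1</title><publisher>p1</publisher></book>"
    "<book><title>t2</title><publisher>p2</publisher></book>"
    "</vendor>"
)
clusters = cluster_by_schema_node(xml_text)
```

Every `title` record lands in the `/vendor/book/title` cluster together with its text value, regardless of which `book` it came from.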

  14. LPC (Logical Partition-Based Clustering)
     • book and its children title and publisher form a semantic block.
     • Records are instances of the semantic blocks formed: v (n, b1, b2) is an instance of vendor (name, book).

  15. Logical Partition-Based Clustering
     • All the instances of the same semantic block are clustered together. Thus the records b1 (p1, t1) and b2 (p2, t2) in Figure 2(b) will be stored in one physical page.
     • v (n, b1, b2) may be stored in another physical page.
     • N.B.: LPC lies between the subtree-based strategy and the element-based strategy.
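
The vendor/book example can be sketched as follows. The set of block roots is supplied by hand here (how OrientStore derives semantic blocks from the schema graph is not shown on the slides), a "page" is modelled as a list per block, and the sketch assumes that elements inside a block are simple text leaves. All names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Assumed partition of the schema graph: vendor(name, book) and book(title, publisher).
BLOCK_ROOTS = {"vendor", "book"}


def build_block_records(xml_text: str) -> Dict[str, List[dict]]:
    """Create one record per semantic-block instance, clustered by block."""
    pages: Dict[str, List[dict]] = defaultdict(list)
    counters: Dict[str, int] = defaultdict(int)

    def make_record(elem: ET.Element) -> str:
        counters[elem.tag] += 1
        rec_id = f"{elem.tag[0]}{counters[elem.tag]}"   # e.g. v1, b1, b2
        fields: List[tuple] = []
        pages[elem.tag].append({"id": rec_id, "fields": fields})
        for child in elem:
            if child.tag in BLOCK_ROOTS:
                # A nested block becomes its own record; keep only a reference.
                fields.append(("ref", make_record(child)))
            else:
                # Elements inside the block are stored inline in the record.
                fields.append((child.tag, (child.text or "").strip()))
        return rec_id

    make_record(ET.fromstring(xml_text))
    return dict(pages)


pages = build_block_records(
    "<vendor><name>n</name>"
    "<book><title>t1</title><publisher>p1</publisher></book>"
    "<book><title>t2</title><publisher>p2</publisher></book>"
    "</vendor>"
)
```

Both book records sit in the same `book` page, while the vendor record lives in a separate page and refers to them by identifier, mirroring v (n, b1, b2) on the slide.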

  16. Comparison with DBMS
