XML Storage and Indexing Native XML

XML Storage and Indexing Native XML Ervin Domazet Ahmed Cavit Zafar Hakan Demir

Outline • Two major classes of XML database • Data versus Documents & their influence on storing XML data • XML-enabled databases • Storing and Retrieving Data in XML-enabled databases • Mapping Document Schemas to Database Schemas • XML Query Languages • Native XML databases • Storing Data in a Native XML Database

Two major classes of XML database • XML-enabled: These map all XML to a traditional database (such as a relational database), accepting XML as input and rendering XML as output. The term implies that the database does the conversion itself (as opposed to relying on middleware).

Two major classes of XML database • Native XML (NXD): The internal model of such databases depends on XML and uses XML documents as the fundamental unit of storage; these documents are not necessarily stored as text files.

Data versus Documents & their influence on storing XML data • Characterizing documents as data-centric or document-centric helps decide what kind of database to use. • As a general rule, data is stored in a traditional database, such as a relational, object-oriented, or hierarchical database. • The transfer can be done by third-party middleware or by capabilities built into the database itself.

Data versus Documents & their influence on storing XML data • Documents, by contrast, are stored in a native XML database (a database designed especially for storing XML) or a content management system (an application designed to manage documents and built on top of a native XML database).

A note on traditional databases and native XML databases • The boundaries between traditional (XML-enabled) databases and native XML databases are beginning to blur, as traditional databases add native XML capabilities and native XML databases support the storage of document fragments in external (usually relational) databases.

Data-Centric Documents • A data-centric XML document contains regular, repeating elements. Child elements of a given element might all have the same tag name, or they might not. Typically, child element order doesn’t matter. Examples are plentiful: many transforms of a relational database to XML result in data-centric XML.

Data-Centric Documents-Example
<Customers>
  <Customer>
    <Name>Bob</Name>
    <Age>45</Age>
  </Customer>
  <Customer>
    <Name>Jill</Name>
    <Age>37</Age>
  </Customer>
</Customers>

Document-Centric Documents • In document-centric XML documents, the child elements of a given element are much less bounded: you might have many child elements of a given name, or you might have none. • You might have ‘recursion’ in the hierarchy: element A is a child of element B, which is itself a child of a different element A. Examples include Open XML word processing markup, XHTML, and XPS.

Document-Centric Documents-Example • An Open XML paragraph that contains some bold text:
<w:p>
  <w:r><w:t>abc</w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>def</w:t></w:r>
  <w:r><w:t>ghi</w:t></w:r>
</w:p>
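To make the role of element order concrete, here is a minimal Python sketch that reads the paragraph above and lists its runs in document order together with a bold flag. The function name `runs_in_order` is hypothetical, and the `xmlns:w` declaration is added here because the slide omits it; without a namespace binding the fragment would not parse.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace; the slide's fragment leaves the w: prefix unbound.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

PARAGRAPH = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>abc</w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>def</w:t></w:r>"
    "<w:r><w:t>ghi</w:t></w:r>"
    "</w:p>"
)

def runs_in_order(xml_text):
    """Return (text, is_bold) pairs for each run, in document order."""
    ns = {"w": W}
    paragraph = ET.fromstring(xml_text)
    runs = []
    for run in paragraph.findall("w:r", ns):
        text = run.findtext("w:t", default="", namespaces=ns)
        is_bold = run.find("w:rPr/w:b", ns) is not None
        runs.append((text, is_bold))
    return runs

print(runs_in_order(PARAGRAPH))
```

Reordering the three runs would change the text the reader sees, which is exactly what distinguishes this document from the customer list, where row order carries no meaning.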

XML-enabled Databases

Storing & Retrieving DATA in XML-enabled databases • When transferring data between XML documents and a database, it is necessary to map the XML document schema (DTD, XML Schemas, RELAX NG, etc.) to the database schema. • The data transfer software is then built on top of this mapping. The software may use an XML query language (such as XPath, XQuery, or a proprietary language) or simply transfer data according to the mapping (the XML equivalent of SELECT * FROM Table).

Storing & Retrieving DATA in XML-enabled databases • When data is simply transferred according to the mapping, the structure of the document must exactly match the structure expected by the mapping. Since this is often not the case, products that use this strategy are often used with XSLT. That is, before transferring data to the database, the document is first transformed to the structure expected by the mapping; the data is then transferred. • Similarly, after transferring data from the database, the resulting document is transformed to the structure needed by the application.

Mapping Document Schemas to Database Schemas • Mappings between document schemas and database schemas are performed on element types, attributes, and text. They almost always omit physical structure (entities, CDATA sections, and encoding info) and some logical structure (processing instructions, comments). • This is more reasonable than it may sound, as the database and application are concerned only with the data in the XML document.

Mapping Document Schemas to Database Schemas • Two mappings are commonly used to map an XML document schema to the database schema: • Table-based mapping • Object-relational mapping

Table-based mapping • The table-based mapping is used by many of the middleware products that transfer data between an XML document and a relational database. It models XML documents as a single table or set of tables. That is, the structure of an XML document must be as follows, where the <database> element and additional <table> elements do not exist in the single-table case:

<database>
  <table>
    <row>
      <column1>...</column1>
      <column2>...</column2>
      ...
    </row>
    <row>
      ...
    </row>
    ...
  </table>
  <table>
    ...
  </table>
  ...
</database>
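The table-based mapping can be sketched in a few lines of Python with the standard `sqlite3` and `xml.etree.ElementTree` modules. This is only an illustration under stated assumptions, not how any particular middleware product works: the function name `load_tables` and the sample document are hypothetical, every column is stored as untyped text, and the first row of each table is assumed to list all of its columns.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document in the <database>/<table>/<row>/<column> shape shown above.
DOCUMENT = """<database>
  <Customers>
    <Customer><Name>Bob</Name><Age>45</Age></Customer>
    <Customer><Name>Jill</Name><Age>37</Age></Customer>
  </Customers>
</database>"""

def load_tables(xml_text, conn):
    """Create one table per child of the root and insert one row per grandchild."""
    root = ET.fromstring(xml_text)
    for table in root:
        rows = list(table)
        if not rows:
            continue
        # Assumption: the first row names every column of the table.
        columns = [column.tag for column in rows[0]]
        # XML names cannot contain double quotes, so quoting them as SQL
        # identifiers is sufficient for this sketch.
        column_list = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table.tag}" ({column_list})')
        placeholders = ", ".join("?" for _ in columns)
        for row in rows:
            values = [row.findtext(c) for c in columns]
            conn.execute(f'INSERT INTO "{table.tag}" VALUES ({placeholders})', values)

conn = sqlite3.connect(":memory:")
load_tables(DOCUMENT, conn)
print(conn.execute('SELECT "Name", "Age" FROM "Customers" ORDER BY rowid').fetchall())
```

A document that does not follow this exact nesting cannot be loaded this way, which is why such tools are often paired with a transformation step.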

Example for table-based Mapping

Object-relational mapping • The object-relational mapping is used by all XML-enabled relational databases and some middleware products. • It models the data in the XML document as a tree of objects that are specific to the data in the document. • In this model, element types with attributes, element content, or mixed content (complex element types) are generally modeled as classes. Element types with PCDATA-only content (simple element types), attributes, and PCDATA are modeled as scalar properties.

Object-relational mapping • The model is then mapped to relational databases using traditional object-relational mapping techniques: • classes are mapped to tables, • scalar properties are mapped to columns, • object-valued properties are mapped to primary key / foreign key pairs.
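As a rough illustration of these rules, the sketch below models a small order document as Python dataclasses and stores it in SQLite. Everything named here (the `Order` and `Item` classes, the sample document, the table and column names) is hypothetical; real products generate or configure such mappings rather than hand-coding them.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

ORDER_XML = """<Order Number="1234">
  <Date>2003-07-29</Date>
  <Item><Part>A-10</Part><Quantity>2</Quantity></Item>
  <Item><Part>B-20</Part><Quantity>1</Quantity></Item>
</Order>"""

@dataclass
class Item:                # complex element type -> class
    part: str              # simple element type -> scalar property
    quantity: int

@dataclass
class Order:               # complex element type -> class
    number: int            # attribute -> scalar property
    date: str              # simple element type -> scalar property
    items: list = field(default_factory=list)  # object-valued property

def parse_order(xml_text):
    """Build the data-specific object tree from the document."""
    root = ET.fromstring(xml_text)
    items = [Item(i.findtext("Part"), int(i.findtext("Quantity")))
             for i in root.findall("Item")]
    return Order(int(root.get("Number")), root.findtext("Date"), items)

def store_order(order, conn):
    """Classes become tables, scalars become columns, Order.items becomes a key pair."""
    conn.execute("CREATE TABLE orders (number INTEGER PRIMARY KEY, date TEXT)")
    conn.execute(
        "CREATE TABLE items ("
        "order_number INTEGER REFERENCES orders(number), part TEXT, quantity INTEGER)"
    )
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order.number, order.date))
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(order.number, item.part, item.quantity) for item in order.items],
    )

conn = sqlite3.connect(":memory:")
store_order(parse_order(ORDER_XML), conn)
print(conn.execute(
    "SELECT o.number, i.part, i.quantity FROM orders o "
    "JOIN items i ON i.order_number = o.number ORDER BY i.part"
).fetchall())
```

The join in the final query reassembles the parent/child relationship that the document expressed through nesting.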

Possible mappings from an XML structure to an object-relational schema

Object-relational mapping • It is important to understand that the object model used in this mapping is not the Document Object Model (DOM). • The DOM models the document itself and is the same for all XML documents, while the model described above models the data in the document and is different for each set of XML documents that conforms to a given XML schema.
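The difference can be seen with Python's `xml.dom.minidom`, used here only as a convenient DOM implementation. In this small sketch (the helper name `dom_node_classes` and both sample documents are hypothetical), two unrelated documents produce element nodes of the same generic class, whereas a data-specific model would need different classes for each schema.

```python
from xml.dom import minidom

def dom_node_classes(xml_text):
    """Collect the class names of all element nodes the DOM builds for a document."""
    doc = minidom.parseString(xml_text)
    names = set()
    stack = [doc.documentElement]
    while stack:
        node = stack.pop()
        if node.nodeType == node.ELEMENT_NODE:
            names.add(type(node).__name__)
            stack.extend(node.childNodes)
    return names

customers = "<Customers><Customer><Name>Bob</Name></Customer></Customers>"
order = '<Order Number="1234"><Item><Part>A-10</Part></Item></Order>'

# Both documents are represented with the same generic node class.
print(dom_node_classes(customers), dom_node_classes(order))
```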

Query Languages • Template-based query languages (*) • SQL-based query languages (*) • XML query languages (**): XQuery and XPath. (*) Can only be used with relational databases. (**) Can be used over any XML document.

XQuery • With XQuery, either a table-based mapping or an object-relational mapping can be used. • If a table-based mapping is used, each table is treated as a separate document and joins between tables (documents) are specified in the query itself, as in SQL. • If an object-relational mapping is used, hierarchies of tables are treated as a single document and joins are specified in the mapping.

XPath • With XPath, an object-relational mapping must be used to do queries across more than one table. This is because XPath does not support joins across documents. Thus, if the table-based mapping were used, it would be possible to query only one table at a time.
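The following sketch shows why a hierarchy exposed as a single document suits XPath: the relationship between orders and items is already expressed by nesting, so one path expression is enough and no join is needed. The sample document is hypothetical, and Python's `xml.etree.ElementTree` is used for convenience even though it implements only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical view of two related tables (orders and items) as one document.
ORDERS = ET.fromstring("""<Orders>
  <Order Number="1">
    <Customer>Bob</Customer>
    <Item><Part>A-10</Part></Item>
  </Order>
  <Order Number="2">
    <Customer>Jill</Customer>
    <Item><Part>B-20</Part></Item>
    <Item><Part>C-30</Part></Item>
  </Order>
</Orders>""")

# Navigate from an order to its items purely by following the hierarchy.
parts = [p.text for p in ORDERS.findall("./Order[Customer='Jill']/Item/Part")]
print(parts)
```

Had orders and items been exposed as two separate documents, the same question would require combining results across documents, which XPath alone cannot express.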

The overall picture of storage in XML-enabled databases

Native XML Databases

Storing Data in a Native XML Database • A Native XML database: • Defines a (logical) model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. Examples of such models include the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX 1.0.

Storing Data in a Native XML Database • A Native XML database: • Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.

Storing Data in a Native XML Database • A Native XML database: • Need not have any particular underlying physical storage model. For example, NXDs can use relational, hierarchical, or object-oriented database structures, or use a proprietary storage format (such as indexed, compressed files).
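The three properties above can be illustrated with a toy in-memory collection. This is a hypothetical sketch, not the design of any real native XML database: the class `ToyCollection` and its `put`, `get`, and `query` methods are invented for illustration, the "physical storage" is just a dictionary of parsed trees, and queries use the XPath subset supported by `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

class ToyCollection:
    """Hypothetical collection whose logical unit of storage is a whole document."""

    def __init__(self):
        # The physical representation is hidden from callers; here it is a
        # dict of parsed trees rather than text files.
        self._docs = {}

    def put(self, name, xml_text):
        self._docs[name] = ET.fromstring(xml_text)

    def get(self, name):
        """Return the stored document serialized back to XML text."""
        return ET.tostring(self._docs[name], encoding="unicode")

    def query(self, path):
        """Run a path against every document, yielding (name, element) pairs."""
        for name, root in self._docs.items():
            for element in root.findall(path):
                yield name, element

collection = ToyCollection()
memo1 = "<memo><to>Bob</to><body>Hi <b>there</b></body></memo>"
collection.put("memo1", memo1)
collection.put("memo2", "<memo><to>Jill</to><body>See you</body></memo>")

print(collection.get("memo1"))
print([(name, element.text) for name, element in collection.query("to")])
```

Callers store and retrieve whole documents and never see how they are held internally, which is the sense in which the underlying physical model is left open.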

Advantages of Native XML Databases • The first reason to store data in a native XML database is that your data is semi-structured. That is, it has a regular structure, but that structure varies enough that mapping it to a relational database results in either a large number of columns with null values (which wastes space) or a large number of tables (which is inefficient).
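The null-column problem is easy to reproduce. The sketch below flattens a small, hypothetical product catalogue into a single table whose columns are the union of all child element names, then counts the empty cells; the document and the helper name `flatten` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured data: every product has a Name, little else is shared.
PRODUCTS = ET.fromstring("""<Products>
  <Product><Name>Lamp</Name><Wattage>40</Wattage></Product>
  <Product><Name>Desk</Name><Width>120</Width><Depth>60</Depth></Product>
  <Product><Name>Novel</Name><Author>A. Writer</Author><Pages>300</Pages></Product>
</Products>""")

def flatten(products):
    """Map all products to one table whose columns are the union of child tags."""
    columns = []
    for product in products:
        for child in product:
            if child.tag not in columns:
                columns.append(child.tag)
    # findtext returns None where a product lacks the column, i.e. a NULL cell.
    rows = [[product.findtext(c) for c in columns] for product in products]
    return columns, rows

columns, rows = flatten(PRODUCTS)
null_cells = sum(value is None for row in rows for value in row)
print(f"{null_cells} of {len(columns) * len(rows)} cells are NULL")
```

With only three products, more than half of the cells are already empty; the alternative of one table per product kind avoids the nulls but multiplies the number of tables.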

Advantages of Native XML Databases • A second reason to store data in a native XML database is retrieval speed. Depending on how the native XML database physically stores data, it might be able to retrieve data much faster than a relational database.

Advantages of Native XML Databases • A third reason to store data in a native XML database is that you want to exploit XML-specific capabilities, such as executing XML queries. Given that few data-centric applications need this today and that relational databases are implementing XML query languages, this reason is less compelling.

Disadvantages of Native XML Databases • One problem with storing data in a native XML database is that most native XML databases can only return the data as XML. • If your application needs the data in another format (which is likely), it must parse the XML before it can use the data.

Disadvantages of Native XML Databases • This parsing overhead is clearly a disadvantage for local applications that use a native XML database instead of a relational database. • It is not a problem for distributed applications that use XML as a data transport, since they must incur this overhead regardless of what type of database is used.

The overall picture of storage in Native XML databases

References • Ronald Bourret, "XML and Databases", http://www.rpbourret.com/xml/XMLAndDatabases.htm • Jenn Riley, "“Data-centric” vs. “Document-centric” XML", http://techessence.info/node/51 • Wikipedia, "XML database", http://en.wikipedia.org/wiki/XML_database

Next • Data Types, Null Values, Character Sets, and All That Stuff
