XML-to-Relational Schema Mapping Algorithm ODTDMap

Speaker: Artem Chebotko • Email: artem@wayne.edu • Wayne State University • Joint work with Mustafa Atay, Shiyong Lu, and Farshad Fotouhi

Introduction • XML has emerged as the standard for representing and exchanging data on the World Wide Web. • The growing number of XML documents creates a need to store and query them efficiently.

Current approaches to storing and querying XML documents • Native XML repositories, e.g., Software AG’s Tamino and eXcelon’s XIS. • XML-enabled commercial database systems such as SQL Server, Oracle, and DB2. • Using an RDBMS or ODBMS to store and query XML documents.

Issues of the relational approach • Schema mapping: the XML data model needs to be mapped into the relational model. • Data mapping: XML documents need to be shredded and composed into tuples to be inserted into the relational database. • Query mapping: XML queries need to be translated into SQL queries. • Reverse data mapping: query results need to be tagged back into XML format.

Our contributions • We propose a schema mapping algorithm, ODTDMap, which generates a relational schema from an XML DTD for storing and querying ordered XML documents. • Improvements over the existing algorithms: • Losslessness • Efficient support for XML queries • Completeness (recursion, set-valued attributes, DTD operators)

Outline of the talk • Introduction to XML DTDs • Mapping DTDs to relational schemas • Simplifying DTDs • Creating and inlining DTD graphs • Generating relational schemas • An example • Conclusions and future work

An overview of DTDs: a DTD example
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, date, subject?, body)>
  <!ATTLIST memo security CDATA>
  <!ATTLIST memo lang CDATA>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (para+)>
  <!ELEMENT para (#PCDATA)>
]>

DTD: Document Type Definition • <!DOCTYPE root-element [ doctype-declaration... ]> • <!ELEMENT element-name content-model>, where the content model is built with the operators “|”, “,”, “*”, “+”, “?” • <!ATTLIST element-name attr-name attr-type attr-default ...>

DTD: Document Type Definition (con’t) • <!ATTLIST element-name attr-name attr-type attr-default ...> declares which attributes are allowed or required in which elements. • Attribute types: • CDATA: any string value is allowed • (value|...): enumeration of allowed values • ID, IDREF, IDREFS: ID attribute values must be unique (they carry "element identity"); IDREF attribute values must match some ID (a reference to an element); IDREFS holds a whitespace-separated list of such references • ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these (consider them deprecated) • Attribute defaults: • #REQUIRED: the attribute must be explicitly provided • #IMPLIED: the attribute is optional, no default provided • "value": if not explicitly provided, this value is inserted by default • #FIXED "value": as above, but only this value is allowed

Mapping DTDs to relational schemas • Simplifying DTDs • Creating and inlining DTD graphs • Generating relational schemas

Simplifying DTDs • A DTD might be very complex due to nesting, e.g., <!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)> • An XML query language is concerned with: • The parent-child relationships between XML elements • The relative order of siblings (preserved by adding an ordinal attribute to each relation)
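To picture the ordinal attribute mentioned above, here is a minimal sketch using Python's standard `sqlite3` module. The table name `para` and the columns `id`, `parent_id`, `ordinal`, and `value` are hypothetical and chosen only for illustration; they are not the schema that ODTDMap actually generates. The point is only that sibling order lost by the relational model can be recovered by sorting on a stored position.

```python
import sqlite3


def make_para_table():
    """Create an in-memory table with a hypothetical ordinal column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE para ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER,"   # which parent element the row belongs to
        " ordinal INTEGER,"     # position among siblings in the document
        " value TEXT)"
    )
    return conn


def children_in_document_order(conn, parent_id):
    """Return the stored values of one parent's children, sorted by ordinal."""
    cursor = conn.execute(
        "SELECT value FROM para WHERE parent_id = ? ORDER BY ordinal",
        (parent_id,),
    )
    return [value for (value,) in cursor]


conn = make_para_table()
# Rows are inserted out of order on purpose; the ordinal restores the order.
conn.executemany(
    "INSERT INTO para (parent_id, ordinal, value) VALUES (?, ?, ?)",
    [(1, 2, "second"), (1, 1, "first"), (1, 3, "third")],
)
print(children_in_document_order(conn, 1))
```

Without the `ordinal` column, the relational model gives no guarantee about the order in which sibling rows come back.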

DTD simplification rules
1. e+ → e*
2. e? → e
3. (e1 | … | en) → (e1, …, en)
4. (a) (e1, …, en)* → (e1*, …, en*)
   (b) e** → e*
5. (a) …, e, …, e, … → …, e*, …, …
   (b) …, e, …, e*, … → …, e*, …, …
   (c) …, e*, …, e, … → …, e*, …, …
   (d) …, e*, …, e*, … → …, e*, …, …

Example of simplifying a DTD • <!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)> is simplified to <!ELEMENT a (b*, c*, d, e, f, g*, h*)>
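The simplification can be sketched in a few lines of Python. This is an illustrative reading of the five rules, not the authors' implementation. The tuple encoding of content models (`"elem"`, `"seq"`, `"alt"`, `"rep"`) and the function names are assumptions made for this sketch.

```python
def flatten(model, starred=False):
    """Return a flat list of (name, is_starred) pairs for a content model.

    Hypothetical encoding: ("elem", name), ("seq", [children]),
    ("alt", [children]), ("rep", op, child) with op in "*", "+", "?".
    """
    kind = model[0]
    if kind == "elem":
        return [(model[1], starred)]
    if kind == "rep":
        _, op, child = model
        # Rule 1 (e+ -> e*), rule 2 (e? -> e), and rule 4 (a star is pushed
        # down to every member of a group, and e** collapses to e*).
        return flatten(child, starred or op in ("*", "+"))
    # Rule 3: a choice group is treated like a sequence.
    items = []
    for child in model[1]:
        items.extend(flatten(child, starred))
    return items


def simplify(model):
    """Apply rule 5: repeated names merge into one starred occurrence."""
    merged = {}
    for name, star in flatten(model):
        # A second occurrence always yields e*, kept at the first position.
        merged[name] = True if name in merged else star
    return ", ".join(name + ("*" if star else "") for name, star in merged.items())


def e(name):
    return ("elem", name)


# The content model of element a from the example slide.
model_a = ("seq", [
    ("rep", "?", ("seq", [("rep", "+", e("b")), ("rep", "*", e("c")), ("rep", "?", e("d"))])),
    ("rep", "?", ("seq", [
        ("rep", "?", e("e")),
        e("f"),
        ("rep", "+", ("seq", [("rep", "*", e("g")), ("rep", "?", e("h"))])),
    ])),
])
print(simplify(model_a))  # b*, c*, d, e, f, g*, h*
```

Carrying a single `starred` flag down the tree is enough because, after rules 1 and 2, the only information left about an element is whether it may occur more than once.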

Creating and inlining DTD graphs • We create a DTD graph based on the simplified DTD. • Definition 3.2 (DTD graph) The structure of a DTD can be represented by a labeled graph, in which nodes represent elements and attributes, and edges represent their parent-child relationships. The edges are labeled by either '*' (star edge) or ',' (normal edge), where the label ',' is not shown for simplicity. • Idea: inline a child c into its parent p if p can contain at most one occurrence of c. • Rationale: a node together with the elements inlined into it produces a single relation.

Inlinable node and subtree, shared node • Definition 3.3 (Inlinable node) Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge. • Definition 3.4 (Inlinable subtree) Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree. This subtree is called the inlinable subtree for the node e (it is rooted at e). • Definition 3.5 (Shared node) Given a DTD graph, a node is called a shared node if it has more than one incoming edge.
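The three definitions can be made concrete with a small Python sketch over an edge list. The edge encoding `(parent, child, label)`, the attribute node names `memo.security` and `memo.lang`, and both function names are assumptions for illustration. The traversal follows normal edges only through inlinable nodes, which is one reading of Definition 3.4.

```python
from collections import defaultdict


def classify(edges):
    """Return (inlinable, shared) node sets per Definitions 3.3 and 3.5.

    edges: list of (parent, child, label) with label "*" or ",".
    """
    incoming = defaultdict(list)
    for parent, child, label in edges:
        incoming[child].append(label)
    # Inlinable: exactly one incoming edge, and that edge is a normal edge.
    inlinable = {n for n, labels in incoming.items() if labels == [","]}
    # Shared: more than one incoming edge of any kind.
    shared = {n for n, labels in incoming.items() if len(labels) > 1}
    return inlinable, shared


def inlinable_subtree(edges, root, inlinable):
    """Collect root plus the inlinable nodes reachable from it by normal edges."""
    children = defaultdict(list)
    for parent, child, label in edges:
        if label == ",":
            children[parent].append(child)
    subtree, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child in inlinable and child not in subtree:
                subtree.add(child)
                stack.append(child)
    return subtree


# DTD graph of the memo example after simplification (subject? -> subject, para+ -> para*).
memo_edges = [
    ("memo", "to", ","), ("memo", "from", ","), ("memo", "date", ","),
    ("memo", "subject", ","), ("memo", "body", ","),
    ("memo", "memo.security", ","), ("memo", "memo.lang", ","),
    ("body", "para", "*"),
]
inlinable, shared = classify(memo_edges)
print(sorted(inlinable_subtree(memo_edges, "memo", inlinable)))
```

In this graph `para` is reached by a star edge, so it is not inlinable and forms an inlinable subtree of its own.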

Inlining • Case 1: Node a is connected to b by a normal edge and b has no other incoming edges: inline b into a. • Case 2: Node a is connected to b by a normal edge but b has other incoming edges: b is a shared node, so it is not inlined. • Case 3: Node a is connected to b by a star edge: b is not inlined.

Inlining (con’t)

Inlining DTD graphs

Complexity of inlining • Theorem 3.7 (Time Complexity) The time complexity of our inlining algorithm is O(n) where n is the number of elements in the input DTD.

The inlining procedure

The inlining procedure (con’t): INCORRECT

The inlining procedure (con’t): CORRECT

Generating relational schema

Generating schema mapping info. • Definition 3.8 (σ mapping) σ is a mapping from X to R, where X is the set of XML element and attribute types in the input XML DTD, and R is the set of relations in the relational database. Given an XML element type e, σ(e) returns the relation that is used to store e. Similarly, given an XML attribute type a of element type e, σ(e.a) returns the relation that is used to store a of e.
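A mapping of this kind can be pictured as a plain dictionary from element and attribute types to relation names. The sketch below assumes that every node that is not inlined gets a relation named after itself and that every inlined node is stored in the relation of the node it was inlined into. That naming scheme, the edge encoding, and the function name `sigma_mapping` are assumptions for illustration; the relations actually generated by ODTDMap are not reproduced in this transcript.

```python
def sigma_mapping(edges, root):
    """Build a dict from each node to the (hypothetical) relation storing it.

    edges: list of (parent, child, label) with label "*" or ",".
    Assumes every node is reachable from root.
    """
    incoming = {}
    nodes = {root}
    for parent, child, label in edges:
        incoming.setdefault(child, []).append((parent, label))
        nodes.update((parent, child))

    def owner(node):
        # Walk upward while the node has exactly one incoming normal edge,
        # i.e. while it would be inlined into its parent.
        seen = set()
        while node != root and node not in seen:
            seen.add(node)
            preds = incoming.get(node, [])
            if len(preds) != 1 or preds[0][1] != ",":
                break
            node = preds[0][0]
        return node

    return {node: owner(node) for node in sorted(nodes)}


# Simplified memo DTD graph; attribute nodes use the e.a notation of Definition 3.8.
memo_edges = [
    ("memo", "to", ","), ("memo", "from", ","), ("memo", "date", ","),
    ("memo", "subject", ","), ("memo", "body", ","),
    ("memo", "memo.security", ","), ("memo", "memo.lang", ","),
    ("body", "para", "*"),
]
sigma = sigma_mapping(memo_edges, "memo")
print(sigma["body"], sigma["para"], sigma["memo.security"])
```

Under these assumptions the memo DTD needs two relations: one for `memo` with its inlined children and attributes, and one for `para`.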

A complete example

DTD graph • Inlined DTD graph

Generated relational schema

Conclusions • We defined the schema mapping algorithm ODTDMap, which has several improvements over the existing ones. • It is lossless in the sense that one can reconstruct the original XML document in the given document order, based on the target relational schema generated by ODTDMap. • It has efficient support for recursive queries and schemas. • It defines how to map set-valued XML attributes. • Experimental results showed good performance and scalability of the algorithm.

Future work • Extending our work to XML Schema to support data types other than string. • Maintaining ID/IDREF/IDREFS in terms of key and foreign key constraints.
