SilkRoute: A framework for publishing relational data in XML

SilkRoute: A framework for publishing relational data in XML In ACM Transactions on Databases, 2002 By Mary Fernández, AT&T Labs - Research Dan Suciu, Univ. of Washington Yada Kadiyska, Univ. of Washington Atsuyuki Morishima, Univ. of Tsukuba Wang-Chiew Tan, Univ. of Pennsylvania Presented by Yaniv Kaplan Original slides by M. Fernández, D.Suciu and W.C.Tan

? ? Problem Supplier Supplier • XML is widely used as an exchange format (regardless of how data is actually stored) • Most data resides in RDBMS • Goal: Publish and query XML data using the relational engine Common XML Illusion Broker

Requirements • Generality: support general mappings from relations to XML • Selectivity: only the required fragments of the XML view should be materialized • Efficiency: results should be quickly obtained

Supplier Example Shared XML Schema element supplier { element company, element product* } element product { element name, element category, element description, element retail, element sale?, element report* } elements company, name, category { string } elements retail, sale { float } element report { attribute code { string }, string } Supplier’s Database CREATE TABLE Clothing( pid CHAR(10) PRIMARY KEY, item VARCHAR(30), category VARCHAR(20), description VARCHAR(200), price REAL, cost REAL) CREATE TABLE Problems( pid CHAR(10), code CHAR(10), comments VARCHAR(200)) CREATE TABLE Discount( pid CHAR(10) PRIMARY KEY, item VARCHAR(30), discount REAL)

Canonical XML View of Relational Data R(A,B,C) = {(a1,b1,c1), (a2,b2,c2), …, (an,bn,cn)} <R> <Tuple> <A>a1</A><B>b1</B><C>c1</C> </Tuple> … <Tuple> <A>an</A><B>bn</B><C>cn</C> </Tuple> </R>

<Clothing> <Tuple> <pid>001</pid> <item>Suede jacket</item> <category>outerwear</category> <description>Hip length</description> <price>300.00</price> <cost>175.00</cost> </Tuple> <Tuple> <pid>002</pid> <item>Rain boots</item> <category>outerwear</category> <description>Ankle height</description> <price>45.00</price> <cost>19.99</cost> </Tuple> </Clothing> Canonical XML View of Supplier Example Clothing(pid, item, category, description, price, cost) : { <001, “Suede Jacket”, “outerwear”,”Hip length”, 300.00, 175.00>, <002, “Rain boots”, “outerwear”, “Ankle height”, 45.00, 19.99> }

Public XML View/Public Query • Defined by a XQuery over canonical XML view of relational data • Client/Application sees this virtual XML document • Written by DB administrator

Input (relational) schema S: Clothing(pid, item, category, description, price, cost) Discount(pid, item, discount) Problems(pid, code, comments) <supplier> <company>Acme Clothing</company> { FOR $c IN $CanonicalView/Clothing/Tuple WHERE data($c/category) = “outerwear” RETURN <product> <name>{ data($c/item) } </name> <category>{ data($c/category) }</category> <description>{ data($c/description) }</description> <retail>{ data($c/price) } </retail> { FOR $d IN $CanonicalView/Discount/Tuple WHERE $d/pid = $c/pid RETURN <sale> { data($c/price)*data($d/discount) } </sale> } { FOR $p IN $CanonicalView/Problems/Tuple WHERE $p/pid = $c/pid RETURN <reportcode=“{ data($p/code) }”>{ $p/comments }</report> } </product> } </supplier> QP: Public Query Example Output (XML)Schema: Supplier company product* name category description retail sale? report*

Public View of Supplier Example Problems(pid, code, comments) : { <002, “defective”, “leaks in heel”>, <002, “defective”, “heel separates”> } Discount(pid, item, discount) : { <001, “Suede Jacket”, 0.70>, <002, “Rain boots”, 0.50> } <supplier> <company>ACME Clothing</company> <product> <name>Suede jacket</name> <category>outerwear</category> <description>Hip length</description> <retail>300.00</retail> <sale>210.00</sale> </product> <product> <name>Rain boots</name> <category>outerwear</category> <description>Ankle height</description> <retail>45.00</retail> <sale>22.50</sale> <report code=“defective”>leaks in heel</sale> <report code=“defective”>heel separates</sale> </product> </supplier>

Application Query • Written by a client or application • A XQuery over the public XML view

QA: Application Query Example • Find all products with sale price less than ½ of its retail price FOR $s in $PublicView/supplier RETURN <supplier> { <name>{ data($PublicView/supplier/company }></name> <discounted> { FOR $p in $s/product WHERE data($p/sale) < 0.5*data($p/retail) RETURN <product>{ data($p/name) }</product> } </discounted> } </supplier>

QA+QP (Intuition only) • Goal: Compose QA+QP to get an XQuery expression that takes as input only the canonical view • 1st option: replace every instance of $PublicView in QA with QP • 2nd option: next slide…

QA+QP (Cont.) <supplier> <name>Acme Clothing</name> <discounted> { for $c in $CanonicalView/Clothing/Tuple, $d in $CanonicalView/Discount/Tuple where data($c/category) = “outerware”, data($c/pid) = data($d/pid) data($c/price) * data($d/discount) < 0.5 * data($c/price) return <product>{ data($c/item) }</product> } </discounted> </supplier>

View Forest/Tree • Internal abstraction of how an XML document can be materialized using SQL queries over relational tables • Also an abstraction of XQuery to SQL translation • Nodes correspond to XML hierarchy; internal nodes – elements/attributes, leafs – element/attribute type • Each node “holds” an SQL query to retrieve all elements of node type • The trick: connect elements in correct parent-sibling relation • Compositional

N1 <supplier> N1.1 N1.2 <company> <product> N1.1.1 N1.2.2 N1.2.1 N1.2.3 N1.2.4 N1.2.5 N1.2.6 <sale> <report> <category> <name> <description> <retail> string N1.2.1.1 N1.2.2.1 N1.2.3.1 N1.2.4.1 N1.2.5.1 N1.2.6.1 N1.2.6.2 string string string float @code float string N1.2.6.1.1 string CN1 = SELECT * FROM ( ) CN1.2 = SELECT * FROM Clothing c WHERE c.category = "outerwear" CN1.2 = SELECT * FROM Clothing c, Problems p WHERE c.category = "outerwear" AND p.pid = c.pid

View Forest of a Canonical Mapping CN1: SELECT * FROM () CN1.1: SELECT * FROM Clothing c CN1.1.2: SELECT * FROM Clothing c CN1.1.2.1: SELECT c.item FROM Clothing c

Main Modules • View Composer: compose QA± Qp to get a query that refers only to the canonical XML documents, i.e. relations • Query Planner: creates one or more SQL queries • XML Generator: massages tabular results into XML format

SilkRoute Architecture

View Forest of QA? • If we can obtain the view forest of QA, we can compute the XML result of QA using only SQL queries • Can we always translate QA into a view forest?

XQueryCore • A subset of XQuery Core Language given in W3C XQuery Formal semantics • No recursive functions, no operators that depend on “orderness” of XML. E.g., n << m (returns true if node n precedes node m in document order) • Every XQueryCore expression can be translated into a view forest • QA± Qp can be translated into a view forest

Property of View Forest Composition Algorithm (VFCA) • Let Q be a XQueryCore expression defined over the view forests X1, …, Xn • The output of VFCA(Q, X1, …, Xn) is a view forest VQ such that for all relation instances I over the relation schema S VQ(I) = Q(X1(I), …, Xn(I)) • Note that each Xi is either a canonical view forest or a view forest of a XQueryCore expression defined directly or indirectly over the canonical view forest of S

Computing VFCA(Q, X1, …, Xn) • Refer to paper for details…

Query Planner • Takes a view forest and returns one or more SQL queries to send to the relational engine • Idea: partition the view forest • Extreme strategies: Fully partitioned strategy vs. unified strategy • Many SQL queries and connect calls to RDBMS vs. one big SQL query and only one connect call to RDBMS • Optimal strategy usually lies somewhere in between

Steps in Query Planning

Query Planning Example FOR $c IN $CanonicalView/Clothing/Tuple RETURN <product> { FOR $d IN $CanonicalView/Discount/Tuple WHERE $d/pid = $c/pid RETURN <sale> { data($c/price)*data($d/discount) } </sale> } { FOR $p IN $CanonicalView/Problems/Tuple WHERE $p/pid = $c/pid RETURN <report>{ $p/comments }</report> } </product> CN1.1.1: SELECT (d.discount*c.price) as sale FROM Clothing c, Discount d WHERE c.pid=d.pid CN1.2.1: SELECT p.comments FROM Clothing c, Problems p WHERE c.pid=p.pid

Step I. Identify keys • Identify and add keys to every SELECT clause in view forest SELECT A FROM E1 x1, …, Ek xk  SELECT A , key(E1) , … , key(Ek) FROM E1 x1, …, Ek xk • Keys are used to merge data from multiple tuple streams efficiently • If tuple streams are sorted on keys, the XML result can be constructed by making one pass through the tuple streams

Input (relational) schema S: Clothing(pid, item, category, description, price, cost) Discount(pid, item, discount) Problems(pid, code, comments) Identify Keys • Add keys c.pid, p.pid • SELECT c.pid, p.pid, p.comments • FROM Clothing c, Problems p • WHERE c.pid=p.pid • Add keys c.pid and d.pid • SELECT c.pid, d.pid, (d.discount*c.price) as sale • FROM Clothing c, Discount d • WHERE c.pid=d.pid

Step II. Partition the View Forest • Exponentially many plans in general • Let m be the number of edges in a view tree. Number of possible partitions = 2m • Step III: Generate a SQL query for each partition: Connect elements in correct parent-sibling relation using joins: OJP (Outer-Join Plan) of a partition = SQL query of root node LEFT OUTER-JOIN (OJP(n1) UNION … UNION OJP(nk)) where n_1, …, n_k are children nodes root node

Step III: OJP of View Forest (a) SELECT 1 AS L1, c.pid, L2, (c.price*Q.discount) as Sale, Q.comments FROM Clothing c LEFT OUTER JOIN ( ( SELECT 1 AS L2 d.pid AS pid, d.discount AS discount, NULL AS comments FROM Discount d) UNION ( SELECT 2 AS L2, p.pid AS pid, NULL AS discount, p.comments AS comments FROM Problems p) ) AS Q ON c.pid = Q.pid ORDER BY L1, c.pid, L2, sale, Q.comments Ordering information: <sale> comes before <report>

Step IV: Generate XML Output These tuples generate information for the same <product> element. Step IV: XML Generation <product> <sale>10</sale> <report>fits poorly</report> <report>button missing</report> </product>

Corresponding XML Output <product> <sale>10</sale> <report>fits poorly</report> <report>button missing</report> </product> <product> <sale>120</sale> </product> <product/> <product> <sale>56</sale> <report>zipper jeans</report> <product>

Step III: OJP of view forest (b) For <product>-<sale> edge: SELECT 1 AS L1, c.pid, L2, (c.price*Q.discount) as sale FROM Clothing c LEFT OUTER JOIN ( ( SELECT 1 as L2, d.pid, d.discount FROM Discount d) ) AS Q ON c.pid = Q.pid ORDER BY L1, c.pid, L2, Q.pid, Q.sale For <report> edge: SELECT 1 AS L1, 2 AS L2, c.pid, p.comment FROM Clothing c, Problems p WHERE c.pid = p.pid ORDER BY L1, c.pid, L2, p.comment

Step IV: Generate XML Output

Step IV: Generate XML Output XML Generator makes one pass through the two tuple streams: <product> <sale>10</sale> <report>fits poorly</report> <report>button missing</report> </product>

Partition Greedily • Cost of a query Q, cost(Q) = a*evaluationcost(Q) + b*datasize(Q) • Cost of an edge E, cost(E) = cost(Qc) – (cost(Q1) + cost(Q2)) Q1, Q2 = queries at parent and child nodes of E resp. Qc = combined query • Evaluationcost, datasize are estimates given by RDBMS • Pick an edge(E) if cost(E) is less than some threshold

We can go on and on… • VFCA • View Forest/Tree details • View Tree reductions • Other alternatives to OJP • Experimental results • Other available systems • SilkRoute is available at silkroute.sourceforge.net

SilkRoute: A framework for publishing relational data in XML

SilkRoute: A framework for publishing relational data in XML

Presentation Transcript

A Rank-Rewrite Framework for XML Data Summarization

XML and The Relational Data Model

Efficiently Publishing Relational Data as XML Documents

A Framework for Relational Interoperability in Federated Databases

Outside-In XML Publishing

XML Publishing

On Relational Support for XML Publishing

Storing XML Data in Relational Databases

XML-  : an extendible framework for manipulating XML data

A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS

Efficiently Publishing Relational Data as XML Documents

Efficiently Publishing Relational Data as XML Documents

XML Structures for Relational Data

Using XML to View Relational Data

XML Publishing

Store XML Data in a Relational Database

SilkRoute: A Framework for Publishing Relational Data in XML

Viewing relational data as XML

Processing XML data using a relational database: Schema-Based XML Storage

a framework for quality publishing

A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS