1 / 35

Querying XML Views

Querying XML Views. XML views Virtual views Materialized views Querying XML views (virtual) The XPERANTO approach The SilkRoute approach Translating XML queries to relational queries. query. answer. XML View. query translation. middleware. DBMS. updates. RDB. XML Views.

barbie
Télécharger la présentation

Querying XML Views

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Querying XML Views • XML views • Virtual views • Materialized views • Querying XML views (virtual) • The XPERANTO approach • The SilkRoute approach • Translating XML queries to relational queries QSX (LN 6)

  2. query answer XML View query translation middleware DBMS updates RDB XML Views • Materialized views: store data in the views • Query support: straightforward and efficient • Consistency: the views should be updated in response to changes to the underlying database • Virtual views: do not store data • Query support: view queries should be translated to equivalent ones over the underlying data • Updates: not an issue QSX (LN 6)

  3. Virtual vs. materialized XML views are important for data exchange, Web services, access control (security), Web interface for scientific databases, … • Materialized views: publishing • sometimes necessary, e.g., XML publishing • when response time is critical, e.g., active system • “static”: the underlying database is not frequently updated • Virtual views: shredding • “dynamic”: when the underlying data source constantly changes and/or evolves • Web interface: when the underlying database and the views are large • Access control: multiples views of the same databases are supported simultaneously for different user groups QSX (LN 6)

  4. db * book title * * * author chapter ref * * text section Nonrecursive query translation A query in XQuery posted on the original XML document: Q1: Find titles and authors of all books authored by “Bush”. for $book in /db/book where $book/author =`Bush’ return <book> <title> {$book/title } </title>, for $author in $book/author return <author> {$author } </author> </book> For XML data stored inrelations QSX (LN 6)

  5. db * book title * * * author chapter ref * * text section SQL translation Recall the relational schema storing the XML data db(dbID) book(bookID, parentID, code, title: string) author(authorID, bookID, author: string) chapter(chapterID, bookID) ref(refID, bookID) text(textID, chapterID, text: string) SQL: select book.title, author.author from book, author where book.title = `Bush’ and author.bookID = book.bookID One natural join – taking advantage of keys and foreign keys QSX (LN 6)

  6. XML query rewriting Tagging query results Tagging the query result Result of the query: (title, author) select book.title, author.author from book, author where book.title = `Bush’ and author.bookID = book.bookID grouping and tagging <book> <title> title </title>, <author> a1 </author>, . . ., <author> an </author> </book> External module – beyond DBMS QSX (LN 6)

  7. Recursive XML queries Not all XQuery (XSLT, XPath) queries can be rewritten in SQL Q2: find the titles of all books referenced by books written by Bush Can Q2 be translated to an equivalent SQL query? • SQL does not support recursion • SQL’99 supports linear recursion – least fixpoint operator • capable of expressing many queries commonly used • Open: a precise characterization of XML queries answerable in SQL’99 • Is there an effective way to answer general XML queries by using traditional DBMS? • Extend relational DBMS by supporting general recursion • Develop a middleware: does the computations beyond the traditional DBMS QSX (LN 6)

  8. query answer XML View Middleware: view query rewriting DBMS RDB Middleware approach Query answering/rewriting with views has been extensively studied • Derive the inverse of shred mapping – lossless • Treat the inverse as a virtual view of the relational data • Make use of techniques for view query rewriting inverse XML store QSX (LN 6)

  9. Querying and Maintaining XML Views • XML views • Virtual views • Materialized views • Querying XML views (virtual) • The XPERANTO approach • The SilkRoute approach • Translating XML queries to relational queries QSX (LN 6)

  10. Challenging issues • Schema-directed XML view definition: we know how to do it now • Query translation: given a query over a virtual XML view, rewrite the query to an equivalent one over the underlying database – from views to relations schema query answer XML query translation publishing middleware DBMS RDB QSX (LN 6)

  11. Request (XQuery) Composed XQuery SQL Queries Result Tables ? ? ? Querying a View The middleware must respond to view queries/requests by generating SQL queries/requests at runtime, composing and tagging the results View Definition Query-cost Estimates Source Capabilities SQL Generator Query Composer Tagger QSX (LN 6)

  12. query answer XML View Middleware: view query rewriting DBMS RDB The XPERANTO approach Middleware: • A library of XQuery functions • Transformation rules (algebra) • A cost model and optimization techniques An intermediate language -- middleware • accept an XML query • rewrite the query with the rules and library • push work down to the underlying DBMS • conduct computations that DBMS cannot do • optimize the rewritten queries • compose query results from DBMS and middleware to build the answer to the XML query QSX (LN 6)

  13. query answer XML View Middleware: view query rewriting DBMS RDB Research issues Optimization: can one effectively find an optimal rewriting? • How much work should be pushed down to DBMS? • communication cost between DBMS and the middleware • leveraging the DBMS optimizer • Multi-query optimization – hard • accurate cost model? • composition – when to tag? • query dependency • workload Effective generic optimization techniques are beyond reach for Turing-complete query languages (NP-hard at least) QSX (LN 6)

  14. Request (XQuery) Query Generator Query Composer ? The SilkRoute Approach Advantage of XQuery view specs: easy to compose query with view Running example: relations  canonical XML DB Actor Appear Movie * * * Actor (aid, lname, fname) ActorRow AppearRow MovieRow Appearance (mid, aid) @year @mid @aid @lname @fname @aid @mid Movie (mid, title, year) QSX (LN 6) @title

  15. ° = ? ? Query Composition View: Movies by Gibson for $aid in DB//ActorRow[@lname=“Gibson”]/@aid return <Actor><Fname> Mel </Fname> <Lname> Gibson </Lname> for $actapp in DB//AppearRow[@aid=$aid] for $movie in DB//MovieRow[@mid=$actapp/@mid] return <Movie year=“{$movie/@year}”> {$movie/@title} </Movie> </Actor> Query: Get each Actor + Movies in 1999 for $act in //Actor return <Actor> {$act/Lname} for$movie in $act/Movie[@year=1999] return <Movie>{$movie}</Movie> </Actor> Composed Query: Movies by Gibson in 1999 QSX (LN 6)

  16. ° = ? ? Query Composition View: Movies by Gibson for $aid in DB//ActorRow[@lname=“Gibson”]/@aid return <Actor><Fname> Mel </Fname> <Lname> Gibson </Lname> for $actapp in DB//AppearRow[@aid=$aid] for $movie in DB//MovieRow[@mid=$actapp/@mid] return <Movie year=“{$movie/@year}”> {$movie/@title} </Movie> </Actor> Query: Get each Actor + Movies in 1999 for$aid in DB//ActorRow[@lname=“Gibson”]/@aid return <Actor> {$act/Lname} for$movie in //Movie[@year=1999] return <Movie>{$movie}</Movie> </Actor> Instantiate name QSX (LN 6)

  17. ° = ? ? Query Composition View: Movies by Gibson for $aid in DB//ActorRow[@lname=“Gibson”]/@aid return <Actor><Fname> Mel </Fname> <Lname> Gibson </Lname> for $actapp in DB//AppearRow[@aid=$aid] for $movie in DB//MovieRow[@mid=$actapp/@mid] return <Movie year=“{$movie/@year}”> {$movie/@title} </Movie> </Actor> Query: Get each Actor + Movies in 1999 for$aid in DB//ActorRow[@lname=“Gibson”]/@aid return <Actor> Gibson for$movie in //Movie[@year=1999] return <Movie>{$movie}</Movie> </Actor> Instantiate name QSX (LN 6)

  18. ° = ? ? Query Composition View: Movies by Gibson for $aid in DB//ActorRow[@lname=“Gibson”]/@aid return <Actor><Fname> Mel </Fname> <Lname> Gibson </Lname> for $actapp in DB//AppearRow[@aid=$aid] for $movie in DB//MovieRow[@mid=$actapp/@mid] return <Movie year=“{$movie/@year}”> {$movie/@title} </Movie> </Actor> Query: Get each Actor + Movies in 1999 for$aid in DB//ActorRow[@lname=“Gibson”]/@aid return <Actor> Gibson for$actapp in DB//AppearRow[@aid=$aid] for$movie in DB//MovieRow[@mid=$actapp/@mid and @year=1999] return <Movie>{$movie}</Movie> </Actor> Specific conditions about movies QSX (LN 6)

  19. ? Query Composition ° = ? Composed Query on Canonical XML:= for$aid in DB//ActorRow[@lname=“Gibson”]/@aid return <Actor>Gibson for$actapp in DB//AppearRow[@aid=$aid] for$movie in DB//MovieRow[@mid=$actapp/@mid and @year=1999] return <Movie> {$movie/@title} </Movie> </Actor> Efficient query composition involves: • substitution • filtering • pattern matching QSX (LN 6)

  20. Translating XPath to SQL Why do we want to answer XML queries using a relational DBMS? • Middleware: • Costly: query decomposition, optimization, communication between middleware and DBMS, … • Availability: it requires the implementation of an extra tier • XML query engine: • XML query evaluation and optimization: not as sophisticated as its relational counterpart • Costly: converting relational data to XML • Update support: can’t be processed in the same framework as queries; concurrency control is not yet in place for XML, and thus updates need to be handled by DBMS anyway Answering XML queries directly using relational DBMS: XML query support within immediate reach of most commercial DBMS QSX (LN 6)

  21. XPath What XML queries can be answered by using a relational DBMS? XPath: essential to XQuery and XSLT Q ::=  | A | * | Q/Q | Q ∪ Q | //Q | Q[q] q ::= Q | Q = ‘c’ | | q ∧ q | q ∨ q | not q • : empty path • *: wildcard that matches any label • A: either a tag (label) • /, ∪: concatenation (child), union • //: descendants or self, “recursion” • [q]: qualifier (filter, predicate) • c: constant (integer) • ∧, ∨, not( ): conjunction, disjunction, negation Existential semantics: class[//prereq] QSX (LN 6)

  22. db * book title * * * author chapter ref * * section text Non-recursive XPath translation Find all books authored by Bush’s: book [author = ‘Bush’] / title select title from book, author where author.bookID = book.bookID and author.author = `Bush’ book(bookID, parentID, code, title: string) author(authorID, bookID, author: string) chapter(chapterID, bookID) ref(refID, bookID) The translation can be done by enumerating paths in the DTD matching the XPath query QSX (LN 6)

  23. Recursive XPath over non-recursive DTD Find all books referenced by Bush’s book: book [author = ‘Bush’] // book / title db * select title from book, author where author.bookID = book.bookID and author.author = `Bush’ book title * * * author chapter ref book(bookID, parentID, code, title: string) author(authorID, bookID, author: string) chapter(chapterID, bookID) ref(refID, bookID) * * section text • The translation can be done by enumerating paths in the DTD • matching the XPath query, when • either the XPath query is non-recursive (no //) • or the DTD is non-recursive although this is possibly expensive (exponential size) QSX (LN 6)

  24. db * book title * * * author chapter ref * * section text Recursive XPath over recursive DTD Find all books referenced by Bush’s book: book [author = ‘Bush’] // book / title book(bookID, parentID, code, title: string) author(authorID, bookID, author: string) chapter(chapterID, bookID) ref(refID, bookID) • Impossible to enumerate all matching paths • in the DTD -- infinitely many • the interaction between DTD recursion and XPath query recursion (//) • extension of SQL to handle recursion QSX (LN 6)

  25. db * book title * * * author chapter ref * * section text Regular XPath Capture DTD recursion and XPath recursion in a uniform framework • Regular XPath: Q ::=  | A | Q/Q | Q ∪ Q | Q* | Q[q] q ::= Q | Q = ‘c’ | q ∧ q | q ∨ q | not q • The child-axis, Kleene closure, union • An XPath fragment: Q//Q instead of Q* Example: book [author = ‘Bush’] // book / title book [ author = ‘Bush’] / (ref/book/title)* Each edge corresponds to a relation QSX (LN 6)

  26. More on regular XPath Compare regular XPath with • Regular expression Can we express regular XPath as a regular expression? • XPath Is XPath properly contained in regular XPath? • Regular XPath is more expressive than regular expression – qualifiers • equivalence of regular expressions is in PSPACE • equivalence of regular XPath is EXPTIME-hard • Regular XPath is more expressive than XPath • general Kleene star Q* in regular XPath • limited recursive // in XPath QSX (LN 6)

  27. db * book title * * * author chapter ref * * section text Capture both DTD recursion and XPath recursion Step 1: Rewrite XPath queries over recursive DTDs into equivalent regular XPath queries book [author = ‘Bush’] // book / title => book [ author = ‘Bush’] / (ref/book/title)* • Always possible? • Complexity: low polynomial if the regular XPath query F(Q) is represented as a graph (similar to finite state automata) • Theorem. There is a computable • function F that, given any XPath • query Q over a (possibly) recursive • DTD D, rewrites Q into an • equivalent regular XPath query F(Q) • equivalent: for any XML tree T of D, Q(T) = F(Q) (T) QSX (LN 6)

  28. Translate regular XPath to SQL Step 2: rewrite regular XPath query Qr to an equivalent “SQL” query Qs Equivalent: let M be the mapping from XML to relations. Then for any XML data T of the DTD D, Qr(T) = Qs( M(T) ) Question: how to handle recursion in Qr? Extension of SQL with recursion support • SQL’99: linear recursion: supported by IBM DB2 (connect-by) but not by other commercial systems (Oracle, Microsoft) • Fixpoint operator: Supported by • Oracle (connect-by) • IBM DB2 (with recursion) • Microsoft SQL Server 2005 (common table) QSX (LN 6)

  29. Linear recursion in SQL’99 Linear recursion: (R, R1, …, Rk) S(0)  R S(i+1)  S(i)  (S(i) C1 R1)  …  (S(i) Ck Rk) Regular XPath to SQL translation via linear recursion • Construct query graph G: matching paths in the DTD • Partition G into strongly connected components • Code each component using a linear-recursion operator • R: initial part -- incoming edges to components • Ri: SQL query coding an edge in a “connected component” QSX (LN 6)

  30. db * book title * * * author chapter ref * * section text Translate regular XPath to SQL – SQL’99 book [ author = ‘Bush’] / (ref/book/title)* with recursive S(from, to, title) as (R0 union S temp ) select title from S Temp: selectref.refID as from, book.bookID as to, book.title as title from ref, book where ref.bookID = book.bookID book(bookID, parentID, code, title: string) author(authorID, bookID, author: string) chapter(chapterID, bookID) ref(refID, bookID) QSX (LN 6)

  31. db * book title * * * author chapter ref * * section text Translate regular XPath to SQL – SQL’99 book [ author = ‘Bush’] / (ref/book/title)* with recursive S(from, to, title) as ((selectB1.bookID as from, B2.book.ID as to, B2.title as title from book B1, ref, book B2 where B1.bookID = ref.ID and ref.bookID = B2.bookID and B1.author = ‘Bush’) union (select S.from, temp.bookID as to, temp.title as title from S, temp where S.to = temp.refID)) select title from S book(bookID, parentID, code, title: string) author(authorID, bookID, author: string) chapter(chapterID, bookID) ref(refID, bookID) QSX (LN 6)

  32. db * book title * * * author chapter ref * * section text Fixpoint operator (R) Single input relation: S(0)  R S(i+1)  S(i)  (S(i) C R) R  selectrefID as from, ref.bookID as to, title as title from ref, book where book.bookID = ref.bookID (R)  select from, to, title from R connect by from = prior to and prior from in R0 R0  selectref.refID as from, B2.bookID as to, B2.title as title from ref, book B1, book B2 where B1.bookID = ref.refID and ref.bookID = B2.bookID and B1.author = ‘Bush’ QSX (LN 6)

  33. Fixpoint operator vs. Linear Recursion • Single input relation vs multiple • More efficient • Supported by most commercial systems Theorem. Any regular expression query can be expressed as an equivalent SQL + fixpoint ((R)) query. • Theoretically, in the worst case, explicit regular XPath translation is exponentially large. This can be avoided by using automaton or equation representations • In practice, SQL+ fixpoint is more efficient than linear-recursion, because XML documents in real-life have a small depth Research: • Translate FLWR XQuery expressions to SQL + fixpoint • What XSLT queries can be expressed as SQL + fixpoint? QSX (LN 6)

  34. Region index approach • Each node v in an XML tree carries a pair (order, range), where • order: the position of v in the preorder traversal of the tree • range: the number of descendants of v • Node v is an ancestor of w iff • order_v < order_w • order_w < order_v + range_v • An efficient method to evaluate // Problem: updates are expensive maintain the annotation when inserts/deletes are carried out Research: translating XPath to SQL by using range index QSX (LN 6)

  35. Summary and review • When should we choose materialized views / virtual views? • What are the key ideas for processing queries over views? • What is the middleware approach? • When can XPath queries be expressed in SQL? • When can XPath queries be expressed in SQL + fixpoint? • Why use regular XPath in query translation? QSX (LN 6)

More Related