
Querying Distributed Data using XML

Querying Distributed Data using XML. Yannis Papakonstantinou, UCSD. Overview: The Virtual XML View Approach towards Data Integration; Query Processing in XML Mediators (Issues Overview, An Algebra-Based Architecture, Navigation-driven Evaluation); Related Topics (Querying XML Views on the Web).

Presentation Transcript


  1. Querying Distributed Data using XML Yannis Papakonstantinou UCSD

  2. Overview • The Virtual XML View Approach towards Data Integration • Query Processing in XML Mediators • Issues Overview • An Algebra-Based Architecture • Navigation-driven Evaluation • Related Topics • Querying XML Views on the Web • Other architectures: a transducer/stream-based model • Beyond Structured Querying

  3. Data Integration Requirements in eBusiness Applications • It starts with… “Provide Application X to customers, partners, and employees”, where X may be Business Intelligence, Customer Support, … • Then the problem comes up… “The application uses information assets widely distributed across my enterprise.” • If only… “Give the application a single place to go to access all the information it requires. Requirements are evolving, so make sure the system can be easily maintained and upgraded.”

  4. View-Based Approach: Wrappers Export Basic Source Views • Source data as a tree: customer_table: customer (name John, id 56, city Chicago), customer (name George, id 58, city Chicago), … • Exported XML view: <customer_table> <customer> <name>John</name> <id>56</id> <city>Chicago</city> </customer> <customer> <name>George</name> <id>58</id> <city>Chicago</city> </customer> … </customer_table> • [Figure: Client Application, Integrated (XML) View, Mediator, two (XML) Views, two Wrappers, Customers Rel. DB, Orders Rel. DB]
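
The wrapper step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `export_table` and the in-memory row dictionaries are hypothetical stand-ins for a real relational source, and the slide does not say how the actual wrappers are implemented.

```python
import xml.etree.ElementTree as ET


def export_table(table_name, row_tag, rows):
    """Export relational rows (dicts) as one <table_name> element
    with one <row_tag> child per row and one grandchild per column."""
    root = ET.Element(table_name)
    for row in rows:
        elem = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(elem, column).text = str(value)
    return root


# Hypothetical stand-in for the Customers relational database.
customers = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]

view = export_table("customer_table", "customer", customers)
xml_text = ET.tostring(view, encoding="unicode")
```

Here `xml_text` holds the same `<customer_table>` structure as the slide, serialized without whitespace between elements.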

  5. Wrappers Export Basic Source Views • order_table: order (id 1034, cid 56, item chips), order (id 1567, cid 56, item salsa), … • [Figure: Client Application, Integrated (XML) View, Mediator, two (XML) Views, two Wrappers, Customers Rel. DB, Orders Rel. DB]

  6. Mediators Export Integrated Views, Tailored to Application Needs • Integrated view: customers: customer (name John, id 56, city Chicago, orders: order (id 1034, item chips), order …), customer … • Source view order_table: order (id 1034, cid 56, item chips), order (id 1567, cid 56, item salsa), … • Source view customer_table: customer (name John, id 56, city Chicago), customer (name George, id 58, city Chicago), … • [Figure: Client Application, Integrated (XML) View, Mediator, two (XML) Views, two Wrappers, Customers Rel. DB, Orders Rel. DB]

  7. Virtual Views: Query-Driven Mediator Operation • Application to Mediator: Find all Chicago customer names, along with their ordered items • Mediator to Customers wrapper: Retrieve Chicago customer names and ids • Mediator to Orders wrapper: Retrieve all cids and item names of orders • [Figure: Application, Mediator, two Wrappers, Customers Database, Orders Database]

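  8. On-Demand (Query-Driven) Mediator Operation • Result returned to the Application: customers: customer (name John, ordered_items: item chips, item salsa), customer … • Returned by the Customers wrapper: customer (name John, id 56), … • Returned by the Orders wrapper: order (cid 56, item chips), order (cid 56, item salsa), … • [Figure: Application, Mediator, two Wrappers, Customers Database, Orders Database]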
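
The query-driven operation on the last two slides can be sketched as follows. This is a minimal illustration, not the mediator's actual code: the two wrapper functions and the in-memory lists are hypothetical, and the mediator simply issues the two source queries named on the slide and combines their answers.

```python
# Hypothetical in-memory stand-ins for the two source databases.
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]


def customers_wrapper(city):
    """Source query 1: names and ids of the customers in one city."""
    return [{"name": c["name"], "id": c["id"]}
            for c in CUSTOMERS if c["city"] == city]


def orders_wrapper():
    """Source query 2: cid and item name of every order."""
    return [{"cid": o["cid"], "item": o["item"]} for o in ORDERS]


def mediator(city):
    """Answer 'customer names in a city with their ordered items'
    by combining the answers of the two source queries."""
    items_by_cid = {}
    for order in orders_wrapper():
        items_by_cid.setdefault(order["cid"], []).append(order["item"])
    return [{"name": c["name"],
             "ordered_items": items_by_cid.get(c["id"], [])}
            for c in customers_wrapper(city)]


result = mediator("Chicago")
```

Nothing is materialized ahead of time: the integrated view exists only as the answer computed for this one request.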

  9. Multiple Plans are Possible • Retrieve customers • For each customer find matching orders
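
To make the choice between plans concrete, the sketch below compares the per-customer plan on this slide with the bulk retrieval described on the earlier query-driven slide. The `CountingSource` class and its two methods are hypothetical; they only count how many requests each plan sends to the orders source.

```python
class CountingSource:
    """Hypothetical orders source that counts the requests it receives."""

    def __init__(self, orders):
        self.orders = orders
        self.requests = 0

    def all_orders(self):
        self.requests += 1
        return list(self.orders)

    def orders_for(self, cid):
        self.requests += 1
        return [o for o in self.orders if o["cid"] == cid]


def plan_per_customer(customers, source):
    """Plan A: for each customer, ask the source for matching orders."""
    return {c["id"]: [o["item"] for o in source.orders_for(c["id"])]
            for c in customers}


def plan_bulk_join(customers, source):
    """Plan B: fetch all orders once and join them at the mediator."""
    result = {c["id"]: [] for c in customers}
    for o in source.all_orders():
        if o["cid"] in result:
            result[o["cid"]].append(o["item"])
    return result


CUSTOMER_ROWS = [{"id": 56}, {"id": 58}]
ORDER_ROWS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]
```

Both plans return the same answer; they differ in the number of source requests and in how much data crosses the wire, which is what a cost-based optimizer would have to weigh.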

  10. A New Kind of Query Processing Problem • Build and Run “Optimal” Plan • Consisting of operators that • Collect source info using supported queries and commands • Combine info into XML result

  11. Challenges in Query Processing & Optimization • Operate within the Limited and Different Capabilities of the Sources • Describe sets of supported queries • Use most efficient supported queries • Optimize plans/queries sent to sources • Estimate Costs of Plans • Adapt Plans Along the Way • Beyond Conjunctive Queries • Compose Queries/Views Efficiently • Schema inference & optimization • Combine navigation & querying

  12. From Limited Wrappers to Efficient Plans for Extended Query Sets • Answering Queries Using Views • But with Infinite Sets of Views • Increasing Relevance due to Web Services • [Figure labels: all queries over schema; Queries supported by mediator; Queries supported by wrapper; Source Data & Schema]
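
One simple way a mediator can extend the query set of a limited wrapper is to send the most selective supported query and finish the rest of the work locally. The sketch below assumes a hypothetical wrapper that only supports equality selection on `city`; the capability set, function names, and data are all illustrative and not taken from the talk.

```python
# Hypothetical capability description: the wrapper can filter on these columns only.
SUPPORTED_FILTER_COLUMNS = {"city"}

ROWS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
    {"name": "Ann", "id": 60, "city": "Boston"},
]


def wrapper_query(rows, column, value):
    """The limited source: rejects any selection it does not support."""
    if column not in SUPPORTED_FILTER_COLUMNS:
        raise ValueError(f"unsupported selection on {column!r}")
    return [r for r in rows if r[column] == value]


def mediator_query(rows, conditions):
    """Answer a conjunction of equality conditions (column -> value):
    push one supported condition to the wrapper, apply the rest locally."""
    pushed = next((c for c in conditions if c in SUPPORTED_FILTER_COLUMNS), None)
    if pushed is None:
        raise ValueError("no supported source query covers this request")
    candidates = wrapper_query(rows, pushed, conditions[pushed])
    return [r for r in candidates
            if all(r[c] == v for c, v in conditions.items())]
```

For example, `mediator_query(ROWS, {"city": "Chicago", "name": "John"})` succeeds even though the wrapper alone would reject a selection on `name`.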

  13. Challenges in Query Processing & Optimization • Operate within the Limited and Different Capabilities of the Sources • Describe sets of supported queries • Use most efficient supported queries • Optimize plans/queries sent to sources • Estimate Costs of Plans • Adapt Plans Along the Way • Beyond Conjunctive Queries • XQuery processing • Schema inference & optimization • Combine navigation & querying • Build iterator models for low memory footprint

  14. Navigation-Driven Evaluation of Query Result • Integrated view: customers: customer (name John, id 56, city Chicago, orders: order (id 1034, item chips), order …), customer … • Source view order_table: order (id 1034, cid 56, item chips), order (id 1567, cid 56, item salsa), … • Source view customer_table: customer (name John, id 56, city Chicago), customer (name George, id 58, city Chicago), …

  15. Navigation-Driven Evaluation • Client navigations on a result node p: down(p), right(p) • Input: client navigations • View definition: ans = q( s1 … sn ) • Output: source navigations on XML sources s1 ... sn • [Figure: Client, result, Lazy Mediator, XML sources]

  16. Navigation-Driven Evaluation • Input: client navigations • View definition: ans = q( s1 … sn ) • Output: source navigations on XML sources s1 ... sn • [Figure: Client, result, Lazy Mediator, XML sources]
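
The lazy mediator idea can be sketched as a virtual result tree whose children are fetched only when the client navigates to them. Everything below is an assumption made for illustration: the `LazyNode` class, the `fetch_*` functions, and the `source_calls` log are hypothetical, and only `down(p)` and `right(p)` come from the slides.

```python
CUSTOMERS = [{"name": "John", "id": 56}, {"name": "George", "id": 58}]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]

source_calls = []  # log of the source navigations the mediator issues


class LazyNode:
    """A node of the virtual result; its children are computed on first visit."""

    def __init__(self, label, children_fn=None, parent=None, index=0):
        self.label = label
        self._children_fn = children_fn or (lambda: [])
        self._children = None
        self.parent = parent
        self.index = index

    def expand(self):
        if self._children is None:
            self._children = [LazyNode(label, fn, self, i)
                              for i, (label, fn) in enumerate(self._children_fn())]
        return self._children


def down(p):
    """Move to the first child of p, or None if p has no children."""
    kids = p.expand()
    return kids[0] if kids else None


def right(p):
    """Move to the next sibling of p, or None if p is the last one."""
    if p.parent is None:
        return None
    siblings = p.parent.expand()
    return siblings[p.index + 1] if p.index + 1 < len(siblings) else None


def fetch_orders(cid):
    source_calls.append(("orders", cid))
    return [(f"order {o['id']}: {o['item']}", None)
            for o in ORDERS if o["cid"] == cid]


def fetch_customers():
    source_calls.append(("customers",))
    return [(c["name"], lambda cid=c["id"]: fetch_orders(cid))
            for c in CUSTOMERS]


root = LazyNode("customers", fetch_customers)
```

Creating `root` touches no source. Calling `down(root)` triggers the customers request, and the orders of a customer are requested only when the client navigates below that customer. This sketch fetches a whole child list at once; a finer-grained mediator could translate each `right(p)` into its own source navigation.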

  20. Mixing Querying & Navigation • Example: Find details of all salsa orders below visited node • View: customers: customer (name John, id 56, city Chicago, orders: order (id 1034, item chips), order …), customer …

  21. Challenges in Mixing Querying & Navigation • Two-dimensional navigation • Reminiscent of cursors, but there are multiple continuation points • Controlling size + shape • Contextualizing queries by navigation

  22. Overview • The Virtual XML View Approach towards Data Integration • Query Processing in XML Mediators • Issues Overview • An Algebra-Based Architecture • Navigation-driven Evaluation • Quick Overview of Related Topics • Querying the XML View on the Web • Other architectures: a transducer-based model • Beyond Structured Querying • Fuzzy/preference queries & Top-N processing • Unstructured Queries

  23. An Algebra-Based Query Processor Architecture • Client sends XQuery and Navigation Requests and receives Results • XQuery and Views go through Translation to Algebra, which produces an Algebra Plan • The Rewriter/Optimizer produces a Physical Algebra Plan • The Plan Execution Engine issues Queries & Fetch Requests to Sources • Further inputs shown in the figure: Source Schemas & Types, Source Description, Functions, Function Description

  24. Query Processing on Tuple-Oriented Algebra Enables… • Well-known efficient physical implementations of the operators • Join optimization • Nested data by nested plans or group-by • Efficient iterator model

  25. XQuery: Queries & Views for XML <customers> { for $cust in document("db")/customer return <customer> { $cust/id, for $order in document("db")/order where $order/cid = $cust/id return <order> { $order/id } </order> } </customer> } </customers>

  26. Access and Navigation • Source db: customer_table with customer (name John, id 56) and customer (name George, id 58) • source db, [$db1] yields ($db1) = (ct) • getD $db1, customer → $cust yields ($db1, $cust) = (ct, c1), (ct, c2) • getD $cust, id → $cust_id yields ($db1, $cust, $cust_id) = (ct, c1, i1), (ct, c2, i2)

  27. Simplification Using Schema Inference • Since $cust_id  $cust and $cust is “useless” otherwise • Source db: customer_table with customer (name John, id 56) and customer (name George, id 58) • source db, [$db1] yields ($db1) = (ct) • getD $db1, customer/id → $cust_id yields ($db1, $cust_id) = (ct, i1), (ct, i2)
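
The two plans above can be mimicked with Python generators over binding tuples. This is a rough sketch under assumptions: tuples are modeled as dictionaries from variable names to `ElementTree` nodes, and the functions `source` and `getD` only imitate the operators named on the slides, not their real implementation.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<customer_table>"
    "<customer><name>John</name><id>56</id></customer>"
    "<customer><name>George</name><id>58</id></customer>"
    "</customer_table>"
)


def source(doc, var):
    """source: emit a single tuple that binds var to the document root."""
    yield {var: doc}


def getD(tuples, in_var, path, out_var):
    """getD: for each input tuple, bind out_var to every node reached
    from the node bound to in_var by following path."""
    for t in tuples:
        for node in t[in_var].findall(path):
            yield {**t, out_var: node}


# Plan of slide 26: two navigation steps, binding $cust along the way.
full_plan = list(
    getD(getD(source(DOC, "$db1"), "$db1", "customer", "$cust"),
         "$cust", "id", "$cust_id")
)

# Plan of slide 27: one step over the combined path, $cust is never bound.
simplified_plan = list(getD(source(DOC, "$db1"), "$db1", "customer/id", "$cust_id"))
```

Both plans bind `$cust_id` to the same two `id` nodes; the simplified plan just carries narrower tuples.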

  28. Nested Plans • Operators: apply $part, p → $orders; for $part; nestedSrc $part; Plan p … • Input ($db1, $cust_id): (ct, i1), (ct, i2) • One partition per input tuple, bound to $part: (ct, i1) and (ct, i2) • Output ($db1, $cust_id, $orders): (ct, i1, [o11…]), (ct, i2, [o21…])

  29. Joins and Selections • Operators: source db, [$db2]; getD $db2, order → $order; getD $order, cid → $cust_id2; getD $order, id → $order_id; nestedSrc $part • [Figure: binding tables over ($db1, $cust_id) = (ct, i1) and over ($db2, $order, $cust_id2, $order_id), …; condition labels $cust_id and $cust_id2=?]

  30. Constructors • Operators: crList $orders → $oidL; crEl order, $oidL → $oidE; listify $oidE → $orders • [Figure: binding tables: (…, $order_id) = (…, o1), (…, o2); (…, $order_id, $oidL) = (…, o1, [o1]), (…, o2, [o2]); (…, $oidL, $oidE) = (…, [o1], e1), (…, [o2], e2), where e1 and e2 are order elements; $orders = [e1, e2]]

  31. Algebra Example

  32. Plan Decomposition • Within Rewriting Optimizer • Rules replacing “leaf” trees • May move commutable parts • Catch: No projection limitation

  33. Plan After Decomposition

  34. Replacing Nested Plans with GroupBy/Outerjoin Combinations • [Figure: two plans, each topped by apply $part, p → $R; operator labels: p3, nestedSrc $part, groupBy S(p1) → $part, for $part, p1, p2]
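
The idea behind this rewriting can be illustrated on the running customers/orders example. The sketch below is an assumption-laden simplification and not the algebra's actual rule: it contrasts a nested plan that re-runs an inner plan per outer tuple with an equivalent left outer join followed by a group-by. Customer ids are assumed unique.

```python
def nested_plan(cust_ids, orders):
    """Nested evaluation: run the inner plan once per outer tuple and
    bind the resulting list to 'orders'."""
    return [{"cust_id": cid,
             "orders": [o["id"] for o in orders if o["cid"] == cid]}
            for cid in cust_ids]


def outerjoin_groupby_plan(cust_ids, orders):
    """Set-oriented evaluation: one left outer join, then one group-by."""
    by_cid = {}
    for o in orders:
        by_cid.setdefault(o["cid"], []).append(o)

    # Left outer join: a customer without orders is kept, paired with None.
    joined = []
    for cid in cust_ids:
        matches = by_cid.get(cid)
        if matches:
            joined.extend((cid, o["id"]) for o in matches)
        else:
            joined.append((cid, None))

    # Group by customer id, dropping the None padding of the outer join.
    groups = {}
    for cid, oid in joined:
        groups.setdefault(cid, [])
        if oid is not None:
            groups[cid].append(oid)
    return [{"cust_id": cid, "orders": oids} for cid, oids in groups.items()]


ORDER_ROWS = [{"id": 1034, "cid": 56}, {"id": 1567, "cid": 56}]
```

The outer join matters: with a plain join, customer 58 (no orders) would disappear instead of appearing with an empty list.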

  35. Multiple Possible Plans

  36. Overview • The Need for Data Integration • The Virtual XML View Approach • Query Processing in XML Mediators • Architecture • Algebra • Navigation-driven Evaluation • Quick Overview of Related Topics • Querying the XML View on the Web • Other architectures: a transducer-based model • Beyond Structured Querying • Fuzzy/preference queries & Top-N processing • Unstructured Queries

  37. Building Navigation-Driven Evaluation on the Algebra Client Source access Source access Source Source

  38. Think of Each Operator as a Lazy Mediator • Operator: getD $cust, id → $cust_id • Input ($db1, $cust): (ct, c1), (ct, c2) • Output ($db1, $cust, $cust_id): (ct, c1, i1), (ct, c2, i2) • Source $db1: customer_table with customer (name John, id 56) and customer (name George, id 58) • [Figure: the output as a tree with a root, one tuple node per tuple, and attribute nodes $db1, $cust, $cust_id]

  39. Navigation-Driven Evaluation of Operators • Augmented with • nextTuple(p) • p.attr • Input: client navigations • Output: source navigations s1 ... sn on the results of the operators below • [Figure: result, Lazy Operator, Results of Operators below]

  40. Use of Semantic Id’s in Navigation-Driven Evaluation • Incoming navigation: r/d(<f1, f2, …, fn>) • Operator State: V1: V2: … Vn: Other: … • Proceed down/right • Returned id: <f’1, f’2, …, f’n> • [Figure: the operator state before and after the navigation, with inputs f1 f2 … fn and f’1 f’2 … f’n]

  41. Example of Semantic Id: getD X, a → Z • pv = <value, p’v> • pI = <identity, p’I> • pB = <binding, p’B, p’’B> • [Figure: two trees, each with a root and tuple nodes]

  42. Fragments Reduce the “Set State” – “Produce State” Overhead • [Figure: a fragment of the result tree: root; customer; name, “John”; order; oid, 123; three lineitem nodes; unfetched parts marked Hole 1, Hole 2, Hole 3]

  43. [Figure: the same fragment after further fetching: root; customer; name, “John”; order; oid, 123; order; ordnum=16; five lineitem nodes; unfetched parts marked Hole 1, Hole 3, Hole 4, Hole 5]

  44. Controlling the Size and Shape of Fragments Client listify Client-Server Interaction Controller listify Source access Source access Source Source

  45. Fragment Size affects Memory Footprint, which in turn affects Performance

  46. Fragmentation Strategies • Fixed Fragment Size / FCFS • Ideal for depth-first, left-to-right navigation • Adaptive: Assign larger pieces to those who use them • $\sim f(L_i) / \sum_{\text{all } L_j} f(L_j)$, where $f(x) = x^2$
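
The adaptive rule can be sketched as a proportional allocation. The slide does not define L_i, so the code below assumes it is some usage measure per list (for instance, how many items of list i the client has already visited); the function name, the `total_budget` parameter, and the even split for the all-zero case are likewise assumptions.

```python
def allocate_fragment_budget(usage, total_budget, f=lambda x: x * x):
    """Split total_budget across lists in proportion to f(L_i) / sum_j f(L_j).

    usage maps a list name to its (assumed) usage measure L_i;
    f defaults to squaring, as on the slide.
    """
    if not usage:
        return {}
    weights = {name: f(level) for name, level in usage.items()}
    total = sum(weights.values())
    if total == 0:
        # Assumption: with no usage information yet, split evenly.
        share = total_budget / len(usage)
        return {name: share for name in usage}
    return {name: total_budget * w / total for name, w in weights.items()}
```

With usage `{"a": 1, "b": 3}` and a budget of 100, squaring gives weights 1 and 9, so list `b` receives nine tenths of the budget. Squaring rewards heavily used lists more strongly than a linear share would.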

  47. Response Performance for Breadth-First and Depth-First • [Figure: response performance plots for Depth First traversal and Breadth First traversal]

  48. XSM System [VLDB02] Joint Work w/ Bertram Ludaescher, Pratik Mukhopadhyay, Yu Xu • Assume sequentially-accessed XML data • Transducer-based Compiled-code XQuery processor • Future high-bandwidth streams • XQuery on chip • XSM Compiler • inputs XQuery, DTD • produces XML Stream Machine • XSM2C translates into C or Java code
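
To give a feel for transducer-style processing of sequentially accessed XML, here is a toy single-pass machine in Python. It is not the XSM formalism or the output of the XSM compiler: the event encoding (`"start"`, `"text"`, `"end"` pairs), the function name, and the single hard-wired task (emit the text of every element with a given tag) are all assumptions chosen for brevity.

```python
def run_transducer(events, wanted_tag):
    """Consume a stream of (kind, value) events in one pass and yield the
    text content of each wanted_tag element as soon as it closes."""
    stack = []   # currently open elements
    buffer = []  # text collected for the wanted element being read
    for kind, value in events:
        if kind == "start":
            stack.append(value)
        elif kind == "text":
            if stack and stack[-1] == wanted_tag:
                buffer.append(value)
        elif kind == "end":
            if not stack or stack.pop() != value:
                raise ValueError(f"unbalanced end tag: {value}")
            if value == wanted_tag:
                yield "".join(buffer)
                buffer.clear()
        else:
            raise ValueError(f"unknown event kind: {kind}")


EVENTS = [
    ("start", "order_table"),
    ("start", "order"), ("start", "item"), ("text", "chips"),
    ("end", "item"), ("end", "order"),
    ("start", "order"), ("start", "item"), ("text", "salsa"),
    ("end", "item"), ("end", "order"),
    ("end", "order_table"),
]
```

`list(run_transducer(EVENTS, "item"))` collects the two item names. Because the function is a generator with only a stack and one buffer as state, memory use depends on nesting depth and element size, not on the length of the stream.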

  49. XML Stream Machine • [Figure: an XSM with input streams (... <a><b> … ), an output stream (... <u><v> … ), a finite control made of rules of the form “on event do action” (labels shown include init_stream, push, flush, error), an input event, a stack holding <c> <b> <a>, and data buffers]

  50. XKeyword, XSearch: XML DBs for Unstructured Queries Joint Work w/ Andrey Balmin, Vagelis Hristidis, Yu Xu • (XML) query languages too heavyweight and structured • Need to know structure, semantics, roles • XSearch: keyword proximity queries in trees (lowest common ancestor queries) • XKeyword/DISCOVER: keyword proximity queries for searches in labeled graphs
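
Keyword proximity in a tree can be illustrated with a lowest common ancestor (LCA) computation. The sketch below is a naive illustration and not the XSearch or XKeyword algorithm: the tree encoding (a `parent` map and a `text` map keyed by node id), the function names, and the choice to return the deepest LCA over all pairs of matches are assumptions.

```python
from itertools import product


def ancestors(node, parent):
    """Path from node up to the root, starting with node itself."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca(a, b, parent):
    """Lowest common ancestor of nodes a and b."""
    seen = set(ancestors(a, parent))
    for n in ancestors(b, parent):
        if n in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def closest_connection(text, parent, kw1, kw2):
    """Deepest node that connects some match of kw1 with some match of kw2,
    or None if either keyword has no match."""
    m1 = [n for n, t in text.items() if kw1 in t]
    m2 = [n for n, t in text.items() if kw2 in t]
    candidates = [lca(a, b, parent) for a, b in product(m1, m2)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: len(ancestors(n, parent)))


# Hypothetical tree: 0 customers; 1 customer (2 name John, 3 order (4 item salsa));
# 5 customer (6 name George, 7 order (8 item chips)).
PARENT = {0: None, 1: 0, 2: 1, 3: 1, 4: 3, 5: 0, 6: 5, 7: 5, 8: 7}
TEXT = {0: "", 1: "", 2: "John", 3: "", 4: "salsa",
        5: "", 6: "George", 7: "", 8: "chips"}
```

For the keywords "John" and "salsa" the connecting node is customer 1, a tight and meaningful answer; for "John" and "chips" it is the root, which signals that the two keywords are only loosely related.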
