This document presents an adaptive query execution engine for data integration across autonomous, heterogeneous data sources in LAN, WAN, and Internet environments. Based on slides by Zachary Ives, Alon Halevy, and Hector Garcia-Molina, the engine aims to provide uniform query capability, adapt execution to run-time conditions, and return initial results quickly. It tackles challenges such as unreliable data sources and unpredictable transfer rates by interleaving planning and execution, reacting to execution events with rule-based mechanisms, and using operators designed to reduce latency.
An Adaptive Query Execution Engine for Data Integration … Based on Zachary Ives, Alon Halevy & Hector Garcia-Molina’s Slides
Data Integration Systems Uniform query capability across autonomous, heterogeneous data sources on LAN, WAN, or Internet: in enterprises, WWW, big science.
Traditional Query Processing: SQL query → parse → parse tree → convert → logical query plan → apply laws → "improved" l.q.p. → estimate result sizes (using statistics) → l.q.p. + sizes → consider physical plans → {P1, P2, …} → estimate costs → {(P1,C1), (P2,C2), …} → pick best → Pi → execute → answer
Example: SQL query
SELECT title
FROM StarsIn
WHERE starName IN (
  SELECT name
  FROM MovieStar
  WHERE birthdate LIKE '%1960'
);
(Find the movies with stars born in 1960)
Example: Logical Query Plan (figure): π_title( σ_starName=name( StarsIn × π_name( σ_birthdate LIKE '%1960'( MovieStar ) ) ) )
Example: Improved Logical Query Plan (figure): π_title( StarsIn ⋈ starName=name π_name( σ_birthdate LIKE '%1960'( MovieStar ) ) )
Example: Estimate Result Sizes. Need the expected sizes of the intermediate results (the σ and π over MovieStar, and the join with StarsIn).
Example: One Physical Plan. Hash join (parameters: join order, memory size, projected attributes, …) over a SEQ scan of StarsIn and an index scan of MovieStar (parameters: selection condition, …).
Example: (Grace) Hash Join
• Hash function h, range 0…k
• Buckets for R1: G0, G1, …, Gk
• Buckets for R2: H0, H1, …, Hk
Algorithm (Grace Hash Join):
(1) Hash R1 tuples into G buckets
(2) Hash R2 tuples into H buckets
(3) For i = 0 to k, match tuples in buckets Gi and Hi
Partitioning = steps 1–2; probing = step 3
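To make the partition-and-probe structure concrete, here is a minimal Python sketch of a Grace-style hash join; the function name, the (key, payload) tuple layout, and the bucket count are illustrative assumptions, not Tukwila code.

```python
# Sketch of a Grace hash join: partition both inputs into k buckets,
# then join matching bucket pairs. In a real engine each bucket would
# be an on-disk file rather than an in-memory list.

def grace_hash_join(r1, r2, k):
    g = [[] for _ in range(k)]   # buckets G0..G(k-1) for R1
    h = [[] for _ in range(k)]   # buckets H0..H(k-1) for R2

    # Partitioning (steps 1-2): hash every tuple on its join key.
    for t in r1:
        g[hash(t[0]) % k].append(t)
    for t in r2:
        h[hash(t[0]) % k].append(t)

    # Probing (step 3): only tuples in the same bucket pair can match.
    out = []
    for i in range(k):
        index = {}
        for t in g[i]:
            index.setdefault(t[0], []).append(t)
        for t in h[i]:
            out.extend((m, t) for m in index.get(t[0], []))
    return out

print(grace_hash_join([(2, "a"), (5, "b")], [(5, "x"), (4, "y")], k=2))
# -> [((5, 'b'), (5, 'x'))]
```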
Simple example, hash function: even/odd
R1 = 2, 4, 3, 5, 8, 9    R2 = 5, 4, 12, 3, 13, 8, 11, 14
Buckets:
Even: R1 → 2, 4, 8    R2 → 4, 12, 8, 14
Odd: R1 → 3, 5, 9    R2 → 5, 3, 13, 11
Hash Join Variations
Question: what if there is enough memory to hold R1?
(Figure: R1 = 2, 4, 3, 5, 8, 9; R2 = 5, 4, 12, 3, 13, 8, 11, 14.)
Hash Join Variations
Question: what if there is enough memory to hold R1?
(1) Load entire R1 into memory
(2) Build a hash table for R1 (using hash function h)
(3) For each tuple r2 in R2:
 - read r2
 - probe the R1 hash table using h(r2)
 - for each matching tuple r1, output <r1, r2>
(Figure: R1 hash table with buckets 2, 4, 8 and 3, 5, 9; R2 tuple 5 probes it and outputs <5,5>.)
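A minimal Python sketch of this in-memory build-and-probe variant, reusing the values from the even/odd example; the function name and the use of bare integers as tuples are illustrative simplifications.

```python
# In-memory hash join: build a hash table over all of R1, then stream R2
# and probe. Emits a result as soon as a probing tuple finds a match.

def simple_hash_join(r1, r2):
    table = {}
    for r1_tuple in r1:                     # (1)-(2): load R1, build hash table
        table.setdefault(r1_tuple, []).append(r1_tuple)
    for r2_tuple in r2:                     # (3): probe with each R2 tuple
        for r1_tuple in table.get(r2_tuple, []):
            yield (r1_tuple, r2_tuple)

print(list(simple_hash_join([2, 4, 3, 5, 8, 9],
                            [5, 4, 12, 3, 13, 8, 11, 14])))
# -> [(5, 5), (4, 4), (3, 3), (8, 8)]
```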
Example: Estimate Costs. For each candidate physical plan P1, P2, …, Pn of the l.q.p., estimate its cost C1, C2, …, Cn, then pick the best.
New Challenges for Processing Queries in DI Systems • Little information for cost estimates • Unreliable, overlapping sources • Unpredictable data transfer rates • Want initial results quickly • Need “smarter” execution
The Tukwila Project • Key idea: build adaptive features into the core of the system • Interleave planning and execution (replan when you know more about your data) • Compensate for lack of information • Rule-based mechanism for changing behavior • Adaptive query operators: • Revival of the double-pipelined join → better latency • Collectors (a.k.a. "smart union") → handling overlap
Tukwila Data Integration System Novel components: • Event handler • Optimization-execution loop
Handling Execution Events • Adaptive execution via event-condition-action rules • During execution, events are generated: timeout, n tuples read, operator opens/closes, memory overflows, execution fragment completes, … • Events trigger rules: • Test conditions: memory free, tuples read, operator state, operator active, … • Execution actions: re-optimize, reduce memory, activate/deactivate operator, …
Interleaving Planning and Execution Re-optimize if at unexpected state: • Evaluate at key points, re-optimize un-executed portion of plan [Kabra/DeWitt SIGMOD98] • Plan has pipelined units, fragments • Send back statistics to optimizer. • Maintain optimizer state for later reuse. WHEN end_of_fragment(0) IF card(result) > 100,000 THEN re-optimize
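A rough Python sketch of how event-condition-action rules like the one above might be dispatched; the class names, event names, and the state dictionary are illustrative assumptions, not Tukwila's actual rule language.

```python
# Event-condition-action dispatch: execution events are matched against
# registered rules; a rule whose condition holds fires its action.

class Rule:
    def __init__(self, event, condition, action):
        self.event, self.condition, self.action = event, condition, action

class EventHandler:
    def __init__(self):
        self.rules = []

    def register(self, rule):
        self.rules.append(rule)

    def raise_event(self, event, state):
        for rule in self.rules:
            if rule.event == event and rule.condition(state):
                rule.action(state)

# Hypothetical rule mirroring the slide: re-optimize when a fragment
# finishes with an unexpectedly large result.
handler = EventHandler()
handler.register(Rule(
    event="end_of_fragment",
    condition=lambda s: s["result_cardinality"] > 100_000,
    action=lambda s: print("re-optimizing un-executed portion of plan"),
))
handler.raise_event("end_of_fragment", {"result_cardinality": 250_000})
```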
Handling Latency
Orders from data source A (a relation; each row is a tuple, each column an attribute):
OrderNo | TrackNo
1234 | 01-23-45
1235 | 02-90-85
1399 | 02-90-85
1500 | 03-99-10
UPS from data source B:
TrackNo | Status
01-23-45 | In Transit
02-90-85 | Delivered
03-99-10 | Delivered
04-08-30 | Undeliverable
The query applies Select Status = "Delivered" to UPS. Data from B is often delayed due to a high volume of requests.
Join Operation
Need to combine tuples from Orders & UPS: Join Orders.TrackNo = UPS.TrackNo (Orders, UPS)
Orders:
OrderNo | TrackNo
1234 | 01-23-45
1235 | 02-90-85
1399 | 02-90-85
1500 | 03-99-10
UPS:
TrackNo | Status
01-23-45 | In Transit
02-90-85 | Delivered
03-99-10 | Delivered
04-08-30 | Undeliverable
Result (the second TrackNo attribute is removed):
OrderNo | TrackNo | Status
1234 | 01-23-45 | In Transit
1235 | 02-90-85 | Delivered
1399 | 02-90-85 | Delivered
1500 | 03-99-10 | Delivered
Query Plan Execution • Pipelining vs. materialization • Control flow? • Iterator (top-down): most common database model; easier to implement • Data-driven (bottom-up): threads or external scheduling; better concurrency
Query plan represented as a data-flow tree for "Show which orders have been delivered": Join Orders.TrackNo = UPS.TrackNo over (Read Orders) and (Select Status = "Delivered" over Read UPS).
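For contrast with the data-driven model, here is a minimal Python sketch of the iterator (open/next/close) interface, with a Select operator pulling tuples from a Scan; the operator names and tuple layout are illustrative.

```python
# Iterator (top-down) control flow: each operator exposes open/next/close
# and pulls tuples from its child on demand.

class Scan:
    def __init__(self, tuples):
        self.tuples = tuples
    def open(self):
        self.it = iter(self.tuples)
    def next(self):
        return next(self.it, None)            # None signals end-of-stream
    def close(self):
        self.it = None

class Select:
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        while (t := self.child.next()) is not None:
            if self.predicate(t):
                return t
        return None
    def close(self):
        self.child.close()

# Toy version of "which UPS tuples are delivered":
plan = Select(Scan([("01-23-45", "In Transit"), ("02-90-85", "Delivered")]),
              lambda t: t[1] == "Delivered")
plan.open()
while (t := plan.next()) is not None:
    print(t)                                  # ('02-90-85', 'Delivered')
plan.close()
```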
Standard Hash Join • Read entire inner relation • Use outer tuples to probe the inner hash table (Figure: standard hash join vs. double pipelined hash join.)
Standard Hash Join • Standard hash join has 2 phases: • Non-pipelined: read the entire inner relation, build a hash table • Pipelined: use tuples from the outer relation to probe the hash table • Advantages: • Only one hash table • Low CPU overhead • Disadvantages: • High latency: need to wait for all inner tuples • Asymmetric: need a size estimate to choose the inner relation • Long time to completion: sum of the two data sources
Adaptive Operators: Double Pipelined Join • Hash table per source • As a tuple comes in, add it to its source's hash table and probe the opposite table (Figure: standard hash join vs. double pipelined hash join.)
Double-Pipelined Hash Join (Figure: (1) tuples 2, 4, 3, 5 have arrived from R1 and sit in R1's hash table; (2) tuple 5 arrives from R2, is stored in R2's hash table, probes R1's table, and outputs <5,5>.)
Double-Pipelined Hash Join • Proposed for parallel main-memory databases (Wilschut 1990) • Advantages: • Results as soon as tuples received • Can produce results even when one source delays • Symmetric (do not need to distinguish inner/outer) • Disadvantages: • Require memory for two hash tables • Data-driven!
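A minimal Python sketch of the double-pipelined idea; a single interleaved arrival stream stands in for tuples arriving asynchronously from two sources, and the source labels and tuple format are illustrative.

```python
# Double-pipelined (symmetric) hash join: one hash table per source.
# Each arriving tuple is stored in its own table and immediately probes
# the opposite table, so results stream out as soon as both halves of a
# match have arrived.

def double_pipelined_join(arrivals, key=lambda t: t[0]):
    tables = {"R1": {}, "R2": {}}
    for source, tup in arrivals:                              # e.g. ("R1", (5,))
        other = "R2" if source == "R1" else "R1"
        tables[source].setdefault(key(tup), []).append(tup)   # store
        for match in tables[other].get(key(tup), []):         # probe
            left, right = (tup, match) if source == "R1" else (match, tup)
            yield (left, right)

arrivals = [("R1", (2,)), ("R2", (4,)), ("R1", (4,)), ("R1", (5,)), ("R2", (5,))]
print(list(double_pipelined_join(arrivals)))
# -> [((4,), (4,)), ((5,), (5,))]
```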
Double-Pipelined Join Adapted to Iterator Model • Use multiple threads with queues • Each child (A or B) reads tuples until its queue (QA or QB) is full, then sleeps & awakens the parent • DPJoin sleeps until awakened, then: • Joins tuples from QA or QB, returning all matches as output • Wakes the owner of the queue • Allows overlap of I/O & computation in the iterator model • Little overlap between multiple computations
Experimental Results (joining 3 data sources) • Normal sources: DPJoin is as fast as the optimal standard join • Slow sources: DPJoin is much better (Chart compares DPJoin, the optimal standard join, and a suboptimal standard join.)
Insufficient Memory? • May not be able to fit hash tables in RAM • Strategy for standard hash join • Swap some buckets to overflow files • As new tuples arrive for those buckets, write to files • After current phase, clear memory, repeat join on overflow files
Overflow Strategies • 3 overflow resolution methods: • Naive conversion — conversion into standard join; asymmetric • Flush left hash table to disk, pause left source • Read in right source • Pipeline tuples from left source • Incremental Left Flush — “lazy” version of above • As above, but flush minimally • When reading tuples from right, still probe left hash table • Incremental Symmetric Flush — remains symmetric • Choose same hash bucket from both tables to flush
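A rough Python sketch of the incremental symmetric flush idea under simplifying assumptions: dicts keyed by bucket number stand in for the two hash tables, and plain lists stand in for on-disk overflow files.

```python
# Incremental symmetric flush: when memory fills, pick one bucket number
# and move that bucket from *both* hash tables to overflow storage, so
# the in-memory state stays symmetric.

def flush_symmetric(table_r1, table_r2, overflow_r1, overflow_r2):
    """Flush the largest combined bucket from both in-memory tables."""
    buckets = set(table_r1) | set(table_r2)
    if not buckets:
        return None
    victim = max(buckets,
                 key=lambda b: len(table_r1.get(b, [])) + len(table_r2.get(b, [])))
    overflow_r1.extend(table_r1.pop(victim, []))
    overflow_r2.extend(table_r2.pop(victim, []))
    # Later-arriving tuples that hash to the flushed bucket also go to the
    # overflow files; the two overflow files are joined in a second pass,
    # as in a Grace hash join.
    return victim
```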
Simple Algorithmic Comparison • Assume s tuples per source (i.e. sources of equal size) and memory that fits m tuples: • Left Flush: If m/2 < s ≤ m (only left overflows): Cost = 2(s − m/2) = 2s − m; If m < s ≤ 2m (both overflow): Cost = 2(s + m²/2s − 3m/2) + 2(s − m) = 4s − 4m + m²/s • Symmetric Flush: If s < 2m: Cost = 2(2s − m) = 4s − 2m
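A small Python check evaluating the closed forms quoted above for a few illustrative values of s and m (the formulas are taken as stated on the slide, not re-derived).

```python
# Evaluate the quoted I/O-cost formulas for sample relation sizes (s)
# and memory capacities (m), measured in tuples.

def left_flush_cost(s, m):
    if m / 2 < s <= m:            # only the left hash table overflows
        return 2 * s - m
    if m < s <= 2 * m:            # both hash tables overflow
        return 4 * s - 4 * m + m ** 2 / s
    raise ValueError("outside the range analyzed on the slide")

def symmetric_flush_cost(s, m):
    assert s < 2 * m
    return 4 * s - 2 * m

for s, m in [(75, 100), (150, 100), (180, 100)]:
    print(s, m, left_flush_cost(s, m), symmetric_flush_cost(s, m))
```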
Experimental Results • Low memory (4 MB): symmetric is as fast as optimal • Medium memory (8 MB): incremental left is nearly optimal • Adequate performance for overflow cases • Left flush: consistent output; symmetric sometimes faster (Chart compares Symmetric, Incremental Left, Naive, and Optimal (16 MB).)
Adaptive Operators: Collector Utilize mirrors and overlapping sources to produce results quickly • Dynamically adjust to source speed & availability • Scale to many sources without exceeding net bandwidth • Based on policy expressed via rules WHEN timeout(CustReviews) DO activate(NYTimes), activate(alt.books) WHEN timeout(NYTimes) DO activate(alt.books)
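A minimal Python sketch of a collector policy along these lines; the fetch interface, its timeout behavior, and the way sources are ordered are illustrative assumptions.

```python
# Collector ("smart union"): try sources in priority order, activate a
# mirror when a source times out, and suppress duplicate tuples across
# overlapping sources.

def collect(sources, fetch):
    """sources: names in priority order, e.g. ["CustReviews", "NYTimes", "alt.books"];
    fetch(name) yields tuples or raises TimeoutError."""
    seen = set()
    for name in sources:
        try:
            for tup in fetch(name):
                if tup not in seen:       # overlapping mirrors: emit each tuple once
                    seen.add(tup)
                    yield tup
            return                        # this source answered in full; stop
        except TimeoutError:
            continue                      # WHEN timeout(name) DO activate(next source)
```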
Summary • DPJoin shows benefits over standard joins • Possible to handle out-of-memory conditions efficiently • Experiments suggest optimal strategy depends on: • Sizes of sources • Amount of memory • I/O speed
The Unsolved Problem • How to find the interleaving points, i.e., when to switch from optimization to execution? • Some straightforward solutions worked reasonably, but the student who was supposed to solve the problem graduated prematurely. • Some work on this problem: • Rick Cole (Informix) • Benninghoff & Maier (OGI) • One solution being explored: execute first and break the pipeline later as needed. • Another solution: change operator ordering in mid-flight (Eddies, Avnur & Hellerstein).
More Urgent Problems • Users want answers immediately: • Optimize time to first tuple • Give approximate results earlier. • XML emerges as a preferred platform for data integration: • But all existing XML query processors are based on first loading XML into a repository.
Tukwila Version 2 • Able to transform, integrate and query arbitrary XML documents. • Support for output of query results as early as possible: • Streaming model of XML query execution. • Efficient execution over remote sources that are subject to frequent updates. • Philosophy: how can we adapt relational and object-relational execution engines to work with XML?
Tukwila V2 Highlights • The X-scan operator that maps XML data into tuples of subtrees. • Support for efficient memory representation of subtrees (use references to minimize replication). • Special operators for combining and structuring bindings as XML output.
Example XML File
<db>
  <book publisher="mkp">
    <title>Readings in Database Systems</title>
    <editors>
      <name>Stonebraker</name>
      <name>Hellerstein</name>
    </editors>
    <isbn>123-456-X</isbn>
  </book>
  <company ID="mkp">
    <name>Morgan Kaufmann</name>
    <city>San Mateo</city>
    <state>CA</state>
  </company>
</db>
Example Query
WHERE <db> <book publisher=$pID> <title>$t</> </> ELEMENT_AS $b </> IN "books.xml",
      <db> <publication title=$t> <source ID=$pID>$p</> <price>$pr</> </> </> IN "amazon.xml",
      $pr < 49.95
CONSTRUCT <book> <name>$t</> <publisher>$p</> </>
Query Execution Plan
X-Scan • The operator at the leaves of the plan. • Given an XML stream and a set of regular expressions, produces a set of bindings. • Supports both tree and graph data. • Uses a set of state machines to traverse the data and match the patterns. • Maintains a list of unseen element IDs and resolves them upon arrival.
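A toy Python sketch of the binding-producing idea: stream through an XML document and emit variable bindings for subtrees matching simple paths. The real X-scan uses state machines over regular path expressions and follows graph (IDREF) edges; the path format, variable names, and use of ElementTree's streaming parser here are illustrative simplifications.

```python
# Stream an XML document and produce one {variable: value} binding per
# matched scope element, without loading the document into a repository.

import io
import xml.etree.ElementTree as ET

def xscan(xml_stream, scope, paths):
    """scope: path of the repeating element (e.g. "db/book");
    paths: {variable: root-to-leaf path}; yields one binding dict per scope."""
    stack, binding = [], {}
    for event, elem in ET.iterparse(xml_stream, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        current = "/".join(stack)
        for var, path in paths.items():
            if current == path:
                binding[var] = (elem.text or "").strip()
        if current == scope and binding:      # finished one matched subtree
            yield dict(binding)
            binding = {}
        stack.pop()

doc = io.StringIO("<db><book><title>Readings in Database Systems</title>"
                  "<isbn>123-456-X</isbn></book></db>")
for b in xscan(doc, "db/book", {"t": "db/book/title", "i": "db/book/isbn"}):
    print(b)   # {'t': 'Readings in Database Systems', 'i': '123-456-X'}
```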
Other Features of Tukwila V.2 • X-scan: • Can also be made to preserve XML order. • Careful handling of cycles in the XML graph. • Can apply certain selections to the bindings. • Uses much of the code of Tukwila I. • No modifications to traditional operators. • XML output producing operators. • Nest operator.
In the “Pipeline” • Partial answers: no blocking. Produce approximate answers as data is streaming. • Policies for recovering from memory overflow [More Zack]. • Efficient updating of XML documents (and an XML update language) [w/Tatarinov] • Dan Suciu: a modular/composable toolset for manipulating XML. • Automatic generation of data source descriptions (Doan & Domingos)