
Outline

NiagaraCQ: A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage). Jianjun Chen et al., Computer Sciences Dept., University of Wisconsin-Madison. SIGMOD 2000. Talk by Naresh Kumar.


Presentation Transcript


  1. NiagaraCQ: A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage). Jianjun Chen et al., Computer Sciences Dept., University of Wisconsin-Madison. SIGMOD 2000. Talk by Naresh Kumar.

  2. Outline. Motivation. What is NiagaraCQ? General strategy of incremental group optimization. Query split scheme with materialized intermediate files. Incremental grouping of selection and join operators. Experimental details.

  3. Data Stream Management System (DSMS). The two fundamental differences between a DBMS and a DSMS: in addition to managing traditional stored data, a DSMS must handle multiple continuous, unbounded, possibly rapid data streams; and a DSMS supports long-running continuous queries.

  4. Continuous Queries (CQ). Continuous queries are persistent queries that allow users to receive results when they become available. Example: notify me whenever the price of Dell stock drops by more than 5%. A broad classification: change-based and timer-based.

  5. Motivation. Continuous queries are increasingly popular. They notify the user without the user having to query repeatedly, which is especially useful on the Internet, where information changes frequently. Challenge: the system must be able to support millions of queries because of the scale of the Internet. Solution: NiagaraCQ.

  6. Previous Attempts. Previous group optimization efforts focused on finding an optimal plan for a small number of similar queries. They are not applicable to a continuous query system for the following reasons: they are computationally too expensive to handle a large number of queries, and they were not designed for an environment like the web, where CQs are added or removed dynamically.

  7. What is NiagaraCQ? NiagaraCQ is a CQ system for the Internet. It is built on the assumption that many queries tend to be similar to one another, so similar queries can be grouped together. It supports scalable continuous query processing over multiple, distributed XML files.

  8. NiagaraCQ - Approaches. Advantages of grouping: grouped queries can share computation; they can reside in memory, saving I/O cost; and testing many CQs together avoids unnecessary invocations. Both change-based and timer-based queries are handled in a uniform way. To ensure scalability: incremental evaluation of CQs and memory caching.

  9. NiagaraCQ command language. Creating a CQ: Create CQ_name XML-QL query Do action {START start_time} {EVERY time_interval} {EXPIRE expiration_time}. Deleting a CQ: Delete CQ_name.

  10. Expression Signature. Query examples: (a) Where <Quotes><Quote><Symbol>INTC</></></> element_as $g in “http://www.stock.com/quotes.xml” construct $g; (b) Where <Quotes><Quote><Symbol>MSFT</></></> element_as $g in “http://www.stock.com/quotes.xml” construct $g. Expression signature of both queries: Quotes.Quote.Symbol = constant, in quotes.xml.
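
The idea of an expression signature can be sketched in Python as follows. This is only an illustration under assumptions: the `Selection` tuple and the `signature` function are hypothetical names, not part of NiagaraCQ, and a selection predicate is assumed to be reducible to a path, an operator, a constant, and a source file.

```python
from typing import NamedTuple

class Selection(NamedTuple):
    """Hypothetical representation of one selection predicate."""
    path: str       # e.g. "Quotes.Quote.Symbol"
    op: str         # e.g. "="
    constant: str   # e.g. "INTC"
    source: str     # e.g. "quotes.xml"

def signature(sel: Selection) -> tuple:
    """Replace the constant with a placeholder, so that queries that
    differ only in their constant produce the same signature."""
    return (sel.path, sel.op, "<constant>", sel.source)

intc = Selection("Quotes.Quote.Symbol", "=", "INTC", "quotes.xml")
msft = Selection("Quotes.Quote.Symbol", "=", "MSFT", "quotes.xml")

# Both example queries share one signature and can join the same group.
same_group = signature(intc) == signature(msft)
```

A predicate on a different path or a different source file would yield a different signature and therefore a different group.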

  11. Query plans (each listed from top to bottom). First query: Trigger Action I; Select Symbol = “INTC”; File Scan; quotes.xml. Second query: Trigger Action J; Select Symbol = “MSFT”; File Scan; quotes.xml.

  12. Group. A group consists of a signature, a constant table, and a plan. Group signature: the common signature of all queries in the group. Group constant table, with columns (Constant_value, Dest_buffer): (INTC, Dest. I), (MSFT, Dest. J).
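
A rough sketch of how a constant table lets one scan serve every query in a group. The function name `evaluate_group` and the dictionary-based tuples are assumptions made for illustration; only the constants and destination names come from the slide.

```python
from collections import defaultdict

# Constant table from the slide: constant value -> destination buffer.
constant_table = {"INTC": "Dest. I", "MSFT": "Dest. J"}

def evaluate_group(tuples, constant_table, key="Symbol"):
    """Scan the source once and route each tuple to the destination
    buffer registered for its constant. Tuples whose constant is not
    in the table are dropped."""
    buffers = defaultdict(list)
    for t in tuples:
        dest = constant_table.get(t[key])
        if dest is not None:
            buffers[dest].append(t)
    return dict(buffers)

quotes = [
    {"Symbol": "INTC", "Price": 30},
    {"Symbol": "MSFT", "Price": 60},
    {"Symbol": "DELL", "Price": 45},
]
routed = evaluate_group(quotes, constant_table)
```

The point of the design is that the cost of the scan is paid once per group rather than once per query; each tuple needs only a single table lookup.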

  13. The group plan

  14. Incremental Grouping Algorithm. When a new query is submitted: if the expression signature of the new query matches that of an existing group, break the query plan into two parts, remove the lower part, and add the upper part onto the group plan; else create a new group.
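
The algorithm above might look roughly like this in Python. It is a simplified sketch, not the NiagaraCQ implementation: the `Group` class, `add_query`, and the buffer naming are hypothetical, and a query is assumed to be already reduced to a signature plus a constant.

```python
class Group:
    """Hypothetical group: one signature, one constant table, one shared plan."""
    def __init__(self, signature):
        self.signature = signature
        self.constant_table = {}   # constant -> destination buffer name
        self.upper_parts = {}      # query id -> buffer its upper part reads

def add_query(groups, query_id, signature, constant):
    """Add a query to the group with a matching signature, creating
    the group first if no existing group matches."""
    group = groups.get(signature)
    if group is None:
        group = Group(signature)
        groups[signature] = group
    # The lower part of the query plan is dropped: the group plan already
    # covers it. Only the constant has to be recorded.
    if constant not in group.constant_table:
        group.constant_table[constant] = f"buffer_{len(group.constant_table)}"
    dest = group.constant_table[constant]
    # The upper part of the query plan now reads from the destination buffer.
    group.upper_parts[query_id] = dest
    return dest

groups = {}
sig = ("Quotes.Quote.Symbol", "=", "quotes.xml")
add_query(groups, "Q_INTC", sig, "INTC")
add_query(groups, "Q_MSFT", sig, "MSFT")
aol_dest = add_query(groups, "Q_AOL", sig, "AOL")
```

Adding a query touches only its own group, so existing queries never need to be regrouped when a new one arrives.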

  15. New Query. AOL is added to the constant table; a new destination buffer is allocated; the matching process continues until the top of the plan is reached.

  16. Query split scheme. The matching process continues on the remainder of the query plan until the top of the plan is reached. Thus, each continuous query is split into several smaller queries, and the inputs of each of these queries are monitored using the same techniques that are used for the inputs of user-defined continuous queries. Incremental group optimization is very efficient because it requires only one traversal of the query plan.

  17. Query split with materialized intermediate files. The destination buffer for the split operator can be implemented with a pipelined scheme or with an intermediate file scheme. Disadvantages of the pipelined scheme: it does not work for grouping timer-based queries; it gives a single complicated execution plan; the combined plan may be very large and require resources beyond the limits of the system; a large portion of the query plan may not need to be executed at each invocation; and the split operator may block simple queries.

  18. Query split with materialized intermediate files (cont.). Using intermediate files: the split operator writes each output stream to a file; the query plan is cut into two parts at the split operator; a file scan operator is added to the upper part to read the intermediate file; intermediate files are monitored just like other data sources; intermediate file names are stored in the constant table; grouped CQs with the same constant share the same intermediate file.
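
A small sketch of the intermediate file scheme, assuming JSON-lines files in a temporary directory. The file format, the file names, and the functions `run_split` and `file_scan` are illustrative choices, not taken from the slides.

```python
import json
import os
import tempfile

def run_split(tuples, constant_table, key="Symbol"):
    """Lower part of the plan: append each matching tuple to the
    intermediate file registered for its constant. Returns the set of
    files that changed, i.e. the upper queries that should fire."""
    changed = set()
    for t in tuples:
        path = constant_table.get(t[key])
        if path is None:
            continue
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(t) + "\n")
        changed.add(path)
    return changed

def file_scan(path):
    """Upper part of the plan: an ordinary file scan, here over an
    intermediate file instead of an original data source."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

workdir = tempfile.mkdtemp()
# The constant table stores intermediate file names (hypothetical names).
constant_table = {
    "INTC": os.path.join(workdir, "file_i.jsonl"),
    "MSFT": os.path.join(workdir, "file_j.jsonl"),
}
changed = run_split([{"Symbol": "INTC", "Price": 30}], constant_table)
intc_rows = file_scan(constant_table["INTC"])
```

Because only the INTC file changed, only the upper queries that read it need to run; the MSFT query is not invoked at all.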

  19. The query split scheme

  20. Trade-offs. Other advantages of materialized intermediate files: only the necessary queries are executed, which reduces computation time; the tree-structured query format can easily be scheduled and executed by a general query engine; intermediate files and original data source files are handled uniformly; the bottleneck problem is avoided. Disadvantages: the split operator becomes a blocking operator; extra disk I/Os are required.

  21. Range Predicates. E.g. R.a < val or val1 < R.a < val2, with multiple such ranges. Problem: intermediate files may contain duplicate tuples. Idea: virtual intermediate files, implemented using an index.
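
One way to picture virtual intermediate files is sketched below, using a sorted list and binary search as a stand-in for the index. The class `VirtualFiles` and its `scan` method are hypothetical names; the slide does not say what kind of index is used.

```python
import bisect

class VirtualFiles:
    """One sorted copy of the tuples. Each 'virtual file' is a range
    over that copy, so overlapping ranges do not duplicate tuples."""
    def __init__(self, tuples, key):
        self.rows = sorted(tuples, key=lambda t: t[key])
        self.keys = [t[key] for t in self.rows]

    def scan(self, low=None, high=None):
        """Return tuples with low < key < high; a bound of None is open."""
        start = 0 if low is None else bisect.bisect_right(self.keys, low)
        end = len(self.keys) if high is None else bisect.bisect_left(self.keys, high)
        return self.rows[start:end]

rows = [{"a": 50}, {"a": 5}, {"a": 80}, {"a": 20}]
vf = VirtualFiles(rows, "a")
below_30 = vf.scan(high=30)            # query with R.a < 30
between_10_60 = vf.scan(low=10, high=60)  # query with 10 < R.a < 60
```

The tuple with a = 20 satisfies both predicates but is stored only once; materializing a separate file per query would have written it twice.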

  22. Incremental grouping of selection predicates. A query may contain multiple selection predicates; predicates on the same data source are put in CNF. Incremental grouping: choose the most selective conjunct and implement the virtual file on this conjunct. Example query: Where <Quotes><Quote><Symbol>“INTC”</> <Current_Price>$p</></> element_as $g </> in “quotes.xml”, $p < 100 Construct $g

  23. Incremental grouping of join operators. Join operators are usually expensive, so sharing common join operations can significantly reduce the amount of computation. A join query. Quotes.Quote.Change_Ratio constant in “quotes.xml”. Where <Quotes><Quote><Symbol>$s</></> element_as $g </> in “quotes.xml”, <Companies><Company><Symbol>$s</></> element_as $t</> in “companies.xml” construct $g, $t

  24. Queries that contain both join and selection. Example query: Where <Quotes><Quote><Symbol>$s</> <Industry>“Computer Service”</></> element_as $g </> in “quotes.xml”, <Companies><Company><Symbol>$s</></> element_as $t</> in “companies.xml” construct $g, $t. Where to place the selection operator? Below the join: eliminates irrelevant tuples. Above the join: allows sharing. Pick based on a cost model.

  25. Grouping timer-based queries. Challenges: it is hard to monitor the timer events of queries, and sharing common computation becomes difficult. Event detection: time events are stored sorted in time order, and each query has an ID.
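
An event list sorted in time order could be sketched with a heap, as below. The `EventList` class, its method names, and the use of plain numbers for time are assumptions for illustration; `every` mirrors the EVERY clause of the command language, with 0 meaning a one-shot query.

```python
import heapq

class EventList:
    """Hypothetical event list: timer events kept in time order,
    each one referring to a query by its ID."""
    def __init__(self):
        self._heap = []  # entries are (fire_time, query_id, interval)

    def schedule(self, query_id, start, every=0):
        """Register a query to fire at `start`, then every `every` time units."""
        heapq.heappush(self._heap, (start, query_id, every))

    def due(self, now):
        """Pop the IDs of all queries whose fire time is <= now and
        reschedule the periodic ones."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            when, query_id, every = heapq.heappop(self._heap)
            fired.append(query_id)
            if every > 0:
                heapq.heappush(self._heap, (when + every, query_id, every))
        return fired

events = EventList()
events.schedule("Q1", start=10, every=10)
events.schedule("Q2", start=15)
at_9 = events.due(9)
at_10 = events.due(10)
at_20 = events.due(20)
```

Checking for due queries only inspects the head of the list, so the cost of each check does not grow with the number of installed timer-based queries that are not yet due.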

  26. Incremental evaluation. Queries are invoked only on changed data. For each source file, NiagaraCQ keeps a delta file; the same is done for the intermediate files. A time stamp is stored with each tuple, for timer-based queries. Incremental evaluation of join operators requires complete data files.
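
The last point can be illustrated with a small sketch of an incremental equi-join. It assumes insert-only deltas and in-memory lists of dictionaries; the function names and sample data are hypothetical. New results are the union of ΔR ⋈ (S ∪ ΔS) and R ⋈ ΔS, which is why the complete files are needed and not just the deltas.

```python
def join(left, right, key):
    """Plain hash join on one shared key."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]

def incremental_join(r_old, delta_r, s_old, delta_s, key):
    """Join results produced by newly inserted tuples only.
    Assumes deltas contain insertions, not updates or deletions."""
    return join(delta_r, s_old + delta_s, key) + join(r_old, delta_s, key)

quotes_old = [{"Symbol": "INTC", "Price": 30}]
quotes_delta = [{"Symbol": "MSFT", "Price": 60}]
companies_old = [
    {"Symbol": "INTC", "Name": "Intel"},
    {"Symbol": "MSFT", "Name": "Microsoft"},
]
companies_delta = []

new_results = incremental_join(
    quotes_old, quotes_delta, companies_old, companies_delta, "Symbol"
)
```

Here the new MSFT quote joins with a company tuple that did not change, so the delta of quotes.xml alone would not have been enough to produce the result.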

  27. Memory Caching. Thousands of continuous queries cannot fit in memory. What should we cache? Grouped query plans. What about non-grouped queries? Favor small delta files. Time window of the event list.

  28. System Architecture

  29. CQ processing

  30. Experimental Results. Example query: Where <Quotes><Quote><Symbol>“INTC”</></> element_as $g </> in “quotes.xml”, construct $g. N = number of installed queries; F = number of fired queries; C = number of tuples modified.

  31. Performance Results. Case 1: F = N, C = 1000. Case 2: F = 100, C = 1000.

  32. Performance Results. F = N = 2000, varying data size.

  33. Thank You
