
TelegraphCQ: Continuous Dataflow Processing for an Uncertain World

TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong*, Sailesh Krishnamurthy, Sam Madden, Vijayshankar Raman**, Fred Reiss, and Mehul Shah. University of California, Berkeley.


Presentation Transcript


  1. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World • Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong*, Sailesh Krishnamurthy, Sam Madden, Vijayshankar Raman**, Fred Reiss, and Mehul Shah • University of California, Berkeley • *Intel Berkeley Laboratory • **IBM Almaden Research Center • http://telegraph.cs.berkeley.edu/

  2. Contents • Background and Motivation • Telegraph – Architecture • Window Semantics in TelegraphCQ • TelegraphCQ – Design Overview • TelegraphCQ – Architecture • Conclusion • All diagrams and contents are directly adapted/taken from the paper itself!

  3. TelegraphCQ – Background and Motivation • Adaptive dataflow architecture: systems that can adjust their processing on the fly in response to • Changes in user needs [HACO+99] • Intermittent delays in accessing data across WANs [UFA98] • Shared processing • CACQ [MSHR02] • PSoup [CF02] • Limitations of these earlier systems: • Processing restricted to in-memory data • No scheduling and resource management for queries with little or no overlap • No Quality of Service (QoS) support for adapting to resource limitations • Tradeoff between flexibility and overhead not explored

  4. Telegraph - Architecture • Extensible set of composable dataflow modules/operators • Producer-Consumer design with Fjords API • Push as well as Pull queues • Ingress and Caching • Query Processing • Adaptive Routing

  5. Adaptive Processing – Eddies & SteMs • Eddy: • Continuously routes tuples among operators according to a routing policy • Routes on a per-tuple basis, which requires state associated with each tuple • SteMs (State Modules): • Temporary repository of tuples • Each SteM stores homogeneous tuples • Supports build (insert), probe (search) and eviction (deletion) operations
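The three SteM operations listed on this slide can be sketched as a small in-memory structure. This is a minimal illustration only, not TelegraphCQ's implementation: the class name `SteM`, the hash index on a single key, and the dictionary-shaped tuples are all assumptions made for the example.

```python
from collections import defaultdict


class SteM:
    """Toy State Module (hypothetical): holds tuples from one source,
    indexed on a single key attribute."""

    def __init__(self, key):
        self.key = key                  # attribute used for lookups
        self.index = defaultdict(list)  # key value -> stored tuples

    def build(self, tup):
        """Insert a tuple into the repository."""
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        """Return stored tuples whose key attribute equals value."""
        return list(self.index.get(value, []))

    def evict(self, predicate):
        """Delete every stored tuple for which predicate(tuple) is true.
        Returns the number of tuples removed."""
        removed = 0
        for k in list(self.index):
            kept = [t for t in self.index[k] if not predicate(t)]
            removed += len(self.index[k]) - len(kept)
            if kept:
                self.index[k] = kept
            else:
                del self.index[k]
        return removed


# Usage: store closing prices, look up one symbol, then evict old days.
quotes = SteM(key="symbol")
quotes.build({"symbol": "MSFT", "day": 1, "close": 50.0})
quotes.build({"symbol": "MSFT", "day": 2, "close": 51.5})
quotes.build({"symbol": "IBM", "day": 1, "close": 80.0})
matches = quotes.probe("MSFT")
evicted = quotes.evict(lambda t: t["day"] < 2)
```

In a join, an eddy would typically build each arriving tuple into the SteM for its own source and probe the SteM of the other source; eviction is what keeps the state bounded when a window moves forward.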

  6. Fjords – Inter-Module Communication • Allow a mixture of push and pull connections between modules • A pull-queue is implemented using a blocking dequeue on the consumer side and a blocking enqueue on the producer side • A push-queue is implemented using non-blocking enqueue and dequeue; control is returned to the consumer when the queue is empty • Queries can execute over any combination of streaming and static data sources • Flux – Scaling Up Dataflow Processing: • Interposed between a producer-consumer operator pair in a pipelined, partitioned dataflow • Fault-tolerant, Load-balancing eXchange • Load balancing via online repartitioning of the input stream and the corresponding operator state • Fault tolerance by leveraging these state-movement mechanisms to replicate an operator’s internal state and in-flight data
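The difference between the pull-queue and push-queue described for Fjords can be sketched with Python's standard `queue` module. The class names `PullQueue` and `PushQueue` and the choice of `None` as the "queue is empty" signal are assumptions for illustration; they are not the Fjords API.

```python
import queue


class PullQueue:
    """Blocking on both sides, as the slide describes for pull connections."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, item):
        self._q.put(item)        # producer blocks while the queue is full

    def dequeue(self):
        return self._q.get()     # consumer blocks while the queue is empty


class PushQueue:
    """Non-blocking on both sides, as the slide describes for push connections."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, item):
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False         # caller decides what to do with the item

    def dequeue(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None          # assumption: None means "nothing available"


# Usage: the push-queue never stalls its producer or its consumer.
push = PushQueue(capacity=2)
accepted = [push.enqueue(t) for t in (1, 2, 3)]
drained = [push.dequeue() for _ in range(3)]

# A pull-queue behaves like an ordinary blocking channel.
pull = PullQueue(capacity=1)
pull.enqueue("a")
first = pull.dequeue()
```

Because the push-queue hands control back immediately, a consumer reading a stream can move on to other work when no data has arrived, while a consumer reading a static table through a pull-queue simply waits for the next tuple.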

  7. Initial CQ Approaches • CACQ: • First CQ engine to exploit the adaptive query processing framework • Modification of Eddies: executes multiple queries as a single “super”-query, the disjunction of all the queries • Tuple lineage: state carried with each tuple to determine the client • Grouped filters: an index over single-variable Boolean factors on the same attribute, used to optimize selections in the shared execution • PSoup: • Extends CACQ • Allows queries to access historical data; treats data and queries symmetrically • Adds support for disconnected operation: users can register queries
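The grouped-filter idea can be sketched for one kind of predicate. The example below indexes only predicates of the form `attribute > constant` from many queries in a sorted list, so one binary search per tuple finds every query whose predicate is satisfied. The class name and the restriction to `>` are assumptions for illustration; CACQ's actual structure is not reproduced here.

```python
import bisect


class GreaterThanFilterGroup:
    """Hypothetical index over predicates 'attr > c' on one attribute."""

    def __init__(self):
        self._constants = []   # predicate constants, kept sorted
        self._query_ids = []   # query id stored at the same position

    def add(self, query_id, constant):
        """Register the predicate 'attr > constant' for query_id."""
        pos = bisect.bisect_right(self._constants, constant)
        self._constants.insert(pos, constant)
        self._query_ids.insert(pos, query_id)

    def matching_queries(self, value):
        """Return ids of all queries whose predicate holds for value.
        Every constant strictly below value is satisfied."""
        cut = bisect.bisect_left(self._constants, value)
        return set(self._query_ids[:cut])


# Usage: three queries filter the same attribute with different thresholds.
group = GreaterThanFilterGroup()
group.add("q1", 50)
group.add("q2", 60)
group.add("q3", 40)
hits = group.matching_queries(55)
```

Evaluating the three selections separately would cost one comparison per query; with the shared index the cost per tuple is one binary search plus the size of the answer.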

  8. Window Semantics in TelegraphCQ • Rich windowing schemes over both already-arrived and incoming data • The supported window semantics are: • Snapshot query: executes exactly once over one window, e.g. “Select the closing prices for MSFT on the first five days of trading.” • Landmark query: fixed beginning point and a forward-moving endpoint, e.g. “Select all the days after the hundredth trading day, on which the closing price of MSFT has been greater than $50. Keep this query standing in the system for a thousand trading days.” • Sliding query: forward-moving beginning and end points, e.g. “On every fifth trading day starting today, calculate the average closing price of MSFT for the five most recent trading days. Keep the query standing for fifty trading days.” • Temporal band-join: joins tuples in one stream with those in another based on timestamp, e.g. “For the five most recent trading days starting today, select all stocks that closed higher than MSFT on a given day. Keep the query standing for twenty trading days.”
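The first three window types differ only in how the window's two endpoints move over time, which the sketch below makes explicit. Each generator yields inclusive `(start, end)` pairs in trading-day numbers. The function names, the parameters, and the choice of day 10 as "today" are assumptions for illustration and are not TelegraphCQ query syntax.

```python
def snapshot(first, last):
    """One fixed window, evaluated exactly once."""
    yield (first, last)


def landmark(begin, duration):
    """Fixed start; the end advances by one day for `duration` days."""
    for t in range(begin, begin + duration):
        yield (begin, t)


def sliding(start, duration, hop, width):
    """Both ends advance: every `hop` days, cover the last `width` days."""
    for t in range(start, start + duration, hop):
        yield (t - width + 1, t)


# Snapshot: the first five days of trading.
snap = list(snapshot(1, 5))

# Landmark: windows anchored after the hundredth trading day
# (only the first three of the thousand are listed here).
land = list(landmark(101, 3))

# Sliding: every fifth day, the five most recent days, for fifty days.
# Assumption: "today" is day 10.
slide = list(sliding(start=10, duration=50, hop=5, width=5))
```

With `hop` equal to `width` the sliding windows tile the stream without overlap; a smaller `hop` makes consecutive windows share tuples, and a larger one skips some tuples entirely.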

  9. TelegraphCQ – Design Overview • Adapted the architecture of PostgreSQL • Implemented the new system in C/C++ to leverage the open-source PostgreSQL code base • Reused PostgreSQL components with varying degrees of modification

  10. TelegraphCQ – Architecture • Three processes comprise the TelegraphCQ server: • FrontEnd • Wrapper • Provides an abstraction of external sources • Separate process (non-blocking) • Executor • Execution object providing the execution context for multiple queries • Dispatch unit performing the actual work

  11. Conclusion • TelegraphCQ provides an adaptive dataflow and shared processing architecture • Eddies and SteMs form the building blocks for adaptive processing • Features include Fjords inter-module communication (push and pull connections) and Flux (Fault-tolerant, Load-balancing eXchange) • Builds on CACQ (tuple lineage and grouped filters) and PSoup (symmetric treatment of data and queries) • Built over the PostgreSQL framework • Thank you
