Streaming Queries over Streaming Data

Authors:
• Sirish Chandrasekaran, UC Berkeley
• Michael J. Franklin, UC Berkeley
Originally presented at the 28th VLDB Conference, Hong Kong, China, 2002
• Presented by Amit Choudhri for CMSC 891 J

• PSoup is a Java-based system.
• It processes both ad hoc and continuous queries.
• It treats data and queries symmetrically, as streams that are duals of each other.
• New queries can be applied to old data.
• New data can be applied to old queries.

PSoup uses query specifications of the form:

    SELECT select_list
    FROM from_list
    WHERE conjoined_boolean_factors
    BEGIN begin_time
    END end_time

• SELECT-FROM-WHERE is the Standing Query Clause (SQC).
• BEGIN-END specifies the input window over which results are computed.
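To make the two parts of a query specification concrete, here is a minimal Python sketch (PSoup itself is a Java system; this is not its code). The names `QuerySpec`, `evaluate`, the `ts` timestamp field, and the inclusive window bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]  # one stream tuple; "ts" is an assumed timestamp field


@dataclass
class QuerySpec:
    select_list: List[str]               # attributes to return
    from_list: List[str]                 # stream names (unused in this single-stream sketch)
    where: List[Callable[[Row], bool]]   # conjoined boolean factors of the SQC
    begin_time: float                    # start of the input window (assumed inclusive)
    end_time: float                      # end of the input window (assumed inclusive)

    def satisfies_sqc(self, row: Row) -> bool:
        # A conjunction holds only if every boolean factor holds.
        return all(factor(row) for factor in self.where)

    def in_window(self, row: Row) -> bool:
        return self.begin_time <= row["ts"] <= self.end_time


def evaluate(spec: QuerySpec, rows: List[Row]) -> List[Row]:
    """Project the rows that satisfy the SQC and fall inside the input window."""
    return [
        {name: row[name] for name in spec.select_list}
        for row in rows
        if spec.in_window(row) and spec.satisfies_sqc(row)
    ]


rows = [
    {"ts": 1, "temp": 18.0},
    {"ts": 2, "temp": 25.0},
    {"ts": 3, "temp": 31.0},
    {"ts": 4, "temp": 27.0},
]
spec = QuerySpec(
    select_list=["ts", "temp"],
    from_list=["sensors"],
    where=[lambda r: r["temp"] > 20, lambda r: r["temp"] < 30],
    begin_time=2,
    end_time=4,
)
result = evaluate(spec, rows)
print(result)
```

The SQC decides which tuples qualify at all; the window only restricts which qualifying tuples are returned for a given invocation.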

State Modules (SteMs):
• Query SteM (one for all query specifications)
• Data SteM (one per data stream)
• Historical data querying: new queries probe old data
• Continuous querying: new data probes old queries
• Result materialization
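The symmetry between the two SteMs can be sketched as follows. This is a toy single-stream model in Python, not PSoup's implementation: the class `PSoupSketch`, its method names, and the use of a plain list and dict as SteMs are assumptions for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Predicate = Callable[[dict], bool]


class PSoupSketch:
    """Toy model: one Data SteM, one Query SteM, and a set of recorded matches."""

    def __init__(self) -> None:
        self.data_stem: List[dict] = []              # old data, in arrival order
        self.query_stem: Dict[int, Predicate] = {}   # old queries, keyed by query id
        self.matches: Set[Tuple[int, int]] = set()   # (query id, data position) pairs

    def insert_query(self, query_id: int, predicate: Predicate) -> None:
        # Historical querying: store the query, then probe the Data SteM with it.
        self.query_stem[query_id] = predicate
        for pos, row in enumerate(self.data_stem):
            if predicate(row):
                self.matches.add((query_id, pos))

    def insert_data(self, row: dict) -> None:
        # Continuous querying: store the tuple, then probe the Query SteM with it.
        pos = len(self.data_stem)
        self.data_stem.append(row)
        for query_id, predicate in self.query_stem.items():
            if predicate(row):
                self.matches.add((query_id, pos))


soup = PSoupSketch()
soup.insert_data({"temp": 18})
soup.insert_data({"temp": 25})
soup.insert_query(1, lambda r: r["temp"] > 20)   # probes the two stored tuples
soup.insert_data({"temp": 31})                   # probes the stored query
print(sorted(soup.matches))
```

Both insert paths do the same two things, build into one SteM and probe the other, which is the sense in which data and queries are treated as duals.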

Mechanism
• Each registered query gets a unique QueryID.
• The client uses the QueryID as a handle for further invocations.
• Between invocations, PSoup matches data against query predicates and stores the matches in a Results Structure.
• On invocation, the input window is applied to the Results Structure to materialize the current results and return them.
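The register-then-invoke cycle above can be sketched as below. The class `ResultStore` and its methods are hypothetical names, the per-query sorted lists are a stand-in for PSoup's Results Structure, and the sketch assumes matches are recorded in non-decreasing timestamp order.

```python
import bisect
import itertools
from typing import Dict, List


class ResultStore:
    """Per-query matches kept in timestamp order; windows are applied on invocation."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._timestamps: Dict[int, List[float]] = {}
        self._rows: Dict[int, List[dict]] = {}

    def register(self) -> int:
        # Hand the client a unique QueryID to use as a handle later.
        query_id = next(self._ids)
        self._timestamps[query_id] = []
        self._rows[query_id] = []
        return query_id

    def record_match(self, query_id: int, timestamp: float, row: dict) -> None:
        # Called between invocations; assumes non-decreasing timestamps.
        self._timestamps[query_id].append(timestamp)
        self._rows[query_id].append(row)

    def invoke(self, query_id: int, begin_time: float, end_time: float) -> List[dict]:
        # Apply the input window [begin_time, end_time] to the stored matches.
        ts = self._timestamps[query_id]
        lo = bisect.bisect_left(ts, begin_time)
        hi = bisect.bisect_right(ts, end_time)
        return self._rows[query_id][lo:hi]


store = ResultStore()
qid = store.register()
for ts, temp in [(1, 22), (3, 25), (6, 28), (9, 21)]:
    store.record_match(qid, ts, {"ts": ts, "temp": temp})

first = store.invoke(qid, 2, 6)    # client reconnects and asks for window [2, 6]
second = store.invoke(qid, 5, 10)  # later invocation with a different window
print(first, second)
```

Because matching happens between invocations, an invocation only has to slice the stored matches, which is what lets the client disconnect and return later.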

Selection queries over a single stream

Selection query processing: entry of new data

Join Queries over Multiple Streams

Implementation extensions to Telegraph for PSoup:
• Eddy: the tuple router
• SteM: a data structure that exposes insert and probe methods over its contents
• Red-black tree-based structure for SteM indexing
• Results Structure: stores metadata about the tuples that satisfied the SQC
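To illustrate why an ordered index over the Query SteM helps, here is a minimal Python sketch for queries of the form `attr > constant`. It is not PSoup's data structure: a sorted list searched with `bisect` stands in for the red-black tree, and the class name `GreaterThanIndex` and its methods are assumptions.

```python
import bisect
from typing import List, Tuple


class GreaterThanIndex:
    """Index over queries of the form `attr > constant`, ordered by constant."""

    def __init__(self) -> None:
        self._entries: List[Tuple[float, int]] = []  # (constant, query id), kept sorted

    def insert(self, constant: float, query_id: int) -> None:
        bisect.insort(self._entries, (constant, query_id))

    def probe(self, value: float) -> List[int]:
        # Every query whose constant is strictly below `value` is satisfied.
        # The 1-tuple (value,) sorts before any (value, query_id) entry, so
        # entries with constant == value are excluded, as `>` requires.
        cut = bisect.bisect_left(self._entries, (value,))
        return [query_id for _, query_id in self._entries[:cut]]


index = GreaterThanIndex()
index.insert(30, 1)   # query 1: attr > 30
index.insert(10, 2)   # query 2: attr > 10
index.insert(20, 3)   # query 3: attr > 20

print(index.probe(25))  # queries satisfied by a new tuple with attr = 25
```

A new data value finds its matching queries with one ordered search plus a scan of the hits, instead of evaluating every stored predicate.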

Performance Analysis
• Compared NoMat (no result materialization) vs. PSoup-P vs. PSoup-C:
• PSoup-P uses a bit array for the results structure.
• PSoup-C uses linked lists for the results structure.
• Materializing query results supports higher query invocation rates.
• Indexing queries and lazily applying input windows improves the maximum data throughput.
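The two results-structure layouts can be contrasted with a rough Python sketch. It rests on assumptions not stated on the slide: that the bit-array variant applies the window lazily at invocation, that the list variant eagerly keeps only current-window matches, and that windows are counted in tuple positions. The class names are hypothetical, and `collections.deque` stands in for a linked list.

```python
from collections import deque
from typing import Deque, Dict, List


class BitArrayResults:
    """One bitmask per data tuple; bit q is set if the tuple satisfied query q."""

    def __init__(self) -> None:
        self.bits: List[int] = []

    def add_tuple(self, matching_queries: List[int]) -> None:
        mask = 0
        for q in matching_queries:
            mask |= 1 << q
        self.bits.append(mask)  # constant work per match, no window upkeep

    def results(self, q: int, first: int, last: int) -> List[int]:
        # The window [first, last] over tuple positions is applied at invocation.
        last = min(last, len(self.bits) - 1)
        return [i for i in range(first, last + 1) if (self.bits[i] >> q) & 1]


class ListResults:
    """One list of matching positions per query, trimmed to the window on every insert."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.count = 0
        self.per_query: Dict[int, Deque[int]] = {}

    def add_tuple(self, matching_queries: List[int]) -> None:
        pos = self.count
        self.count += 1
        for q in matching_queries:
            self.per_query.setdefault(q, deque()).append(pos)
        oldest = pos - self.window_size + 1
        for matches in self.per_query.values():  # eager expiry on every insert
            while matches and matches[0] < oldest:
                matches.popleft()

    def results(self, q: int) -> List[int]:
        return list(self.per_query.get(q, ()))


lazy, eager = BitArrayResults(), ListResults(window_size=3)
for matching in [[0], [0, 1], [1], [0], [0, 1]]:
    lazy.add_tuple(matching)
    eager.add_tuple(matching)

print(lazy.results(0, 2, 4), eager.results(0))
print(lazy.results(1, 2, 4), eager.results(1))
```

Under these assumptions both layouts return the same answers; they differ in where the window work is paid, on each data arrival or on each invocation.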

Optimization: removing redundancy in join processing
• Use “single query - multiple data” composite tuples for common predicates.
• Use “single data - multiple query” composite tuples for new data insertion.

Conclusion
• PSoup supports queries that require access to data that arrived both before and after the query specification.
• PSoup supports disconnected operation by using result materialization to separate the computation of results from their delivery.

Further work
• Extend PSoup to archive data streams to disk; the current implementation is a main-memory system.
• Allow PSoup to be used as a query browser for temporal data, not only for current-window calculations.
