Engine Design: Stream Operators Everywhere

Presentation Transcript


  1. Engine Design: Stream Operators Everywhere
     Theodore Johnson, AT&T Labs – Research (johnsont@research.att.com)
     Contributors: Chuck Cranor, Vladislav Shkapenyuk, Oliver Spatscheck

  2. Early Data Reduction
     • Goal: Query high-speed links using inexpensive off-the-shelf servers.
       • OC48: 2 x 2.4 Gb/sec., 7 million packets/sec.
       • OC192: 2 x 7.2 Gb/sec., 21 million packets/sec.
     • Goal: Evaluate queries over every bit of every packet.
     • Problem: Not enough cycles in a second.
       • 3 GHz / 21 Mpacket/sec ≈ 142 cycles/packet
     • Solution: Push data reduction operators as far down the protocol stack as possible.
       • Into the hardware if possible.
       • View hardware bit twiddling as stream operators.
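
The cycle budget above is the clock rate divided by the packet rate. The following is a minimal Python sketch. It assumes a single 3 GHz core must handle every packet and ignores all other overheads (interrupts, copies, parsing), which can only shrink the budget further.

```python
def cycles_per_packet(clock_hz: float, packets_per_sec: float) -> float:
    """Cycles one core can spend per packet if it must keep up with the link."""
    if packets_per_sec <= 0:
        raise ValueError("packet rate must be positive")
    return clock_hz / packets_per_sec


if __name__ == "__main__":
    # Packet rates from the slide; a single 3 GHz core is the slide's example.
    for link, pps in [("OC48", 7e6), ("OC192", 21e6)]:
        print(f"{link}: {cycles_per_packet(3e9, pps):.1f} cycles/packet")
    # OC48: 428.6 cycles/packet
    # OC192: 142.9 cycles/packet
```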

  3. Early Data Reduction in Gigascope
     • Gigascope was designed to monitor very high-speed (optical) links using complex query sets.
     • Multiple levels of data reduction:
       • Data reduction in the NIC: depends on NIC capabilities
         • Snap length (projection)
         • BPF filters
         • Approximate filtering (bitmasks)
         • Data reduction queries (replace the NIC run time system)
       • Low level queries
         • Run queries on kernel input buffers
         • Preliminary filter for the query set
       • Other possibilities …
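
Snap length and bitmask filtering are the cheapest NIC-level operators in the list above. The first is a projection that truncates each packet; the second is a selection that tests a few header bits.

The sketch below is a hypothetical software model, not any NIC's real interface. `MaskRule`, its offset/mask/value fields, and the Ethernet/IPv4 byte offsets are assumptions.

Such rules only need to be necessary conditions for the query. They may pass extra packets, which the exact query later rejects, but they must never drop a packet the query wants.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MaskRule:
    """Hypothetical NIC-style rule: pass if (packet[offset] & mask) == value."""
    offset: int
    mask: int
    value: int


def snap(packet: bytes, snap_len: int) -> bytes:
    """Projection: keep only the first snap_len bytes of the packet."""
    return packet[:snap_len]


def approx_filter(packet: bytes, rules: Sequence[MaskRule]) -> bool:
    """Selection on raw bytes: the packet passes only if every rule matches.

    Coarse rules may admit packets the real query would reject, so the exact
    predicate must still be evaluated higher up the stack.
    """
    for r in rules:
        if r.offset >= len(packet):
            return False  # packet too short to contain the tested field
        if (packet[r.offset] & r.mask) != r.value:
            return False
    return True


# Assumes an untagged Ethernet II frame: IPv4 version nibble at byte 14,
# IPv4 protocol field at byte 23 (14-byte Ethernet header + 9).
UDP_OVER_IPV4 = [MaskRule(14, 0xF0, 0x40), MaskRule(23, 0xFF, 17)]
```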

  4. Example: Router Monitoring
     [Diagram: Router → Network Tap → Network interface card → Kernel (Libpcap / BPF filters) → Circular Buffer → Low Level Queries → High Level Queries; arrow label "Select Stream".]
     • Network interface card: snap length (projection); approximate filter (selection); selection/projection/aggregation queries (replace run time system)
     • Low level queries: pre-filter
     • High level queries: selection/projection/aggregation
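
The circular buffer in the diagram decouples the capture path from the query layer. The following is a minimal single-threaded sketch. The class name is hypothetical, and so is the drop-newest-when-full policy, because the slide does not say what happens on overflow. A real kernel buffer would also need synchronization between producer and consumer.

```python
from typing import Any, List, Optional


class CircularBuffer:
    """Fixed-capacity ring between a producer (capture side) and a consumer
    (low-level queries). When full, new items are dropped and counted."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Any] = [None] * capacity
        self._head = 0      # index of the oldest item
        self._size = 0
        self.dropped = 0    # items rejected because the ring was full

    def push(self, item: Any) -> bool:
        if self._size == len(self._slots):
            self.dropped += 1
            return False
        tail = (self._head + self._size) % len(self._slots)
        self._slots[tail] = item
        self._size += 1
        return True

    def pop(self) -> Optional[Any]:
        if self._size == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None  # release the reference
        self._head = (self._head + 1) % len(self._slots)
        self._size -= 1
        return item

    def __len__(self) -> int:
        return self._size
```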

  5. Stream Operators
     • Problem: Great heterogeneity in the specifics of manipulating the hardware mechanisms
       • Stream selection vs. NIC filters vs. kernel filters, etc.
       • Programmable NIC vs. bit-twiddling NIC vs. non-programmable NIC, etc.
     • Solution:
       • Define a set of stream operators to evaluate the stream query.
         • Selection, projection, (partial) aggregation
         • Merge, join, reorder?
       • Define hardware capabilities as the types of queries they can execute.
       • Multiple query optimization over the query set
         • Low level query nodes feed multiple user queries
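
One way to read "define hardware capabilities as the types of queries they can execute" is as a capability table that a planner consults. The sketch below is hypothetical, not Gigascope's optimizer. The following are all illustrative assumptions:

- the layer names;
- the operator kinds each layer supports;
- the greedy rule, which places each operator of a linear plan at the lowest layer that can run it, never below the operator before it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical capability table, ordered from the wire upward.
LAYERS: List[Tuple[str, Set[str]]] = [
    ("nic", {"project"}),                                   # e.g. snap length only
    ("kernel", {"project", "select"}),                      # e.g. BPF-style filters
    ("low_level_query", {"project", "select", "aggregate"}),
    ("high_level_query", {"project", "select", "aggregate", "join", "merge"}),
]


def place(plan: List[str]) -> Dict[str, List[str]]:
    """Assign each operator of a linear plan (applied in order) to the lowest
    layer that supports it, never placing it below its predecessor."""
    placement: Dict[str, List[str]] = {name: [] for name, _ in LAYERS}
    level = 0
    for op in plan:
        # Climb until some layer can execute this operator kind.
        while level < len(LAYERS) and op not in LAYERS[level][1]:
            level += 1
        if level == len(LAYERS):
            raise ValueError(f"no layer can execute {op!r}")
        placement[LAYERS[level][0]].append(op)
    return placement
```

Operator kinds are deliberately coarse here. A real capability model would also have to describe which fields and predicate forms a given NIC or kernel filter can test.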

  6. Example (network monitoring)
       select timestamp, sourceIP, destIP, source_port, dest_port, len, total_length, gp_header
       from GAMEPROTOCOL
       where sample_hash[50, sourceIP, destIP] and protocol=17 and offset=0
     • NIC: snap_len = 134 (projection)
     • Pre-filter: protocol=17 and offset=0
     • Low-level query:
       select timestamp, sourceIP, destIP, source_port, dest_port, len, total_length, gp_header
       from GAMEPROTOCOL
       where sample_hash[50, sourceIP, destIP] and protocol=17 and offset=0
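
The pre-filter and low-level query above can be mimicked in a few lines over already-parsed packets represented as dicts, so the NIC's snap_len projection is omitted.

The slide does not define the semantics of `sample_hash[50, sourceIP, destIP]`. This sketch assumes it keeps a deterministic 50% of (sourceIP, destIP) pairs by hashing them, with CRC-32 from `zlib` as a stand-in hash. The function names are hypothetical.

```python
import zlib
from typing import Dict, Iterable, Iterator

Packet = Dict[str, object]

COLUMNS = ("timestamp", "sourceIP", "destIP", "source_port", "dest_port",
           "len", "total_length", "gp_header")


def prefilter(pkts: Iterable[Packet]) -> Iterator[Packet]:
    """Cheap conjuncts from the slide: UDP (protocol=17), first fragment (offset=0)."""
    for p in pkts:
        if p["protocol"] == 17 and p["offset"] == 0:
            yield p


def sample_hash(pct: int, *fields: object) -> bool:
    """Assumed reading of sample_hash[pct, ...]: keep a deterministic pct% of
    field combinations, so every packet of a given pair gets the same decision."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % 100 < pct


def low_level_query(pkts: Iterable[Packet]) -> Iterator[Packet]:
    """Pre-filter, then hash sampling, then projection onto the selected columns."""
    for p in prefilter(pkts):
        if sample_hash(50, p["sourceIP"], p["destIP"]):
            yield {c: p[c] for c in COLUMNS}
```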

  7. Other Operators?
     • Merge: Some NICs deliver packets out of order …
     • Optical links are not duplex, so each direction arrives on its own NIC.
     [Diagram: two NICs, each with an In Buffer, an Out Buffer, and a timestamp, feed almost ordered streams into a Stream Merge, which outputs an ordered stream.]
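
Merging the two directions by timestamp requires each input to be ordered first. This sketch assumes bounded disorder, expressed by a hypothetical `slack` parameter: no packet arrives stamped more than `slack` earlier than the largest timestamp already seen on its NIC. Each stream is repaired with a small heap, and the two are then combined with `heapq.merge`. The slide does not say how Gigascope implements either step.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Pkt = Tuple[float, str]  # (timestamp, payload)


def reorder(stream: Iterable[Pkt], slack: float) -> Iterator[Pkt]:
    """Turn an almost ordered stream into an ordered one, assuming every packet's
    timestamp is >= (largest timestamp seen before it) - slack."""
    heap: List[Tuple[float, int, Pkt]] = []
    max_ts = float("-inf")
    for seq, pkt in enumerate(stream):
        heapq.heappush(heap, (pkt[0], seq, pkt))  # seq keeps ties in arrival order
        max_ts = max(max_ts, pkt[0])
        # Packets at or below max_ts - slack can no longer be overtaken.
        while heap and heap[0][0] <= max_ts - slack:
            yield heapq.heappop(heap)[2]
    while heap:
        yield heapq.heappop(heap)[2]


def merge_directions(a: Iterable[Pkt], b: Iterable[Pkt], slack: float) -> Iterator[Pkt]:
    """Merge the two directions of a link into one timestamp-ordered stream."""
    return heapq.merge(reorder(a, slack), reorder(b, slack), key=lambda p: p[0])
```

The heap holds only the packets inside the current slack window, so memory grows with the degree of disorder rather than with the stream length.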

  8. Summary
     • Early data reduction is critical for monitoring very high-speed streams.
       • Selection, projection, aggregation.
     • Use stream operators to mask the complexity and heterogeneity of hardware / kernel data reduction.
     • Issues:
       • Multiple query optimization
       • Push more complex operators down the stack?
         • Join? Stratified sampling? Sketches?
       • Optimization at low level / hardware level
         • Approximate filters
         • Avoid duplicate filters. Where to place them?
       • Re-organization when the query set changes.
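
One simple way to avoid duplicate filters across a query set is to factor out the conjuncts shared by every query. The pre-filter `protocol=17 and offset=0` plays this role for the single query in the network monitoring example. A packet that fails a shared conjunct fails every query, so that test can run once, low in the stack.

The sketch below is hypothetical and purely syntactic: predicates are compared as strings, so equivalent predicates written differently are not recognized. The example queries are illustrative.

```python
from typing import FrozenSet, List, Tuple

Conjunct = str  # an atomic predicate such as "protocol=17"


def shared_prefilter(
    queries: List[FrozenSet[Conjunct]],
) -> Tuple[FrozenSet[Conjunct], List[FrozenSet[Conjunct]]]:
    """Factor out conjuncts common to every query's WHERE clause.

    Returns (shared pre-filter, residual predicate for each query).
    """
    if not queries:
        return frozenset(), []
    common = frozenset.intersection(*queries)
    return common, [q - common for q in queries]


if __name__ == "__main__":
    # Two illustrative queries over the same packet stream.
    q1 = frozenset({"protocol=17", "offset=0", "dest_port=53"})
    q2 = frozenset({"protocol=17", "offset=0", "sample_hash[50, sourceIP, destIP]"})
    pre, rest = shared_prefilter([q1, q2])
    print(sorted(pre))  # ['offset=0', 'protocol=17']
```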

More Related