
Complex Event Analytics: Online Aggregation of Stream Sequence Patterns

This presentation gives an overview of complex event processing (CEP), defines the problem of aggregation queries over event sequence patterns, and explains why the trivial solution is insufficient. It introduces A-Seq, a framework for CEP aggregation, in its basic, advanced, and multi-pattern forms. Along the way it covers the stream-processing background, NFA-based pattern detection with worked examples, and open problems in CEP such as multi-pattern evaluation and adaptive CEP.


Presentation Transcript


  1. Complex Event Analytics: Online Aggregation of Stream Sequence Patterns • Yingmei Qi, Lei Cao, Medhabi Ray, Elke A. Rundensteiner

  2. Overview • Complex event processing (CEP) • what is it all about • why is it difficult • Aggregation queries in CEP • problem definition • why is the trivial solution insufficient • A-Seq – a framework for CEP aggregation • basic A-Seq • “advanced” A-Seq • multi-pattern A-Seq • Experimental results • If time permits – analysis and discussion

  3. Complex Event Processing Genealogy

  4. Stream Processing • In the modern world, we often deal with data streams rather than with relations • news feeds, tweets, sensor networks etc. • A necessary step for implementing the Internet of Things • New challenges and constraints • tight real-time requirements • very limited local memory • all algorithms are one-pass by definition • problem examples: • sliding-window statistics • heavy hitters • distinct items • mostly approximate solutions (ε-δ)

  5. Complex Event Processing • each data item is viewed as a primitive event belonging to a predefined event type • an event is defined by a combination of • type • occurrence timestamp • set of attributes • primitive events are combined into complex events which conform to user-defined patterns • logical connections between events • temporal constraints • mutual conditions • and more • the goal of a CEP system is to extract all complex events from the input stream(s)

  6. Example 1 – Security Surveillance System • SEQ(MainLobbyCameraEvent a, CorridorCameraEvent b, RestrictedAreaCameraEvent c) WHERE (a.person_id == b.person_id == c.person_id) • [Figure: input stream of events of types a, b, c, and d]

  7. Example 2 – Monitoring Stock Prices SEQ(GoogleStockPriceUpdate a, MicrosoftStockPriceUpdate b, AppleStockPriceUpdate c) WHERE ((a.price < b.price) AND (b.price < c.price)) WITHIN 10 minutes

  8. CEP Evaluation Mechanisms • A variety of methods and structures have been proposed for detecting complex event patterns • Event Processing Networks • Evaluation Trees • Non-deterministic Finite Automata (NFA) • the most prominent and widely employed model • this talk will focus on this mechanism

  9. NFA for Pattern From Example 2 • SEQ(GoogleStockPriceUpdate a, MicrosoftStockPriceUpdate b, AppleStockPriceUpdate c) WHERE ((a.price < b.price) AND (b.price < c.price)) WITHIN 10 minutes • States: Q1, Q2, Q3, F • Q1 to Q2: take a • Q2 to Q3: take b if a.price < b.price • Q3 to F: take c if b.price < c.price • Self-loops: Q1 ignores b,c; Q2 ignores a,c; Q3 ignores a,b

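  10. Pattern Detection Example • Input events (p = price): a1 p=1, a2 p=1, a3 p=4, b1 p=7, b2 p=9, b3 p=5, c1 p=6 • Partial matches of length 1: a1, a2, a3 • Partial matches of length 2: a1b1, a2b1, a3b1, a1b2, a2b2, a3b2, a1b3, a2b3, a3b3 • Full matches: a1b3c, a2b3c, a3b3c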
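The detection process in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular CEP engine: the `Event` class, the `detect_seq` function, and the arrival order (all a's, then all b's, then c1) are assumptions made for this sketch, and the WITHIN clause is ignored. The sketch stores every partial match explicitly, which is exactly the cost that makes CEP expensive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    etype: str    # event type, e.g. "A"
    name: str     # label used on the slide, e.g. "a1"
    price: float
    ts: int       # arrival order (hypothetical)


def detect_seq(stream, types, predicate):
    """Enumerate matches of SEQ(types...) by extending stored partial matches.

    partial[i] holds every partial match of length i + 1, similar to the
    match buffers kept at the states of an NFA.
    """
    partial = [[] for _ in types]
    for e in stream:
        # Go from the last position to the first so that, if a type repeats
        # in the pattern, an event never extends a match it created itself.
        for i in range(len(types) - 1, -1, -1):
            if e.etype != types[i]:
                continue
            if i == 0:
                partial[0].append((e,))
            else:
                for m in partial[i - 1]:
                    if predicate(m[-1], e):
                        partial[i].append(m + (e,))
    return partial


rows = [("A", "a1", 1), ("A", "a2", 1), ("A", "a3", 4),
        ("B", "b1", 7), ("B", "b2", 9), ("B", "b3", 5), ("C", "c1", 6)]
stream = [Event(t, n, p, ts) for ts, (t, n, p) in enumerate(rows)]

# Adjacent-pair predicate: a.price < b.price and b.price < c.price.
partial = detect_seq(stream, ["A", "B", "C"], lambda prev, cur: prev.price < cur.price)
matches = ["".join(ev.name for ev in m) for m in partial[-1]]
print(matches)
```

Under these assumptions the sketch keeps 3 + 9 partial matches in memory in order to report 3 full matches.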

  11. Complex Event Processing is difficult • The number of partial matches to be maintained during the detection process is exponential in pattern size • Results in a severe bottleneck affecting • throughput • response time • overall memory consumption

  12. Beyond the Sequence Pattern • Conjunctions (no temporal order is defined) • Disjunctions (multiple paths to the final state) • Negations • e.g., SEQ(A,B,!C,D) • Kleene closure (an unlimited number of event occurrences is allowed) • Combinations of the above • We will mostly focus on sequences today

  13. Open Problems in Complex Event Processing • Multi-pattern evaluation • Adaptive CEP • Parallel and distributed CEP • Detecting events under uncertainty

  14. CEP Aggregation SEQ(GoogleStockPriceUpdate a, MicrosoftStockPriceUpdate b, AppleStockPriceUpdate c) WHERE ((a.price < b.price) AND (b.price < c.price)) WITHIN 10 minutes AGG COUNT [AGG AVERAGE(a.price)] [AGG VARIANCE(a.price)]

  15. A-Seq – The Idea • Existing CEP applications handle aggregation as a post-processing step • i.e., as a by-product of pattern detection • However, we don’t need to explicitly construct the pattern matches to count (or otherwise aggregate) them! • A-Seq operates by pushing the aggregation computation into the detection process

  16. Basic A-Seq: Dynamic Prefix Counting • Given a pattern SEQ(E1, ..., En), define a counter count(E1, ..., Ei) for each prefix • When an event of type Ei arrives, update the counter for (E1, ..., Ei) using the following rule: count(E1, ..., Ei) += count(E1, ..., Ei-1) • For i = 1, just increment the counter • The answer for an aggregate COUNT query is the value of count(E1, ..., En) • what about other aggregates?
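Dynamic prefix counting can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `dpc_count` is made up for this sketch, events are reduced to their type names, each type occurs at most once in the pattern, and there is no time window or predicate. The arrival order used below is hypothetical and is not taken from the slides.

```python
def dpc_count(stream, pattern):
    """Count matches of SEQ(pattern...) without building them.

    counts[i] is the number of matches of the prefix pattern[:i + 1]
    seen so far. Assumes each event type occurs at most once in the pattern.
    """
    counts = [0] * len(pattern)
    pos = {t: i for i, t in enumerate(pattern)}
    for etype in stream:
        i = pos.get(etype)
        if i is None:
            continue                      # type not used by the pattern
        if i == 0:
            counts[0] += 1                # a new match of the 1-prefix
        else:
            counts[i] += counts[i - 1]    # extend every shorter prefix match
    return counts


# Hypothetical arrival order: a1 a2 b1 c1 d1 a3 c2 d2
counts = dpc_count(["A", "A", "B", "C", "D", "A", "C", "D"], ["A", "B", "C", "D"])
print(counts)  # counters for A, AB, ABC, ABCD
```

Each event touches a single counter, so the state is one integer per prefix, regardless of how many matches exist.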

  17. DPC Example for SEQ(A,B,C,D) • [Figure: counters for the prefixes A, AB, ABC, and ABCD, updated step by step as the events a1, a2, a3, b1, c1, c2, d1, d2 arrive]

  18. What About Sliding Windows? • Real-life patterns always include time window constraints (the WITHIN clause) • In the presence of expiring events, the above method will not work • We need to invalidate all partial matches containing the expired event • But A-Seq does not explicitly store the matches, so how can we know how many of them to invalidate?

  19. A-Seq Start Event Marking • Replicate the array of counters for each event that belongs to the leading type of the sequence • e.g., for SEQ(A,B,C,D), a separate structure will be maintained for every instance of A • Initialize a new list of counters upon the arrival of such an event • When an event expires, delete the corresponding list • The number of matches is the sum of the full-pattern counters over all remaining lists
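Start event marking can be sketched as below. The class name `SemCounter`, the expiry rule (a start event is dropped once it is more than `window` time units older than the current event), and most of the event timestamps are assumptions made for this sketch; only the window of 7 and the timestamps of the three A events follow the example slide. As before, each type is assumed to occur at most once in the pattern.

```python
from collections import deque


class SemCounter:
    """Per-start-event prefix counters for SEQ(pattern...) under a sliding window."""

    def __init__(self, pattern, window):
        self.pattern = list(pattern)
        self.pos = {t: i for i, t in enumerate(self.pattern)}
        self.window = window
        self.starts = deque()  # (start timestamp, prefix counters), oldest first

    def on_event(self, etype, ts):
        # Assumed expiry rule: drop start events older than `window`.
        while self.starts and ts - self.starts[0][0] > self.window:
            self.starts.popleft()
        i = self.pos.get(etype)
        if i is None:
            return
        if i == 0:
            counters = [0] * len(self.pattern)
            counters[0] = 1                 # the start event itself
            self.starts.append((ts, counters))
        else:
            for _, counters in self.starts:
                counters[i] += counters[i - 1]

    def count(self):
        # Matches whose start event is still inside the window.
        return sum(counters[-1] for _, counters in self.starts)


sem = SemCounter(["A", "B", "C", "D"], window=7)
events = [("A", 1), ("A", 2), ("B", 3), ("C", 4), ("A", 5), ("D", 6), ("C", 7), ("D", 8)]
for etype, ts in events:
    sem.on_event(etype, ts)
count_at_8 = sem.count()

sem.on_event("X", 9)        # an unrelated event still triggers expiry of a1
count_at_9 = sem.count()
print(count_at_8, count_at_9)
```

Deleting the list of an expired start event removes exactly the matches that begin with it, which is why no individual match has to be stored.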

  20. A-Seq SEM Example for SEQ(A,B,C,D) • Time Window = 7 • Start events: a1 (ts=1), a2 (ts=2), a3 (ts=5), each with its own counters for A, AB, ABC, and ABCD • [Figure: the per-start-event counters updated as the current time advances from 0 to 8 and the events a1, a2, a3, b1, c1, c2, d1, d2 arrive]

  21. What About Patterns With Negation? • Sometimes we would like to prohibit some events from appearing at specific positions in a pattern • e.g., in SEQ(A,B,!E,C,D), a pattern match (A,B,C,D) is considered invalid if an event of type E appears anywhere between B and C • Usually solved as a post-processing step in automata-based CEP systems

  22. The Immediate Re-Count Solution • When a negative event arrives, reset the counter corresponding to the longest positive prefix that precedes it in the pattern • In the above example of SEQ(A,B,!E,C,D), set the counter of (A,B) to 0 when an event of type E is received
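The reset rule can be added to prefix counting with one extra branch. This is a sketch under assumptions: the function name and the `negated_after` mapping (negated type to the index of the positive type it follows) are invented for illustration, there is no time window, and each type occurs at most once.

```python
def dpc_count_with_negation(stream, positive, negated_after):
    """Count matches of a sequence pattern with negated event types.

    positive      -- the positive types in order, e.g. ["A", "B", "C", "D"]
    negated_after -- maps a negated type to the index of the positive type
                     it follows, e.g. {"E": 1} for SEQ(A,B,!E,C,D)
    """
    counts = [0] * len(positive)
    pos = {t: i for i, t in enumerate(positive)}
    for etype in stream:
        if etype in negated_after:
            # Every prefix match ending just before the negation is now
            # invalid for future extension; longer prefixes stay valid.
            counts[negated_after[etype]] = 0
        elif etype in pos:
            i = pos[etype]
            counts[i] += 1 if i == 0 else counts[i - 1]
    return counts[-1]


pattern = ["A", "B", "C", "D"]
neg = {"E": 1}  # SEQ(A,B,!E,C,D)
blocked = dpc_count_with_negation(["A", "B", "E", "C", "D"], pattern, neg)
late_e = dpc_count_with_negation(["A", "B", "C", "E", "D"], pattern, neg)
new_b = dpc_count_with_negation(["A", "B", "E", "B", "C", "D"], pattern, neg)
print(blocked, late_e, new_b)
```

An E that arrives after C has already extended (A,B) does not affect the (A,B,C) counter, and a B that arrives after E starts fresh (A,B) matches.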

  23. A-Seq Complexity Analysis • Let n denote the length of the sequence pattern • Let k denote the expected number of events of the leading type in the time window, i.e., the frequency of the leading type • Then, the space required for A-Seq is O(kn) • Only O(k) operations are performed per event • What is the potential problem?...

  24. Handling Predicates • Recall the predicate (a.price < b.price < c.price) from an earlier example • When patterns are allowed to specify arbitrary constraints of this form, it is no longer possible to count the matches without evaluating them explicitly • In other words, A-Seq does not scale well to general sequence patterns • Only limited cases are supported • filter predicates • simple equivalence predicates

  25. Experimental Results • A throughput gain of up to 3 orders of magnitude • A memory consumption gain of up to 5 orders of magnitude

  26. Multi-Pattern CEP • Assume that our CEP engine is tasked with detecting the following sequence patterns: (A,B,C), (A,B,D), (E,F,G,H), (F,G,H,D,I), (F,G,A,B), (C,B,D,J) • What can we do to optimize system performance?

  27. Multi-Pattern A-Seq – Sharing Prefixes • The simplest case – all sequences share a common prefix • Build a prefix tree of the participating patterns, with each node counting the appearances of the corresponding prefix • Apply the basic A-Seq algorithm on this tree
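Prefix sharing can be sketched with a small trie in which every node carries the counter of its prefix. The names `Node`, `build_trie`, and `shared_prefix_counts` are assumptions for this illustration, as is the arrival order of the events. The sketch assumes no time window and that each event type occurs at most once within a pattern, so two nodes with the same type are never ancestor and descendant.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # event type -> child node
        self.count = 0       # matches of the prefix ending at this node


def build_trie(patterns):
    root = Node()
    by_type = {}             # event type -> all nodes labelled with it
    leaves = {}              # pattern -> node holding its full-match counter
    for p in patterns:
        node = root
        for t in p:
            if t not in node.children:
                child = Node(node)
                node.children[t] = child
                by_type.setdefault(t, []).append(child)
            node = node.children[t]
        leaves[tuple(p)] = node
    return root, by_type, leaves


def shared_prefix_counts(stream, patterns):
    root, by_type, leaves = build_trie(patterns)
    root.count = 1           # the empty prefix has exactly one match
    for etype in stream:
        for node in by_type.get(etype, []):
            node.count += node.parent.count   # basic A-Seq rule on the tree
    return {p: n.count for p, n in leaves.items()}


# Hypothetical arrival order: a1 a2 b1 c1 d1 a3 c2 d2
result = shared_prefix_counts(
    ["A", "A", "B", "C", "D", "A", "C", "D"],
    [("A", "B", "C"), ("A", "B", "D")],
)
print(result)
```

The counters for A and AB are maintained once and reused by both patterns; only the last node of each pattern is private.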

  28. Sharing Prefixes Example • p1=SEQ(A,B,C), p2=SEQ(A,B,D) • [Figure: prefix tree with counters for A, AB, ABC, and ABD, updated as the events a1, a2, a3, b1, c1, c2, d1, d2 arrive]

  29. Multi-Pattern A-Seq – Sharing Arbitrary Sub-Patterns • What about a common sub-pattern located in the middle of a pattern? • e.g., SEQ(A,B,C,D,E) and SEQ(F,G,C,D,H) • Can be achieved as follows • divide the first pattern into sub-patterns (A,B) and (C,D,E) • divide the second pattern into sub-patterns (F,G) and (C,D,H) • calculate the aggregates for the four new patterns while using the above technique to share (C,D,E) and (C,D,H) • combine the results to get the aggregates for the initial patterns (A,B,C,D,E) and (F,G,C,D,H) • Wait, but how do we split and combine patterns?

  30. The Chop-Connect Method • Let p = SEQ(E1, ..., En) • Define p1 = SEQ(E1, ..., Ei) and p2 = SEQ(Ei+1, ..., En) • Then, for each event e of type Ei+1, the number of matches for p containing this event equals the product of: • the counter of p1 upon the arrival of e • the number of matches for p2 starting with e • An exact answer can be obtained by attaching the counters of p1 to each counting structure of p2

  31. Chop-Connect Example for SEQ(A,B,C,D) • Sub-patterns: (A,B) and (C,D) • Count(ABCD) for c1 = (1+1+0)*1 = 2 • Count(ABCD) for c2 = (1+1+0)*1 = 2 • Count(ABCD) = 4 • [Figure: per-start-event counters A, AB for a1, a2, a3 and C, CD for c1, c2, as the events a1, a2, a3, b1, c1, c2, d1, d2 arrive]
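The chop-connect idea can be sketched for a pattern split into two sub-patterns. This is a simplified illustration, not the authors' implementation: the function name `chop_connect_count` is invented, time windows are omitted (so the first sub-pattern uses a single counter array rather than one per start event), the two sub-patterns are assumed to use disjoint event types, and the arrival order is hypothetical, so the result is not meant to reproduce the numbers on the slide.

```python
def chop_connect_count(stream, p1, p2):
    """Count matches of SEQ(p1 + p2) by connecting counts of p1 and p2.

    For every event e of the first type of p2, remember count(p1) at the
    arrival of e and keep counters for matches of p2 that start with e.
    """
    pos1 = {t: i for i, t in enumerate(p1)}
    pos2 = {t: i for i, t in enumerate(p2)}
    c1 = [0] * len(p1)       # prefix counters of p1
    starts = []              # (snapshot of count(p1), prefix counters of p2)
    for etype in stream:
        if etype in pos1:
            i = pos1[etype]
            c1[i] += 1 if i == 0 else c1[i - 1]
        elif etype in pos2:
            j = pos2[etype]
            if j == 0:
                counters = [0] * len(p2)
                counters[0] = 1
                starts.append((c1[-1], counters))
            else:
                for _, counters in starts:
                    counters[j] += counters[j - 1]
    # Connect: matches of p1 before e times matches of p2 starting with e.
    return sum(snapshot * counters[-1] for snapshot, counters in starts)


# Hypothetical arrival order: a1 a2 b1 c1 d1 a3 c2 d2
total = chop_connect_count(
    ["A", "A", "B", "C", "D", "A", "C", "D"], ["A", "B"], ["C", "D"]
)
print(total)
```

For this stream the product form gives 2*2 + 2*1, the same total that plain prefix counting over (A,B,C,D) produces on the same input.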

  32. Chop-Connect Complexity Analysis • The cost depends on the lengths of the two sub-patterns and on the expected numbers (inside the time window) of events of their respective types • [The space and per-event cost expressions are missing from this transcript] • In the general case, the cost is exponential in the number of sub-sequences

  33. Experimental Results • [Figures: results for Prefix Sharing and for Chop-Connect]

  34. Conclusions • A-Seq provides an efficient and precise solution for certain, limited cases of single- and multi-pattern CEP aggregation • However, the requirement of keeping all intermediate results still cannot be avoided for more complex scenarios • arbitrary inter-event predicates • pattern types other than the sequence • It is unclear how to extend the described techniques to handle general, real-life patterns

  35. Questions?
