
Event Stream Processing with Out-of-Order Data Arrival

Presenter: Mo Liu. Presentation based on: Ming Li, Mo Liu, Luping Ding, Elke A. Rundensteiner, and Murali Mani, Worcester Polytechnic Institute, Worcester, MA, USA. DEPSA at ICDCS 2007, June 29th, 2007, Toronto, ON, Canada.


Presentation Transcript


  1. Event Stream Processing with Out-of-Order Data Arrival • Presenter: Mo Liu • Presentation based on: Ming Li, Mo Liu, Luping Ding, Elke A. Rundensteiner, and Murali Mani, Worcester Polytechnic Institute, Worcester, MA, USA • DEPSA at ICDCS 2007, June 29th, 2007, Toronto, ON, Canada

  2. Outline • Introduction • Preliminary • Problem with Out-of-Order Event Arrival • Solution • Experiment • Conclusion • Related Work

  3. Introduction: Event Stream Processing • Rising interest in the database community • Wide-ranging and growing applications • Example of event stream processing: shoplifting detection in retail management

  4. Introduction: Complex Event Processing (CEP) • Event Stream Processing Engine • A stream engine specialized for event stream queries: a generic engine for detecting and extracting expected pattern sequences • Performance gain compared to stream systems that use joins to handle event sequence queries • SASE approach

  5. Introduction: Limitations • Total order assumption in event arrivals • The order in which events are received by the query system is the same as their timestamp order • Under this assumption, "later arrival" means "larger timestamp" • What if events arrive out of order? • Out-of-order data arrival is common in distributed computing environments (e.g., due to network traffic) • Systems based on the total order assumption (e.g., SASE) miss qualified results and produce spurious results

  6. Outline • Introduction • Preliminary • Problem with Out-of-Order Event Arrival • Solution • Experiment • Conclusion • Related Work

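  7. Preliminary: Query Language • Query structure: EVENT <event pattern> [WHERE <qualification>] [WITHIN <window>] • Example: EVENT SEQ(A, B, D) WITHIN 10 seconds • Queries in SASE assume the above language structure • [Figure: query plan for the example over the input event stream, bottom to top: SSC, consisting of SS (sequence scan) for (A, B, D) with purge (PSSC, W = 10 secs) and SC (sequence construction) for (A, B, D); then WD (window filter): D.ts - A.ts < 10 secs, where ts is the timestamp]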
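The query semantics above can be pinned down with a brute-force reference evaluator. This is a minimal sketch, not SASE's implementation: the function name `reference_matches`, the `(type, timestamp)` tuple encoding of events, and the arrival list are assumptions made for illustration. It ignores arrival order entirely, so it can serve as ground truth when checking an engine under out-of-order arrival.

```python
from itertools import product

def reference_matches(events, pattern, window):
    """Enumerate all matches of SEQ(pattern) WITHIN window by brute force.

    events:  iterable of (event_type, timestamp) pairs, in any arrival order
    pattern: list of event types, e.g. ["a", "b", "d"]
    window:  a match must satisfy last.ts - first.ts < window
    """
    by_type = {t: sorted(ts for ty, ts in events if ty == t) for t in pattern}
    matches = []
    for combo in product(*(by_type[t] for t in pattern)):
        # Timestamps must strictly increase along the pattern.
        increasing = all(x < y for x, y in zip(combo, combo[1:]))
        if increasing and combo[-1] - combo[0] < window:
            matches.append(combo)
    return matches

# Hypothetical arrival order: a0, b1, d2 are received late.
arrived = [("a", 3), ("b", 6), ("a", 7), ("d", 10), ("b", 11), ("d", 15),
           ("a", 0), ("b", 1), ("d", 2)]
print(reference_matches(arrived, ["a", "b", "d"], 10))
# → [(0, 1, 2), (3, 6, 10), (7, 11, 15)]
```

Because the evaluator sorts by timestamp before matching, the late arrivals a0, b1, d2 still yield the sequence (a0, b1, d2).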

  8. Preliminary: Finding Result Sequences • SSC (Sequence Scan and Construction): Sequence Scan employs an NFA to detect matches; Sequence Construction constructs the expected results • NFA with AIS (Active Instance Stack): AIS associates a stack with each state of the NFA, storing the events that triggered the NFA transition to this state • RIP (Most Recent Instance in Previous Stack) field: each instance's RIP points to the most recent instance in the previous stack at the time it was inserted, recording the temporal order relevant to the query

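  9. Preliminary: Finding Result Sequences (Cont.) • Example: EVENT SEQ(A, B, D) WITHIN 10 seconds • [Figure: NFA with states 0 to 3 and transitions on A, B, D (self-loops marked *), over an input stream shown in timestamp order] • AIS contents, each instance prefixed by its RIP in brackets: S1: [] a3, [] a7, [] a16; S2: [a3] b6, [a7] b11; S3: [b6] d10, [b11] d15 • Sequence constructed on d10: (a3, b6, d10) • Sequences constructed on d15: (a3, b6, d15), (a3, b11, d15), (a7, b11, d15), which are then checked by the window filter WD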
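The in-order SSC mechanism from the two slides above can be sketched as follows. This is an illustrative sketch under the total order assumption, not the SASE code: the names `ssc_in_order` and `construct`, the `(timestamp, rip_index)` stack entries, and the input list are assumptions. Each stack entry stores its RIP as an index into the previous stack, which is only valid because stacks are append-only here.

```python
def construct(stacks, i, idx):
    """Yield all timestamp sequences ending at stacks[i][idx], following RIPs."""
    ts, rip = stacks[i][idx]
    if i == 0:
        yield (ts,)
        return
    # Every instance at or below the RIP in the previous stack arrived earlier.
    for j in range(rip + 1):
        for prefix in construct(stacks, i - 1, j):
            yield prefix + (ts,)

def ssc_in_order(events, pattern, window):
    """Sequence scan and construction with append semantics (in-order input only)."""
    stacks = [[] for _ in pattern]
    results = []
    for etype, ts in events:
        if etype not in pattern:
            continue
        i = pattern.index(etype)
        rip = len(stacks[i - 1]) - 1 if i > 0 else -1
        if i > 0 and rip < 0:
            continue  # previous stack is empty, so this instance cannot match
        stacks[i].append((ts, rip))          # append semantics
        if i == len(pattern) - 1:            # final state reached: construct
            for seq in construct(stacks, i, len(stacks[i]) - 1):
                if seq[-1] - seq[0] < window:  # window filter (WD)
                    results.append(seq)
    return results

in_order = [("b", 1), ("a", 3), ("c", 5), ("b", 6), ("a", 7),
            ("d", 10), ("b", 11), ("f", 12), ("c", 13), ("d", 15)]
print(ssc_in_order(in_order, ["a", "b", "d"], 10))
# → [(3, 6, 10), (7, 11, 15)]
```

On d15 the construction step yields (3, 6, 15), (3, 11, 15), and (7, 11, 15); only the last passes the 10-second window. The sketch never compares timestamps during construction, which is exactly why it depends on in-order arrival.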

  10. Preliminary: Purging Operator States • Example: EVENT SEQ(A, B, D) WITHIN 10 seconds • PSSC: on seeing d15, purge a3, and so on • [Figure: NFA with states 0 to 3 over the input stream in timestamp order; AIS contents with RIPs in parentheses: S1: () a3, () a7; S2: (a3) b6, (a7) b11; S3: (b6) d10, (b11) d15]

  11. Outline • Introduction • Preliminary • Problem with Out-of-Order Event Arrival • Solution • Experiment • Conclusion • Related Work

  12. Problem with Out-of-Order at SSC: Incomplete Event Retrieval • EVENT SEQ(A, B, D) WITHIN 10 seconds • [Figure: out-of-order event arrival, where a0, b1, and d2 are received after events with larger timestamps; AIS: S1: () a3, () a7; S2: (a3) b6, (a7) b11; S3: (b6) d10, (b11) d15] • Produced result: (a3, b6, d10), (a7, b11, d15) • Correct result: (a0, b1, d2), (a3, b6, d10), (a7, b11, d15) • Missing: (a0, b1, d2)

  13. Problem with Out-of-Order at SSC: Event Misplacement • [Figure: out-of-order event arrival, where d8 is received late and appended to stack S3 with RIP b11 (incorrect AIS appending); AIS: S1: [] a3, [] a7; S2: [a3] b6, [a7] b11; S3: [b6] d10, [b11] d15, [b11] d8] • Produced result: (a3, b6, d8), (a3, b11, d8) • Correct result: (a3, b6, d8) • Slide annotations: "Wrong!", "Missing!"

  14. Problem with Out-of-Order at PSSC • Purge in SS: on seeing d15, purge a3, and so on. If the out-of-order event d8 arrives after that, the result (a3, b6, d8) is missed: an unauthorized AIS purge • CLAIM: Any data purge of the active instance stack (AIS) is unauthorized unless total order on data arrival holds for the input stream • Example 3: EVENT SEQ(A, B, D) WITHIN 10 seconds [Figure: out-of-order event arrival with d8 received late; AIS: S1: () a3, () a7; S2: (a3) b6, (a7) b11; S3: (b6) d10, (b11) d15] • If precise query results are required and memory resources are limited, WD in SS is not sufficient for handling out-of-order event arrival

  15. Outline • Introduction • Preliminary • Problem with Out-of-Order Event Arrival • Solution • Experiment • Conclusion • Related Work

  16. Solution in SSC • Event Retrieval Mechanism: to avoid incomplete retrieval, all states of the NFA need to be set active before retrieval over the event stream begins • [Figure: out-of-order event arrival with a0, b1, and d2 received late; AIS: S1: () a0, () a3, () a7; S2: (a0) b1, (a3) b6, (a7) b11; S3: (b1) d2, (b6) d10, (b11) d15] • Produced result: (a0, b1, d2), (a3, b6, d10), (a7, b11, d15), …

  17. Solution in SSC (Cont.) • AIS Construction Mechanism: to avoid event misplacement, use sort semantics instead of append semantics when inserting into a stack • [Figure: out-of-order event arrival with b8 received late (correct AIS insertion); AIS: S1: [] a3, [] a7; S2: [a3] b6, [a7] b8, [a7] b11; S3: [b8] d10, [b11] d15] • Sequences containing b8: (a3, b8, d10), (a7, b8, d10), (a3, b8, d15), (a7, b8, d15)

  18. SSC Algorithm with Out-of-Order Handling
     Out-of-Order Handling Incorporated SSC:
     • Input: (1) sequence query "EVENT SEQ(E1, E2, …, Em) WITHIN W"; (2) AIS constructed from previously input events; (3) newly received event ei (of event type Ei)
     • Output: (1) updated AIS; (2) sequence output of SSC
     • 1. IF event type Ei is among {E1, E2, …, Em}
     • 2.   insert ei into stack Si (using "sort semantics")
     • 3.   set ei's RIP
     • 4.   check the RIP values of the instances in stack Si+1 and reset the ones affected by ei
     • 5.   produce event sequences containing ei, if any
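The five steps above can be sketched in Python as follows. This is one possible reading of the algorithm, not the authors' implementation: the class name `OutOfOrderSSC`, the use of `bisect` for sort semantics, storing each RIP as the timestamp of the referenced instance, and the assumption that timestamps are unique within an event type are all choices made for this illustration.

```python
import bisect

class OutOfOrderSSC:
    """SSC with sort semantics; assumes unique timestamps per event type."""

    def __init__(self, pattern, window):
        self.pattern = list(pattern)
        self.window = window
        self.stacks = [[] for _ in self.pattern]  # timestamps, kept sorted
        self.rip = [{} for _ in self.pattern]     # rip[i][ts] -> ts in stack i-1

    def _find_rip(self, i, ts):
        if i == 0:
            return None
        prev = self.stacks[i - 1]
        k = bisect.bisect_left(prev, ts)          # instances strictly before ts
        return prev[k - 1] if k > 0 else None

    def _prefixes(self, i, ts):
        """Partial sequences over stacks 0..i-1 that can precede instance ts."""
        if i == 0:
            return [()]
        r = self.rip[i][ts]
        out = []
        for p in self.stacks[i - 1]:
            if r is None or p > r:
                break
            out.extend(pre + (p,) for pre in self._prefixes(i - 1, p))
        return out

    def _suffixes(self, i, ts):
        """Partial sequences over stacks i+1.. that can follow instance ts."""
        if i == len(self.pattern) - 1:
            return [()]
        out = []
        for n in self.stacks[i + 1]:
            if n > ts:
                out.extend((n,) + suf for suf in self._suffixes(i + 1, n))
        return out

    def on_event(self, etype, ts):
        if etype not in self.pattern:                 # step 1
            return []
        i = self.pattern.index(etype)
        bisect.insort(self.stacks[i], ts)             # step 2: sort semantics
        self.rip[i][ts] = self._find_rip(i, ts)       # step 3
        if i + 1 < len(self.pattern):                 # step 4: reset affected RIPs
            nxt = self.rip[i + 1]
            for later in self.stacks[i + 1]:
                if later > ts and (nxt[later] is None or nxt[later] < ts):
                    nxt[later] = ts
        results = []                                  # step 5
        for pre in self._prefixes(i, ts):
            for suf in self._suffixes(i, ts):
                seq = pre + (ts,) + suf
                if seq[-1] - seq[0] < self.window:    # window filter
                    results.append(seq)
        return results

engine = OutOfOrderSSC(["a", "b", "d"], 10)
for etype, ts in [("a", 3), ("b", 6), ("a", 7), ("d", 10), ("b", 11), ("d", 15)]:
    engine.on_event(etype, ts)
late = engine.on_event("b", 8)   # b8 arrives after d15
print(late)                      # → [(3, 8, 10), (7, 8, 10), (7, 8, 15)]
print(engine.rip[2][10])         # → 8 (d10's RIP was reset from b6 to b8)
```

Unlike the in-order case, an event of any type can complete a sequence, so step 5 combines prefixes found through the RIP with suffixes found in the later stacks. The fourth candidate from the b8 example, (a3, b8, d15), is constructed but rejected by the 10-second window.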

  19. Optimization
     Out-of-Order Handling Incorporated SSC with AIS_CLOCK:
     • Input and output: same as Algorithm 1
     • 1. IF event type Ei is among {E1, E2, …, Em}
     • 2.   IF ei.timestamp < AIS_CLOCK
     • 3.     buffer ei
     • 4.     insert ei into stack Si (using "sort semantics")
     • 5.     set ei's RIP
     • 6.     check the RIP values of the instances in stack Si+1 and reset the ones affected
     • 7.     produce event sequences containing ei, if any
     • 8.   ELSE
     • 9.     buffer ei
     • 10.    insert ei into stack Si (using "append semantics")
     • 11.    set ei's RIP
     • 12.    IF Ei = Em
     • 13.      produce event sequences containing ei, if any

  20. Solution for PSSC • Using K-Slack: we apply K-Slack based on time units. It assumes that the disorder in event arrivals is bounded by K time units; that is, an event can be delayed by at most K time units • Example: SEQ(A, B, D), W = 10, K = 4. a3 is purged when f18 is received, since 18 > 3 + 10 + 4 • [Figure: received order with out-of-order events; AIS: S1: [] a3, [] a7; S2: [a3] b6, [a7] b8, [a7] b11; S3: [b8] d10, [b11] d15; sequence shown: (a3, b6, d8)]

  21. Purge condition: ei.timestamp + W + K < CLOCK • Rationale: once CLOCK has passed ei.timestamp + W + K, no out-of-order event with a timestamp less than ei.timestamp + W can still arrive, so ei can no longer contribute to forming a new candidate event sequence • CLOCK: maintained as the largest timestamp seen so far among the received events

  22. PSSC Algorithm With Out-of-Order Handling
     Out-of-Order Incorporated SSC Purge (PSSC):
     • Input: (1) current AIS; (2) CLOCK trigger from SSC
     • Output: updated AIS
     • 1. On receiving a CLOCK trigger
     • 2.   FOR each event instance e in AIS
     • 3.     IF e.timestamp + W + K < CLOCK
     • 4.       purge e
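The purge loop above is small enough to sketch directly. The function names `advance_clock` and `purge` and the representation of each stack as a plain list of timestamps are assumptions for illustration; the stack contents and the values W = 10, K = 4 are taken from the K-slack example slide.

```python
def advance_clock(clock, ts):
    """CLOCK is the largest timestamp seen so far; late events never lower it."""
    return max(clock, ts)

def purge(stacks, clock, window, k):
    """Drop every instance e with e.timestamp + W + K < CLOCK."""
    return [[ts for ts in stack if ts + window + k >= clock] for stack in stacks]

# AIS for SEQ(A, B, D) as lists of timestamps: S1 (a), S2 (b), S3 (d).
stacks = [[3, 7], [6, 8, 11], [10, 15]]
clock = 15
clock = advance_clock(clock, 18)       # f18 is received
stacks = purge(stacks, clock, window=10, k=4)
print(stacks)                          # → [[7], [6, 8, 11], [10, 15]]
```

Only a3 satisfies 3 + 10 + 4 < 18. A late event such as d8 leaves CLOCK at 18, so it triggers no additional purging.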

  23. Optimization 1: AIS partition • We can divide each stack in AIS into two parts: outdated event instances (e.timestamp + W ≤ CLOCK, but retained because e.timestamp + W + K > CLOCK) and up-to-date event instances (e.timestamp + W > CLOCK) • Example: SEQ(A, B, D), W = 7, K = 10 (large) • [Figure: out-of-order event arrival; each stack is split by a divider into outdated and up-to-date instances; AIS: S1: [] a3, [] a7; S2: [] b1, [a3] b5, [a7] b11; S3: [b5] d10, [b11] d18; annotation: cost of SSC output when d13 comes; sequences shown: a3 b5 d18, a3 b5 d18, a3 b11 d18, a7 b11 d18, …]
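A minimal sketch of the partition step, assuming each stack is kept as a sorted list of timestamps so that the divider can be found by binary search. The function name `partition` and the CLOCK value used below are illustrative assumptions, not values confirmed by the slide.

```python
import bisect

def partition(stack, clock, window):
    """Split a sorted stack into (outdated, up_to_date) instances.

    An instance is up to date when ts + W > CLOCK, i.e. ts > CLOCK - W.
    Outdated instances stay in the AIS until the K-slack purge condition
    (ts + W + K < CLOCK) removes them.
    """
    divider = bisect.bisect_right(stack, clock - window)
    return stack[:divider], stack[divider:]

# Stack S2 from the example (b1, b5, b11) with W = 7 and an assumed CLOCK of 13.
outdated, up_to_date = partition([1, 5, 11], clock=13, window=7)
print(outdated, up_to_date)   # → [1, 5] [11]
```

Under this reading, an event that arrives in order only needs to be combined with up-to-date instances, while the outdated part is consulted only when a late event arrives.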

  24. Optimization 2: Lazy Purge • For each CLOCK update, only the instances in the last AIS stack are checked for purging • For any instance purged from there, we can purge instances in the other AIS stacks by following the RIP path • [Figure: AIS: S1: [] a3, [] a7; S2: [a3] b6, [a7] b11; S3: [b6] d10, [b11] d15]
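One way to read the lazy purge is sketched below. This is an interpretation, not the authors' code: the function name `lazy_purge`, the `(timestamp, rip_timestamp)` entries, and the CLOCK value are assumptions. Only the last stack is tested against the purge condition; earlier stacks are trimmed up to the RIP of whatever was removed from the stack after them.

```python
def lazy_purge(stacks, clock, window, k):
    """stacks: list of lists of (timestamp, rip_timestamp), sorted by timestamp."""
    last = stacks[-1]
    expired = [inst for inst in last if inst[0] + window + k < clock]
    stacks[-1] = [inst for inst in last if inst[0] + window + k >= clock]
    # Walk back through the earlier stacks along the RIP path.
    for i in range(len(stacks) - 2, -1, -1):
        bound = max((rip for _, rip in expired if rip is not None), default=None)
        if bound is None:
            break
        expired = [inst for inst in stacks[i] if inst[0] <= bound]
        stacks[i] = [inst for inst in stacks[i] if inst[0] > bound]
    return stacks

# AIS from the slide: S1 = a3, a7; S2 = b6, b11; S3 = d10, d15.
ais = [[(3, None), (7, None)], [(6, 3), (11, 7)], [(10, 6), (15, 11)]]
print(lazy_purge(ais, clock=25, window=10, k=4))
# → [[(7, None)], [(11, 7)], [(15, 11)]]
```

Every instance reached through a RIP has a smaller timestamp than the instance that was purged, so it already satisfies the purge condition. The trade-off is that some purgeable instances (a7 in this run, since 7 + 10 + 4 < 25) stay in memory until a later pass.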

  25. Outline • Introduction • Preliminary • Problem with Out-of-Order Event Arrival • Solution • Experiment • Conclusion • Related Work

  26. Experiment 1: Sequence Scan and Construction (SSC) • Query: SEQ(A, B, C, D, E, F) • CPU gain from applying AIS_CLOCK • Out-of-order data percentage: 90% • [Figure: Y axis is cost (inserting events and resetting RIP)]

  27. Experiment 2: Applying AIS partition during the SSC purge • [Figures: performance gain on memory; performance gain on CPU cost]

  28. Outline • Introduction • Preliminary • Problem with Out-of-Order Event Arrival • Solution • Experiment • Conclusion • Related Work

  29. Conclusion • In this work, we address the problem of processing event streams with out-of-order data arrival: • we analyze the problems that state-of-the-art event stream processing technology experiences when faced with out-of-order data arrival • we propose new implementation and optimization strategies for the core stream algebra operators • we conduct an experimental study that clearly demonstrates the effectiveness of our proposed approach over existing solutions

  30. Outline • Introduction • Preliminary • Problem with Out-of-Order Event Arrival • Solution • Experiment • Conclusion • Related Work

  31. Related Work • Some initial work uses K-slack to investigate the out-of-order problem for homogeneous-input stream systems • Aurora deals with out-of-order data at the operator level: order-sensitive operators wait a certain period of time before closing each window • The Cayuga system deals with out-of-order data by waiting K time units before all processing, which has higher latency than ours • Stream punctuation confirms that a certain value or timestamp will no longer appear in future input streams; it requires a certain service to first be created and appropriately associated

  32. Thank you!
