
PEACE-Ful Web Event Extraction and Processing as Bitemporal Mutable Events

This paper explores bitemporal event extraction and processing for accurate and up-to-date information retrieval from the web. It introduces a high-level language for defining complex events and an accompanying business language for reacting to detected events.


Presentation Transcript


  1. PEACE-Ful Web Event Extraction and Processing as Bitemporal Mutable Events. TIM FURCHE and GIOVANNI GRASSO, Oxford University; MICHAEL HUEMER, Johannes Kepler University; CHRISTIAN SCHALLHART, Oxford University; MICHAEL SCHREFL, Johannes Kepler University. ACM Transactions on the Web (TWEB) 10, 3 (2016), 16:1-16:46. nanadama 171019

  2. The Author: Tim Furche, Lecturer, Oxford University; Senior Research Manager, Meltwater • He works on linear evaluation of Web queries on graph data, reasoning, web extraction, web and object search, and streamed query evaluation

  3. The Author’s papers. Based on 2 previous papers: • Tim Furche et al. Schrefl: Bitemporal Complex Event Processing of Web Event Advertisements. WISE (2) 2013: 333-346 • Tim Furche et al. PeaCE-Ful Web Event Extraction and Processing. WISE (2) 2013: 523-526 Others: DIADEM: domain-centric, intelligent, automated data extraction methodology. WWW Companion Volume 2012, DEXA 2012, PVLDB 2014. →20150707_kei/ OXPath: an extension of XPath for interacting with web applications and for extracting information thus revealed. WWW Companion Volume 2011, EDBT 2011, PVLDB 2011, VLDB Journal 2013 →20140520_kei/

  4. Web Event Extraction

  5. PRELIMINARY: What is “Bitemporal”? • Temporal - tracks system time and handles data as the "recorded state". • Where did John Thomas live on August 20? • Where was the blue van on October 12? • Bitemporal - handles data on both system time and valid time. By considering the "actual state" and the "state on record" together, you can reconstruct what was known at a given time. • Based on the information we had as of September 1, where did John Thomas live on August 20? • As we understood it on October 23, where did we think the blue van was on October 12? http://jp.marklogic.com/what-is-marklogic/whats-new/bitemporal/
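
The two kinds of question on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper or from MarkLogic: the `AddressRecord` type, the `address_as_known` function, and the sample cities and dates are all hypothetical. Each record carries a valid time (when the fact became true) and a system time (when it was recorded), and a bitemporal query filters on both.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AddressRecord:
    person: str
    city: str
    valid_from: date   # valid time: when the fact became true in the real world
    recorded_on: date  # system time: when the store learned about the fact


def address_as_known(records: List[AddressRecord], person: str,
                     valid_on: date, known_on: date) -> Optional[str]:
    """Where did `person` live on `valid_on`, judging only by what was recorded by `known_on`?"""
    candidates = [
        r for r in records
        if r.person == person
        and r.recorded_on <= known_on   # ignore facts recorded later
        and r.valid_from <= valid_on    # ignore facts not yet true
    ]
    if not candidates:
        return None
    # The most recent valid fact wins; ties are broken by the latest recording.
    best = max(candidates, key=lambda r: (r.valid_from, r.recorded_on))
    return best.city


# Hypothetical data: the move on August 15 was only recorded on September 10.
records = [
    AddressRecord("John Thomas", "Boston", date(2017, 1, 1), date(2017, 1, 1)),
    AddressRecord("John Thomas", "Chicago", date(2017, 8, 15), date(2017, 9, 10)),
]

print(address_as_known(records, "John Thomas", date(2017, 8, 20), date(2017, 9, 1)))   # Boston
print(address_as_known(records, "John Thomas", date(2017, 8, 20), date(2017, 10, 1)))  # Chicago
```

The same valid-time question gets two different answers depending on the system-time cutoff, which is the point of tracking both dimensions.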

  6. 1. INTRODUCTION Most events are announced first, and often exclusively, on the web. ○ up-to-date, more complex, rapid interaction. Challenges: inaccurate advertisements that may be revised later.

  7. 1. INTRODUCTION e.g.: Transit passengers (Narita→Dubai→Vienna?) They must continuously check information on both • the flight (delay) • related events (dinner, pick-up service, conference, business trip, etc.) Complex problem: Not only may scheduled flights arrive late, but the announcements for such events may also be advertised late. → Considering the bitemporality of events: an event’s occurrence time & advertisement (detection) time (on the Web)

  8. 1.1. Background and Rationale of Our Approach to Web Event Extraction and Processing • Business events are mutable. • The authors’ approach: extract events from Web resources and treat them as (mutable) business events • They introduce • BICEPL, a high-level language to define complex events • an accompanying business language to define how to react to detected events

  9. 1.1. Background and Rationale of Our Approach to Web Event Extraction and Processing • Event-oriented approach (like object-oriented) • Condition-action rules are always associated with an event class • Event occurrence is tied to real-world time • Complex events • specified with a high-level declarative approach • their semantics are defined by mapping them to SQL • Lifting the perfect-world assumption • detection and processing may be late (imperfect system) • attributes and occurrence time may change (imperfect human)

  10. 1.2. Application Domains. Domains • Flight arrival and departure announcements (running example) • Logistics (distribution) and health care (hospital) Chronon-based model • time advances in steps of one chronon (1 sec, 15 sec, 1 min, ...)
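
A chronon-based clock can be illustrated with a short Python sketch. The function name `to_chronon`, the choice of epoch, and the sample timestamps are assumptions made for this example; the slides only state that time advances in chronon-sized steps.

```python
from datetime import datetime, timedelta

# Assumed reference point for counting chronons (not specified in the slides).
EPOCH = datetime(1970, 1, 1)


def to_chronon(ts: datetime, chronon: timedelta) -> int:
    """Map a timestamp to the index of the chronon it falls into."""
    return (ts - EPOCH) // chronon  # timedelta // timedelta yields an int


step = timedelta(seconds=15)
a = to_chronon(datetime(2016, 1, 1, 19, 0, 7), step)
b = to_chronon(datetime(2016, 1, 1, 19, 0, 14), step)
c = to_chronon(datetime(2016, 1, 1, 19, 0, 15), step)

print(a == b)      # True: both timestamps fall into the same 15-second chronon
print(c == a + 1)  # True: the next chronon starts at 19:00:15
```

With a 15-second chronon, two detections seven seconds apart are indistinguishable in time, so the chronon size bounds how precisely occurrence and detection times can be compared.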

  11. 1.3. Contributions and Organization PEACE (Processing Event Ads into Complex Events)

  12. 2. COMPONENTS OF PEACE AND RUNNING EXAMPLE e.g. Business travelers and their arrangements: Flight delay → Checking for connecting flights → Changing hotel reservations. PEACE automatically DETECTS CHANGES and RESPONDS (mail to those concerned, change requests via an API, etc.)

  13. 2.1. Components of PEACE [Figure: Event Detector (OXPath Wrapper), Action Executor (OXPath Wrapper), Event Processor (BICEPL EventClass)]

  14. 2.1. Components of PEACE • Event Detector: observes web resources (e.g. a flight information web page) • Bitemporal Complex Event Processor: pulls events from event detectors or other event processors and derives new complex events. • Action Executor

  15. Implementation of Running Example (see Fig. 8 on p. 16:32)

  16. 2.2. Event Classes in the Running Example Note: Since PEACE deals with bitemporal events, every event has “Occurrence Time” and “Detection Time” attributes.

  17. 2.2. Event Classes in the Running Example

  18. 2.2. Event Classes in the Running Example Details of complex event classes • MissedConnectingFlight: A flight is considered to be missed if the business traveler has less than 30 minutes between his or her connecting flights. • ArrivedAtDestination: The business traveler is considered to have arrived at his or her destination if all of his or her booked connecting flights took place and he or she did not miss any of them.
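
The two rules above can be written out as plain predicates. The following Python sketch is only an illustration of the stated conditions; the `Leg` type, the function names, and the sample itinerary are hypothetical, and in PEACE these classes are declared in BICEPL rather than in Python.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

MIN_TRANSFER = timedelta(minutes=30)  # threshold stated on the slide


@dataclass
class Leg:
    departure: datetime
    arrival: datetime
    took_place: bool


def missed_connection(arrival: datetime, next_departure: datetime) -> bool:
    """A connection counts as missed if less than 30 minutes remain for the transfer."""
    return next_departure - arrival < MIN_TRANSFER


def arrived_at_destination(legs: List[Leg]) -> bool:
    """All booked legs took place and no connection between consecutive legs was missed."""
    if not legs or not all(leg.took_place for leg in legs):
        return False
    return not any(
        missed_connection(first.arrival, second.departure)
        for first, second in zip(legs, legs[1:])
    )


# Hypothetical itinerary: the first leg lands only 20 minutes before the second departs.
itinerary = [
    Leg(datetime(2016, 5, 1, 22, 0), datetime(2016, 5, 2, 5, 0), True),
    Leg(datetime(2016, 5, 2, 5, 20), datetime(2016, 5, 2, 9, 0), True),
]

print(missed_connection(itinerary[0].arrival, itinerary[1].departure))  # True
print(arrived_at_destination(itinerary))                                # False
```

Because arrival times are mutable, a revised arrival for the first leg can flip both results, which is why the slides treat these as events to be re-derived rather than computed once.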

  19. 2.3. Application Setup Tasks for PEACE setup: • Event wrappers in OXPATH, specifying web interactions leading to the events and the location of the attributes to be extracted per event. (HOW TO EXTRACT) • Action wrappers in OXPATH, performing some web interaction, such as posting a Twitter message. (HOW TO ACT)

  20. 2.3. Application Setup Tasks for PEACE setup: • Subscribed event classes in BICEPL, specifying the schema of their events and capturing the event attributes extracted by the event wrappers, optionally with condition-action statements. (HOW TO EXPRESS Subscribed Events) • Complex event classes in BICEPL, specifying queries to aggregate subscribed events into complex events and condition-action statements applied on the arising events. (HOW TO EXPRESS Complex Events)

  21. 2.4. Specifying Conditional Actions with PEACE Perfect system assumption • Events are immutable and known before or at the time they occur. Additional statements (lifting the perfect system assumption) • Events are mutable and may be delivered late.

  22. 2.4. Specifying Conditional Actions with PEACE

  23. 2.4. Specifying Conditional Actions with PEACE Example: • Published at 10:00, “arrival time of 2:00” → ANNOUNCEMENT & FUTURE • Updated at 1:00, “arrival time of 2:15” → CHANGE • At 2:15 → ONTIME Another case, if the flight didn’t arrive at 2:15: • Updated at 3:00, “arrival time of 3:30” → CHANGE & POSTPONE

  24. 3. EXTRACTING EVENT ANNOUNCEMENTS & EXECUTING ACTIONS IN OXPATH OXPath: a language for highly efficient web automation and data extraction. Focus on 4 extensions (over XPath): • Actions, such as mouse events or typing, to simulate user interactions • Kleene stars to iterate, for example, to access multiple pages within a paginated result • Style axis to query visual attributes to select, for example, all elements colored green • Extraction markers to extract data from the DOM into (nested) records and attributes

  25. 3.1. Detecting Events ~ Example of detecting FlightArrival Events ~ Fig. 5: Form and result page on www.flightarrivals.com.

  26. 3.2. Executing Actions ~ Example of Action for Posting on Twitter ~

  27. 4. EVENT PROCESSING WITH BICEPL BICEPL (BI-temporal Complex Event Processing Language) • specifies condition-action statements for all events • defines complex event classes based on other constituent classes (using extended SQL SELECT statements) BICEPL keeps the preceding revision of each event (for bitemporality) → Keeping all revisions would buffer events indefinitely! (IDEAL) → Purges events at the end of their lifespan (REALISTIC): “sliding window semantics” determined from freezing time and observation span

  28. 4.1. Syntax of BICEPL ~ Subscribed Event ~ <sclass>: subscribed event <freezing time>: how long a mutable event may change

  29. Example 4.1 ~ Subscribed Event Class FlightArrival ~

  30. 4.1. Syntax of BICEPL ~ Complex Event ~ <cclass>: complex event <cond_action>: combination of condition and action <cond>: Boolean combination that • compares two values (NEW, OLD, NOW) • checks timing conditions (ANNOUNCEMENT, CANCELLATION, etc.)

  31. Example 4.2 ~ Complex Event Class MissedConnectingFlight ~ Condition-Action Clause

  32. Example 4.3 ~ Complex Event Class ArrivedAtDestination ~ Condition-Action Clause

  33. 4.2. Event-Condition-Action Model Underlying BICEPL ⊂+ is irreflexive (E ⊄+ E for all E ∈ ℰ) and asymmetric (E ⊂+ E′ implies E′ ⊄+ E for all E, E′ ∈ ℰ)

  34. 4.3. Mapping BICEPL into its Event-Condition-Action Model BICEPL class declarations cannot be used directly by a DBMS. → They are rewritten to SQL queries by rewriting rules. • Subscribed Event Class → simple and direct (using the ID clause) • Complex Event Class → needs to be rewritten to SQL • SQL-Query Rewriting → adds an implicit occurrence time attribute occ. • Condition-Action Rewriting → next slide

  35. 4.3. Mapping BICEPL into its Event-Condition-Action Model Rewriting rules. Example 4.3: ArrivedAtDestination class. Rewritten POSTPONE statements.

  36. 4.4. Buffering Semantics • A BICEPL program observes a sequence of pairs (O_i, t_i) • O_i: the set of subscribed events detected up to t_i. • Depending on the changing observations and wall-clock time, E triggers the execution of some actions. • The semantics [[E]](O_i, t_i, O_{i-1}, t_{i-1}) is defined as a set of action tuples at each time t_i. • To obtain the corresponding complex events, fix the order of complex events C_k

  37. 4.4. Buffering Semantics O_i: the set of subscribed events detected up to t_i. (Observed) D_i: the set of complex events at time instant t_i. (Derived) triggered(D_i, D_{i-1}): the triggered actions at t_i. Definition 4.6: defines the buffering semantics at t_i
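
The step from observed events to triggered actions can be pictured as a loop that re-derives the complex events at every time instant and compares them with the previous instant. The Python sketch below is a simplified illustration under stated assumptions, not Definition 4.6 itself: events are modelled as a dictionary from key to attributes, the `derive` callback stands in for the BICEPL queries, and the labels "new", "changed", and "removed" are placeholder names chosen for this example.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Events = Dict[str, Any]        # event key -> attribute values
Action = Tuple[str, str]       # (label, event key)


def triggered(current: Events, previous: Events) -> List[Action]:
    """Compare the derived events of two consecutive instants."""
    actions: List[Action] = []
    for key, attrs in current.items():
        if key not in previous:
            actions.append(("new", key))
        elif previous[key] != attrs:
            actions.append(("changed", key))
    for key in previous:
        if key not in current:
            actions.append(("removed", key))
    return actions


def run(observations: Iterable[Tuple[Any, Events]],
        derive: Callable[[Events, Any], Events]) -> List[Tuple[Any, List[Action]]]:
    """For each pair (O_i, t_i), derive D_i and diff it against D_{i-1}."""
    previous: Events = {}
    log = []
    for t, observed in observations:
        current = derive(observed, t)
        log.append((t, triggered(current, previous)))
        previous = current
    return log


# Hypothetical run: the derived events simply mirror the observed ones.
observations = [
    ("19:00", {"LH123": {"arrival": "19:30"}}),
    ("19:01", {"LH123": {"arrival": "19:45"}}),
]
for t, actions in run(observations, lambda observed, t: dict(observed)):
    print(t, actions)
```

In this sketch every instant re-derives from the full set of observations, which mirrors the idea of buffering everything and explains why the slides move on to a purging variant.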

  38. Table 1. Subscribed event repositories. Table 2. Complex event repositories. Publish on Twitter ”Not landed yet, …” t1 = 19:00, t2 = 19:01

  39. 4.5. Sliding Window Semantics • The previous definition of the buffering semantics has a problem: the buffer overflows quickly. → Introduce a lifespan for events and a purge function. They define additional variables: Definition 4.8: defines the sliding window semantics at t_i

  40. 4.5. Sliding Window Semantics The lifespans of events are determined and set according to the semantics definition. • Every subscribed event has a freezingTime • Every complex event has an observationTime The rest of this section discusses • how to deal with an “imperfect system” (late detection due to network or communication delays) • changing freezingTime and observationTime, but I omitted that.
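
The purge idea can be sketched as follows. This is an assumption-laden illustration rather than Definition 4.8: the `BufferedEvent` type and `purge` function are hypothetical, and the lifespan is simply taken as a given duration after the occurrence time, whereas the paper derives it from the freezing time and observation span.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class BufferedEvent:
    key: str
    occurrence: datetime
    lifespan: timedelta  # assumed to be precomputed from freezing time and observation span


def purge(buffer: List[BufferedEvent], now: datetime) -> List[BufferedEvent]:
    """Keep only the events whose lifespan has not yet ended at `now`."""
    return [e for e in buffer if e.occurrence + e.lifespan >= now]


buffer = [
    BufferedEvent("LH123", datetime(2016, 5, 2, 19, 0), timedelta(minutes=30)),
    BufferedEvent("OS52", datetime(2016, 5, 2, 19, 20), timedelta(minutes=30)),
]

print([e.key for e in purge(buffer, datetime(2016, 5, 2, 19, 20))])  # ['LH123', 'OS52']
print([e.key for e in purge(buffer, datetime(2016, 5, 2, 19, 31))])  # ['OS52']
```

Calling `purge` at every time instant keeps the buffer bounded by the number of events that can still change or still contribute to a complex event.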

  41. 4.6. Semantic Equivalence Proposition: are the (pure) Buffering Semantics and the Sliding Window Semantics equivalent? → Yes, if the monotone condition and 4 sanity conditions are satisfied. (THEOREM 4.9)

  42. 4.6. Semantic Equivalence ~ monotone conditions for equivalence ~ In most scenarios, cond(P) should be easy to achieve. Conditions: • The composition of event queries does not directly or indirectly include negation (or all quantification) over events of a subscribed event class. • The values of the event attributes of a complex event are calculated only from events whose key is subsumed by the schema of the complex event.

  43. 4.6. Semantic Equivalence ~ 4 conditions for equivalence ~ These conditions are achieved by identifying proper values for freezingTime and observationSpan.

  44. 4.6. Semantic Equivalence ~ Semantics for Nonmonotone Programs ~ I omitted this part.

  45. 5. PEACE’S IMPLEMENTATION AND DEPLOYMENT Scalability & Performance • Scalability • Simple: 3 components (1 event detector, 1 action executor, 1 event processor) • Complex: many components for scale (many event detectors for many web resources, and many action executors) • Performance Bottleneck • Event Detection & Action Execution >>> Event Processing

  46. 5. PEACE’S IMPLEMENTATION AND DEPLOYMENT • Mobile Deployment: available on Ubuntu & Android • Visual Editor: Eclipse plug-in, GUI event flow editor

  47. 5. PEACE’S IMPLEMENTATION AND DEPLOYMENT • Simulation and Visualization: • Database Back Ends: Both SQLite & H2

  48. 6. PERFORMANCE EVALUATION 2 parts of the evaluation: • The entire PEACE system with all of its components • PEACE’s performance is dominated by OXPATH; only 0.6% of its runtime is spent in the event processor (described later). • Focusing on the event processor and its scalability • They consider the performance of the event processor in stress tests; the sample sizes are an order of magnitude larger than typical requirements. • They CANNOT directly compare with other systems.

  49. 6.1. PEACE Components Evaluation ~ Summary ~ • Only 0.6% of the runtime is spent in event processing. • Scalability: with x10 higher system load: • event extraction: x7 • event processing: x4 • local action execution: x8.5

  50. 6.1. PEACE Components Evaluation ~ Details ~ • the initial browser startup proves to be very costly • Further optimizations of web access are available (e.g., using temporal indices per event)
