1 / 59

Challenges, Approaches, and Solutions in Stream Reasoning http://streamreasoning.org

Challenges, Approaches, and Solutions in Stream Reasoning http://streamreasoning.org. Emanuele Della Valle DEI - Politecnico di Milano emanuele.dellavalle@polimi.it http://emanueledellavalle.org. Agenda. Motivation Concept Achievements Applications Conclusions.

maddy
Télécharger la présentation

Challenges, Approaches, and Solutions in Stream Reasoning http://streamreasoning.org

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Challenges, Approaches, and Solutions inStream Reasoninghttp://streamreasoning.org Emanuele Della Valle DEI - Politecnico di Milano emanuele.dellavalle@polimi.it http://emanueledellavalle.org

  2. Agenda • Motivation • Concept • Achievements • Applications • Conclusions Stavanger, 2012-5-9

  3. MotivationIt‘s a streaming World! [IEEE-IS2009] 1/3 • Oil operations • Traffic • Financial markets • Social networks • Generate data streams! Stavanger, 2012-5-9

  4. MotivationIt‘s a streaming World! [IEEE-IS2009] 2/4 • … and want to analyse data streams in real time • In a well in progress to drown, how long time do I have givenits historical behavior? • Is public transportation where the people are? • Can we detect any intra-daycorrelation clusters among stock exchanges? • Who is driving the discussion about the top 10 emerging topics ? Stavanger, 2012-5-9

  5. MotivationIt‘s a streaming World! [IEEE-IS2009] 3/4 • e.g., Real Time Rome (mobile network data streams) Normalday Exceptionalday afternoon [source: http://senseable.mit.edu/realtimerome/ ] evening Stavanger, 2012-5-9

  6. MotivationIt‘s a streaming World! [IEEE-IS2009] 4/4 • e.g., Pulse of the Nation (social media streams) 12:00 [source: http://www.ccs.neu.edu/home/amislove/twittermood/ ] 23:00 happier Stavanger, 2012-5-9

  7. MotivationWhat are data streams anyway? • Formally: • Data streams are unbounded sequences of time-varying data elements • Less formally: • an (almost) “continuous” flow of information • with the recent information being more relevant as it describes the current state of a dynamic system time Stavanger, 2012-5-9

  8. MotivationThe continuous nature of streams • The nature of streams requires a paradigmatic change* • from persistent data • to be stored and queried on demand • a.k.a. one time semantics • to transient data • to be consumed on the fly by continuous queries • a.k.a. continuous semantics * This paradigmatic change first arose in DB community[Henzinger98] Stavanger, 2012-5-9

  9. Motivation – The continuous nature of streamsContinuous Semantics • Continuousqueries registered over streams that, in most of the cases, are observed trough windows window Registered Continuous Query input streams streams of answer Stavanger, 2012-5-9

  10. Motivation – The continuous nature of streamsTools exists [Cugola2011] • Types • Data Stream Management Systems • Complex Event Processors • Research Prototypes • Amazon/Cougar (Cornell) – sensors • Aurora (Brown/MIT) – sensor monitoring, dataflow • Gigascope: AT&T Labs – Network Monitoring • Hancock (AT&T) – Telecom streams • Niagara (OGI/Wisconsin) – Internet DBs & XML • OpenCQ (Georgia) – triggers, view maintenance • Stream (Stanford) – general-purpose DSMS • Stream Mill (UCLA) - power & extensibility • Tapestry (Xerox) – publish/subscribe filtering • Telegraph (Berkeley) – adaptive engine for sensors • Tribeca (Bellcore) – network monitoring • High-tech startups • Streambase, Coral8, Apama, Truviso • Major DBMS vendors are all adding stream extensions as well • IBM InfoSphere Stream • Microsoft streaminsight • Oracle CEP Stavanger, 2012-5-9

  11. Motivation New Requirements  New Challenges Typical Requirements • Processing Streams • Large datasets • Reactivity • Fine-grained information access • Modeling complex application domains • Continuous semantics • Scalable processing • Real-time systems • Powerful query languages • Rich ontology languages Stavanger, 2012-5-9

  12. MotivationAre DSMS/CEP ready to address them? Typical Requirements • Processing Streams • Large datasets • Reactivity • Fine-grained information access • Modeling complex application domains DSMS/CEP • Continuous semantics • Scalable processing • Real-time systems • Powerful query languages • Rich ontology languages Stavanger, 2012-5-9

  13. Motivation Is Semantic Web ready to address them? • The Semantic Web, the Web of Data is doing fine • RDF, RDF Schema, SPARQL, OWL, RIF • well understood theory, • rapid increase in scalability • BUT it pretends that the world is staticor at best a low change rateboth in change-volume and change-frequency • ontology versioning • belief revision • time stamps on named graphs • It sticks to the traditional one-time semantics Stavanger, 2012-5-9

  14. MotivationNew Requirements  New Challenges Typical Requirements • Processing Streams • Large datasets • Reactivity • Fine-grained information access • Modeling complex application domains Semantic Web • Continuous semantics • Scalable processing • Real-time systems • Powerful query languages • Rich ontology languages Stavanger, 2012-5-9

  15. Motivation New Requirements call forStream Reasoning Typical Requirements • Processing Streams • Large datasets • Reactivity • Fine-grained information access • Modeling complex application domains • Continuous semantics • Scalable processing • Real-time systems • Powerful query languages • Rich ontology languages Stream Reasoning Stavanger, 2012-5-9

  16. ConceptStream Reasoning Definition [IEEE-IS2010] • Making sense • in real time • of multiple, heterogeneous, giganticand inevitably noisydata streams • in order to support the decision process of extremely large numbers of concurrent user • Note: making sense of streams necessarily requires processing them against rich background knowledge, an unsolved problem in database Stavanger, 2012-5-9

  17. Concept Research Challenges • Relation with DSMSs and CEPs • Just as RDF relates to data-base systems? • Data types and query languages for semantic streams • Just RDF and SPARQL but with continuous semantics? • Reasoning on Streams • Theory • Efficiency • Scalability • Dealing with incomplete & noisy data • Even more than on the current Web of Data • Distributed and parallel processing • Streams are parallel in nature, … • Engineering Stream Reasoning Applications • Development Environment • Integration with other technologies • Benchmarks Stavanger, 2012-5-9

  18. Achievements • RDF Streams • Notion defined • C-SPARQL • Syntax and semantics defined as a SPARQL extension • Engine designed and implemented • Experimentswith C-SPARQL under simple RDF entailment regimes • window based selection of C-SPARQL outperforms the standard FILTER based selection • algebraic optimizations of C-SPARQL queries are possible • Complex event can be detected using a network of C-SPARQL queries at high throughputs • Experiment with C-SPARQL under RDFS++ entailment regimes • efficient incremental updates of deductive closures investigated • our approach outperform state-of-the-art when updates comes as stream • Streaming Linked Data Framework prototyped Stavanger, 2012-5-9

  19. AchievementsOutline • RDF Streams • Notion defined • C-SPARQL • Syntax and semantics defined as a SPARQL extension • Engine designed and implemented • Experimentswith C-SPARQL under simple RDF entailment regimes • window based selection of C-SPARQL outperforms the standard FILTER based selection • algebraic optimizations of C-SPARQL queries are possible • Complex event can be detected using a network of C-SPARQL queries at high throughputs • Experiment with C-SPARQL under RDFS++ entailment regimes • efficient incremental updates of deductive closures investigated • our approach outperform state-of-the-art when updates comes as stream • Streaming Linked Data Framework prototyped Stavanger, 2012-5-9

  20. Memo: Sensor Network Ontology Stavanger, 2012-5-9

  21. Memo: Sensor Network Ontology[ ] streaming part [ ] static part Stavanger, 2012-5-9

  22. Achievements RDF Stream [WWW2009,EDBT2010,IJSC2010] • RDF Stream Data Type • Ordered sequence of pairs, where each pair is made of an RDF triple and its timestamp • Timestamps are not required to be unique, they must be non-decreasing • E.g., (< :s1 ssn:generatedObservation :o1 >, 2010-02-12T13:34:41) (< :o1 a weather:SnowfallObservation >, 2010-02-12T13:34:41) (< :s1 om-owl:generatedObservation :o2 >, 2010-02-12T13:36:28) (< :o2 a weather:WindSpeedObservation >, 2010-02-12T13:36:28) (< :o2 ssn:result :a1 >, 2010-02-12T13:36:28) (< :a1 ssn:floatValue "35.4”^^xsd:float >, 2010-02-12T13:36:28) Stavanger, 2012-5-9

  23. AchievementsOutline • RDF Streams • Notion defined • C-SPARQL • Syntax and semantics defined as a SPARQL extension • Engine designed and implemented • Experimentswith C-SPARQL under simple RDF entailment regimes • window based selection of C-SPARQL outperforms the standard FILTER based selection • algebraic optimizations of C-SPARQL queries are possible • Complex event can be detected using a network of C-SPARQL queries at high throughputs • Experiment with C-SPARQL under RDFS++ entailment regimes • efficient incremental updates of deductive closures investigated • our approach outperform state-of-the-art when updates comes as stream • Streaming Linked Data Framework prototyped Stavanger, 2012-5-9

  24. MEMO: SPARQL Stavanger, 2012-5-9

  25. AchievementsWhere C-SPARQL Extends SPARQL Stavanger, 2012-5-9

  26. AchievementsAn Example of C-SPARQL Query Which are the sensors that have observing winds above 50 mph in the last half an hour? Which is the observed average wind? REGISTER STREAM AvgWindSpeed AS CONSTRUCT { ?sensw:avgWindSpeed ?avgWindSpeed } FROM STREAM <.../streams/ssnmeteostream> [RANGE 1h STEP 1h] WHERE { SELECT ?sens (AVG(?v) as ?avgWindSpeed) WHERE { ?sensom-owl:generatedObservation ?o . ?o a weather:WindSpeedObservation . ?o om-owl:result ?r . ?r om-owl:floatValue ?v . } GROUP BY ?sens HAVING (?avgWindSpeed > 5) } Stavanger, 2012-5-9

  27. AchievementsAn Example of C-SPARQL Query Which are the sensors that have observing winds above 50 mph in the last half an hour? Which is the observed average wind? REGISTER STREAM AvgWindSpeed AS CONSTRUCT { ?sensw:avgWindSpeed ?avgWindSpeed } FROM STREAM <.../streams/ssnmeteostream> [RANGE 1h STEP 1h] WHERE { SELECT ?sens (AVG(?v) as ?avgWindSpeed) WHERE { ?sensom-owl:generatedObservation ?o . ?o a weather:WindSpeedObservation . ?o om-owl:result ?r . ?r om-owl:floatValue ?v . } GROUP BY ?sens HAVING (?avgWindSpeed > 5) } Query registration (forcontinuous execution) RDF Streamaddedasnew ouput format FROM STREAM clause WINDOW SPARQ 1.1 features • Sub-queries • aggregates Stavanger, 2012-5-9

  28. AchievementsOutline • RDF Streams • Notion defined • C-SPARQL • Syntax and semantics defined as a SPARQL extension • Engine designed and implemented • Experimentswith C-SPARQL under simple RDF entailment regimes • window based selection of C-SPARQL outperforms the standard FILTER based selection • algebraic optimizations of C-SPARQL queries are possible • Complex event can be detected using a network of C-SPARQL queries at high throughputs • Experiment with C-SPARQL under RDFS++ entailment regimes • efficient incremental updates of deductive closures investigated • our approach outperform state-of-the-art when updates comes as stream • Streaming Linked Data Framework prototyped Stavanger, 2012-5-9

  29. AchievementsFROM STREAM Clause - Types of Window • physical: a given number of triples • logical: a variable number of triples which occur during a given time interval (e.g., 1 hour) • Sliding: they are progressively advanced of a given STEP (e.g., 5 minutes) • Tumbling: they are advanced of exactly their time interval Stavanger, 2012-5-9

  30. AchievementsEfficiency of Evaluation [IEEE-IS2010] • window based selection of C-SPARQL outperforms the standard FILTER based selection Stavanger, 2012-5-9

  31. AchievementsOutline • RDF Streams • Notion defined • C-SPARQL • Syntax and semantics defined as a SPARQL extension • Engine designed and implemented • Experimentswith C-SPARQL under simple RDF entailment regimes • window based selection of C-SPARQL outperforms the standard FILTER based selection • algebraic optimizations of C-SPARQL queries are possible • Complex event can be detected using a network of C-SPARQL queries at high throughputs • Experiment with C-SPARQL under RDFS++ entailment regimes • efficient incremental updates of deductive closures investigated • our approach outperform state-of-the-art when updates comes as stream • Streaming Linked Data Framework prototyped Stavanger, 2012-5-9

  32. AchievementsAlgebraic optimizations of C-SPARQL [EDBT2010] • Several transformations can be applied to algebraic representation of C-SPARQL • some recalling well known results from classical relational optimization • push of FILTERs and projections • some being more specific to the domain of streams. • push of aggregates. Stavanger, 2012-5-9

  33. Achievementsalgebraic optimizations of C-SPARQL [EDBT2010] • Push of filters and projections Stavanger, 2012-5-9

  34. AchievementsOutline • RDF Streams • Notion defined • C-SPARQL • Syntax and semantics defined as a SPARQL extension • Engine designed and implemented • Experimentswith C-SPARQL under simple RDF entailment regimes • window based selection of C-SPARQL outperforms the standard FILTER based selection • algebraic optimizations of C-SPARQL queries are possible • Complex event can be detected using a network of C-SPARQL queries at high throughputs • Experiment with C-SPARQL under RDFS++ entailment regimes • efficient incremental updates of deductive closures investigated • our approach outperform state-of-the-art when updates comes as stream • Streaming Linked Data Framework prototyped Stavanger, 2012-5-9

  35. AchievementsComplexEventDetectionasstreamcompositions • e.g., continuous detection of blizzards by analyzing multiple streams of data generated by weather sensors spread across a continental area Blizzard: a severe snowstorm with high winds and low visibility lasting at least three hours Stavanger, 2012-5-9

  36. AchievementsComplexEventDetectionasstreamcompositions Count Snow Fall [1 HOUR] [TUMBLING] Q Linked Sensor Data Most Liked POIs AVG Wind Speed [1 HOUR] [TUMBLING] [3 HOURS] [STEP 1 HOUR] Q Q AVG Temp [1 HOUR] [TUMBLING] Q Adapter LEGEND C-SPARQL Query Stavanger, 2012-5-9

  37. AchievementsComplexEventDetectionasstreamcompositions snowfall + strong winds + low temp  blizzards Live demonstration at http://streamreasoning.org/demos/blizzard-detection Stavanger, 2012-5-9

  38. AchievementsHigh Throughputs [JWS2012a] Stavanger, 2012-5-9

  39. AchievementsOutline • RDF Streams • Notion defined • C-SPARQL • Syntax and semantics defined as a SPARQL extension • Engine designed and implemented • Experimentswith C-SPARQL under simple RDF entailment regimes • window based selection of C-SPARQL outperforms the standard FILTER based selection • algebraic optimizations of C-SPARQL queries are possible • Complex event can be detected using a network of C-SPARQL queries at high throughputs • Experiment with C-SPARQL under RDFS++ entailment regimes • efficient incremental updates of deductive closures investigated • our approach outperform state-of-the-art when updates comes as stream • Streaming Linked Data Framework prototyped Stavanger, 2012-5-9

  40. AchievementsWhere’s the Reasoning? • Example: can we measure the the impact of a tweet? • Twitter allows two traceable ways of discussing a tweet: • reply: a user reply to a tweet of another user (it always retweet the original tweet) • retweet: a user propagates to his/her followers an interesting tweet • For example reply reply t2 t4 t7 reply reply retweet reply t1 t3 t5 t8 t6 retweet 50 min ago 10 min ago now 40 min ago 30 min ago 20 min ago Stavanger, 2012-5-9

  41. AchievementsExample of C-SPARQL and Reasoning 1/2 What impact have I been creating with my tweets in the last hour? Let’s count them … REGISTER STREAM OpinionSpreading COMPUTED EVERY 30s AS SELECT ?tweet (count(?tweet) AS ?impact FROM STREAM <http://ex.org> [RANGE 60m STEP 10m] WHERE { :t1 sr:discuss ?tweet } :discuss a owl:TransitiveProperty . :replyrdfs:subPropertyOf :discuss . :retweetrdfs:subPropertyOf :discuss . discuss discuss reply reply t2 t4 t7 discuss reply discuss discuss discuss reply retweet reply 7! t1 t3 t5 t8 t6 retweet discuss Stavanger, 2012-5-9

  42. AchievementsOur approach [ESWC2010] • The algorithm • deletes all triples (asserted or inferred) that have just expired • computes the entailments derived by the inserts, • annotates each entailed triple with a expiration time, and • eliminates from the current state all copies of derived triples except the one with the highest timestamp. Stavanger, 2012-5-9

  43. ImplementationComparative Evaluation on Materialization • base-line: re-computing the materialization from scratch • state-of-the-art [Ceri1994,Volz2005] • our approach [ESWC2010] % of the materialization changed when the window slides Stavanger, 2012-5-9

  44. AchievementsComparative Evaluation on Query Answering • comparison of the average time needed to answera C-SPARQL query using • backward reasoner • the naive approach of re-computing the materialization • our approach Backward reasoning Stavanger, 2012-5-9

  45. AchievementsOutline • RDF Streams • Notion defined • C-SPARQL • Syntax and semantics defined as a SPARQL extension • Engine designed and implemented • Experimentswith C-SPARQL under simple RDF entailment regimes • window based selection of C-SPARQL outperforms the standard FILTER based selection • algebraic optimizations of C-SPARQL queries are possible • Complex event can be detected using a network of C-SPARQL queries at high throughputs • Experiment with C-SPARQL under RDFS++ entailment regimes • efficient incremental updates of deductive closures investigated • our approach outperform state-of-the-art when updates comes as stream • Streaming Linked Data Framework prototyped Stavanger, 2012-5-9

  46. AchievementsStreaming Linked Data Framework [JWS2012a] Features • Accessing raw data stream from C-SPARQL • Publishing streams and C-SPARQL query results as Linked Data • Connecting C-SPARQL queries in a network • Recoding and replaying portions of stream • Supporting fast prototyping of applications Stavanger, 2012-5-9

  47. ApplicationsLocation Based Social Media Analytics [JWS2012b] Stavanger, 2012-5-9

  48. ApplicationsLocation Based Social Media Analytics [JWS2012b] http://youtu.be/XGOKe_lhSks Live demonstration at http://streamreasoning.org/demos/bottari Stavanger, 2012-5-9

  49. ApplicationsBlizzard Detection [JWS2012a] Stavanger, 2012-5-9

  50. ApplicationsBlizzard Detection [JWS2012a] Live demonstration at http://streamreasoning.org/demos/blizzard-detection Stavanger, 2012-5-9

More Related