
Clarifying Sensor Anomalies using Social Network feeds

Prasanna Giridhar*, Tanvir Amin*, Lance Kaplan+, Jemin George+, Raghu Ganti++, Tarek Abdelzaher*. * University of Illinois at Urbana-Champaign; + U.S. Army Research Lab; ++ IBM Research, USA.


Presentation Transcript


  1. Clarifying Sensor Anomalies using Social Network feeds. Prasanna Giridhar*, Tanvir Amin*, Lance Kaplan+, Jemin George+, Raghu Ganti++, Tarek Abdelzaher*. * University of Illinois at Urbana-Champaign; + U.S. Army Research Lab; ++ IBM Research, USA

  2. INTRODUCTION • Explosive growth in the deployment of physical sensors. • The activities recorded by these sensors often deviate from the norm: • Closure of a freeway due to a forest fire. • Change in building occupancy due to a shutdown. • Unusual behavior tends to attract human attention and gets reported on social media as well.

  3. MOTIVATION • Several past research efforts detect events in the physical as well as the social domain. • Can we use social media as a tool for explaining the underlying cause of anomalies? • We present a system for identifying the discriminative social feeds that can be correlated with sensor anomalies. • The more unusual the event, the higher the probability that it is reported socially. • Evaluation performed on real-time traffic data.

  4. System Workflow STEP 1: Initialization of the system • Continuous stream of tweets, filtered using two parameters: keywords and location • Continuous stream of data from physical sensors
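
A minimal sketch of the tweet filter in Step 1. The slides do not say how the stream is obtained, so only the keyword and location test is shown. It assumes each tweet arrives as a dict with hypothetical `text`, `lat`, and `lon` fields; the function names are illustrative, not taken from the authors' system.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def keep_tweet(tweet, keywords, center, radius_miles):
    """True if the tweet mentions any keyword and lies within the radius of center."""
    text = tweet["text"].lower()
    if not any(k.lower() in text for k in keywords):
        return False
    return haversine_miles(tweet["lat"], tweet["lon"], *center) <= radius_miles
```

A call such as `keep_tweet(tweet, ["traffic"], (32.7157, -117.1611), 30)` would keep traffic-related tweets posted within 30 miles of downtown San Diego; the keyword list here is an assumption.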

  5. Detecting events in Sensors STEP 2: Identification of sensor anomalies • Run a black box algorithm. • Store attributes for sensors classified positively by the algorithm • Cluster the sensors which provide redundant data
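
The slides treat the anomaly detector as a black box, so the sketch below swaps in a simple z-score test purely as a stand-in and groups flagged sensors by highway and direction as one possible notion of "redundant" sensors. The record fields (`sensor_id`, `highway`, `direction`, `history`, `current`) and the threshold are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

def is_anomalous(history, current, threshold=3.0):
    """Stand-in detector: flag a reading far from the mean of past readings."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold

def cluster_flagged(readings, threshold=3.0):
    """Keep only flagged sensors and group them by (highway, direction)."""
    clusters = defaultdict(list)
    for r in readings:
        if is_anomalous(r["history"], r["current"], threshold):
            clusters[(r["highway"], r["direction"])].append(r["sensor_id"])
    return dict(clusters)
```

Grouping by highway and direction means that several adjacent sensors reporting the same slowdown produce one anomaly to explain rather than many.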


  8. Discriminative Social Feeds STEP 3: Identification of discriminative social feeds • Social feeds often have keywords describing an event • Keywords: malaysian, airlines, 370

  9. Keyword Signatures Single Keyword? Airlines

  10. Keyword Signatures Keyword pair? Malaysian, Airlines

  11. Keyword Signatures Keyword triplet? (Malaysia, Airlines, 370); (Malaysia, Airlines, Satellite)

  12. Keyword Signatures • Signature profile on the collected Twitter data • Keyword pairs give the ideal 1-to-1 mapping between signatures and events
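
As a rough illustration of keyword-pair signatures, the sketch below tokenizes each tweet and counts how many tweets contain each unordered pair of keywords. The tokenizer and the stopword list are assumptions; the slides do not describe the preprocessing actually used.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "on", "in", "at", "of", "to", "is", "for"}

def keyword_pairs(text):
    """Return the set of unordered keyword pairs in one tweet, as sorted tuples."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS
    return set(combinations(sorted(tokens), 2))

def count_pairs(tweets):
    """Count, for each keyword pair, the number of tweets that contain it."""
    counts = Counter()
    for text in tweets:
        counts.update(keyword_pairs(text))
    return counts
```

Each pair is stored as a sorted tuple, so (malaysian, airlines) and (airlines, malaysian) count as the same signature.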

  13. Possible Approaches Problem: given a list of keyword pairs for the current and past windows, how do we find the most discriminating subset? • Difference in rate of occurrences: (traffic, jam) appears 50 times today against a past average of 35; (drunk, kills) appears 12 times today against a past average of 0. The routine pair wins (+15 vs. +12) even though the second pair is the unusual one. • Increase in percentage: (traffic, jam) appears 1 time today against a past average of 0; (drunk, kills) appears 12 times today against a past average of 2. A single stray mention wins (unbounded increase vs. 500%). • Overcome these disadvantages using information gain theory.
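
The two baseline scores can be worked through on the counts from this slide. The function names are illustrative, and treating an increase from a past average of zero as infinite is an assumption; the slides do not say how that case is handled.

```python
def rate_difference(today, past_avg):
    """Baseline 1: absolute difference between today's count and the past average."""
    return today - past_avg

def percentage_increase(today, past_avg):
    """Baseline 2: percentage increase over the past average.

    A past average of zero is treated as an unbounded increase (assumption).
    """
    if past_avg == 0:
        return float("inf") if today > 0 else 0.0
    return 100.0 * (today - past_avg) / past_avg

# Rate difference ranks the routine pair above the unusual one.
print(rate_difference(50, 35), rate_difference(12, 0))        # 15 12
# Percentage increase ranks a single stray mention above the unusual pair.
print(percentage_increase(1, 0), percentage_increase(12, 2))  # inf 500.0
```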

  14. Information Gain Theory and Entropy • Entropy measures the randomness (uncertainty) of a variable. • Using the conditional entropy, we determine how much information a keyword pair provides about an event: Information Gain = H(Y) − H(Y|X) • Y: variable associated with the event; y=0 (normal) and y=1 (anomalous) • X: variable associated with the keyword pair; x=0 (absent) and x=1 (present)
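
A minimal sketch of the information gain computation for one keyword pair, using the binary X and Y defined on this slide. It assumes the input is a table of counts over time windows, keyed by `(x, y)`; how the authors define and count windows is not stated, so that part is hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(n_xy):
    """Information Gain = H(Y) - H(Y|X).

    n_xy[(x, y)] is the number of windows with keyword-pair state x
    (0 absent, 1 present) and event state y (0 normal, 1 anomalous).
    """
    total = sum(n_xy.values())
    # H(Y): entropy of the event variable on its own.
    p_y = [sum(n for (x, y), n in n_xy.items() if y == v) / total for v in (0, 1)]
    h_y = entropy(p_y)
    # H(Y|X) = sum over x of P(x) * H(Y | X = x).
    h_y_given_x = 0.0
    for xv in (0, 1):
        n_x = sum(n for (x, y), n in n_xy.items() if x == xv)
        if n_x == 0:
            continue
        cond = [n_xy.get((xv, v), 0) / n_x for v in (0, 1)]
        h_y_given_x += (n_x / total) * entropy(cond)
    return h_y - h_y_given_x
```

A pair that appears exactly in the anomalous windows yields a gain equal to H(Y), while a pair that appears independently of the event yields a gain of zero.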

  15. Rank the unusual events STEP 4: Ranking discriminative events • Identify tweets for discriminative pairs. • Score each pair based on its conditional entropy. • The lower the entropy value, the higher the discriminating power.
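
The ranking step can be sketched as a sort by conditional entropy H(Y|X), lowest first. When every pair is scored over the same set of windows, H(Y) is identical for all pairs, so this ordering is the same as ranking by information gain, highest first. The input layout, one `(x, y)` observation per time window for each pair, is an assumption.

```python
import math
from collections import Counter

def conditional_entropy(observations):
    """H(Y|X) in bits from a list of (x, y) observations, one per time window."""
    total = len(observations)
    joint = Counter(observations)
    x_counts = Counter(x for x, _ in observations)
    h = 0.0
    for (x, y), n in joint.items():
        # -P(x, y) * log2 P(y | x)
        h -= (n / total) * math.log2(n / x_counts[x])
    return h

def rank_pairs(observations_by_pair):
    """Return keyword pairs ordered from most to least discriminating."""
    return sorted(observations_by_pair,
                  key=lambda pair: conditional_entropy(observations_by_pair[pair]))
```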

  16. Mapping both events STEP 5: Matching tweets with sensor anomalies • We align the two data sources based on the spatiotemporal properties associated with the event. For example: • Sensor ID40456 on I-15 Northbound with unusual activity • Unusual Tweet: “SFvSD game tonight, stuck @15N traffic!!!”
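
One way to picture the spatiotemporal alignment is to match on two things: the tweet's timestamp must fall near the anomaly's time span, and the tweet's text must mention the same route and direction as the sensor. The sketch below does this with a regular expression; the pattern, the record fields, and the one-hour slack are all assumptions, and the authors' actual matching rules are not given in the slides.

```python
import re
from datetime import datetime, timedelta

# Matches mentions such as "I-15 Northbound", "US101 S", or "@15N".
HIGHWAY_RE = re.compile(
    r"(?:I-|US|@)?\s?(\d{1,3})\s?(N|S|E|W)(?:orth|outh|ast|est)?(?:bound)?\b",
    flags=re.IGNORECASE,
)

def highway_tokens(text):
    """Extract the set of (route number, direction letter) mentions from text."""
    return {(num, d.upper()) for num, d in HIGHWAY_RE.findall(text)}

def match_tweets(anomaly, tweets, slack=timedelta(hours=1)):
    """Return tweets that overlap the anomaly in time and mention its highway."""
    lo, hi = anomaly["start"] - slack, anomaly["end"] + slack
    key = (anomaly["route"], anomaly["direction"])
    return [t for t in tweets
            if lo <= t["time"] <= hi and key in highway_tokens(t["text"])]
```

With this pattern, both "I-15 Northbound" from the sensor description and "@15N" from the tweet on this slide reduce to the same token, `("15", "N")`.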

  17. Output Explanations STEP 6: Output the matched explanations • The final step is to provide the explanations. • A user interface lets users track unusual events on a per-day basis.

  18. EXPERIMENTAL RESULTS • Twitter feeds collected for a period of 2 weeks, Aug 19 to Sep 1, 2013, with a radius of 30 miles • Three cities in CA: Los Angeles, San Francisco, San Diego • Physical sensor data retrieved from PeMS (Caltrans Performance Measurement System, http://pems.dot.ca.gov/): reports every 5 minutes for flow, speed, occupancy, and delay

  19. EXPERIMENTAL RESULTS • Performance is measured using precision and mean average rank, comparing our information gain approach against two baselines: B1 (difference in rate of occurrences) and B2 (increase in percentage). • [Table: Precision using different methods] • [Table: Average position of tweets from the top]

  20. INTERESTING EVENTS Sensor anomaly detected • Highway I-80 Eastbound in SF • Landmarks: Bay bridge • Duration: 4 days

  21. INTERESTING EVENTS

  22. INTERESTING EVENTS US101 blockage due to Bomb squad in LA

  23. INTERESTING EVENTS Traffic on 15N due to game in SD

  24. CONCLUSION • Abnormal behavior is also recorded in social media. • A tool to explain the abnormalities. • Major activities explained with high precision. • Explanations ranked among the top two tweets.

  25. Future Work • Scalability Issues • Credibility of social feeds • Geo localization of tweets

  26. THANK YOU Q+A
