
Declarative Support for Sensor Data Cleaning

Shawn Jeffery (UC Berkeley), Gustavo Alonso (ETH Zurich), Michael Franklin (UC Berkeley), Wei Hong (Arch Rock Corporation), Jennifer Widom (Stanford University)


Presentation Transcript


  1. Declarative Support for Sensor Data Cleaning
     Shawn Jeffery (UC Berkeley), Gustavo Alonso (ETH Zurich), Michael Franklin (UC Berkeley), Wei Hong (Arch Rock Corporation), Jennifer Widom (Stanford University) (Intel Research Berkeley)
     Presented by: Venkatesh (Venky) Raghavan & Abhishek Mukherji
     Disclaimer: Slides adapted/taken from the talk given by S. Jeffery at Pervasive '06.

  2. Current Approach
     • Each application implements its own data cleaning.
     • Multiple applications access a shared resource (the sensor devices).
     [Figure: each application runs its own data-cleaning layer directly over the raw, dirty data from the sensor devices.]

  3. Data Cleaning: Infrastructure Approach
     • Data cleaning is built, tested, and deployed once.
     • There is one point of access to the sensor devices.
     [Figure: applications receive cleaned data from a shared Cleaning Infrastructure, which consumes the raw, dirty data.]
     The Cleaning Infrastructure translates raw sensor data into cleaned data; applications are unaffected by the unreliable devices over which they are deployed.

  4. Challenges
     • How to build an infrastructure that supports:
        • Many types of sensors
        • Multiple applications
        • Different environments
     • Two facets to our solution:
        • A pipeline of sensor-cleaning tasks
        • Declarative query processing

  5. Temporal and Spatial Granules
     • ESP (Extensible Sensor stream Processing) uses two high-level abstractions:
        • Temporal granules
        • Spatial granules
     • Granules define units of time and space inside which the data are expected to be homogeneous.
     • This exploits the fact that many applications are interested not in individual readings or devices but in higher-level data in time and space.

  6. Temporal Granules
     • Sensor devices produce data at a high rate.
     • Applications are concerned with data over a larger time period.
        • Example: an environment-monitoring application that models the micro-climate of a redwood tree requires one reading every 5 minutes.
     • Solution: windowed processing to group readings.
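The windowed processing described above can be sketched in Python. This is a minimal sketch, not ESP's implementation: it assumes readings arrive as hypothetical (timestamp_seconds, value) pairs, treats each non-overlapping 5-minute window as one temporal granule, and reports the mean value per granule.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical granule size: 5 minutes (300 s), as in the redwood example.
GRANULE_SECONDS = 300


def temporal_granules(readings, granule_seconds=GRANULE_SECONDS):
    """Group (timestamp_seconds, value) readings into non-overlapping windows
    and return {window_start_seconds: mean_value}, ordered by window start."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Integer division maps each timestamp to the start of its window.
        window_start = (ts // granule_seconds) * granule_seconds
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# One reading per minute: more often than the application needs.
raw = [(0, 14.0), (60, 14.2), (120, 14.4), (300, 15.0), (360, 15.2)]
print(temporal_granules(raw))
```

Five raw readings collapse into two granules (starting at 0 s and 300 s), which is the reduction the application actually asks for.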

  7. Spatial Granules
     • Readings from devices physically close to each other are expected to be homogeneous.
     • A spatial granule defines the unit of space in which this homogeneity is expected to hold.
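A minimal sketch of spatial granules, assuming a hypothetical, hand-written mapping from device IDs to granules (for example, rooms); a real deployment would derive this mapping from where the devices are installed.

```python
from collections import defaultdict

# Hypothetical deployment: each device ID maps to the spatial granule
# (e.g., a room or shelf) in which its readings should be homogeneous.
DEVICE_TO_GRANULE = {"mote1": "room_a", "mote2": "room_a", "mote3": "room_b"}


def group_by_spatial_granule(readings, device_to_granule=DEVICE_TO_GRANULE):
    """Group (device_id, value) readings by the spatial granule of each device."""
    groups = defaultdict(list)
    for device_id, value in readings:
        groups[device_to_granule[device_id]].append(value)
    return dict(groups)


readings = [("mote1", 21.0), ("mote2", 21.4), ("mote3", 18.9)]
print(group_by_spatial_granule(readings))
# {'room_a': [21.0, 21.4], 'room_b': [18.9]}
```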

  8. Sensor Cleaning Pipeline
     • Cleaning data involves a set of logically distinct operations.
     • Each operation targets a different aspect of the data, from the finest (single readings) to the coarsest (multiple sensors and various sources).
     • The operations use the temporal and spatial characteristics of sensor data.
     [Figure: the pipeline stages, from bottom to top: Point, Smooth, Merge, Arbitrate, Virtualize.]
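The pipeline's structure can be sketched in Python, assuming each stage is a function from one batch of readings to the next. The stage bodies here are hypothetical placeholders (only Point does any filtering), so the sketch shows how the stages are ordered and chained, not ESP's actual operators.

```python
from typing import Callable, List, Tuple

Batch = List[dict]
Stage = Callable[[Batch], Batch]


def point(batch: Batch) -> Batch:
    # Placeholder Point logic: drop readings that carry no value.
    return [r for r in batch if r.get("value") is not None]


def passthrough(batch: Batch) -> Batch:
    # Placeholder for Smooth, Merge, Arbitrate and Virtualize in this sketch.
    return batch


# Ordered from the finest granularity (single readings) to the coarsest.
PIPELINE: List[Tuple[str, Stage]] = [
    ("Point", point),
    ("Smooth", passthrough),
    ("Merge", passthrough),
    ("Arbitrate", passthrough),
    ("Virtualize", passthrough),
]


def run_pipeline(batch: Batch, pipeline=PIPELINE) -> Batch:
    """Feed the output of each stage into the next one."""
    for _name, stage in pipeline:
        batch = stage(batch)
    return batch


print(run_pipeline([{"value": 1.0}, {"value": None}]))
# [{'value': 1.0}]
```

Keeping each stage a separate function mirrors the slide's point that the operations are logically distinct: a deployment can swap one stage's logic without touching the others.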

  9. Declarative Query Processing
     • Program the stages with declarative queries.
     • CQL: a continuous-query extension to SQL.
     • A data stream system as the processing engine.
     • Real-time cleaning.
     Example (the bracketed RANGE expression is the window clause):
        SELECT S.city, AVG(temp)
        FROM SOME_STREAM S [RANGE '5 seconds']
        WHERE S.state = 'California'
        GROUP BY S.city
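To make the example query's semantics concrete, here is a Python sketch that evaluates it once at a single instant `now`. It assumes a hypothetical tuple schema with a `ts` timestamp field and treats the RANGE window as covering the interval (now - 5 s, now]; a real stream engine re-evaluates the query continuously as tuples arrive, so this is only a one-shot approximation.

```python
from collections import defaultdict
from statistics import mean


def windowed_city_avg(stream, now, range_seconds=5.0, state="California"):
    """One evaluation of the example query at time `now`.

    `stream` holds dicts with hypothetical keys ts, city, state, temp.
    """
    groups = defaultdict(list)
    for t in stream:
        in_window = now - range_seconds < t["ts"] <= now  # [RANGE '5 seconds']
        if in_window and t["state"] == state:             # WHERE
            groups[t["city"]].append(t["temp"])            # GROUP BY S.city
    return {city: mean(temps) for city, temps in groups.items()}  # AVG(temp)


stream = [
    {"ts": 1.0, "city": "Berkeley", "state": "California", "temp": 18.0},  # too old
    {"ts": 7.0, "city": "Berkeley", "state": "California", "temp": 20.0},
    {"ts": 9.0, "city": "Berkeley", "state": "California", "temp": 22.0},
    {"ts": 8.0, "city": "Palo Alto", "state": "California", "temp": 24.0},
    {"ts": 9.5, "city": "Zurich", "state": "Switzerland", "temp": 10.0},  # filtered
]
print(windowed_city_avg(stream, now=10.0))
# {'Berkeley': 21.0, 'Palo Alto': 24.0}
```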

  10. Step 1: Point
     • Operates on: single values in a sensor stream.
     • Purpose: filter individual values, e.g.:
        • Errant (dirty/faulty) RFID tags
        • Obvious outliers
     • Converts raw data into tuples, e.g.:
        • Heat sensors output voltages; the raw voltage must be converted into a temperature using that sensor's calibration.
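A minimal sketch of Point-stage logic for both examples on the slide. The linear calibration (GAIN, OFFSET), the plausible temperature range, and the set of known tag IDs are hypothetical values chosen for illustration; real calibration curves are sensor-specific and not necessarily linear.

```python
# Hypothetical linear calibration for one heat sensor: temp_c = GAIN * volts + OFFSET.
GAIN, OFFSET = 100.0, -50.0
# Hypothetical bounds beyond which a temperature is an obvious outlier.
PLAUSIBLE_C = (-40.0, 85.0)
# Hypothetical set of tag IDs known to belong to this deployment.
KNOWN_TAGS = {"tag-001", "tag-002"}


def point_temperature(volts):
    """Convert one raw voltage into a temperature tuple, or None for an obvious outlier."""
    temp_c = GAIN * volts + OFFSET
    low, high = PLAUSIBLE_C
    return {"temp_c": temp_c} if low <= temp_c <= high else None


def point_rfid(tag_id):
    """Keep a tag read only if the tag is known; errant reads become None."""
    return {"tag_id": tag_id} if tag_id in KNOWN_TAGS else None


print([point_temperature(v) for v in (0.75, 5.0)])
# [{'temp_c': 25.0}, None]
print([point_rfid(t) for t in ("tag-001", "tag-999")])
# [{'tag_id': 'tag-001'}, None]
```

Each function looks at exactly one value at a time, which is what distinguishes Point from the later, window- or granule-based stages.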

  11. Step 1: Point
     [Figure: a Point stage (P) is applied to each individual sensor stream.]

  12. Step 2: Smoothing
     • Purpose: interpolate (fill in) lost readings
        • Temporal interpolation
        • Outlier detection
     • Method: window-based queries over temporal granules
     [Figure: Smooth stages (S) above the Point stages, each operating over a temporal granule.]
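One simple form of temporal smoothing for RFID data can be sketched as follows, assuming time is divided into discrete epochs: a tag is reported present at epoch t if it was read at least once in the last `window` epochs, which fills in readings the reader dropped. The function name and parameters are hypothetical.

```python
def smooth_presence(read_epochs, epochs, window=3):
    """Report a tag present at epoch t if it was read at least once in epochs
    t - window + 1 .. t. Returns the epochs (from `epochs`) at which it is present."""
    reads = set(read_epochs)
    return [t for t in epochs
            if any(e in reads for e in range(t - window + 1, t + 1))]


# The reader dropped the tag at epochs 2 and 3 although the tag never moved.
raw_reads = {0, 1, 4, 5}
print(smooth_presence(raw_reads, range(6)))
# [0, 1, 2, 3, 4, 5]
```

With `window=1` the function returns only the raw reads, so the gap reappears; the window has to be longer than the longest run of dropped readings for the interpolation to work.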

  13. Step 3: Merge
     • Purpose: spatial interpolation
     • Example: within a spatial granule, compute the average of the readings from the different motes, omitting individual readings that lie more than two standard deviations from the mean.
     [Figure: Merge stages (M) combine the Smooth outputs within each spatial granule.]

  14. Step 3: Merge
     [Figure: the readings of the functioning motes are averaged, while the outlier mote is excluded.]
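The Merge example from slide 13 can be sketched as follows, assuming all readings passed in come from one spatial granule and using the population standard deviation (`statistics.pstdev`); the threshold `k = 2` corresponds to the slide's two standard deviations. The function name is hypothetical.

```python
from statistics import mean, pstdev


def merge_granule(readings, k=2.0):
    """Average the readings of the motes in one spatial granule, omitting any
    reading more than k standard deviations from the granule's mean."""
    mu = mean(readings)
    sigma = pstdev(readings)  # population standard deviation
    kept = [r for r in readings if abs(r - mu) <= k * sigma]
    return mean(kept)


# Eight motes in one granule; the last one is malfunctioning.
motes = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 19.8, 35.0]
print(merge_granule(motes))
```

A caveat for small granules: with the population standard deviation, no reading can lie more than sqrt(n - 1) standard deviations from the mean of n readings (Samuelson's inequality), so with five or fewer motes a two-standard-deviation rule can never omit anything.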

  15. Step 4: Arbitrate
     • Purpose: remove
        • conflicting readings
        • duplicates (de-duplication)
     [Figure: an Arbitrate stage (A) above the Merge stages.]

  16. Step 5: Virtualize
     • Purpose: multi-source integration
     [Figure: a Virtualize stage (V) at the top of the full pipeline: Point, Smooth, Merge, Arbitrate, Virtualize.]
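The slide does not say how Virtualize combines sources, so the following is only one hypothetical form of multi-source integration: an inner join of two already-cleaned streams (RFID tag counts and mote temperatures, both with made-up schemas) keyed by spatial granule and epoch.

```python
def virtualize(rfid_counts, temperatures):
    """Inner-join two cleaned streams on (granule, epoch).

    rfid_counts:  {(granule, epoch): tag_count}   (hypothetical RFID-side output)
    temperatures: {(granule, epoch): temp_c}      (hypothetical mote-side output)
    """
    shared_keys = sorted(rfid_counts.keys() & temperatures.keys())
    return {key: {"tags": rfid_counts[key], "temp_c": temperatures[key]}
            for key in shared_keys}


rfid = {("shelf0", 1): 3, ("shelf1", 1): 2}
temps = {("shelf0", 1): 21.5}
print(virtualize(rfid, temps))
# {('shelf0', 1): {'tags': 3, 'temp_c': 21.5}}
```

An inner join is the simplest choice; an application that must not lose RFID data when no temperature is available would need an outer join instead.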

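  17. RFID Scenario
     [Figure: the ESP pipeline for the RFID deployment. Point stages run on the sensor; Smooth stages (Query 3, over smooth_input), Merge, Arbitrate (Query 4, over arbitrate_input), and Virtualize run above them; the application issues Query 2 over rfid_data.]
     • Each domain needs to be modeled.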

  18. RFID Scenario
     [Figures: expected output; Query 2 result using raw RFID data.]

  19. Smoothing
     The difference between Shelf 0 and Shelf 1 is likely due to issues with the antenna ports on these particular RFID readers.

  20. Arbitration

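  21. Arbitration
     • Example: RFID tag r1 at time steps t, t+1, t+2, using a moving window of w = 3 time steps.
     • At t+2: Shelf 0 has count(r1) = 2 and Shelf 1 has count(r1) = 3, so the conflict is resolved in favor of Shelf 1.
     • Note: the window size must be larger than the longest period of dropped readings, but not too large, or it will smooth over genuine movement of the tag.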
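The arbitration example can be reproduced with a short sketch, assuming discrete epochs with t = 0 and that the tag is assigned to the location with the highest count in the window. Which epochs Shelf 0 missed is hypothetical; only the resulting counts (2 and 3) come from the slide. The function names are also hypothetical.

```python
def window_count(read_epochs, now, window):
    """Number of reads of the tag in epochs now - window + 1 .. now."""
    return sum(1 for e in read_epochs if now - window < e <= now)


def arbitrate(reads_by_location, now, window=3):
    """Assign the tag to the location whose reader saw it most often in the window.

    reads_by_location: {location: epochs at which that location's reader read the tag}
    Returns (winning_location, counts_per_location).
    """
    counts = {loc: window_count(epochs, now, window)
              for loc, epochs in reads_by_location.items()}
    return max(counts, key=counts.get), counts


# Epochs t, t+1, t+2 with t = 0; Shelf 0 missing epoch 1 is hypothetical.
reads = {"Shelf 0": [0, 2], "Shelf 1": [0, 1, 2]}
print(arbitrate(reads, now=2))
# ('Shelf 1', {'Shelf 0': 2, 'Shelf 1': 3})
```

Ties are broken by whichever location appears first in the dictionary (that is how `max` behaves); a real deployment would need an explicit tie-breaking rule.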

  22. Conclusion
     • An infrastructural approach to sensor data cleaning is necessary.
     • ESP: a pipelined, declarative framework for building such an infrastructure.
     [Figure: applications receive cleaned data from ESP, which consumes raw, dirty data.]
