
Sensor Data Management: Challenges and (some) Solutions

Amol Deshpande, University of Maryland. Motivation: the unprecedented, and rapidly increasing, instrumentation of our everyday world, e.g. RFID, distributed measurement networks (e.g. GPS), wireless sensor networks, and industrial monitoring.


Presentation Transcript


  1. Sensor Data Management: Challenges and (some) Solutions
     Amol Deshpande, University of Maryland

  2. Motivation
     • Unprecedented, and rapidly increasing, instrumentation of our everyday world
     • Examples: RFID; distributed measurement networks (e.g. GPS); wireless sensor networks; industrial monitoring

  3. Sensor Data Processing: Now
     [Diagram labels: Sensor Network; User; Database; Table raw-data]
     • Extract all readings into a file
     • Run MATLAB/R/other data processing tools
     • Write output to a file/back to the database
     • Write data processing tools to process/aggregate the output (maybe using DB)
     • Decide new data to acquire
     Repeat

  4. Sensor Data Processing: What we want
     [Diagram labels: Sensor Network; Database; User; Table raw-data; Table processed-data; Data; Tasks]
     • Models to be applied to data in real-time (at least simple ones)
     • Continuous (standing) queries, e.g. alert monitoring
     • Results to continuous queries
     • Ad hoc queries (possibly against processed, modeled data)

  5. Data Management Challenges
     • Very, very large scale
     • Spatio-temporal querying essential
     • Need new indexing techniques, data description formats, techniques for “data ingest” (cleaning the data etc.)
     • Much work in scientific data management
     • E.g. SkyServer
     • Data is typically imprecise, unreliable, or incomplete (data quality)
     • Measurement noise, failures in sensor/GPS data
     • High message loss rate in wireless/RFID
     Balazinska et al; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.

  6. Data Management Challenges
     • Data is generated continuously and must be processed in real-time (distributed data streams)
     • Need different query processing paradigms
     • Typically very high data rates
     • Must be able to handle a large number of continuous queries efficiently
     • Much recent work on “Data Streams”
     • Research systems: TelegraphCQ [Berkeley], STREAM [Stanford], Aurora [Brown/MIT/Brandeis] etc.
     • Commercial systems: Streambase, TruViso, …
     Balazinska et al; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.
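
The idea of a continuous (standing) query on slide 6 can be illustrated with a minimal Python sketch. This is not how TelegraphCQ, STREAM, Aurora, or any commercial system is implemented; the class name `SlidingWindowAlert`, the window size, and the threshold are hypothetical choices made only for illustration. The query stays registered and is re-evaluated incrementally as each reading arrives, instead of re-scanning stored data.

```python
from collections import deque


class SlidingWindowAlert:
    """Hypothetical standing query: alert while the mean of the last
    `size` readings exceeds `threshold`."""

    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)  # keeps only the newest `size` readings
        self.total = 0.0                  # running sum of the readings in the window
        self.threshold = threshold

    def push(self, value):
        """Consume one reading and return True if the alert condition holds."""
        if len(self.window) == self.window.maxlen:
            # The oldest reading is about to be evicted; remove it from the sum.
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window) > self.threshold


# Illustrative (made-up) temperature stream.
query = SlidingWindowAlert(size=3, threshold=30.0)
alerts = [query.push(v) for v in [20.0, 25.0, 40.0, 45.0, 10.0]]
```

Each reading costs constant work regardless of how much data has already streamed past, which is the property that matters at high data rates.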

  7. Data Management Challenges
     • Need for real-time statistical modeling of data
     • Eliminate spatial/temporal biases, handle missing data through extrapolation (e.g. regression, interpolation models)
     • Filter measurement noise (e.g. Kalman Filters)
     • Infer hidden variables, pattern recognition (e.g. HMMs)
     • Fault or anomaly detection
     • Forecasting/prediction (e.g. ARIMA)
     [Figure labels: Temperature monitoring; GPS data; Regression/interpolation models; Kalman Filters; …]
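
As one concrete instance of the noise filtering mentioned on slide 7, here is a minimal sketch of a scalar Kalman filter in Python. It assumes a random-walk state model (the true value drifts slowly) and a single noisy sensor; the function name `kalman_1d`, the variance parameters, and the sample readings are illustrative assumptions, not taken from the talk.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Filter a sequence of noisy scalar readings.

    Assumes a random-walk model: the hidden value changes between steps
    with variance `process_var`, and each reading has noise variance
    `measurement_var`. Returns the filtered estimate after each reading.
    """
    x, p = x0, p0                 # current estimate and its variance
    estimates = []
    for z in measurements:
        p = p + process_var       # predict: uncertainty grows over time
        k = p / (p + measurement_var)  # Kalman gain, between 0 and 1
        x = x + k * (z - x)       # update: move toward the new reading
        p = (1.0 - k) * p         # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


# Made-up temperature readings with one spurious spike.
readings = [20.1, 19.8, 20.4, 35.0, 20.0, 19.9]
smoothed = kalman_1d(readings, process_var=0.01, measurement_var=1.0,
                     x0=readings[0])
```

Because the gain is always below 1, a single outlier such as the 35.0 reading pulls the estimate only part of the way toward it. The ratio of `process_var` to `measurement_var` controls how strongly new readings are trusted.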

  8. Data Management Challenges
     • The applications have strong acquisitional aspects
     • Data has to be actively acquired as needed
     • Typically high data acquisition costs (e.g. energy consumption in battery-powered devices)
     • Data provenance
     • Being able to trace something back to its origins
     • Data exploration and visualization
     • Data interoperability
     • Data security and privacy
     • …
     Balazinska et al; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.

  9. My Research Interests
     • Managing imprecise and incomplete data
     • Support statistical modeling and querying of sensor data in relational databases
     • Clean, declarative abstractions
     • Real-time processing of streaming data
     • Probabilistic databases
     • Store and query data annotated with probabilities
     • Energy-efficient algorithms for wireless sensornets
     • Data acquisition, target monitoring, data compression, …
     • In-network query processing

  10. MauveDB
      • Written using Apache Derby Java open source DBMS
      • Supports an abstraction called model-based views
      • Declarative specification of models to be applied
      • Can query the output of the models using SQL
      • Models kept updated as new data/measurements arrive
      A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006
      B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming data; ICDE 2008

  11. MauveDB
      A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006
      B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming data; ICDE 2008

  12. MauveDB
      • Written using Apache Derby Java open source DBMS
      • Supports an abstraction called model-based views
      • Declarative specification of models to be applied
      • Can query the output of the models using SQL
      • Models kept updated as new data/measurements arrive
      • Status:
        • Support for Regression- and Interpolation-based views
        • Currently building support for views based on Dynamic Bayesian networks (Kalman Filters, HMMs etc.)
      • Ongoing work:
        • Query processing and optimization, continuous queries
        • APIs for arbitrary models
        • …
      A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006
      B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming data; ICDE 2008
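
To make the notion of a regression-based, model-based view more tangible, here is a rough Python sketch. It is not MauveDB's syntax, API, or implementation (MauveDB exposes such views through SQL inside Apache Derby); the class `RegressionView` and its methods are hypothetical. The sketch only mirrors two properties listed on the slide: the model is kept updated as new measurements arrive, and queries are answered from the model output instead of from the raw readings.

```python
class RegressionView:
    """Hypothetical stand-in for a regression-based view: fits
    y = intercept + slope * x by least squares over all inserted rows."""

    def __init__(self):
        self.n = 0
        # Running sums are enough to refit the line after every insert.
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def insert(self, x, y):
        """Add one raw measurement (e.g. x = time, y = temperature)."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def query(self, x):
        """Return the modeled value at x, even where no reading exists."""
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two readings at distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept + slope * x


# Made-up raw readings that lie on y = 2x + 1.
view = RegressionView()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
    view.insert(x, y)
estimate = view.query(1.5)  # value at a point that was never measured
```

Keeping running sums means an insert costs constant work, so the view can stay current as a stream of readings arrives.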

  13. Probabilistic Databases
      • Motivation: Increasing amounts of uncertain data
      • From sensor networks
      • Imprecise data, data with confidence/accuracy bounds
      • Human-observed data
      • Statistical modeling/machine learning
      • Many models provide a distribution over a set of labels (e.g. HMMs)
      • Information extraction from text
      • Social networks
      • How to manage and query such data in relational databases?
      • Different types of uncertainties
      • Complex correlation patterns
      • Much work in database community over last few years
      P. Sen, A. Deshpande; Representing and Querying Correlated Tuples in Probabilistic Databases; ICDE 2007
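
A small Python sketch can show what it means to query data annotated with probabilities. It makes a strong simplifying assumption: every tuple exists independently of the others. The slide stresses complex correlation patterns, and the cited ICDE 2007 paper addresses correlated tuples, which this sketch does not handle. The table contents and the function name `prob_exists` are made up for illustration.

```python
from math import prod

# Hypothetical uncertain relation: (sensor, label, probability the tuple is true).
readings = [
    ("s1", "hot", 0.9),
    ("s2", "hot", 0.5),
    ("s3", "cold", 0.8),
]


def prob_exists(tuples, predicate):
    """Probability that at least one tuple satisfying `predicate` exists,
    ASSUMING tuples are mutually independent."""
    # P(at least one) = 1 - P(none) = 1 - product of (1 - p_i).
    return 1.0 - prod(1.0 - p for sensor, label, p in tuples
                      if predicate(sensor, label))


# "Is any sensor reporting hot?" has probability 1 - (0.1 * 0.5).
p_hot = prob_exists(readings, lambda sensor, label: label == "hot")
```

If the tuples were correlated (for example, two nearby sensors that tend to fail together), the product formula would give the wrong answer, which is why representing correlations explicitly matters.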

  14. Thanks!
      • Questions?
