
Improving Data Recovery From Embedded Networked Sensing Systems




Presentation Transcript


  1. Nithya Ramanathan Improving Data Recovery From Embedded Networked Sensing Systems

  2. Faults Lead to Low Quality Datasets
Faults that impact the end product:
• Faults that result in missing data points
• Faults that lead to incorrect or confusing data

Deployment                                 Data Lost to Network Faults   Data Lost to Sensing Faults
Great Duck Island (2004) [1]               42%                           3 - 60% / sensor
Redwoods (2005) [2]                        60%                           8 of 33 sensors
Volcan Reventador (2006) [3]               32%                           82% of total events
Cold Air Drainage (2006) [4] (humidity)    34%                           10%

High rates of faults lead to incomplete data sets that are difficult to interpret.

  3. Many sources of uncertainty complicate fault detection and diagnosis: harsh environments, no fault-free period, unexplored environments, new sensing hardware, minimal visibility, and short-lived deployments.

  4. Problem Statement: Detect and diagnose faults in the face of uncertainty

  5. Scientist’s Solution to Uncertainty: manual validation using a physical sample or a high-fidelity sensor
• Context matters
• Some “faults” are real
• Not easy

  6. The manual approach falls short when managing a large network. [Figure: chemical data from Bangladesh ’06: ammonium, calcium, carbonate, chloride, nitrate, pH]

  7. Problem Statement: Detect and diagnose faults in the face of uncertainty Solution: Systems that efficiently focus a user’s attention by suggesting actions users can take in the field to fix and validate potential faults

  8. User-Centric System: Two Design Goals: limit user burden and maintain transparency
• Mechanisms that are visible to the user and easy to understand
• A small number of metrics to simplify system reasoning
(Image: aemc.jpl.nasa.gov/activities/bio_regent.cfm)

  9. Two User-centric Systems
Sympathy detects and diagnoses network faults
• Collects metrics from each node in the network
• Uses a decision tree that is based on our analysis of faults
Confidence detects and diagnoses sensor and network faults
• Transparent feature space where similar sensor and network data group together
• This space reduces fault detection and diagnosis to simple mechanisms that are easily modifiable in the field
[Figure: sensors transmit data into the feature space; regions map to actions such as “Replace Sensor”]

  10. Contributions • Sympathy: Includes 3 system health metrics, a decision tree to diagnose faults, and a localization algorithm to reduce fault notifications • Confidence: Includes 3 sensor data features and a feature space where similar sensor and network data group together • Confidence: Dynamic algorithms which classify faults and can be updated with user feedback on-line • Implementation and evaluation of Sympathy and Confidence in real-world deployments

  11. Existing Techniques are Insufficient
• Decision trees, rules, thresholds, behavioural models
• Machine learning
These tend to run autonomously and often require:
• An initial fault-free period to train an ML algorithm / model
• A priori knowledge of the environment
• That faults be rare, and that all rare events are faults
• That faults apply either to sensing or to networking hardware
Existing systems do not deal well with in-field uncertainty.

  12. Sympathy | Confidence

  13. Sympathy: Fixing Network Faults
Insight: the amount of data collected at the sink is related to the existence of failures in the system, so system design is simplified to tracking data flow.
• Has a failure happened? (Is data lost? Is it important?)
• Where was data lost?
• What failure happened? (Why was data lost?)
[Figure: decision tree over metrics such as Rx Pkts, Tx Pkts, Have Neighbors, and Have Route, leading to diagnoses such as Node Crash, Node Reboot, No Neighbors, No Route, Bad Node Inbound Path, Bad Node Outbound Path, and Bad Sink Inbound Path]
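
One way to read the diagnosis tree is as a cascade of checks over the tracked metrics, from node-local causes out to path causes. A minimal Python sketch, using hypothetical metric names (the deployed tree orders these checks based on our fault analysis):

    def diagnose(m):
        # Walk the checks from node-local causes to path causes.
        if m["node_crashed"]:
            return "Node Crash"
        if m["node_rebooted"]:
            return "Node Reboot"
        if not m["has_neighbors"]:
            return "No Neighbors"
        if not m["has_route"]:
            return "No Route"
        if m["rx_pkts"] == 0:
            return "Bad Node Inbound Path"   # node hears nothing
        if m["tx_pkts"] == 0:
            return "Bad Node Outbound Path"  # node sends nothing
        return "Bad Sink Inbound Path"       # data lost near the sink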

  14. Localization Reduces Failure Notifications
For most tests, localization reduces primary notifications by at least 50%.
[Figure: bar chart comparing # primary failures to # total failures]

  15. Notification Latency Not Impacted by Various Scenarios
• Various traffic scenarios (routing traffic only; application traffic every 30 and every 10 seconds)
• Multiple failures
• Different stacks (Surge + MintRoute)
[Figure: empirical CDF of failure detection latency (secs)]

  16. Sympathy has served as inspiration for…
The design of debugging systems:
• K. K. Chang, N. Ramanathan, D. Estrin, and J. Palsberg. “D.A.S.: Deployment Analysis System.” In Proc. of SenSys, ACM, 2005.
• R. Kumar and Z. Koradia. “Porting Sympathy to the Bridge Monitoring Application.” Project report, 2006.
• S. Gupta. “Detecting routing threats in a wireless sensor network using Sympathy.” Personal communication, 2006.
• A. Sheth, C. Doerr, D. Grunwald, R. Han, and D. Sicker. “MOJO: a distributed physical layer anomaly detection system for 802.11 WLANs.” In Proc. of MobiSys, 2006.
• V. Jonnakuti. “Fault Diagnosis in Wireless Sensor Networks.” Project report, 2007.
A new sensing architecture:
• J. I. Choi, J. W. Lee, M. Wachs, and P. Levis. “Opening the Sensornet Black Box.” In Proc. of the Intl. Workshop on Wireless Sensornet Architecture (WWSNA), 2007.
A protocol evaluation metric:
• M. Wachs, J. I. Choi, J. W. Lee, K. Srinivasan, Z. Chen, M. Jain, and P. Levis. “Visibility: A New Metric for Protocol Design.” In Proc. of SenSys, 2007.

  17. Sympathy | Confidence

  18. Sympathy’s Approach Was Not Sufficient
We attempted to design Sympathy-like rules to detect and diagnose data faults, but the static approach did not adapt to new environments. We built on Sympathy’s successes to design a new system to manage sensor and network health.
(Image: www.radioblvd.com/WirelessPhoto.htm)

  19. Confidence: A Dynamic System to Adapt to New Environments
A simple system that works with minimal user input. Two insights:
• It is possible to define a feature space that tends to group nodes and sensors with similar fault states
• Regularly spaced regions sufficiently identify groupings
[Figure: field workflow: sampled environment, soil H2O data OK; replace sensor; physical observation, PAR data OK]

  20. User Actions to Find and Fix Potential Sensor and Network Faults
• Outlier detection and classification
• Data visualization
• Incorporates user feedback in real time
[Figure: feature space with Gradient and Distance LDR axes from (0,0); regions labeled “Take Physical Sample” and “Replace Sensor”]

  21. Feature Selection
Began with features commonly used by scientists. Guiding principles to select features:
• Quality indicator
• Numerically quantifiable
• Independent
• Verifiable
These principles lead to a transparent system.
GRADIENT = (y2 − y1) / (t2 − t1), the slope between consecutive samples (t1, y1) and (t2, y2)
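
The GRADIENT feature is simply the rate of change between two consecutive samples; a one-line sketch in Python:

    def gradient(t1, y1, t2, y2):
        # Rate of change of the sensor value between consecutive samples.
        return (y2 - y1) / (t2 - t1)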

  22. Features Selected
System health features: NODE DEADNESS, APPLICATION DEADNESS, CONGESTION
Environmental data features: GRADIENT, DISTANCE_LDR, DISTANCE_NLDR
DISTANCE_LDR = max(y − LDR_upper, LDR_lower − y, 0)

  23. Features Selected (continued)
DISTANCE_NLDR = max(y − NLDR_upper, NLDR_lower − y, 0)
Separate feature spaces are used for system and sensor faults.
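
Both distance features share the same form: how far a reading y falls outside an expected range, and zero when it lies inside. A sketch covering DISTANCE_LDR and DISTANCE_NLDR, assuming the range bounds are passed in:

    def distance(y, lower, upper):
        # Positive when y lies outside [lower, upper]; 0 when inside.
        # Used for DISTANCE_LDR with the likely-data-range bounds and
        # for DISTANCE_NLDR with the not-likely-data-range bounds.
        return max(y - upper, lower - y, 0)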

  24. Feature Scaling
Scale features onto an axis from 0 (not faulty) to 10 (faulty).
• No need to precisely define good or bad
• Many scaling functions will work, e.g.: float confidence_scale(float x, int S): return max(log2(x / S), 0)
• The same scaling constant is sufficient for most sensors; validated for over 20 types of sensors
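
A runnable Python version of the scaling function above (the guard against non-positive inputs is an added assumption, since log2 is undefined at zero):

    import math

    def confidence_scale(x, s):
        # Map a raw feature value onto the 0 (not faulty) .. 10 (faulty)
        # axis; values at or below the scaling constant s score 0.
        if x <= 0:
            return 0.0
        return max(math.log2(x / s), 0.0)

    # e.g. a DISTANCE_LDR of 8 with scaling constant S = 2 scores log2(4) = 2
    print(confidence_scale(8, 2))  # -> 2.0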

  25. Fault Detection: Dynamic Distance Thresholds
Dynamically calculate a threshold using simple outlier detection: the distance threshold d_N is set at the mean plus two standard deviations (mu_N + 2 x sigma_N) of recent distance values.
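
A minimal sketch of this style of outlier detection, assuming the threshold is the mean of a recent window of distance values plus two standard deviations:

    import statistics

    def dynamic_threshold(recent_distances):
        # Points whose distance exceeds mu + 2*sigma of the recent
        # window are flagged as potential faults.
        mu = statistics.mean(recent_distances)
        sigma = statistics.stdev(recent_distances)
        return mu + 2 * sigma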

  26. Bootstrapping Distribution Parameters (D_i)
• System operation is robust to many values of D_i, as long as more than 50% of faulty data lies beyond D_i

  27. Fault Diagnosis: User-Driven Segmentation
Actions increase the quantity and usability of data.
Sensor actions: • Check sensor / connection • Recalibrate sensor • Take physical sample
Network actions: • Check node • Check congestion • Check sensor connection
[Figure: feature space (Gradient vs. Distance LDR, 0 to 10) segmented into regions labeled “Replace Sensor” and “Take Physical Sample”]
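
To make the segmentation concrete, here is a sketch in which the feature space is divided into a regular grid and each cell carries a suggested action; the cell size and labels here are hypothetical:

    REGION_SIZE = 2  # width of each grid cell on the 0..10 confidence axes

    # Suggested action per (gradient cell, distance cell); cells not
    # listed fall back to a default check.
    actions = {
        (0, 0): "Not Faulty",
        (4, 0): "Take Physical Sample",
        (0, 4): "Replace Sensor",
    }

    def suggest_action(gradient, distance_ldr):
        cell = (int(gradient // REGION_SIZE), int(distance_ldr // REGION_SIZE))
        return actions.get(cell, "Check sensor / connection")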

  28. Incorporating User Interaction
Outcome-based feedback is easier to use and understand.
[Figure: feature space (Gradient vs. Distance LDR, 0 to 10) with regions “Replace Sensor” and “Take Physical Sample”, and a point relabeled “PAR Not Faulty”]
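
Outcome-based feedback then reduces to relabeling the region a point fell into, continuing the hypothetical grid from the previous sketch:

    def incorporate_feedback(gradient, distance_ldr, outcome):
        # After the user reports an outcome in the field (e.g. "PAR Not
        # Faulty"), relabel that region so similar points are classified
        # the same way from then on.
        cell = (int(gradient // REGION_SIZE), int(distance_ldr // REGION_SIZE))
        actions[cell] = outcome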

  29. Performance Hypothesis • The system correctly detects and diagnoses at least 90% of all data in a wide range of deployment scenarios • Incorporating outcome-based feedback leads to improved system accuracy • Confidence performs better than common thresholding techniques, with less burden on the user.

  30. Methodology
Primary performance metric: the fraction of non-faulty and faulty data that is correctly detected and diagnosed.
Evaluate adaptability in multiple contexts:
• Faults injected in real sensors and in simulation
• Real-world deployments, simulation, replayed datasets
• Wide range of parameter settings
Ground truth for detection and diagnosis of faults:
• Simulation vs. real life
• Exploratory deployments: manual analysis by domain experts, physical sample analysis, post-deployment analysis of sensors

  31. Part I: Detection / Diagnosis Accuracy: Sensor Faults

                                        Bangladesh            JR
Number of sensors                       33                    130
Faulty / Diagnosed / Total # points     8000 / 4000 / 15000   3800 / - / 35400
Non-faulty detected                     98%                   90%
Faulty detected                         94%                   83% / 99.9%
Faulty diagnosed                        92%                   NA

Base case: meets the 90% performance constraints.

  32. Part I: Detection / Diagnosis accuracy: Sensor Faults Wide range of scaling factors (S): Meets 90% performance constraints

  33. Part I: Detection / Diagnosis Accuracy
Thresholds converge to the correct value: meets the 90% performance constraints as long as at least 50% of faulty data lies beyond D_i.

  34. Part I: Detection / Diagnosis Accuracy: Network Faults
Both systems eventually detect all faults. Confidence: 0.16 false notifications / test; Sympathy: 1.75 false notifications / test.

  35. Part II: Interaction Improves Accuracy
Quantify performance when the system incorporates user feedback in the field at JR: correctly detects 90% initially; H2O: correctly detects 96%; PAR 1: correctly detects 97%; PAR 2: correctly detects 98%.
[Figure: feature space (Gradient vs. Distance LDR) annotated with field observations: sampled environment, soil H2O data OK; physical observation, PAR data OK]

  36. Part III: Comparison to Thresholding
Outperforms static and dynamic threshold techniques with little burden on the user.
• Static thresholds are user-assigned
• Dynamic thresholds discard data outside mu + N*sigma

  37. Deployment Experience: SJR
• 14 ammonium and nitrate ISEs / 7 temperature sensors
• Data validation: a redundant set of sensors (difficult!) and physical samples
• Correctly reports 3 ISE faults and 1 temperature fault, and correctly diagnoses 3 of the 4 faults

  38. Deployment Experience: AMARSS
• 130 soil sensors
• Fault injection: soil moisture sensors plunged into a cold bag of water; humidity and temperature sensors moved to an enclosed pitcher of hot water
• Results: correctly detects all faults and suggests appropriate actions within 5 minutes of injection; decision support example

  39. Compare Confidence and Sympathy
By incorporating feedback from the user, Confidence is able to adapt to new or uncharacterized environments quickly.
• Managing the tradeoff between user burden and system accuracy: manually modifying static thresholds (Sympathy) vs. outcome-based feedback (Confidence)
• Small number of metrics: both are transparent; combined feature space vs. decision tree
• Transparency: both are easy to understand
• Sympathy provides no decision support

  40. Related Work: Thresholds, Rules, and Decision Trees
• Static thresholds applied to:
• Data: Szewczyk et al., Tolle et al.
• Data features: Krajewski et al.
• (IP) network features: MOJO, Tulip
• Dynamic thresholds:
• (ENS) network features: Memento

  41. Related Work: Machine Learning
Training data can be obtained…
• During an initial fault-free period (Kiciman et al., Fox et al.): cannot expect a fault-free period in ENS deployments
• On-line (Larkey et al., Nath et al.): assumes faults are rare and that statistical spatial relationships exist between communication neighbors
• From historical / pre-labeled datasets (Magpie, Eskin et al., Hines et al., neural networks): requires access to a dataset from the environment, and sufficient knowledge to label this dataset
• On-line supervised learning (Bohus et al.): the system is able to operate autonomously because “success” can be determined automatically, with no need for human feedback

  42. Related Work: User-Driven Systems
• Decision support: emerged where automated diagnosis is difficult; AMDs: Miller et al., Marckmann et al.
• User-driven segmentation: Ramel et al.

  43. Conclusion • Problem: Fault detection and diagnosis in the presence of in-field uncertainty • Solution: User-centric systems • Thesis Contributions • Sympathy: 3 system health metrics, a decision tree to diagnose faults, and a localization algorithm to reduce fault notifications • Confidence: 3 sensor data features and a feature space where similar sensor and network data group together • Confidence: Dynamic algorithms which classify faults and can be updated with user feedback on-line • Implementation and evaluation of Sympathy and Confidence in real-world deployments

  44. Collaborators
Bangladesh Deployment
• UCLA (Los Angeles, California): Jenny Jay, Christine Lee, Tiffany Lin, Nithya Ramanathan, Sarah Rothenberg
• UC Merced (Merced, California): Tom Harmon
• BUET (Dhaka, Bangladesh): Borhan Badruzzaman, Sajib Sha alom
• MIT (Cambridge, Massachusetts): Ashfaque Kandakher, Charlie Harvey, Rebecca Neumann
Other Work
• Jeff Burke, Deborah Estrin, Eric Graham, Jeff Goldman, Mike Hamilton, Mark Hansen, Tom Harmon, Jenny Jay, Eddie Kohler, Mani Srivastava, Mike Taggart, Lixia Zhang
• Laura Balzano, Kevin Chang, Lew Girod, John Hicks, Martin Lukac, Tom Schoelhammer, The Data Integrity Group, Dgroup

  45. Future Work
• Project Surya will replace traditional cooking methods with inexpensive solar and other energy-efficient cookers, and document their role in reducing emissions of CO2 and soot
• Deploy air filters in each household to document indoor reductions
• Use cell phones for automated data collection and analysis, which will improve data integrity and lead to other research challenges

  46. [Figure: Confidence system architecture: sensors feed a base station; feature selection & scaling produces F = <f1, f2, f3>; fault detection and fault diagnosis operate on the feature space (Gradient vs. Distance LDR) with regions such as “Not Faulty”, “Take Physical Sample”, and “Replace Sensor”; an interactivity triangle feeds user feedback back into the system]

  47. Rejected Graphs

  48. Part I.1: Design Decisions Feature Selection

  49. Part I.1: Design Decisions Online Clustering Number Regions

  50. SYMPATHY BACKUP
