
Improving Data Recovery From Embedded Networked Sensing Systems




Presentation Transcript


  1. Nithya Ramanathan Improving Data Recovery From Embedded Networked Sensing Systems

  2. Faults Lead to Low Quality Datasets
Faults that impact the end product:
• Faults that result in missing data points
• Faults that lead to incorrect or confusing data

Deployment                                 Data Lost to Network Faults   Data Lost to Sensing Faults
Great Duck Island (2004) [1]               42%                           3 - 60% / sensor
Redwoods (2005) [2]                        60%                           8 of 33 sensors
Volcan Reventador (2006) [3]               32%                           82% of total events
Cold Air Drainage (2006) [4] (humidity)    34%                           10%

High rates of faults lead to incomplete data sets that are difficult to interpret.

  3. Many sources of uncertainty complicate fault detection and diagnosis: harsh environments, no fault-free period, unexplored environments, new sensing hardware, minimal visibility, and short-lived deployments.

  4. Problem Statement: Detect and diagnose faults in the face of uncertainty

  5. Scientist’s Solution to Uncertainty: manual validation using a physical sample or a high-fidelity sensor
• Context matters
• Some “faults” are real
• Not easy

  6. The manual approach falls short when managing a large network. [Figure: chemical data from Bangladesh ’06: ammonium, calcium, carbonate, chloride, nitrate, pH]

  7. Problem Statement: Detect and diagnose faults in the face of uncertainty Solution: Systems that efficiently focus a user’s attention by suggesting actions users can take in the field to fix and validate potential faults

  8. User-Centric System: Two Design Goals: limit user burden and maintain transparency
• Mechanisms that are visible to the user and easy to understand
• A small number of metrics to simplify system reasoning
(Image: aemc.jpl.nasa.gov/activities/bio_regent.cfm)

  9. Two User-centric Systems
Sympathy detects and diagnoses network faults
• Collects metrics from each node in the network
• Uses a decision tree that is based on our analysis of faults
Confidence detects and diagnoses sensor and network faults
• Transparent feature space where similar sensor and network data group together
• This space reduces fault detection and diagnosis to simple mechanisms that are easily modifiable in the field
[Figure: sensors transmit data into the feature space; regions map to actions such as “Replace Sensor”]

  10. Contributions • Sympathy: Includes 3 system health metrics, a decision tree to diagnose faults, and a localization algorithm to reduce fault notifications • Confidence: Includes 3 sensor data features and a feature space where similar sensor and network data group together • Confidence: Dynamic algorithms which classify faults and can be updated with user feedback on-line • Implementation and evaluation of Sympathy and Confidence in real-world deployments

  11. Existing Techniques are Insufficient
• Decision trees, rules, thresholds, behavioural models
• Machine learning
These tend to run autonomously and often require:
• An initial fault-free period to train an ML algorithm / model
• A priori knowledge of the environment
• That faults be rare, and that all rare events are faults
• That faults apply either to sensing or to networking hardware
Existing systems do not deal well with in-field uncertainty.

  12. Sympathy | Confidence

  13. Sympathy: Fixing Network Faults
Insight: the amount of data collected at the sink is related to the existence of failures in the system, so system design is simplified to tracking data flow.
• Has a failure happened? (Is data lost? Is it important?)
• Where was data lost?
• What failure happened? (Why was data lost?)
[Figure: decision tree over metrics such as Rx Pkts, Tx Pkts, Have Neighbors, and Have Route, leading to diagnoses such as Node Crash, Node Reboot, No Neighbors, No Route, Bad Node Inbound Path, Bad Node Outbound Path, and Bad Sink Inbound Path]
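
One way to read the diagnosis tree is as a cascade of checks over the tracked metrics, from node-local causes out to path causes. A minimal Python sketch, using hypothetical metric names (the deployed tree orders these checks based on our fault analysis):

    def diagnose(m):
        # Walk the checks from node-local causes to path causes.
        if m["node_crashed"]:
            return "Node Crash"
        if m["node_rebooted"]:
            return "Node Reboot"
        if not m["has_neighbors"]:
            return "No Neighbors"
        if not m["has_route"]:
            return "No Route"
        if m["rx_pkts"] == 0:
            return "Bad Node Inbound Path"   # node hears nothing
        if m["tx_pkts"] == 0:
            return "Bad Node Outbound Path"  # node sends nothing
        return "Bad Sink Inbound Path"       # data lost near the sink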

  14. Localization Reduces Failure Notifications
For most tests, localization reduces primary notifications by at least 50%.
[Figure: bar chart comparing # primary failures to # total failures]

  15. Notification Latency Not Impacted by Various Scenarios
• Various traffic scenarios (routing traffic only; application traffic every 30 and every 10 seconds)
• Multiple failures
• Different stacks (Surge + MintRoute)
[Figure: empirical CDF of failure detection latency (secs)]

  16. Sympathy has served as inspiration for…
The design of debugging systems:
• K. K. Chang, N. Ramanathan, D. Estrin, and J. Palsberg. “D.A.S.: Deployment Analysis System.” In Proc. of SenSys, ACM, 2005.
• R. Kumar and Z. Koradia. “Porting Sympathy to the Bridge Monitoring Application.” Project report, 2006.
• S. Gupta. “Detecting routing threats in a wireless sensor network using Sympathy.” Personal communication, 2006.
• A. Sheth, C. Doerr, D. Grunwald, R. Han, and D. Sicker. “MOJO: a distributed physical layer anomaly detection system for 802.11 WLANs.” In Proc. of MobiSys, 2006.
• V. Jonnakuti. “Fault Diagnosis in Wireless Sensor Networks.” Project report, 2007.
A new sensing architecture:
• J. I. Choi, J. W. Lee, M. Wachs, and P. Levis. “Opening the Sensornet Black Box.” In Proc. of the Intl. Workshop on Wireless Sensornet Architecture (WWSNA), 2007.
A protocol evaluation metric:
• M. Wachs, J. I. Choi, J. W. Lee, K. Srinivasan, Z. Chen, M. Jain, and P. Levis. “Visibility: A New Metric for Protocol Design.” In Proc. of SenSys, 2007.

  17. Sympathy | Confidence

  18. Sympathy’s Approach Was Not Sufficient
We attempted to design Sympathy-like rules to detect and diagnose data faults, but the static approach did not adapt to new environments. We built on Sympathy’s successes to design a new system to manage sensor and network health.
(Image: www.radioblvd.com/WirelessPhoto.htm)

  19. Confidence: A Dynamic System to Adapt to New Environments
A simple system that works with minimal user input. Two insights:
• It is possible to define a feature space that tends to group nodes and sensors with similar fault states
• Regularly spaced regions sufficiently identify groupings
[Figure: field workflow: sampled environment, soil H2O data OK; replace sensor; physical observation, PAR data OK]

  20. User Actions to Find and Fix Potential Sensor and Network Faults
• Outlier detection and classification
• Data visualization
• Incorporates user feedback in real time
[Figure: feature space with Gradient and Distance LDR axes from (0,0); regions labeled “Take Physical Sample” and “Replace Sensor”]

  21. Feature Selection
Began with features commonly used by scientists. Guiding principles to select features:
• Quality indicator
• Numerically quantifiable
• Independent
• Verifiable
These principles lead to a transparent system.
GRADIENT = (y2 − y1) / (t2 − t1), the slope between consecutive samples (t1, y1) and (t2, y2)
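
The GRADIENT feature is simply the rate of change between two consecutive samples; a one-line sketch in Python:

    def gradient(t1, y1, t2, y2):
        # Rate of change of the sensor value between consecutive samples.
        return (y2 - y1) / (t2 - t1)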

  22. Features Selected
System health features: NODE DEADNESS, APPLICATION DEADNESS, CONGESTION
Environmental data features: GRADIENT, DISTANCE_LDR, DISTANCE_NLDR
DISTANCE_LDR = max(y − LDR_upper, LDR_lower − y, 0)

  23. Features Selected (continued)
DISTANCE_NLDR = max(y − NLDR_upper, NLDR_lower − y, 0)
Separate feature spaces are used for system and sensor faults.
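
Both distance features share the same form: how far a reading y falls outside an expected range, and zero when it lies inside. A sketch covering DISTANCE_LDR and DISTANCE_NLDR, assuming the range bounds are passed in:

    def distance(y, lower, upper):
        # Positive when y lies outside [lower, upper]; 0 when inside.
        # Used for DISTANCE_LDR with the likely-data-range bounds and
        # for DISTANCE_NLDR with the not-likely-data-range bounds.
        return max(y - upper, lower - y, 0)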

  24. Feature Scaling
Scale features onto an axis from 0 (not faulty) to 10 (faulty).
• No need to precisely define good or bad
• Many scaling functions will work, e.g.: float confidence_scale(float x, int S): return max(log2(x / S), 0)
• The same scaling constant is sufficient for most sensors; validated for over 20 types of sensors
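
A runnable Python version of the scaling function above (the guard against non-positive inputs is an added assumption, since log2 is undefined at zero):

    import math

    def confidence_scale(x, s):
        # Map a raw feature value onto the 0 (not faulty) .. 10 (faulty)
        # axis; values at or below the scaling constant s score 0.
        if x <= 0:
            return 0.0
        return max(math.log2(x / s), 0.0)

    # e.g. a DISTANCE_LDR of 8 with scaling constant S = 2 scores log2(4) = 2
    print(confidence_scale(8, 2))  # -> 2.0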

  25. Fault Detection: Dynamic Distance Thresholds
Dynamically calculate a threshold using simple outlier detection: the distance threshold d_N is set at the mean plus two standard deviations (mu_N + 2 x sigma_N) of recent distance values.
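
A minimal sketch of this style of outlier detection, assuming the threshold is the mean of a recent window of distance values plus two standard deviations:

    import statistics

    def dynamic_threshold(recent_distances):
        # Points whose distance exceeds mu + 2*sigma of the recent
        # window are flagged as potential faults.
        mu = statistics.mean(recent_distances)
        sigma = statistics.stdev(recent_distances)
        return mu + 2 * sigma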

  26. Bootstrapping Distribution Parameters (D_i)
• System operation is robust to many values of D_i, as long as more than 50% of faulty data lies beyond D_i

  27. Fault Diagnosis: User-Driven Segmentation
Actions increase the quantity and usability of data.
Sensor actions: • Check sensor / connection • Recalibrate sensor • Take physical sample
Network actions: • Check node • Check congestion • Check sensor connection
[Figure: feature space (Gradient vs. Distance LDR, 0 to 10) segmented into regions labeled “Replace Sensor” and “Take Physical Sample”]
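
To make the segmentation concrete, here is a sketch in which the feature space is divided into a regular grid and each cell carries a suggested action; the cell size and labels here are hypothetical:

    REGION_SIZE = 2  # width of each grid cell on the 0..10 confidence axes

    # Suggested action per (gradient cell, distance cell); cells not
    # listed fall back to a default check.
    actions = {
        (0, 0): "Not Faulty",
        (4, 0): "Take Physical Sample",
        (0, 4): "Replace Sensor",
    }

    def suggest_action(gradient, distance_ldr):
        cell = (int(gradient // REGION_SIZE), int(distance_ldr // REGION_SIZE))
        return actions.get(cell, "Check sensor / connection")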

  28. Incorporating User Interaction
Outcome-based feedback is easier to use and understand.
[Figure: feature space (Gradient vs. Distance LDR, 0 to 10) with regions “Replace Sensor” and “Take Physical Sample”, and a point relabeled “PAR Not Faulty”]
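
Outcome-based feedback then reduces to relabeling the region a point fell into, continuing the hypothetical grid from the previous sketch:

    def incorporate_feedback(gradient, distance_ldr, outcome):
        # After the user reports an outcome in the field (e.g. "PAR Not
        # Faulty"), relabel that region so similar points are classified
        # the same way from then on.
        cell = (int(gradient // REGION_SIZE), int(distance_ldr // REGION_SIZE))
        actions[cell] = outcome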

  29. Performance Hypothesis • The system correctly detects and diagnoses at least 90% of all data in a wide range of deployment scenarios • Incorporating outcome-based feedback leads to improved system accuracy • Confidence performs better than common thresholding techniques, with less burden on the user.

  30. Methodology
Primary performance metric: the fraction of non-faulty and faulty data that is correctly detected and diagnosed.
Evaluate adaptability in multiple contexts:
• Faults injected in real sensors and in simulation
• Real-world deployments, simulation, replayed datasets
• Wide range of parameter settings
Ground truth for detection and diagnosis of faults:
• Simulation vs. real life
• Exploratory deployments: manual analysis by domain experts, physical sample analysis, post-deployment analysis of sensors

  31. Part I: Detection / Diagnosis Accuracy: Sensor Faults

                                        Bangladesh            JR
Number of sensors                       33                    130
Faulty / Diagnosed / Total # points     8000 / 4000 / 15000   3800 / - / 35400
Non-faulty detected                     98%                   90%
Faulty detected                         94%                   83% / 99.9%
Faulty diagnosed                        92%                   NA

Base case: meets the 90% performance constraints.

  32. Part I: Detection / Diagnosis accuracy: Sensor Faults Wide range of scaling factors (S): Meets 90% performance constraints

  33. Part I: Detection / Diagnosis Accuracy
Thresholds converge to the correct value: meets the 90% performance constraints as long as at least 50% of faulty data lies beyond D_i.

  34. Part I: Detection / Diagnosis Accuracy: Network Faults
Both systems eventually detect all faults. Confidence: 0.16 false notifications / test; Sympathy: 1.75 false notifications / test.

  35. Part II: Interaction Improves Accuracy
Quantify performance when the system incorporates user feedback in the field at JR: correctly detects 90% initially; H2O: correctly detects 96%; PAR 1: correctly detects 97%; PAR 2: correctly detects 98%.
[Figure: feature space (Gradient vs. Distance LDR) annotated with field observations: sampled environment, soil H2O data OK; physical observation, PAR data OK]

  36. Part III: Comparison to Thresholding
Outperforms static and dynamic threshold techniques with little burden on the user.
• Static thresholds are user-assigned
• Dynamic thresholds discard data outside mu + N*sigma

  37. Deployment Experience: SJR
• 14 ammonium and nitrate ISEs / 7 temperature sensors
• Data validation: a redundant set of sensors (difficult!) and physical samples
• Correctly reports 3 ISE faults and 1 temperature fault, and correctly diagnoses 3 of the 4 faults

  38. Deployment Experience: AMARSS
• 130 soil sensors
• Fault injection: soil moisture sensors plunged into a cold bag of water; humidity and temperature sensors moved to an enclosed pitcher of hot water
• Results: correctly detects all faults and suggests appropriate actions within 5 minutes of injection; decision support example

  39. Compare Confidence and Sympathy
By incorporating feedback from the user, Confidence is able to adapt to new or uncharacterized environments quickly.
• Managing the tradeoff between user burden and system accuracy: manually modifying static thresholds (Sympathy) vs. outcome-based feedback (Confidence)
• Small number of metrics: both are transparent; combined feature space vs. decision tree
• Transparency: both are easy to understand
• Sympathy provides no decision support

  40. Related Work: Thresholds, Rules, and Decision Trees
• Static thresholds applied to:
• Data: Szewczyk et al., Tolle et al.
• Data features: Krajewski et al.
• (IP) network features: MOJO, Tulip
• Dynamic thresholds:
• (ENS) network features: Memento

  41. Related Work: Machine Learning
Training data can be obtained…
• During an initial fault-free period (Kiciman et al., Fox et al.): cannot expect a fault-free period in ENS deployments
• On-line (Larkey et al., Nath et al.): assumes faults are rare and that statistical spatial relationships exist between communication neighbors
• From historical / pre-labeled datasets (Magpie, Eskin et al., Hines et al., neural networks): requires access to a dataset from the environment, and sufficient knowledge to label this dataset
• On-line supervised learning (Bohus et al.): the system is able to operate autonomously because “success” can be determined automatically, with no need for human feedback

  42. Related Work: User-Driven Systems
• Decision support: emerged where automated diagnosis is difficult; AMDs: Miller et al., Marckmann et al.
• User-driven segmentation: Ramel et al.

  43. Conclusion • Problem: Fault detection and diagnosis in the presence of in-field uncertainty • Solution: User-centric systems • Thesis Contributions • Sympathy: 3 system health metrics, a decision tree to diagnose faults, and a localization algorithm to reduce fault notifications • Confidence: 3 sensor data features and a feature space where similar sensor and network data group together • Confidence: Dynamic algorithms which classify faults and can be updated with user feedback on-line • Implementation and evaluation of Sympathy and Confidence in real-world deployments

  44. Collaborators
Bangladesh Deployment
• UCLA (Los Angeles, California): Jenny Jay, Christine Lee, Tiffany Lin, Nithya Ramanathan, Sarah Rothenberg
• UC Merced (Merced, California): Tom Harmon
• BUET (Dhaka, Bangladesh): Borhan Badruzzaman, Sajib Sha alom
• MIT (Cambridge, Massachusetts): Ashfaque Kandakher, Charlie Harvey, Rebecca Neumann
Other Work
• Jeff Burke, Deborah Estrin, Eric Graham, Jeff Goldman, Mike Hamilton, Mark Hansen, Tom Harmon, Jenny Jay, Eddie Kohler, Mani Srivastava, Mike Taggart, Lixia Zhang
• Laura Balzano, Kevin Chang, Lew Girod, John Hicks, Martin Lukac, Tom Schoelhammer, The Data Integrity Group, Dgroup

  45. Future Work
• Project Surya will replace traditional cooking methods with inexpensive solar and other energy-efficient cookers, and document their role in reducing emissions of CO2 and soot
• Deploy air filters in each household to document indoor reductions
• Use cell phones for automated data collection and analysis, which will improve data integrity and lead to other research challenges

  46. [Figure: Confidence system architecture: sensors feed a base station; feature selection & scaling produces F = <f1, f2, f3>; fault detection and fault diagnosis operate on the feature space (Gradient vs. Distance LDR) with regions such as “Not Faulty”, “Take Physical Sample”, and “Replace Sensor”; an interactivity triangle feeds user feedback back into the system]

  47. Rejected Graphs

  48. Part I.1: Design Decisions Feature Selection

  49. Part I.1: Design Decisions Online Clustering Number Regions

  50. SYMPATHY BACKUP
