Ahmed Helmy

Understanding and Utilizing Multi-Dimensional Correlations in Sensor Networks: A Protocol Design Perspective Ahmed Helmy Department of Electrical Engineering USC Viterbi School of Engineering University of Southern California helmy@usc.edu Web: ceng.usc.edu/~helmy, Lab: nile.usc.edu

Outline • Classifying Correlations • How to Utilize Correlations? • Insights for Protocol Design • Gradient-based Routing (RUGGED) • Active Query Routing (ACQUIRE) • Abnormality Detection and Filtering Inserted Data • WLANs as Sensor Networks (IMPACT) • Sensing access and usage patterns • Analyzing correlations in wireless users behavior • Issues

Correlation Classification • Dimensions of Correlation: • Spatial • Between neighboring nodes • Temporal • Across time (different samples) for the same node • Spatio-temporal • Moving target (e.g., vehicle), moving phenomenon (e.g., fire) • What is correlated? • Sensor readings (e.g., temperature, light, gradients) • Communication channel (e.g., loss, fading) • Localization information, …

How Can We Utilize Correlations? • In-network processing • Aggregation • Abstraction/ adaptive fidelity/ zoom-in • Prediction (model-based), enables Caching • Routing (gradients in time and space, etc.) • Abnormality detection (attacks, failures, mis-calibration) • Equivalence • Sampling smaller set of nodes (sleep/wake-up) • Topology control

RUGGED: RoUting on finGerprint Gradients in sEnsor Networks Jabed Faruque, Ahmed Helmy Department of Electrical Engineering University of Southern California faruque@usc.edu, helmy@usc.edu URL: http://nile.usc.edu, http://ceng.usc.edu/~helmy - Faruque, Psounis, Helmy, IEEE/ACM DCOSS 2005. - Faruque, Helmy, IEEE ICPS 2004.

Introduction • Sensor networks are envisioned to be widely used for habitat and environmental monitoring, among others • Every physical event produces a fingerprint in the environment • Diffusion laws are an inherent property of many physical phenomena • f(d) ∝ 1/d^α, where • d = distance from the source, α = diffusion parameter, which depends on the type of effect (e.g., for temperature α = 1, for light α = 2)
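As a rough illustration of the diffusion law on this slide (the reading falls off as one over the distance raised to the diffusion parameter), the sketch below computes an idealized information level. The function name and the `source_strength` scaling constant are assumptions made for the example, not part of the slides.

```python
def information_level(distance, alpha, source_strength=1.0):
    """Idealized reading at `distance` from the source: strength / distance**alpha.

    `alpha` is the diffusion parameter (the slide gives 1 for temperature
    and 2 for light). `source_strength` is a hypothetical scaling constant.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return source_strength / distance ** alpha


# The same distance gives a weaker reading when alpha is larger.
temperature_like = information_level(2, alpha=1)
light_like = information_level(2, alpha=2)
```

With equal source strength, the light-like effect fades faster than the temperature-like one, which is why the forwarding rule has to be tuned to the diffusion parameter.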

Example (of diffusion): Isoseismal (intensity) maps (North Palm Springs earthquake of July 8, 1986) Ref.: Southern California Earthquake Center. (http://www.scec.org)

Why Is the Natural Information Gradient Important? • This natural information gradient is FREE • Routing protocols can use it to forward query packets (greedily) • - Locate event(s), e.g., fire, nuclear leakage • The diffusion property is not limited to natural phenomena • - Time gradient • Existing approaches (flooding, expanding ring search, random walk, etc.) do not utilize this information gradient

Challenges • Erroneous readings from malfunctioning sensors • - Calibration errors and obstacles cause local maxima/minima • Environmental noise • In real life, sensors are unable to measure below a certain threshold, so the diffusion curve has a finite tail • Non-uniform sensor distribution (gaps) [Figure labels: local maximum, dip, gap]

Environment Model • Event’s effect follows the diffusion law • The diffusion curve has discontinuities and a finite tail • Environmental noise Objective • Design an efficient algorithm to locate source(s) in sensor networks, utilizing the natural information gradient, i.e., the diffusion pattern of the event’s effect • - Gradient based - Fully distributed - Robust to node or sensor failure or malfunction - Capable of finding multiple sources

Basic Protocol A node can have two mode - flat region mode - gradient region mode A node forwards the query to neighbors with its information level To forward the query, each node uses following algorithm: 1. Information gradient region follows greedy approach - Forwards the query to the neighbors if the information level about the event improves 2. Unsmooth gradient region use probabilistic forward based on the Simulated Annealing concept - Probabilistic function is fp(x) = 1/xa, where x = hop count in the information gradient region and ‘a’ depends on the diffusion parameter () 3. Use flooding for the flat (ie. zero) information region - Decrease latency to reach gradient information region - Handles query in the absence of event Query ID prevents looping Once query is resolved, node uses the reverse path to reply

[Figure: query forwarding around a local minimum Mn (neighbors ng) and a local maximum Mx (neighbors np); E marks the event, Q and Q’ mark query messages] • All neighbors (np) of Mx have less information, so they forward the query to their neighbors probabilistically • All neighbors (ng) of Mn have more information, so they forward the query to their neighbors

Query Types • I. Single-value query - Search for a specific value and have a single response • II. Global maxima search - Search for the maximum value of information in the system - Intermediate nodes suppress non-promising replies • III. Multiple-event detection (still presents a challenge) - Search for multiple events of the same type Performance Metrics • Reachability, i.e., success probability - Probability that the query will reach the source • Overhead in terms of average energy dissipation - Number of transmissions to forward the query and to get the reply - Reachability of ~98% is achievable in the presence of noise, gaps, and flat regions • For the probabilistic function f_p(x) = 1/x^a, a < α (the diffusion parameter) is recommended, but a value close to α gives the optimal trade-off between reachability and overhead

Comparisons • Existing gradient-based routing protocols can be categorized into two major approaches • Single-path approach - CADR [Chu2002], Min-hop [Liu2003], … • Multiple-path approach - GRAB [Ye2003], RUGGED [Faruque2004] Which approach to choose?

Objective • Analyze the performance of these general approaches to route a query - Model query success rate and overhead • Using probability tools • - For ideal and lossy wireless link conditions • Simulate the protocols based on these approaches in more realistic scenarios • Also investigate path quality metric • Compare both approaches using analytical and simulation results

Brief Description of Routing Approaches [Figure: two grids of per-node information levels with nodes labeled S and Q. Panel titles: single-path query forwarding with look-ahead = 1; multiple-path query forwarding. Legend: active node(s), candidate node]

Variations of Single-path Approach Depends on the next-active-node selection policy • 1. Basic single-path approach • Selects the candidate node with maximum information, provided it is higher than that of the current active node • Sensitive to local maxima • 2. Improved single-path approach • Selects the candidate node with maximum information - The information of the selected node can be less than that of the current active node [Figure: example grid of information levels; legend: candidate node, active node]
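The difference between the two selection policies can be sketched as follows. The function name, the dictionary representation of candidates, and the `improved` flag are assumptions for illustration; loop avoidance (for example, excluding already visited nodes) is not modeled.

```python
def select_next_active(current_level, candidate_levels, improved=False):
    """Pick the next active node from `candidate_levels` (node id -> level).

    Basic policy: take the best candidate only if it beats the current
    active node; return None otherwise (stuck at a local maximum).
    Improved policy: take the best candidate even if its level is lower.
    """
    if not candidate_levels:
        return None
    best = max(candidate_levels, key=candidate_levels.get)
    if improved or candidate_levels[best] > current_level:
        return best
    return None


# A node at a local maximum (level 18) whose neighbors are all lower.
neighbors = {"n1": 14, "n2": 12, "n3": 9}
basic_choice = select_next_active(18, neighbors)
improved_choice = select_next_active(18, neighbors, improved=True)
```

Here the basic policy stops at the local maximum, while the improved policy steps down to the best neighbor and can keep searching.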

Comparisons - Query Success Rate (ideal and lossy link cases, pc = 0.05) [Figure panels: ideal link case, analytical result; lossy link case, analytical result] • Query success rate of the improved single-path approach drops drastically for lossy links while the multiple-path approach is quite resilient • ARQ may improve success rate of the improved single-path approach

Comparisons - Overhead [Figure panels: overhead of both approaches; energy saving of the multiple-path approach over the improved single-path approach] • Multiple-path approach creates extra paths due to probabilistic forwarding, so overhead increases • Single-path approach uses 1-hop look-ahead at every step to decide on the forwarder • With the increase of malfunctioning nodes, the overhead of the single-path approach increases - The length of the path increases

Results - Path Quality (ideal link case) • Path quality: ratio of the average path length produced by a routing approach to the shortest path length between a source and a sink • Multiple-path approach yields shorter paths, which are close to the shortest path • With the increase of malfunctioning nodes, the path length of the single-path approach increases

Conclusions • Multiple-path approach causes less overhead when a source is < 20 hops from the sink • Multiple-path approach yields shorter paths • With an increase of malfunctioning nodes, the query success rate of the multiple-path approach degrades gracefully • With lossy links - Query success rate of the single-path approach drops drastically - Multiple-path approach is quite resilient

Future work • Combine the benefits of both routing approaches in a hybrid routing approach • Develop a more adaptive multiple-path approach to reduce the number of extra paths due to probabilistic forwarding • Implementation & evaluation in a test-bed • - new 150-sensor-node test-bed at USC (ongoing) • - continued work under the NSF-funded ACQUIRE project

ACQUIRE: ACtive QUery Forwarding In Sensor Networks Original team: Narayanan Sadagopan, Bhaskar Krishnamachari, Ahmed Helmy Current: Sundeep Pattem, Jabed Faruque, Rahul Orgaonkar, Yongjin Kim, Jung-Hyun Jun, Sapon Tanachaiwiwat, Shao-Cheng Wang Funding: NSF NETS NOSS, Intel (equipment) Department of Electrical Engineering USC Viterbi School of Engineering University of Southern California URL: http://ceng.usc.edu/~acquire

Develop a model of variation over time (or space) using measurements Use the model to predict data/readings. Only trigger updates or queries when data/readings deviate from predicted value. Depending on the data dynamics, we may be able to cache information collected earlier and answer queries without having to trigger new data collection.
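A minimal sketch of this idea, assuming the simplest possible model ("the value stays at the last reported reading"): the node sends an update only when a new reading deviates from the predicted value by more than a tolerance, and queries in between are answered from the cached prediction. The class and method names are hypothetical.

```python
class ThresholdReporter:
    """Suppress updates while readings stay close to the predicted value.

    Assumed model: the prediction is the last reported reading, a value
    that both the sensor node and the querier can track identically.
    """

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.predicted = None  # nothing reported yet

    def observe(self, reading):
        """Return True if this reading must be reported (model deviates)."""
        if self.predicted is None or abs(reading - self.predicted) > self.tolerance:
            self.predicted = reading  # re-synchronize the shared model
            return True
        return False

    def answer_query(self):
        """Answer from the cached prediction without new data collection."""
        return self.predicted


reporter = ThresholdReporter(tolerance=0.5)
sent = [reporter.observe(r) for r in [20.0, 20.2, 20.4, 21.5, 21.6]]
```

In this run only the first reading and the jump to 21.5 trigger updates; a richer model (for example, a linear trend over time) would fit the same interface.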

ACtive QUery forwarding In sensoR nEtworks (ACQUIRE)* A mechanism for answering one-shot, complex queries for replicated data in sensor nets: One-shot (vs. continuous): answers are given based on explicit queries about current readings. Complex (vs. simple): the query can contain several sub-queries, e.g., (x OR y) AND z. Replicated data: several sensors might have the answer to a sub-query. Example: Micro Climate Data Collection Different sensor modalities Give a location where (Temp > 80 degrees OR Humidity > 40%) AND Wind speed > 20 mph * N. Sadagopan, B. Krishnamachari, A. Helmy, “Active Query Forwarding In Sensor Networks (ACQUIRE)”, AdHoc Networks Journal - Elsevier, Jan 2005 [Earlier version in SNPA ‘03]
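To make the notion of a complex query with partially resolved sub-queries concrete, the sketch below evaluates the micro-climate example with three-valued logic: each sub-query is True, False, or None (not yet answered by any sensor). The helper names and the dictionary keys are assumptions for illustration, not part of ACQUIRE itself.

```python
def or3(a, b):
    """Three-valued OR: None means 'not yet resolved'."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def and3(a, b):
    """Three-valued AND: None means 'not yet resolved'."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def exceeds(readings, key, threshold):
    """Sub-query result, or None if no reading for `key` is known yet."""
    value = readings.get(key)
    return None if value is None else value > threshold


def evaluate(readings):
    """(Temp > 80 OR Humidity > 40) AND Wind speed > 20, from the slide."""
    return and3(
        or3(exceeds(readings, "temp_f", 80), exceeds(readings, "humidity_pct", 40)),
        exceeds(readings, "wind_mph", 20),
    )
```

A query carrying `{"temp_f": 85}` is still unresolved (None) and would keep being forwarded; once a wind reading is collected, the result becomes True or False and a reply can be sent.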

Flooding Based Queries (Directed Diffusion) • Flooding: • Useful for long standing (continuous) queries • Replicated responses might make it very inefficient.

ACQUIRE • An active node “refreshes” data from its “neighborhood”. • The query is then forwarded to a node on the edge of the neighborhood

ACQUIRE • Key Features • In-network processing • Does not rely on geographic information or unicast routing protocol • Existence of these may considerably improve performance • d helps us span the space from random walk (d = 0) to flooding (d = D, the network diameter)

ACQUIRE • Look-ahead parameter, d • Determines the size of the “neighborhood” in hops. • Effects a tradeoff between the number of steps taken to resolve the query and the energy consumed. • Optimal look-ahead, d* • Depends on the query rate, refresh rate and the data dynamics (captured by the amortization factor, c) • May be achieved by localized schemes. • The higher the query rates & lower the data dynamics, the higher the optimal look ahead.

Performance of ACQUIRE • c is the refresh/query ratio (e.g., 0.01 means one refresh every 100 queries); the refresh overhead is amortized over the savings in queries

ACQUIRE • Efficiency • 60-75% energy savings over Expanding Ring Search (analytical results) • Order of magnitude savings over flooding. • Future Work • Develop ACQUIRE into a full-fledged protocol that actively adapts the ‘d’ parameter for optimal performance • Evaluation over an experimental sensor network test bed. • ceng.usc.edu/~acquire

Correlations and Inserted Data • Main purpose of sensor networks: Collect Data • Sybil attacks may insert false data that affect operation of sensor networks: • Impersonating multiple IDs (at same/different times) • Outlier detection alone will not work • Approach: • Understand normal correlations between data • Detect outliers based on reference to normal behavior • Design protocol robust to massive amount of forged data

Single Attacker Scenario I Data: X from location (x,y) -- Interesting events MobiQuitous 2005

Single Attacker Scenario II Data: X’ from location (x,y) -- Normal events

Sybil Attack Scenario I Data: Wi from location (xi,yi) -- Interesting events [Figure labels: attackers (sybil nodes), source, source/forwarder, inactive node, aggregator, sink, false alarm]

Sybil Attack Scenario II Data: Wi’ from location (xi,yi) -- Normal events [Figure labels: attackers (sybil nodes), forwarder, source, inactive node, aggregator, sink, delayed or failed response]

Data Correlation (Great Duck Island) T: Temperature, P: Pressure, H: Humidity ID: Sensor ID (only 4 neighboring sensors are shown)

Anomaly Relationship Test (ART) Architecture • Authentication Module - Distributed Interactive Proof • Statistical Analysis Module - Correlation-coefficient analysis - T*-test (outlier threshold) S. Tanachaiwiwat, A. Helmy, MobiQuitous 2005
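The correlation-coefficient analysis can be illustrated with a plain Pearson correlation computed over a sliding window of readings from two neighboring sensors. This is only a sketch under stated assumptions: it uses a fixed window size rather than the dynamic window of the original work, the threshold is arbitrary, and the T*-test and authentication steps are not reproduced. All function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant series: treat as uncorrelated in this sketch
    return cov / sqrt(var_x * var_y)


def flag_low_correlation(own, neighbor, window, threshold):
    """Start indices of windows whose correlation drops below `threshold`."""
    flagged = []
    for start in range(len(own) - window + 1):
        r = pearson(own[start:start + window], neighbor[start:start + window])
        if r < threshold:
            flagged.append(start)
    return flagged


# Neighbor tracks the local readings at first, then diverges.
suspicious = flag_low_correlation(
    [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 3, 2], window=3, threshold=0.8
)
```

Windows that break the normally high correlation between neighbors are candidates for further checks, rather than proof of an attack on their own.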

Anomaly Relationship Test (ART) Protocol (1) Correlation/T*-test (2) Request valid credential (3) Response with valid/invalid/no response (4) Send report to sink (5) Cross verify Note: perform at verifiers only! [Figure labels: source, prover (attacker), compromised/failed, sybil, verifier (aggregator), verifier (forwarder), sink]

Summary • Dynamic sliding-window correlation analysis and the T*-test can alleviate the attack effectively, even under a full-scale attack from sybil nodes. • Remarks • Recognition of normal/abnormal/malicious events is based on statistical analysis • Malicious data insertion can cause problems for mission-critical WSNs • Error is reduced by using a dynamic sliding window and a careful choice of correlation threshold

WLANs as Sensor Networks Total Population: ~ 25,000 students Wireless Users: ~6000 students Access Points: ~400

IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis* • Classes of future sensor networks will be attached to humans • What kinds of correlations exist between users? • Analyze measurements of wireless networks • Understand Wireless Users Behavior (individual and group) • Develop models to understand associations and friendship • Study of relationships and user behavior based on measurements of various University WLANs * W. Hsu, A. Helmy, “IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis”, USC TR, July ‘05 (Under Submission)

Statistics of Studied Traces - Four major campuses - Month long traces studied - Total users in the study: over 12,000 users - Total Access Points in the study: over 1,300

Observations: On-line Time On-off behavior is very common for wireless users. This seems especially true for small handheld devices. There are clear categories of heavy and light users, the distribution of which is skewed and heavily depends on the campus.

Observations: Visited Access Points (APs) [percentage of visited APs] • Individual users access only a very small portion of APs in the network, less than 35% in all campuses. The long-term mobility of users is highly skewed in terms of time associated with each AP. On average a user spends more than 95% of time at its top five most visited APs.

Observations: Visited APs • The majority of users experience low mobility while using the network. This is even true for portable devices such as PDAs. The actual handoff statistics depend heavily on the environment.

Observations: Similarity Index • We observe clear repetitive patterns of association in wireless network users. Typically, user association patterns show the strongest repetitive pattern at time gap of one day/one week.

Observations: Encounters • In all the traces, the MNs encounter a relatively small fraction of the user population: below 40% in most cases and never above 60%. Except for the UCSD trace, on average an MN only encounters 1.88%-5.94% of the whole population. The number of total encounters for the users follows a BiPareto distribution, the parameters of which depend on the campus.

Encounter-graphs • Definition • When 2 nodes access the same AP at the same time we call this an ‘encounter’ • The encounter graph has all the mobile nodes as vertices and its edges link all those vertices that encounter each other
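The definition above translates directly into code. The sketch below builds an encounter graph from association records; the record layout `(node, ap, start, end)` and the function name are assumptions for illustration, and two sessions count as an encounter when their time intervals at the same AP overlap.

```python
from collections import defaultdict
from itertools import combinations


def build_encounter_graph(sessions):
    """Build an encounter graph from (node, ap, start, end) records.

    Returns a dict mapping every node to the set of nodes it encountered,
    i.e. nodes associated with the same AP during overlapping intervals.
    """
    graph = {}
    by_ap = defaultdict(list)
    for node, ap, start, end in sessions:
        graph.setdefault(node, set())  # every mobile node is a vertex
        by_ap[ap].append((node, start, end))

    for entries in by_ap.values():
        for (n1, s1, e1), (n2, s2, e2) in combinations(entries, 2):
            if n1 != n2 and s1 < e2 and s2 < e1:  # intervals overlap
                graph[n1].add(n2)
                graph[n2].add(n1)
    return graph


trace = [
    ("a", "ap1", 0, 10),
    ("b", "ap1", 5, 15),   # overlaps with a at ap1
    ("c", "ap1", 20, 30),  # same AP, but later
    ("d", "ap2", 0, 10),   # same time, different AP
]
encounters = build_encounter_graph(trace)
```

Only nodes a and b end up connected; c and d remain isolated vertices, matching the observation that each node meets a small fraction of the population.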
