
Troubleshooting Wireless Mesh Networks

Troubleshooting Wireless Mesh Networks. Victor Bahl bahl@microsoft.com joint work with Lili Qiu, Ananth Rao (UCB) & Lidong Zhou Microsoft Research April 1, 2004. Mesh Network Management.





Presentation Transcript


  1. Troubleshooting Wireless Mesh Networks Victor Bahl bahl@microsoft.com joint work with Lili Qiu, Ananth Rao (UCB) & Lidong Zhou Microsoft Research April 1, 2004

  2. Mesh Network Management
     “Network management is a process of controlling a complex data network so as to maximize its efficiency and productivity”
     ISO’s definition of network management:
     • Fault Management
     • Configuration Management
     • Security Management
     • Performance Management
     • Accounting

  3. Goals
     Assist with mesh router configuration
     Reactive and proactive troubleshooting
     • Investigate reported performance problems
       • Time-series analysis to detect deviation from normal behavior
     • Localize and isolate trouble spots
       • Collect and analyze traffic reports from mesh nodes
     • Determine possible causes for the trouble spots
       • Interference, hardware problems, network congestion, malicious nodes, ...
     Respond to trouble spots
     • Re-route traffic
     • Rate limit
     • Change topology via power control & directional antenna control
     • Flag environmental changes & problems

  4. Nomenclature
     Mesh Management Module (M3)
     • Runs on every node
     Mesh Management Server (MMS)
     • Runs on gateway or designated nodes
     Mesh Network Management Protocol (MNMP)
     • Protocol (similar to SNMPv3) between M3 and MMS

  5. Focus of this talk
     • Gathering & Distributing Data
     • Cleaning Data
     • Fault Isolation & Diagnosis

  6. Challenges in Fault Diagnosis
     Characteristics of multi-hop wireless networks
     • Unpredictable physical medium, prone to link errors
     • Network topology is dynamic
     • Resource limitation calls for a diagnosis approach with low overhead
     • Vulnerable to link attacks
     Identifying root causes
     • Just knowing link statistics is insufficient
     • Signature-based techniques don’t work well
     • Determining normal behavior is hard
     Handling multiple faults
     • Complicated interactions between faults and traffic, and among faults themselves

  7. Previous Approaches to Fault Diagnosis
     Protocols for network management
     • ANMP [singh99]
     • Guerrilla [shen02]
     Detecting routing and MAC misbehavior
     • Watchdog & pathrater [Baker00]
     • MACMis [Vaidya03]
     Fault management in infrastructure mode
     • AirWave, AirDefense, UniCenter, Symbol’s WNMS, IBM’s WSA, Wibhu’s SpectraMon, …

  8. Our Approach Use a network simulator as a real-time diagnostic tool

  9. Fault Detection, Isolation & Diagnosis Process
     [Figure: process diagram. Labels recovered from the slide:]
     • Modules: Agent Module, Manager Module
     • Steps: Collect Data, Clean Data, Simulate, Diagnose Faults, Inject Candidate Faults
     • Data: Raw Data, Routes, Link Loads, Signal Strength, Performance Estimate, Measured Performance, Root Causes
     • Data sources: SNMP MIBs, Performance Counters, WRAPI, MCL, NativeWiFi

  10. Root Cause Analysis Module

  11. Our Fault Diagnosis Framework
     Advantages
     • Flexible & customizable for a large class of networks
     • Captures complicated interactions within the network, between the network & environment, and among multiple faults
     • Extensible to detect new faults
     • Facilitates what-if analysis
     Challenges
     • Accurately reproducing the behavior of the network inside a simulator
     • Building a fault diagnosis technique that uses the simulator as a diagnosis tool

  12. Handling the Challenges
     Reproducing network behavior
     • Identify the set of traces to collect
     • Rule out erroneous data from the trace
     • Drive the simulator with the cleaned traces
     Building fault diagnosis
     • Use performance results from trace-driven simulation to establish the normal behavior
     • Deviation from the normal behavior indicates a potential fault
     • Identify root causes by efficiently searching the fault space to reproduce the faulty symptoms
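
The "deviation from normal behavior" step on slide 12 can be illustrated with a minimal sketch. It assumes the trace-driven simulation yields an expected throughput per link and that a fixed relative threshold decides what counts as a deviation; the function name `flag_deviations`, the 20% threshold, and the sample numbers are all hypothetical and are not taken from the talk.

```python
def flag_deviations(expected, measured, rel_threshold=0.2):
    """Return links whose measured value deviates from the simulated
    (expected) value by more than rel_threshold, as a fraction of expected."""
    flagged = []
    for link, exp in expected.items():
        obs = measured.get(link)
        if obs is None or exp <= 0:
            continue  # no measurement, or no meaningful baseline to compare
        if abs(exp - obs) / exp > rel_threshold:
            flagged.append(link)
    return sorted(flagged)


# Hypothetical throughputs in Mb/s: simulator estimate vs. field measurement.
expected = {("A", "B"): 10.0, ("B", "C"): 8.0}
measured = {("A", "B"): 9.5, ("B", "C"): 3.0}
suspect_links = flag_deviations(expected, measured)
```

Here only the link from B to C is flagged, since it falls far below its simulated baseline. A flagged link is a symptom only; finding the root cause is the job of the search over the fault space.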

  13. Why Simulator?

  14. Simulator Accuracy: RF Propagation RF propagation model versus measured signal strengths for IEEE 802.11a cards from different vendors

  15. Simulator Accuracy: Throughput Estimated versus actual throughput when channel conditions are good (IEEE 802.11a)

  16. Simulator Accuracy: Throughput (2) Estimated matches measured throughput till the channel conditions become poor

  17. Simulator Accuracy: Throughput Estimated matches measured throughput for poor channel conditions when loss rate is incorporated

  18. How Stable is the Channel? Under good environmental conditions, received signal strength remains stable

  19. Data Collection
     What should we collect?
     • Network topology/connectivity info (neighbor table)
     • Noise level & signal strength
     • Traffic load to direct neighbor
     • Loss rate to direct neighbor (retransmission count)
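
As a rough illustration of the per-neighbor data listed on slide 19, the record below groups those fields into one structure. The class name, field names, and units are assumptions made for this sketch. The loss-rate estimate (retransmissions divided by total transmission attempts) is also an assumption; the slide only says that the retransmission count is used.

```python
from dataclasses import dataclass


@dataclass
class NeighborReport:
    """One neighbor-table entry as a node might report it (hypothetical layout)."""
    neighbor: str          # identifier of the direct neighbor
    signal_dbm: float      # received signal strength
    noise_dbm: float       # noise level
    packets_sent: int      # traffic load: unique packets sent to this neighbor
    retransmissions: int   # retransmission count toward this neighbor

    def loss_rate(self) -> float:
        """Estimate loss as the fraction of transmission attempts that were
        retransmissions. This formula is an assumption of the sketch."""
        attempts = self.packets_sent + self.retransmissions
        return self.retransmissions / attempts if attempts else 0.0


report = NeighborReport("B", signal_dbm=-62.0, noise_dbm=-95.0,
                        packets_sent=90, retransmissions=10)
estimated_loss = report.loss_rate()
```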

  20. Data Distribution
     Design goal
     • Minimize bandwidth consumption
     Techniques
     • Dynamic scoping
       • Each node takes a local view of the network
       • The coverage of the local view adapts to traffic patterns
     • Adaptive monitoring
       • Minimize measurement overhead in normal case
       • Change update period
     • Push and pull
     • Delta compression
     • Multicast
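
Of the techniques on slide 20, delta compression is the easiest to sketch: a node sends only the values that changed since its last report, and the receiver merges them into its stored copy. The function names, the key format, and the `min_change` tolerance below are hypothetical. The sketch does not handle keys that disappear (for example, a neighbor that goes away), which a real protocol would need to signal.

```python
def delta_report(previous, current, min_change=0):
    """Return only entries that are new or changed by more than min_change."""
    delta = {}
    for key, value in current.items():
        old = previous.get(key)
        if old is None or abs(value - old) > min_change:
            delta[key] = value
    return delta


def apply_delta(previous, delta):
    """Receiver side: merge a delta into the last known full report."""
    merged = dict(previous)
    merged.update(delta)
    return merged


# Hypothetical counters from one node at two consecutive reporting times.
last = {"rssi:B": -60, "pkts_sent:B": 1000, "pkts_sent:C": 200}
now = {"rssi:B": -61, "pkts_sent:B": 1500, "pkts_sent:C": 200}
delta = delta_report(last, now, min_change=2)
server_view = apply_delta(last, delta)
```

With `min_change=2`, the 1 dB signal fluctuation is suppressed and only the changed traffic counter is sent, so the receiver's copy of `rssi:B` stays at the previously reported value.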

  21. Management Overhead
     Info distributed:
     • Routing changes
     • Traffic counters (e.g. packets sent & received)
     • Signal strength
     [Chart labels: Avg: 1 to 5 hops; 40 Kb/sec, 25 Kb/sec, 15 Kb/sec]
     BW requirement does not go up much with network size

  22. Measurement Overhead on Throughput

  23. Data Cleaning
     Data may not be pristine. Why?
     • Liars, malicious users
     • Missing data
     • Measurement errors
     Clean the data
     • Detect liars
       • Assumption: most nodes are honest
       • Approach: Neighborhood Watch
         • Find the smallest number of lying nodes to explain inconsistency in traffic reports
     • Smoothing & interpolation

  24. Example: Resiliency against Liars/Lossy Links
     Problem
     • Identify nodes that report incorrect information (liars)
     • Detect lossy links
     Assume
     • Nodes monitor neighboring traffic, build traffic reports and periodically share info
     • Most nodes provide reliable information
     Challenge
     • Wireless links are error prone and unstable
     Approach
     • Find the smallest number of lying nodes to explain inconsistency in traffic reports
     • Use the consistent information to estimate link loss rates
     [Figure: Results]
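
The "smallest number of lying nodes" idea on slide 24 can be sketched as follows, under assumptions that are not from the talk: each directed link has one sender-side and one receiver-side packet count, a pair of reports is inconsistent when the counts differ by more than a tolerance (meant to absorb ordinary wireless loss), and the liar set is the smallest set of nodes that touches every inconsistent pair. The brute-force search is only practical for small networks, and all names and numbers are hypothetical.

```python
from itertools import combinations


def inconsistent_pairs(sent, received, tolerance=0.1):
    """Return node pairs whose sender and receiver counts disagree by more
    than `tolerance` (as a fraction of the larger count)."""
    pairs = set()
    for (u, v), s in sent.items():
        r = received.get((u, v))
        if r is None:
            continue  # missing report: nothing to compare
        if abs(s - r) > tolerance * max(s, r, 1):
            pairs.add(frozenset((u, v)))
    return pairs


def smallest_liar_set(nodes, pairs):
    """Smallest node set that includes a member of every inconsistent pair."""
    for k in range(len(nodes) + 1):
        for subset in combinations(sorted(nodes), k):
            chosen = set(subset)
            if all(pair & chosen for pair in pairs):
                return chosen
    return set(nodes)


# Hypothetical reports: sent[(u, v)] comes from u, received[(u, v)] from v.
nodes = {"A", "B", "C", "D"}
sent = {("A", "B"): 100, ("C", "B"): 50, ("B", "D"): 30, ("A", "C"): 40}
received = {("A", "B"): 60, ("C", "B"): 20, ("B", "D"): 80, ("A", "C"): 40}
conflicts = inconsistent_pairs(sent, received)
liars = smallest_liar_set(nodes, conflicts)
```

In this example every conflicting pair involves node B, so blaming B alone explains all inconsistencies. The remaining consistent reports (here the link from A to C) are what would then feed the loss-rate estimate.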

  25. Fault Diagnosis Algorithm
     1. Initialization: diagnosed fault set F = { }
     2. Forward addition
        while (diff(MeasuredPerf, SimulatedPerf(F)) > threshold) {
          Find a candidate fault that explains the mismatch between current and predicted performance the most, and add it to F
        }
     3. Backward deletion
        while (diff(MeasuredPerf, SimulatedPerf(F)) > threshold) {
          Find a fault in F that explains the mismatch the least. Delete it from F if excluding it results in little change
        }
     4. Report F
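
The forward-addition and backward-deletion search on slide 25 can be sketched in runnable form. This is one interpretation, not the authors' implementation: performance is reduced to a single number, the simulator is a caller-supplied function, and the backward pass removes a fault when the mismatch stays within the threshold without it. The toy simulator, fault names, and penalties are invented for the example.

```python
from typing import Callable, FrozenSet, Iterable, Set


def diagnose(measured: float,
             simulate: Callable[[FrozenSet[str]], float],
             candidates: Iterable[str],
             threshold: float) -> Set[str]:
    """Greedy forward addition followed by backward deletion."""
    def diff(faults):
        return abs(measured - simulate(frozenset(faults)))

    faults: Set[str] = set()
    remaining = set(candidates)

    # Forward addition: repeatedly add the candidate that shrinks the
    # mismatch between measured and simulated performance the most.
    while diff(faults) > threshold and remaining:
        best = min(sorted(remaining), key=lambda f: diff(faults | {f}))
        if diff(faults | {best}) >= diff(faults):
            break  # no remaining candidate improves the match
        faults.add(best)
        remaining.discard(best)

    # Backward deletion: visit faults from least to most explanatory and
    # drop any whose removal keeps the mismatch within the threshold.
    for f in sorted(faults, key=lambda f: diff(faults - {f})):
        if diff(faults - {f}) <= threshold:
            faults.discard(f)
    return faults


# Toy stand-in for the simulator: each injected fault lowers throughput.
PENALTIES = {"noise@A": 3.0, "drop@B": 2.0, "mac@C": 0.1}


def toy_simulate(faults: FrozenSet[str]) -> float:
    return 10.0 - sum(PENALTIES[f] for f in faults)


diagnosed = diagnose(5.0, toy_simulate, PENALTIES, threshold=0.5)
```

With a measured throughput of 5.0, the search adds `noise@A` first (largest reduction in mismatch), then `drop@B`, and stops once the simulated value matches. Each `diff` call runs one simulation, so the greedy order matters for keeping the number of simulator runs low.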

  26. Performance
     25-node random topology
     • Faults detected:
       • Random packet dropping
       • MAC misbehavior
       • External noise

  27. What-if Analysis Improvement on removing flows

  28. Mesh Visualization Module

  29. Thanks! http://www.research.microsoft.com/sn/mesh

  30. Backup

  31. Detection of Intentional Packet Drops
     Scenario
     • 49-node network
     • Randomly pick nodes that drop packets
