Fault Tolerance for WLAN

Fault Tolerance for WLAN Speaker: Mark Yang 93.04.27

Outline • Hardware Fault Tolerance • Dependability enhancement for IEEE 802.11 wireless LAN with redundancy techniques • Tolerance to Access-Point Failures in Dependable Wireless Local-Area Networks • Comparison • Software Fault Tolerance • TCP-DCR: A novel protocol for tolerating wireless channel errors • Implementation of Explicit Wireless Loss Notification Using MAC-Layer Information • Comparison • Simulation • Conclusion 2 / 40

Dependability enhancement for IEEE 802.11 wireless LAN with redundancy techniques Dependable Systems and Networks, 2003. Proceedings. 2003 International Conference on , 22-25 June 2003 Pages:521 - 528 Hardware (1) 3 / 40

Hardware (1) – Abstract • Proposes tolerating the existence of "shadow regions", rather than preventing them, in order to enhance connection dependability. • A redundant AP is placed in the shadow region to serve the mobile stations that roam into that region. • The secondary AP is connected to the same distribution system as the primary AP. (DS configuration) • The secondary AP acts as a wireless forwarding bridge for traffic between the mobile stations in the shadow region and the primary AP. (Forwarding configuration) 4 / 40

Hardware (1) – DS configuration • An additional AP is placed in the shadow area with the same frequency as the primary AP. • The secondary AP forwards the data between the mobile stations in the shadow area and the primary AP. The two APs communicate over the distribution system. 5 / 40

Hardware (1) – Forwarding configuration • The secondary AP is placed at a certain location where it could communicate with both the mobile terminal in the shadow area and with the primary AP. • The secondary AP thus could forward the packet transmissions in both directions. 6 / 40

Hardware (1) – Aspects • Since the beacon interval is less than 100ms, the maximum detection delay for link failures is 100ms • A mobile station in the shadow region may only transmit data when it is granted a TXOP by the secondary AP • The primary AP sends the specification of TXOP to the secondary AP. • Simultaneously, the primary AP also broadcasts the same channel reservation message in the cell in the form of a QoS-Poll frame. • All stations in the non-shadowed area will receive the QoS-Poll frame and defer any transmission attempt until the channel reservation time is over. • With channel reserved, the secondary AP then sends the QoS-Poll frame to mobile stations in the shadow region sequentially so that they may send their data packets free of collisions. 7 / 40

Hardware (1) – Fault models • Reliability: • Availability: • Survivability: 8 / 40

Hardware (1) – Numerical examples • Availability = $1 - 10^{-2} = 0.99$ • $1/\lambda_{os} \approx 2.8$ hours 9 / 40

Tolerance to Access-Point Failures in Dependable Wireless Local-Area Networks Object-Oriented Real-Time Dependable Systems, 2003. Proceedings. Ninth IEEE International Workshop on , 1-3 Oct. 2003 Pages:136 - 143 Hardware (2) 10 / 40

Hardware (2) – Abstract • Enhances the dependability of wireless networks by tolerating AP failures; develops and evaluates a new fault-detection approach based on signal-to-noise ratio. • Detection of AP Failures • Beacon-frame monitoring • Signal-to-noise ratio • Three techniques to recover from AP failures: • Access-Point Replication • Overlapping-Coverage • Link-Multiplexing 11 / 40

Hardware (2) – Beacon-frame monitoring • Handoff mechanism in 802.11 WLAN • Passive Scanning: A mobile station sweeps from channel to channel to detect the presence of Beacon frames, which are periodically transmitted by the APs. • Active Scanning: A mobile station actively seeks out APs by broadcasting ProbeRequest frames on every channel. • Need to distinguish between user mobility and AP failure. • User mobility: • Only a few users try to hand off to a new AP at a given point of time. • Employ active scanning to discover new APs. • AP failure: • The number of users trying to hand off to a new AP could be relatively large. • Use passive scanning instead to detect the presence of a new AP. 12 / 40

Hardware (2) – Signal-to-noise ratio • Uses the strength of the signal that a mobile station receives from an AP as an indicator of the AP's "up/down" status. • Initiate the fault recovery mechanism if the signal-to-noise ratio (SNR) drops suddenly. 13 / 40
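The SNR-based detection idea can be sketched as a small sliding-window check. This is only an illustration: the class name `SnrDropDetector`, the window size, and the drop threshold in dB are assumptions made here, not values from the paper.

```python
from collections import deque


class SnrDropDetector:
    """Flags a suspected AP failure when SNR falls sharply below its recent average."""

    def __init__(self, window=5, drop_db=15.0):
        # window and drop_db are illustrative assumptions, not values from the paper.
        self.samples = deque(maxlen=window)
        self.drop_db = drop_db

    def update(self, snr_db):
        """Record one SNR sample; return True if it looks like a sudden drop."""
        suspected = False
        if len(self.samples) == self.samples.maxlen:
            baseline = sum(self.samples) / len(self.samples)
            suspected = (baseline - snr_db) >= self.drop_db
        if not suspected:
            # Only healthy samples feed the baseline, so one bad reading
            # does not drag the average down.
            self.samples.append(snr_db)
        return suspected


detector = SnrDropDetector(window=3, drop_db=10.0)
readings = [30.0, 31.0, 29.0, 28.0, 5.0]
flags = [detector.update(r) for r in readings]
```

With these readings only the last sample (5 dB against a baseline near 29 dB) would be flagged, at which point a station could start its recovery procedure.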

Hardware (2) – Access-Point Replication • Uses an additional AP that is designated as a backup and that can be activated once the primary AP fails. • Drawback: • The latency involved in detecting AP failures and performing the fail-over (authenticate → ACK → re-association request → re-association response) is relatively large (7.03 seconds). • Additional infrastructural costs: the backup AP might not be actively used under fault-free conditions. 14 / 40

Hardware (2) – Overlapping-Coverage • If one AP fails, mobile stations associated with that AP can be transferred over to another AP whose coverage area intersects with that of the failed AP. • In IEEE 802.11, the channels used by neighboring APs should be separated by at least five channels; this limited availability of channels can result in shadow areas. • Drawback: • Requires that some spare capacity be reserved at each AP to take over the additional users that the AP will have to support in case a neighboring AP (with overlapping coverage) fails. • The latency involved in detecting an AP failure and switching to a functional AP is relatively large. 15 / 40

Hardware (2) – Link-Multiplexing • Using redundant communication paths from a mobile station, with each path connecting a distinct wireless network-interface card at the mobile station to a distinct AP. • Using link-multiplexing over link-replication • Total bandwidth used for communication can remain the same as that used by a single link. • Increase in the amount of average delay in message transmission due to the multiplexing and demultiplexing. 16 / 40

Hardware (2) – Link-Multiplexing (cont.) • Requires additional software be installed at the client & server. • Utilize a library interpositioning (interceptor) approach to capture the network layer calls made by the application, and can be embedded inside a middleware layer at both the client and the server. • Fault-detection. • Intercepting network layer calls made by the application. • Multiplexing/de-multiplexing data from/to the application. 17 / 40
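A minimal sketch of the multiplexing/de-multiplexing step, assuming a round-robin split with per-chunk sequence numbers. The function names and the round-robin policy are hypothetical choices for illustration; the slides do not specify how the middleware distributes data across links.

```python
def multiplex(chunks, links_up):
    """Assign sequence-numbered chunks round-robin to the links that are up."""
    active = [i for i, up in enumerate(links_up) if up]
    if not active:
        raise RuntimeError("no working link to any AP")
    per_link = {i: [] for i in active}
    for seq, chunk in enumerate(chunks):
        link = active[seq % len(active)]
        per_link[link].append((seq, chunk))
    return per_link


def demultiplex(per_link):
    """Merge the per-link streams back into original order using sequence numbers."""
    merged = [item for frames in per_link.values() for item in frames]
    return [chunk for _, chunk in sorted(merged, key=lambda item: item[0])]


data = [b"a", b"b", b"c", b"d"]
both_up = multiplex(data, [True, True])    # traffic split across two APs
one_down = multiplex(data, [False, True])  # link 0 failed: all traffic on link 1
```

When one link is marked down, the same call routes every chunk over the remaining link, which is the fault-tolerance property the slide describes; total bandwidth stays that of a single logical stream.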

Hardware – Comparison 18 / 40

TCP-DCR: A novel protocol for tolerating wireless channel errors Accepted for publication in IEEE Transactions on Mobile Computing (February 2004) http://www.crhc.uiuc.edu/wireless/groupPubs.html Software (1) 19 / 40

Software (1) – Abstract • TCP-DCR delays the triggering of congestion response algorithms for a small bounded period of time T to allow link-level retransmissions to recover losses due to channel errors. • If the packet is not recovered by link-level retransmission by the end of the delay period, TCP-DCR triggers the congestion recovery algorithms of fast retransmission and recovery. • Simulations show that TCP-DCR: • Does not impact fairness towards native implementations of TCP. • Performs significantly better when channel errors contribute more towards packet losses in the network. 20 / 40
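The delayed congestion response can be sketched as a small ACK-driven state machine. This is a simplified illustration under stated assumptions, not the authors' implementation: the class `DcrSender`, its method names, the three-duplicate-ACK threshold, and the window halving are choices made here, and a real sender would also need a timer that fires when the delay expires without further ACKs.

```python
class DcrSender:
    """Toy TCP-DCR sender: postpones fast recovery for delay_t after the first dupack."""

    def __init__(self, delay_t, cwnd):
        self.delay_t = delay_t  # the bounded delay T
        self.cwnd = cwnd
        self.deadline = None    # None means no delay period is running
        self.dupacks = 0

    def on_dupack(self, now):
        self.dupacks += 1
        if self.deadline is None:
            # First duplicate ACK: start the delay period instead of reacting.
            self.deadline = now + self.delay_t
            return "dcr start"
        if self.dupacks >= 3 and now >= self.deadline:
            # Link layer did not recover the packet in time: treat as congestion.
            self.cwnd = max(self.cwnd // 2, 1)
            self.deadline = None
            self.dupacks = 0
            return "fast recovery"
        return "wait"

    def on_new_ack(self, now):
        # The missing packet arrived (e.g. via link-level retransmission).
        was_delaying = self.deadline is not None
        self.deadline = None
        self.dupacks = 0
        return "dcr cancel" if was_delaying else "ack"


recovered = DcrSender(delay_t=0.5, cwnd=20)
trace_a = [recovered.on_dupack(1.0), recovered.on_dupack(1.1),
           recovered.on_dupack(1.2), recovered.on_new_ack(1.3)]

lost = DcrSender(delay_t=0.5, cwnd=20)
trace_b = [lost.on_dupack(1.0), lost.on_dupack(1.1), lost.on_dupack(1.6)]
```

In the first trace the new ACK arrives inside the delay period, so the window is untouched; in the second the third duplicate ACK arrives after the period has expired and the sender falls back to fast recovery.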

Software (1) – Behavior 21 / 40

Software (1) – Choice of T • Timeline: t0 • t0+(RTT/2 – rtt/2) • t0+RTT/2 • t0+RTT/2+rtt/2: BS receives indication that the packet is lost • t0+RTT/2+rtt: packet is recovered at the receiver • t0+RTT+rtt: sender receives an ACK for the packet • The sender would have to delay the congestion response by at least (t0+RTT+rtt) - (t0+RTT) = rtt. • The interpacket delays are non-zero and the TCP sender may not know the value of rtt, so the lower bound of T is one RTT. • The retransmission timeout is usually set to RTT + 4 times the RTT variation. The choice of T should be such that unnecessary retransmission timeouts are avoided, so the upper bound of T is one RTT. 22 / 40
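The timeline arithmetic above can be checked with a few lines of Python. The numeric values are made up for illustration, the function names are hypothetical, and the timeout formula is the standard TCP one (smoothed RTT plus four times the RTT variation), assumed here rather than taken from the slide.

```python
def dcr_timeline(t0, RTT, rtt):
    """Event times from the slide for a packet sent at t0 (RTT: end-to-end, rtt: wireless hop)."""
    return {
        "bs_learns_of_loss": t0 + RTT / 2 + rtt / 2,
        "receiver_recovers_packet": t0 + RTT / 2 + rtt,
        "sender_gets_ack": t0 + RTT + rtt,
        # Interpreted here as the time the ACK would arrive had there been no loss.
        "ack_without_loss": t0 + RTT,
    }


def minimum_delay(t0, RTT, rtt):
    """Smallest delay that lets the link-level retransmission be acknowledged."""
    times = dcr_timeline(t0, RTT, rtt)
    return times["sender_gets_ack"] - times["ack_without_loss"]


def retransmission_timeout(RTT, rtt_variation):
    """Standard-style timeout: RTT plus four times the RTT variation (assumption)."""
    return RTT + 4 * rtt_variation


RTT, rtt, variation = 0.5, 0.125, 0.0625  # seconds, illustrative values only
T = RTT                                    # the choice argued for on the slide
lower = minimum_delay(0.0, RTT, rtt)
rto = retransmission_timeout(RTT, variation)
print(lower, T, rto)  # prints: 0.125 0.5 0.75
```

With these numbers T = RTT sits between the minimum useful delay (rtt) and the retransmission timeout, which is the window the slide argues for.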

Software (1) – Simulation No Congestion Losses 23 / 40

Software (1) – Simulation (cont.) Only Congestion Losses • 12 TCP-SACK flows & 12 TCP-DCR flows • Figure labels: congestion; 10Mbps; 5ms 24 / 40

Software (1) – Simulation (cont.) Channel Errors & Congestion Losses • 12 TCP-SACK flows & 12 TCP-DCR flows • TCP-DCR flows can make use of the link bandwidth not utilized effectively by the TCP-SACK flows. 25 / 40

Implementation of Explicit Wireless Loss Notification Using MAC-Layer Information Wireless Communications and Networking, 2003. WCNC 2003. 2003 IEEE , Volume: 2 , 16-20 March 2003 Pages:1339 - 1343 vol.2 Software (2) 26 / 40

Software (2) – Abstract • TCP suffers a significant degradation in performance over wireless networks because it does not distinguish wireless link loss from congestion loss. • To overcome this problem, the Explicit Wireless Loss Notification (EWLN) scheme is proposed to explicitly notify the TCP sender of wireless link loss. • The EWLN scheme uses information from the MAC layer and takes into account the interplay with the error recovery mechanism at the link layer. • The sender's congestion control mechanism can be decoupled from the retransmission mechanism and set to react only to congestion-related losses. 27 / 40

Software (2) – MAC Protocol • Flowchart labels: Link-level retry; MAC Protocol; Comparing the seqNo of the current and buffered packets; To mobile terminal; To next node 28 / 40

Software (2) – Receiver • Flowchart labels: Normal; Congestion error; Wireless link error that link-level retransmission cannot recover; Ewln_flag = 0; Ewln_flag = 1; Send duplicate ACK 29 / 40

Software (2) – Sender • Retransmit the packet upon receiving the first ACK with EWLN set. • To avoid duplicate transmissions, retransmit only on that first ACK with EWLN set, not on later ones. 30 / 40
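The sender rule can be sketched as follows. This is an illustrative model only: the class `EwlnSender` and its fields are hypothetical, and the fallback path (three duplicate ACKs without the flag, then halving the window) is ordinary TCP fast retransmit, assumed here for contrast rather than taken from the paper.

```python
class EwlnSender:
    """Toy sender: retransmits once on the first EWLN-flagged dupack, without shrinking cwnd."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.plain_dupacks = 0          # duplicate ACKs without the EWLN flag
        self.ewln_retransmitted = False
        self.log = []

    def on_dupack(self, ack_no, ewln_flag):
        if ewln_flag == 1:
            # Wireless loss: retransmit only on the first flagged ACK.
            if not self.ewln_retransmitted:
                self.ewln_retransmitted = True
                self.log.append(("retransmit", ack_no, "wireless"))
            return
        # No flag: assume congestion and fall back to standard fast retransmit.
        self.plain_dupacks += 1
        if self.plain_dupacks == 3:
            self.cwnd = max(self.cwnd // 2, 1)
            self.log.append(("retransmit", ack_no, "congestion"))

    def on_new_ack(self, ack_no):
        self.plain_dupacks = 0
        self.ewln_retransmitted = False


sender = EwlnSender(cwnd=16)
for _ in range(3):
    sender.on_dupack(2, ewln_flag=1)   # one retransmission, window unchanged
cwnd_after_wireless = sender.cwnd
sender.on_new_ack(4)
for _ in range(3):
    sender.on_dupack(4, ewln_flag=0)   # treated as congestion
```

The point of the sketch is the decoupling: the wireless loss triggers a retransmission but leaves the congestion window alone, while the unflagged loss reduces it.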

Software (2) – Example • Two packet losses occur over a wireless link in a single transmission window. • Figure labels: Receive 1; Not receive 2; Not receive 2; Not receive 4; Not receive 4; Not receive 4; If lose again?; Not receive 4; Receive 6 31 / 40

Software (2) – Simulation • No wireless link error 32 / 40

Software (2) – Simulation (cont.) congestion 33 / 40

Software – Comparison 34 / 40

Simulation – Environment • Paper : • TCP-DCR: A novel protocol for tolerating wireless channel errors • Software : • Linux 9 + NS 2.26 (DCR: modify tcp-sack1.cc) • Topology : • Tcl (additional setup): • Error Model (exponential) • Link Level Retransmission (LL/LLSnoop) 35 / 40

Simulation – DCR code verify • 1st dupack • 70.2242-69.9014 < 0.553: LL retransmission • 85.656-85.4978 > 0.144: Fast recovery • Time out
ack no 1131 received at 69.7674, cwnd=19
ack no 1131 received at 69.9014, cwnd=19
dcr start at 69.9014 [ack no=1131, delay time=0.553]
ack no 1131 received at 70.2204, cwnd=19
ack no 1131 received at 70.2242, cwnd=19
delay fast recovery at 70.2242! [ack no=1131]
ack no 1140 received at 70.7637, cwnd=19
dcr cancel at 70.7637 [ack no=1140]
ack no 4128 received at 85.4946, cwnd=19
ack no 4129 received at 85.4962, cwnd=19
ack no 4129 received at 85.4978, cwnd=19
dcr start at 85.4978 [ack no=4129, delay time=0.144]
ack no 4129 received at 85.6545, cwnd=19
ack no 4129 received at 85.656, cwnd=19
fast recovery begin at 85.656, dcr cancel! [ack no=4129]
ack no 4142 received at 85.6816, cwnd=9
ack no 1417 received at 87.2862, cwnd=22
ack no 1417 received at 87.4168, cwnd=22
dcr start at 87.4168 [ack no=1417, delay time=0.849]
ack no 1429 received at 91.4158, cwnd=1
dcr cancel at 91.4158 [ack no=1429]
ack no 1429 received at 91.4769, cwnd=2
36 / 40
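The two interval comparisons noted on this slide can be reproduced directly from the timestamps in the log. The helper name `elapsed_within_delay` is a hypothetical label introduced here; the numbers are copied from the log lines.

```python
def elapsed_within_delay(dcr_start, third_dupack, delay_time):
    """True if the third duplicate ACK arrived before the DCR delay expired."""
    return (third_dupack - dcr_start) < delay_time


# (label, dcr start, time of third duplicate ACK, delay time) taken from the log
cases = [
    ("ack 1131", 69.9014, 70.2242, 0.553),
    ("ack 4129", 85.4978, 85.656, 0.144),
]

results = {
    label: elapsed_within_delay(start, third, delay)
    for label, start, third, delay in cases
}
elapsed = {label: round(third - start, 4) for label, start, third, _ in cases}
```

For ack 1131 the elapsed time is about 0.3228 s, inside the 0.553 s delay, so fast recovery is postponed; for ack 4129 it is about 0.1582 s, beyond the 0.144 s delay, so fast recovery begins.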

Simulation – Performance (1) • Almost all errors are recovered by "Link level retransmission". • Some "Fast-recovery" and "Timeout" events still happen. 37 / 40

Conclusion • Papers for WLAN Fault Tolerance • Hardware Fault Tolerance : less • Software Fault Tolerance : more • Simulation / Experiment method • Hardware : Numerical examples or Experiment • Software : NS (Network Simulation tool) 40 / 40
