
Epidemic Algorithms



  1. Epidemic Algorithms by David Kjerrumgaard

  2. Introduction • A new class of networked systems is emerging that involve very large numbers of small, low-powered, wireless devices. • The sheer number of devices involved in such networks as well as the resource constraints of the nodes in terms of energy, storage, and processing necessitate the development of extremely simple algorithms for discovery, routing, multicast, and aggregation.

  3. While these algorithms are easy to build, they often exhibit complex and unexpected behavior when deployed in real-world scenarios, making them difficult to simulate accurately. • In their paper, Ganesan et al. provide a wealth of detailed empirical data from a network of over 150 such nodes. • This data is intended to serve as the basis for algorithm design in the wireless space.

  4. Instrumentation in the experiment focused on various levels of the protocol stack in an effort to isolate the various factors influencing the global behavior of the system. • At the Physical / Link Layer, they measured • Packet Loss • Effective Communication Range • Link Asymmetry • At the MAC Layer, they captured • Contentions • Collisions • Latency

  5. At the Network / Application Layer the structure of the trees constructed was analyzed.

  6. Epidemic Algorithms • Refers to network protocols that allow rapid dissemination of information from a source through purely local interactions. • Messages initiated from the source are rebroadcast by neighboring nodes, extending outward hop by hop until the entire network is reached.

  7. The following logic depicts the schema for message handling in a generalized epidemic protocol: Let S be the local state of the node and R a random number. If message Mi is received for the first time, then Take local action based on Mi: S ← f1(Mi, S). Compose message Mi’ = f2(Mi, S). Make Boolean retransmit decision D = f3(S, R). If D is true, then Transmit Mi’ to all neighbors. • Flooding, in which the nodes always retransmit the message upon reception, is the simplest example of an epidemic algorithm.
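As a sketch, the handler above can be expressed in Python, with the f1/f2/f3 hooks passed in as functions (all names here are illustrative, not from the paper):

```python
import random

def handle_message(msg, state, action, compose, should_retransmit, send):
    """Generalized epidemic message handler (sketch).
    Mirrors the slide's schema: S <- f1(Mi, S); Mi' = f2(Mi, S);
    D = f3(S, R); transmit Mi' to all neighbors if D is true."""
    if msg["id"] in state["seen"]:        # act only on first reception
        return state
    state["seen"].add(msg["id"])
    state = action(msg, state)            # S <- f1(Mi, S)
    msg_out = compose(msg, state)         # Mi' = f2(Mi, S)
    if should_retransmit(state, random.random()):  # D = f3(S, R)
        send(msg_out)                     # broadcast to all neighbors
    return state

# Flooding is the special case where the retransmit decision is always True:
flood_decision = lambda state, r: True
```

A probabilistic variant would instead use something like `lambda state, r: r < p`.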

  8. More sophisticated forms of flooding algorithms exist, including probabilistic, counter-based, distance-based, and cluster-based techniques that seek to minimize the amount of redundant packet transmissions. • This study employed a simple retransmit flooding protocol, which under ideal conditions would ripple outward uniformly from the source. • When a node first receives the message, it immediately rebroadcasts once, and squelches further retransmissions.
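Under ideal, lossless conditions this broadcast-once-and-squelch protocol behaves like a breadth-first traversal of the connectivity graph. A minimal simulation (my sketch, not the study's code):

```python
from collections import deque

def flood(adj, source):
    """Simulate the study's flood: each node rebroadcasts once on first
    reception, then squelches.  `adj` maps node -> list of neighbors.
    Returns the hop count at which each reached node first heard the message."""
    hops = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:          # u's single rebroadcast
            if v not in hops:     # squelch: duplicates are ignored
                hops[v] = hops[u] + 1
                q.append(v)
    return hops
```

Nodes absent from the returned map never received the flood at all.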

  9. Message Flooding

  10. Flooding Anomalies • Several indications of non-uniform flood propagation were observed during this study, including: • Backward Links: Links formed between nodes that extend backward geographically toward the source. • Stragglers: Nodes that missed the message entirely even though neighboring nodes did receive it. • Long Links: Links formed when the message was received over an unusually large distance, one that would normally span many hops. • Clustering: Most nodes in the tree had few descendants, while a few nodes had many.
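Given node coordinates and the reconstructed flood tree, the geometric anomalies can be labeled mechanically. A sketch using illustrative definitions (a link is "backward" if the child sits closer to the source than its parent, "long" if it exceeds a distance threshold; neither definition is quoted from the paper):

```python
import math

def classify_links(tree_parent, pos, source, long_threshold):
    """Label flood-tree links as 'backward' and/or 'long'.
    tree_parent maps node -> its parent in the flood tree;
    pos maps node -> (x, y) coordinates."""
    labels = {}
    for node, parent in tree_parent.items():
        d_parent = math.dist(pos[parent], pos[source])
        d_node = math.dist(pos[node], pos[source])
        link_len = math.dist(pos[node], pos[parent])
        tags = []
        if d_node < d_parent:          # link points back toward the source
            tags.append("backward")
        if link_len > long_threshold:  # received over an unusually long distance
            tags.append("long")
        labels[(parent, node)] = tags
    return labels
```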

  11. [Figure: examples of a backward link, a long link, a straggler, and clustering]

  12. Related Work • Prior experimental studies in this area have tended to focus on routing in infrastructure-less wireless ad hoc networks. • These studies comprised fewer than a dozen nodes and therefore did not address issues of scale.

  13. The other tool used in analyzing the behavior of routing protocols in large-scale multi-hop wireless networks is simulation. • The results of such studies were discounted due in large part to the difficulty of simulating physical and link layer characteristics in an accurate fashion. • Ultimately, a protocol’s performance must be validated in the real world.

  14. Experimental Platform • The study employed over 175 identically configured Rene motes equipped with: • 4 MHz Atmel processor • 8 KB of program memory • 512 B of data memory • 916 MHz single-channel, low-power radio • 10 kbps of raw bandwidth • Uniform antenna length & orientation (both unspecified) • TinyOS as the runtime system • Fresh AA batteries

  15. Each node uses a variation of the Carrier Sense Multiple Access (CSMA) protocol with a random backoff duration between 6 ms and 100 ms. • During the backoff period, the radio is powered off to conserve energy, effectively blocking all communication during this time. • The MAC protocol keeps retrying until a clear channel is found, so no packets are dropped at the sender.
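A sketch of that MAC loop (the `channel_busy` carrier-sense check is a stand-in for what the real motes do in the radio stack):

```python
import random

def csma_send(channel_busy):
    """CSMA as described on the slide: while the channel is sensed busy,
    back off for a random 6-100 ms with the radio powered off, then try
    again.  The loop never drops the packet; it waits for a clear channel."""
    waited_ms = 0.0
    while channel_busy():
        waited_ms += random.uniform(6, 100)  # radio off: no rx, no tx
    return waited_ms                         # total backoff before sending
```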

  16. Testing Methodology • Two separate sets of experiments were conducted for this study. The first set focused on the characteristics of links among all nodes in a large test bed, while the second set focused on the dynamics of the flooding.

  17. Experiment # 1 • 169 nodes were arranged in a 13x13 grid on an open parking structure, with a grid spacing of 2 feet. • The goal of this experiment was to map the connectivity characteristics between all nodes at 16 different radio transmit power settings between 60 and 72.

  18. The base station periodically issued commands to all nodes to control the experiment; this ensured that only one node would transmit at a time, thereby eliminating the possibility of collisions. • The receiving nodes transmitted in sequence in response to the commands sent by the base station. • At each power setting, each node was instructed to transmit 20 packets, 100 ms apart. Thus, a total of 54,080 (16 x 20 x 169) messages were sent during the four-hour test.

  19. Upon receipt of a message, the following information was extracted from the packet and logged in the receiver’s data memory: • Transmitter ID (1 – 169) • Sequence number of the message (1 – 54,080) • Transmit power setting (60 – 72)

  20. Analysis of Experiment # 1 • The analysis from the first set of experimental data focused on the physical and link layers. • The goals of the analysis were: • Explore packet loss statistics over distance. • Attempt to quantitatively define and measure the effective communication radius at each transmit power setting in a real-world scenario. • Establish a definition of what constitutes a bi-directional link and an asymmetric link, and measure the effects of each link type on communication.

  21. Packet Loss Statistics • For this study, packets that fail the CRC check are considered lost. • During the analysis they discovered that the distribution of packet loss over distance was non-uniform. This observation is in stark contrast to the uniform, simple binary relation on distance used in large-scale simulation studies, which model signal propagation using the function 1/r^α where α > 2. • The expected packet loss distribution:
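The simulators' abstraction can be written down in two lines: power decays polynomially with distance, and reception is a hard threshold on that power, which yields exactly the binary "disk" the empirical data contradicts. A sketch of the assumption, not of real radio behavior:

```python
def received_power(p_tx, r, alpha=2.5):
    """Idealized propagation: power decays as 1/r**alpha with alpha > 2."""
    return p_tx / (r ** alpha)

def disk_model_receives(p_tx, r, threshold, alpha=2.5):
    """Binary reception as modeled in simulation: success iff the received
    power clears a fixed threshold, i.e. iff the receiver lies inside a
    circle of fixed radius around the transmitter."""
    return received_power(p_tx, r, alpha) >= threshold
```

Under this model packet loss is a step function of distance; the study observed something far messier.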

  22. The observed packet loss distribution

  23. Radio Range • Often described in terms of signal strength; however, from an algorithmic standpoint, successful communication is what matters. • During the analysis they discovered that the decay of packet loss with respect to distance does not experience the polynomial falloff expected from the signal propagation function 1/r^α where α > 2. This was especially true at larger transmit power settings.

  24. During the analysis they discovered that the throughput never reached 100%, even at short distances from the transmitter. They attributed this phenomenon to two factors: • Increased fading rate due to deployment on the ground. • Insufficient signal processing and forward error correction due to the limited computational and energy resources available on this computing platform.

  25. Measuring the Connectivity Radius • Conceptually, the connectivity radius is thought of in terms of a circular cell. This approach simplifies algorithm analysis and allows a geometric approach. • We have already seen that this conceptualization does not fit the empirical data collected from this study. However, packet loss does decrease monotonically with distance.

  26. The definition of connectivity radius is typically based on a packet loss threshold, which, in turn, is based on the ratio of “good links” to “bad links”. • Good Links: Communication links on which forward error correction (FEC) and other techniques can raise the raw packet throughput to adequate levels. The packet reception probability of such a link is typically above 65%. • Bad Links: Communication links on which forward error correction (FEC) and other techniques cannot boost the throughput to acceptable levels. The packet reception probability of such a link is typically below 25%.
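Those thresholds reduce to a trivial classifier; links between 25% and 65% fall into an intermediate band the slide leaves unnamed. A sketch using the quoted numbers:

```python
def classify_link(reception_prob):
    """Bucket a link by packet reception probability, per the slide:
    above ~65% FEC can rescue it ('good'); below ~25% it cannot ('bad');
    anything in between is indeterminate."""
    if reception_prob > 0.65:
        return "good"
    if reception_prob < 0.25:
        return "bad"
    return "intermediate"
```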

  27. Given the previous definitions, we can define the connectivity radius of a node N to be the radius R of the smallest circle that encompasses 75% of the nodes having a good link with N. • During the analysis they observed a linear variation of the connectivity radius with the transmit power setting on the mote.
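That definition translates directly into code: sort the node's good-link neighbors by distance and take the distance that covers 75% of them (a sketch; the paper's exact computation may differ):

```python
import math

def connectivity_radius(node_pos, good_neighbor_positions, coverage=0.75):
    """Radius of the smallest circle around the node that contains
    `coverage` of the nodes it has a good link with."""
    if not good_neighbor_positions:
        return 0.0
    dists = sorted(math.dist(node_pos, p) for p in good_neighbor_positions)
    k = math.ceil(coverage * len(dists))  # neighbors that must fall inside
    return dists[k - 1]
```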

  28. Measure the Effects of Asymmetric and Bi-directional Links on Communication • Asymmetric Links: Those communication links that are “good” in one direction and “bad” in the other. • Bi-directional Links: Those communication links that are “good” in both directions.

  29. While asymmetric links arise relatively infrequently in sparse wireless networks, they are very common within a field of low-power wireless nodes. • The distribution of asymmetric links over the entire test network is shown below.

  30. The analysis of the data collected during the first experiment reveals that, for the range of transmit power settings studied, approximately 5-15% of all links are asymmetric, with the percentage increasing as the power setting decreases. • At short distances from the transmitter, a negligible percentage of links are asymmetric, but this percentage grows significantly with increasing distance, especially at lower power settings. • The distribution of bi-directional and asymmetric links over distance is shown below:
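Counting bidirectional versus asymmetric links from pairwise packet-reception rates is straightforward bookkeeping. A sketch reusing the good/bad thresholds from the link-quality definitions (`prr[(a, b)]` is the rate at which b hears a; illustrative, not the paper's code):

```python
def link_stats(prr, good=0.65, bad=0.25):
    """Count (bidirectional, asymmetric) node pairs from a dict of
    pairwise packet-reception rates: bidirectional means good both ways;
    asymmetric means good one way and bad the other."""
    bidir = asym = 0
    seen = set()
    for (a, b) in prr:
        pair = frozenset((a, b))
        if (b, a) not in prr or pair in seen:
            continue
        seen.add(pair)                    # visit each node pair once
        fwd, rev = prr[(a, b)], prr[(b, a)]
        if fwd > good and rev > good:
            bidir += 1
        elif (fwd > good and rev < bad) or (rev > good and fwd < bad):
            asym += 1
    return bidir, asym
```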

  31. Experiment # 2 • 156 nodes were arranged in a 13x12 grid on an open parking structure, with a grid spacing of 2 feet. • The base station was placed in the middle of the base of the grid and initiated flooding periodically, with each period lasting long enough to allow the flood to settle down. • Each receiving node rebroadcast the message immediately upon first receipt and then squelched all further broadcasts.

  32. Eight different transmit power settings were studied, and 10 non-overlapping floods were issued at each of these settings. • Upon receipt of a message, the following information was extracted from the packet and logged in the receiver’s data memory: • Transmitter ID (1 – 156), which was used to reconstruct the propagation tree. • Two locally generated timestamps, each with a granularity of 16 µs. • The first timestamp recorded the total amount of time that a message was stored on a node before being retransmitted. • The second timestamp recorded the interval for which the node was in backoff mode.

  33. Analysis of Experiment # 2 • The analysis from the second set of experimental data focused on the MAC and application layers. • The goals of the analysis were: • Capture different aspects of the message propagation, including: maximum backoff interval, reception latency, settling time, useless broadcasts, and collisions. • Analyze the routing tree construction of the epidemic algorithm.

  34. Medium Access Layer Analysis • Maximum Backoff Interval • A metric that captures the contention level within an interference cell is the maximum backoff interval, which reflects the time until contention subsides in each cell. • The distribution of backoff intervals in the network indicates the extent of contention that each node perceives in the channel. • As transmit power increases, contention grows as a result of interference cell growth. • As contention increases, nodes are forced to back off for increasingly longer intervals, as shown below:

  35. During the analysis they observed that the transmit power setting and the 95% backoff interval threshold were directly proportional to one another (see Table 3 in the paper).

  36. Reception Latency • Defined to be the total amount of time required by nodes in the network to receive an epidemic broadcast packet. • As expected, for higher transmit power settings the reception latency decreased proportionally with the network diameter. • An interesting observation is that a significant fraction of the total propagation time was taken to reach the last few nodes in each plot. • The following figure shows the relationship between the reception latency and the network diameter, which refers to the maximum number of hops from the source to any node in the network.

  37. Settling Time • Defined to be the time taken for delivery of a single packet flood throughout the entire network; it is determined by the reception latency and the maximum backoff interval. • The settling time is bounded as shown below: Max(MaxBackoffInterval, ReceptionLatency) ≤ SettlingTime ≤ MaxBackoffInterval + ReceptionLatency

  38. At low transmit power settings, the settling time is closer to the reception latency than the maximum backoff interval. This suggests that the flood propagation delay has a more significant impact than the time taken for broadcasts to subside within the interference cells. • As the transmit power is increased, the settling time moves closer to the maximum backoff interval, suggesting that contention within each interference cell becomes the dominant factor. • The relationship between settling time and reception latency is shown in the following diagrams.

  39. Useless Broadcasts • Defined to be the percentage of rebroadcasts that deliver a message only to nodes that have already received it. Typical causes for such broadcasts include: • All neighbors have already received the message • The rebroadcast suffers packet loss or collision • Analysis of the experimental data revealed that at higher transmit power settings, nodes in the network keep retransmitting the message long after 95% of the nodes have already received it. • Conversely, the lowest transmit power setting examined had only a 60% useless broadcast rate.
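From per-node first-reception times and a log of rebroadcasts, the useless-broadcast rate is easy to tally; a rebroadcast that nobody hears (lost or collided) also counts as useless. Illustrative bookkeeping, not the paper's instrumentation:

```python
def useless_broadcast_fraction(broadcasts, first_reception):
    """Fraction of rebroadcasts that reached only nodes which had
    already received the flood.  `broadcasts` is a list of
    (time, hearers) pairs; first_reception[n] is when node n first
    got the message."""
    if not broadcasts:
        return 0.0
    useless = 0
    for t, hearers in broadcasts:
        # useless if every hearer got the flood strictly earlier;
        # vacuously true when nobody heard it (loss / collision)
        if all(first_reception.get(n, float("inf")) < t for n in hearers):
            useless += 1
    return useless / len(broadcasts)
```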

  40. Collisions • During the analysis they observed that for all power settings, the time required for all nodes to receive the flood is nearly identical. • At very high transmit power, the last 5% of the nodes take as much time to receive their packets as the first 95%. This phenomenon can be attributed to stragglers and backward links. • In broadcast-style epidemic transmission, a packet does not have an intended recipient, so CSMA without RTS/CTS is used. However, we are able to combine the global ordering of packet transmissions and link layer estimates of the communication cell to infer the impact of collisions. • The following charts show the relation between the number of colliding transmitters, stragglers, and backward links.
