Basic Data Structures for IP lookups and Packet Classification
Prefix
• Length format: b_{n-1}…b_0/l (l is the prefix length)
  • In IPv4, d3.d2.d1.d0/l can also be used.
• Mask format: b_{n-1}…b_0/m_{n-1}…m_0 (prefix length is l)
  • m_j = 1 for n – 1 ≥ j ≥ n – l, and m_j = 0 otherwise.
  • d3.d2.d1.d0/m3.m2.m1.m0 for IPv4.
• Ternary format: b_{n-1}…b_{n-l}*…* (prefix length is l)
  • b_j = 0 or 1 for n – 1 ≥ j ≥ n – l.
  • Writing the ternary string as t_{n-1}…t_0: if t_k is *, then t_j must also be * for all j < k.
  • A single don't-care bit can denote a series of don't-care bits, e.g., 1* denotes 1**** in the 5-bit address space.
Prefix
• (n+1)-bit format: b_{n-1}…b_{n-l} 1 0…0 (l is the prefix length)
  • The l bits of the ternary prefix b_{n-1}…b_{n-l}* are followed by one trailing '1' and then n – l 0's.
• or
• (n+1)-bit format: b_{n-1}…b_{n-l} 0 1…1
  • The l bits of the ternary prefix b_{n-1}…b_{n-l}* are followed by one trailing '0' and then n – l 1's.
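The formats above can be converted mechanically. The following is a minimal sketch, assuming prefixes are given as ternary strings; the function names (parse_ternary, mask_of, to_marker_one, to_marker_zero) are hypothetical and chosen only for illustration.

```python
def parse_ternary(prefix: str, n: int):
    """Return (value, length) of a ternary prefix such as '11*' in an n-bit space."""
    bits = prefix.rstrip("*")            # one '*' may stand for a run of don't-cares
    if len(bits) > n or any(c not in "01" for c in bits):
        raise ValueError(f"bad prefix {prefix!r}")
    length = len(bits)
    value = int(bits, 2) << (n - length) if bits else 0
    return value, length


def mask_of(length: int, n: int) -> str:
    """Mask format: the top `length` bits are 1, the remaining bits are 0."""
    return format(((1 << length) - 1) << (n - length), f"0{n}b")


def to_marker_one(prefix: str, n: int) -> str:
    """(n+1)-bit format: prefix bits, one '1', then n - l zeros."""
    value, length = parse_ternary(prefix, n)
    return format((value << 1) | (1 << (n - length)), f"0{n + 1}b")


def to_marker_zero(prefix: str, n: int) -> str:
    """(n+1)-bit format: prefix bits, one '0', then n - l ones."""
    value, length = parse_ternary(prefix, n)
    return format((value << 1) | ((1 << (n - length)) - 1), f"0{n + 1}b")


if __name__ == "__main__":
    for p in ["*", "0*", "00*", "11*", "1110*", "11111"]:
        print(p, to_marker_one(p, 5), to_marker_zero(p, 5))
```

With the first encoding no prefix maps to 000000, and with the second none maps to 111111, because every code word contains the marker bit.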
5-bit Prefixes: b_{n-1}…b_{n-l} 1 0…0
[Figure: the 5-bit prefixes *****, 0****, 00***, 11***, 111**, 000**, 0000*, 0001*, 1110*, 1111* and the 5-bit addresses, placed in the 6-bit binary address space.]
• 000000 is not used.
5-bit Prefixes: b_{n-1}…b_{n-l} 0 1…1
[Figure: the same 5-bit prefixes and addresses, placed in the 6-bit binary address space.]
• 111111 is not used.
Prefix properties
• Disjoint prefixes:
  • Two prefixes are disjoint if they do not share any address.
• Prefix enclosure:
  • A = b_{n-1}…b_j…b_i* and B = b_{n-1}…b_j*, with j > i.
  • Prefix A is enclosed by B (B ⊃ A) since the IP address space covered by A is a subset of that covered by B, where ⊃ is the enclosure operator.
  • A special case of overlapping.
• Prefix comparison:
  • The inequality 0 < * < 1 is used to compare two prefixes in the ternary representation.
Prefix properties
• The most specific prefixes (MSP):
  • The prefixes that do not cover any others.
  • They are disjoint, so they can be put in an array for binary search.
• Grouping prefixes in layers based on MSP.
  • Six layers at most for IPv4 tables.
Prefix properties
[Figure: number of prefixes versus prefix length.]
Forwarding table example
• P1 is disjoint from the other three prefixes.
• P2 ⊃ P3 ⊃ P4
• Longest prefix match (LPM), not exact match.
• Enclosure makes (1) sorting prefixes and (2) binary searching prefixes difficult.
Example Forwarding Table
• Longest prefix match (LPM), not exact match.
• Prefix enclosure makes (1) sorting prefixes and (2) binary searching prefixes difficult.
• So, trie-based schemes emerge naturally.
Binary Trie (Radix Trie)
• Trie node: next-hop-ptr (if prefix), left-ptr, right-ptr
• Lookup 10111
• Add P5 = 1110*
[Figure: binary trie with nodes A to H holding prefixes P1 to P4, edges labeled 0 and 1; adding P5 creates node I.]
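A minimal sketch of such a trie follows. The node fields mirror the slide (next-hop pointer, left and right pointers). The prefix values used for P1 to P4 are assumptions made up for this example, since the forwarding table itself is not reproduced here; only P5 = 1110* and the lookup address 10111 come from the slide.

```python
class TrieNode:
    __slots__ = ("left", "right", "next_hop")

    def __init__(self):
        self.left = None        # child for bit 0
        self.right = None       # child for bit 1
        self.next_hop = None    # set only if this node represents a prefix


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop):
        """Insert a ternary prefix such as '1110*'."""
        node = self.root
        for bit in prefix.rstrip("*"):
            attr = "right" if bit == "1" else "left"
            child = getattr(node, attr)
            if child is None:
                child = TrieNode()
                setattr(node, attr, child)
            node = child
        node.next_hop = next_hop

    def lookup(self, address: str):
        """Longest prefix match: remember the last prefix node on the path."""
        node, best = self.root, None
        for bit in address:
            if node.next_hop is not None:
                best = node.next_hop
            node = node.right if bit == "1" else node.left
            if node is None:
                return best
        return node.next_hop if node.next_hop is not None else best


if __name__ == "__main__":
    trie = BinaryTrie()
    # Hypothetical table: P1 is disjoint from the rest, P2 encloses P3 encloses P4.
    for prefix, hop in [("00*", "P1"), ("1*", "P2"), ("101*", "P3"), ("10111", "P4")]:
        trie.insert(prefix, hop)
    trie.insert("1110*", "P5")
    print(trie.lookup("10111"))
```

The lookup walks at most one node per address bit, so the cost is bounded by the address length rather than by the number of prefixes.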
Binomial spanning tree
• A 4-cube and its corresponding binomial spanning tree.
[Figure: 4-cube with the path 0000, 1000, 1100, 1110, 1111; the edges along the path are labeled with dimensions 3, 2, 1, 0.]
Perfect code: Hamming code (7, 4)
• 7-cube example: 0000000, 1000000, 0100000, 0010000, 0001000, 0000100, 0000010, 0000001
• 7-cube = 2^4 (16) one-level binomial spanning trees
Perfect code: Hamming code (7, 4)
(a) Parity-check and generator matrices of Hamming code (7, 4).

H7 = 1 1 0 1 1 0 0      G7 = 1 0 0 0 1 1 0
     1 0 1 1 0 1 0           0 1 0 0 1 0 1
     0 1 1 1 0 0 1           0 0 1 0 0 1 1
                             0 0 0 1 1 1 1

(c) Decoding table

Syndrome  ErrorPattern
000       0000-000
001       0000-001
010       0000-010
011       0010-000
100       0000-100
101       0100-000
110       1000-000
111       0001-000

• r = received code
• Syndrome s = (s2 s1 s0) = r · H7^T (inner product of r with the transpose of H7)
• Corrected code = r + ErrorPattern[s]
Perfect code: Hamming code (7, 4)
• Generate the 16 codewords as u · G7.

u     Codeword
0000  0000-000
0001  0001-111
0010  0010-011
0011  0011-100
0100  0100-101
0101  0101-010
0110  0110-110
0111  0111-001
1000  1000-110
1001  1001-001
1010  1010-101
1011  1011-010
1100  1100-011
1101  1101-100
1110  1110-000
1111  1111-111
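The encoding and syndrome decoding steps can be sketched as follows. G7 and H7 are the systematic matrices implied by the codeword table (G7 = [I4 | P], H7 = [P^T | I3]); the function names are hypothetical.

```python
G7 = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H7 = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(u):
    """Codeword = u . G7 over GF(2); u is a list of 4 bits."""
    return [sum(u[i] * G7[i][j] for i in range(4)) % 2 for j in range(7)]


def syndrome(r):
    """(s2, s1, s0) = r . H7^T over GF(2); r is a list of 7 bits."""
    return tuple(sum(r[j] * row[j] for j in range(7)) % 2 for row in H7)


# Decoding table: the syndrome of each single-bit error identifies that error.
ERROR_PATTERN = {(0, 0, 0): [0] * 7}
for pos in range(7):
    e = [0] * 7
    e[pos] = 1
    ERROR_PATTERN[syndrome(e)] = e


def decode(r):
    """Correct up to one bit error and return the 4 data bits."""
    e = ERROR_PATTERN[syndrome(r)]
    corrected = [(a + b) % 2 for a, b in zip(r, e)]
    return corrected[:4]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[2] ^= 1                      # inject a single-bit error
    print(decode(word))               # recovers the original data bits
```

Because the code is systematic, the data bits are simply the first four bits of the corrected word.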
Perfect code: Golay code (23, 12)
• 2^12 3-level binomial spanning trees
• C(23,0) + C(23,1) + C(23,2) + C(23,3) = 1 + 23 + 23*22/2 + 23*22*21/(3*2) = 24 + 23*11 + 23*11*7 = 24 + 253*8 = 24 + 2024 = 2048 = 2^11
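The counting argument can be checked numerically: a t-error-correcting (n, k) binary code is perfect when the 2^k radius-t spheres around the codewords exactly fill the 2^n words of the n-cube. A small sketch, with a hypothetical helper name:

```python
from math import comb


def sphere_size(n: int, t: int) -> int:
    """Number of n-bit words within Hamming distance t of a given word."""
    return sum(comb(n, i) for i in range(t + 1))


def is_perfect(n: int, k: int, t: int) -> bool:
    """True if 2^k spheres of radius t tile the whole n-cube."""
    return (2 ** k) * sphere_size(n, t) == 2 ** n


if __name__ == "__main__":
    print(sphere_size(7, 1), is_perfect(7, 4, 1))      # Hamming (7, 4)
    print(sphere_size(23, 3), is_perfect(23, 12, 3))   # Golay (23, 12)
```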
Ranges
• Why ranges?
  • Prefixes can also be represented by ranges.
  • The source/destination port fields of rule tables for packet classification are ranges.
• Prefixes are special cases of ranges.
  • Prefix b_{n-1}…b_{n-l}* of length l is the range of addresses from b_{n-1}…b_{n-l}0…0 to b_{n-1}…b_{n-l}1…1, denoted as [b_{n-1}…b_{n-l}0…0, b_{n-1}…b_{n-l}1…1].
• Overlapping:
  • Two ranges are overlapping if they are not disjoint.
• Partially overlapping:
  • Two ranges are partially overlapping if they are neither disjoint nor enclosing.
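A short sketch of the prefix-to-range conversion and of the three relationships between ranges. Ranges are modeled as (start, finish) tuples with inclusive endpoints; the function names are hypothetical.

```python
def prefix_to_range(prefix: str, n: int):
    """Map a ternary prefix in an n-bit space to its inclusive address range."""
    bits = prefix.rstrip("*")
    pad = n - len(bits)
    low = int(bits + "0" * pad, 2)    # don't-care bits all 0
    high = int(bits + "1" * pad, 2)   # don't-care bits all 1
    return low, high


def relation(r1, r2) -> str:
    """Classify two inclusive ranges."""
    (e1, f1), (e2, f2) = r1, r2
    if f1 < e2 or f2 < e1:
        return "disjoint"
    if (e1 <= e2 and f2 <= f1) or (e2 <= e1 and f1 <= f2):
        return "enclosing"
    return "partially overlapping"


if __name__ == "__main__":
    print(prefix_to_range("01011*", 6))
    print(relation((10, 20), (15, 30)))
```

Two prefixes are always either disjoint or enclosing; partial overlap only arises with general ranges such as port ranges.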
Elementary Intervals for Ranges
• Definition: Let the set of k elementary intervals constructed from a set R of ranges in the address space 0 … N – 1 be X = {X_i | X_i = [e_i, f_i], for i = 1 to k}.
• X must satisfy the following:
  • e_1 = 0 and f_k = N – 1,
  • f_i = e_{i+1} – 1 for i = 1 to k – 1,
  • all addresses in X_i are covered by the same subset of R (called the range matching set of X_i), denoted by EI_i, and
  • EI_i ≠ EI_{i+1}, for i = 1 to k – 1.
Elementary Intervals for Ranges

                          Minus-1          Traditional
ID  Prefix    Range       start  finish    start  finish
P1  000000/2  [0, 15]     -      15        0      15
P2  010000/2  [16, 31]    15     31        16     31
P3  000100/4  [4, 7]      3      7         4      7
P4  100000/1  [32, 63]    31     -         32     63
P5  010110/5  [22, 23]    21     23        22     23
P6  110000/2  [48, 63]    47     -         48     63
P7  110000/4  [48, 51]    47     51        48     51
P8  110111/6  [55, 55]    54     55        55     55
P9  100000/3  [32, 39]    31     39        32     39
Elementary Intervals for Ranges
• Graphical view

EI    Matching set   Interval
EI1   {P1}           X1 [0, 3]
EI2   {P1,P3}        X2 [4, 7]
EI3   {P1}           X3 [8, 15]
EI4   {P2}           X4 [16, 21]
EI5   {P2,P5}        X5 [22, 23]
EI6   {P2}           X6 [24, 31]
EI7   {P4,P9}        X7 [32, 39]
EI8   {P4}           X8 [40, 47]
EI9   {P4,P6,P7}     X9 [48, 51]
EI10  {P4,P6}        X10 [52, 54]
EI11  {P4,P6,P8}     X11 [55, 55]
EI12  {P4,P6}        X12 [56, 63]
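The elementary intervals can be derived from the range endpoints. The sketch below is one straightforward way to do it, not necessarily how the original material computes them: cut the address space at every range start and at every finish + 1, then merge neighbors whose matching sets are equal. The function name is hypothetical; the nine ranges are the ones in the example table.

```python
def elementary_intervals(ranges: dict, n_addresses: int):
    """Return a list of (start, finish, matching_set) covering 0 .. n_addresses - 1."""
    cuts = {0, n_addresses}
    for start, finish in ranges.values():
        cuts.add(start)
        cuts.add(finish + 1)
    cuts = sorted(cuts)

    result = []
    for lo, nxt in zip(cuts, cuts[1:]):
        hi = nxt - 1
        # No range endpoint falls strictly inside [lo, hi], so a range either
        # covers the whole piece or none of it.
        matching = frozenset(
            name for name, (e, f) in ranges.items() if e <= lo and hi <= f
        )
        if result and result[-1][2] == matching:
            # Adjacent pieces with equal matching sets form one interval.
            result[-1] = (result[-1][0], hi, matching)
        else:
            result.append((lo, hi, matching))
    return result


RANGES = {
    "P1": (0, 15), "P2": (16, 31), "P3": (4, 7), "P4": (32, 63), "P5": (22, 23),
    "P6": (48, 63), "P7": (48, 51), "P8": (55, 55), "P9": (32, 39),
}

if __name__ == "__main__":
    for i, (lo, hi, names) in enumerate(elementary_intervals(RANGES, 64), start=1):
        print(f"X{i} [{lo}, {hi}] {sorted(names)}")
```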
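Segment Tree
[Figure: segment tree over the elementary intervals X1 [0,3], X2 [4,7], X3 [8,15], X4 [16,21], X5 [22,23], X6 [24,31], X7 [32,39], X8 [40,47], X9 [48,51], X10 [52,54], X11 [55,55], X12 [56,63]. The root w has key 23, with children y (key 7) and z (key 47); the other internal nodes u, v, g, q, h, s, r, t carry keys such as 3, 15, 31, 54, 21, 39, 51, 55. The leaf nodes are the elementary intervals, and the labels P1 to P9 mark the nodes where each range is stored.]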
Interval Tree
• Each node in an interval tree is associated with a key, which must be covered by at least one range.
• There are two variants, depending on whether a node can store one range or more than one.
• Fat interval tree:
  • Each node is allowed to store more than one range.
  • The number of nodes in the interval tree is O(N).
  • To insert a range R = [e, f]: if R covers the root's key, R is stored in the root. Otherwise, R is inserted in the left (right) subtree of the root when f is smaller (e is larger) than the key of the root.
  • When R does not cover the key of any node that is traversed, a new node with a key selected from addresses e to f is created and inserted as the left or right child of the node that was last visited.
  • Search takes O(log N + k) time, where k is the number of prefixes that match the given address.
  • Prefix insertion and deletion are very expensive because ranges in some nodes may need to be relocated after tree rotations.
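The insertion rule for the fat interval tree can be sketched as below. This is a simplified, unbalanced version: it performs no rotations, scans every range stored in a visited node, and picks the midpoint of [e, f] as the key of a new node (the slide only says the key is selected from e to f). It therefore illustrates the placement rule, not the O(log N + k) bound. Class and method names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.ranges = []      # every range stored here covers self.key
        self.left = None
        self.right = None


class FatIntervalTree:
    def __init__(self):
        self.root = None

    def insert(self, e, f):
        if self.root is None:
            self.root = Node((e + f) // 2)
            self.root.ranges.append((e, f))
            return
        node = self.root
        while True:
            if e <= node.key <= f:            # R covers this node's key
                node.ranges.append((e, f))
                return
            side = "left" if f < node.key else "right"
            child = getattr(node, side)
            if child is None:                 # no traversed key was covered
                child = Node((e + f) // 2)    # assumed key choice: midpoint
                child.ranges.append((e, f))
                setattr(node, side, child)
                return
            node = child

    def matches(self, addr):
        """Collect every stored range that contains addr."""
        found, node = [], self.root
        while node is not None:
            found.extend(r for r in node.ranges if r[0] <= addr <= r[1])
            if addr == node.key:
                break
            node = node.left if addr < node.key else node.right
        return found

    def longest_match(self, addr):
        """The narrowest matching range plays the role of the longest prefix."""
        found = self.matches(addr)
        return min(found, key=lambda r: r[1] - r[0]) if found else None


if __name__ == "__main__":
    tree = FatIntervalTree()
    for r in [(0, 15), (16, 31), (4, 7), (32, 63), (22, 23),
              (48, 63), (48, 51), (55, 55), (32, 39)]:
        tree.insert(*r)
    print(tree.matches(55), tree.longest_match(55))
```

Every range containing an address lies on the search path for that address, because a range placed in a left (right) subtree ends before (starts after) the ancestor's key.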
Interval Tree
• Thin interval tree:
  • Each node of the interval tree stores exactly one range.
  • Since ranges may overlap, two comparison rules decide whether a range is smaller or larger than another. For two ranges R1 = [e1, f1] and R2 = [e2, f2]:
    • R1 < R2 if e1 < e2. If there is a tie, the second rule applies.
    • R1 < R2 if R2 is a subrange of R1 (i.e., e1 = e2 and f2 < f1).
  • Each node also stores a max value: the maximum finish endpoint of all ranges stored in the subtree rooted at that node.
  • In contrast with the fat interval tree, prefix insertion and deletion take O(log N) time. However, O(min{N, k log N}) time is needed to find the longest matching prefix as well as the highest-priority matching prefix, where k is the number of matched prefixes for a given address.
Hash Table
• Narrowing down the search space.
• Index = Hash_function(key)%m, where key may be the first k bits of IP addresses and m is the size of the hash table.
• Perfect hash: no collision.
• Minimal perfect hash: a perfect hash whose table has exactly as many elements as there are distinct keys.
Hash Table
• Difficulties: prefixes and ranges cannot be used directly as the keys of the hash functions.
[Figure: array of m elements; key k1 maps to H(k1)%m and key k2 maps to H(k2)%m, and a collision occurs when both map to the same element.]
Hash Table: 8-bit Segmentation table
• An 8-bit segmentation table is usually used for IPv4 forwarding tables because there is no prefix of length shorter than 8.
[Figure: array of 256 elements (0 to 255) indexed by H(prefix)%256, i.e., the 8 MSBs of the prefix; for example, element 0 holds prefixes 0.x.y.z. Each element points to the set of prefixes with the same first 8 MSBs, which may be an empty set.]
Hash Table: 16-bit Segmentation table
• Prefixes of length <= 16 must be stored properly.
  • For example, duplicate 0.0.b.c/15 into buckets 0 and 1, or store the port of 0.0.b.c/15 into elements 0 and 1.
  • Alternatively, put them into another set (good for updates, but two sets must be searched in the worst case).
[Figure: array of 2^16 elements (0 to 2^16 – 1) indexed by H(prefix)%2^16, i.e., the 16 MSBs of the prefix; for example, element 0 holds prefixes 0.0.y.z. Each element points to the set of prefixes with the same first 16 MSBs, which may be an empty set. A separate box is labeled "Prefixes of length 16".]
Hash Table: Compression
• Since there are many empty elements in the segmentation table, a bitmap can be used to compress it.
[Figure: a 2^16-bit bitmap containing M 1's (e.g., 1 1 0 0 … 0 1 1 0 0 1 1) and an array of M elements (0 to M – 1); for example, element 0 holds prefixes 0.0.y.z and element 1 holds prefixes 0.1.y.z. Each array element holds the prefixes with the same first 16 MSBs and must be non-empty.]
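A minimal sketch of the bitmap compression, assuming the bucket for segment s is found by counting the 1 bits below position s in the bitmap. The class and function names are hypothetical, and the bitmap is held in a single Python integer for brevity; a real implementation would typically keep precomputed running counts so that the lookup does not scan the bitmap.

```python
import ipaddress


def segment_of(ip: str, bits: int = 16) -> int:
    """The top `bits` bits of an IPv4 address."""
    return int(ipaddress.IPv4Address(ip)) >> (32 - bits)


class CompressedSegmentTable:
    def __init__(self, buckets: dict, bits: int = 16):
        """`buckets` maps a segment number to its non-empty prefix set."""
        self.bitmap = 0
        self.array = []                        # M elements, one per 1 bit
        for seg in sorted(buckets):
            if not 0 <= seg < (1 << bits):
                raise ValueError(f"segment {seg} out of range")
            self.bitmap |= 1 << seg
            self.array.append(buckets[seg])

    def lookup(self, segment: int):
        if not (self.bitmap >> segment) & 1:
            return None                        # empty element, nothing stored
        below = self.bitmap & ((1 << segment) - 1)
        return self.array[bin(below).count("1")]   # rank = index into array


if __name__ == "__main__":
    table = CompressedSegmentTable({
        0: ["0.0.y.z prefixes"],
        1: ["0.1.y.z prefixes"],
        300: ["1.44.y.z prefixes"],
    })
    print(table.lookup(segment_of("0.1.2.3")))
    print(table.lookup(segment_of("9.9.9.9")))
```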
Bloom filter
• H1(key) = P1
• H2(key) = P2
• H3(key) = P3
• H4(key) = P4
• …
• Hk(key) = Pk
• Hi() is a hash function, e.g., MD5.
[Figure: bit vector of m bits; the bits at positions P1 … Pk are set to 1.]
Bloom filter
• After inserting n keys (kn bit settings), the probability that a particular bit is still 0 is (1 – 1/m)^{kn}.
• So, the probability of a false positive is p = (1 – (1 – 1/m)^{kn})^k ≈ (1 – e^{–kn/m})^k.
• The right-hand side is minimized when k = (ln 2)(m/n).
  • m/n = 6, k = 4: p = 0.0561
  • m/n = 8, k = 6: p = 0.0215
  • m/n = 12, k = 8: p = 0.00314
  • m/n = 16, k = 11: p = 0.000458
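The listed values can be reproduced from the approximation (1 – e^{–kn/m})^k. A small sketch with hypothetical function names; note that the listed k values are the optimum (ln 2)(m/n) rounded to an integer.

```python
import math


def false_positive(m_over_n: float, k: int) -> float:
    """Approximate false-positive probability (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-k / m_over_n)) ** k


def optimal_k(m_over_n: float) -> float:
    """The real-valued k that minimizes the approximation."""
    return math.log(2) * m_over_n


if __name__ == "__main__":
    for ratio, k in [(6, 4), (8, 6), (12, 8), (16, 11)]:
        print(f"m/n={ratio:2d} k={k:2d} "
              f"optimal k={optimal_k(ratio):.2f} p={false_positive(ratio, k):.6f}")
```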
Bloom filter
• Update:
  • Update the whole SC (summary cache).
  • Threshold: when the digests differ beyond a threshold, say, 5% or 10%.
  • Regular time intervals: say, every 5 minutes.
Counting Bloom filter
• Deletion operation for the local digest:
  • For each bit in the m-bit vector, use an l-bit counter to record the number of times that bit is turned on by different URLs.
  • l = 4 by experience.
• If deletion is not supported, the cache summary must be rebuilt from scratch on a periodic basis to erase stale bits and prevent bit pollution.
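A minimal sketch of a counting Bloom filter with l-bit counters. Deriving the k hash functions by salting MD5 with the function index is an assumption made for this example, as is the policy of leaving saturated counters untouched on deletion; the class name is hypothetical.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m: int, k: int, counter_bits: int = 4):
        self.m = m
        self.k = k
        self.max_count = (1 << counter_bits) - 1   # 15 for 4-bit counters
        self.counters = [0] * m

    def _positions(self, key: str):
        # Assumed construction: H_i(key) = MD5(i ":" key) mod m.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: str):
        for p in self._positions(key):
            if self.counters[p] < self.max_count:  # saturate, never overflow
                self.counters[p] += 1

    def __contains__(self, key: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(key))

    def remove(self, key: str):
        if key not in self:
            raise KeyError(key)
        for p in self._positions(key):
            # A saturated counter has lost count, so it is left as it is.
            if 0 < self.counters[p] < self.max_count:
                self.counters[p] -= 1


if __name__ == "__main__":
    cbf = CountingBloomFilter(m=1024, k=4)
    cbf.add("http://example.com/a")
    print("http://example.com/a" in cbf)
    cbf.remove("http://example.com/a")
    print("http://example.com/a" in cbf)
```

Without the counters, clearing a bit on deletion could also erase it for other keys that hash to the same position, which is why a plain Bloom filter has to be rebuilt instead.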