Efficient HEXA-Based Data Structures for Faster Packet Processing in Network Applications

HEXA : Compact Data Structures for Faster Packet Processing Authors: Sailesh Kumar, Jonathan Turner, Patrick Crowley ,and Michael Mitzenmacher Publisher:ICNP 2007 Present:Yu-Tso Chen Date:JUNE, 10, 2008 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.

Outline • 1. Introduction • 2. Proposed Approach • 3. Architecture Description • 4. Performance Evaluation

Introduction • HEXA (History-based Encoding, eXecution and addressing) • Directed graphs are commonly used to implement various packet processing algorithms • Used a fixedconstant numbers of bits per node for structured graphs such as tries

Motivating Example Standard representation 1. 0, 2, 3 2. 0, 4, 5 3. 1, NULL, 6 4. 1, NULL, NULL 5. 0,7, 8 6. 1, NULL, NULL 7. 0, 9, NULL 8. 1, NULL, NULL 9. 1, NULL, NULL 1. - 2. 0 3. 1 4. 00 5. 01 6. 11 7. 010 8. 011 9. 0100 HEXA identifier by the input stream

Motivating Example • HEXA requires a hash function • Store only 3 bits worth of information for each node of trie • First bit is valid prefix • Second and third bits are if node has a left and right child

Motivating Example 1. f(-) =4 2. f(0) =7 3. f(1) =9 4. f(00) = 2 5. f(01) = 8 6. f(11) = 1 7. f(010) = 5 8. f(011) = 3 9. f(0100) = 6 • Require a minimal perfect hash function • Maintaining the one-to-one mapping 1. h(00 -) = 0 2. h(00 0) = 1 3. h(01 1) = 4 4. h(11 00) = 7 5. h(10 01) = 2 6. h(00 11) = 8 7. h(11 010) = 6 8. h(11 011) = 5 9. h(01 0100) = 3 Fast path Next hop

Example • Input stream 011 • Start at index f(-) = 4 next input is 0 has left childf(0) = 7 • f(0) next input is 1 has right childf(01) = 8 • f(01) next input is 1 has right childf(011) = 3 match and no child stop search. • Then to read the next hop is P4

Child’s discriminator give parent 1. h(00 -) = 0 2. h(00 0) = 1 3. h(01 1) = 4 4. h(11 00) = 7 5. h(10 01) = 2 6. h(00 11) = 8 7. h(11 010) = 6 8. h(11 011) = 5 9. h(01 0100) = 3 Fast path Next hop

Example (to be cont.) • Input stream 011 • Start at index h(00 -) = 0 next input is 0 has childh(00 0) = 1 • h(00 0) next input is 1 has childh(10 01) = 2 • h(01) next input is 1 has childh(011) = 5 match and no child stop search. • Then to read the next hop is P4

Devising One-to-one Mapping • Simplify the problem (perfect hash function) by HEXA identifier of a node can be modified without changing its meaning and keeping it unique. • Node identifier to contain few additional (say c) bits • We call these c-bits the node’s discriminator • Hence up to 2c memory locations, from which we have to pick just one • Use multiple-choice hashing and cuckoo hashing

Devising One-to-one Mapping • Instead of storing a bit for each left and right child • We store the discriminator if the child exists. • All-0 c-bit word to represent NULL • Only 2c-1 memory locations • This problem can be viewed as a bipartite graph matching problem. • G=(V1+V2, E) • Left set – Original directed graph • Right set – Locations memory • 2c edges connected to random right vertices

Memory mapping graph

Updating a Perfect Matching • Deletions are easy (Check by Fig.1) • Simply remove the relevant node from the hash table (and update pointers to that node) • Insert a node but its hash locations are already taken • Find a augmenting path in G • Augmenting path can be found via a BFS [1] • Remapping other nodes to other locations • By changing their discriminator bits

Bounded HEXA (BHEXA) 1. no,2, 1,7 2. no, 2, 3, 7 3. no, 2, 4, 6 4. no,5, 1,7 5. match, 2, 3, 7 6. match, 8, 1, 7 7. no,8, 1,7 8. no, 2, 9, 7 9. match, 2, 4, 6 • Cyclic graph – Aho-Corasick 1. - 2. -, a 3. -, b, ab 4. -, b, bb, abb 5. -, a, ba, bba, abba 6. -, c, bc, abc 7. -, c 8. -, a, ca 9. -, b, ab, cab

Motivating Example • Root node is “-” • All incoming edges into node 2 are labeled with “a” • Thus its identifier can either be – or a • The identifier of node 7 can be – or c • Both paths 1-a->2-b->3 and 9-b->4-a->5-b->3 lead to the node 3 • And two symbols in these paths are identical • Its identifier can either be – or b or ab • We must ensure the ones we choose are unique.

Memory Mapping • If a node has k choices then up to log2k additional bits to indicate the length of its identifier • Node 5 has 5 choices；hence 3-bits may be needed • Only c + log2k bits worth of information is required to be stored

Memory mapping graph 1. h(-) = 0 2. h(a) = 1 3. h(ab) = 5 4. h(bb) = 6 5. h(bba) = 9 6. h(bc) = 8 7. h(c) = 3 8. h(ca) = 4 9. h(b) = 2

Memory Mapping • We only store the length of bHEXA identifiers in the memory 1. h(-) = 0 2. h(a) = 1 3. h(ab) = 5 4. h(bb) = 6 5. h(bba) = 9 6. h(bc) = 8 7. h(c) = 3 8. h(ca) = 4 9. h(b) = 2 Standard implementation (13-bits per node) bHEXA uses about half memory (7-bits per node)

Example • Input stream abba • Start at h(-) = 0 next input is a h(a) = 1 • Next input is b h(ab) = 5 then to h(bb) = 6 • Then can find h(bba) = 9 match

Experimental Evaluation

Experimental Evaluation • Multi-bit tries

Incremental Updates PDF of the number of memory operations required to perform a single trie update. Left trie size = 100,000 nodes, Right trie size = 10,000 nodes.

Result on Strings • Implemented with Aho-Corasick • Spill fraction – number of automaton nodes that could not be mapped to a memory location • Summarize bHEXA achieve between 2-5 fold reductions in the memory

Result on Strings Cisco622 有很多state 有相同的bHEXA identifiers Plotting spill fraction: a) Aho-Coroasick automaton for random strings sets, b) Aho-Coroasick automaton for real world string sets, and c) random and real world strings with bit-split version of Aho-Corasick. 4 state-machines, each handling two bits of the 8-bit input character

Tanks for your attention

Devising One-to-one Mapping • If we require m=n, c is loglogn + O(1) to ensure perfect matching exists with high probability • Fig.2 we assume that the hash function is simply the numerical value of the identifier modulo 9

Memory Mapping • m=10 • h=(sum from i=1to k Si x i )mod10 • - = 0, a = 1, b = 2,c = 3

Efficient HEXA-Based Data Structures for Faster Packet Processing in Network Applications