Packet Classification using Extended TCAMs

Packet Classification using Extended TCAMs
Edward W. Spitznagel, Jonathan S. Turner, David E. Taylor
Supported by NSF ANI-9813723, DARPA N660001-01-1-8930

Packet Classification Problem
[Filter table. Columns: Filter, Source Address, Destination Address, Source Port, Destination Port, Protocol, Action. Filters: a, b, c, d. Address values: 11xx, 0101, 1101, 01xx, 0010, 01xx, 101x, xxxx. Port values: -, 2-4, 3, 3-15, -, 3-15, *, 0-15. Protocols: UDP, ICMP, TCP, *. Actions: fwd 7, fwd 2, deny, fwd 5.]
• Suppose you are a firewall, or QoS router, or network monitor ...
• You are given a list of rules (filters) to determine how to process incoming packets, based on the packet header fields
• Some fields in the rules are specified with bit masks; others with ranges
• Goal: when a packet arrives, find the first rule that matches the packet’s header fields

Packet Classification Problem
• Example: a packet arrives with header (0101, 0010, 3, 5, UDP) and is checked against the filter table from the previous slide
• Classification result: filter b is matched
• Filter c also matches, but b occurs before c in the list
• Easy to do when we have only a few rules; very difficult when we have 100,000 rules and packets arrive at 40 Gb/s

Geometric Representation
• Filters with K fields can be represented geometrically in K dimensions
• Example: filters a, b, c on two fields, Source Address (values 010, xx1, xxx) and Source Port (values 2-3, 0-7, 7)
[Figure: the filters drawn as labeled regions in the plane, with Source Address and Source Port as the axes (ticks 0-6).]

Related Work
• TCAM-based parallel classification
  • CoolCAMs (Narlikar, Basu, Zane) for IP lookup
• SRAM-based sequential classification
  • Recursive Flow Classification (Gupta, McKeown)
  • HiCuts (Gupta, McKeown)
  • Extended Grid of Tries (Baboescu, Singh, Varghese)
  • HyperCuts (Singh, Baboescu, Varghese, Wang)
• SRAM: 6 transistors per bit (vs. 16 for TCAM), but the SRAM approaches use more bits per filter

Ternary CAMs
• Most popular practical approach to high-performance packet classification
• Hardware compares query word (packet header) to all stored words (filters) in parallel
  • Each bit of a stored word can be 0, 1, or X (don’t care)
• Very fast, but not without drawbacks:
  • High power consumption limits scalability
  • Inefficient representation of ranges

Ternary CAM - Example
• Packet: Src. Addr. 1110, Dest. Addr. 0110; Query: 11100110
• Each filter (a, b, c) is stored as its Source Address and Destination Address patterns concatenated; filter a = (11xx, xxxx) is stored as 11xxxxxx
• TCAM contents:
  Address   Contents   Query      Result
  0         11xxxxxx   11100110   Match!
  1         0xxx01xx   11100110   Doesn’t Match
  2         xxxx0110   11100110   Match!
• Entry 0 (filter a) is the first matching filter
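The lookup on this slide can be mimicked in software. The following is a minimal sketch, not the hardware design: the function names `ternary_match` and `tcam_lookup` are made up for illustration, and the loop stands in for what a real TCAM does across all entries in parallel.

```python
from typing import List, Optional


def ternary_match(pattern: str, word: str) -> bool:
    """True if every non-'x' bit of the stored pattern equals the query bit."""
    return len(pattern) == len(word) and all(
        p in ("x", b) for p, b in zip(pattern, word)
    )


def tcam_lookup(entries: List[str], query: str) -> Optional[int]:
    """Return the lowest address whose stored word matches, or None."""
    for address, pattern in enumerate(entries):  # sequential stand-in for parallel compare
        if ternary_match(pattern, query):
            return address
    return None


# TCAM contents and query taken from the slide.
tcam = ["11xxxxxx", "0xxx01xx", "xxxx0110"]
query = "1110" + "0110"  # source address followed by destination address

matches = [a for a, p in enumerate(tcam) if ternary_match(p, query)]
first = tcam_lookup(tcam, query)
print(matches, first)  # entries 0 and 2 match; entry 0 wins
```

Returning the lowest matching address is what makes the order of entries act as the filter priority.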

Range Matching in TCAMs
• Filter F: Source Port 1-4, Destination Port 3-5
• Convert ranges into sets of prefixes
  • 1-4 becomes 001, 01*, and 100
  • 3-5 becomes 011 and 10*
[Figure: filter F drawn as a rectangle in the Source Port / Destination Port plane (axis ticks 0-6).]
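The range-to-prefix conversion can be sketched with a greedy split into aligned power-of-two blocks. This is one common way to do it, offered as an illustration rather than as the authors' procedure; `range_to_prefixes` is a hypothetical helper name, and it writes one `*` per wildcard bit, which coincides with the slide's notation for these 3-bit examples.

```python
def range_to_prefixes(lo, hi, width):
    """Split the integer range [lo, hi] into prefixes over `width` bits."""
    prefixes = []
    while lo <= hi:
        # Largest aligned power-of-two block that starts at lo ...
        size = lo & -lo if lo else 1 << width
        # ... shrunk until it no longer runs past hi.
        while size > hi - lo + 1:
            size >>= 1
        free_bits = size.bit_length() - 1          # number of wildcard bits
        fixed_len = width - free_bits
        fixed = format(lo >> free_bits, "b").zfill(fixed_len) if fixed_len else ""
        prefixes.append(fixed + "*" * free_bits)
        lo += size
    return prefixes


print(range_to_prefixes(1, 4, 3))  # source port range of filter F
print(range_to_prefixes(3, 5, 3))  # destination port range of filter F
```

For the two ranges of filter F this yields the same prefix sets as the slide: 001, 01*, 100 and 011, 10*.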

Range Matching in TCAMs
• Filter F expands into six TCAM entries (a-f), one for each pairing of a Source Port prefix (001, 01*, 100) with a Destination Port prefix (011, 10*)
• With two 16-bit range fields, a single rule could require up to 900 TCAM entries!
• Typical case: entire filter set expands by a factor of 2 to 6
[Figure: entries a-f drawn as rectangles in the Source Port / Destination Port plane (axis ticks 0-6).]
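The figure of 900 follows from the worst case of prefix expansion: a w-bit range can need up to 2w - 2 prefixes, so 30 for a 16-bit field, and two such fields multiply to 30 × 30 = 900. The sketch below checks this by counting; `prefix_count` is a hypothetical helper using the greedy aligned-block split, and the range 1 to 65534 is chosen here as a known worst case, not taken from the slides.

```python
def prefix_count(lo, hi, width):
    """Count the prefixes needed to cover [lo, hi] over `width` bits."""
    count = 0
    while lo <= hi:
        size = lo & -lo if lo else 1 << width   # largest aligned block at lo
        while size > hi - lo + 1:               # shrink to stay within hi
            size >>= 1
        count += 1
        lo += size
    return count


# Worst case for one 16-bit field: everything except the two end values.
per_field = prefix_count(1, 2**16 - 2, 16)
worst_case_entries = per_field * per_field

# The small example from the slides: ranges 1-4 and 3-5 over 3 bits.
filter_f_entries = prefix_count(1, 4, 3) * prefix_count(3, 5, 3)

print(per_field, worst_case_entries, filter_f_entries)
```

The last value reproduces the six entries a-f that filter F occupies in a standard TCAM.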

Extended TCAMs
• Extend standard TCAM architecture to enable classification with larger rulesets
• Partitioned TCAM, for reduced power
  • Inspired by CoolCAMs
  • Differences in indexing, search and partitioning algorithms
• Support range matching directly in hardware

Use of Partitioned TCAM
• Main component of power use in TCAM search is proportional to number of entries searched
• Partitioning the TCAM:
  • Divide TCAM into blocks of entries
  • Each block is enabled for search via an associated index filter

Use of Partitioned TCAM
• Example: suppose we are given the following filters:
  a. 1-13, 001x
  b. 2-3, 00xx
  c. 9-10, xxx1
  d. 11-14, 011x
  e. 12-13, 0xxx
  f. 0-14, 1010
  g. 7-7, 110x
  h. 0-5, 1110
  i. 1-2, 1x1x
  j. 13-14, 11xx
  k. 11-15, 111x
• Index filters, each followed by the filter block it enables:
  0-15, 0xxx: 1-13, 001x; 2-3, 00xx; 11-14, 011x; 12-12, 01xx
  0-6, 1xxx: 0-5, 1110; 1-2, 11xx
  7-15, 1xxx: 7-7, 110x; 13-14, 11xx; 11-15, 111x
  0-15, xxxx: 9-10, xxxx; 0-14, 1010
• A real Extended TCAM would have more blocks, and more filters per block.

Use of Partitioned TCAM
• Example: classify packet with header values (2, 1010), using the index filters and filter blocks from the previous slide
  • Index block: second and fourth filters match
  • Search second and fourth filter blocks
  • Find matching filters (1-2, 1x1x) and (0-14, 1010)
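The two-stage search can be sketched as follows. This is an illustrative model only: the names `FILTERS`, `INDEX`, `matches`, and `classify` are invented here, the filter values are assumed to be those in the lettered list a-k on the earlier slide, and the assignment of filters to blocks follows the index entries shown in the grouping example.

```python
def ternary_match(pattern, bits):
    """True if every non-'x' pattern bit equals the corresponding query bit."""
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def matches(rule, packet):
    """A rule is ((lo, hi), ternary pattern); a packet is (value, bit string)."""
    (lo, hi), pattern = rule
    value, bits = packet
    return lo <= value <= hi and ternary_match(pattern, bits)


# Assumed filter values: the lettered list a-k from the slides.
FILTERS = {
    "a": ((1, 13), "001x"), "b": ((2, 3), "00xx"), "c": ((9, 10), "xxx1"),
    "d": ((11, 14), "011x"), "e": ((12, 13), "0xxx"), "f": ((0, 14), "1010"),
    "g": ((7, 7), "110x"), "h": ((0, 5), "1110"), "i": ((1, 2), "1x1x"),
    "j": ((13, 14), "11xx"), "k": ((11, 15), "111x"),
}

# Each index filter enables one block of filters.
INDEX = [
    (((0, 15), "0xxx"), ["a", "b", "d", "e"]),
    (((0, 6), "1xxx"), ["h", "i"]),
    (((7, 15), "1xxx"), ["g", "j", "k"]),
    (((0, 15), "xxxx"), ["c", "f"]),
]


def classify(packet):
    """Search the index first, then only the blocks whose index filter matched."""
    searched, found = [], []
    for block_no, (index_rule, block) in enumerate(INDEX, start=1):
        if matches(index_rule, packet):
            searched.append(block_no)
            found += [name for name in block if matches(FILTERS[name], packet)]
    return searched, found


searched, found = classify((2, "1010"))
print(searched, found)
```

For the packet (2, 1010) only blocks 2 and 4 are searched, and the matching filters are i (1-2, 1x1x) and f (0-14, 1010), as on the slide.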

Use of Partitioned TCAM
• The key to minimizing power consumption: organize filters so that only a few TCAM blocks must be searched to find the filters matching a packet.
• Use a filter grouping algorithm

[Figure sequence: filters a-k drawn as labeled rectangles in a plane with axis ticks 0-14; filters already assigned to an index entry disappear from the plot as the sequence advances.]
Filters: a. 1-13, 001x; b. 2-3, 00xx; c. 9-10, xxxx; d. 11-14, 011x; e. 12-13, 0xxx; f. 0-14, 1010; g. 7-7, 110x; h. 0-5, 1110; i. 1-2, 11xx; j. 13-14, 11xx; k. 11-15, 111x
Index entries and the filters assigned to them, in the order they are created:
• 0-15, 0xxx: a, b, d, e
• 0-6, 1xxx: h, i
• 7-15, 1xxx: g, j, k
• Next phase: 0-15, xxxx: c, f
29 October 2014

Creating a set of partitions
• At most k filters per region (k = block size)
• Regions within the same partition do not overlap
• Total number of regions equals the index size
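The constraints above can be checked mechanically for a proposed grouping. The sketch below is a validator, not the grouping algorithm itself. The names are hypothetical, the filter values are assumed to be those from the lettered list a-k, and the additional requirement that each region fully cover the filters in its block is an assumption made here because the region acts as the block's index filter.

```python
def range_overlap(r1, r2):
    return r1[0] <= r2[1] and r2[0] <= r1[1]


def ternary_overlap(p1, p2):
    # Two ternary patterns overlap unless some bit is fixed differently in each.
    return all(a == "x" or b == "x" or a == b for a, b in zip(p1, p2))


def regions_overlap(reg1, reg2):
    return range_overlap(reg1[0], reg2[0]) and ternary_overlap(reg1[1], reg2[1])


def covers(region, filt):
    # Assumption: a region must contain every filter stored in its block.
    (rlo, rhi), rpat = region
    (flo, fhi), fpat = filt
    return rlo <= flo and fhi <= rhi and all(
        r in ("x", f) for r, f in zip(rpat, fpat)
    )


def valid_grouping(partitions, filters, k):
    """partitions: list of partitions, each a list of (region, [filter names])."""
    for regions in partitions:
        for i, (region, names) in enumerate(regions):
            if len(names) > k:                               # at most k filters per region
                return False
            if not all(covers(region, filters[n]) for n in names):
                return False
            if any(regions_overlap(region, other) for other, _ in regions[i + 1:]):
                return False                                 # overlap within a partition
    return True


FILTERS = {
    "a": ((1, 13), "001x"), "b": ((2, 3), "00xx"), "c": ((9, 10), "xxx1"),
    "d": ((11, 14), "011x"), "e": ((12, 13), "0xxx"), "f": ((0, 14), "1010"),
    "g": ((7, 7), "110x"), "h": ((0, 5), "1110"), "i": ((1, 2), "1x1x"),
    "j": ((13, 14), "11xx"), "k": ((11, 15), "111x"),
}
PARTITIONS = [
    [(((0, 15), "0xxx"), ["a", "b", "d", "e"]),
     (((0, 6), "1xxx"), ["h", "i"]),
     (((7, 15), "1xxx"), ["g", "j", "k"])],
    [(((0, 15), "xxxx"), ["c", "f"])],
]

index_size = sum(len(regions) for regions in PARTITIONS)  # total number of regions
print(valid_grouping(PARTITIONS, FILTERS, k=4), index_size)
```

Putting the (0-15, xxxx) region into the first partition would fail the check, since it overlaps every other region; that is why the example needs a second partition.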

Range Matching
• Store a pair of values (lo, hi) for each range match field
• Range check circuitry compares the query value against lo and hi to determine whether the query is in range
• A range field needs twice as many transistors per bit as an ordinary TCAM field
• But for typical IPv4 applications, this results in just a 22% increase in overall transistor count
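To see what the (lo, hi) representation buys, the sketch below compares a single range entry for filter F (Source Port 1-4, Destination Port 3-5) against its six-entry prefix expansion from the earlier slides. It is a software illustration with hypothetical function names, not a model of the range check circuitry.

```python
SRC_PREFIXES = ["001", "01*", "100"]   # prefix expansion of source port 1-4
DST_PREFIXES = ["011", "10*"]          # prefix expansion of destination port 3-5


def prefix_match(prefix, value, width=3):
    bits = format(value, "b").zfill(width)
    return all(p in ("*", b) for p, b in zip(prefix, bits))


def match_expanded(src, dst):
    """Standard TCAM: any of the 3 x 2 = 6 prefix-pair entries may match."""
    return any(
        prefix_match(s, src) and prefix_match(d, dst)
        for s in SRC_PREFIXES
        for d in DST_PREFIXES
    )


def match_range_entry(src, dst, entry=((1, 4), (3, 5))):
    """Extended TCAM: one entry holding (lo, hi) for each range field."""
    (src_lo, src_hi), (dst_lo, dst_hi) = entry
    return src_lo <= src <= src_hi and dst_lo <= dst <= dst_hi


# Both representations accept exactly the same 3-bit port pairs.
agree = all(
    match_expanded(s, d) == match_range_entry(s, d)
    for s in range(8)
    for d in range(8)
)
expanded_entries = len(SRC_PREFIXES) * len(DST_PREFIXES)
print(agree, expanded_entries)
```

The two forms classify identically, but the range form uses one entry where the prefix form uses six.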

Performance Metrics
• Power Fraction:
  $$\text{Power Fraction} = \frac{\text{index size} + (\#\text{ of partitions})(\text{block size})}{\text{number of filters}}$$
  • A measure of power usage, relative to a standard TCAM
  • Smaller is better
• Storage Efficiency:
  $$\text{Storage Efficiency} = \frac{\text{number of filters}}{\text{index size} + (\#\text{ of blocks})(\text{block size})}$$
  • Higher is better; 1 is optimal
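As a worked example, the two metrics can be evaluated for the small grouping shown earlier (11 filters, 4 index entries, 4 blocks of size 4, 2 partitions). The function and parameter names are chosen here for illustration, and the second set of numbers is entirely hypothetical, picked only to show the scale at which partitioning pays off.

```python
def power_fraction(index_size, num_partitions, block_size, num_filters):
    """Entries searched per lookup, relative to searching every filter."""
    return (index_size + num_partitions * block_size) / num_filters


def storage_efficiency(index_size, num_blocks, block_size, num_filters):
    """Filters stored per TCAM entry provisioned (index plus all blocks)."""
    return num_filters / (index_size + num_blocks * block_size)


# Toy example from the slides: too small to save power, since the index
# and two blocks together hold more entries than there are filters.
toy_pf = power_fraction(index_size=4, num_partitions=2, block_size=4, num_filters=11)
toy_se = storage_efficiency(index_size=4, num_blocks=4, block_size=4, num_filters=11)

# Hypothetical large configuration (made-up numbers, not measured results).
big_pf = power_fraction(index_size=1000, num_partitions=4, block_size=128, num_filters=100_000)
big_se = storage_efficiency(index_size=1000, num_blocks=1000, block_size=128, num_filters=100_000)

print(round(toy_pf, 3), round(toy_se, 3), round(big_pf, 5), round(big_se, 3))
```

The numerator of the power fraction counts one block per partition because regions within a partition do not overlap, so a packet can fall into at most one region of each partition.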

Different Block Sizes
[Figure: one panel per block size: 16, 32, 64, 128, and 256.]

Results: Power Fraction
[Figure: power fraction for the Basic Algorithm and the Refined algorithm, at block sizes 32, 64, 128, and 256.]

Results: Storage Efficiency
[Figure: storage efficiency for the Basic Algorithm and the Refined algorithm, at block sizes 32, 64, 128, and 256.]

Current/Future Work
• Computational complexity of filter grouping problem
• Filter updates (add/delete operations)
• Multi-level indices
• Different partitioning algorithms
• Application to SRAM/DRAM-based classification techniques

Summary
• Packet classification is important for many advanced network services
• TCAMs scale poorly due to power consumption and inefficient range match representations
• Extended TCAMs solve these issues by using a partitioned TCAM and hardware support for range matching
  • Power consumption greatly reduced (typically to 5% or less of the power used by a standard TCAM)
  • Range match hardware avoids inefficiency in representing ranges

Questions?
