
High-performance TCAM-based IP Lookup Engines



  1. High-performance TCAM-based IP Lookup Engines Authors: Hui Yu, Jing Chen, Jianping Wang and S.Q. Zheng Publisher: IEEE INFOCOM 2008 Presenter: 林呈俞 Date: 2008/9/24

  2. Outline • Introduction • Previous works • MSMB scheme • MSMB-PT scheme • MSMB-LPT scheme • Goals of this paper • Proposed works • M-MSMB-LPT scheme • MSMB-LPT-I scheme • Experimental results

  3. Introduction (1/3) • To achieve high IP lookup performance, it has been proposed to use TCAMs to implement IP-lookup accelerators. • In practice, one TCAM-based routing table is shared by multiple packet streams in one line card, or by multiple line cards. • Previous works reconfigure a TCAM into several independent blocks: • MSMB • MSMB-PT • MSMB-LPT

  4. Introduction (2/3) • MSMB (Multi-Selector and Multi-Block) scheme • Proposed in [6] to reconfigure a TCAM into several independent blocks so that parallel IP lookup is possible. • With K TCAMs, instead of performing only one lookup in each cycle, all TCAMs can be used concurrently for different lookups. • One would need M parallel RDs for this system.
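
As a rough illustration of the MSMB idea, the sketch below dispatches lookup requests to k independent TCAM blocks and lets requests for distinct blocks proceed in the same cycle; the block-selection rule and all names are assumptions made for illustration, not the exact scheme from [6].

```python
# Minimal sketch of the MSMB idea: the routing table is split across k
# independent TCAM blocks, so lookups that target different blocks can
# proceed in the same cycle.  The block-selection rule (top address bits)
# is an illustrative assumption.

def block_of(dst_addr: int, k: int) -> int:
    """Map a 32-bit destination address to one of k TCAM blocks (assumed rule)."""
    return (dst_addr >> 28) % k

def schedule_cycle(requests, k):
    """Serve at most one request per block per cycle; the rest contend.
    requests: list of (input_id, dst_addr) tuples."""
    served, blocked = {}, []
    for inp, dst in requests:
        b = block_of(dst, k)
        if b in served:
            blocked.append((inp, dst))    # TCAM contention: retry next cycle
        else:
            served[b] = (inp, dst)
    return served, blocked
```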

  5. Introduction (3/3) • MSMB-PT (Popular-prefix table) scheme • This scheme is based on the temporal locality of packet destinations. • It is intended to alleviate the TCAM contention problem caused by traffic bias. • Popular-Prefix Table (PT): caches some of the prefixes recently used by all inputs.
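
A minimal sketch of a shared popular-prefix table consulted before the TCAM blocks; the FIFO eviction policy, exact-match keying, and class name are assumptions for illustration only.

```python
from collections import OrderedDict

# Sketch of the shared Popular-Prefix Table (PT): a small cache of
# recently used prefixes checked before any TCAM block is accessed.

class PopularPrefixTable:
    def __init__(self, size: int):
        self.size = size
        self.entries = OrderedDict()       # prefix -> next hop

    def lookup(self, prefix):
        """A hit avoids a TCAM block access (and a possible contention)."""
        return self.entries.get(prefix)

    def insert(self, prefix, next_hop):
        """Cache a prefix resolved by the TCAM; evict the oldest entry (FIFO)."""
        if prefix in self.entries:
            return
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)
        self.entries[prefix] = next_hop
```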

  6. MSMB-LPT (Local PT) (1/2) • A flow is a stream of packets that are transmitted as a bursty sequence. • For a given router R, the packets of flows arriving at the same input of R exhibit a bias toward a small set of IP prefixes. • For any bursty traffic period at an input of R, this bias of IP addresses is called the temporal locality of flows. • The major differences between MSMB-LPT and MSMB-PT are as follows: • MSMB-LPT improves the performance of MSMB-PT by up to 250% (speedup), 80% (hit ratio), 82% (TCAM contention), and 71% (TCAM power consumption). • The LPT helps to reduce the number of accesses to the TCAM blocks and the number of TCAM contentions.

  7. MSMB-LPT (Local PT) (2/2) • Local Popular-Prefix Table (LPT): used to dynamically store recently referenced IP prefixes requested from input i. • Contention Resolver (CR): chooses one request according to a priority scheme and passes it to the TCAM.
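
A hedged sketch of one MSMB-LPT cycle combining per-input LPTs with a contention resolver; the priority rule, block-selection hash, and data layout are illustrative assumptions, not the paper's design.

```python
# Sketch of one MSMB-LPT cycle: each input first checks its own Local
# Popular-Prefix Table; only misses are forwarded to the TCAM blocks
# through a Contention Resolver that grants one request per block.

def one_cycle(requests, lpts, k):
    """requests: list of (input_id, prefix); lpts: input_id -> set of cached prefixes."""
    misses = []
    for inp, prefix in requests:
        if prefix in lpts[inp]:
            continue                        # LPT hit: no TCAM access needed
        misses.append((inp, prefix))

    granted, contentions = {}, 0
    for inp, prefix in sorted(misses):      # assumed priority: lowest input id first
        b = hash(prefix) % k                # assumed block-selection rule
        if b in granted:
            contentions += 1                # blocked request retries next cycle
        else:
            granted[b] = (inp, prefix)
    return granted, contentions
```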

  8. Goals of this paper • How to design a TCAM-based IP lookup engine that • improves MSMB-LPT without using more HW resources? • satisfies given performance requirements? • For large m (inputs): • How to design a scalable TCAM-based IP lookup engine? • How to find tradeoffs among cost, performance, and reliability?

  9. Proposed work (1/5) • Definitions: • MSMB-LPT has a configuration (m, n, k): • m inputs • k TCAM blocks • an LPT of size n • total number of prefixes M (each block contains M/k prefixes). • The parameters m and k are carefully selected to achieve optimized cost and performance. • Are there better MSMB schemes for given m and k? • Two proposed schemes: • M-MSMB-LPT • MSMB-LPT-I
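
For reference, these configuration parameters can be collected in a small helper; this is purely an illustrative assumption, not part of the paper.

```python
from dataclasses import dataclass

# Assumed helper capturing the MSMB-LPT configuration (m, n, k) plus the
# total prefix count M described on this slide.

@dataclass
class MsmbLptConfig:
    m: int          # number of inputs
    n: int          # LPT size
    k: int          # number of TCAM blocks
    M: int          # total number of prefixes

    def prefixes_per_block(self) -> int:
        """Each block holds roughly M/k prefixes."""
        return self.M // self.k
```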

  10. Proposed work (2/5) • Multiple MSMB-LPT (M-MSMB-LPT) • For large m (inputs), we propose to use w identical copies of MSMB-LPT of configuration (m', n, k), where m' = m / w. • Input i*m' + j serves as the j-th input of the (i+1)-th MSMB-LPT.
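
A minimal sketch of this input-to-copy mapping; the 1-based indexing and the function name are assumptions made for illustration.

```python
# Sketch of the M-MSMB-LPT input mapping: with w identical MSMB-LPT copies
# of m' = m / w inputs each, global input i*m' + j (1 <= j <= m') becomes
# the j-th input of the (i+1)-th copy.

def map_input(global_input: int, m: int, w: int):
    """Return (copy index, local input index), both 1-based."""
    m_prime = m // w                        # inputs per MSMB-LPT copy
    i, j = divmod(global_input - 1, m_prime)
    return i + 1, j + 1

# e.g. with m = 12 inputs and w = 2 copies (m' = 6):
# map_input(7, 12, 2) -> (2, 1): input 7 is the 1st input of the 2nd copy.
```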

  11. Proposed work (3/5) • Multiple MSMB-LPT (M-MSMB-LPT) • (Figure: inputs (j-1)*m' + 1, (j-1)*m' + 2, …, j*m' feed MSMB-LPTj, which contains k CRs and k TCAM blocks.) • The w TCAM blocks TCAMj,u have the same content as TCAMu in MSMB-LPT, where j = 1 ~ w. • We say that an M-MSMB-LPT has configuration (m, n, w, k) if it has w MSMB-LPTs of configuration (m', n, k). • In an M-MSMB-LPT scheme, the w MSMB-LPTs operate completely independently.

  12. Proposed work (4/5) • MSMB-LPT with Interleaved TCAMs (MSMB-LPT-I) • An MSMB-LPT-I of configuration (m, n, w, k) has • m inputs, each with an LPT of size n. • wk TCAM blocks that are partitioned into k groups, each called a TCAM bundle. • The w TCAM blocks in the j-th TCAM bundle contain the same content as TCAMj in the MSMB-LPT scheme. • (Figure: inputs 1, 2, …, m connected to the k TCAM bundles.)
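
A minimal sketch of the MSMB-LPT-I block layout under the assumption, following this slide, that each bundle holds w replicated copies of one original TCAM block; the names and the set-based representation are illustrative only.

```python
# Sketch of the MSMB-LPT-I organization: w*k TCAM blocks grouped into k
# bundles, where every block in bundle j replicates TCAMj of the plain
# MSMB-LPT scheme, so several requests for the same bundle can be served
# in one cycle.  Data layout only; arbitration is sketched after the next slide.

def build_bundles(partitioned_prefixes, w):
    """partitioned_prefixes: list of k prefix sets, one per original TCAM block.
    Returns a k x w grid of (replicated) block contents."""
    return [[set(block) for _ in range(w)] for block in partitioned_prefixes]

# Example: k = 2 original blocks replicated w = 3 times -> 2 bundles of 3 blocks.
bundles = build_bundles([{"10.0.0.0/8"}, {"192.168.0.0/16"}], w=3)
assert len(bundles) == 2 and len(bundles[0]) == 3
```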

  13. Proposed work (5/5) • (Figure: one search process per TCAM bundle, j = 1 ~ k, handling the ni-th key from each input i = 1 ~ m; all processes run concurrently.) • The concurrent TCAM-search processes are coordinated by the CR, which can be implemented as a round-robin m-to-w selector.
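
A hedged sketch of a round-robin m-to-w selector acting as the contention resolver for one TCAM bundle; the interface and the pointer-update rule are assumptions for illustration.

```python
# Sketch of a round-robin m-to-w selector: out of up to m pending requests
# it grants at most w (one per block in the bundle), starting from a
# rotating pointer so every input is served fairly over time.

class RoundRobinSelector:
    def __init__(self, m: int, w: int):
        self.m, self.w = m, w
        self.pointer = 0                    # input id to consider first

    def grant(self, pending):
        """pending: set of input ids (0..m-1) waiting on this bundle."""
        granted = []
        for off in range(self.m):
            inp = (self.pointer + off) % self.m
            if inp in pending:
                granted.append(inp)
                if len(granted) == self.w:
                    break
        if granted:
            self.pointer = (granted[-1] + 1) % self.m   # rotate past last grant
        return granted
```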

  14. Experimental results (1/9) • We conducted a series of simulations on M-MSMB-LPT and MSMB-LPT-I. • A first-in-first-out (FIFO) replacement policy is used for LPT updates. • Round-robin (RR) arbitration is used for TCAM contention resolution. • Two packet traces are used in the simulations: • 1. generated according to the routing table described in [17]. • 2. derived from actual packet flows given in [19]. • The performance of an M-MSMB-LPT is determined by a single component MSMB-LPT. • The performance of MSMB-LPT and M-MSMB-LPT can therefore be derived from the performance of MSMB-LPT-I with configuration (m, n, w, k) as follows: • MSMB-LPT-I with (m, n, 1, k) = MSMB-LPT with (m, n, k). • MSMB-LPT-I with (m, n, 1, k) = M-MSMB-LPT with (w*m, n, w, k). • Example: MSMB-LPT-I with (6, n, 1, 4) can be used to indicate the performance of M-MSMB-LPT with (12, n, 2, 4) as well as (18, n, 3, 4).

  15. Experimental results (2/9) • Performance metrics • TCAM contention ratio: based on the # of contentions at the TCAM blocks relative to the total # of key searches. • Speedup over naïve MSMB: based on the total # of parallel cycles needed to complete IP lookup for all packets in a trace. • TCAM utilization: based on AMSMB-LPT-I(j), the total # of cycles in which TCAM block j is searched.
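
Since the slide names the metrics only informally, here is a minimal sketch of how such counters might be combined; the exact formulas in the paper may differ, so these ratio definitions are assumptions.

```python
# Hedged sketch of the three metrics from simple simulation counters.

def contention_ratio(num_contentions: int, num_key_searches: int) -> float:
    """Fraction of key searches that collide at a busy TCAM block (assumed)."""
    return num_contentions / num_key_searches

def speedup(naive_msmb_cycles: int, parallel_cycles: int) -> float:
    """Speedup over a naive MSMB that completes one lookup per cycle (assumed)."""
    return naive_msmb_cycles / parallel_cycles

def utilization(busy_cycles_per_block, total_cycles: int) -> float:
    """busy_cycles_per_block[j] = A(j): cycles in which block j is searched."""
    return sum(busy_cycles_per_block) / (len(busy_cycles_per_block) * total_cycles)
```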

  16. Experimental results (3/9) • Power consumption

  17. Experimental results (4/9) • Speedup • (Figures: speedup with 16 TCAM blocks and with 48 TCAM blocks.)

  18. Experimental results (5/9) • Power consumption

  19. Experimental results (6/9) • Contention ratio • 36 inputs and 4 TCAM blocks in each bundle. • Increase the number of TCAM bundles: from 1 to 2, and from 4 to 6. • (Figure: contention ratio for configurations (36, n, w, 4) with w = 1, 2, 4, 6.)

  20. Experimental results (7/9) • Given the available TCAM resources, such as • # TCAM bundles = 2 • # TCAM blocks in each bundle = 4 • it is important to know the expected contention ratio under different numbers of inputs. • (Figure: contention ratio for configurations (m, n, 2, 4) with m = 6, 12, 18, 36.)

  21. Experimental results (8/9) • Speedup gain of increasing the number of TCAM bundles for a given # of inputs. • (Figure: speedup for configurations (36, n, w, 4) with w = 1, 2, 4, 6.)

  22. Experimental results (9/9) • The speedup changes with the number of inputs. (m, n, 2, 4)  m = 6, 12, 18, 36
