TCP STREAM PROCESSING AT GIGABIT LINE RATES David Vincent Schuehler Dissertation Defense Washington University in St. Louis Department of Computer Science and Engineering November 3, 2004
Outline • Motivation and Background • Architecture and Related Work • Live Internet Traffic Processing • Conclusion and Future Work
Motivation • Inspect data moving through networks • Enable application level data processing • Secure networks • Safeguard confidential data • Detect and prevent intrusions • Worms, viruses, spam, espionage • Mitigate denial of service attacks • Characterize and analyze network traffic • Operate at multi-gigabit data rates
Transmission Control Protocol • 86% to 90% of all Internet traffic uses TCP • Web, email, file transfer, remote login, secure communications • Provides virtual bit pipe between two end systems • Retransmission services • Data reordering services • Flow control services • Congestion avoidance services
Design Requirements • Architecture that is fast • Hardware-based system • High-performance (multi-gigabit networks) • Per-flow context storage & retrieval • Architecture that is scalable • Performance improves with advances in technology • In-line traffic processing model • Implementation using reasonable resources • FPGA implementation can be done in research lab • Framework that is flexible • Integrates with multiple applications • Multi-device coordination of TCP stream processing
Outline • Motivation and Background • Architecture and Related Work • Live Internet Traffic Processing • Conclusion and Future Work
TCP Processing Engine • Block diagram components: Frame FIFO, Checksum Engine, Control & State FIFO, Input State Machine, Output State Machine, TCP State Processing, Flow Hash Computation, State Store Manager
Challenges and Design Choices • Performance • Operate at multi-gigabit data rates • Hardware-based design exploiting pipelining and parallelism • Flow classification • Open addressing hash with limited bucket sizes • Context storage and retrieval • Requires memory read and write for each packet • 64-byte per-flow context - use burst read/write operations • Reassembly of out-of-order packets • Multiple processing modes (guaranteed and passive) • TCP processing • Flow monitoring instead of flow termination
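To make the flow-classification and context-storage choices concrete, the sketch below shows an open-addressing flow table with a limited probe depth and a 64-byte per-flow record, written in C. The hash function, field layout, and table size are illustrative assumptions for this sketch, not the TCP-Processor's actual implementation.

```c
#include <stdint.h>

/* Hypothetical 64-byte per-flow context record; the size matches the
 * per-flow context on this slide, but the field layout is illustrative. */
typedef struct {
    uint32_t src_ip, dst_ip;      /* flow identifier                       */
    uint16_t src_port, dst_port;
    uint32_t next_seq;            /* next expected TCP sequence number     */
    uint32_t flags;               /* per-flow monitoring state             */
    uint8_t  pad[44];             /* pad to 64 bytes for burst read/write  */
} flow_ctx_t;

#define TABLE_SLOTS  (1 << 20)    /* 1M-record state store                 */
#define BUCKET_DEPTH 4            /* limited bucket size: probe at most 4  */

static flow_ctx_t table[TABLE_SLOTS];

/* Open-addressing lookup: probe a few consecutive slots after the hash,
 * returning an existing record or claiming an empty slot for a new flow. */
flow_ctx_t *flow_lookup(uint32_t sip, uint32_t dip,
                        uint16_t sport, uint16_t dport)
{
    uint32_t h = (sip ^ dip ^ (((uint32_t)sport << 16) | dport)) % TABLE_SLOTS;
    for (int probe = 0; probe < BUCKET_DEPTH; probe++) {
        flow_ctx_t *c = &table[(h + probe) % TABLE_SLOTS];
        if (c->src_ip == sip && c->dst_ip == dip &&
            c->src_port == sport && c->dst_port == dport)
            return c;                           /* existing flow           */
        if (c->src_ip == 0 && c->dst_ip == 0) {
            c->src_ip = sip;  c->dst_ip = dip;  /* claim empty slot        */
            c->src_port = sport;  c->dst_port = dport;
            return c;
        }
    }
    return 0;   /* bucket full: this flow cannot be tracked                */
}
```

In hardware, the same idea maps to one burst read and one burst write of a 64-byte record per packet, which is what bounds the per-packet memory latency.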
Systems with TCP Processors • Load balancing systems • Content (cookie) based request routing • Delayed binding technique • Limited to scanning start of flow • TCP offload engines • Move TCP protocol processing to NIC • Targeting Gigabit NIC market • Intel, NEC, Adaptec, Lucent, and others • SSL Accelerators • Offload encryption/decryption • Protocol translation • Intrusion Detection Systems • Traffic Rates < 1Gbps • Perform content scanning and some stream reassembly
Related Work in TCP Processing • Software-based TCP processing • Ethereal, tcpdump, etc – require post processing • Snort w/TCP option – larger virtual packets • Cluster-based online monitoring system (Mao: WIDM’01) • Bro – rule based processing (Paxson: Computer Networks’99) • STAT/STATL – state based processing (Vigna: DISCEX’00) • Intel – Xeon as packet processor (Regnier: HotI’03) • Hardware-based TCP processing • Georgia Tech – 1 flow/circuit (Necker: FCCM’02) • University of Oslo – 1 flow/circuit (Li: FPL’03) • Indiana University and Imperial College – Netflow statistics • University of Tokyo – multi-flow stream scanning (Sugawara: FPL’04) • Intel TCP processor – 8k connections, 9Gbps (Xu: HotChips’03) • Network processors • Intel IXP 1200, 2400, 2800, 2850 • Motorola PowerQUICC
Multi-Device Coordination • Encodes interface signals • Regenerates waveforms on separate device • Provides extensible format & self-describing structure
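As an illustration of the encoded, self-describing format, the sketch below shows a minimal type-length-value encoder in C. The signal types, layout, and names are hypothetical placeholders, not the actual wire format of the inter-device protocol.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical signal identifiers for TLV records; illustrative only. */
enum { SIG_FLOW_ID = 1, SIG_TCP_FLAGS = 2, SIG_STREAM_DATA = 3 };

/* Append one type-length-value record to an outgoing buffer.
 * Returns the number of bytes written. */
size_t encode_signal(uint8_t *buf, uint8_t type,
                     const uint8_t *value, uint16_t len)
{
    buf[0] = type;                       /* record type (self-describing)  */
    buf[1] = (uint8_t)(len >> 8);        /* 16-bit length, network order   */
    buf[2] = (uint8_t)(len & 0xff);
    memcpy(buf + 3, value, len);         /* signal payload                 */
    return 3 + (size_t)len;
}
```

Because every record carries its own type and length, a downstream device can regenerate the signals it understands and skip record types it does not, which is what makes such a format extensible.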
Place & Route Results • Including Protocol Wrappers & Encoder/Decoder • Target Xilinx Virtex XCV2000E-8 • FPX Platform • Number of BLOCKRAMs • 95 out of 160 (59%) • Number of SLICEs • 7279 out of 19200 (37%) • Maximum clock frequency: 85.565MHz • Maximum data throughput: 2.7 Gbps • Maximum packets per second: 2.9M packets/sec • Min 29 clock cycles per packet (345 ns) • Throughput limited by memory latency
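The throughput and packet-rate figures follow from the clock rate; the arithmetic below assumes a 32-bit datapath through the protocol wrappers (the word width is an assumption of this sketch, not stated on the slide).

```latex
85.565\ \text{MHz} \times 32\ \text{bits/cycle} \approx 2.7\ \text{Gbps}
\qquad
\frac{85.565 \times 10^{6}\ \text{cycles/s}}{29\ \text{cycles/packet}} \approx 2.9\ \text{M packets/s}
```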
Content Scanning • Diagram: network traffic passes through the TCP circuit to the Scan circuit, alongside a Control Interface
Outline • Motivation and Background • Architecture and Related Work • Live Internet Traffic Processing • Conclusion and Future Work
Washington University Network • 384 Mbps total Internet bandwidth • 300 Mbps Internet • 84 Mbps Internet2 • Approx 19,000 active end systems • Approx 10,000 students • Traffic analyzed for 5 week period • Aug 20th to Sep 24th • Over 1000 charts generated • Selected highlights presented
Washington University Network • Diagram: Internet / Internet2 link, with a connection to the TCP Processor
Live Internet Traffic Analysis • Diagram: WUGS-20, standalone FPX-in-a-Box, external stats monitor
Collected Statistics • TCP Statistics: INB Input Words, INB Input Packets, INB Dropped Packets, INB Output Packets, ENG TCP Packets, ENG SYN Packets, ENG FIN Packets, ENG RST Packets, ENG Zero Length Packets, ENG Retransmitted Packets, ENG Out-of-Sequence Pkts, ENG Bad Checksums, RTR TCP Data Bytes, RTR Client Packets, RTR Bypass Packets, EGR Client Packets In, EGR Bypass Packets In, EGR TCP Checksum Update, EGR Packets Out, SSM New Connections, SSM End Connections, SSM Reused Connections, SSM Active Connections, Configuration Information • Port Statistics: FTP, SSH, Telnet, SMTP, TIM, Nameserv, Whois, Login, DNS, TFTP, Gopher, Finger, HTTP, POP, SFTP, SQL, NNTP, NetBIOS, SNMP, BGP, GACP, IRC, DLS, LDAP, HTTPS, DHCP, Lower, Upper • Protocol Statistics: Cells In, Cells Dropped, Cells Bypass, Cells Out, Frame Words In, Frame Packets In, IP Packets Dropped, IP Packet Fragments, IP Packets In, IP Words In, IP Packets Bypass, IP Words Bypass, IP Bad Checksum • Scan Statistics: String 1, String 2, String 3, String 4
Typical Daily Traffic Pattern • Chart with annotations marking the lowest-activity and highest-activity periods
IP and TCP Traffic Rates • Chart: >90% of packets are TCP
Zero Length TCP Packets • Chart: 20-40% of TCP packets are zero length
Fragmented IP Packets • Chart: 0.25% of packets are fragmented
Packet Sequencing • Chart: 3x-4x more retransmitted packets than out-of-sequence packets
Packet Sequencing (cont) • Chart: 3%-4% retransmitted, 1% out of sequence
Worm/Virus Detection • Search for digital signatures • MyDoom (appeared 1/26/04) • Spread via email attachment • Opens back door via ports 3127-3198 • Contains SMTP engine to replicate itself • Contains denial of service attack (25% operational) • At Peak, 1 in 12 emails contained virus • Netsky (appeared 3/1/04) • Spread via email attachment • Scans drives C through Z looking for email addresses • Contains SMTP engine to replicate itself
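A minimal software analogue of this signature scan is sketched below in C: a KMP matcher whose only per-flow state is a single bytes-matched counter, so scanning can resume across packet boundaries the way the hardware scan circuit resumes from per-flow context. The signature string and function names are placeholders, not actual MyDoom or Netsky patterns or the circuit's implementation.

```c
#include <stddef.h>
#include <stdint.h>

static const uint8_t SIG[] = "example-worm-signature";   /* placeholder   */
#define SIG_LEN (sizeof(SIG) - 1)

static size_t fail[SIG_LEN];          /* KMP failure function              */

/* Call once at startup to precompute the failure function. */
void scan_init(void)
{
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < SIG_LEN; i++) {
        while (k > 0 && SIG[i] != SIG[k]) k = fail[k - 1];
        if (SIG[i] == SIG[k]) k++;
        fail[i] = k;
    }
}

/* Scan one in-order TCP payload; *matched carries the per-flow state
 * between packets.  Returns 1 as soon as the full signature is seen. */
int scan_payload(size_t *matched, const uint8_t *data, size_t len)
{
    size_t k = *matched;
    for (size_t i = 0; i < len; i++) {
        while (k > 0 && data[i] != SIG[k]) k = fail[k - 1];
        if (data[i] == SIG[k]) k++;
        if (k == SIG_LEN) { *matched = 0; return 1; }
    }
    *matched = k;
    return 0;
}
```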
Denial of Service Attack • TCP SYN Attack • 8 minutes in duration • 71,000 TCP pkts/sec avg (34,000 normal) • 40,000 TCP SYN pkts/sec avg (2,000 normal) • IP attack (non TCP traffic) • 3.5 minutes in duration • 91,000 IP pkts/sec peak (36,000 normal) • 57,000 Non-TCP pkts/sec peak (2,000 normal)
Attack Difficult to Detect • Chart: TCP attack 10:25 to 10:34am, IP attack 10:37 to 10:41am
Both Attacks Visible • Chart annotating the non-TCP attack and the TCP attack
TCP SYN Attack • Chart: 20x increase in SYN packets
Attack Directed at SSH • Chart: port counter saturated; true spike at 2.4M packets
Non-TCP Attack • Chart: 29x increase in non-TCP packets
Flow Classification and Attacks • State store contains 1 million records • Record removed after TCP FIN or RST • Stale records are not aged out • 500,000 to 800,000 active records normal • DoS attack can cause flow saturation • Table quickly settles back to normal range
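A hedged sketch of this record lifecycle, in C: a slot is freed only when a FIN or RST is observed, and there is no timeout-based aging, which is why a flood of half-open connections can push the table toward its 1M-record limit. The record type and counter name are illustrative, not the state store manager's actual structures.

```c
#include <stdint.h>
#include <string.h>

#define TCP_FIN 0x01
#define TCP_RST 0x04

/* Minimal flow record for illustration; a real record carries the full
 * per-flow TCP monitoring context. */
typedef struct { uint32_t key[3]; int in_use; } flow_rec_t;

/* Release a record when the connection closes; with no age-out policy,
 * records for flows that never close remain allocated indefinitely. */
void update_record(flow_rec_t *rec, uint8_t tcp_flags, uint32_t *active_count)
{
    if (rec->in_use && (tcp_flags & (TCP_FIN | TCP_RST))) {
        memset(rec, 0, sizeof *rec);   /* free the slot for reuse          */
        (*active_count)--;             /* e.g. an "active connections" counter */
    }
}
```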
Active State Store Records • Chart: roughly 400,000 new flows added during the attack
Outline • Motivation and Background • Architecture and Related Work • Live Internet Traffic Processing • Conclusion and Future Work
Insights • 20%-40% zero length packets • Increase from 18% to 22% (Shalunov: Internet2‘01) • Implies larger amount of 1-way traffic • Optimization skips processing of these packets • 5% out of order packets • Agrees with results from (Jaiswal: Infocom‘03) • Flow classification tables need to be larger • Flow table ½ to ¾ full during normal processing • 1M entry table saturated during attack • Automated response systems required • Short lived attacks difficult to address manually
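The zero-length optimization mentioned above amounts to bypassing stream processing when a segment carries no payload; a hypothetical helper is sketched below (header lengths are assumed to be parsed already, and the function name is illustrative).

```c
#include <stdint.h>

/* Returns nonzero only when the TCP segment carries payload bytes;
 * zero-length segments (pure ACKs) can skip content processing. */
static inline int has_payload(uint16_t ip_total_len,
                              uint8_t ip_hdr_bytes, uint8_t tcp_hdr_bytes)
{
    return ip_total_len > (uint16_t)(ip_hdr_bytes + tcp_hdr_bytes);
}
```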
Contributions • Developed Architecture for TCP-Processor • Hardware-based system • High-performance (multi-gigabit networks) • Per-flow context storage & retrieval • Implemented TCP-Processor in Reprogrammable Hardware • Operates at 85 MHz on Xilinx Virtex 2000E FPGA • Maximum throughput of 2.7 Gbps • Maximum 2.9M packets/sec • Created inter-device protocol for TCP applications • Multi-device coordination of TCP stream processing • Interfaces with TCP-Processor • Self-describing/extensible transport protocol • Analyzed live Internet traffic • Insight into Internet traffic profiles • Supported academic and commercial endeavors
Future Work • Packet defragmentation • Flow classification • Packet storage manager • 10Gbps and 40Gbps data processing • Histogram (packet size, packet type, etc) • Event rate detection • Traffic sampling and real-time analysis • Application integration
Acknowledgments • Advisor & committee: John Lockwood (advisor), Chris Gill, Ron Loui, Ron Indeck, Dave Schimmel • ARL faculty & staff: Jon Turner, Patrick Crowley, Fred Kuhns, John DeHart • CSE faculty & staff • ARL & FPX students • NTS: Steve Wiese • Global Velocity: Matthew Kulig • Reuters (formerly Bridge): Scott Parsons, Deb Grossman, John Leighton • Recommendations: Scott Parsons, Don Bertier, Andy Cox, Chris Gray • Reviewers: Tanya Yatzeck, James Hartley • Family: Jerry & Lois (parents), Chris & Kreslyn, Nancy, Jeff & Nathan • Friends