OS support for (flexible) high-speed networking

Herbert Bos
Vrije Universiteit Amsterdam
[Diagram labels: uspace (u), kspace (k), nspace (n)]

we’re hurting because of
• hardware problems
• software problems
• language problems
• lack of flexibility
• network speed may grow too fast anyway
• multiple apps problem
[Diagram labels: CPU, MEM, NIC]

What are we building?
[Diagram labels: different barriers, flows]

What are we building?
    if (pkt.proto == TCP) then
        if (http) then
            scan webattacks;
        else if (smtp)
            scan spam;
            scan viruses
        fi
    else if (pkt.proto == UDP) then
        if (NFS traffic)
            mem = statistics (pkt);
        else
            scan SLAMMER;
        fi
• reconfigure when load changes
• dynamic optical infrastructure:
  - configure at startup time
  - construct datapaths
[Diagram labels: uspace, kernel]

reduce copying
• FFPF avoids both ‘horizontal’ and ‘vertical’ copies
  - no ‘vertical’ copies
  - no ‘horizontal’ copies within flow group
  - more than ‘just filtering’ in kernel (e.g., statistics)
[Diagram labels: Application A, Application B, ‘filter’, U, K]

• minimise copying, context switching
• flowgraph = DAG
• different copying regimes
• endianness
• new languages
• combine multiple requests
• control / dataplane
• different buffer regimes

Software structure
• applications
• FFPF toolkit
• MAPI
• FFPF userspace
• libpcap
• userspace-kernel boundary
• FFPF kernelspace
• host-NIC boundary
• FFPF ixp2xxx

current status
• working for Linux PC with
  • NICs
  • IXP1200s and IXP2400s (plugged in)
  • remote IXP2850 (with poetic license)
• rudimentary support for distributed packet processor
• speed comparable to tailored solutions

Thank you! http://ffpf.sourceforge.net/

bridging barriers

Three flavours
• Three flavours of packet processing on IXP & host
  • Regular
    • copy only when needed
    • may be slow depending on access pattern
  • Zero copy
    • never copy
    • may be slow depending on access pattern
  • Copy once
    • copy always

combining filters
    (dev,expr=/dev/ixp0) > [ [ (BPF,expr="…") > (PKTCOUNT) ] | (FPL2,expr="…") ] > (PKTCOUNT)

Example 1: writing applications
    #include <stdio.h>
    #include <string.h>
    /* ffpf_open(), ffpf_process(), ffpf_close(), start_listener() and
       stop_listener() are declared elsewhere (not shown on the slide) */
    int main(int argc, char **argv) {
        void *opaque_handle;
        int count, returnval;
        if (argc < 2) {
            printf("usage: %s <flow expression>\n", argv[0]);
            return -1;
        }
        /* pass the string to the subsystem; it initializes and returns an opaque pointer */
        opaque_handle = ffpf_open(argv[1], strlen(argv[1]));
        if (start_listener(asynchronous_listener, NULL)) {
            printf("WARNING: spawn failed\n");
            ffpf_close(opaque_handle);
            return -1;
        }
        count = ffpf_process(FFPF_PROCESS_FOREVER);
        stop_listener(1);
        returnval = ffpf_close(opaque_handle);
        printf("processed %d packets\n", count);
        return returnval;
    }

Example 1: writing applications
    /* pkt, exported[] and export_len are declared elsewhere (not shown on the slide) */
    void *asynchronous_listener(void *arg) {
        int i;
        while (!should_stop()) {
            sleep(5);
            for (i = 0; i < export_len; i++) {
                printf("flowgrabber %d: read %d, written %d\n", i,
                       get_stats_read(exported[i]),
                       get_stats_processed(exported[i]));
                /* drain every packet currently queued for this flow */
                while ((pkt = get_packet_next(exported[i]))) {
                    dump_pkt(pkt);
                }
            }
        }
        return NULL;
    }

Example 2: FPL-2
• new pkt processing language: FPL-2
  • for IXP, kernel and userspace
  • simple, efficient and flexible
• simple example: filter all web traffic
    IF ( (PKT.IP_PROTO == PROTO_TCP) && (PKT.TCP_PORT == 80)) THEN RETURN 1;
• more complex example: count pkts in all TCP flows
    IF (PKT.IP_PROTO == PROTO_TCP) THEN
        R[0] = Hash[ 14, 12, 1024];
        M[ R[0] ]++;
    FI
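The flow-counting example above can be mirrored in plain C to show what the filter does with its persistent memory. This is only a sketch, not FFPF code: it assumes that `Hash[14, 12, 1024]` means "hash 12 bytes starting at byte offset 14 and reduce the result to 0..1023", and it uses FNV-1a as a stand-in hash function. The names `flow_hash`, `count_tcp_packet` and `mbuf` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBUF_SLOTS 1024   /* third Hash argument on the slide */
#define PROTO_TCP  6      /* IANA protocol number for TCP */

/* Persistent memory of the filter, playing the role of M[] (assumed layout). */
static uint32_t mbuf[MBUF_SLOTS];

/* FNV-1a over `len` bytes starting at `off`, reduced to [0, range).
 * Stand-in for the FPL-2 Hash primitive, whose real algorithm is not shown. */
uint32_t flow_hash(const uint8_t *pkt, size_t off, size_t len, uint32_t range)
{
    assert(range > 0);
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= pkt[off + i];
        h *= 16777619u;
    }
    return h % range;
}

/* Count one packet against its flow slot.
 * Returns the slot used, or -1 if the packet is not TCP or is too short. */
int count_tcp_packet(const uint8_t *pkt, size_t pkt_len, uint8_t ip_proto)
{
    if (ip_proto != PROTO_TCP || pkt_len < 14 + 12)
        return -1;
    uint32_t slot = flow_hash(pkt, 14, 12, MBUF_SLOTS);  /* R[0] = Hash[...] */
    mbuf[slot]++;                                        /* M[R[0]]++        */
    return (int)slot;
}
```

Two packets with identical bytes in the hashed range land in the same slot, so the counter is per flow only up to hash collisions; distinct flows that collide share a counter.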

FPL-2
• all common arithmetic and bitwise operations
• all common logical ops
• all common integer types
  • for packet
  • for buffer (useful for keeping state!)
• statements
  • Hash
  • External functions
    • to call hardware implementations
    • to call fast C implementations
  • If … then … else
  • For … break; … Rof
  • Return

Example application: dynamic ports
    // R[0] stores no. of dynports found (initially 0)
    IF (PKT.IP_PROTO==PROTO_TCP) THEN
        IF (PKT.TCP_DPORT==554) THEN
            M[R[0]]=EXTERN("GetDynTCPDPortFromRTSP",0,0);
            R[0]++;
        ELSE
            // compare pkt’s dst port to all ports in array – if match, return pkt
            FOR (R[1]=0; R[1] < R[0]; R[1]++)
                IF (PKT.TCP_DPORT == M[ R[1] ] ) THEN
                    RETURN TRUE;
                FI
            ROF
        FI
    FI
    RETURN FALSE;
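The same control flow reads as follows in C. This is a sketch under assumptions, not the FFPF implementation: the caller is assumed to have already checked that the packet is TCP, and the value returned by the external function `GetDynTCPDPortFromRTSP` is passed in as `rtsp_dyn_port` instead of being parsed here. `struct dynport_filter`, `dynport_filter_packet` and the bound `MAX_DYN_PORTS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTSP_PORT     554
#define MAX_DYN_PORTS 64   /* assumed bound; the slide does not give one */

struct dynport_filter {
    uint16_t ports[MAX_DYN_PORTS];  /* plays the role of M[]  */
    size_t   count;                 /* plays the role of R[0] */
};

/* Process one TCP packet.
 * tcp_dport     : destination port of the packet
 * rtsp_dyn_port : port extracted from an RTSP packet by the external helper;
 *                 only consulted when tcp_dport == 554
 * Returns true if the packet goes to a previously learned dynamic port. */
bool dynport_filter_packet(struct dynport_filter *f, uint16_t tcp_dport,
                           uint16_t rtsp_dyn_port)
{
    assert(f != NULL);
    if (tcp_dport == RTSP_PORT) {
        /* Control packet: remember the port, but do not return the packet. */
        if (f->count < MAX_DYN_PORTS)
            f->ports[f->count++] = rtsp_dyn_port;
        return false;
    }
    /* Data packet: linear scan over the ports learned so far. */
    for (size_t i = 0; i < f->count; i++)
        if (f->ports[i] == tcp_dport)
            return true;
    return false;
}
```

The loop is bounded by the number of learned ports, which matches the slide's point that FPL-2 loops are resource limited. Unlike the slide's code, the sketch also guards the array against overflow.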

efficient
• reduced copying and context switches
• sharing data
• flowgraphs: sharing computations
• “push filtering tasks as far down the processing hierarchy as possible”
[Diagram labels: userspace ?, kernel ?, network card ?]

Network Monitoring
• Increasingly important
  • traffic characterisation, security, traffic engineering, SLAs, billing, etc.
• Existing solutions:
  • designed for slow networks or traffic engineering/QoS
  • not very flexible
• We’re hurting because of
  • hardware (bus, memory)
  • software (copies, context switches)
→ demand for a solution that:
  - scales to high link rates
  - scales in no. of apps
  - is flexible
[Figure: spread of SAPPHIRE in 30 minutes]
  - process at lowest possible level
  - minimise copying
  - minimise context switching
  - freedom at the bottom

generalised notion of flow
• Flow: “a stream of packets that match arbitrary user criteria”
→ Flowgraph
[Diagram labels: UDP with CodeRed, eth0, TCP SYN, bytecount, HTTP, U, RTSP, TCP, IP, “contains worm”, UDP, RTP, UID 0]

Extensible
• modular framework
• language agnostic
• plug-in filters
    (device,eth0) -> (sampler,2) -> (BPF,"..") -> (packetcount)
    (device,eth0) | (device,eth1) -> (sampler,2) -> (FPL-2,"..") | (BPF,"..") -> (bytecount)
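A linear expression such as the first one above can be pictured as a chain of plug-in stages, each deciding whether a packet moves on. The C sketch below illustrates that idea only and is not the FFPF plug-in API. It assumes `(sampler,2)` means "pass one packet in every two" and replaces the elided BPF expression with a simple minimum-length test. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct packet { const uint8_t *data; size_t len; };

/* A stage returns true to pass the packet on to the next stage. */
struct stage {
    bool (*run)(struct stage *self, const struct packet *p);
    uint64_t state;   /* per-stage persistent state */
    uint64_t arg;     /* per-stage parameter        */
};

/* (sampler,N): pass one packet in every N (assumed meaning). */
bool sampler_run(struct stage *s, const struct packet *p)
{
    (void)p;
    assert(s->arg > 0);
    return (s->state++ % s->arg) == 0;
}

/* Stand-in for a BPF or FPL-2 expression: pass packets of at least arg bytes. */
bool minlen_run(struct stage *s, const struct packet *p)
{
    return p->len >= s->arg;
}

/* (packetcount): count every packet that reaches this stage. */
bool packetcount_run(struct stage *s, const struct packet *p)
{
    (void)p;
    s->state++;
    return true;
}

/* Push one packet through the chain; stop at the first stage that drops it. */
bool chain_process(struct stage *stages, size_t n, const struct packet *p)
{
    assert(stages != NULL && p != NULL);
    for (size_t i = 0; i < n; i++)
        if (!stages[i].run(&stages[i], p))
            return false;
    return true;
}
```

A chain is just an array such as `{ {sampler_run, 0, 2}, {minlen_run, 0, 60}, {packetcount_run, 0, 0} }`. Because every stage sees the same `struct packet` pointer, nothing is copied between stages. The `|` operator in the second expression would need a graph rather than an array, which this sketch leaves out.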

Buffers
• PacketBuf
  • circular buffer with N fixed-size slots
  • large enough to hold packet
• IndexBuf
  • circular buffer with N slots
  • contains classification result + pointer
[Diagram: ring of slots (O) with read pointer R and write pointer W]
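The two buffers can be sketched in C as follows. The layout is an assumption made for illustration: one shared PacketBuf, one IndexBuf per flow, and a slot number as the "pointer". The slot size and all names are hypothetical, and overflow handling is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_SLOTS   8      /* N on the slide; value chosen for illustration */
#define SLOT_SIZE 1514   /* assumed: one full-size Ethernet frame without FCS */

/* PacketBuf: circular buffer with N fixed-size slots. */
struct packet_buf {
    uint8_t  slot[N_SLOTS][SLOT_SIZE];
    uint16_t len[N_SLOTS];
    uint32_t write;                 /* total packets written so far */
};

/* IndexBuf entry: classification result + "pointer" into PacketBuf. */
struct index_entry {
    uint32_t result;     /* classification result produced by the filter */
    uint32_t pkt_slot;   /* slot number in PacketBuf */
};

/* IndexBuf: circular buffer with N slots. */
struct index_buf {
    struct index_entry entry[N_SLOTS];
    uint32_t write;      /* total entries written  (W in the diagram) */
    uint32_t read;       /* total entries consumed (R in the diagram) */
};

/* Store the packet once; returns the PacketBuf slot it landed in. */
uint32_t packetbuf_put(struct packet_buf *pb, const uint8_t *data, uint16_t len)
{
    assert(len <= SLOT_SIZE);
    uint32_t s = pb->write++ % N_SLOTS;
    memcpy(pb->slot[s], data, len);
    pb->len[s] = len;
    return s;
}

/* Record a match for one flow: only the index entry is written. */
void indexbuf_put(struct index_buf *ib, uint32_t pkt_slot, uint32_t result)
{
    struct index_entry *e = &ib->entry[ib->write++ % N_SLOTS];
    e->pkt_slot = pkt_slot;
    e->result = result;
}

/* Next unread entry, or NULL once the reader has caught up with the writer. */
const struct index_entry *indexbuf_next(struct index_buf *ib)
{
    if (ib->read == ib->write)
        return NULL;
    return &ib->entry[ib->read++ % N_SLOTS];
}
```

With this split, a packet that matches several flows is stored once in PacketBuf and referenced from each flow's IndexBuf, which is one way to avoid the 'horizontal' copies mentioned earlier in the talk.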

Buffer management
→ what to do if writer catches up with slowest reader?
• slow reader preference
  • drop new packets (traditional way of dealing with this)
  • overall speed determined by slowest reader
• fast reader preference
  • overwrite existing packets
  • application responsible for keeping up
  • can check that packets have been overwritten
  • different drop rates for different apps
[Diagram: ring of slots (O) with reader R1 and writer W]
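The two policies differ only in what the writer does when the ring is full and in what a lagging reader must check. The following single-threaded C sketch makes that concrete; it is not the FFPF implementation, it ignores locking and memory ordering, and the names (`struct ring`, `ring_write`, `ring_read`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 4   /* small on purpose so that overflow is easy to see */

enum overflow_policy { SLOW_READER_PREFERENCE, FAST_READER_PREFERENCE };

struct ring {
    int      slot[RING_SLOTS];   /* payload stand-in: one int per packet */
    uint64_t write;              /* packets accepted so far */
    enum overflow_policy policy;
};

struct reader {
    uint64_t read;   /* packets consumed or skipped so far */
    uint64_t lost;   /* packets overwritten before this reader saw them */
};

/* Writer side. slowest_read is the smallest read counter over all readers.
 * Returns false if the packet was dropped (slow reader preference only). */
bool ring_write(struct ring *r, uint64_t slowest_read, int pkt)
{
    assert(slowest_read <= r->write);
    bool full = (r->write - slowest_read) >= RING_SLOTS;
    if (full && r->policy == SLOW_READER_PREFERENCE)
        return false;                       /* drop the new packet */
    r->slot[r->write % RING_SLOTS] = pkt;   /* may overwrite unread data */
    r->write++;
    return true;
}

/* Reader side. Returns false when there is nothing new to read. */
bool ring_read(const struct ring *r, struct reader *rd, int *out)
{
    if (r->write - rd->read > RING_SLOTS) {
        /* The writer lapped this reader: skip to the oldest valid slot
         * and account for what was overwritten. */
        uint64_t oldest = r->write - RING_SLOTS;
        rd->lost += oldest - rd->read;
        rd->read = oldest;
    }
    if (rd->read == r->write)
        return false;
    *out = r->slot[rd->read % RING_SLOTS];
    rd->read++;
    return true;
}
```

Under slow reader preference the `lost` counter stays at zero and the writer pays by dropping. Under fast reader preference the writer never blocks, and each reader accumulates its own `lost` count, which is how different applications end up with different drop rates.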

Languages
• FFPF is language neutral
• Currently we support:
  • BPF
  • C
  • OKE Cyclone
  • FPL
FPL example:
    IF (PKT.IP_PROTO == PROTO_TCP) THEN
        // reg.0 = hash over flow fields
        R[0] = Hash (14,12,1024)
        // increment pkt counter at this location in MBuf
        MEM[ R[0] ]++
    FI
• simple to use
• compiles to optimised native code
• resource limited (e.g., restricted FOR loop)
• access to persistent storage (scratch memory)
• calls to external functions (e.g., fast C functions or hardware assists)
• compiler for uspace, kspace, and nspace (ixp1200)

packet sources
• currently three kinds implemented
  - netfilter
  - netif_rx()
  - IXP1200
• implementation on IXPs: NIC-FIX
  - bottom of the processing hierarchy
  - eliminates mem & bus bottlenecks
[Diagram labels: uspace, kspace, nspace]

Network Processors
• “programmable NIC”
• zero copy
• copy once
• on-demand copy

Performance results
pkt loss:
• FFPF: < 0.5%
• LSF: 2-3%

Performance results
pkt loss:
• LSF: 64-75%
• FFPF: 10-15%

Performance

Summary
• concept of ‘flow’ → generalised
• copying and context switching → minimised
• processing in kernel/NIC → complex programs + ‘pipes’
• FPL: FFPF Packet Languages → fast + flexible
• persistent storage → flow-specific state
• authorisation + third-party code → any user
• flow groups → applications sharing packet buffers

microbenchmarks

we’re hurting because of
• kernel: too little, too often
• vertical + horizontal copies
• context switching
• popular ones: not expressive
• expressive ones: not popular
• same for speed
• no way to mix and match
• hardware problems
• software problems
• language problems
• lack of flexibility
• network speed may grow too fast anyway
• multiple apps problem
• heterogeneous hardware
• programmable cards, routers, etc.
• no easy way to combine
• build on abstractions that are too hi-level
• hard to use as ‘generic’ packet processor
• 10 → 40 Gbps: what will you do in 10 ns?
• beyond capacity of single processor
[Diagram labels: CPU, MEM, NIC]
