This overview provides context and explores the pros and cons of user and kernel space protocols for network virtualization. It also proposes a kernel-level model and discusses the Ethernet example using VLANs.
Endsystem Support for Network Virtualization Fred Kuhns
Overview • Context • Endsystem networking model • Protocol instances: user or kernel space • pros and cons • explore user space protocols • propose kernel level model
Context: Virtual (Diversified) Networking substrate link substrate router virtual router virtual link virtual end-system
ethernet switched LAN Simulates Star Topology for Substrate Links … VLANX1 VLANX2 VLANXN • Internetworking over a diversified network • Ethernet example: • VLANs are used to provide the equivalent of a virtualized “wire” connecting an endsystem to a specific access router. • All vnets on an endsystem share a common VLAN • Use priority queuing (802.1P/Q) to isolate vnet traffic. • Use admission control (static or dynamic) to provide bandwidth guarantees to vnet traffic. • Substrate layer on endsystems enforces per-VLAN and per-vnet bandwidth constraints vNetX VR1 • Each host to substrate router connection is assigned a distinct VLAN. So N hosts imply N VLANs on ethernet. • Alternative is to define one VLAN tree for each protocol suite (i.e. vnet).
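A minimal sketch of how the VLAN and priority bullets above map onto an Ethernet frame: the 4-byte 802.1Q tag carries a 3-bit priority (PCP) and a 12-bit VLAN ID. The function names and the example values are assumptions made for illustration; the slides do not specify which VLAN IDs or priority levels a vnet would use.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vlan_id: int, priority: int, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag: TPID followed by PCP/DEI/VID."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority (PCP) must fit in 3 bits")
    # Tag control information: 3 bits PCP, 1 bit DEI, 12 bits VLAN ID.
    tci = (priority << 13) | ((dei & 1) << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def tagged_header(dst: bytes, src: bytes, vlan_id: int,
                  priority: int, ethertype: int) -> bytes:
    """Build an 18-byte tagged Ethernet header (hypothetical helper)."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    return dst + src + vlan_tag(vlan_id, priority) + struct.pack("!H", ethertype)


# Example: high-priority vnet traffic on an assumed VLAN 100.
hdr = tagged_header(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01", 100, 5, 0x0800)
```

A switch that honors the PCP field can then place such frames in its high-priority transmit queue, which is the isolation mechanism the slide relies on.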
Low High Low High Low High Low High vNetX VR1 vnetX traffic uses high priority queues … Ethernet Hub with High and Low Priority TX queues
ethernet switched LAN Substrate Link as a VLAN Tree … VLANX • One VLAN is used for all virtual net traffic to/from a substrate router.
ethernet switched LAN Multiple Substrate Links … • Three VLANs are used for all virtual net traffic to/from a substrate router. • Corresponds to 3 substrate links: • Low priority: default for best-effort traffic • Medium priority for virtual nets with soft performance requirements (average bandwidth) • High priority for isochronous or low-delay, interactive applications VLANdgram VLANhigh VLANmed
VLI VLI VLI Multiple vNets per Host … ether addr/vlan ether addr/vlan ether addr/vlan vlan 1 vlan 2 vlan 3 • Substrate link: serves to connect an endsystem to a substrate router. Virtualization of a physical cable or wire. A packet enters one end, exits the other and is opaque within. Simplex or Duplex? • Substrate interface: (need better term?) endsystem abstraction representing a substrate link. • Ethernet: <interface, VLAN, dest>. • Could be an IP tunnel • Not required to be point-to-point. • Virtual link: represents the logical interconnection of adjacent network nodes for a given protocol suite. • Point-to-point. Simplex or Duplex? • Virtual interface: endsystem abstraction representing one end of a virtual link. Substrate defines mechanism for multiplexing onto common substrate link. For example a virtual link identifier (VLI) in a substrate header. Simplex or Duplex? ethernet LAN filter on ethernet address and vlan membership for substrate router
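The multiplexing idea above (a VLI in a substrate header selecting a virtual interface) could look roughly like the following. The header layout is an assumption: the slides do not define a substrate header format, so a single 32-bit VLI in network byte order is used here purely for illustration, and the names `encapsulate`, `demux`, and `vint0` are hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple

# Assumed layout: the substrate header is one 32-bit VLI, network byte order.
SUBSTRATE_HDR = struct.Struct("!I")


def encapsulate(vli: int, payload: bytes) -> bytes:
    """Prefix a vnet packet with the (assumed) substrate header."""
    return SUBSTRATE_HDR.pack(vli) + payload


def demux(frame: bytes, vif_table: Dict[int, str]) -> Optional[Tuple[str, bytes]]:
    """Map an incoming substrate payload to (virtual interface, vnet packet).

    Returns None when the frame is too short or the VLI is unknown.
    """
    if len(frame) < SUBSTRATE_HDR.size:
        return None
    (vli,) = SUBSTRATE_HDR.unpack_from(frame)
    vif = vif_table.get(vli)
    if vif is None:
        return None
    return vif, frame[SUBSTRATE_HDR.size:]


# Example: two vnets share one substrate link; VLI 7 selects vint0.
table = {7: "vint0", 9: "vint1"}
result = demux(encapsulate(7, b"hello"), table)
```

The substrate never inspects the payload, which keeps the virtual link opaque to it as the slide describes.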
Multiple next hop VRs? Host A on vnetX vNetX VR2 vNetX VR3 VLANXA2 VLANXA3 ethernet switched LAN • Not a fundamental part of the model but it is consistent with the current model used for TCP/IP in endsystem. • Allows us to implement TCP/IP as a virtual net protocol and not change the basic model VLANXA1 vNetX VR1
VLI VLI VLAN VLI TCP/IP as an Example Protocol … IP Route Table standard ethernet Interface vint0 (eth0 + VLANX) LL Info = SR1 addr + VLI ethernet device direct connect VLANX ethernet LAN Substrate Interface: Ethernet interface. Destination address by ARP. Directly connected: destination IP address + ARP = enet addr Gateway: (Gateway’s IP + ARP = enet addr) + VLAN Virtual Interface: Directly connected: Not used, model only for internetworking Gateway: VLI assigned by substrate. ethernet dest. addr Substrate Router SR1 IP
File Interface ops TCP module … TCP1 TCP2 TCPn FS management Basic I/O Interface RAW IP UDP open files buffer cache tasks device driver txqueue rxqueue OS Kernel Block Diagram User Space (Applications) Socket Interface ops AST Processing callback routes IP task management SW int (AST) util TCP TC/ AST qdisc poll scheduler callout Q hardware independent layer clock handler process accounting scheduling time management Device independent I/O ethernet Interrupt Processing hardware dependent layer configuration: registers, MMU (TLB, cache, VM) bus and peripherals System Exception handlers eth0 uart timer OS ISR demux Hardware HW interrupt/Exception
User or kernel Space protocols? • Each has pros and cons • User space protocols: • easier to implement and debug • easier to introduce new protocols (not tightly dependent on socket layer knowing about the new protocol) • easier to isolate and protect protocols and apps from each other (leverage process model) • kernel level protocols • easier to integrate into existing framework (simplifies support for system interface functions like select/poll) • simplifies intra-protocol security and protection (since protocol runs within trusted kernel) • simplifies kernel demultiplexing to correct protocol context (endpoint) • increased efficiency
User Space Protocol Implementation • Uncommon outside the high-performance community, which wants zero-copy and specialized demux keys. • Problems: asynchronous processing, life cycle, authentication and demultiplexing to endpoints • latency in delivering packets (e.g. acks) to user space • increased overhead in per packet processing before a drop/keep decision is made • processing received acks • timeouts and retransmissions • establishing connections and security: snooping, masquerading • supporting select and poll • protocols where connection may outlive process (TCP’s TIME_WAIT) • global routing and address resolution tables • global connection tables • need to know what other ports are being used (locally) • accepting/rejecting new connections
Assumptions • Assumptions: • Applications using different VNs (or no VN) will need to communicate using the various IPC mechanisms • We want to manage all aspects of Network I/O but not the use of other traditional resources (memory, files etc) • CPU, memory and interface bandwidth controlled at the virtual net granularity • intra-VN, implementers should have the mechanisms to support QoS and Security • simple mechanism for adding new protocols/VNs
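One common way to control interface bandwidth at virtual-net granularity is a token bucket per vnet. The slides do not name a mechanism, so the sketch below is an assumption, and the class name, rates, and the `vnetX` key are illustrative only. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Per-vnet rate limiter: tokens are bytes, refilled at rate_bps / 8."""

    def __init__(self, rate_bps: float, burst_bytes: int, now: float = 0.0):
        self.rate = rate_bps / 8.0        # refill rate in bytes per second
        self.burst = float(burst_bytes)   # maximum bucket depth in bytes
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = now

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True and consume tokens if nbytes may be sent at time now."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Example: an assumed 8 kbit/s (1000 bytes/s) guarantee with a 1500-byte burst.
buckets = {"vnetX": TokenBucket(rate_bps=8000, burst_bytes=1500)}
first = buckets["vnetX"].allow(1500, now=0.0)   # burst is available
second = buckets["vnetX"].allow(1, now=0.0)     # bucket is now empty
third = buckets["vnetX"].allow(500, now=0.5)    # 0.5 s refills 500 bytes
```

Keeping one bucket per vnet means a misbehaving protocol instance exhausts only its own allocation.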
User Space Protocols • Chandramohan A. Thekkath, Thu D. Nguyen, Evelyn Moy, Edward D. Lazowska, Implementing network protocols at user level, IEEE/ACM Transactions on Networking (TON), v.1 n.5, p.554-565, Oct. 1993 • Chris Maeda, Brian Bershad, Protocol Service Decomposition for High-Performance Networking, Proceedings of the 14th ACM Symposium on Operating Systems Principles, December 1993, pp. 244-255. • Aled Edwards, Steve Muir, Experiences implementing a high performance TCP in user-space, Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, p.196-205, 1995 • Kieran Mansley, Engineering a User-Level TCP for the CLAN Network, Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence: experience, lessons, implications, p.228-236, 2003
User-space protocols: Global Issues • Routing: Direct packets to/from correct endpoint/interface • How is traffic demultiplexed and sent to the correct endpoint/process? • In-kernel filters • Where are the routing tables and how are they maintained? • route fixed when connection established or located in shared memory • Control: I use IPv4 as an example • Address resolution protocols/tables? • Other control protocols. For example ICMP, IGRP, others? • Where are the routing protocols implemented? • Management: • Must manage a protocol's namespace (for example, port numbers in IPv4). • Common programming technique: allow the protocol instance to select the local address part • specify port = 0 and addr = 0, then the implementation will assign correct values • Passive connect model? • In IPv4 a server listens on a port (host:port:proto) for a connection request. To establish a connection a unique (to the endsystem) port number is assigned and a new socket allocated. • socket-oriented system calls must be supported. On UNIX must support non-blocking I/O with select and poll. • Connection lifetime may outlast process. • For example TCP TIME_WAIT or simply waiting for a final ack or resending if no ack received. • Security: we must provide sufficient mechanisms for protocol developers • implementations must be able to guard against masquerading and eavesdropping
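For reference, this is how the "port = 0 and addr = 0" convention and non-blocking accept with select look for ordinary IPv4 sockets, which is the behavior a vnet protocol would have to reproduce. It uses only the standard Python `socket` and `select` modules; the `demo` function and the loopback client are illustrative, not part of the slides.

```python
import select
import socket


def demo():
    """Bind with addr=0/port=0, then use select() to wait for a connection."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli = None
    conn = None
    try:
        srv.bind(("", 0))            # addr = INADDR_ANY, port = 0: kernel picks
        srv.listen(1)
        srv.setblocking(False)       # accept() must never block the caller
        _, port = srv.getsockname()  # the port number the kernel assigned

        # Nothing is pending yet, so a zero-timeout select reports not ready.
        ready_before, _, _ = select.select([srv], [], [], 0)

        cli = socket.create_connection(("127.0.0.1", port), timeout=2)

        # The listening socket becomes readable once a connection is queued.
        ready_after, _, _ = select.select([srv], [], [], 2)
        conn, _ = srv.accept()
        return port, bool(ready_before), bool(ready_after)
    finally:
        for s in (conn, cli, srv):
            if s is not None:
                s.close()


port, before, after = demo()
```

In the existing stack the kernel owns the port namespace, which is why it can hand out an unused port; a user-space protocol needs some shared authority to play the same role.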
User Space: Configurations • Given these global issues there are two likely configurations: • all traffic passes through a common protocol daemon in user space • a control daemon implements a basic set of control functions while a user library implements the majority of data path functions • prior work has shown the latter approach to be superior. • Having all traffic pass through a common protocol daemon => at least one extra copy operation (kernel -> daemon -> user process) • A better solution is for a daemon to insert relatively simple packet filters in the kernel for established connections; the filters direct incoming packets to endpoints and check outgoing packets from endpoints.
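A rough sketch of the filter approach, under stated assumptions: filters are exact-match entries keyed on (VLI, local address, remote address), and packets that match no filter are handed to the control daemon. The slides do not specify the filter format or the fallback behavior, so `FlowKey`, `FilterTable`, and the string endpoints are hypothetical.

```python
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """Assumed exact-match key for an established vnet connection."""
    vli: int
    local: str
    remote: str


class FilterTable:
    """Stands in for the in-kernel connection filters."""

    def __init__(self, control_endpoint: str = "daemon"):
        self._filters: Dict[FlowKey, str] = {}
        self._control = control_endpoint

    def insert(self, key: FlowKey, endpoint: str) -> None:
        """Called by the control daemon once a connection is established."""
        self._filters[key] = endpoint

    def remove(self, key: FlowKey) -> None:
        """Called by the control daemon when the connection is torn down."""
        self._filters.pop(key, None)

    def demux(self, key: FlowKey) -> str:
        """Deliver to the matching endpoint, else to the control daemon."""
        return self._filters.get(key, self._control)


table = FilterTable()
key = FlowKey(vli=7, local="hostA:5000", remote="hostB:80")
table.insert(key, "app-socket-3")
hit = table.demux(key)                                     # data path
miss = table.demux(FlowKey(7, "hostA:5001", "hostB:80"))   # control path
```

With this split, established-connection data reaches the application directly and only unmatched traffic (for example, new connection requests) incurs the extra hop through the daemon.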
application vnetX: protocol library User-Space: Passive Open 0. listen/accept (passive open) vnetX control daemon: (namespace, lifecycle, connections) 4. new connection data copy socket layer 3. insert incoming and outgoing filters for vnetX connection 1. connection request (in) 5. data, established connections compare against connection specific outgoing filter 2. ack (out) vnet demux connection filters use VLI to access incoming filters and use to demux to filter set and/or socket. ethernet
application vnetX: protocol library User-Space: Active Open 0. connect vnetX control daemon: (namespace, lifecycle, connections) 4. new connection data copy socket layer 1. connection request (out) 3. insert incoming and outgoing filters for vnetX connection 5. data, established connections compare against connection specific outgoing filter 2. ack (in) vnet demux connection filters use VLI to access incoming filters and use to demux to filter set and/or socket. ethernet
application vnetX: protocol library User-Space: Datagram (Connectionless) daemon fills in local address and binds to socket. No restrictions on destination 0. open(any) vnetX control daemon: (namespace, lifecycle, connections) data copy 2. new connection (local address) socket layer 1. insert incoming and outgoing filters for vnetX connection 3. data established connections compare against “connection” specific outgoing filter vnet demux connection filters use VLI to access incoming filters and use to demux to socket. In this case only the local part is used. ethernet
application vnetX: protocol library User-Space: Datagram (Connectionless) daemon fills in both local and destination addresses. Destination restricted 0. open(local and remote addr) vnetX control daemon: (namespace, lifecycle, connections) 2. new connection(local and remote) data copy socket layer 1. insert incoming and outgoing filters for vnetX connection 3. data established connections compare against “connection” specific outgoing filter vnet demux connection filters ethernet use VLI to access incoming filters and use to demux to socket.
application vnetX: protocol library User-Space: App exits TCP enters TIME_WAIT after close vnetX control daemon: (namespace, lifecycle, connections) socket layer 3. remove filters 1. connection close (out) 2. ack (in/out) vnet demux connection filters ethernet drop
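The point of this slide is that connection state must outlive the application. A minimal sketch of how a control daemon might track that, assuming it holds a closed connection in TIME_WAIT for twice the maximum segment lifetime before removing its filters. The 30-second MSL, the class name, and the method names are assumptions; the slides only state that TCP enters TIME_WAIT after close and that filters are then removed.

```python
from typing import Dict, List, Optional, Tuple

MSL_SECONDS = 30.0  # assumed maximum segment lifetime; not given in the slides


class ControlDaemon:
    """Tracks connection state on behalf of applications that may exit."""

    def __init__(self) -> None:
        # conn_id -> (state, expiry time or None)
        self.connections: Dict[str, Tuple[str, Optional[float]]] = {}

    def established(self, conn_id: str) -> None:
        self.connections[conn_id] = ("ESTABLISHED", None)

    def app_exited(self, conn_id: str, now: float) -> None:
        """The application closed or died: hold the connection for 2 * MSL."""
        self.connections[conn_id] = ("TIME_WAIT", now + 2 * MSL_SECONDS)

    def reap(self, now: float) -> List[str]:
        """Drop expired TIME_WAIT entries; their filters can then be removed."""
        expired = [cid for cid, (state, expiry) in self.connections.items()
                   if state == "TIME_WAIT" and expiry is not None and expiry <= now]
        for cid in expired:
            del self.connections[cid]
        return expired


daemon = ControlDaemon()
daemon.established("c1")
daemon.app_exited("c1", now=0.0)
too_early = daemon.reap(now=59.0)
on_time = daemon.reap(now=60.0)
```

Until the entry is reaped, the daemon rather than the vanished process is the owner of any late packets for that connection.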
Extensible protocol frameworks in the kernel • Herbert Bos, Bart Samwel, Safe Kernel Programming in the OKE, Proceedings of the fifth IEEE Conference on Open Architectures and Network Programming, June 2002
OKE • Context: For performance reasons it is useful to permit third parties to load optimized modules into the kernel • Problem: Third party code is untrusted so loading it into the kernel will compromise system security and reliability. Could use a safe execution environment like Java but that incurs expensive runtime checks. • Solution: create a set of mechanisms and policies to permit non-root users to safely load untrusted application modules into kernel space with minimal impact on runtime performance. • Safety: use a trusted compiler to enforce policies (constraints). The constraints are designed to ensure the untrusted module will not adversely affect the kernel (core and loadable modules) or unrelated processes. • User privileges: Vary enforced constraints based on user privileges (customizable language) • Termination: well defined termination boundaries to protect system state • Enforcement: Static and dynamic checks; language extensions • Ease of use: Familiar development environment using Cyclone (type safe, C extension) and kernel module. • Contribution: definition of safe kernel programming environment that meets competing needs: • performance • safety • ease of use • hosted in a commodity OS
Considerations • Identified areas where modules may impact system behavior • program correctness: language restrictions for safety and enforce coding conventions • Memory access: static and dynamic enforcement of memory access rules • Kernel module access: static and dynamic enforcement of kernel module (interface) access restrictions • Resource usage: Bounded (deterministic or limited)
Pushing protocols into the Kernel • Positives: • All the issues associated with user-space protocols simply go away: global tables live in the kernel and protocol state has the lifetime of the kernel • Performance, efficiency, existing code base • Enhances intra-protocol security • Simplifies integration with existing network I/O subsystems and interfaces • Negatives: • Isolation: More difficult to isolate system from protocol instances. Inter-protocol isolation difficult. • Security: Proving trust/security more difficult • Implementation and debugging more difficult in kernel
ops File Interface PF_VNET PF_INET FS management Socket Interface I/O Interface open files buffer cache Socket I/O Interface vnet ops vnet Proto vnet Proto state tables state tables ethernet vnet Demux eth device driver eth0 VLAN Kernel-Space Protocols Rework! Application(s) /dev/protoX /dev/vnet User Space (Applications) … vnet:ep vnet:ep tcp:port udp:port rawIP … TCP vnet RAW IP UDP TCP1 TCP2 … TCPn TCP/IP … IP route to interface routes SW Interrupt HW Interrupt Hardware HW interrupt/Exception