Control Update 2, Phase 0

Presentation Transcript

  1. Control Update 2, Phase 0 Fred Kuhns

  2. Immediate Tasks/Milestones • Cross development environment for the xscale • Started with evaluation version of the montavista software. • To build missing libraries and software I installed our own cross-compiler and built the missing system and application libraries. • Montavista: Because we depend on Radisys software, and Radisys depends on montavista, we (I) need to explore what it will take to obtain either their set of system headers or an official release. • Install local version of the Planetlab software • Planetlab central (PLC): maintains centralized database of all participants, physical nodes and slices. • Planetlab node software: interfaces with the PLC to obtain slice definitions, software repository, authentication and logging. • Status: Installed myPLC and created 2 local nodes. • myPLC continues to evolve, though I have not kept up with the latest release. • myPLC: Early on the package was not stable, so once I got everything to build I stopped tracking changes. At some point we need to invest the time and energy to update our distribution.

  3. Immediate Tasks/Milestones • Build and test the drivers and applications provided by Intel, Radisys and IDT • currently all build, load and run. • the IDT code is a bit fragile and jdd continues to struggle with it. I have rewritten their system call wrappers to check for error returns from system calls and to verify parameters (so we don't crash kernels). • John has found a problem with how they allocate memory in the kernel. I have not yet incorporated the IDT (TCAM) into my configuration utility. • Configure control processors (foghorn and coffee) for the Radisys boards. • Set up data hosts: There are two hosts currently configured to act as clients (along with coffee and foghorn). • Install and configure as Planetlab hosts the two Intel general purpose processing boards • Todo.

  4. Immediate Tasks/Milestones • xscale utility library for managing NP state • Can read/write to any physical address in the system • Will verify addresses are valid • In some cases (not all) will also verify other attributes such as alignment and operation size (width of read/write operation) • Need to add support for TCAM (not needed for November as jdd has written utilities for configuring TCAMs) • Network processor configuration (from xscale) • Command interpreter (cmd): interpret system configuration scripts similar to those used in the simulation environment • debug utility: interactive or scripted read/write to physical memory • xscale Daemon (remote control) • Simple daemon implemented for reading/writing blocks of memory • Leverage core components from command interpreter • Uses xml-rpc for communication • debug utility, demo tool and test environment for interfaces • the guts of the daemon used to implement the RLI interface • RLI interface daemon (initial version appears to work)
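
The address checks listed on this slide (valid address, alignment, operation width) can be pictured with a short Python sketch. It is only an illustration: the region table, the set of allowed widths and the name `check_access` are assumptions, not the utility library's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int   # first valid physical address
    size: int   # length in bytes


# Hypothetical memory map; the real valid ranges depend on the board.
REGIONS = (Region(0x90000000, 8 * 1024 * 1024),)
VALID_WIDTHS = (1, 2, 4, 8)   # operation sizes in bytes (assumed)


def check_access(addr, width, count=1, regions=REGIONS):
    """Raise ValueError unless [addr, addr + width*count) is a legal access."""
    if width not in VALID_WIDTHS:
        raise ValueError(f"unsupported operation size: {width}")
    if count < 1:
        raise ValueError("count must be at least 1")
    if addr % width != 0:
        raise ValueError(f"address {addr:#x} is not {width}-byte aligned")
    end = addr + width * count
    if not any(r.base <= addr and end <= r.base + r.size for r in regions):
        raise ValueError(f"range {addr:#x}..{end:#x} is outside every valid region")
```

Raising before any memory is touched mirrors the stated goal of the rewritten wrappers: reject bad parameters in user space rather than crash the kernel.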

  5. For the November Demo • Command interpreter (utility) : Done • run module initialization scripts • debugging, read/write physical memory • XML-RPC Server for reading/writing to xscale physical memory. • Implementation and initial testing complete. • Can run either on the hardware (reading/writing physical memory) or in a simulated environment on any host. In the latter case the library allocates an 8MB chunk of virtual memory to simulate the xscale environment. • RLI interface daemon • uses xmlrpc to communicate with the xscale server and NCCP on the RLI side. • Jyoti provided the template code ... she made it very easy for me! • Initial test worked with one caveat: • done.
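
As a rough picture of the simulated mode, the sketch below backs `read_mem`/`write_mem` with an 8MB buffer and exposes them through Python's standard `xmlrpc.server`. The real daemon is built on xmlrpc-c; the base address, the argument shapes and the version string here are assumptions made for illustration only. Addresses may be passed as hex strings because XML-RPC integers are limited to signed 32 bits, which an address such as 0x90000000 exceeds.

```python
from xmlrpc.server import SimpleXMLRPCServer

SIM_BASE = 0x90000000        # assumed base address of the simulated window
SIM_SIZE = 8 * 1024 * 1024   # 8MB chunk, as described on the slide


class SimMemory:
    """Stand-in for xscale physical memory when no hardware is present."""

    def __init__(self, base=SIM_BASE, size=SIM_SIZE):
        self.base = base
        self.mem = bytearray(size)

    def _offset(self, kpa, nbytes):
        # Accept "0x..." strings as well as plain integers.
        addr = int(kpa, 0) if isinstance(kpa, str) else int(kpa)
        off = addr - self.base
        if nbytes < 0 or off < 0 or off + nbytes > len(self.mem):
            raise ValueError("address range outside simulated memory")
        return off

    def get_version(self):
        return "sim-0.1"     # placeholder version string

    def read_mem(self, kpa, nbytes):
        off = self._offset(kpa, nbytes)
        return list(self.mem[off:off + nbytes])

    def write_mem(self, kpa, data):
        off = self._offset(kpa, len(data))
        self.mem[off:off + len(data)] = bytes(data)
        return len(data)


def make_server(host="localhost", port=8080):
    """Build (but do not start) an XML-RPC server around a SimMemory."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(SimMemory())
    return server
```

Calling `make_server().serve_forever()` would serve requests on `/RPC2`, matching the default URL shape shown for the remote client later in the deck.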

  6. For Demo in November • Command line client for xscale server • debugging and simple (dynamic) configuration from GPE or CP hosts. • Packet generation • sp++ • No change since last presented to group • todo (Long term, not for November) sp++: • verify packet format • specify packet headers on command line/file • todo other: • tunnel create and maintenance for endsystems so we can use generic applications • at a higher level, we need a strategy for how we will connect to a meta-net and generate traffic for testing/demos

  7. Command Interpreter • Simple command interpreter. Basic syntax:
       <expression> := <cmd_expr> (<EOL> | ';')
       <cmd_expr> := <cmd> { <cmd_arg> }
       <cmd_arg> := <term_arg> {<infix_op> <term_arg>}
       <term_arg> := { <prefix_op> } <term_expr> {<postfix_op>}
       <term_expr> := <terminal> | <sub_expr>
       <sub_expr> := '(' <cmd_expr> ')'
     • Commands are either arithmetic expressions or some system defined operation (mem, vmem, set, etc.) • expr 3 + 5; expr $r * $t / 5; • But you need not explicitly include the expr command; if the command name is missing then expr is assumed: 3 + 5; $r * $t / 5; • Objects: • Arguments are objects and represent values in memory (physical memory) • Variables are names that refer to an object • An object is bound to a type and value when it is first assigned to. • Once bound to a type it may not change (but the value may). • The standard type conversions are supported. • Every operation returns an object; an object may or may not have a value • A variable/object that is unassigned has no value (in a conditional it returns false).
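
The binding rule for objects (type fixed at first assignment, value free to change, unassigned means false) can be modelled in a few lines of Python. This is a simplified sketch, not the interpreter's implementation: it only knows integers and byte vectors, it treats a vector's length as part of its type, it skips the standard type conversions the slide mentions, and the truthiness of an assigned object is an assumption.

```python
class Variable:
    """A name whose type is bound on first assignment and never changes."""

    def __init__(self, name):
        self.name = name
        self._sig = None      # (kind, length); None until first assignment
        self._value = None

    @staticmethod
    def _signature(value):
        if isinstance(value, bool) or not isinstance(value, (int, bytes)):
            raise TypeError("only int scalars and byte vectors are modelled")
        if isinstance(value, bytes):
            return ("bytes", len(value))
        return ("int", None)

    @property
    def assigned(self):
        return self._sig is not None

    @property
    def value(self):
        return self._value

    def assign(self, value):
        sig = self._signature(value)
        if self._sig is None:
            self._sig = sig                      # first assignment binds the type
        elif sig != self._sig:
            raise TypeError(f"${self.name} is bound to {self._sig}, got {sig}")
        self._value = value

    def __bool__(self):
        if not self.assigned:
            return False                         # unassigned: false in a conditional
        if isinstance(self._value, bytes):
            return any(self._value)              # assumed: non-zero vector is true
        return self._value != 0
```

Under this model, assigning a 4-byte IP address to a variable already bound to a 6-byte Ethernet address is rejected, which is the same kind of error the example script on a later slide flags.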

  8. Supported types • Arguments: • variables ($name), immediate values and objects returned by a command. • An object is a scalar or vector of primitive types. • variable is defined when it appears in a set statement

  9. Example Command Script
     #!/root/bin/cmd -f
     ## Comments:
     # a single '#' char marks start of comment,
     # extends to end-of-line.

     ## Global scope, 6B Eth addr and 4B IP addr
     set ethAddr 55:44:33:22:11:00
     set ipAddr 192.168.0.4
     print "ethAddr " $ethAddr \
           ", ipAddr " $ipAddr "\n"

     ## Block scope
     {
       ## alternative syntax for array of Bytes
       set ethAddr (dw1 1 2 3 4 5 6)
       print "XX Ethernet addr " $ethAddr "\n"
     }

     ## Function definitions, run time binding
     ## of arguments, returning values
     defun doWrite addr obj {
       if ($addr == 0) {
         print "\tdoWrite: address is zero\n"
         return 0 # Return statements
       }
       print "mem write " $addr " " $obj "\n"
       return 1
     }

     ## may use semicolon to terminate expression
     set indx (dw1 0); set addr 0x90000000;

     ## Loops
     while ( $indx < 4 ) {
       set result 0 # block scope
       if (($indx / 2) == 0) {
         $ipAddr[3] = $indx;
         $result = (doWrite $addr $ipAddr)
         $addr = $addr + 4
       } else {
         $ethAddr[5] = $indx
         $result = (doWrite $addr $ethAddr)
         $addr = $addr + 6
       }
       if ( $result == 0) {
         print "Write failed\n"
       }
       $indx = $indx + 1
     }

     # These would be errors:
     # $ethAddr = $ipAddr
     # print "Result = " $result

     # Output: ./try.cmd
     # ethAddr { 55 44 33 22 11 00 }, ipAddr { c0 a8 00 04 }
     # XX Ethernet addr {01 02 03 04 05 06}
     # mem write 0x90000000 {c0 a8 00 00}
     # mem write 0x90000004 {c0 a8 00 01}
     # mem write 0x90000008 {55 44 33 22 11 02}
     # mem write 0x9000000e {55 44 33 22 11 03}

  10. Remote Command Utility
      client [{--help|-h}] [{--lvl|-l} lvl] [{--serv|-s} url] [{--cmd|-c} cmd] [--kpa pa] [--type vt] [--cnt n] [--period n] [--inc v]
      --serv url : default url http://localhost:8080/RPC2
      --cmd cmd : valid command list:
          get_version : get version of currently running server
          read_mem : read kernel physical address space on server
          write_mem : write to kernel physical address space on server
      --kpa pa : Kernel physical address to read/write
      --type vt : valType to read/write. Valid types:
          str : Character string
          char : Single character, elements of a string
          dbl : Double precision float, native format
          int : Signed integer, natural int for platform
          dw8 : Unsigned 8-Byte integer, same as uint64_t
          dw4 : Unsigned 4-Byte integer, same as uint32_t
          dw2 : Unsigned 2-Byte integer, same as uint16_t
          dw1 : Unsigned 1-Byte integer, same as uint8_t
      --cnt n : number of objects of type vt to be read/written (default == 0)
      (-d|--data) x : Specify data for the memory write, should be the last option
      --period n : Will perform a periodic update of an address. Units msec
      --inc v : Amount to increment write each iteration (long)
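
To make the `dwN` value types concrete, here is a small Python sketch that maps them to fixed-width unsigned integers with the standard `struct` module. The helper names and the big-endian default are assumptions; the slides do not say which byte order the client uses on the wire.

```python
import struct

# valType -> struct format code (standard sizes: 1, 2, 4 and 8 bytes)
VALTYPE_FORMATS = {"dw1": "B", "dw2": "H", "dw4": "I", "dw8": "Q"}


def pack_values(vt, values, byteorder=">"):
    """Encode a list of unsigned integers of valType vt as raw bytes."""
    code = VALTYPE_FORMATS[vt]
    return struct.pack(byteorder + code * len(values), *values)


def unpack_values(vt, data, byteorder=">"):
    """Decode raw bytes into a list of unsigned integers of valType vt."""
    code = VALTYPE_FORMATS[vt]
    size = struct.calcsize(byteorder + code)
    if len(data) % size != 0:
        raise ValueError(f"{len(data)} bytes is not a multiple of {size}")
    return list(struct.unpack(byteorder + code * (len(data) // size), data))
```

For example, `pack_values("dw1", [0xC0, 0xA8, 0x00, 0x04])` and `pack_values("dw4", [0xC0A80004])` produce the same four bytes under the big-endian assumption, which is why `--type` and `--cnt` together determine how many bytes a read or write touches.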

  11. Software Location • The cvs wu_arl repository (${WUARL})
      ${WUARL}/wusrc/ :
        wulib/ : General purpose C library (logging, high-res timers, comm, etc). There is a user space (libwu.a) and kernel space version (libkwu.a).
        wuPP/ : General purpose C++ library (messaging, exceptions, buffers, etc). Libraries: wulib
        cmd/ : Core command interpreter library (scanning, parsing, evaluation). Libraries: libwu.a, libwupp.a
      ${WUARL}/IXP_Projects/xscale/ :
        mem/ : Configuration tool, adds memory operations to command interpreter. Libraries: cmd, wuPP, wulib
        ixpLib/ : Platform (7010) specific code for reading/writing memory. Opens dev file to communicate with the lkmwudev or in simulation mode allocates memory block to mimic platform's environment (wudev interface, kpa validation). Libraries: libwu.a
        wudev/ : Kernel loadable module, accepts commands from user space to manipulate kernel physical memory. Validates address and limited operation filtering. Libraries: libkwu.a

  12. Software Location • Continued from previous page: cvs wu_arl repository (${WUARL})
      ${WUARL}/IXP_Projects/xscale/ :
        xrpcLib/ : Common processing of XMLRPC messages for the xscale control daemon. Useful for clients and servers alike. Defines wrappers around the xmlrpc-c library. The remote client code is also located in this directory. Libraries: libwu.a, libwupp.a, libxmlrpc, ...
        ixpServer/ : The xscale control daemon. Uses the xmlrpc-c library for communication and the ixpLib for platform dependent operations. The xmlrpc-c library has a minimal http server that I use to support the web-based interface. Contains sample client code to test/verify server. Libraries: libwu.a, libwupp.a, ixpLib, libxmlrpc, ...
      • In a separate repository I use for courses and experimentation
      ${MYSRC}/src/ :
        misc/ : Test code for the planetlab node environment. A simple client and server for allocating UDP ports and passing open sockets between vservers. Libraries: libwu.a
        sp/ : This is the sp++ code. I still consider it experimental so I have kept it in this directory. It will soon move into the ${WUARL}/IXP_Projects/xscale/ directory.

  13. Software Location • xscale build environment: • for two reasons I have located the cross development environment on my desktop computer: 1) I need root access to the filesystem and 2) arlfs's NFS mounts routinely time out, causing some compiles to take an entire day (the corresponding local compile takes a few hours). • Root directory: /opt/crosstool. The filesystem is backed up by cts • gcc version 3.3.X • All open source code compiled for xscale (and not modified by me) is located in /opt/crosstool/src. • If modified then I place it in cvs under the "xscale" directory. • Reference xscale root filesystems: /opt/crosstool/rootfs. • xscale control processors (foghorn and coffee) update their copies using rsync. • On CPs (foghorn and coffee) the xscale FS has a symbolic link at /xscale. Place files in /xscale/xxx • On the xscales, the executables we currently use are kept in /root/bin

  14. Next set of Tasks/Milestones • Dynamic node configuration • instantiate new meta-routers • modify lookup tables (substrate and MR owned) • modify data-plane tables • Integrate PlanetLab control and PLC database • slice definitions and user authentication • slice interface to control infrastructure: monitoring, table updates, communication between data-plane and control-plane. • Exception/local delivery for IPv4 router • implement on GPE • control daemons for routing over meta-net • signaling and accounting • ARP support • phase 1: in LC for tunnels • phase 2: generic ARP engine for MR use • Node Resource management • creating/initializing new substrate node • allocating node resources for new meta-router (may be several meta-routers per meta-net on a node) • initializing and configuring allocated resources and enabling user access. • Provide per Meta-Net/Router accounting services (similar in spirit to PlanetLab's accounting)

  15. Basic Slice Creation: No changes • Slice information is entered into PLC database. • Current: Node manager polls PLC for slice data. • Planned: PLC contacts Node manager proactively. • Node manager (pl_conf) periodically retrieves slice table. • updates slice information • creates/deletes slices • Node manager (nm) instantiates new virtual machine (vserver) for slice. • User logs into vserver using ssh • uses existing plab mechanism on GPE. [Figure: NPE and GPE (root context, per-slice contexts, NM, RM, slice X, new slice Y, preallocated UDP ports, sys-sw, vnet), Ethernet switch (Eth1, Eth2, Eth3) and line card (NPE) with TCAM lookup table: filter TUNX gives result VLANX, Eth2; default gives VLAN0, Eth1.] Default configuration: forward traffic to the (single) GPE, in this case the user's ssh login session.
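
The polling step (retrieve the slice table, then create or delete slices) amounts to a set difference. The Python sketch below shows that reconciliation; `fetch_slices`, `create_slice` and `delete_slice` are hypothetical callables standing in for the PLC query and the vserver operations, not PlanetLab APIs.

```python
def reconcile(current, wanted):
    """Return (to_create, to_delete) so that current becomes wanted."""
    current, wanted = set(current), set(wanted)
    return sorted(wanted - current), sorted(current - wanted)


def poll_once(current, fetch_slices, create_slice, delete_slice):
    """One polling cycle of a node-manager-like loop (all callables hypothetical)."""
    wanted = set(fetch_slices())
    to_create, to_delete = reconcile(current, wanted)
    for name in to_create:
        create_slice(name)      # e.g. instantiate a vserver for the slice
    for name in to_delete:
        delete_slice(name)
    return wanted               # the new local view of the slice table
```

In the planned push model the same `reconcile` step would still apply; only the trigger changes from a timer to a message from the PLC.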

  16. Requesting NP • User requests shared-NP • Specify code option • Request UDP port number for overlay tunnel • Request local UDP port for exception traffic • Substrate Resource Manager • Configure SW: Assign local VLAN to new meta router. Enable VLAN on switch ports. • Configure NPE: allocates NP with requested code option (decision considers both current load and available options) • Configure LC(s): Allocate an externally visible UDP port number (from the preallocated pool of UDP ports for the external IP address). Add filter(s) • Ingress: packet's destination port -to- local (chassis) VLAN and MAC destination address • Egress: IP destination address (??) -to- MAC destination address and RTM physical output port • Configure GPE: Open local UDP port for exception and local delivery traffic from NPE. Transfer local port (socket) and results to client slice. [Figure: GPE (root context, per-slice contexts, NM, RM, slice X, slice Y, preallocated UDP ports, sys-sw, vnet), NPE, Ethernet switch (Eth1, Eth2, Eth3, VLANY) and line card (NPE) with TCAM lookup table: filter TUNX gives result VLANX, Eth2; filter TUNY gives VLANY, Eth2; default gives VLAN0, Eth1. The four configuration steps are numbered 1 to 4 in the figure.] Exception and local delivery traffic: only need to install filter in TCAM. Meta-network traffic uses UDP tunnels: only need to install filter in TCAM.
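
The line card lookup described here (tunnel filter to VLAN and switch port, with a default entry) can be mimicked with a dictionary. This is only a sketch: a real TCAM matches ternary patterns rather than exact keys, and the class name, key choice (UDP destination port) and string labels are assumptions.

```python
DEFAULT_RESULT = ("VLAN0", "Eth1")   # default entry: forward to the GPE


class IngressFilterTable:
    """Exact-match stand-in for the line card's ingress TCAM filters."""

    def __init__(self, default=DEFAULT_RESULT):
        self._default = default
        self._filters = {}           # UDP destination port -> (vlan, out_port)

    def add(self, udp_port, vlan, out_port):
        if udp_port in self._filters:
            raise ValueError(f"UDP port {udp_port} already has a filter")
        self._filters[udp_port] = (vlan, out_port)

    def remove(self, udp_port):
        self._filters.pop(udp_port, None)

    def lookup(self, udp_port):
        """Return (vlan, out_port) for a packet, falling back to the default."""
        return self._filters.get(udp_port, self._default)
```

Adding a meta-router's tunnel then reduces to one `add` call on the ingress side, which is the sense in which only a filter needs to be installed.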

  17. Configure Ethernet Switch: Step 1 • Allocate next unused VLAN id for meta-net. • In this scenario can a meta-net have multiple meta-routers instantiated on a node? • If so then do we allocate switch bandwidth and a VLAN id for the meta-net or for each meta-router? • Configure Ethernet switch • enable VLAN id on applicable ports • need to know line card to meta-port (i.e. IP tunnel) mappings • if using external GigE switch then use SNMP (python module pysnmp) • if using Radisys blade then use SNMP??? • set default QoS parameters, which are??? • other ??
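
"Allocate next unused VLAN id" can be sketched as a small allocator. The sketch below keys allocations by an owner string, leaving open the slide's question of whether the owner is a meta-net or a meta-router. The usable range 2 to 4094 is an assumption (802.1Q reserves 0 and 4095, and VLAN 1 is skipped here as a common switch default).

```python
class VlanAllocator:
    """Hands out the lowest unused VLAN id; one id per owner."""

    def __init__(self, first=2, last=4094):
        self._first, self._last = first, last
        self._by_owner = {}          # owner -> vlan id

    def allocate(self, owner):
        if owner in self._by_owner:
            return self._by_owner[owner]          # idempotent for the same owner
        used = set(self._by_owner.values())
        for vid in range(self._first, self._last + 1):
            if vid not in used:
                self._by_owner[owner] = vid
                return vid
        raise RuntimeError("no VLAN ids left")

    def release(self, owner):
        self._by_owner.pop(owner, None)
```

Enabling the returned id on the switch ports (for instance over SNMP, as the slide suggests) is a separate step not shown here.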

  18. Configure NPE: Step 2 • vlan table: • code option and instance number • memory for code options • instance: base address, size and index/instance • each instance is given an instance number to use for indexing into a common code option block of memory • each code option is assigned a block of memory • code option: base address and size. Also Max number of instances that can be supported. • Select NPE to host client MR • Select eligible NPEs (those that have the requested code option) • Select best NPE based on current load and do what??? • Configure NPE • Add entry to SRAM table mapping VLAN:PORT to MR instance • What does this table look like? • Where is it? • Allocate memory block in SRAM for MR. • Where in SRAM are the eligible blocks located? • How do I reference the block? • 1) allocate memory for code option at load time 2) allocate memory dynamically • Allocate 3 counter blocks for MR • where are the blocks? • How are they referenced (i.e. named)? Using VM/PM address on NP? • Configure MR instance attributes • What attributes are needed by the different code options? • Tunnel header fields; Exception/local delivery IP header fields, QID, physical Port#; Ether ssrc of NPE??? • Set default QM QIDs, weights and number of queues? • ??
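
Two pieces of this step lend themselves to a short Python sketch: filtering NPEs by code option and picking one by load, and indexing an instance into its code option's memory block. Choosing the least loaded NPE is an assumption (the slide leaves the policy open), as are all names and the fixed per-instance size.

```python
from dataclasses import dataclass


@dataclass
class Npe:
    name: str
    code_options: frozenset   # code options loaded on this NPE
    load: float               # current load, any consistent unit


def select_npe(npes, code_option):
    """Return the least-loaded NPE offering code_option, or None."""
    eligible = [n for n in npes if code_option in n.code_options]
    if not eligible:
        return None
    return min(eligible, key=lambda n: n.load)


def instance_address(block_base, instance_size, index, max_instances):
    """Address of instance `index` inside a code option's memory block."""
    if not 0 <= index < max_instances:
        raise IndexError(f"instance {index} out of range 0..{max_instances - 1}")
    return block_base + index * instance_size
```

This corresponds to the "allocate memory for code option at load time" alternative: the block is sized for the maximum number of instances up front, so adding a meta-router only consumes an instance number.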

  19. Configure LC(s): Step 3 • User may request specific UDP port number • Open UDP socket (on GPE) • open socket and bind to external IP address and UDP port number. This prevents other slices or the system from using selected port • Configure line card to forward tunnel(s) to correct NPE and MR instance • Add ingress and egress entries to TCAM • how do I know IP–to-Ethernet destination address mapping for egress filter? • For both ingress and egress allocate QID and configure QM with rate and threshold parameters for MR. • Do I need to allocate a Queue (whatever this means)? • Need to keep track of qid’s (assign qid when create instance etc) • For egress I need to know the output physical port number. I may also need to know this for ingress (if we are using external sw).
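
Reserving a port by binding a socket to it, as described above, looks like this in Python. The function name is hypothetical; passing port 0 lets the kernel choose a free port, which is one way to model a request without a specific port number.

```python
import socket


def reserve_udp_port(ip, port=0):
    """Bind a UDP socket to (ip, port) and return (socket, bound_port).

    While the returned socket stays open, another plain bind to the same
    address and port fails, which keeps other slices off the port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((ip, port))
    except OSError:
        sock.close()
        raise
    return sock, sock.getsockname()[1]
```

The caller must keep the socket object alive for as long as the tunnel exists; closing it releases the reservation.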

  20. Configuring GPE: Step 4 • Assign local UDP port to client for receiving exception and local delivery traffic. • user may request specific port number. • use either a preallocated socket or open a new one. • use UNIX domain socket to pass socket back to client along with other results. • all traffic will use this UDP tunnel, this means the client must perform IP protocol processing of encapsulated packet in user space. • for exception traffic this makes sense. • for local delivery traffic the client can use a tun/tap interface to send packet back into Linux kernel so it can perform more complicated processing (such as TCP connection management). Need to experiment with this. • should we assign a unique local IP address for each slice? • Result of shared-NPE allocation and socket sent back to client.
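
Passing an open socket back to the client over a UNIX domain socket relies on file descriptor passing (SCM_RIGHTS). A minimal Python sketch, assuming Python 3.9 or later on a Unix host and a JSON-encoded result message (the encoding and function names are illustrative choices, not the project's protocol):

```python
import json
import socket


def send_allocation(chan, udp_sock, result):
    """Send an open UDP socket plus a result dict over a UNIX domain socket."""
    payload = json.dumps(result).encode()
    socket.send_fds(chan, [payload], [udp_sock.fileno()])


def recv_allocation(chan, bufsize=4096):
    """Receive the result dict and rebuild a socket object from the passed fd."""
    data, fds, _flags, _addr = socket.recv_fds(chan, bufsize, 1)
    if not fds:
        raise RuntimeError("no file descriptor received")
    sock = socket.socket(fileno=fds[0])   # family and type are auto-detected
    return sock, json.loads(data)
```

The receiver ends up with its own descriptor for the same bound socket, so the resource manager can bind the port with its own privileges and the slice never needs permission to do so itself.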

  21. Run-Time Support for Clients • Managing entries in NPE TCAM (lookup) • add/remove entry • list entries • NPE Statistics: • Allocate 2 blocks of counters: pre-queue and post-queue. • clear block counter pair (Byte/Pkt) ??? • get block counter pair (Byte/pkt) • specify block and index • get once, get periodic • get counter group (Byte/pkt) • specify counter group as set of tuples: {(index, block), …} • SRAM read/write • read/write MR instance specific SRAM memory block • relative address and byte count, writes include new value as byte array. • Line card: Meta-interface packet counters, byte counters, rates and queue thresholds • get/set meta-interface rate/threshold • Other • Register next hop nodes as the tuple (IPdst, ETHdst), where IPdst is the destination address in the IP packet. The ETHdst is the corresponding Ethernet address. • Can we assume the destination ethernet address is always the same? • Issue: how do we map this to LC and physical interface? We need this information to configure output TCAM entries on line cards.
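
The counter operations listed above (get or clear a Byte/Pkt pair by block and index, and fetch a group given as (index, block) tuples) can be prototyped as follows. Block names, block size and method names are assumptions; on the hardware the counters would live in NPE memory rather than in a Python list.

```python
class CounterBlocks:
    """Per-meta-router counter blocks holding (bytes, packets) pairs."""

    def __init__(self, blocks=("pre-queue", "post-queue"), size=16):
        self._blocks = {b: [[0, 0] for _ in range(size)] for b in blocks}

    def count(self, block, index, nbytes):
        """Account one packet of nbytes bytes (what the data path would do)."""
        pair = self._blocks[block][index]
        pair[0] += nbytes
        pair[1] += 1

    def get(self, block, index):
        nbytes, pkts = self._blocks[block][index]
        return nbytes, pkts

    def clear(self, block, index):
        self._blocks[block][index] = [0, 0]

    def get_group(self, group):
        """group is an iterable of (index, block) tuples, as on the slide."""
        return {(index, block): self.get(block, index) for index, block in group}
```

A periodic get is then just a timer around `get` or `get_group`.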

  22. Boot-time Support • Initialize GPE • Initialize NPE • Initialize LC • things to init • spi switch • memory • microengine code download • tables?? • default Line card tables • default code paths • TCAM

  23. IP Meta Router: Control • All meta-net traffic arrives via a UDP tunnel using a local IP address. • raw IP packets must be handled in user space. • complete exception traffic processing in user space. • local delivery traffic: can we inject in Linux kernel so it performs transport layer protocol processing? This would also allow application to use the standard socket interface. • should we use two different IP tunnels, one for exception traffic and one for local delivery? • Configuration responsibilities? • Stats monitoring for demo? • get counter values • support for traceroute and ping • ONL -like monitoring tool • Adding/removing routes: • static routing tables or do we run a routing protocol?

  24. IP-Meta Router • Internal packet format has changed. • see Jing’s slides • Redirect: not in this version of the meta-router

  25. XScale Control Software • Substrate Interface • Raw interface for reading/writing arbitrary memory locations. • substrate stats? • add new meta-router • Meta-router/Slice interface • all requests go through a local entity (managed) • not needed: authenticate client • validate request (verify memory location and operation) • Node Initialization • ??

  26. Virtual Networking – Basic Concepts Substrate Links interconnect adjacent Substrate Routers Substrate Router One or more Meta Router instances Meta Links interconnect adjacent Meta Routers. Defined within substrate link context substrate links may be Tunneled within existing networks: IP, MPLS, etc.

  27. Adding a Node Install new substrate router Define meta-links between meta nodes (routers or hosts) Create substrate links between peers Instantiate meta router(s)

  28. System Components • General purpose processing engines (PE/GP). • Shared: PlanetLab VM environment. • Local Planetlab node manager to configure and manage VMs • vserver, vnet may change to support substrate functions • Implement substrate functions in kernel • rate control, mux/demux, substrate header processing • Dedicated: no local substrate functions • May choose to implement substrate header processing and rate control. • Substrate uses VLANs to ensure isolation (VLAN == MRid) • Can use 802.1Q priorities to isolate traffic further. • NP blades (PE/NP). • Shared: user supplies parse and header formatting code. • Dedicated: User has full access to and control over the hardware device • General Meta-Processing Engine (MPE) notes: • Use loopback to enforce rate limits between dedicated MPEs • Legacy node modeled as dedicated MPE, use loopback blade to remove/add substrate headers. • Substrate links: Interconnect substrate nodes • Meta-links defined within their context. • Assume an external entity configures end-to-end meta-nets and meta-links • Substrate links configured outside of the node manager's context

  29. Block Diagram of a Meta-Router • Control/Management using Base channel (Control Net: IPv4) • Meta Interfaces (MI): MI connected to meta-links • MPEs interconnected in data plane by a meta-switch. Packet includes Meta-Router and Meta-PE identifier • Some Substrate detected errors or events reported to Meta-Router "control" MPE. • Meta-Processing Engines (MPE): • virtual machine, COTS PC, NPU, FPGA • PEs differ in ease of "programming" and performance • MR may use one or more PEs, with possibly different types • The first Meta-Processing Engine (MPE) assigned to Meta-Network MNetk is called MPEk1 [Figure: Meta-Router containing a Meta Switch and MPEk1, MPEk2, MPEk3 (labeled data path, data path, control); meta-interfaces 0 to 5 labeled 1G, 2G, 1G, .5G, 1G, .5G; internal links labeled .1G and 3G.]

  30. System Block Diagram [Figure, labels only: RTM, 10 x 1GbE, LC, PE/NP, PE/GP, PCI, GP CPU, xscale, NPU-A, NPU-B, TCAM, GbE interface, 2x1GE, Fabric Ethernet Switch (10Gbps, data path), Base Ethernet Switch (1Gbps, control), I2C (IPMI), Loopback (map VLANX to VLANY), Node Server, user login accounts, Node Manager, Shelf manager.]

  31. Top-Level View (exported) of the Node PE/NP (control, IPaddr) (platform, IXP2800) (type, IXP_SHARED) … PE/GP (control, IPaddr) (platform, x86) (type, linux_vserver) … S-Link (type, p2p) (peer, _Desc_) (BW, XGbps) … … … … PE/NP (control, IPaddr) (platform, IXP2800) (type, IXP_DEDICATED) … PE/GP (control, IPaddr) (platform, x86) (type, dedicated) … S-Link (type, p2p) (peer, XXX) (BW, XXGbps) … Exported Node Resource List (Processing engines, Substrate Links) Node Server Substrate Control user login accounts Node Manager

  32. MNetk Control and Management Plane MNetk Data Plane MPEk1 MPEk2 MPEk3 VLANk MNetk MI4 MNetk MNetk MNetk MI0 MI1 MI2 MI3 Substrate: Enabling an MR Allocate control-plane MPE (required) Meta-Router MR1 for MNetk Update host with local Net gateway Allocate data-plane MPEs Host (located within node) Enable VLANk on fabric switch ports PE PE PE 2 1 0 3 local Enable control over Base switch (IP-based) 4 10GbE (fabric) loopback 5 6 7 Update shared MPEs for MI and inter-MPE traffic … LC LC Line card Substrate Use loopback to define interfaces internal to the system node. Define Meta-Interface mappings

  33. Lookup table Lookup table map to Port, Meta Link pair map to Port Meta Link pair … … … … … … … Lookup table map to MR:MI … Block Diagram map received packet to MR and MI Each MR:MI pair is assigned its own rate controlled queue Line Card Line Card Lookup table Shared PE map to MR:MI MR1 … MR2 MR5:MI1 Dedicated PE MR3 Line Card Line Card Fabric Switch Fabric Switch Shared PE/NP MR4 … MR5 1 1 2 2 Meta-Interfaces are rate controlled Shared PE/GP “VM” manager VMM Node Server meta-router Meta-net control and management functions (configure, stats, routing etc). Communicate with MR over separate base switch. Internet Node M. VMM? meta-net5 control Base switch (control) ‘slice’/MN VMs? App-level service

  34. Partitioning the Control plane • Substrate manager • Initialization: discover system HW components and capabilities (blades, links etc) • Hides low level implementation details • Interacts with shelf manager for resetting boards or detecting failures. • Node manager • Initialization: request system resource list • Operational: Allocate resources to meta-Networks (slice authorities?) • Request substrate to reset MPEs • Substrate assumptions: • All MNets (slices) with a locally defined meta-router/service (sliver) have a control process to which it can send exception packets and event notifications. • Communication: • out-of-band uses Base interface and internal IP addresses • in-band uses data plane and MPE id. • Notifications: • ARP errors, Improperly formatted frame, Interface down/up, etc. • If meta-link is a pass-through link then the Node manager is responsible for handling meta-net level errors/event notification. For example link goes down.

  35. Initialization: Substrate Resource Discovery • Creates list of devices and their Ethernet Addresses • Network Processor (NP) blades: • Type: network-processor, Arch: ixp2800, Memory: 768MB (DRAM), Disk: 0, Rate: 5Gbps • General Processor (GP) blades: • Type: linux-vserver, Arch: X, Memory: X, Disk: X, Rate: X • Line Card blades: • not exposed to node manager, used to implement meta-interfaces • another entity creates substrate links to interconnect peer substrate nodes. • create table mapping line card blades, physical links and Ethernet addresses. • Internal representation: • Substrate device ID: <ID, SDid> • If device has a local control daemon: <Control, IP Address> • Type = Processing Engine (NP/GP): • <Platform, (Dual IXP2800|Xeon|???)>, <Memory, #>, <Storage, #> <Clock, (1.4GHz|???)> <Fabric, 10GbE>, <Base, 1GbE>, ??? • Type = Line Card • <Platform, Dual IXP2800> <Ports, {<Media, Ethernet>, <Rate, 1Gbps>}>, ??? • Substrate Links • <Type, p2p>, <Peer, Ethernet Address>, <Rate Limit>, … • Meta-Link list <MLid, MLI>, <MR, MRid>, …

  36. Initialization: Exported Resource Model • List of available elements • Attributes of interest? • Platform: IXP2800, PowerPC, ARM, x86; Memory: DRAM/SRAM; Disk: XGB; Bandwidth: 5Gbps; VM_Type: linux-vserver, IXP_Shared, IXP_Dedicated, G__Dedicated; Special: TCAM • network-processor: NP-Shared, NP-Dedicated • General purpose: GP-Shared (linux-vserver), GP-Dedicated • Each element is assigned an IP address for control (internal control LAN) • List of available substrate links: • Access networks (expect Ethernet LAN interface): substrate link is multi-access • Attributes: Access: multi-access, Available Bandwidth, Legacy protocol(s) (i.e. IP), Link protocol (i.e. Ethernet), Substrate ARP implementation. • Core interface: assume point-to-point, Bandwidth controlled • Attributes: Access: Substrate; Bandwidth, Legacy protocol?

  37. Instantiate a router: Register Meta-Router (MR) • Define MR specific Meta-Processing Engines (MPE): • Register MR ID MRidk with substrate • substrate allocates VLANk and binds it to MRidk • Request Meta-Processing Engines • shared or dedicated, NP or GP, if shared then relative allocation (rspec) • shared: implies internal implementation has support for substrate functions • dedicated w/substrate: user implements substrate functions. • dedicated no/substrate: implies substrate will remove any substrate headers from data packets before delivering to MPE. For legacy systems. • indicate if this MPE is to receive control events from substrate (Control_MPE). • substrate returns MPE id (MPid) and control IP (MPip) address for each allocated MPE • substrate internally records Ethernet address of MPE and enables VLAN on applicable port • substrate assumes that any MPE may send data traffic to any other MPE • MPE specifies target MPE rather than MI when sending packet.

  38. Instantiate a router: Register Meta-Router (MR) • Create meta-interfaces (with BW constraints) • create meta-interfaces associated with external substrate links • request meta-interface id (MIid) be bound to substrate link x (SLx). • we need to work out the details of how a SL is specified • We need to work out the details of who assigns inbound versus outbound meta-link identifiers (when they are used). If downstream node then some entity (node manager?) reports the outgoing label. This node assigns the inbound label. • multi-access substrate/meta link: node manager or meta-router control entity must configure meta-interface for ARP. Set local meta-address and send destination address with output data packet. • substrate updates tables to bind MI to "receiving" MPE (i.e. where substrate sends received packets) • create meta-interfaces for delivery to internal devices (for example, legacy Planetlab nodes) • create meta-interface associated with an MPE (i.e. the endsystem)

  39. Scenarios • Shared PE/NP, send request to device controller on the XScale • Allocate memory for MR Control Block • Allocate microengine and load MR code for Parser and Header Formatter • Allocate meta-interfaces (output queues) and assign Bandwidth constraints • Dedicated PE/NP • Notify device control daemon that it will be a dedicated device. May require loading/booting a different image? • Shared GP • use existing/new PlanetLab framework • Dedicated GP • legacy planetlab node • other

  40. IPv4 • Create the default IPv4 Meta-Router, initially in the non-forwarding state. • Register MetaNet: output Meta-Net ID = MNid • Instantiate IPv4 router: output Meta-Router ID = MRid • Add interfaces for legacy IPv4 traffic: • Substrate supports defining a default protocol handler (Meta-Router) for non-substrate traffic. • for protocol=IPv4, send to IPv4 meta-router (specify the corresponding MPE).

  41. General Control/Management • Meta routers use Base channel to send requests to control entity on associated MPE devices • Node manager sends requests to central substrate manager (xml-rpc?) • requests to configure, start/stop and tear down meta-routers (MPEs and MIs). • Substrate enforces isolation and policies/monitors meta-router sending rates. • Rate exceeded error: If MPE violates rate limits then its interface is disabled and the control MPE is notified (over Base channel). • Shared NP • xscale daemon • requests: start/stop forwarding; Allocate shared memory for table; Get/set statistic counters; Set/alter MR control lock; Add/Remove lookup table entries. • Lookup entries can be added to send data packets to control MPE, packet header may contain tag to indicate reason packet was sent • mechanism for allocating space for MR specific code segments. • dedicated NP • MPE controls XScale. When XScale boots a control daemon is told to load a specific image containing user code.

  42. ARP for Access Networks • The substrate offers an ARP service to meta-routers • Meta-router responsibilities: • before enabling interface must register its meta-network address associated with meta-interface • send destination (next-hop) meta-net address with packets (part of substrate internal header). Substrate will use arp with this value. • if meta-router wants to use multicast or broadcast address then it must also supply the Link layer destination address. So the substrate must also export the Link layer type. • substrate responsibilities • all substrate nodes on an access network must agree on meta-net identifiers (MLIs) • Issues ARP requests/responses using supplied meta-net addresses and meta-net id (MLI). • maintain ARP table and timeout entries according to relevant rfcs. • ARP Failed error: If ARP fails for a supplied address then substrate must send packet (or packet context) to control MPE of meta-router.
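
The substrate's ARP table, keyed by meta-net id and meta-net address and with entries that time out, might be prototyped like this. The 60 second lifetime is a placeholder rather than a value taken from the RFCs, and the class and method names are assumptions.

```python
import time


class ArpTable:
    """Maps (MLI, meta-net address) to a link layer address, with expiry."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self._ttl = ttl          # entry lifetime in seconds (placeholder value)
        self._clock = clock      # injectable so tests can control time
        self._entries = {}

    def learn(self, mli, meta_addr, link_addr):
        """Record or refresh an entry, e.g. on receiving an ARP response."""
        self._entries[(mli, meta_addr)] = (link_addr, self._clock() + self._ttl)

    def lookup(self, mli, meta_addr):
        """Return the link layer address, or None if unknown or expired."""
        entry = self._entries.get((mli, meta_addr))
        if entry is None:
            return None
        link_addr, expires = entry
        if self._clock() >= expires:
            del self._entries[(mli, meta_addr)]
            return None
        return link_addr
```

A `None` result is the point at which the substrate would issue an ARP request and, if that fails, hand the packet (or its context) to the meta-router's control MPE as an ARP Failed error.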