
Session E: Zoni



Presentation Transcript


  1. Session E: Zoni

  2. Zoni. Richard Gass, Intel

  3. Agenda
     Sessions:
     • (A) Intro, 8.30-9.00
     • (B) Hadoop, 9.00-10.00
     • Break, 10.00-10.30
     • Hadoop, 10.30-12.00
     • Lunch, 12.00-1.30
     • Pig, 1.30-2.00
     • (D) Tashi, 2.00-3.00
     • Break, 3.00-3.30
     • Zoni, 3.30-4.45
     • Wrap up, 4.45-5.00
     This session: Overview, Plans/Status, User View, Administration, Installation, Summary

  4. Overview

  5. Open Cirrus Stack (diagram). Labels: compute + network + storage resources; management and control subsystem; power + cooling; Physical Resource Set (Zoni) service. Credit: John Wilkes (HP)

  6. Open Cirrus Stack (diagram). Zoni clients, each with their own “physical data center”: Eucalyptus, Tashi/HDFS, NFS storage service, Experiment. Below them: Zoni service.

  7. Open Cirrus Stack (diagram). Labels: Virtual clusters (Virtual cluster, Virtual cluster); Eucalyptus, Tashi/HDFS, NFS storage service, Experiment; Zoni service.

  8. Open Cirrus Stack (diagram). Application running on Hadoop, on a Tashi virtual cluster, on Zoni, on real hardware. Labels: Web Service, BigData App, Hadoop; Virtual cluster, Virtual cluster; Eucalyptus, Tashi/HDFS, NFS storage service, Experiment; Zoni service.

  9. Open Cirrus stack - Zoni
     • Initial PRS implementation from HP
     • Re-write from Intel (in collaboration with HP), soon to be contributed to the Apache Software Foundation
     • Zoni service goals
       • Provide mini-datacenters to users
       • Isolate mini-datacenters from each other
     • Zoni service approach
       • Allocate sets of physical co-located nodes, isolated inside VLANs
       • Allow running without virtualization overhead
         • Necessary for predictable QoS (e.g. cache interference)

  10. Goals
      • Reduce complexity in allocating physical resources
      • Gain user confidence
        • Show users that we can efficiently allocate/deallocate resources
      • Stop the squatting
      • Incentives
        • HP’s Tycoon (economic model)
        • Simple points scheme for good behavior or early return

  11. Responsibilities of Zoni
      • Isolate domains → VLAN
      • Provision system software → PXE
      • Provide platform control (on/off) → IPMI
      • Provide boot debug → IPMI

  12. VLAN
      • Virtual LAN technology allows a single physical network to appear as several isolated networks
      • Ethernet packets are tagged with a VLAN ID
      • Switches and NICs enforce the policies associated with each VLAN
      • By associating Zoni domains with different VLANs, they can be isolated from each other
      • The Zoni system provides the interfaces necessary to abstract switch configuration programming across multiple switch vendors
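To make "tagged with a VLAN ID" concrete, here is a minimal sketch of the standard IEEE 802.1Q tag layout (a 16-bit TPID of 0x8100 followed by a 16-bit field holding priority, a drop-eligible bit, and the 12-bit VLAN ID). The function names `vlan_tag` and `tag_frame` are illustrative only; the slides do not describe how Zoni or the switches build tags.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag for the given VLAN ID."""
    if not 1 <= vlan_id <= 4094:  # 0 and 4095 are reserved values
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    if dei not in (0, 1):
        raise ValueError("dei must be 0 or 1")
    # Tag control information: 3 bits priority, 1 bit DEI, 12 bits VLAN ID.
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def tag_frame(frame: bytes, vlan_id: int) -> bytes:
    """Insert the tag after the destination and source MAC addresses."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    return frame[:12] + vlan_tag(vlan_id) + frame[12:]
```

For example, `vlan_tag(300).hex()` gives `8100012c`: the marker 0x8100 followed by VLAN 300 (0x12c) with priority 0.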

  13. Preboot eXecution Environment (PXE)
      • Enables provisioning of an OS image over the network
      • On machine boot, the NIC firmware contacts a PXE server via the DHCP process for the appropriate kernel and initrd to load
      • Once loaded, the init scripts in the initrd can pull the filesystem to the machine
      • In our environment, we download the desired filesystem to a ramdisk from an NFS server, enabling very rapid provisioning (30 seconds or less) while leaving the host filesystem undisturbed
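As an illustration of per-machine kernel and initrd selection, the sketch below assumes a PXELINUX-style boot server, which the slides do not name. PXELINUX looks for a per-host file under `pxelinux.cfg/` named `01-` plus the MAC address in lowercase with hyphens. The label, kernel path, and initrd path in the usage note are hypothetical.

```python
def pxelinux_config_name(mac: str) -> str:
    """Per-host PXELINUX config file name for an Ethernet MAC address."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    octets = [digits[i:i + 2] for i in range(0, 12, 2)]
    # "01" is the ARP hardware type for Ethernet.
    return "01-" + "-".join(octets)


def pxelinux_config(label: str, kernel: str, initrd: str) -> str:
    """Render a one-entry PXELINUX config that boots kernel + initrd."""
    lines = [
        f"DEFAULT {label}",
        f"LABEL {label}",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd}",
    ]
    return "\n".join(lines) + "\n"
```

With these helpers, assigning an image to a node could amount to writing `pxelinux_config("zoni", "images/vmlinuz", "images/initrd.img")` to the file named by `pxelinux_config_name(mac)`; whether Zoni does it this way is not stated in the deck.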

  14. Intelligent Platform Management Interface (IPMI)
      • Defines a standardized, abstracted, message-based interface to intelligent platform management hardware
      • Defines standardized records for describing platform management devices and their characteristics
      • Operates independently of the operating system
      • Enables cross-platform management

  15. Status/Plans

  16. Some History
      • Previous prototype developed at HP Labs
      • Focus on economic model
      • Nice web interface, which will be available once the code bases reconverge

  17. Zoni Roadmap
      • Stage 1
        • Manages all cluster hardware
        • Handles resource provisioning
        • Provides interfaces for VLAN definition/programming
        • Administrator is still in the allocation decision-making loop
      • Stage 2
        • Introduces a request queue and primitive scheduler
        • Admin may still be in the loop, and definitely is for special cases
        • Enables provisioning of OS to local disk
        • Enables virtual disk conversion to physical
      • Stage 3
        • Incentives module added (Tycoon)
        • Tashi integration

  18. User View

  19. Zoni Roles
      • Admin: root of all authority
        • Controls the physical resources
      • User: requests domains
        • Controls the domain, once allocated

  20. Domains
      • A domain is the unit of Zoni isolation
      • A simple domain is a set of compute nodes gathered into a single VLAN
      • Nodes are allocated from pools of available resources
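The domain model above can be sketched as a small data structure: a domain owns a VLAN ID and a set of nodes, and nodes move between a free pool and a domain. The class and method names (`Domain`, `Pool`, `allocate`, `release`) are hypothetical and do not come from the Zoni code base.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Unit of isolation: a named set of nodes inside one VLAN."""
    name: str
    vlan_id: int
    nodes: set = field(default_factory=set)


class Pool:
    """A pool of free physical nodes that domains draw from."""

    def __init__(self, nodes):
        self.free = set(nodes)

    def allocate(self, domain: Domain, count: int) -> list:
        """Move `count` free nodes into the domain and return them."""
        if count > len(self.free):
            raise RuntimeError("not enough free nodes in the pool")
        picked = sorted(self.free)[:count]  # deterministic choice for the sketch
        self.free.difference_update(picked)
        domain.nodes.update(picked)
        return picked

    def release(self, domain: Domain) -> None:
        """Return all of the domain's nodes to the pool."""
        self.free |= domain.nodes
        domain.nodes.clear()
```

A real implementation would also program the switch ports for `vlan_id` when nodes move; that step is omitted here.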

  21. Zoni Domains * (diagram). Labels: Domain 0 and Domain 1, separated by ISOLATION; Gateway; Domain 0 Services and Domain 1 Services (each DNS, PXE, DHCP, HTTP); Server Pool 0; Server Pool 1.

  22. The Zoni Interface
      • Users and admins currently interact with the Zoni system through a command-line interface
      • This interface both:
        • Queries and updates records in the Zoni database
        • Wraps the various commands that must be issued to effect changes in the cluster
      • Zoni is currently a centralized system; users log into the Zoni manager to issue commands
      • An RPC interface is planned for the near future

  23. Zoni Usage
      Usage: zoni <options>
      Standard options:
        --help                  [show this help message and exit]
        --version               [show program's version number and exit]
        --verbose               [be verbose]
      Common options:
        --nodeName <name>       [Specify node]
        --switchPort <port>     [Specify switchport switchname:portnum]

  24. Image Management Interface
        --addImage <img>        [Add image to Zoni]
        --delImage <img>        [Delete image]

  25. User Allocation Interface
        --createDomain <name>
            • May fail if name already exists
        --submitDomainRequest <name>
        --destroyDomain --domain <name>
        --requestNodes --domain <name> [--count <N>] [--nodeName <name>] [--cores <n> …]
            • Add the requested nodes to the domain
        --assignImage <kernel> <image>
            • Assign image to resource
        --associateNewVlan --domain <name>
            • Allocate an unused VLAN number to domain
        --createReservation <YYYYMMDD> <YYYYMMDD>
            • Specify duration of node reservation, where start time may be "ASAP"
        --reservationNotes "notes"
        --updateReservation
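A minimal sketch of how the two `--createReservation` arguments might be interpreted, assuming that "ASAP" means "starting today" and that the end date is inclusive. Neither assumption is confirmed by the slides, and `parse_reservation` is a hypothetical helper, not part of Zoni.

```python
from datetime import date, datetime


def parse_reservation(start: str, end: str, today: date = None):
    """Turn <YYYYMMDD> <YYYYMMDD> arguments into a (start, end) date pair."""
    today = today or date.today()
    if start.upper() == "ASAP":
        start_date = today  # assumption: ASAP resolves to the current day
    else:
        start_date = datetime.strptime(start, "%Y%m%d").date()
    end_date = datetime.strptime(end, "%Y%m%d").date()
    if end_date < start_date:
        raise ValueError("reservation ends before it starts")
    return start_date, end_date
```

Passing `today` explicitly keeps the function easy to test; a caller would normally omit it.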

  26. Admin Allocation Interface
        --allocateNode          [Assign node to a user]
        --releaseNode           [Release node allocation]
        --vlanIsolate <vlanid>  [Specify vlan for isolation]

  27. Hardware Control
        --hardware              [Make hardware call]
        --powerStatus           [Get power status]
        --rebootNode            [Reboot node (Soft)]
        --powerCycle            [Power Cycle (Hard)]
        --powerOff              [Power off node]
        --powerOn               [Power on node]
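On IPMI-capable hardware, power operations like these are commonly issued with the `ipmitool` command. The sketch below only builds the argument list and assumes that tool; the deck does not say which mechanism Zoni uses, and iLO or DRAC back ends would differ. The mapping from Zoni option names to `ipmitool` actions is an illustration, and `--rebootNode` is left out because its exact semantics are not given.

```python
# Hypothetical mapping from Zoni hardware options to `ipmitool chassis power` actions.
POWER_ACTIONS = {
    "powerStatus": "status",
    "powerCycle": "cycle",
    "powerOff": "off",
    "powerOn": "on",
}


def ipmi_power_command(bmc_host: str, user: str, option: str) -> list:
    """Build the ipmitool argv for one power operation (does not run it)."""
    try:
        action = POWER_ACTIONS[option]
    except KeyError:
        raise ValueError(f"unsupported option: {option}") from None
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 over LAN
        "-H", bmc_host,    # address of the node's management controller
        "-U", user,
        "-E",              # read the password from the IPMI_PASSWORD environment variable
        "chassis", "power", action,
    ]
```

The resulting list can be handed to `subprocess.run(...)` on a machine that has `ipmitool` installed and network access to the management controller.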

  28. Query Interface
        --showReservations      [Show current node reservations]
        --showResources         [Show available resources to choose from]
        --procs <N>             [Filter by number of processors]
        --clock <N>             [Filter by processor clock]
        --memory <N>            [Filter by amount of memory (Bytes)]
        --cpuflags "flags"      [Filter by CPU flags]
        --cores <N>             [Filter by number of cores]
        --showPxeImages         [Show available PXE images to choose from]
        --showPxeImageMap       [Show PXE images host mapping]
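The filter options can be pictured as predicates over an inventory of nodes. This sketch assumes each filter is a minimum ("at least N cores") and that `--cpuflags` requires every listed flag to be present; the slides do not say whether Zoni matches exactly or by threshold. `Node` and `show_resources` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cores: int
    memory_bytes: int
    cpu_flags: frozenset = frozenset()


def show_resources(nodes, cores=None, memory=None, cpuflags=()):
    """Return the nodes that satisfy every filter that was supplied."""
    wanted_flags = set(cpuflags)
    matches = []
    for node in nodes:
        if cores is not None and node.cores < cores:
            continue
        if memory is not None and node.memory_bytes < memory:
            continue
        if not wanted_flags <= node.cpu_flags:
            continue
        matches.append(node)
    return matches
```

For instance, `show_resources(inventory, cores=8, memory=16 * 2**30)` would keep only nodes with at least 8 cores and 16 GiB of memory.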

  29. Administration Interface
        --admin                                   [Enter Admin mode]
        --addPxeImage                             [Add PXE image to database]
        --enableHostPort                          [Enable a switch port]
        --disableHostPort                         [Disable a switch port]
        --removeVlan <vlanId>                     [Remove vlan from all switches]
        --createVlan <vlanId>                     [Create a vlan on all switches]
        --addNodeToVlan <vlanId>                  [Add node to a vlan]
        --removeNodeFromVlan <vlanId>             [Remove node from a vlan]
        --setNativeVlan <vlanId>                  [Configure native vlan]
        --restoreNativeVlan                       [Restore native vlan]
        --removeAllVlans                          [Removes all vlans from a switchport]
        --sendSwitchCommand "<command>"           [Send Raw Switch Command, BE CAREFUL]
        --interactiveSwitchConfig "<switchname>"  [Interactively configure a switch]
        --showSwitchConfig <nodename>             [Show switch config for node]

  30. Administration

  31. Typical Workflow
      • Admin queries available systems
      • Admin requests systems with the desired user configuration (e.g., cores, memory, image, duration)
      • Request goes in queue
      • Zoni locates resources and provides a list to admin/Tashi
      • Admin/Tashi moves VMs to free resources
      • Add node to blacklist and tell Hadoop to reload
      • Zoni allocates resources
      • Provides estimated time to get resources
      • User can query
      • Zoni sends notification when allocated
      • Zoni reclaims resources and adds them back into respective pools
      • User may extend time period before expiration
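The queue-allocate-reclaim cycle in this workflow can be sketched as a strict first-in, first-out scheduler. This is only an illustration of the "request queue and primitive scheduler" idea under simplifying assumptions (requests ask for a node count only, and a large request at the head blocks later ones); `RequestQueue` and its methods are hypothetical names.

```python
from collections import deque


class RequestQueue:
    """FIFO queue of (user, node_count) requests served from a free list."""

    def __init__(self, free_nodes):
        self.free = list(free_nodes)
        self.pending = deque()

    def submit(self, user: str, count: int) -> None:
        self.pending.append((user, count))

    def schedule(self) -> list:
        """Grant requests in arrival order while the head request fits."""
        granted = []
        while self.pending and self.pending[0][1] <= len(self.free):
            user, count = self.pending.popleft()
            nodes, self.free = self.free[:count], self.free[count:]
            granted.append((user, nodes))
        return granted

    def reclaim(self, nodes) -> None:
        """Return nodes to the free list when a reservation ends."""
        self.free.extend(nodes)
```

Calling `schedule()` again after `reclaim()` lets waiting requests proceed, which mirrors the "Zoni reclaims resources and adds them back" step.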

  32. Diagram. Components: Zoni client, Zoni server, DB, PXE server, Tashi Cluster Manager, Management Servers, System Servers hosting VMs, Administrator or Cluster Manager.
      • Zoni client queries Zoni server for available resources
      • User chooses machine attributes and submits a request for the resources for some time period
      • Zoni queries DB to locate available resources
      • Results are sent back to the client:
          Node 1:  8 Core, 16G memory, 6TB disk, 30 day
          Node 2:  8 Core, 16G memory, 6TB disk, 30 day
          Node 3:  8 Core, 16G memory, 6TB disk, 90 day
          Node 4:  8 Core, 16G memory, 6TB disk, 1 day
          Node 5:  8 Core, 8G memory, 2TB disk, 90 day
          Node 6:  8 Core, 8G memory, 2TB disk, 90 day
          Node 7:  8 Core, 8G memory, 2TB disk, 90 day
          Node 8:  8 Core, 8G memory, 2TB disk, 90 day
          Node 9:  8 Core, 8G memory, 2TB disk, 90 day
          Node 10: 8 Core, 8G memory, 2TB disk, 30 day
          …

  33. Request Queue (same diagram; request R1 appears in the queue)

  34. Same diagram.
      • Zoni processes the request and identifies physical machines that satisfy the user request

  35. Same diagram.
      • Zoni sends request to Tashi to free selected nodes
      • Tashi moves virtual machines off of selected nodes

  36. Same diagram; the selected nodes are now labeled PXE.
      • Tashi notifies Zoni that migration of virtual machines has completed
      • Virtual disk image is converted to PXE image
      • Zoni allocates the physical machines to the requesting user and isolates them from the network using VLANs
      • Zoni sets the PXE image to the user's and reboots the physical machines
      • Physical machines boot up with PXE image

  37. Same diagram.
      • Zoni updates reservation database
      • Zoni client queries server for allocation
      • User connects to the machines and starts running experiments

  38. After allocation
      • A returned Zoni node is typically untrusted
      • Update the system to default settings
      • Clean physical node by PXE booting a reset image
      • Restore all settings to defaults (address, IPMI passwords)
      • Repartition and format disks
      • (Option) Trust images from some users
        • No re-format needed
      • Clean network configuration (VLAN)
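The cleanup sequence, including the optional shortcut for trusted images, can be written down as a small planning function. This is a sketch of the list above only: the step strings paraphrase the slide, `cleanup_steps` is a hypothetical name, and the ordering is the slide's order rather than a confirmed Zoni procedure.

```python
def cleanup_steps(trusted_image: bool = False) -> list:
    """Ordered cleanup steps for a node returned to the pool."""
    steps = [
        "PXE boot the reset image",
        "restore default settings (address, IPMI passwords)",
    ]
    if not trusted_image:
        # Images from trusted users skip the disk wipe.
        steps.append("repartition and format disks")
    steps.append("clean network configuration (VLAN)")
    return steps
```

Skipping the re-format is the only difference between the two paths; every returned node still has its settings and VLAN configuration reset.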

  39. Example: Minicluster
      ./zoni --addimage amd64-rgass-testing:hardy:8.03
      ./zoni --assignimage amd64-rgass-testing --nodename r1r1u25
      ./zoni --allocatenode --nodename r1r1u25 --username rgass --reservationDuration 30 --vlanisolate 300 --notes "Practice allocation"
      ./zoni --addnodetovlan 300 --nodename r1r1u25
      ./zoni --hardware --rebootnode --nodename r1r1u25

  40. Example: CloudConnect 1
      • Network-isolate a rack of machines and PXE boot them with a user’s kernel and initrd
      • Create a VM that acts as an SSH gateway and a NAT for the private cluster
      • Dynamically configure switches to support the networking experiment

  41. Example: CloudConnect 1 (network diagram). Labels: Rack A, Rack B, Rack C, Rack D and their regions; 100Mb/s switches; 1Gb/s switches; 4Gb/s switch; 4x1Gb trunk link; VLAN #1: Electrical; VLAN #2: Optical. Legend: server, switch, manager (M).

  42. Example: CloudConnect 2
      # Set the native VLAN and add each node to VLANs 800-802
      for i in r1r1u12 r1r1u13 r1r1u14 r1r1u15; do
        ./zoni --admin --setnativevlan 300 -n "${i}"
        ./zoni --admin --addnodetovlan 800 -n "${i}"
        ./zoni --admin --addnodetovlan 801 -n "${i}"
        ./zoni --admin --addnodetovlan 802 -n "${i}"
      done
      # Disable spanning tree on ports g25-g28, then put each port into trunk mode
      ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface range ethernet g(25-28); spanning-tree disable"
      ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g25;switchport mode trunk;exit"
      ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g26;switchport mode trunk;exit"
      ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g27;switchport mode trunk;exit"
      ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g28;switchport mode trunk;exit"
      # Give each of ports 25-28 its own native VLAN
      ./zoni --admin --switchport sw0-r1r1:25 --setnativevlan 802 -v
      ./zoni --admin --switchport sw0-r1r1:26 --setnativevlan 804 -v
      ./zoni --admin --switchport sw0-r1r1:27 --setnativevlan 806 -v
      ./zoni --admin --switchport sw0-r1r1:28 --setnativevlan 808 -v
      # Reboot the nodes
      for i in $(seq 12 16); do
        ./zoni --hardware --rebootnode -n "r1r1u${i}"
      done

  43. Future Work
      • Introduce a request queue and primitive scheduler
      • Enable provisioning of OS to local disk
      • Enable virtual disk conversion to physical
      • Integration with Tashi…
        • Would enable free exchange of resources between the Tashi pool and the free pool

  44. Installation

  45. Necessary Components
      • DHCP Server
      • PXE Server
      • NFS Server
      • DNS Server (optional)
      • Configurable switches
        • New switch types may require new Zoni modules
      • Hardware access method
        • E.g. IPMI/iLO/DRAC
        • IP-addressable PDUs enable rescue if IPMI becomes compromised

  46. Zoni Register *
      • Gather unique identifier from system (MAC address / Dell tag)
      • Assign hostname (r1r2u24)
      • Switch/PDU info
      Example:
      J3GPGD r1r2u24 172.16.129.100 tashi_nm sw0-r1r2:9 pdu0-r1r2:18
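The example registration line can be split into typed fields as sketched below. The field names are assumptions read off the bullets (system tag, hostname, IP address, switch and PDU ports in `name:port` form); the meaning of the fourth field (`tashi_nm`) is not stated on the slide, so it is kept under the neutral name `label`. `parse_register_line` is a hypothetical helper.

```python
import ipaddress
from typing import NamedTuple, Tuple


class Registration(NamedTuple):
    system_tag: str
    hostname: str
    ip: ipaddress.IPv4Address
    label: str                    # meaning not given in the source
    switch_port: Tuple[str, int]  # (switch name, port number)
    pdu_port: Tuple[str, int]     # (PDU name, outlet number)


def _name_port(text: str) -> Tuple[str, int]:
    name, _, port = text.rpartition(":")
    if not name or not port.isdigit():
        raise ValueError(f"expected name:port, got {text!r}")
    return name, int(port)


def parse_register_line(line: str) -> Registration:
    """Parse one whitespace-separated registration record."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    tag, host, ip, label, switch, pdu = fields
    return Registration(tag, host, ipaddress.IPv4Address(ip), label,
                        _name_port(switch), _name_port(pdu))
```

Applied to the slide's example, this yields hostname `r1r2u24`, switch port `("sw0-r1r2", 9)`, and PDU port `("pdu0-r1r2", 18)`.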
