1 / 58

FermiCloud Update EGI Technical Forum 2012

FermiCloud Update EGI Technical Forum 2012. Keith Chadwick Grid & Cloud Computing Department Fermilab. Work supported by the U.S. Department of Energy under contract No. DE-AC02-07CH11359. Outline. Personnel Mission, Strategy, Project, Phases Economic Model

Télécharger la présentation

FermiCloud Update EGI Technical Forum 2012

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. FermiCloud UpdateEGI Technical Forum 2012 Keith Chadwick Grid & Cloud Computing Department Fermilab Work supported by the U.S. Department of Energy under contract No. DE-AC02-07CH11359

  2. Outline • Personnel • Mission, Strategy, Project, Phases • Economic Model • Authenticationand Contextualization • Fault Tolerance and High Availability (HA) • Monitoring, Accounting • Grid and Cloud “Bursting” • Virtualized MPI • Stakeholders • Summary FermiCloud - EGI Technical Forum 2012

  3. Fermilab Grid & Cloud Computing Department Keith Chadwick (Department Head) Gabriele Garzoglio (Associate Head) FermiGrid Services Steven C. Timm (Leader) Hyunwoo Kim (Visitor - KISTI) Kevin Hill (OSG Security) Faarooq Lowe (departed 30-Mar) Seo-Young Noh (Visitor – KISTI) Neha Sharma Karen Shepelak Daniel R. Yocum (departed 21 Jun) New Hire (starting 8 Oct) Posted Opening (soon) Distributed Offline Computing Services Gabriele Garzoglio (Leader) David Dykstra Giovanni Franzini (Summer Student) Ted Hesselroth (departed Sep ‘11) Hyunwoo Kim Tanya Levshina Parag Mhashilkar Marko Slyz Douglas Strain FermiCloud - EGI Technical Forum 2012

  4. FermiCloud – Mission, Strategy, Goals & History • As part of the FY2010 activities, the (then) Grid Facilities Department established a project to implement an initial “FermiCloud” capability with the goal of developing and establishing Scientific Cloud capabilities for the Fermilab Scientific Program, • Building on the very successful FermiGrid program that supports the full Fermilab user community and makes significant contributions as members of the Open Science Grid Consortium. • In a (very) broad brush, the mission of FermiCloud is: • To deploy a production quality Infrastructure as a Service (IaaS) Cloud Computing capability in support of the Fermilab Scientific Program. • To support additional IaaS, PaaS and SaaS Cloud Computing capabilities based on the FermiCloud infrastructure at Fermilab. • This project is split over several overlapping phases. FermiCloud - EGI Technical Forum 2012

  5. Overlapping Phases Phase 1: “Build and Deploy the Infrastructure” Today Phase 2: “Deploy Management Services, Extend the Infrastructure and Research Capabilities” Phase 3: “Establish Production Services and Evolve System Capabilities in Response to User Needs & Requests” Time Phase 4: “Expand the service capabilities to serve more of our user communities” FermiCloud - EGI Technical Forum 2012

  6. FermiCloud Phase 1:“Build and Deploy the Infrastructure” • Specify, acquire and deploy the FermiCloud hardware, • Establish initial FermiCloud requirements and select the “best” open source cloud computing framework that best met these requirements (OpenNebula), • Deploy capabilities to meet the needs of the stakeholders (JDEM analysis development, Grid Developers and Integration test stands, Storage/dCache Developers, LQCD testbed). Completed FermiCloud - EGI Technical Forum 2012

  7. FermiCloud Phase 2:“Deploy Management Services, Extend the Infrastructure and Research Capabilities” • Implement x509 based authentication (patches contributed back to OpenNebula project and are generally available in OpenNebula V3.2), perform secure contexualization of virtual machines at launch. • Perform virtualized filesystem I/O measurements, • Develop (draft) economic model, • Implement monitoring and accounting, • Collaborate with KISTI personnel to demonstrate Grid and Cloud Bursting capabilities and perform initial benchmarks of Virtualized MPI, • Target “small” low-cpu-load servers such as Grid gatekeepers, forwarding nodes, small databases, monitoring, etc., • Begin the hardware deployment of a distributed SAN, • Investigate automated provisioning mechanisms (puppet & cobbler). Almost Complete FermiCloud - EGI Technical Forum 2012

  8. FermiCloud Phase 3:“Establish Production Services and Evolve System Capabilities in Response to User Needs& Requests” • Deploy highly available 24x7 production services, • Both infrastructure and user services. • Deploy puppet & cobbler in production, • Develop and deploy real idle machine detection, • Research possibilities for a true multi-user filesystem on top of a distributed & replicated SAN, • Live migration becomes important for this phase. Underway FermiCloud - EGI Technical Forum 2012

  9. FermiCloud Phase 4:“Expand the service capabilities to servemore of our user communities” • Deploy a true multi-user filesystem on top of a distributed & replicated SAN, • Demonstrate interoperability as well/ accepting VM's as batch jobs, interoperating with KISTI cloud, Nimbus, etc., • Participate in Fermilab 100 Gb/s network testbed, • Perform more “Virtualized MPI” benchmarks and run some real world MPI codes, • OpenStack evaluation. Specifications Under Development FermiCloud - EGI Technical Forum 2012

  10. FermiCloud Economic Model • Calculate rack cost: • Rack, public Ethernet switch, private Ethernet switch, Infiniband switch, • $11,000 USD (one time). • Calculate system cost: • Based on 4 year lifecycle, • $6,500 USD / 16 processors / 4 years = $250 USD / year • Calculate storage cost: • 4 x FibreChannel switch, 2 x SATAbeast, 5 year lifecycle, • $130K USD / 60 Tbytes/ 5 years = $430 USD / TB-year • Calculate fully burdened system administrator cost: • Current estimate is 400 systems per administrator, • $250K USD / year / 400 systems = $1,250 USD / system-year FermiCloud - EGI Technical Forum 2012

  11. Service Level Agreements • 24x7: • Virtual machine will be deployed on the FermiCloud infrastructure 24x7. Typical use case – production services. • 9x5: • Virtual machine will be deployed on the FermiCloud infrastructure 8-5, M-F, may be “suspended or shelved” at other times. Typical use case – software/middleware developer. • Opportunistic: • Virtual machine may be deployed on the FermiCloud infrastructure providing that sufficient unallocated virtual machine “slots” are available, may be “suspended or shelved” at any time. Typical use case – short term computing needs. • HyperThreading / No HyperThreading: • Virtual machine will be deployed on FermiCloud infrastructure that [has / does not have] HyperThreading enabled. • Nights and Weekends: • Make FermiCloud resources (other than 24x7 SLA) available for “Grid Bursting”. FermiCloud - EGI Technical Forum 2012

  12. FermiCloud Draft Economic Model Results (USD) FermiCloud - EGI Technical Forum 2012

  13. FermiCloud / AmazonCost Comparison (USD) FermiCloud - EGI Technical Forum 2012

  14. FermiCloud - Authentication • Reuse Grid x509 based authentication: • Patches to OpenNebula to support this were developed at Fermilab and submitted back to the OpenNebula project (generally available in OpenNebula V3.2 onwards). • User authenticates to OpenNebula via graphical console via x509 authentication, EC2 query API with x509, or OCCI with x509, • Virtual Machines are launched with the users x509 proxy. FermiCloud - EGI Technical Forum 2012

  15. FermiCloud - Contextualization • Within FermiCloud, virtual machine contextualization is accomplished via: • Use users x509 proxy credentials to perform secure access via openSSL to an external “secure secrets repository”, • Credentials are copied into ramdisk within the VM and symlinks are made from the standard credential locations (/etc/grid-security/certificates) to the credentials in the ramdisk. • On virtual machine shutdown, the contents of the ramdisk disappear and the original credential remains in the “secure secrets repository”. • These mechanisms prevent the “misappropriation” of credentials if a VM image is copied from the FermiCloud VM library, • No credentials are stored in VM images “at rest” in the VM library. • This is not perfect – a determined user (VM administrator) could still copy the secure credentials off of their running VM, but this does not offer any additional risk beyond that posed by the administrator of a physical system. FermiCloud - EGI Technical Forum 2012

  16. FermiCloud – Fault Tolerance • As we have learned from FermiGrid, having a distributed fault tolerant infrastructure is highly desirable for production operations. • We are actively working on deploying the FermiCloud hardware resources in a fault tolerant infrastructure: • The physical systems are split across two buildings, • There is a fault tolerant network infrastructure in place that interconnects the two buildings, • We have deployed SAN hardware in both buildings, • We are finalizing the FermiCloud HA “head node” configuration, • We are evaluating GFS for our multi-user filesystem and distributed & replicated SAN. FermiCloud - EGI Technical Forum 2012

  17. FCC and GCC GCC The FCC and GCC buildings are separated by approximately 1 mile (1.6 km). FCC has UPS and Generator. GCC has UPS. FCC Business Continuity at Fermilab

  18. Distributed Network CoreProvides Redundant Connectivity Robotic Tape Libraries (3) Robotic Tape Libraries (4) FCC-2 GCC-A 20 Gigabit/s L3 Routed Network Deployment completed in June 2012 Nexus 7010 Nexus 7010 Grid Worker Nodes Disk Servers 80 Gigabit/s L2 Switched Network Fermi Grid Fermi Grid 40 Gigabit/s L2 Switched Networks Grid Worker Nodes Nexus 7010 Nexus 7010 Disk Servers FCC-3 GCC-B Fermi Cloud Fermi Cloud Note – Intermediate level switches and top of rack switches are not shown in the this diagram. Private Networks over dedicated fiber Business Continuity at Fermilab

  19. FermiCloud – Network & SAN“Today” GCC-B FCC-3 fcl316 To fcl323 fcl001 To fcl015 FCC-2 GCC-A Nexus 7010 Nexus 7010 Private Ethernet over dedicated fiber Brocade Brocade Nexus 7010 Nexus 7010 Satabeast Satabeast Brocade Brocade FCC-3 GCC-B Fibre Channel over dedicated fiber FY2011 / FY2012 FermiCloud - EGI Technical Forum 2012

  20. FermiCloud – Network & SAN (Possible Future – FY2013/2014) FCC-2 GCC-A GCC-B FCC-3 Brocade Brocade Brocade Brocade fcl0yy To fcl0zz fcl0xx To fcl0yy fcl316 To fcl330 fcl001 To fcl015 FCC-2 GCC-A Nexus 7010 Nexus 7010 Nexus 7010 Nexus 7010 FCC-3 GCC-B Fibre Channel Fibre Channel Brocade Brocade Satabeast Satabeast Brocade Brocade FermiCloud - EGI Technical Forum 2012

  21. FermiCloud-HAHead Node Configuration fcl-ganglia2 fcl-ganglia1 fermicloudnis2 fermicloudnis1 fermicloudrepo2 fermicloudrepo1 2 way rsync fclweb2 fclweb1 fcl-cobbler Live Migration fermicloudlog fermicloudadmin fermicloudrsv Multi-master MySQL fcl-mysql2 fcl-mysql1 fcl-lvs2 fcl-lvs1 Pulse/Piranha Heartbeat ONED/SCHED ONED/SCHED fcl301 (FCC-3) fcl001 (GCC-B) FermiCloud - EGI Technical Forum 2012

  22. FermiCloud – Monitoring Requirements & Goals • Need to monitor to assure that: • All hardware is available (both in FCC-3 and GCC-B), • All necessary and required OpenNebula services are running, • All “24x7” & “9x5” virtual machines (VMs) are running, • If a building is “lost”, then automatically relaunch “24x7” VMs on surviving infrastructure, then relaunch “9x5” VMs if there is sufficient remaining capacity, • Perform notification (via Service-Now) when exceptions are detected. FermiCloud - EGI Technical Forum 2012

  23. FermiCloud – Monitoring • FermiCloud Usage Monitor: • http://www-fermicloud.fnal.gov/fermicloud-usage-data.html • Data collection dynamically “ping-pongs” across systems deployed in FCC and GCC to offer redundancy, • See plot on next page. • We have deployed a production monitoring infrastructure based on Nagios: • Utilizing the OSG Resource Service Validation (RSV) scripts. • A “stretch” goal of the monitoring project was to figure out how to identify really idle virtual machines: • Unfortunately, the “virsh list” output cannot be used, since actively running Xen based VMs are incorrectly labeled as “idle” and idle KVM based VMs are incorrectly labeled as “running”. FermiCloud - EGI Technical Forum 2012

  24. Note – FermiGrid Production Services are operated at 100% to 200% “oversubscription” VM states as reported by “virsh list” FermiCloud Target FermiCloud - EGI Technical Forum 2012

  25. True Idle VM Detection • In times of resource need, we want the ability to suspend or “shelve” idle VMs in order to free up resources for higher priority usage. • This is especially important in the event of constrained resources (e.g. during building or network failure). • Shelving of “9x5” and “opportunistic” VMs allows us to use FermiCloud resources for Grid worker node VMs during nights and weekends • This is part of the draft economic model. • Giovanni Franzini (an Italian co-op student) has written (extensible) code for an “Idle VM Probe” that can be used to detect idle virtual machines based on CPU, disk I/O and network I/O. The deployment will be along the lines of the rup/rusers service: • A central “collector” will periodically contact the idle VM probe to determine the state of all the “running” virtual machines, and store the results. • The VM state results will be processed by a separate “logic” engine that will determine candidate “idle” virtual machines, • When necessary, the appropriate commands will be issued to shut down “idle” virtual machines (typically to make resources available for higher priority virtual machines). FermiCloud - EGI Technical Forum 2012

  26. Idle VM Information Flow Idle VM Management Process VM VM VM VM Raw VM StateDB VM VM Idle VM Collector VM VM VM VM Idle VM Logic Idle VM Trigger VM VM VM VM Idle VM List VM VM Idle VM Shutdown VM VM VM VM FermiCloud - EGI Technical Forum 2012

  27. FermiCloud - Accounting • Currently have two “probes” based on the Gratia accounting framework used by Fermilab and the Open Science Grid: • https://twiki.grid.iu.edu/bin/view/Accounting/WebHome • Standard Process Accounting (“psacct”) Probe: • Installed and runs within the virtual machine image, • Reports to standard gratia-fermi-psacct.fnal.gov. • Open Nebula Gratia Accounting Probe: • Runs on the OpenNebula management node and collects data from ONE logs, emits standard Gratia usage records, • Reports to the “virtualization” Gratia collector, • The “virtualization” Gratia collector runs existing standard Gratia collector software (no development was required), • The development of the Open Nebula Gratia accounting probe was performed by Tanya Levshina and Parag Mhashilkar. • Additional Gratia accounting probes could be developed: • Commercial – OracleVM, VMware, --- • Open Source – Nimbus, Eucalyptus, OpenStack, … FermiCloud - EGI Technical Forum 2012

  28. Open Nebula Gratia Accounting Probe Fermilab Gratia Collector/Reporter ONE DB onevm_query ONE API Collector gr11x4 gratia-fermi-itb Collector gr10x4 gratia-fermi-itb Collector gr11x3 gratia-fermi-transfer Collector gratia-fermi-transfer gr10x3 Collector gr11x2 gratia-fermi-qcd Collector gr10x2 gratia-fermi-qcd Collector gr11x1 gratia-fermi-psacct Collector gratia-fermi-psacct gr10x1 Reporter gr11x0 gratia-fermi-osg Collector gr10x0 gratia-fermi-osg onevm data Probe Config Standard Usage Records gratia_onevm Gratia API MySQLDatabase MySQLDatabase New Code osg user map FermiCloud - EGI Technical Forum 2012

  29. FermiCloud - EGI Technical Forum 2012

  30. FermiCloud - EGI Technical Forum 2012

  31. FermiCloud - EGI Technical Forum 2012

  32. FermiCloud - Interoperability • From the beginning, one of the goals of FermiCloud has been the ability to operate as a hybrid cloud: • Being able to join FermiCloud resources to FermiGrid resources to temporarily increase the Grid capacity or GlideinWMS with VMs (Grid Bursting), • Being able to join public cloud resources (such as Amazon EC2) to FermiCloud (Cloud Bursting via Public Clouds). • Participate in compatible community clouds (Cloud Bursting via other Private Clouds). FermiCloud - EGI Technical Forum 2012

  33. FermiCloud - Grid Bursting • Join “excess” FermiCloud capacity to FermiGrid: • Identify “idle” VMs on FermiCloud, • Automatically “shelve” the “idle” VMs, • Automatically launch “worker node” VMs, • “worker node” VMs join existing Grid cluster and contribute their resources to the Grid. • “Shelve” the “worker node” VMs when appropriate. • AKA – The “nights and weekend” plan for increased Grid computing capacity. FermiCloud - EGI Technical Forum 2012

  34. FermiCloud – Cloud Bursting • vCluster - Deployable on demand virtual cluster using hybrid cloud computing resources. • Head nodes launched on virtual machines within the FermiCloud private cloud. • Worker nodes launched on virtual machines within the Amazon EC2 public cloud. • Work performed by Dr. Seo-Young Noh (KISTI). • Refer to his ISGC talk on Friday 2-Mar-2012 for more details: • http://indico3.twgrid.org/indico/contributionDisplay.py?contribId=1&confId=44 FermiCloud - EGI Technical Forum 2012

  35. FermiCloud – Infiniband & MPI • To enable HPC stakeholders, the FermiCloud hardware specifications included the MellanoxSysConnect II Infiniband card that was claimed by Mellanox to support virtualization with the Infiniband SRIOV driver. • Unfortunately, despite promises from Mellanox prior to the purchase, we were unable to take delivery of the Infiniband SRIOV driver while working through the standard sales support channels. • While at SuperComputing 2011 in Seattle in November 2011, Steve Timm, Amitoj Singh and I met with the Mellanox engineers and were able to make arrangements to receive the Infiniband SRIOV driver. • The Infiniband SRIOV driver was delivered to us in December 2011, and using this driver, we were able to make measurements comparing MPI on “bare metal” to MPI on KVM virtual machines on the identical hardware. • The next three slides will detail the configurations and measurements. • This allows a direct measurement of the MPI “virtualization overhead”. FermiCloud - EGI Technical Forum 2012

  36. FermiCloud “Bare Metal MPI” Infiniband Card Infiniband Switch Infiniband Card Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU Process pinned to CPU FermiCloud - EGI Technical Forum 2012

  37. FermiCloud “Virtual MPI” Infiniband Card Infiniband Switch Infiniband Card VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU VM pinned to CPU FermiCloud - EGI Technical Forum 2012

  38. MPI on FermiCloud (Note 1) Notes: (1) Work performed by Dr. Hyunwoo Kim of KISTI in collaboration with Dr. Steven Timm of Fermilab. (2) Process/Virtual Machine “pinned” to CPU and associated NUMA memory via use of numactl. (3) Software Bridged Virtual Network using IP over IB (seen by Virtual Machine as a virtual Ethernet). (4) SRIOV driver presents native InfiniBand to virtual machine(s), 2nd virtual CPU is required to start SRIOV, but is only a virtual CPU, not an actual physical CPU. Virtual MPI on FermiCloud

  39. Current Stakeholders • Grid & Cloud Computing Personnel, • Run II – CDF & D0, • Intensity Frontier Experiments, • Cosmic Frontier (JDEM/WFIRST,), • Korean Institute for Science & Technology Investigation (KISTI), • Open Science Grid (OSG) software refactoring from pacman to RPM based distribution. FermiCloud - EGI Technical Forum 2012

  40. FermiCloud Efforts Today? • Production deployment of OpenNebula v3.2, • Will shortly follow with ONE v3.6 (we will skip ONE v3.4). • Monitoring, including Idle VM detection and management, • FermiCloud-HA design and deployment, • We are using FermiCloud resources to build, test and integrate the new RPM based OSG software stack, • Including stress testing of the software configurations at production scale. • Gearing up to use FermiCloud with the soon to be deployed 100 Gb/s network testbed: • We will start by using 100 Gb/s TX cards in our integration infrastructure, and may purchase some new FermiCloud servers. FermiCloud - EGI Technical Forum 2012

  41. FermiCloud Summary - 1 • The FermiCloud usage monitoring shows that the peak FermiCloud usage is ~100% of the nominal capacity and ~50% of the expected oversubscription capacity. • The FermiCloud collaboration with KISTI has leveraged the resources and expertise of both institutions to achieve significant benefits. • FermiCloud has implemented both monitoring and accounting by extension of existing tools. • Using SRIOV drivers on FermiCloud virtual machines, MPI performance has been demonstrated to be >96% of the native “bare metal” performance. • Note that this HPL benchmark performance measurement was accomplished using 2fewerphysical CPUs than the corresponding “bare metal” performance measurement! • The 2 fewer physical CPUs is a limitation of the current Infiniband SRIOV driver, if we can get this fixed then the virtualization results are likely to achieve full parity with the bare metal results. • We will be taking additional measurements of virtualized MPI performance in the near term! • FermiCloud personnel are working to implement a SAN storage deployment that will offer a true multi-user filesystem on top of a distributed & replicated SAN. FermiCloud - EGI Technical Forum 2012

  42. FermiCloud Summary - 2 • Science is directly and indirectly benefiting from FermiCloud: • CDF, D0, Intensity Frontier, Cosmic Frontier, CMS, ATLAS, Open Science Grid,… • FermiCloud operates at the forefront of delivering cloud computing capabilities to support physics research: • By starting small, developing a list of requirements, building on existing Grid knowledge and infrastructure to address those requirements, FermiCloud has managed to deliver an Infrastructure as a Service cloud computing capability that supports science at Fermilab. • The Open Science Grid software team is using FermiCloud resources to support their RPM “refactoring”. • None of this could have been accomplished without: • The excellent support from other departments of the Fermilab Computing Sector – including Computing Facilities, Site Networking, and Logistics. • The excellent collaboration with the open source communities – especially Scientific Linux and OpenNebula, • As well as the excellent collaboration and contributions from KISTI. FermiCloud - EGI Technical Forum 2012

  43. Job Opening • The Grid and Cloud Computing Department has been approved for a new personnel opening to work with the FermiGrid and FermiCloud projects: • Cloud and Grid Administrator (Computer Services Specialist II) • The opening will be posted on the “Jobs at Fermilab” web site shortly: • https://fermi.hodesiq.com/ FermiCloud - EGI Technical Forum 2012

  44. Thank You! Any Questions?

  45. Extra Slides

  46. FermiCloud – Hardware Specifications Currently 23 systems split across FCC-3 and GCC-B: • 2 x 2.67 GHz Intel “Westmere” 4 core CPU • Total 8 physical cores, potentially 16 cores with Hyper Threading (HT), • 48 GBytes of memory (we started with 24 & upgraded to 48), • 2 x 1GBit Ethernet interface (1 public, 1 private), • 8 port Raid Controller, • 2 x 300 GBytes of high speed local disk (15K RPM SAS), • 6 x 2 TBytes = 12 TB raw of RAID SATA disk = ~10 TB formatted, • InfiniBand SysConnect II DDR HBA, • Brocade FibreChannel HBA (added in Fall 2011/Spring 2012), • 2U SuperMicro chassis with redundant power supplies FermiCloud - EGI Technical Forum 2012

  47. FermiCloudTypical VM Specifications • Unit: • 1 Virtual CPU [2.67 GHz “core” with Hyper Threading (HT)], • 2 GBytes of memory, • 10-20 GBytes of of SAN based “VM Image” storage, • Additional ~20-50 GBytes of “transient” local storage. • Additional CPU “cores”, memory and storage are available for “purchase”: • Based on the (Draft) FermiCloud Economic Model, • Raw VM costs are competitive with Amazon EC2, • FermiCloud VMs can be custom configured per “client”, • Access to Fermilab science datasets is much better than Amazon EC2. FermiCloud - EGI Technical Forum 2012

  48. FermiCloud – VM Format • Virtual machine images are stored in a way that they can be exported as a device: • The OS partition contains full contents of the / partition plus a boot sector and a partition table. • Not compressed. • Kernel and initrd are stored internally to the image, • Different from Amazon and Eucalyptus, • Note that it is possible to have Xen and KVM kernels loaded in the same VM image and run it under either hypervisor. • Secrets are not stored in the image, • See slides on Authentication/Contextualization. • We are currently investigating the CERN method of launching multiple copies of same VM using LVM qcow2 (quick copy on write) but this is not our major use case at this time. • We will likely invest in LanTorrent/LVM for booting multiple “worker node” virtual machines simultaneously at a later date. FermiCloud - EGI Technical Forum 2012

  49. Comments on Cost Comparison • The FermiCloud “Unit” (CPU+2GB) without HyperThreadingis approximately two Amazon EC2 compute units. • Amazon can change their pricing model any time. • The Amazon EC2 prices do not include the costs for data movement, FermiCloud does not charge for data movement. Since the typical HEP experiment moves substantial amounts of data, the Amazon data movement changes will be significant. • The prices for FermiCloud do not include costs for the infrastructure (building/computer room/environmental) and the costs for operation (electricity). • System administrator costs factor out of the comparison, since they apply equally to both sides of the comparison [FermiCloud / Amazon]. • Our expectation/hope is that with the puppet & cobbler deployment, the VM system administrator costs will decrease. FermiCloud - EGI Technical Forum 2012

  50. Amazon Data MovementCost Range (USD) FermiCloud - EGI Technical Forum 2012

More Related