VMware vSphere on Cisco UCS at Deakin University Melbourne VMUG – February 2011 Paul Fikkers, Unix Team Leader
About Me • I am a Unix Team Leader with the Systems Unit at Deakin. • I have worked as a team leader for 3 years and, prior to that, as a Unix Administrator for 5 years. • I have spent the better part of the last 8 years at Deakin. • Prior to Deakin I worked at EDS and Telstra, and in 2006 I worked at Sun Microsystems as a Proactive Services Engineer. • My background and experience are mostly in Unix (Solaris and RHEL). • I hold VCPs on VMware Infrastructure 3 and 4 (vSphere). • My role with UCS and vSphere has been to oversee and assist with the planning and design of these technologies at Deakin, as well as the migration from our VMware Infrastructure 3.5 environment to vSphere.
Overview of IT at Deakin • Deakin University has over 45,000 students and more than 5,000 staff spread across four campuses located in Burwood, Geelong Waterfront, Geelong Waurn Ponds and Warrnambool. • The Information Technology Services Division has around 200 staff and centrally manages the vast majority of IT services for the University, from desktop PCs and IP phones to the servers and services in the data centres. • We have two data centres, one at the Waterfront campus and one at the Burwood campus.
Overview of the Systems Unit • There are around 25 staff in the Systems Unit. • We are responsible for the management and provisioning of server hardware, operating systems, storage, databases, the data centres and infrastructure services (AD/LDAP, Internet, email, file sharing, DNS/DHCP, load balancing). • We manage approximately 250 Red Hat Enterprise Linux servers and 250 Windows servers. • We manage around 300 physical servers/appliances in our data centres, consisting of IBM xSeries servers, Cisco UCS, IBM XIV storage arrays, Isilon NAS, Cisco IronPorts and Infoblox Network Services Appliances. • We also manage a small number of Mac OS X servers and are in the process of decommissioning the last of our Sun Solaris servers.
Our history with VMware • VMware has been used at Deakin for over 10 years. We used VMware Workstation early on and later moved to VMware Server. • In 2008, we began implementing VMware Infrastructure 3.5 as our server virtualisation platform of choice. • When we began this work, we had around 50 virtual servers running on VMware Server and over 180 physical servers across our four campuses. • In early 2010, ITSD began installing two Cisco Unified Computing Systems in our Waterfront and Burwood data centres.
Why UCS? • A reduction in the time taken to deploy new systems, and therefore new business applications. • A reduction in systems management overhead, and therefore reduced staffing needs. • A reduction in cabling, rack space, power and cooling requirements, and therefore reduced cost. • Optimised for virtualisation. • Near-linear cost to add capacity.
Our UCS environment • We have two Cisco Unified Computing System (UCS) instances installed, one in each of our data centres. • Across our two instances we house our VMware vSphere infrastructure and several Oracle Real Application Clusters (RAC) servers. • Combined, our Cisco UCS instances have over 440 CPU cores and over 3 TB of RAM. • This hardware takes up less than two server racks of space in each data centre. • Our vSphere clusters run on 24 B200 M2 blades (12 at each site), each with two six-core sockets and 96 GB of RAM.
Our vSphere Infrastructure • 3 vSphere clusters: 1 in each data centre and 1 small two-node cluster in Warrnambool (2 x Cisco C200 M2 1RU servers). • Running ESXi 4.1. • 2 vCenter Servers running on vSphere in Linked Mode. • HA enabled: isolation response set to leave VMs powered on, Admission Control off, EVC on. • DRS set to fully automated and aggressive. • Not using Fault Tolerance or DPM yet. • Not using SRM yet. • Using the Symantec NetBackup client for VM backups.
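To make those settings concrete, here is a minimal pyVmomi sketch of how a cluster could be configured this way through the vSphere API. It is illustrative only: the vCenter hostname, credentials and cluster name are placeholders rather than our production values, and the DRS migration threshold is left at its default rather than set explicitly.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details -- substitute your own vCenter and credentials.
si = SmartConnect(host="vcenter.example.edu", user="administrator",
                  pwd="secret", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the cluster by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "PROD-Cluster")
view.Destroy()

spec = vim.cluster.ConfigSpecEx()

# DRS enabled and fully automated (migration threshold tuning not shown here).
spec.drsConfig = vim.cluster.DrsConfigInfo(
    enabled=True,
    defaultVmBehavior=vim.cluster.DrsConfigInfo.DrsBehavior.fullyAutomated)

# HA enabled, admission control off, isolation response leaves VMs powered on.
spec.dasConfig = vim.cluster.DasConfigInfo(
    enabled=True,
    admissionControlEnabled=False,
    defaultVmSettings=vim.cluster.DasVmSettings(isolationResponse="none"))

cluster.ReconfigureComputeResource_Task(spec, modify=True)
Disconnect(si)
```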
Our Network • Each UCS 6120 Fabric Interconnect runs in end-host mode (no spanning tree) and has two 10 Gbps uplinks to Cisco 6509-E switches, with all VLANs trunked down these links. • Pinning of vNICs to uplinks is dynamic. • All UCS blades used for vSphere have M81KR Virtual Interface (Palo) cards, which allow us to present more than two vNICs on one CNA. • Using a vNetwork Distributed Switch for guest networks and standard vSwitches for VMkernel and IP storage networks.
Our Network (cont.) • Each virtual switch (vSS or vDS) has two UCS vNICs (uplinks) that are configured with no failover on the UCS 6120 Fabric Interconnects. • We are not doing any traffic shaping (in vSS or vDS) or using PVLANs at this stage. • A project is currently underway to replace our data centre core/aggregation and access layer network with Nexus 7000 and 5000 switches, which will introduce benefits such as Virtual PortChannels.
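As a rough illustration of how a guest network is defined on the vNetwork Distributed Switch, the pyVmomi sketch below adds a VLAN-backed distributed port group. The switch name, port group name and VLAN ID are hypothetical, and the uplink/failover policy is simply inherited from the switch rather than set per port group here.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter connection details.
si = SmartConnect(host="vcenter.example.edu", user="administrator",
                  pwd="secret", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Find the distributed switch by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.DistributedVirtualSwitch], True)
dvs = next(d for d in view.view if d.name == "dvSwitch-Guests")
view.Destroy()

# A VLAN-backed port group for one guest network (VLAN ID is illustrative).
pg_spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
    name="Guest-VLAN-100",
    type="earlyBinding",
    numPorts=128,
    defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
        vlan=vim.dvs.VmwareDistributedVirtualSwitch.VlanIdSpec(
            vlanId=100, inherited=False)))

dvs.AddDVPortgroup_Task([pg_spec])
Disconnect(si)
```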
Our Storage • SAN • Unified fabric over Twinax copper from the UCS chassis to the 6120 Fabric Interconnects. • The 6120s have multiple 4 Gbps FC uplinks back to Cisco MDS 9509 switches. • Using NPIV (N_Port ID Virtualization). • Not using VSAN trunking yet. • LUN masking on the storage arrays (IBM XIV). • Host-based zoning on the MDS (pre-zoned WWPN blocks configured as WWPN pools in UCS). • NFS (Isilon NAS) • Mounted direct to ESXi hosts; used for Storage vMotion between clusters and Exchange 2010 archive mail.
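For the NFS side, the following pyVmomi sketch shows roughly what mounting an Isilon export directly on an ESXi host looks like through the API. The export path, datastore name and hostnames are placeholders, not our actual configuration.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect directly to a single ESXi host (placeholder credentials).
si = SmartConnect(host="esxi01.example.edu", user="root",
                  pwd="secret", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = view.view[0]
view.Destroy()

# Describe the NFS export to mount (all values are illustrative).
nas_spec = vim.host.NasVolume.Specification(
    remoteHost="isilon.example.edu",    # NAS address
    remotePath="/ifs/vmware/archive",   # NFS export
    localPath="isilon-archive",         # datastore name as seen by ESXi
    accessMode="readWrite")

host.configManager.datastoreSystem.CreateNasDatastore(nas_spec)
Disconnect(si)
```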
Capacity and Resource Management • Currently a 20-to-1 guest/host ratio on our most densely populated cluster. • No CPU/memory limits, reservations or shares in use on individual VMs. • We have resource pools for DEV/UAT/Prod, but they do not have limits or reservations. • We are not oversubscribing memory, so no ballooning occurs. • Most of our guests are over-allocated RAM and CPU. • DRS migrations do occur during the day, and more so during backup windows. • The University operates 24x7, but peak user load is still 9-5 for most corporate applications.
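For reference, a resource pool with no limit and no reservation (as we use for DEV/UAT/Prod) can be created through the API roughly as sketched below; in the vSphere API a limit of -1 means unlimited. The cluster and pool names are placeholders.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter connection details.
si = SmartConnect(host="vcenter.example.edu", user="administrator",
                  pwd="secret", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "PROD-Cluster")
view.Destroy()

def open_allocation():
    # Unlimited (-1), unreserved, normal shares -- i.e. no artificial caps.
    return vim.ResourceAllocationInfo(
        limit=-1, reservation=0, expandableReservation=True,
        shares=vim.SharesInfo(level=vim.SharesInfo.Level.normal, shares=0))

spec = vim.ResourceConfigSpec(cpuAllocation=open_allocation(),
                              memoryAllocation=open_allocation())

# Create the pool under the cluster's root resource pool.
cluster.resourcePool.CreateResourcePool(name="UAT", spec=spec)
Disconnect(si)
```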
ESXi Provisioning • PXE boot “stateless” ESXi nodes. • Scripted configuration of ESXi nodes, vCenter, Infoblox (DNS), XIV (storage) and UCS through their APIs (see the sketch below). • Provisioning a new ESXi node takes less than one hour (including hardware installation) to the point where it is joined to the DRS cluster and guests are being migrated to it. • Stateless PXE booting of ESXi nodes is still experimental and not fully supported by VMware.
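The vCenter step of that scripted provisioning could look something like the pyVmomi sketch below, which joins a freshly PXE-booted node to the DRS cluster; once the host connects, DRS begins migrating guests onto it. Hostnames, credentials and the use of force=True are assumptions for illustration, and the Infoblox, XIV and UCS calls are made through their own APIs and are not shown.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter connection details.
si = SmartConnect(host="vcenter.example.edu", user="administrator",
                  pwd="secret", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "PROD-Cluster")
view.Destroy()

# Connection details for the newly PXE-booted ESXi node (placeholders).
connect_spec = vim.host.ConnectSpec(
    hostName="esxi13.example.edu",
    userName="root",
    password="secret",
    force=True)  # take over the host even if it is registered elsewhere

# Add the host straight into the DRS cluster in the connected state.
cluster.AddHost_Task(spec=connect_spec, asConnected=True)
Disconnect(si)
```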
Server virtualisation at Deakin • Almost 70% of our server infrastructure (Windows/RHEL) is virtualised, with more than 350 virtual machines running on our VMware vSphere clusters. • All of our VMware Infrastructure 3.5 environment has been decommissioned. • We plan to have 80% of our Windows and Red Hat server infrastructure virtualised by the end of 2011.
What we have virtualised • Many of our corporate application servers • Oracle application servers, web servers (Apache and IIS), HP Service Manager, Learning Management System, HR application nodes, Microsoft Enterprise Project Manager. • Most of our core infrastructure servers • Exchange 2010, LDAP/AD and RADIUS, Squid proxy servers, vCenter servers. • Provisioning servers • RHN Satellite servers, Microsoft SCCM. • Database servers • MS SQL, MySQL, FileMaker Pro.
What we have not virtualised • Oracle RAC Database Servers. • NetBackup Media Servers. • Hardware appliances (Infoblox Network Services Appliances, Cisco IronPort Email Security Appliances). • Some legacy Solaris servers and corporate applications. • Our production Exchange 2007 environment (to be decommissioned this year). • Our Cisco Call Managers.
Challenges - Technical • Site Planning: Getting started as early as possible to consider all aspects (power, cooling, rack space, capacity, network, storage). • Capacity Management and Planning: Choosing the right blade server for vSphere nodes (RAM size, performance/cores, cost). • Network: Choosing a distributed switch technology that suits your needs (vNetwork DS, hardware VN-Link with UCS Manager, Cisco Nexus 1000V). • Integration: Catering for future network and storage architecture changes. • Security: Keeping things as secure as possible while maintaining simplicity and flexibility. • Migration: Developing a migration strategy early to ensure appropriate planning and communication can occur well ahead of time.
Challenges - Management • Bridging the skill gaps/overlaps: Systems vs. Networking vs. Storage and Backup. • Measuring and communicating benefits: We said we would save capital and operational expense through server virtualisation and unified computing. How do we prove that we have? • Paradigm shift in the server replacement model: UCS and VMware abstract away the single-use server model, where the business application is tied to the hardware on which it runs. • Costing and charging: How to cost and charge the business for virtual hardware. • Cloud computing: Competing with the public cloud and deciding what to offer in terms of private or community cloud services (IaaS vs. PaaS vs. SaaS).
Lessons Learned • Test and re-test your configuration, in particular for anticipated redundancy and failover capabilities. • Avoid overcomplicating the design in favour of a simple, repeatable configuration. • Automate where possible, but consider the maintainability of the automation. • Cross-skill staff and avoid creating a single go-to “golden spanner” person within your organisation. • Know your network and know what types of failures are likely to occur. • Review vendor best practices, but adhere to those that are relevant to your environment. • Where possible, upgrade to the latest stable firmware version before deploying to production.
Questions? paul.fikkers@deakin.edu.au