RACEv: An Overview What it is, where it came from

RACEv: An Overview What it is, where it came from Scott O. Lundell solundel@us.ibm.com

Agenda • I/T Environment • RACEv History and Overview • Cross Platform Sizing Methodology • TCO Costing • RACEv Details • Customer Examples

Right-Fitting • IT is getting crunched • Reduce cost • Improve service • i.e. “Do more w/less!” • Efficiency Efficiency Efficiency • Optimization Optimization Optimization • Server Virtualization to the Rescue • As though picking the optimal server was not already hard • Right-Fitting • A function of (assets, workloads, skills, time, cost, risk) • And of course “functional” and “non-functional” requirements!

Costs have changed dramatically • As a percentage of IT spend: • Hardware costs have plummeted • Software costs have increased • People costs have exploded • Other costs are taking off • Total Cost of Acquisition analysis used to capture 80% of spend; it is down to less than 50% • [Figure: share of IT spend by category (hardware, software, people, other), 1995 vs. present] • Based on IBM Server Consolidation Engagements

Because IT Complexity Drives Many Hidden Costs • Managing today’s mixed IT platform environments can be complex and costly • Thousands of servers • Underutilized assets • Thousands of software licenses • Thousands of distributed control points • Ineffective costing methodologies • The Result • Massive complexity • Spiraling people costs • Increased availability and downtime costs • Increased security breach costs • Sub-optimal investment choices • Source: IDC

The Costs of Systems Management, Power, and Cooling are Creating New Pain Points for Server Customers • [Figure: spending (US$B) on new servers, server management and administration, and power and cooling, plotted with installed base (M units), 1996-2010] • Source: Virtualization 2.0: The Next Phase in Customer Adoption, Doc #204904, Dec 2006

RACEv History and Overview

Origins of RACEv • Introduction of CMOS and Specialty Engines into System Z • Changed value proposition • Select group of individuals started working on cross-platform sizing comparisons • Originally tightly controlled, over time began speaking openly with customers • Large number of sizings and comparisons • Customer Requirements for Right-Fitting Applications • Confusion among platform choices, needed a way to make decisions • Account teams began working with customers to develop tools to assist • IBM Management focused on growing System Z • Directed two individuals to find a better way to drive new workload onto System Z • Many fits and starts • Needed to shorten sales cycle • The “RACE Core Team” was born

What is RACEv • IBM Pre-Sales technical support program • Bring “total cost oriented” (aka TCO) tools to IBM sellers and customers • Fee-free customer offering • IBM Internal Use Only tooling • Acronym: • “Right-Fitting Applications into Consolidated Environments” • V is for virtualization • RACE Core-Team • Terry Weinberg, Bob Neidig, Bob Vik, Eduardo Oliveira, Scott Lundell, and Monte Bauman • World-wide community of practitioners • Originally geared for System Z, now used cross platform

What is RACEv • RACEv is both a framework and a tool • Implemented as a spreadsheet • People tend to distrust “black boxes” • Highly customizable • Two major functions • Cross platform sizing • TCO Modeling • Tries to bring platform decisions to a common metric -- Money

RACEv Major Functions • Cross Platform Sizing • Includes constraint analysis • TCO Modeling

Cross Platform Sizing

Simplified Computer Architecture • [Figure: architecture diagram with labeled components P, C, Memory, SAP, Disk, and Front side bus] • Cache design can have a great impact on performance and workload characteristics

Internal Bandwidth Example • [Figure: reported BW vs. effective BW for shared cache and private cache designs; labels: Processors, Cache, Memory, C-C bus speed (from zero to infinite), C-M-C speed]

What determines system capacity • There's more to performance than just processing power • Single system capacity is determined by: • Processor speed • Memory hierarchy • I/O structure • [Figure: CPU busy and I/O busy broken down into CPU time, memory time, and I/O time] • Processor, memory, and I/O times vary greatly by application and by machine type

Relative single system capacity • There's more to performance than just processing power • [Figure: CPU time, memory time, and I/O time within CPU busy, compared for zSeries and others, for data intensive workloads and for compute intensive workloads]

Cross Platform Capacity Method • Capacity metric can vary (MIPS, MHz, tpm, TPC-C, number of engines, ...) • Value of the WorkLoad Factor (WLF) corresponds to units of capacity • WLF is difficult to measure • not enough benchmarks to cover all the cases • driven by cache miss rate, which cannot be directly measured • “Cloud of uncertainty” around measured values • Accuracy vs. Precision • Pathlength reductions need to be accounted for separately • IBM Patent obtained on this method

Cross Platform Methodology • Multiple “before and after” measurements taken over the years • WLF derived from measurements • Values vary by an order of magnitude from high to low • Sizings done with CPU utilization • Assume CPU is the limiting factor (a good assumption when going to zSeries) • Differences in I/O, memory, CPU speed, bandwidth, etc. accounted for by WLF • “Data-mining” of measurements leads to broad categories • Use broad categories to predict future relationships • Need to account for utilization differences and “peaking effects” • Total amount required is seldom the “peak of peaks” • Typically see 30-60% reduction in requirements • Overhead needs to be added for hypervisor • Both true overhead and loss of efficiency
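The sizing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the RACEv calculation: the function name, the use of WLF as a simple multiplier on used capacity, and every sample number are assumptions made for the example.

```python
def size_target(servers, peak_reduction, hypervisor_overhead):
    """Estimate target capacity units for a set of subject servers.

    servers: list of (capacity_units, cpu_utilization, wlf) tuples, where
             wlf is a hypothetical multiplier that converts used source
             capacity into target capacity units.
    peak_reduction: fraction removed because servers seldom peak together
                    (the slide cites 30-60% as typical).
    hypervisor_overhead: fraction added back for the hypervisor.
    """
    # Used capacity of every server, expressed in target units.
    raw = sum(cap * util * wlf for cap, util, wlf in servers)
    # The total required is seldom the "peak of peaks".
    smoothed = raw * (1.0 - peak_reduction)
    # Add true hypervisor overhead and loss of efficiency.
    return smoothed * (1.0 + hypervisor_overhead)


# Hypothetical inputs: three servers, 40% peak reduction, 10% overhead.
subjects = [(1000, 0.20, 0.5), (2000, 0.10, 0.8), (1500, 0.30, 0.4)]
required = size_target(subjects, peak_reduction=0.40, hypervisor_overhead=0.10)
```

Keeping the three adjustments as separate steps mirrors the slide's point that utilization, peaking effects, and hypervisor overhead each need their own accounting.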

System z Workload Migration Characterization • 10. CPU Intensive – e.g. numerically intensive, etc. • 9. Protocol Serving – e.g. static HTTP, firewall, etc. • 8. Skewless OLTP – e.g. simple and predictable transaction processing • 7. Java Heavy – e.g. CPU-intensive Java applications • [slide label: Superb Candidates for System z] • 6. Java Light – e.g. data-intensive Java applications • 5. Database – e.g. Oracle DBMS or dynamic HTTP server • [slide label: More Challenging for System z… But z196 is a game changer] • 4. Mixed High – e.g. multiple, CPU-intense simple applications • 3. Mixed Low – e.g. multiple, data-intense applications or skewed OLTP, MQ • 2. I/O Bound – e.g. high I/O content applications • 1. Data Intensive – large working set and/or high I/O content applications

A Server Characterization Approach http://www.ideasinternational.com/ • Comprehensive source of cross-brand server performance • Tracks and licenses server research on all major vendors • Built into the RACEv tool set

Sizing Methodology • Also need to account for peaks and valleys of servers • Servers seldom all peak at the same time • RACEv uses two different methods to estimate “peak smoothing” • Statistically based • Can be overridden if real data available
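To illustrate why servers that peak at different times need less combined capacity, the sketch below compares the sum of per-server peaks with the peak of the combined load. The load profiles are made-up sample data, and this is not either of RACEv's two estimation methods, which the slides do not detail.

```python
def peak_smoothing(profiles):
    """Return (sum_of_peaks, peak_of_sum, reduction_fraction).

    profiles: list of equal-length lists, each holding one server's load
              (in common capacity units) per time interval.
    """
    # Capacity needed if every server were sized for its own peak.
    sum_of_peaks = sum(max(p) for p in profiles)
    # Combined load in each interval, then the highest combined value.
    combined = [sum(interval) for interval in zip(*profiles)]
    peak_of_sum = max(combined)
    reduction = 1.0 - peak_of_sum / sum_of_peaks
    return sum_of_peaks, peak_of_sum, reduction


# Hypothetical load samples for three servers over four intervals.
profiles = [
    [10, 40, 20, 10],
    [30, 10, 10, 20],
    [10, 10, 50, 10],
]
sum_peaks, peak_sum, reduction = peak_smoothing(profiles)
```

With real interval data for the subject servers, the same comparison gives a measured value that could replace a statistical estimate.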

Example #1 – Selected Servers Showing Maintenance Window

Example #2 – Utilization Profile

TCO Costing

A Range of IT Cost Factors - Frequently Not Considered • Integration • Integrated Functionality vs. Functionality to be implemented (possibly with 3rd party tools) • Balanced System • Integration of / into Standards • Further Availability Aspects • Planned outages • Unplanned outages • Automated Take Over • Uninterrupted Take Over (especially for DB) • Workload Management across physical borders • Business continuity • Availability effects for other applications / projects • End User Service • End User Productivity • Virtualization • Skills and Resources • Personnel Education • Availability of Resources • Security • Authentication / Authorization • User Administration • Data Security • Server and OS Security • RACF vs. other solutions • Deployment and Support • System Programming • Keeping consistent OS and SW Level • Database Effort • Middleware • SW Maintenance • SW Distribution (across firewall) • Application • Technology Upgrade • System Release change without interrupts • Operating Concept • Development of an operating procedure • Feasibility of the developed procedure • Automation • Resource Utilization and Performance • Mixed Workload / Batch • Resource Sharing • shared nothing vs. shared everything • Parallel Sysplex vs. Other Concepts • Response Time • Performance Management • Peak handling / scalability • Availability • High availability • Hours of operation • Backup / Restore / Site Recovery • Backup • Disaster Scenario • Restore • Effort for Complete Site Recovery • SAN effort • Infrastructure Cost • Space • Power • Network Infrastructure • Storage Infrastructure • Initial Hardware Costs • Software Costs • Maintenance Costs • Additional development and implementation • Investment for one platform – reproduction for others • Controlling and Accounting • Analyzing the systems • Cost • Operations Effort • Monitoring, Operating • Problem Determination • Server Management Tools • Integrated Server Management – Enterprise Wide • [Callout: Routinely Assessed Cost Factors]

RACEv in a Nutshell • A tool encapsulating a methodology • Step 1 • Technical Constraint & Configuration Analysis • Step 2 • Cost and Value Analysis • Step 3 • Iteration and Customization • Step 4 • Action

[Figure: Analysis Inputs: Case 0 (Subject Servers) • Target Cases: Case 1, Case 2, and Case 3 (Target Servers) • Analysis Summary]

RACEv Subject Cases • CASE 0 - the “Input Case” - the “Subject Servers” • The servers that are subject to being virtualized. • Two “types” of subject servers: • Brown-field • A set of “existing” servers that are subject to being virtualized. • Green-field • A set of “new” servers that are candidates to instead be built new as virtualized servers.

RACEv Costing Categories (1) Energy • The cost of watts to power and cool servers • Nameplate watts converted to steady-state watts for accuracy (2) Floorspace • The cost of floorspace • Floor-standing, rack-mounted, or blade-chassis rack-mounted • each is carefully and separately accounted for (3) Facilities • Cost of new or updated datacenter facilities • CDUs, PDUs, datacenter expansion, new datacenter, etc. • (RACEv input field) (4) Migration • Cost of migrating subject servers to target (virtual) servers • (RACEv input field / IBM Server Makeover Team Link) (5) Engineering • Cost of engineering (building) target server infrastructure • (RACEv input field / IBM Server Makeover Team Link)
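The energy category (1) can be illustrated with a small calculation. This is a sketch under stated assumptions, not RACEv's formula: the derating ratio from nameplate to steady-state watts, the cooling factor, and the electricity price are all hypothetical inputs chosen for the example.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_cost(nameplate_watts, derate, cooling_factor, price_per_kwh):
    """Annual cost to power and cool one server.

    derate: assumed ratio of steady-state draw to nameplate rating.
    cooling_factor: assumed watts of cooling per watt of server load.
    price_per_kwh: assumed electricity price.
    """
    # Nameplate ratings overstate real draw, so convert to steady state.
    steady_watts = nameplate_watts * derate
    # Add the power spent removing the heat the server produces.
    total_watts = steady_watts * (1.0 + cooling_factor)
    kwh_per_year = total_watts * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * price_per_kwh


# Hypothetical server: 1000 W nameplate, 50% derating, 1 W cooling per W.
cost = annual_energy_cost(
    nameplate_watts=1000, derate=0.5, cooling_factor=1.0, price_per_kwh=0.10
)
```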

RACEv Costing Categories (continued) (6) Hardware Acquisition • Cost of new servers • RACEv understands that brown-field subject servers may have book-value losses to account for as added cost to target cases • RACEv understands System z Solution Edition packages • Enterprise Linux Solution (ELS) (7) Connectivity Acquisition • Cost of network-equipment ports and cables • Cost of storage area network switch-ports and cables • RACEv understands the potential to re-use ports and cables in use by subjects being removed in order to satisfy needs of targets to be installed (8) Disk Acquisition • Cost of disk storage devices • RACEv understands that some software binaries may be “shareable”, and as such disk savings may ensue

RACEv Costing Categories (continued) (9) Hardware Maintenance • Cost of maintaining acquired servers • RACEv understands warranty periods and starts maintenance costs in appropriate time periods • for this and for all maintenance fields (10) Connectivity Maintenance • Cost of maintaining acquired connectivity equipment (11) Disk Maintenance • Cost of maintaining acquired disk storage equipment (12) Software Licenses • Cost of software licenses acquired • RACEv understands that many middleware titles are “portable” between subject servers and the target servers replacing them • RACEv understands many of the varied means used by vendors to charge for software licenses • e.g. IBM Value Units, Oracle Core Processor Licensing Factor, etc. (13) Software Maintenance • Cost of software support, distributions, and/or subscription • RACEv understands many of the varied means used by vendors to charge for software maintenance • e.g. IBM Value Units, Oracle Core Processor Licensing Factor, etc.
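The warranty handling described under Hardware Maintenance (9) can be sketched as a per-year schedule. This is an illustration only: the function name, the whole-year warranty simplification, and the sample figures are assumptions, not RACEv's implementation.

```python
def maintenance_by_year(annual_maintenance, warranty_years, horizon_years):
    """Maintenance cost for each year of the study horizon.

    Years covered by warranty cost nothing; charges begin in the first
    year after the warranty ends. Assumes whole-year warranty periods.
    """
    return [
        0.0 if year <= warranty_years else float(annual_maintenance)
        for year in range(1, horizon_years + 1)
    ]


# Hypothetical server: 3-year warranty, 12,000 per year afterwards.
costs = maintenance_by_year(
    annual_maintenance=12000, warranty_years=3, horizon_years=5
)
```

The same schedule shape would apply to the connectivity and disk maintenance categories, each with its own warranty length.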

RACEv Costing Categories (continued) (14) Network Bandwidth • The cost of bandwidth consumed on the physical network (mostly engineering costs) • Different companies have different concerns and costs regarding bandwidth • RACEv helps assess the degree to which various virtualization technologies are able to remove bandwidth consumption from the physical network (15) Administration • The cost of people required to administer the physical layer, the hypervisor layer, and the operating system layer of each case's configuration • RACEv assimilates the full-time equivalent (FTE) ratios associated with various technologies against target configurations to derive an estimated cost of administration (16) Disaster Recovery Equipment Acquisition • The cost of additional equipment required for disaster recovery configurations • RACEv provides an easy, automated mechanism with suggested parameters for adding DR equipment in equal measure to the “production” equipment configured (17) Disaster Recovery Equipment Operation • The cost of operating the acquired disaster recovery equipment • RACEv provides an easy and accurate means to account for the total cost of operating additional equipment, i.e. power, space, connectivity, maintenance, etc.

RACEv Costing Categories (continued) (18) Outages • The cost of server-based outages • Lost sales • Lost payroll • Overtime paid during recovery • Penalties paid • Also handles growth and tech refreshes • Individual cost categories can be turned on or off • All values are either inputs or can be overridden • Up to 10 Target Cases supported • WOW... 18 categories of cost! Pretty complete... What did we miss?

Introduction and Overview – Final Remarks • RACEv – a “Model” • A “framework” • Can do a lot, but has its limits • Still needs smart people to drive it • RACEv – a “spreadsheet” • Highly customizable • RACEv – a “methodology” • a “thoughtful” and “consultative” methodology to aid with (virtual) platform placement decision-making • aka “Right-Fitting” • RACEv studies can be done in 5 minutes or 5 months

RACEv Sample Inputs and Output • Screenshots taken from multiple sources; numbers may not follow from one page to the next

RACEv Sample Inputs – Subjects sheet • Existing Servers need to be entered onto the Subjects sheet • Should be combined into groups

RACEv Sample Inputs • Completed Subjects sheet

RACEv Sample Inputs • To-be machine is identified on the Targets sheet

RACEv Sample Inputs

RACEv Sample Output
Case ID            Servers  Chips  Cores    1st Year    2nd Year    3rd Year    4th Year    5th Year
(0) x86-Discrete       300    300    600   9,838,591  15,159,637  20,480,684  25,801,731  31,122,777
(1) x86-VMware          29     29    174   3,869,286   6,004,041   8,138,796  10,282,831  12,426,865
(2) zLinux/z10           1      -     63   6,003,023   7,174,038   8,345,053   9,516,068  10,687,083
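The sample output can be read as a break-even comparison. The sketch below uses the cost figures from this slide and assumes they are cumulative totals through each year (they rise by a near-constant amount per year, which is consistent with that reading). The function names are hypothetical helpers, not part of RACEv.

```python
# Cost figures by year for each case, copied from the sample output.
costs = {
    "x86-Discrete": [9838591, 15159637, 20480684, 25801731, 31122777],
    "x86-VMware": [3869286, 6004041, 8138796, 10282831, 12426865],
    "zLinux/z10": [6003023, 7174038, 8345053, 9516068, 10687083],
}


def cheapest_by_year(costs):
    """Name of the lowest-cost case for each year of the study."""
    years = len(next(iter(costs.values())))
    return [min(costs, key=lambda case: costs[case][y]) for y in range(years)]


def savings_vs_baseline(costs, baseline, year_index):
    """Savings of every other case against the baseline in a given year."""
    base = costs[baseline][year_index]
    return {
        case: base - values[year_index]
        for case, values in costs.items()
        if case != baseline
    }


winners = cheapest_by_year(costs)
five_year_savings = savings_vs_baseline(costs, "x86-Discrete", 4)
```

On these numbers the VMware case is cheapest through year 3 and the zLinux case from year 4 onward, so the length of the study horizon changes which target case wins.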

RACEv Summation RACEv Sample Output Detail – By Case

RACEv Sample Outputs • Example of Facts sheet • Server Maintenance detail

RACEv Sample Outputs – Datacenter Sheet Environmentals calculated on Datacenter sheet • Power • Cooling • Floorspace • Racks • Blade Chassis • zBX • Etc.

RACEv Sample Outputs • Detail from Software sheet

Admin Sheet Detail

Customer Examples with RACEv

Large Bank • CIO reluctant to look at zLinux – his people were too busy • Agreed to paper study with minimal input from his people • 5000 blades, Running WAS, 5% busy • RACEv done in a couple of hours • Showed savings of $84M by moving to zLinux • CIO then agreed to have people get real data – still paper study • RACEv done in two months • New number was $83.5M • CIO then agreed to POC

Large Insurance Company • Wanted to evaluate zLinux vs. Power for 401K application • All servers peaked at the same time • RACEv quickly showed that Power was the right platform

Large Brokerage Firm • Absolutely no Z presence • Once in a lifetime opportunity to change architectures • Managing director wanted to instill discipline • RACEv study done in just over a week • Showed some savings, but was not customized for the customer • Managing director immediately approved the POC • “Don’t you worry about MY vendor pricing” • Now they are one of the largest zLinux implementations in the world

Thank You
