Ionix Findings & Recommendations for Citi Management Summary. Scott Shaffer – Director of Ionix Engineering Jim Stringer – Pr. Corporate System Engineer. Oct 5, 2009. Agenda. Part 1 – Management Summary Opening Statements Summary of Use Case Summary of Enhancement Requests Summary of SRs
Ionix Findings & Recommendations for Citi: Management Summary Scott Shaffer – Director of Ionix Engineering Jim Stringer – Pr. Corporate System Engineer Oct 5, 2009
Agenda Part 1 – Management Summary • Opening Statements • Summary of Use Case • Summary of Enhancement Requests • Summary of SRs • Agent-related information • ControlCenter Repository Information • Reporting Information Part 2 – Detail Discussions • ControlCenter Facts • Citi Use Cases & Gap Analysis • EMC Findings & Recommendations • Citi Enhancement Requests • Citi Service Requests Wrap-up
Objective of this Presentation • To share findings from EMC's study of the current running environment at Citi • To illustrate Citi's usage of ControlCenter today: it's all about reporting • To show how Citi could benefit by expanding its use of additional ControlCenter features • To compare how Citi uses ControlCenter with how other EMC customers do • To provide an Engineering understanding of the root cause of the problems Citi has been experiencing
Two Companies Citi EMC
Working as a Team • Collectively… we have challenges • Citi Eng • Citi Ops • EMC Ionix • EMC PS • Let's make it happen • We have solutions
Use Case Summary How ControlCenter is used today and EMC’s recommendation with CC 6.1
Summary of Enhancement Requests Current status of the Enhancement Requests submitted by Citi
Summary of SRs Current status of the Customer Service Requests opened by Citi
What's needed: "It is all about reporting" • Storage utilization is collected by the Agents • ControlCenter processes the information • The Custom Reporting System creates reports tailored to Citi's needs (Citi's Reports)
Summary of Agent findings • Most (not all) problems started here • Has negatively affected the CC DB
Agent-related factors • Citi requirements • No pushing from ControlCenter • Requires an Agent install package (Citi wrapper) • Upgrading Agents • Unix hosts: the Citi wrapper with the EMC Native Install Package works • Windows hosts: the Citi wrapper with the EMC Native Install Package is flawed • Upgrading the agent to CC 6.0 on Windows corrupts the ECC directory • Database agents become orphaned with the CC 6.0 agent upgrade • Citi servers are moved from datacenter to datacenter • Multi-homed servers use Dynamic DNS with round-robin resolution
Resulting agent-related issues • The CC 6.0 Native Install Package is a customer-built package • To be installed on "new" servers only • Not designed for upgrading pre-existing agents • Unix Hosts • The Citi wrapper for Unix does handle an agent upgrade • Windows Hosts • The Citi wrapper for Windows does not work • Causing a loss of agents upgraded to CC 6.0 • The Database agent upgrade is a push • Citi servers end up reporting to multiple ControlCenter instances • Citi servers may resolve to any IP address of the server • ControlCenter becomes confused as to "who is who"
How is EMC helping to resolve this? • New upgradeable CC 6.1 Native Install (NI) Package • Prebuilt for each platform by EMC • Incorporates the Console Agent Wizard technology • Includes Database Agents • Upgradeable & Patchable • Supports CC 5.2 and CC 6.0 agents • Built to meet Citi's requests • Better CC 6.1 Citi Wrapper • Brought Citi and EMC Engineering together • Brainstormed use of the silent install features of CC NI and Solutions Enabler • EMC is working on • Resolving the agent resolution problem by using a fixed server name and IP address (starting in UB7) • EMC Engineering is working on solutions to clean up the confusion in the ControlCenter Repository
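Why a fixed server name and IP address helps can be shown with a small simulation. This is a hypothetical sketch, not ControlCenter code: it assumes each agent on a multi-homed server is identified by whatever address a round-robin resolver returns when it registers, and compares that with every agent reporting one fixed, configured name. The host name, addresses, and helper functions are made up for illustration.

```python
import itertools

# Illustrative addresses only (RFC 5737 documentation range) for a
# hypothetical multi-homed server behind round-robin Dynamic DNS.
ADDRESSES = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def round_robin_resolver(addresses):
    """Return a resolver that hands out the next address on every lookup,
    mimicking round-robin DNS for a single host name."""
    cycle = itertools.cycle(addresses)
    return lambda _name: next(cycle)


def host_identities(agents, resolve, fixed_name=None):
    """Collect the distinct host identities recorded for agents on one server.

    Without fixed_name, each agent is identified by the address the resolver
    returns when it registers. With fixed_name, every agent reports the same
    configured name and the resolver is never consulted.
    """
    seen = set()
    for _agent in agents:
        seen.add(fixed_name if fixed_name else resolve("server01"))
    return seen


agents = ["Master Agent", "Host Agent", "Database Agent"]
by_ip = host_identities(agents, round_robin_resolver(ADDRESSES))
by_name = host_identities(agents, round_robin_resolver(ADDRESSES),
                          fixed_name="server01.example.com")
print(len(by_ip), len(by_name))  # 3 1
```

Keyed by resolved address, three agents on one physical server show up as three "hosts"; keyed by a fixed name, they collapse to one, which is the "who is who" confusion in miniature.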
Summary of Repository findings • Agent communications • Orphaned agents • Multi-host entries • Has negatively affected the CC DB
CC Repository-related factors • Citi Servers (normal datacenter activity) • Moving around, decommissioning, HBA replacement, OS upgrades, hardware upgrades • Procedures for handling this in ControlCenter are documented • Host Resolution • Hosts files, NIS+, BigIP, Dynamic DNS • Inactive Agents • High number of agents with communication problems • Different Agents using different IPs on the same host • Same Host with multiple names in the Console • ControlCenter caught in a "who-is-who" bottleneck
Resulting ControlCenter-related issues • Citi servers reporting to multiple ControlCenter instances • Only one ControlCenter instance can monitor a server • Host Resolution from multiple sources causes failed communications • ControlCenter is designed for strict DNS resolution • Certificates in CC 6.0 validate communications • CC 6.1 Agents are locked down for security • Inactive Agents • ControlCenter is spending too much time trying to resolve the status of inactive agents • Causing degraded performance • Loss of host utilization reporting • Internal exception errors are thrown because a Managed Object (MO) that appears identical in the Repository is treated as different when processed by the Store • Failed CC operations
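Finding "Same Host with multiple names in the Console" is essentially a grouping problem. The sketch below is a hypothetical illustration, not Jim's FQDN utility or any ControlCenter interface: it assumes a flat list of host names exported from the Console and groups entries that share a short (domain-stripped) name. Because unrelated hosts in different domains can share a short name, the output is a review list, not an automatic merge.

```python
from collections import defaultdict


def short_name(host):
    """Lowercase a host name and strip any domain suffix."""
    return host.strip().lower().split(".", 1)[0]


def find_duplicate_hosts(console_names):
    """Group Console host entries that share a short name.

    Returns {short_name: sorted list of entries} for groups with two or more
    entries. Candidates only: a human should confirm before cleaning up.
    """
    groups = defaultdict(set)
    for name in console_names:
        groups[short_name(name)].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical Console entries, including an old domain left behind by a move.
entries = [
    "nyhost01",
    "nyhost01.old-domain.example",
    "NYHOST01.example.com",
    "nyhost02.example.com",
]
print(find_duplicate_hosts(entries))
```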
How is EMC helping to resolve this? • Resolving data issues inside the Repository is • Tricky and dangerous • Ionix Engineering • Provided principal resources backed by all of Ionix Engineering • A senior ControlCenter person has been engaged onsite to help with day-to-day issues • ControlCenter as a product • Being integrated with solutions that meet Citi's requirements • Some changes can happen fast • Others are more involved and will be rolled out in Update Bundles • EMC Engineering has provided, and continues to provide, special solutions • Agent remediation, supported by an army of people for 3 weeks • Bulk loading of information for Database discovery • Solution to correct the Domain Name issue • Solutions to extract the MO Discovery information • Validation of the Citi Wrapper solution • Clean-up of Host Domain issues • Identification of DCP issues
What's needed • Multi-site roll-up of storage Allocation and Utilization, by customer • Chargeback • 7 defined Tiers • Primary Capacity Utilization % • Multi-site, Cluster-aware • Customized layout • Drill-down sub-reports for Reclaimable Capacity, File-systems & Databases
How is this data obtained? • ControlCenter (CC) • Integration of External Data-Sources • TAS (Citi internal system) • COB ("Continuance of Business") box identification • Server support contact information • FinCon (Citi internal system) • Application and App P&L codes • Line of Business • Tech Refresh list (Citi internal system) • Source, Target, Grace Period • Onaro SANscreen • Hosts not in ECC (via host-to-WWN lookup) • Hitachi Data Systems (HDS) HiCommand scripts
How is EMC helping to resolve this? • Providing a Custom Reporting solution • A loader to pull the data from multiple sources (CC, TAS, Tech Refresh, FinCon, HDS scripts, Onaro) • EMC PS resources • To understand the report details needed • To create the reports
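A multi-site roll-up by customer and tier is, at its core, a grouped sum. The sketch below is hypothetical: the record layout, the numbering of the 7 tiers as 1 to 7, and the definition of utilization as used capacity divided by allocated capacity are all assumptions for illustration, not Citi's actual report definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Allocation:
    site: str            # hypothetical site label
    customer: str        # e.g. a line of business (assumed field)
    tier: int            # assumes the 7 defined tiers are numbered 1..7
    allocated_gb: float
    used_gb: float


def roll_up(rows):
    """Sum allocated and used capacity across all sites per (customer, tier)
    and compute utilization % as used / allocated * 100 (assumed definition)."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for r in rows:
        t = totals[(r.customer, r.tier)]
        t[0] += r.allocated_gb
        t[1] += r.used_gb
    return {
        key: {
            "allocated_gb": alloc,
            "used_gb": used,
            "utilization_pct": round(100.0 * used / alloc, 1) if alloc else 0.0,
        }
        for key, (alloc, used) in totals.items()
    }


rows = [
    Allocation("Site-1", "LOB-A", 1, 1000, 600),
    Allocation("Site-2", "LOB-A", 1, 500, 450),
    Allocation("Site-1", "LOB-B", 3, 2000, 500),
]
report = roll_up(rows)
print(report[("LOB-A", 1)])
```

Summing raw capacities before dividing (rather than averaging per-site percentages) keeps large and small sites correctly weighted in the roll-up.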
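The "Hosts not in ECC (via host-to-WWN lookup)" step can be thought of as a set difference on HBA WWNs. This is a hypothetical sketch: it assumes a host-to-WWN export from SANscreen and a list of WWNs known to ControlCenter, both as plain Python data; it does not use any real SANscreen or ControlCenter API.

```python
def normalize_wwn(wwn):
    """Lowercase a WWN and drop separators so different notations compare equal."""
    return wwn.lower().replace(":", "").replace("-", "")


def hosts_not_in_ecc(host_to_wwns, ecc_wwns):
    """Return hosts whose HBA WWNs are all unknown to ControlCenter (ECC)."""
    known = {normalize_wwn(w) for w in ecc_wwns}
    return sorted(
        host for host, wwns in host_to_wwns.items()
        if not any(normalize_wwn(w) in known for w in wwns)
    )


# Hypothetical exports.
sanscreen = {
    "hostA": ["10:00:00:00:c9:00:00:01", "10:00:00:00:c9:00:00:02"],
    "hostB": ["10:00:00:00:c9:00:00:03"],
}
ecc = ["10:00:00:00:C9:00:00:02"]
print(hosts_not_in_ecc(sanscreen, ecc))  # ['hostB']
```

A host is treated as known if any one of its WWNs is known, so a host with a replaced HBA is still matched through its remaining HBA.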
Custom Reporting at Citi (continued): 15 reports, 6 data sources, additional notes
Summary • EMC is here with Citi for the long haul • We all need to work together as a team • EMC Engineering, PS, CS, and the Account Team • We understand Citi's needs and are working to meet them • Citi needs the right EMC people in place to support ControlCenter • We collectively have challenges • Citi's environment has exposed ControlCenter bugs and areas for changes and improvements • Citi has unique requirements that do not exist at any other customer • Citi needs to "think outside of the box" for ControlCenter: • How it is used and by whom • Ownership, Maintenance, Sharing • Embracing the technology
What EMC Engineering has delivered since July 2009 • CLARiiON RAID 6 – HF 4636 • Disk hot spare issue in STS – HF 4650 • HDS DP-Pool Vol – HF 4669 (Still in QE) • StorageScope DB starting issue – Dependency fix provided • Master Agent restart for Agent Remediation – Special Patch • AAD Utility for DB Discovery • SE silent install that meets Citi needs – SE 7.0.1.12 • Engineering Consulting session for • ControlCenter UB5 Native Install handling • Solutions Enabler Silent Install handling • In-depth knowledge transfer on ControlCenter • ControlCenter design consulting – Ongoing • Agent Pushing • Addressed the Manifest file request • Sybase support on AIX • Identify the installed version of SE on hosts • Including this in LCU is being investigated • Address the Gatekeeper issue • Multi-pathing • SE issues on AIX
What is next… • Resolve the agent binding issue • Will need a modification to CTG.INI • Will need UB7 • Resolve the Host Domain issue • Clean up old, obsolete domain names • Can be done with Jim's FQDN utility • Resolve Managed Objects not in a DCP and MOs in 2 or more DCPs • An enhancement to an older 5.2 script is being developed • Validate the AAD relationship information in the repository • Developing a script to dump this data to a file – Still in process • Validate the upgrade to CC 6.1 in the CSE Lab • Eliminate any concerns about database issues • Resolve issues found on infrastructure servers • ControlCenter 6.1 UB5 architectural design
Questions Coming up next: In-depth Discussions
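The check for "Managed Objects not in a DCP and MOs in 2 or more DCPs" amounts to counting DCP memberships per MO. This is a hypothetical sketch, not the 5.2 script being enhanced: it assumes the full MO list and each DCP's member list have already been extracted into plain Python data, and the DCP names are invented.

```python
from collections import Counter


def dcp_membership_issues(all_mos, dcp_members):
    """Given every known MO and a mapping {DCP name: MOs it covers}, return
    (MOs in no DCP, MOs in two or more DCPs), each sorted."""
    # set(members) so an MO listed twice in the same DCP is counted once
    counts = Counter(mo for members in dcp_members.values() for mo in set(members))
    orphans = sorted(mo for mo in all_mos if counts[mo] == 0)
    multi = sorted(mo for mo, n in counts.items() if n >= 2)
    return orphans, multi


# Hypothetical extract.
all_mos = ["hostA", "hostB", "hostC"]
dcps = {
    "Daily-Discovery": ["hostA", "hostB"],
    "Weekly-Discovery": ["hostB"],
}
orphans, multi = dcp_membership_issues(all_mos, dcps)
print(orphans, multi)  # ['hostC'] ['hostB']
```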
Break-time Coming up next: In-depth Discussions
Ionix Findings & Recommendations for Citi: Detailed Discussion Scott Shaffer – Director of Ionix Engineering Jim Stringer – Pr. Corporate System Engineer
A Quick Review of ControlCenter • What ControlCenter is • Functional Features • EMC’s Design Criteria • Maintaining ControlCenter
What is ControlCenter? • ControlCenter is not a tool… • A tool is an application that runs when it is needed and goes dormant when it is not. • Examples of tools are the Element Managers • Symmetrix Management Console • Navisphere Manager • Celerra Manager • Centera Manager • ControlCenter is an intelligent learning application… • Continuously collecting information and learning about the datacenter • Continuously analyzing this information to provide actions and reports
ControlCenter Functional Features • Collecting data on Managed Objects • Analyzing Managed Objects' operations • Presenting actionable information • Storage Provisioning • SAN Management • Server Monitoring • Database Monitoring • Performance Analysis • Intelligent Wizards perform tasks • Daily, Weekly, Monthly reports
ControlCenter Design Criteria • Best-in-Class datacenter management • Flexible and powerful user console • Managed Object drill-down and drill-up navigation • Robust and detailed relationship and topology views • Intelligent Wizards • Self-maintaining functionality of internal databases • Flexible Reporting • Dashboard • Management views with graphics • Built-in reports • Query Builder for in-depth custom reporting
What ControlCenter 6 is not designed for… • Not designed with "Service Provider" features • There are no "multi-tenancy" functions in ControlCenter today • Separation of data based upon duties • Cannot hide Managed Objects on a per-user basis • Has limitations when defining a user's functionality for a Managed Object • This is a design goal for SRM7
Provides Assisted Problem Resolution • Problem Resolution Tools • Proactive problem alerting • Early warning • Outage notification • Troubleshooting outages • Graphical presentations of • Path Details – Storage device to Host including SAN Connections • Visual Storage – Front-end to Back-end to physical disk relationships • Host Storage – Mapped and Masked Storage Views • Host Relationship Views – Server Storage to Array • SAN Topology Views – Switch port and Fabric connections • Best-in-Class Performance Analysis • Detailed graphical views of Managed Objects • Host, SAN, Array, Device, Director, physical disk • Ability to determine a failing disk within a meta-device • Application Performance Analysis using the Data Links feature • Detailed Trending tools
Maintaining ControlCenter • Alert Management • Acknowledging, Assigning, Clearing of Alerts • Daily activity • Needed to ensure good Console Performance • Agent Management • Review of agent Status • Daily activity • Resolve agent issues • Needed to ensure data integrity • Data Collection Policies • Manage as the environment changes • Managed Object Add/Deletes • Storage Reporting • Monitor StorageScope ETL status • Monitor Managed Object LDT (Last Discovery Time) • Both are needed to ensure reporting accuracy
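Monitoring Managed Object LDT (Last Discovery Time) comes down to comparing each timestamp against an age limit. The sketch below is hypothetical: it assumes LDT values have been exported as Python datetimes (None for never discovered), and the 48-hour limit is an arbitrary example, not an EMC recommendation.

```python
from datetime import datetime, timedelta


def stale_objects(last_discovery, now, max_age=timedelta(hours=48)):
    """Return names of Managed Objects whose LDT is older than max_age,
    or that have never been discovered (LDT is None)."""
    return sorted(
        name for name, ldt in last_discovery.items()
        if ldt is None or now - ldt > max_age
    )


# Hypothetical export; names and timestamps are illustrative only.
now = datetime(2009, 10, 5, 8, 0)
ldt = {
    "array-0001": datetime(2009, 10, 5, 2, 0),
    "hostA": datetime(2009, 9, 28, 2, 0),
    "db-orders": None,
}
print(stale_objects(ldt, now))  # ['db-orders', 'hostA']
```

Running a check like this daily alongside the agent status review surfaces inactive agents before their stale data reaches StorageScope reports.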
Scope of Use Case & Gap Analysis • Criteria for the information being presented: • Discover how Citi uses ControlCenter • Focus was on US operations only • Citi OPS, Rutherford, NJ • Provide a comparison to: • How ControlCenter was designed to be used • How ControlCenter is used by a typical EMC customer • How ControlCenter is used by EMC customers that are similar to Citi • Use Cases that apply to Citi today: • Reporting • Alerting
Use Case Analysis Charts • Comparison Customer: Largest Customers (Top 5% of installed base) • Multiple Datacenters • Greater than: 8K SAN Ports, 2K Hosts, 130 Arrays • Target Customer: Single Datacenter Customer • Feature functionality description
Storage Provisioning • Citi Usage • Storage Provisioning is handled by the SAN Group • Using CLI and device specific tools • Device configuration • Device mapping • TimeFinder and BCV management • User Defined Fields • Not used by Citi • Storage Tiering • All reporting is handled by the Storage Group • Custom Reporting provided by EMC • Provides a roll-up of storage allocated and utilized • Information is gathered from multiple sources • Primary source of the data is ControlCenter data • Customer Comparison • Handled by the Storage Administrators • SAN Administrators are part of the Storage Group or closely aligned together • Accomplished through CLI by senior people and ControlCenter • As IT organizations are becoming leaner and experienced senior people are lost, ControlCenter is used more
SAN Management • Citi Usage • SAN Management is handled by the SAN Group • Citi has not discovered the SAN in ControlCenter • Not discovering the SAN… • Limits the troubleshooting features provided by ControlCenter • Leaves Citi blind to End-to-End information and performance analysis • Customer Comparison • Usually managed by the SAN Administrators or Storage Administrators • The Storage Group is closely aligned with the SAN Group, which handles the physical Switch Configuration • EMC Customers always discover the SAN • Fabric Management is usually done with vendor SAN Management tools • Device mapping and masking is usually completed with ControlCenter
Monitoring • Citi Usage • Alerting is monitored by the Citi Storage Group • Monitors the Arrays, Databases and Hosts • These are critical for reporting • Monitoring is at the Agent level • LDT (Last Discovered Time) • Agent active status • Agents left in a failed state (inactive) causes degraded performance • Customer Comparison • All four elements are closely monitored and managed daily
Reporting • Citi Usage • Citi Storage Group handles the reporting • Data is gathered using the STSAPI tables • Reporting is generated from a SQL Multi-Site database • EMC has provided the Custom Reporting solution • Data is gathered from multiple sources using the Z-Loader • Report generation is customized to meet Citi requirements • StorageScope 6.x data is not currently used by Citi for reports • STSAPI tables are no longer updated by EMC • Customer Comparison: • EMC Customers use StorageScope data exclusively • Where there are multiple data centers, EMC custom reporting is contracted
Performance • Citi Usage • Citi has not enabled the Performance Management features • Customer Comparison • EMC Customers collect Performance data for: • All arrays • Critical Hosts • Specific SAN switches • Performance Manager is considered ControlCenter’s most valued feature