
How to Manage Data Collection in a Large Environment

How to Manage Data Collection in a Large Environment. Paul K Merline & Mike Badaczewski, November 15, 2011. Which is greater: the average attendance at Busch Stadium or the number of servers we collect data on every night? Answer: AT&T systems collected nightly = 38,353; Busch Stadium…


Presentation Transcript


  1. How to Manage Data Collection in a Large Environment. Paul K Merline & Mike Badaczewski, November 15, 2011

  2. Which is greater…the average attendance at Busch Stadium or the number of servers we collect data on every night?

  3. Answer:
     • AT&T systems collected nightly = 38,353
     • Busch Stadium average nightly attendance = 38,196 (source: ESPN.com, as of 10-18-11)

  4. Data Collection Goals
     • Provide consistent, standard and meaningful resource usage data for all servers to support Capacity Planning.
     • Establish and maintain an environment capable of supporting data collection for 40,000 servers with the existing staff.
     • Have previous day’s data available by 08:00 local time.

  5. Data Collection Overview
     • Number of metrics collected and retention period are based upon criticality of server (service levels).
     • Separate data collection based on platform, e.g. UNIX, Windows, etc.
     • Spread workload across several centralized data collection servers (Consoles).
     • Stagger data collection across time zones.
     • Analyzed data output is sent to the database server for Visualizer db loads.
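As a rough illustration of staggering by time zone against the 08:00 local deadline, the sketch below computes, for each region, the UTC window between local midnight and 08:00 local time. The region names come from the presentation; the fixed standard-time UTC offsets (daylight saving ignored) and the function name `collection_window_utc` are assumptions made for this example, not the authors' scheduler.

```python
# Assumed standard-time UTC offsets in hours (daylight saving is ignored).
REGION_UTC_OFFSET = {
    "East": -5,
    "Central": -6,
    "Mountain": -7,
    "Pacific": -8,
    "Alaska": -9,
    "Hawaii": -10,
}
DEADLINE_LOCAL_HOUR = 8  # data must be available by 08:00 local time


def collection_window_utc(region):
    """Return (start_hour_utc, deadline_hour_utc) for one region.

    The window opens at local midnight, when the previous day's data is
    complete, and closes at the local deadline hour.
    """
    offset = REGION_UTC_OFFSET[region]
    start = (0 - offset) % 24
    deadline = (DEADLINE_LOCAL_HOUR - offset) % 24
    return start, deadline


for region in REGION_UTC_OFFSET:
    start, deadline = collection_window_utc(region)
    print(f"{region:<8} {start:02d}:00 to {deadline:02d}:00 UTC")
```

Because each region's window opens an hour later than the one to its east, the Consoles never need to process every region at once.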

  6. Data Collection Strategy
     Collect and retain only the metrics necessary based on the criticality of the server.
     • Tier and Tier Level assigned based on:
       • server criticality (MCA, normal production)
       • status (production, test, development)
       • in-service indicator
     • Service Level assigned based on Tier and Tier Level, which determines:
       • metrics collected
       • retention period of metrics

  7. Tier Mapping to Service Levels

  8. Data Collection Service Levels BRONZE SILVER PLATINUM GOLD

  9. Data Collection Process
     Servers are grouped into collection domains based on:
     • Service Level: Gold, Silver, Bronze
     • Region: East, Central, Mountain, Pacific, Alaska, Hawaii
     • Platform: UNIX, Windows, VMWare
     • Frame: Frames, Non-Frames
     (target is 25 servers per domain for performance reasons)
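The grouping described on this slide can be sketched as a two-step process: bucket servers by (service level, region, platform, frame), then split each bucket into domains of at most 25 servers. This is a minimal sketch, not the authors' tooling; the record fields, the function name `build_domains`, and the sample inventory are all hypothetical.

```python
from collections import defaultdict

MAX_SERVERS_PER_DOMAIN = 25  # target from the slide


def build_domains(servers, max_size=MAX_SERVERS_PER_DOMAIN):
    """Group servers by the four attributes, then chunk each group."""
    groups = defaultdict(list)
    for server in servers:
        key = (
            server["service_level"],
            server["region"],
            server["platform"],
            server["frame"],
        )
        groups[key].append(server["name"])

    domains = []
    for key in sorted(groups):
        names = sorted(groups[key])
        # Split the group into consecutive chunks of at most max_size.
        for i in range(0, len(names), max_size):
            domains.append((key, names[i:i + max_size]))
    return domains


# Hypothetical inventory: 60 Gold UNIX servers in East, 1 Silver Windows server.
servers = [
    {"name": f"unix{i:03d}", "service_level": "Gold", "region": "East",
     "platform": "UNIX", "frame": False}
    for i in range(60)
]
servers.append({"name": "win001", "service_level": "Silver",
                "region": "Central", "platform": "Windows", "frame": False})

domains = build_domains(servers)
for key, names in domains:
    print(key, len(names))
```

With this inventory the 60 Gold servers split into domains of 25, 25 and 10, and the single Silver server gets its own domain.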

  10. UNIX Metric Groups

  11. Data Retentions

  12. Current Data Collection Counts

  13. Data Collection Tools
     • The BMC Performance Assurance product family offers a complete solution for performance management of UNIX and Windows systems.
     • It delivers the following critical functions for managing distributed systems:
       • Real-time monitoring
       • Modeling and predicting
       • Graphical performance analysis

  14. BMC Performance Assurance [architecture diagram; labels only]
     • Data flow components: Servers (nodes), Collect, Console Servers, Analyze, Database Server, Analyst Console
     • BMC tools: BMC Visualizer (detailed analysis), BMC Perceiver (web-based report viewing), BMC Investigate (real-time analysis), BMC Predict (modeling)
     • Other labels: ATT Developed, Exception Reporting, CPDB (FACT Metric Tables: Hourly, Summarized), CPDB Reporting, Application Reporter, Forecasting (bi-annual planning)

  15. BMC Consoles and Visualizer Database
     • Visualizer Database Schemas by platform and region (East, Central, Pacific); the three regional groups are listed in transcript order because the column alignment is lost:
       • UNIX, 62 schemas: Gold – 6, Silver – 8, Bronze – 6; Gold – 6, Silver – 8, Bronze – 7; Gold – 5, Silver – 7, Bronze – 4; All Other – 5
       • Windows, 26 schemas: Gold – 1, Silver – 5, Bronze – 2; Gold – 1, Silver – 3, Bronze – 1; Gold – 1, Silver – 6, Bronze – 2; All Other – 4
       • VMWare, 4 schemas: Gold – 1; Gold – 1; Gold – 1; All Other – 1
     • Visualizer database is 2.3 TB in size and divided into 92 schemas by:
       • Platform
       • Time Zone
       • Service Level
       • (limit to 1,000 servers per schema for performance)
     • Number of Servers Collected from Nightly Automation (consoles listed as A, D, C, B; counts in transcript order):
       • 8,566 servers in 476 domains
       • 8,743 servers in 475 domains
       • 11,970 servers in 485 domains
       • 9,074 servers in 489 domains
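The figures on this slide can be cross-checked with a little arithmetic. The sketch below sums the per-platform schema counts and the per-console server and domain counts, then derives two averages to compare against the stated limits (25 servers per domain, 1,000 servers per schema). The variable names are illustrative; the numbers are taken from the slide.

```python
# Schema counts per platform, as stated on the slide.
schemas_per_platform = {"UNIX": 62, "Windows": 26, "VMWare": 4}

# (servers, domains) for each of the four consoles, in transcript order.
console_counts = [(8566, 476), (8743, 475), (11970, 485), (9074, 489)]

total_schemas = sum(schemas_per_platform.values())
total_servers = sum(servers for servers, _ in console_counts)
total_domains = sum(domains for _, domains in console_counts)

avg_servers_per_domain = total_servers / total_domains
avg_servers_per_schema = total_servers / total_schemas

print("schemas:", total_schemas)
print("servers:", total_servers)
print("domains:", total_domains)
print("avg servers per domain:", round(avg_servers_per_domain, 1))
print("avg servers per schema:", round(avg_servers_per_schema))
```

The totals come to 92 schemas, 38,353 servers and 1,925 domains, and both averages fall well under the stated limits.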

  16. Data Collection Process
     • Perform binaries are laid down with the Patrol installation on the server (node).
     • A collector runs on each server (node) and writes data to disk periodically (currently set to 15 minutes).
     • The data is pulled by the Perform Console and processed nightly (hourly summarization), creating “vis” files.
     • Nightly automation consists of 3 processes:
       • Retrieve
       • Analyze
       • Populate
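To make the hourly summarization step concrete, the sketch below rolls 15-minute samples up into one value per hour. It is only an illustration of the idea: the presentation does not say how the product summarizes, so the use of a simple mean, the function name `summarize_hourly`, and the sample data are all assumptions.

```python
from collections import defaultdict
from datetime import datetime


def summarize_hourly(samples):
    """Average (timestamp, value) samples into one value per clock hour.

    Using the mean is an assumption for this sketch.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical CPU utilization samples taken every 15 minutes.
samples = [
    (datetime(2011, 11, 14, 0, 0), 20.0),
    (datetime(2011, 11, 14, 0, 15), 30.0),
    (datetime(2011, 11, 14, 0, 30), 40.0),
    (datetime(2011, 11, 14, 0, 45), 50.0),
    (datetime(2011, 11, 14, 1, 0), 10.0),
    (datetime(2011, 11, 14, 1, 15), 20.0),
]

hourly = summarize_hourly(samples)
for hour, mean_value in hourly.items():
    print(hour.isoformat(), mean_value)
```

Four samples per hour collapse into one row, which is why hourly summarized data is so much cheaper to retain than the raw collector output.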

  17. Nightly Automation Scheduling

  18. Monitoring Environment Results
     • Nightly automation stats:
       • 7 time zones
       • 39 states
       • 256 cities
       • 1,925 domains
       • 1,947 VIS files
       • 38,353 servers
       • 621,615 UDR files
     • 13.5 new servers added per day over the last year (4,947)
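The growth figure can be reproduced, and extended to the 40,000-server goal stated earlier, with a short calculation. The sketch assumes growth stays linear at last year's rate, which is a simplification and not a forecast from the presentation.

```python
import math

servers_added_last_year = 4947   # from this slide
current_servers = 38353          # from this slide
target_servers = 40000           # capacity goal from the Goals slide

# Average daily growth over a 365-day year.
servers_per_day = servers_added_last_year / 365

# Whole days until the target is reached, assuming the same linear rate.
days_to_target = math.ceil((target_servers - current_servers) / servers_per_day)

print(f"servers added per day: {servers_per_day:.2f}")
print(f"days until {target_servers:,} servers: {days_to_target}")
```

The daily rate works out to about 13.55, consistent with the 13.5 quoted on the slide, and at that pace the environment would reach 40,000 servers in roughly four months.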

  19. AT&T Capacity Planning Database

  20. AT&T Application Reporter

  21. Bonus Material BMC 7.5

  22. Performance Assurance Release 7.5 New Features and Functionality
     New Virtualization Support
     • SUN Solaris Logical Domains (LDoms)
     • SUN Chip Multi-Threading (CMT) technology
     • IBM AIX Live Partition Mobility
     • IBM AIX Workload Partitions (WPARs)
     • IBM PowerVM
     • HP Integrity Virtual Machines (IVM)
     • Microsoft 2008 Virtualization Server (Hyper-V)
     Enhanced VMware Virtualization Support
     • Cluster, resource pool, disk and datastore metrics
     • Info on relationships between servers, virtual machines, pools
     • Perceiver support for cluster, resource pool and disk views
     • Improvements to proxy data collector
     • Complete re-design of Visualizer tables and relationships

  23. Performance Assurance Release 7.5 New Features and Functionality (cont)
     Console Operations
     • Improvements to Manager for recovery and reprocessing of data
     • Manager exception reports
     • Officially supported Service Levels
     • New General Manager web application to manage Perform and Perceiver – daily operation and exceptions
     • UDR Transfer Utility
     • Changes to management of Hardware table for performance ratings
     • Changes to the Visualizer database structures
     • Problem resolutions and enhancement implementations

  24. 7.5 Migration Issues
     • Some Visualizer tables have been re-designed to accommodate metrics for virtual servers (current metrics may have moved to new tables).
     • The changes in Visualizer require migrating all data from the old 7.4 schemas to the new 7.5 schemas.
     • If multiple Consoles update the same Visualizer schema, all of those Consoles must be migrated to release 7.5 at the same time.
     • The Visualizer database migrations must be done at the same time the Consoles are migrated to release 7.5.
     • Therefore, in our environment, all Consoles and all Visualizer databases must be migrated to 7.5 simultaneously.
     • Per BMC, very large Visualizer schemas may take longer than a day to migrate to 7.5 (we have 90+ Visualizer schemas).
     • Per BMC, the most significant problems they have seen with the new release involve database migrations.
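The "migrate together" rule is transitive: if Console A shares a schema with B, and B shares one with C, all three must move at once. The sketch below finds such groups by treating shared schemas as links between Consoles. The function name `migration_groups` and the Console-to-schema assignments are hypothetical; the presentation does not list which Console loads which schema.

```python
def migration_groups(console_schemas):
    """Return sets of consoles that must be migrated together.

    console_schemas maps a console name to the set of schemas it updates.
    Consoles linked directly or indirectly by a shared schema form a group.
    """
    remaining = set(console_schemas)
    groups = []
    while remaining:
        start = remaining.pop()
        group, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            # Consoles not yet grouped that share a schema with `current`.
            linked = {
                other for other in remaining
                if console_schemas[other] & console_schemas[current]
            }
            remaining -= linked
            group |= linked
            frontier.extend(linked)
        groups.append(group)
    return groups


# Hypothetical assignment in which the consoles overlap in a chain.
chained = {
    "Console A": {"unix_east_gold", "win_east_gold"},
    "Console B": {"win_east_gold", "unix_central_silver"},
    "Console C": {"unix_central_silver", "vmware_east_gold"},
    "Console D": {"vmware_east_gold"},
}
print(migration_groups(chained))
```

With overlaps like these, every Console ends up in one group, which matches the slide's conclusion that all Consoles and all Visualizer databases have to be migrated simultaneously.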
