Systems Management @ SEB - PowerPoint PPT Presentation

gunnar svanberg n.
Presentation Transcript

  1. Gunnar Svanberg Systems Management @ SEB

  2. Agenda • Overview ITIL etc. • Monitoring (system, network) • Transaction Monitoring • Job Scheduling • Security log archiving • Way of working (processes)

  3. Overview cont'd. Four main disciplines: • Monitoring: Network, System, Postmsg type etc., Omnibus • Transaction Monitoring: passive, non-intrusive solution • Job Scheduling: TWS E2E based on TWSz (formerly OPC) • Security log archiving: from infrastructure and applications (under construction)

  4. Agenda • Overview ITIL etc. • Monitoring (system, network) • Transaction Monitoring • Job Scheduling • Security log archiving • Way of working (processes)

  5. Network Monitoring

  6. Network Monitoring cont'd. • SNMP manager: Spectrum from CA. • Strategic SNMP manager; handles all SNMP-based traffic from all sources. • ~6000 devices. • Tight integration with e-Health, also from CA (network capacity management). • Includes service management, i.e. port availability. • Forwards alarms to the Omnibus EIF probe. • Built-in hot-standby clustering. • Fully automated via the CMDB (create/remove devices).
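
The CMDB-driven create/remove automation mentioned above can be pictured as a reconciliation between two device lists. The following is only a minimal sketch under stated assumptions: the function name, the `SyncPlan` type and the device names are hypothetical, and it does not use any Spectrum or CMDB API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Devices to add to and remove from the SNMP manager (hypothetical type)."""
    to_create: frozenset
    to_remove: frozenset


def plan_device_sync(cmdb_devices, monitored_devices):
    """Compare the CMDB's device list with what is currently monitored.

    Both arguments are iterables of device names (made-up identifiers).
    """
    cmdb = frozenset(cmdb_devices)
    monitored = frozenset(monitored_devices)
    # In the CMDB but not monitored: create. Monitored but gone from the CMDB: remove.
    return SyncPlan(to_create=cmdb - monitored, to_remove=monitored - cmdb)


if __name__ == "__main__":
    plan = plan_device_sync(["sw-01", "sw-02", "rt-01"], ["sw-02", "rt-99"])
    print(sorted(plan.to_create))  # devices present in the CMDB only
    print(sorted(plan.to_remove))  # devices no longer in the CMDB
```

Treating the CMDB as the single source of truth keeps the sync idempotent: running it twice in a row yields an empty plan the second time.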

  7. ITM6 • ~3800 agents. • Unix, Windows 32 & 64, SQL, IIS, MS Cluster, File & Dir. (Blue Medora), LO (LFA logfile). • Three instances of ITM6: Prod1, Prod2 and Prod3. • Prod1: main instance, ~3500 agents. • Prod2: for SQL servers running either SQL 2000 or on W2K. This is a consequence of legacy OS levels. SQL 2K8 support was added, but IBM withdrew SQL W2K support at the same time. Single W2K3 TEMS/TEPS. • Prod3: new instance for new infrastructure. • All instances report to one Omnibus in Prod1. • Infrastructure currently at 6.2.2 FP03, OS agents at >6.2.1. • Warehouse on SQL Server, same box as S&P & WHP. • All infrastructure servers are either VMware or Solaris zone based, depending on OS type.

  8. ITM6 Infrastructure

  9. Logfile monitoring (parsing, really) • IBM has five offerings within ITM6 & TEC/Omnibus alone! • Legacy TEC adapter • Omnibus probe • Legacy UL agent • Agent Builder • New LFA agent shipped with Omnibus; consumes *.fmt files • Which one should I use? • Which one will be IBM's strategic choice? • Guess what: IBM did not have an answer until recently, when a sixth solution appeared. • Will it appear on OPAL soon?
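
Whichever product is chosen, logfile monitoring comes down to matching lines against patterns and turning the matches into events. The sketch below illustrates that idea in plain Python; it is not the *.fmt syntax of the LFA agent, and the log line layout, field names and severity filter are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical log layout: "<date> <time> <SEVERITY> <component>: <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<severity>[A-Z]+) "
    r"(?P<component>[\w.-]+): "
    r"(?P<message>.*)$"
)

# Assumed policy: only these severities become events.
FORWARDED_SEVERITIES = {"ERROR", "CRITICAL"}


def parse_line(line: str) -> Optional[dict]:
    """Return an event dict for a matching line, or None if it should be dropped."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # line does not fit the expected layout
    event = match.groupdict()
    if event["severity"] not in FORWARDED_SEVERITIES:
        return None  # parsed fine, but not worth an alarm
    return event


if __name__ == "__main__":
    print(parse_line("2010-02-01 12:00:00 ERROR payments: queue full"))
    print(parse_line("2010-02-01 12:00:01 INFO payments: ok"))
```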

  10. ITM6 Windows specific topics

  11. Windows Agent • Installs as 2 new services, Primary and Watchdog. • kntcma.exe • kcawd.exe • Path C:\IBM\ITM. • 200 MB disk needed.

  12. SQL Agent • Installs as 2 new services, Agent and Collector. • koqagent.exe • koqcoll.exe • Both services run under an AD account. • Path C:\IBM\ITM. • 100 MB disk needed.

  13. MS Cluster Agent • Installs as 1 new service, Agent. Always manual startup • kq5Agent.exe • Path C:\IBM\ITM. • 80 MB disk needed.

  14. Cluster support • All agents (NT OS and Cluster) are always installed on every physical node. • OS Agent is never configured to be run by Cluster Administrator. Always automatic startup. • SQL Agent: always manual startup of services. Cluster Administrator handles start/stop. • Cluster Agent: always manual startup of service. Cluster Administrator handles start/stop. • The Cluster Administrator app is used to configure the SQL and Cluster agents. Generic service resources are defined. • Full failover support with sustained monitoring. SQL Agent follows the Group_hostname_SQL group and Cluster Agent follows the Group_hostname_Cluster group. • No more false cluster alarms!

  15. Cluster support cont'd. • Cluster Agent shows up as its real hostname, XCC*; same for SQL Agent, XQC*. So on a cluster where all three agents run, alarms will come from the correct hostname, i.e. cluster manager alarms come from XCC* and SQL alarms from XQC*. Huge advantage for the NOC over the existing solution (ITM5), where all alarms come from the node running the resources at the moment. • Note! Never start/stop an ITM6 agent that is defined as manual if the server is part of a cluster (goes for SQL and Cluster agents). There is a reason it is defined as manual: Cluster Administrator loses control of the resource and it will start to toggle between the nodes in the cluster. Bring it online/offline in Cluster Administrator instead.

  16. Cluster Agent screenshots

  17. SQL Agent screenshots

  18. OS Agent screenshots

  19. Omnibus

  20. Omnibus • Migration project ran for 6 months. Completed Feb. 2010. • We chose to put Omnibus on top, replacing the TEC GUI for operators. • All rules migrated to Omnibus; TEC only forwards events to Omnibus. • Two EIF probes: one for legacy EIF sources and one for new EIF (?) Omnibus integrations such as ITM6. • TEC is still running for legacy post sources. They are being replaced during 2010/2011. • Goal to sundown TEC during 2011. • New post binary posteifmsg replaces legacy versions and posts to the Omnibus EIF probe.

  21. Omnibus cont'd • Omnibus TDWH MS SQL alarm warehousing out of the box. • Reports generated with MS Reporting Services tools. • No out-of-the-box indexing on the TDWH history solution. • Reports can take a long time to run. • Alarms kept for 13 months.
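
The link between missing indexes and slow reports, and the 13-month retention, can be shown on a toy history table. This sketch uses Python's built-in SQLite as a stand-in for the MS SQL warehouse; the table and column names are hypothetical and not the TDWH schema, and 396 days is only an assumed approximation of 13 months.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION_DAYS = 396  # roughly 13 months; the exact cutoff rule is an assumption


def build_history(conn):
    """Create a toy alarm history table plus the index a report would need."""
    conn.execute(
        "CREATE TABLE alarm_history ("
        " id INTEGER PRIMARY KEY, node TEXT, severity INTEGER, first_seen TEXT)"
    )
    # Without this index, a per-node report has to scan the whole table.
    conn.execute(
        "CREATE INDEX idx_history_node_time ON alarm_history (node, first_seen)"
    )


def purge_expired(conn, now):
    """Delete alarms older than the retention period; return rows removed."""
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat()
    cur = conn.execute("DELETE FROM alarm_history WHERE first_seen < ?", (cutoff,))
    return cur.rowcount


def uses_index(conn, node):
    """Check whether SQLite plans to use the index for a per-node query."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM alarm_history WHERE node = ?", (node,)
    ).fetchall()
    return any("idx_history_node_time" in row[-1] for row in plan)
```

Timestamps are stored as ISO 8601 strings so that plain string comparison matches chronological order, which keeps the purge query simple.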

  22. Agenda • Overview ITIL etc. • Monitoring (system, network) • Transaction Monitoring • Job Scheduling • Security log archiving • Way of working (processes)

  23. Transaction Monitoring

  24. Transaction Monitoring cont'd. • Solution based on the Compuware Vantage family. • Taps network traffic passively, with no intrusion and no agents on servers/systems. • A tap basically splits off 30% of the light in the fibres without degrading the quality of the signal. • Measures real user data; no need for synthetic-transaction-type monitoring. • Agents are still needed if a deep dive is required or if data stays inside the same host, e.g. in an SOA architecture. But anything traversing the network is captured by Vantage. • Around 20 tap points capture the whole of SEB's network backbone, legacy and new. Taps must be placed strategically in the core network to capture all packets.
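
Measuring real user response times from tapped traffic amounts to pairing each request seen on the wire with the response that follows it. The sketch below shows that pairing on simplified records; the `Packet` type, the `flow_id` key and the one-outstanding-request-per-flow rule are assumptions for illustration and say nothing about how Vantage works internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    timestamp: float   # seconds, as observed at the tap
    flow_id: str       # hypothetical key, e.g. client/server addresses plus ports
    is_request: bool   # True for a request, False for a response


def response_times(packets):
    """Pair each request with the next response on the same flow.

    Returns a list of (flow_id, seconds) tuples in order of response arrival.
    Responses without a preceding request are ignored.
    """
    pending = {}  # flow_id -> timestamp of the outstanding request
    results = []
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        if pkt.is_request:
            # Keep the first outstanding request if several arrive back to back.
            pending.setdefault(pkt.flow_id, pkt.timestamp)
        elif pkt.flow_id in pending:
            results.append((pkt.flow_id, pkt.timestamp - pending.pop(pkt.flow_id)))
    return results
```

Because the timestamps come from the tap rather than from the client or server, nothing has to be installed on either end, which is the point of the passive approach described above.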

  25. Implement business oriented monitoring Function insight / Tier insight Implement end to end monitoring including response time tracking

  26. ESB services autodiscovered: type, quality, volumes and number of users (figure: ESB Services for Transfer)

  27. Troubleshooting report example

  28. Manager Dashboard example

  29. Agenda • Overview ITIL etc. • Monitoring (system, network) • Transaction Monitoring • Job Scheduling • Security log archiving • Way of working (processes)

  30. Job Scheduling

  31. Job Scheduling cont’d

  32. Job Scheduling cont'd • Solution from IBM: Tivoli Workload Scheduler. • Configured as TWSz E2E, meaning all planning and scheduling is done on z/OS. • The same people handle z/OS and distributed jobs. • All agents, FTAs (Fault Tolerant Agents), receive a full copy of the current plan, which is built on z/OS and updated approx. every 12 hours. The plan stretches for 24 hours, longer over weekends. • Domain Manager in place for legacy reasons and alarm forwarding. • DM configured to log to file, picked up by the TEC adapter. • Cross-platform scheduling capability. Currently implemented at SEB for z/OS, Solaris and Windows.
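
The arithmetic behind the plan figures above: a 24-hour plan refreshed about every 12 hours leaves an agent with roughly 12 hours of plan still ahead of it at each refresh. A minimal sketch of that window logic follows; the function names are hypothetical, weekend extensions are ignored, and this is not how TWS represents its plan.

```python
from datetime import datetime, timedelta

PLAN_LENGTH = timedelta(hours=24)       # from the slide: the plan stretches for 24 hours
REFRESH_INTERVAL = timedelta(hours=12)  # from the slide: updated approx. every 12 hours


def plan_window(built_at):
    """Return the (start, end) interval covered by a plan built at built_at."""
    return built_at, built_at + PLAN_LENGTH


def can_run_locally(job_time, built_at):
    """True if a job scheduled at job_time is inside the agent's local plan copy."""
    start, end = plan_window(built_at)
    return start <= job_time < end


def remaining_cover(now, built_at):
    """How much of the local plan copy is still ahead of the agent at time now."""
    _, end = plan_window(built_at)
    return max(end - now, timedelta(0))


if __name__ == "__main__":
    built = datetime(2010, 5, 3, 6, 0)
    print(remaining_cover(built + REFRESH_INTERVAL, built))  # cover left at refresh time
```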

  33. Agenda • Overview ITIL etc. • Monitoring (system, network) • Transaction Monitoring • Job Scheduling • Security log archiving • Way of working (processes)

  34. Security log archiving cont'd • Solution based on EnVision from RSA. • Originates from a PCI-DSS requirement. • Cross-platform service; possible to track logins across platforms and systems. • Currently in project form. • Full functionality planned for next year.

  35. Security log archiving

  36. Agenda • Overview ITIL etc. • Monitoring (system, network) • Transaction Monitoring • Job Scheduling • Security log archiving • Way of working (processes)

  37. Way of Working cont´d • Models for Objectives, Processes, Concepts, Information and Stakeholders. • Value added process chains. • Ambition to establish Service Oriented Delivery. • Two Systems Management role sets, Global Systems Management and Local Systems Management. • Other roles: Operator, Maintainer.

  38. Way of Working cont'd: Stakeholder Model

  39. Way of Working cont'd: small, simplified piece of the conceptual model (diagram). Entities: CMDB, Subscriber, Monitor, Maintainer, Alarm Collection, Alarm, Focal, Instruction. Relationship labels: "Only based upon production items from", "Is to be deployed to", "Does always have one", "Can have zero to many" (twice), "Do have one or shares an existing".

  40. Way of Working cont'd • In-house developed order and instruction portal with tight integration to the CMDB. • Portal called AIWA (Alarm Information Web Application). • Each role has its own responsibilities and information. If maintainers do not supply their information, for example what to monitor, Systems Management will not and cannot deliver any monitoring. • AIWA exposes an API (Web Services) enabling automatic batch orders (both creating and executing). • AIWA handles maintenance for Omnibus, with granularity down to server name or Alarm Collection (MSL).
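
Maintenance handling at the granularity of a server name or an Alarm Collection can be pictured as a filter that checks each incoming alarm against the active maintenance windows. This is only an illustrative sketch: the `MaintenanceWindow` type, its fields and the matching rule are assumptions, not AIWA's data model or its Web Services API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class MaintenanceWindow:
    start: datetime
    end: datetime
    server: Optional[str] = None            # match on server name, or
    alarm_collection: Optional[str] = None  # match on Alarm Collection

    def covers(self, server, alarm_collection, when):
        """True if an alarm from this server/collection at `when` is in maintenance."""
        if not (self.start <= when < self.end):
            return False
        if self.server is not None and self.server == server:
            return True
        return (
            self.alarm_collection is not None
            and self.alarm_collection == alarm_collection
        )


def in_maintenance(windows, server, alarm_collection, when):
    """Check an alarm against all windows; any single match suppresses it."""
    return any(w.covers(server, alarm_collection, when) for w in windows)
```

A batch order, as mentioned above, would then be a list of such windows created in one call rather than one by one.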

  41. Way of Working cont'd: Process model (diagram): Omnibus; ITM, Spectrum, Logfile, …; AIWA