
Problem Management

AAA NCNU Information Technology, IT Operations. Problem Management. Jim Heronime, Manager, ITSM Program; Tanya Friehauf-Dungca, Manager, Problem Management; 2/17/11. Agenda: PM Overview, History, Vision & Mission, Operational Level Agreement (OLA), Action Items


Presentation Transcript


  1. AAA NCNU Information Technology, IT Operations: Problem Management
     • Jim Heronime, Manager, ITSM Program
     • Tanya Friehauf-Dungca, Manager, Problem Management
     • 2/17/11

  2. Agenda
     • PM Overview
     • History
     • Vision & Mission
     • Operational Level Agreement (OLA)
     • Action Items
     • Trending (Proactive Problem Management)
     • Facilitated Meetings (MIR & ToE)
     • KPIs and Metrics
     • Future Initiatives
     • Questions?
     • Problem Management Team Members

  3. Problem Management Overview
     • Main goal of Problem Management:
         • Detection of the underlying causes of incidents, and their subsequent resolution and prevention
     • Problem Management ensures:
         • The identification and classification of problems, root cause analysis, and resolution of problems
     • The Problem Management process also includes:
         • Formulating recommendations for improvement, maintaining problem records, and reviewing the status of corrective actions

  4. History of PM at AAA
     • Began our formal Problem Management practice in 2008:
         • Tracked major incidents
         • Identified root cause for major incidents
         • Stored information in a rudimentary MS Access database
     • Began formal implementation of ITSM in June 2009:
         • Average root cause found rate: 55.4%
         • Mean time to close problems: 6 days
     • Implemented the current iteration of Problem Management in October 2009. By January 2010:
         • Average root cause found rate: 83%
         • Mean time to close problems: 3 days
     • We continue to mature our process

  5. Vision and Mission
     • VISION:
         • To permanently eliminate problems in our production environment and prevent new problems from occurring
     • MISSION:
         • To aggressively identify the root cause of problems and drive permanent solutions to stabilize our IT infrastructure
     • We do this by:
         • PROCESSES: Ensuring IT support teams follow PM processes and procedures
         • ACTION ITEMS: Managing assigned action items and their timeframes with support teams to drive permanent solutions
         • ROOT CAUSE: Driving root cause identification within OLA timeframes

  6. OLAs for PM
     • Be aggressive: 3 business days to identify root cause
     • A daily report enables us to track progress
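The 3-business-day OLA comes down to simple date arithmetic. Below is a minimal Python sketch, assuming business days are Monday through Friday, holidays are ignored, and the day the problem is opened does not count toward the three days. The function names are hypothetical and not taken from Maximo or any other tool named in this presentation.

```python
from datetime import date, timedelta

# OLA target from the slide: 3 business days to identify root cause.
OLA_BUSINESS_DAYS = 3


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Assumption: business days are Monday-Friday; holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def ola_deadline(opened: date) -> date:
    """Latest date by which root cause must be identified (opening day not counted)."""
    return add_business_days(opened, OLA_BUSINESS_DAYS)


def met_ola(opened: date, root_cause_found: date) -> bool:
    """True if root cause was identified on or before the OLA deadline."""
    return root_cause_found <= ola_deadline(opened)


if __name__ == "__main__":
    opened = date(2011, 2, 17)  # a Thursday
    print(ola_deadline(opened))  # 2011-02-22 (Fri, then Mon and Tue)
    print(met_ola(opened, date(2011, 2, 23)))  # False
```

A problem opened on a Thursday therefore has until the following Tuesday; a holiday calendar would be the first refinement if the OLA excludes company holidays.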

  7. Action Items
     • Objective:
         • Action items are identified and assigned to drive permanent solutions
     • Types of action items:
         • Root cause identification for every problem created from an incident
         • Areas of improvement:
             • Documentation
             • Process improvement & training
             • Vendor management
             • Hardware replacement
     • How are action items identified?
         • Incident management activities
         • Problem management activities (Root Cause Analysis)
         • Meetings: Daily IT Operations Meeting, Major Incident Review (MIR), or Team of Experts (ToE)
     • How are they tracked?
         • Maximo, an integrated system covering Change, Incident, and Asset

  8. Trend Analysis (Proactive Problem Management)
     • Objective:
         • Analyze related incidents for common root causes
     • Collaboration with the Operations Bridge:
         • Weekly work sessions to identify potential areas of concern
         • The Problem Management team reviews related incidents to look for common symptoms, causes, or conditions
     • When trend analysis identifies commonalities:
         • A Global Problem record is created and assigned to the Service Owner, with action items assigned as appropriate
     • Service Owner analysis:
         • The Service Owner prioritizes their efforts
         • Determines whether to identify root cause
         • Prioritizes and approves with the business for funding and scheduling
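The trend-analysis step, reviewing related incidents for common symptoms, causes, or conditions, can be approximated by grouping incidents on shared attributes. This is a minimal Python sketch under stated assumptions: the `Incident` fields, the grouping key (service plus symptom), and the threshold of three incidents are all hypothetical, chosen only to illustrate how a candidate Global Problem might surface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    incident_id: str
    service: str  # hypothetical field: affected service
    symptom: str  # hypothetical field: normalized symptom category


# Hypothetical: minimum number of related incidents before raising a Global Problem.
TREND_THRESHOLD = 3


def find_trends(
    incidents: list[Incident], threshold: int = TREND_THRESHOLD
) -> dict[tuple[str, str], list[str]]:
    """Group incidents by (service, symptom); keep groups at or above `threshold`.

    Each returned group is a candidate Global Problem record for the Service Owner.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for inc in incidents:
        groups[(inc.service, inc.symptom)].append(inc.incident_id)
    return {key: ids for key, ids in groups.items() if len(ids) >= threshold}


if __name__ == "__main__":
    week = [
        Incident("INC1", "Email", "mailbox unavailable"),
        Incident("INC2", "Email", "mailbox unavailable"),
        Incident("INC3", "VPN", "login failure"),
        Incident("INC4", "Email", "mailbox unavailable"),
    ]
    for (service, symptom), ids in find_trends(week).items():
        # Global Problem candidate: Email / mailbox unavailable: ['INC1', 'INC2', 'INC4']
        print(f"Global Problem candidate: {service} / {symptom}: {ids}")
```

A count like this only flags candidates; the weekly work sessions still decide whether a group shares a real common cause.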

  9. Major Incident Review (MIR)
     • What is it?
         • An evaluation of the incident process after a major incident
     • What is its purpose?
         • Validate details of the incident record
         • Review incident handling and identify opportunities for improvement
         • Identify lessons learned and share them across the enterprise
         • Identify action items
     • When is one required?
         • Mandatory for all Severity 1 incidents
         • Lower severities by request or as needed
     • Why does Problem Management facilitate a Major Incident Review?
         • Unbiased view of events: Problem Management has no involvement in the incident call

  10. MIR Agenda

  11. MIR Template

  12. Team of Experts (ToE)
     • What is it?
         • A special team of technical subject matter experts (SMEs) assembled to analyze and resolve critical problems at an accelerated pace to minimize or eliminate exposure
     • How long has this process been in place?
         • This is one of our newest additions, in place since December 2010
     • Why are ToEs initiated?
         • Teams are not engaging each other collaboratively
         • Root cause must be identified immediately (back-to-back incidents)
         • Leadership requests information and status on critical or chronic problems

  13. ToE (cont.)
     • ToE activities:
         • Root cause analysis
         • Brainstorm solutions and permanent fixes
         • Assign action items and due dates
     • Where is the template?
         • Currently under construction

  14. KPIs and Metrics
     • KPIs:
         • Root cause identified within OLA
         • MIRs conducted for Sev1 incidents
     • Operational metrics:
         • Total problems by severity
         • Problems by causing party
         • Outages by domain (Applications, Network, Security, Servers, Telecom, or Other)
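The two KPIs are ratios and the operational metrics are counts. The Python sketch below shows one way to compute them, assuming hypothetical `ProblemRecord` and `IncidentRecord` shapes (not Maximo's data model) and counting each problem once toward its domain as a stand-in for outages by domain. The sample records are invented for illustration and do not reflect AAA's figures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    problem_id: str
    severity: int        # 1 = most severe
    causing_party: str   # hypothetical values, e.g. "Internal", "Vendor"
    domain: str          # Applications, Network, Security, Servers, Telecom, or Other
    rc_within_ola: bool  # root cause identified within the 3-business-day OLA


@dataclass
class IncidentRecord:
    incident_id: str
    severity: int
    mir_conducted: bool  # Major Incident Review held for this incident


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there is nothing to measure."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def kpis(problems: list[ProblemRecord], incidents: list[IncidentRecord]) -> dict[str, float]:
    """The two KPIs from the slide, expressed as percentages."""
    sev1 = [i for i in incidents if i.severity == 1]
    return {
        "rc_within_ola_pct": pct(sum(p.rc_within_ola for p in problems), len(problems)),
        "sev1_mir_pct": pct(sum(i.mir_conducted for i in sev1), len(sev1)),
    }


def operational_metrics(problems: list[ProblemRecord]) -> dict[str, Counter]:
    """Simple counts; each problem is counted once toward its domain (a simplification)."""
    return {
        "by_severity": Counter(p.severity for p in problems),
        "by_causing_party": Counter(p.causing_party for p in problems),
        "by_domain": Counter(p.domain for p in problems),
    }


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    problems = [
        ProblemRecord("PRB1", 1, "Vendor", "Network", True),
        ProblemRecord("PRB2", 2, "Internal", "Servers", True),
        ProblemRecord("PRB3", 3, "Internal", "Applications", False),
        ProblemRecord("PRB4", 2, "Vendor", "Telecom", True),
    ]
    incidents = [
        IncidentRecord("INC1", 1, True),
        IncidentRecord("INC2", 1, False),
        IncidentRecord("INC3", 2, False),
    ]
    print(kpis(problems, incidents))  # {'rc_within_ola_pct': 75.0, 'sev1_mir_pct': 50.0}
    print(operational_metrics(problems)["by_domain"])
```

Returning 0.0 when there are no Sev1 incidents keeps a quiet month from raising a division error, though a report may prefer to show "n/a" in that case.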

  15. KPIs
     • Baseline determined from internal historical data: 82%
     • No industry standards exist

  16. KPI Details
     • 2010 average for root cause identified within OLA: 85.7%

  17. Examples of Metrics
     • (Chart not reproduced; recoverable labels: Change Freeze, AT&T, AAA NCNU)

  18. Future Initiatives
     • Workarounds and defects: Known Error Database
     • Action item validation: quality check on completed actions
     • ToE template development

  19. Questions?
     • Problem Management Team Members:
         • Mark Hernandez, IT Service Transition Analyst V
         • Gessica Briggs-Sullivan, IT Service Transition Analyst III
         • Andrew Egan, Intern
