
Disaster Recovery at the University of Alberta

Disaster Recovery at the University of Alberta. Rob Lake (Presenter and Co-producer) Information Technology Planning and Forecasting Officer Office of the Vice Provost (Information Technology) University of Alberta rob.lake@ualberta.ca www.vpit.ualberta.ca.


Presentation Transcript


  1. Disaster Recovery at the University of Alberta Rob Lake (Presenter and Co-producer) Information Technology Planning and Forecasting Officer Office of the Vice Provost (Information Technology) University of Alberta rob.lake@ualberta.ca www.vpit.ualberta.ca

  2. Co-produced with (and thanks to): • Marika Bourque, Associate CITO and Executive Director of AICT, University of Alberta • Kevin Moodie, Director of AICT, University of Alberta • Brian Acheson, Director of AICT, University of Alberta

  3. 2007 EDUCAUSE Top Ten IT Issues • Funding IT • Security • Administrative/ERP/Information Systems • Identity/Access Management • Disaster Recovery/Business Continuity • Faculty Development, Support and Training • Infrastructure • Strategic Planning • Course/Learning Management Systems • Governance, Organization and Leadership for IT Source: EDUCAUSE Review, May/June 2007

  4. 2006 ECAR Survey Results “If central IT systems and services were not operational at my institution, business units could carry out essential operations.” • Strongly Agree: 2.7% • Agree: 18.6% • Neutral: 10.0% • Disagree: 35.0% • Strongly Disagree: 31.6% • Don't Know: 2.1% Source: Ron Yanosky, ECAR Symposium, 30 June 2006

  5. 2006 ECAR Survey Results “My institution is prepared to restore centrally controlled systems in the event of a disruption.” • Strongly Agree: 6.5% • Agree: 44.6% • Neutral: 22.3% • Disagree: 20.7% • Strongly Disagree: 5.0% • Don't Know: 0.9% Source: Ron Yanosky, ECAR Symposium, 30 June 2006
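
The two ECAR results are easier to compare once the agree and disagree responses are combined. The following is a minimal sketch in Python that uses only the percentages reported on slides 4 and 5; the variable names and the helper `net_shares` are illustrative and do not come from the presentation.

```python
# Response shares (percent) as reported on slides 4 and 5.
can_operate_without_central_it = {
    "Strongly Agree": 2.7, "Agree": 18.6, "Neutral": 10.0,
    "Disagree": 35.0, "Strongly Disagree": 31.6, "Don't Know": 2.1,
}
prepared_to_restore = {
    "Strongly Agree": 6.5, "Agree": 44.6, "Neutral": 22.3,
    "Disagree": 20.7, "Strongly Disagree": 5.0, "Don't Know": 0.9,
}


def net_shares(responses):
    """Return (total agree, total disagree) in percent."""
    agree = responses["Strongly Agree"] + responses["Agree"]
    disagree = responses["Strongly Disagree"] + responses["Disagree"]
    return agree, disagree


for label, responses in [
    ("Business units can operate without central IT", can_operate_without_central_it),
    ("Institution is prepared to restore central systems", prepared_to_restore),
]:
    agree, disagree = net_shares(responses)
    print(f"{label}: {agree:.1f}% agree, {disagree:.1f}% disagree")
```

Read this way, roughly two thirds of respondents (66.6%) doubted that business units could carry out essential operations without central IT, while only about half (51.1%) agreed that their institution was prepared to restore centrally controlled systems.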

  6. ECAR Most Critical Services • Campus Internet connection • Institutional Web site • Campus network • E-mail • Voice telephony • Course management system • Recovery time objective (RTO): 48 hours or less Source: ECAR Research Bulletin Volume 2007 Issue 4

  7. Disaster Recovery at the UofA • Four components to disaster recovery plan: • Academic information systems • Administrative information systems • Off-site data recovery centre • Emergency notification

  8. Academic Information Systems • AICT Disaster Recovery Overview plan completed in March 2007 • Core academic services: • Voice and data network connectivity • Web • Email / Webmail • Telephony • E-Learning (WebCT) • DNS • Identity Management • AFS

  9. Academic Information Systems • Recovery Time Objective: 48 hours • Current plan requires a hot site to meet RTO for core academic services • Will investigate warm site possibilities and virtualization opportunities in the future

  10. Academic Information Systems • Overall requirements (worst case): • $6.0 million for basic infrastructure • $5.6 million to meet specified RTO of critical services • $9.4 million for restoration of secondary services • Total cost: $21.0 million
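
As a quick check of the worst-case figures above, the short Python sketch below adds the three components and shows each one's share of the total. The dictionary labels are shorthand introduced here for illustration; the dollar amounts are the ones on the slide.

```python
from decimal import Decimal

# Worst-case cost components from slide 10, in millions of dollars.
worst_case_costs_millions = {
    "basic infrastructure": Decimal("6.0"),
    "critical services within RTO": Decimal("5.6"),
    "secondary services": Decimal("9.4"),
}

# Decimal avoids binary floating-point rounding when adding currency amounts.
total = sum(worst_case_costs_millions.values(), Decimal("0"))

for item, cost in worst_case_costs_millions.items():
    share = cost / total * 100
    print(f"{item}: ${cost} million ({share:.0f}% of total)")
print(f"total: ${total} million")
```

The components add up to the $21.0 million total stated on the slide, with restoration of secondary services the largest single item.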

  11. Academic Information Systems • Plan considers two scenarios: • Restoration with a secondary hot site • Restoration without a secondary hot site

  12. Academic Information Systems • Restoration with a secondary hot site: • Basic infrastructure already in place • Fully functioning equipment for the core academic services already in place • Installation of secondary services • 48 hour RTO for the core services • 3 month minimal restoration timeframe for the secondary services

  13. Academic Information Systems • Restoration without a secondary hot site: • Requires selection of a hot site • Installation of basic infrastructure • Installation of core services • Installation of secondary services • 3 to 6 month downtime for core services • Up to 9 month downtime for secondary services

  14. Administrative Information Systems • Outsourced to IBM Global Services since 2000 • Relocated to Markham, Ontario in 2005 • Warm site options in Montreal and Edmonton • Deferred until outsourcing contract renewal in 2010

  15. Regional Data Centre • Toma and Bouma Management Consultants and Stantec engaged in April 2006 to develop a Business Plan for a new Disaster Recovery Centre (DRC) • Preliminary Business Case completed in late 2006 • Approved by Vice-Presidents in April 2007

  16. Data Centre Standards • Defined by the Telecommunications Infrastructure Standard for Data Centers (TIA 942) • Classifies data centers into Tiers • Each Tier offers a higher degree of sophistication and reliability

  17. Tier 1 • Basic: 99.671% availability • Annual downtime of 28.8 hours • Susceptible to disruptions from both planned and unplanned activity • Single path for power and cooling distribution, no redundant components (N) • May or may not have a raised floor, UPS or generator • 3 months to implement

  18. Tier 2 • Redundant Components: 99.741% availability • Annual downtime of 22.0 hours • Less susceptible to disruption from both planned and unplanned activity • Single path for power and cooling distribution, includes redundant components (N+1) • Includes raised floor, UPS and generator • 3 to 6 months to implement

  19. Tier 3 • Concurrently Maintainable: 99.982% availability • Annual downtime of 1.6 hours • Enables planned activity without disrupting computer hardware operation, but unplanned events will still cause disruption • Multiple power and cooling distribution paths but with only one path active, includes redundant components (N+1) • Includes raised floor, UPS and generator • 15 to 20 months to implement

  20. Tier 4 • Fault Tolerant: 99.995% availability • Annual downtime of 0.4 hours • Planned activity does not disrupt critical load and data center can sustain at least one worst-case unplanned event with no critical load impact • Multiple active power and cooling distribution paths with redundant components • 15 to 20 months to implement
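
The annual downtime figures on the four Tier slides follow from the availability percentages. A minimal Python sketch of that conversion is shown below, assuming a 365-day year (8,760 hours); the function name `annual_downtime_hours` is introduced here for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# Availability figures as quoted on slides 17 to 20.
tier_availability = {
    "Tier 1": 99.671,
    "Tier 2": 99.741,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}

for tier, availability in tier_availability.items():
    hours = annual_downtime_hours(availability)
    print(f"{tier}: {availability}% available, about {hours:.1f} hours down per year")
```

Under this assumption the formula gives about 28.8, 22.7, 1.6 and 0.4 hours for Tiers 1 to 4. Three of these match the slides; the Tier 2 slide quotes 22.0 hours, which is slightly lower than what 99.741% availability yields here.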

  21. Regional Data Centre • Data Centre Requirements: • Tier 3 • 18,000 sq. ft. • 6000 sq. ft. for servers and racks • 3000 sq. ft. for future growth • 9000 sq. ft. for support • Minimum 5 km from primary computing centre

  22. Regional Data Centre • Options: • Exchange computing centre space with other institutions • Lease space from service providers • Build new DRC alone • Build new DRC with public and / or private partners

  23. Option 1: Exchange computing centre space • Minimal exchange with U of Calgary • No space in either computing centre • Reliance on external staff • Would require new capital investment

  24. Option 2: Lease space from service provider • Four vendors surveyed • Lack of capacity in Alberta at this time, but that is changing • Vendors would consider building a facility in the Edmonton area if they could find an “anchor” tenant • Costs unclear, but would be about $3.5 million per year

  25. Option 3: Build new DRC alone • Requires provincial funding assistance • Unlikely for University of Alberta only

  26. Option 4: Build new DRC with public and / or private partners (P3 arrangement) • Northern Alberta post-secondary institutions • Government of Alberta • City of Edmonton • Capital Health • TELUS / Epcor / etc.

  27. Option 4 • 16 Alberta post-secondary institutions surveyed: • 12 responded • 5 have no plan • 6 have a plan in progress • 1 plan completed • 8 interested in a regional solution • 4 “somewhat interested”

  28. Regional Data Centre • 30,000 sq. ft. facility required • Capital costs range from $12 million to $36 million (average: $22 million) • Operating costs range from $1 million to $4 million per year (average: $3.5 million) • Better chance to be funded by provincial government • Governance model required • Rural or urban, but travel time important

  29. Regional Data Centre • Working group established in August 2008 between Government of Alberta, City of Edmonton, Capital Health and the University of Alberta • Four meetings held • Government of Alberta leading the initiative • Consultants to finish long term strategy by end of February • 50,000 sq. ft. Tier 3 facility for 20 year period

  30. Regional Data Centre • Four possible locations considered • Location needs to satisfy Provincial Auditors • May still involve a P3 model • Jubilee Auditorium model

  31. Risk Mitigation • 3000 square foot server room to open in Enterprise Square in March 2008 • Lights out facility • Limited hot site capability – storage of offsite backup tapes • Intended for building tenants – one room per building • Green computing • Virtualization • IT Principles of Operation

  32. Emergency Communication System • To be implemented by the start of the 2008/09 academic year • Emergency Communication Work Group established in November 2007 • Work in progress on an Emergency Communication Plan • RFP for an alert system to be released by the end of February 2008

  33. Getting the Message Out • Home page announcement • Email • Telephony (cell and VoIP-enabled phones) • Facebook • Campus and local radio / media • Sirens • Flashing lights

  34. Notification Software Criteria • Flexible • Easy-to-use • Continuously available (24x7x365) • Accommodate two-way communications • Accessible from multiple locations • Handle high volumes of calls or messages within a reasonable timeframe • Support an educational institution environment

  35. Getting Started • This affects everyone: buy-in is needed from all constituents on campus • Roll-out will include a sign-up campaign and a public awareness campaign • Many roll-out strategies will be employed • Several emergency exercises have recently been held

  36. Summary • Still an outstanding IT liability • No inexpensive solution available • Lack of capacity among service providers • Partnership for a new regional facility the best option at this time • Some mitigation with the Enterprise Square server room • Currently exploring many partnership options
