Disaster Recovery at the University of Alberta
Rob Lake (Presenter and Co-producer)
Information Technology Planning and Forecasting Officer
Office of the Vice Provost (Information Technology)
University of Alberta
rob.lake@ualberta.ca
www.vpit.ualberta.ca
Co-produced with (and thanks to):
• Marika Bourque, Associate CITO and Executive Director of AICT, University of Alberta
• Kevin Moodie, Director of AICT, University of Alberta
• Brian Acheson, Director of AICT, University of Alberta
2007 EDUCAUSE Top Ten IT Issues
• Funding IT
• Security
• Administrative/ERP/Information Systems
• Identity/Access Management
• Disaster Recovery/Business Continuity
• Faculty Development, Support and Training
• Infrastructure
• Strategic Planning
• Course/Learning Management Systems
• Governance, Organization and Leadership for IT
Source: EDUCAUSE Review, May/June 2007
2006 ECAR Survey Results
“If central IT systems and services were not operational at my institution, business units could carry out essential operations.”
• Strongly Agree: 2.7%
• Agree: 18.6%
• Neutral: 10.0%
• Disagree: 35.0%
• Strongly Disagree: 31.6%
• Don't Know: 2.1%
Source: Ron Yanosky, ECAR Symposium, 30 June 2006
2006 ECAR Survey Results
“My institution is prepared to restore centrally controlled systems in the event of a disruption.”
• Strongly Agree: 6.5%
• Agree: 44.6%
• Neutral: 22.3%
• Disagree: 20.7%
• Strongly Disagree: 5.0%
• Don't Know: 0.9%
Source: Ron Yanosky, ECAR Symposium, 30 June 2006
ECAR Most Critical Services
• Campus Internet connection
• Institutional Web site
• Campus network
• E-mail
• Voice telephony
• Course management system
• Recovery time objective (RTO): 48 hours or less
Source: ECAR Research Bulletin, Volume 2007, Issue 4
Disaster Recovery at the U of A
Four components to the disaster recovery plan:
• Academic information systems
• Administrative information systems
• Off-site data recovery centre
• Emergency notification
Academic Information Systems
• AICT Disaster Recovery Overview plan completed in March 2007
• Core academic services:
  • Voice and data network connectivity
  • Web
  • Email / Webmail
  • Telephony
  • E-Learning (WebCT)
  • DNS
  • Identity Management
  • AFS
Academic Information Systems
• Recovery Time Objective: 48 hours
• Current plan requires a hot site to meet the RTO for core academic services
• Warm site possibilities and virtualization opportunities will be investigated in the future
Academic Information Systems
• Overall requirements (worst case):
  • $6.0 million for basic infrastructure
  • $5.6 million to meet the specified RTO for critical services
  • $9.4 million for restoration of secondary services
• Total cost: $21.0 million (a quick check of this arithmetic follows below)
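The three worst-case components add up to the $21.0 million total quoted above. A minimal sketch of the arithmetic, with the figures taken directly from the plan (variable names are illustrative only):

```python
# Worst-case cost components from the AICT disaster recovery plan ($ millions).
basic_infrastructure = 6.0   # basic infrastructure
core_services_rto = 5.6      # meeting the 48-hour RTO for critical services
secondary_services = 9.4     # restoration of secondary services

total = basic_infrastructure + core_services_rto + secondary_services
print(f"Total worst-case cost: ${total:.1f} million")  # $21.0 million
```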
Academic Information Systems
• The plan considers two scenarios:
  • Restoration with a secondary hot site
  • Restoration without a secondary hot site
Academic Information Systems
• Restoration with a secondary hot site:
  • Basic infrastructure already in place
  • Fully functioning equipment for the core academic services already in place
  • Installation of secondary services
  • 48-hour RTO for the core services
  • 3-month minimal restoration timeframe for the secondary services
Academic Information Systems
• Restoration without a secondary hot site:
  • Requires selection of a hot site
  • Installation of basic infrastructure
  • Installation of core services
  • Installation of secondary services
  • 3- to 6-month downtime for core services
  • Up to 9-month downtime for secondary services
Administrative Information Systems
• Outsourced to IBM Global Services since 2000
• Relocated to Markham, Ontario in 2005
• Warm site options in Montreal and Edmonton
• Decision deferred until outsourcing contract renewal in 2010
Regional Data Centre
• Toma and Bouma Management Consultants and Stantec engaged in April 2006 to develop a Business Plan for a new Disaster Recovery Centre (DRC)
• Preliminary Business Case completed in late 2006
• Approved by the Vice-Presidents in April 2007
Data Centre Standards
• Defined by the Telecommunications Infrastructure Standard for Data Centers (TIA-942)
• Classifies data centres into Tiers
• Each Tier offers a higher degree of sophistication and reliability
Tier 1
• Basic: 99.671% availability
• Annual downtime of 28.8 hours
• Susceptible to disruptions from both planned and unplanned activity
• Single path for power and cooling distribution, no redundant components (N)
• May or may not have a raised floor, UPS or generator
• 3 months to implement
Tier 2
• Redundant Components: 99.741% availability
• Annual downtime of 22.0 hours
• Less susceptible to disruption from both planned and unplanned activity
• Single path for power and cooling distribution, includes redundant components (N+1)
• Includes raised floor, UPS and generator
• 3 to 6 months to implement
Tier 3
• Concurrently Maintainable: 99.982% availability
• Annual downtime of 1.6 hours
• Enables planned activity without disrupting computer hardware operation, but unplanned events will still cause disruption
• Multiple power and cooling distribution paths, but with only one path active; includes redundant components (N+1)
• Includes raised floor, UPS and generator
• 15 to 20 months to implement
Tier 4
• Fault Tolerant: 99.995% availability
• Annual downtime of 0.4 hours
• Planned activity does not disrupt the critical load, and the data centre can sustain at least one worst-case unplanned event with no critical load impact
• Multiple active power and cooling distribution paths with redundant components
• 15 to 20 months to implement
(The downtime figures follow directly from the availability percentages; a short worked conversion is shown below.)
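A minimal sketch of that conversion, assuming an 8,760-hour year; the small difference for Tier 2 reflects the rounding used in the commonly quoted TIA-942 figures:

```python
# Convert an availability percentage into expected annual downtime in hours.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored

def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year the facility is expected to be unavailable."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR

for tier, availability in [(1, 99.671), (2, 99.741), (3, 99.982), (4, 99.995)]:
    print(f"Tier {tier}: {availability}% available "
          f"-> {annual_downtime_hours(availability):.1f} hours of downtime per year")
# Prints roughly 28.8, 22.7, 1.6 and 0.4 hours, in line with the figures quoted above.
```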
Regional Data Centre
• Data Centre Requirements:
  • Tier 3
  • 18,000 sq. ft.:
    • 6,000 sq. ft. for servers and racks
    • 3,000 sq. ft. for future growth
    • 9,000 sq. ft. for support
  • Minimum 5 km from the primary computing centre
Regional Data Centre
• Options:
  1. Exchange computing centre space with other institutions
  2. Lease space from service providers
  3. Build a new DRC alone
  4. Build a new DRC with public and/or private partners
Option 1: Exchange computing centre space
• Minimal exchange with the U of Calgary
• No space in either computing centre
• Reliance on external staff
• Would require new capital investment
Option 2: Lease space from a service provider
• Four vendors surveyed
• Lack of capacity in Alberta at this time, but that is changing
• Vendors would consider building a facility in the Edmonton area if they could find an “anchor” tenant
• Costs unclear, but estimated at roughly $3.5 million per year
Option 3: Build a new DRC alone
• Requires provincial funding assistance
• Funding unlikely for the University of Alberta alone
Option 4: Build a new DRC with public and/or private partners (P3 arrangement)
• Northern Alberta post-secondary institutions
• Government of Alberta
• City of Edmonton
• Capital Health
• TELUS / EPCOR / etc.
Option 4
• 16 Alberta post-secondary institutions surveyed:
  • 12 responded
  • 5 have no plan
  • 6 have a plan in progress
  • 1 has a completed plan
  • 8 interested in a regional solution
  • 4 “somewhat interested”
Regional Data Centre
• 30,000 sq. ft. facility required
• Capital costs range from $12 million to $36 million (average: $22 million)
• Operating costs range from $1 million to $4 million per year (average: $3.5 million)
• Better chance of being funded by the provincial government
• Governance model required
• Rural or urban location possible, but travel time is important
Regional Data Centre
• Working group established in August 2008 between the Government of Alberta, the City of Edmonton, Capital Health and the University of Alberta
• Four meetings held
• Government of Alberta leading the initiative
• Consultants to finish the long-term strategy by the end of February
• 50,000 sq. ft. Tier 3 facility for a 20-year period
Regional Data Centre
• Four possible locations considered
• Location needs to satisfy the Provincial Auditors
• May still involve a P3 model
• Jubilee Auditorium model
Risk Mitigation
• 3,000 square foot server room to open in Enterprise Square in March 2008
• Lights-out facility
• Limited hot site capability – storage of off-site backup tapes
• Intended for building tenants – one room per building
• Green computing
• Virtualization
• IT Principles of Operation
Emergency Communication System
• To be implemented by the start of the 2008/09 academic year
• Emergency Communication Work Group established in November 2007
• Work in progress on an Emergency Communication Plan
• RFP for an alert system to be released by the end of February 2008
Getting the Message Out
• Home page announcement
• Email
• Telephony (cell and VoIP-enabled phones)
• Facebook
• Campus and local radio / media
• Sirens
• Flashing lights
Notification Software Criteria
• Flexible
• Easy to use
• Continuously available (24x7x365)
• Accommodate two-way communications
• Accessible from multiple locations
• Handle high volumes of calls or messages within a reasonable timeframe
• Support an educational institution environment
(A hypothetical sketch of the kind of multi-channel fan-out such a system performs follows below.)
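As an illustration only, a minimal, hypothetical sketch of a multi-channel alert fan-out meeting the volume and multi-location criteria above; the Channel class, channel names and broadcast function are assumptions for the example, not features of any product evaluated in the RFP:

```python
# Hypothetical sketch of a multi-channel emergency alert fan-out.
# Channel names, classes and the gateway stub are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor

class Channel:
    """One delivery channel (e.g. email, SMS, home page banner)."""
    def __init__(self, name: str):
        self.name = name

    def send(self, recipient: str, message: str) -> None:
        # A real system would call the channel's gateway (SMTP, SMS provider, CMS) here.
        print(f"[{self.name}] -> {recipient}: {message}")

def broadcast(channels, recipients, message, max_workers: int = 50) -> None:
    """Send the alert over every channel to every recipient in parallel,
    so high volumes of messages go out within a reasonable timeframe."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for channel in channels:
            for recipient in recipients:
                pool.submit(channel.send, recipient, message)

if __name__ == "__main__":
    channels = [Channel("email"), Channel("sms"), Channel("home page")]
    broadcast(channels, ["student@example.ca"], "Emergency: follow posted instructions.")
```

The design point the criteria imply is the parallel dispatch: a single-threaded loop over tens of thousands of recipients and several channels would not deliver within a reasonable timeframe.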
Getting Started…
• This affects everyone – need buy-in from all constituents on campus
• Roll-out will include a sign-up campaign and a public awareness campaign
• Many roll-out strategies will be employed
• Several emergency exercises have recently been held
Summary
• Disaster recovery remains an outstanding IT liability
• No inexpensive solution available
• Lack of capacity among service providers
• Partnership for a new regional facility is the best option at this time
• Some mitigation with the Enterprise Square server room
• Currently exploring many partnership options