Chapter 9: Business Continuity Planning

Jims

Presentation Transcript


  1. Chapter 9: Business Continuity Planning • Business Continuity and Disaster Recovery: Overview • Business Impact Analysis • Preventative Measures • Recovery Strategies • Insurance • Recovery and Restoration • Implementing Strategies • Testing, Revising, and Maintaining

  2. Business Continuity and Disaster Recovery: Overview (1) Why are Business Continuity Planning (BCP) and Disaster Recovery Planning (DRP) important? • Every year, thousands of businesses are affected by floods, fires, tornadoes, terrorist attacks, and vandalism in one area or another. • Most organizations have tangible resources, intellectual property, employees, computers, communications links, facilities, and facility services. • If any one of these resources is damaged or inaccessible for one reason or another, the company can be crippled. • The companies that survive these traumas are the ones that thought ahead, planned for the worst, estimated the possible damages that could occur, and put the necessary controls in place to protect themselves.

  3. Overview (2): DRP vs. BCP • The goal of disaster recovery: minimize the effects of a disaster and take the necessary steps to ensure that the resources, personnel, and business processes are able to resume operation in a timely manner. • DRP: • deals with the disaster and its ramifications right after the disaster hits • is carried out when everything is still in emergency mode • BCP: • provides methods and procedures for dealing with longer-term outages and disasters • takes a broader approach to the problem. This includes getting critical systems to another environment while repair of the original facilities is taking place, getting the right people to the right places, and performing business in a different mode until regular conditions are back in place.

  4. Overview (3): BCP in Overall Security Program • Every company should have security policies, procedures, standards, and guidelines. They provide the framework of a security program for an organization. • Business continuity should be a part of the security program and business decisions, as opposed to being an entity that stands off in a corner by itself.

  5. Overview (4): Management Role in BCP • First of all, we need to identify critical functions and critical resources in an organization • These should be protected in a BCP • Who should be involved in this task? Why? • The most critical part of establishing and maintaining a current continuity plan is management support. • It is critical that management understands what the real threats are to the company, the consequences of those threats, and the potential loss values for each threat. • Executives may be held responsible and liable under various laws and regulations. They could be sued by stockholders and customers if … • The cost / benefit issues

  6. Overview (5): Who will build BCP? • A business continuity coordinator needs to be identified. • This will be the leader for the BCP team and will oversee the development, implementation, and testing of the continuity and disaster recovery plans. • A BCP committee needs to be put together. • The team must be composed of people who are familiar with the different departments within the company.

  7. Overview (6): Best Practices of BCP • Although there is no specific scientific equation that must be followed to create continuity plans, there are best practices that have proven themselves over time. • The National Institute of Standards and Technology (NIST) is responsible for developing these best practices and documenting them. • Special Publication 800-34, Continuity Planning Guide for Information Technology Systems (http://csrc.nist.gov/publications/nistpubs/800-34/sp800-34.pdf)

  8. a.k.a. Project initiation phase

  9. Index • Business Continuity and Disaster Recovery: Overview • Business Impact Analysis • Preventative Measures • Recovery Strategies • Insurance • Recovery and Restoration • Implementing Strategies • Testing, Revising, and Maintaining

  10. Business Impact Analysis (1) Business impact analysis (BIA) is a functional analysis • The BCP committee collects data through interviews and documentary sources; documents business functions, activities, and transactions; develops a hierarchy of business functions; and finally applies a classification scheme to indicate each individual function’s criticality level. • The BCP committee must identify the threats and map them to the following characteristics: • Maximum tolerable downtime (MTD) • Operational disruption and productivity • Financial considerations • Regulatory responsibilities • Reputation

  11. Business Impact Analysis (2) BIA steps: • Select individuals to interview for data gathering. • Create data-gathering techniques. • Identify the company’s critical business functions. • Identify the resources that these functions depend upon. • Calculate how long these functions can survive without these resources -- maximum tolerable downtime (MTD) • Identify vulnerabilities and threats to these functions. • Calculate risk for each different business function. • Document findings and report them to management.

  12. Business Impact Analysis (3) Maximum tolerable downtime (MTD) • The outage time that can be endured by a company is referred to as the maximum tolerable downtime (MTD). • Some MTD estimates that may be used within an organization: • Nonessential: 30 days • Normal: 7 days • Important: 72 hours • Urgent: 24 hours • Critical: minutes to hours • Each business function and asset should be placed in one of these categories to determine what backup solutions are necessary to ensure the availability of these resources. • E.g., the MTD of a T1 communication line is three hours, with an associated cost of $130,000; the MTD of a server is ten days, with an associated cost of $250.
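
The categories on this slide can be read as a lookup from an estimated MTD to a criticality label. The following is a minimal Python sketch of that lookup. The function name `mtd_category` is hypothetical, and treating each estimate as the lower edge of its category (so anything under 24 hours counts as Critical) is an assumption, since the slide gives only one figure per category.

```python
from bisect import bisect_right

# Category boundaries in hours, based on the slide's estimates.
# ASSUMPTION: each estimate is the lower edge of its category,
# so "minutes to hours" is read as "less than 24 hours".
_BOUNDS_HOURS = [24, 72, 7 * 24, 30 * 24]
_LABELS = ["Critical", "Urgent", "Important", "Normal", "Nonessential"]


def mtd_category(mtd_hours: float) -> str:
    """Return the criticality label for a maximum tolerable downtime."""
    if mtd_hours < 0:
        raise ValueError("MTD cannot be negative")
    return _LABELS[bisect_right(_BOUNDS_HOURS, mtd_hours)]


if __name__ == "__main__":
    # A T1 line with a three-hour MTD and a server with a ten-day MTD.
    for name, hours in [("T1 line", 3), ("server", 10 * 24)]:
        print(f"{name}: {mtd_category(hours)}")
```

Under these assumed boundaries the T1 line falls in the Critical category, which is the kind of result that would justify a backup line.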

  13. Business Impact Analysis (4) Risk Analysis • Threats can be manmade, natural, or technical • Manmade threats: an arsonist, a terrorist, or a simple mistake that can have serious outcomes. • Natural threats: tornadoes, floods, hurricanes, or earthquakes. • Technical threats: data corruption, loss of power, device failure, or loss of a data communications line. • Steps of risk analysis: • To identify all possible threats and estimate the probability of them happening. • To assign a value to the assets that could be affected by each threat. • The value of an asset includes the amount of money paid for it, the asset’s role in the company, and liability issues. Risk = the likelihood of a negative event happening × the impact of such an event happening
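
The risk formula on this slide can be applied to rank threats. Below is a minimal Python sketch of that arithmetic. The `Threat` class, the `rank_by_risk` helper, and all likelihood and impact figures are hypothetical values chosen for illustration, not data from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    likelihood: float  # estimated probability per year (hypothetical)
    impact_usd: float  # estimated loss if the event happens (hypothetical)

    @property
    def risk(self) -> float:
        # Risk = likelihood of the event * impact of the event
        return self.likelihood * self.impact_usd


def rank_by_risk(threats):
    """Return the threats sorted from highest to lowest risk."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


if __name__ == "__main__":
    threats = [
        Threat("device failure", 0.25, 8_000),
        Threat("tornado", 0.125, 400_000),
        Threat("loss of power", 0.5, 40_000),
    ]
    for t in rank_by_risk(threats):
        print(f"{t.name}: {t.risk:,.0f} USD per year")
```

A rare but costly event can outrank a frequent but cheap one, which is why both factors are estimated for every threat.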

  14. Business Impact Analysis (5) Quantitative vs. Qualitative • In BIA, information should be stated in quantitative terms, not in subjective, qualitative terms. • Qualitative (avoid): "If a tornado were to hit, the result would be really bad." • Quantitative (preferred): "If a tornado were to hit and affect 65 percent of the facility, the company could be at risk of losing computing capabilities for up to 72 hours, power supply for up to 24 hours, and a full stop of operations for 76 hours, which would equate to a loss of $125,000 each day."
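
The quantitative statement on this slide can be turned into a single loss figure. The sketch below is a minimal Python example. The function name `outage_loss` is hypothetical, and pro-rating the daily loss by the hour is an assumption, since the slide does not say how partial days are counted.

```python
def outage_loss(daily_loss_usd: float, outage_hours: float) -> float:
    """Estimate the total loss of an outage.

    ASSUMPTION: the daily loss accrues evenly, so it is pro-rated per hour.
    """
    if daily_loss_usd < 0 or outage_hours < 0:
        raise ValueError("inputs must be non-negative")
    return daily_loss_usd * outage_hours / 24


if __name__ == "__main__":
    # Figures from the slide: $125,000 per day, 76-hour full stop.
    print(f"{outage_loss(125_000, 76):,.2f} USD")
```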

  15. Business Impact Analysis (6) Interdependencies in BIA • A company comprises many types of equipment, people, tasks, departments, communications mechanisms, and interfaces to the outer world. • The biggest challenge of continuity planning is understanding all of these intricacies and their interrelationships.

  16. Example of Dependency Chart
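
A dependency chart can be represented as a directed graph and walked to find every resource a business function relies on, directly or indirectly. This is a minimal Python sketch under that assumption. The function name and the sample functions and resources are hypothetical; they are not taken from the chart in the original slide.

```python
def transitive_dependencies(graph, start):
    """Return every resource that `start` depends on, directly or indirectly.

    `graph` maps an item to the items it depends on.
    """
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


if __name__ == "__main__":
    # Hypothetical dependency chart.
    chart = {
        "order processing": ["web server", "database"],
        "web server": ["network link", "power"],
        "database": ["storage array", "power"],
    }
    print(sorted(transitive_dependencies(chart, "order processing")))
```

Listing the indirect dependencies makes shared resources such as power visible, which is where a single failure affects several functions at once.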

  17. Business Impact Analysis (7) Software tools • There are several software tools available for developing a BCP that simplify the process. • Business Continuity Plan Generator: • comprises two major elements: a template and a guide. • Disaster Recovery Toolkit: designed to help you review the full array of business continuity and disaster recovery issues. It comprises: • A contingency audit questionnaire • A Business Impact Analysis questionnaire • An audit questionnaire for your disaster recovery or business continuity plan (if indeed you have one) • A checklist, action list, and framework for disaster recovery

  18. Index • Business Continuity and Disaster Recovery: Overview • Business Impact Analysis • Preventative Measures • Recovery Strategies • Insurance • Recovery and Restoration • Implementing Strategies • Testing, Revising, and Maintaining

  19. Preventative Measures (1) • The goal is to reduce negative impact and mitigate the identified risks by implementing preventative measures. • Instead of just waiting for a disaster to hit to see how the company holds up, countermeasures should be integrated to better fortify the company against the impacts that were recognized. • Appropriate, cost-effective preventative and proactive measures are preferable to reactionary methods.

  20. Preventative Measures (2) Preventative Measures include: • Fortification of the facility in its construction materials • Redundant servers and communications links • Power lines coming in through different transformers • Redundant vendor support • Purchasing of insurance • Purchasing of UPS and generators • Data backup technologies • Media protection safeguards • Increased inventory of critical equipment • Fire detection and suppression systems

  21. Index • Business Continuity and Disaster Recovery: Overview • Business Impact Analysis • Preventative Measures • Recovery Strategies • Insurance • Recovery and Restoration • Implementing Strategies • Testing, Revising, and Maintaining

  22. Recovery Strategy (1) • In the recovery strategy stage, the team tries to figure out what the company needs to do to actually recover the items that it has identified as important to the organization. • The team discovers the most cost-effective recovery mechanisms that need to be implemented to address the threats that were identified in the BIA stage. • Preventative mechanisms • Are put into place to try to reduce the possibility of the company experiencing a disaster • If a disaster does hit, they lessen the amount of damage that will take place. • Recovery strategies are a set of predefined activities that will be implemented and carried out in response to a disaster. • Such as establishing alternate sites for facilities, implementing emergency response procedures, etc.

  23. Recovery Strategy (2) • In the BIA phase, the team has figured out the timelines (MTD) for the individual business functions, operations, and resources. • In the recovery strategy development phase, the team needs to identify the recovery mechanisms and strategies that must be implemented to make sure that everything is up and running within the timelines that it has calculated. • Business process recovery • Facility recovery • Supply and technology recovery • User environment recovery • Data recovery

  24. Business Process Recovery • A business process is a set of interrelated steps linked through specific decision activities to accomplish a specific task. • The processes should encapsulate the knowledge of services, resources, and operations provided by a company. • E.g., when a customer requests to buy a car via an organization’s e-commerce site, a set of steps must be followed. • The BCP team needs to understand the different steps of the company’s most critical processes. • The data is usually presented as a workflow document

  25. Example of Workflow

  26. Facility Recovery (1) • Three main categories of disruptions: nondisaster, disaster, and catastrophe • A nondisaster is a disruption in service as a result of a device malfunction or failure. • The fix is replacing a device or restoring files from onsite backups • The team needs to identify the critical equipment and estimate the mean time between failures (MTBF) and mean time to repair (MTTR) • MTBF is the estimated lifetime of a piece of equipment • MTTR is an estimate of how long it will take to fix a piece of equipment
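
MTBF and MTTR estimates are commonly combined into a steady-state availability figure, MTBF / (MTBF + MTTR). That formula is a standard reliability relation and is not stated on the slide. The minimal Python sketch below applies it and also checks a repair estimate against an MTD. Both function names and the sample figures are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the device is working."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def repair_fits_mtd(mttr_hours: float, mtd_hours: float) -> bool:
    """True if the expected repair time is within the tolerable downtime."""
    return mttr_hours <= mtd_hours


if __name__ == "__main__":
    # Hypothetical device: fails about every 990 hours, takes 10 hours to fix.
    print(f"availability: {availability(990, 10):.2%}")
    print("repair fits a 24-hour MTD:", repair_fits_mtd(10, 24))
```

When the expected repair time exceeds the MTD, repair alone cannot meet the timeline, which points toward a spare device or another recovery mechanism.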

  27. Facility Recovery (2) • A disaster is an event that causes the entire facility to be unusable for a day or longer. • Usually requires the use of an alternate processing facility • A catastrophe is a major disruption that destroys the facility altogether. • Requires both a short-term solution (an offsite facility) and a long-term solution (rebuilding the original facility)

  28. Facility Recovery (3) • Companies can choose from three main types of leased or rented offsite facilities: hot site, warm site, cold site • Hot site: a facility that is leased or rented and is fully configured and ready to operate within a few hours. • The only resources usually missing from a hot site are the data (which will be retrieved from a backup site) and the people who will be processing the data. • A good choice for a company that needs to ensure that a site will be available for it as soon as possible. • Annual testing verifies its operating state • A hot site can support a short- or long-term outage • The most expensive of the three offsite options

  29. Facility Recovery (4) • Warm site: a leased or rented facility that is usually partially configured with some equipment, but not the actual computers. • A warm site = a hot site - the expensive equipment • Less expensive than a hot site • Can be up and running within a reasonably acceptable time period. • The most widely used model • Drawback: annual testing is usually not available. Thus a company cannot be certain that it will in fact be able to return to an operating state within hours.

  30. Facility Recovery (5) • Cold site: a leased or rented facility that supplies the basic environment • Electrical wiring, air conditioning, plumbing, and flooring, but none of the equipment or additional services. • It may take weeks to get the site activated and ready for work. • The least expensive option • Comparison of the three offsite options: p. 712

  31. Facility Recovery (6) • Alternatives to an offsite facility: • Reciprocal agreement • Redundant sites • A reciprocal agreement, also referred to as mutual aid, is made with another company. • This means that company A agrees to allow company B to use its facilities if company B is hit by a disaster, and vice versa. • A cheaper way to go than the other offsite choices, but it is not always the best choice.

  32. Facility Recovery (7) • Redundant sites: one site is equipped and configured exactly like the primary site and serves as a redundant environment. • Primary site, backup site, and tertiary site. • These sites are owned by the company and are mirrors of the original production environment. • The most expensive backup facility option, because a full environment must be maintained. • Other facility-backup options • Rolling hot site (mobile hot site) • Multiple processing centers

  33. Facility Recovery (8) Hot site vs. redundant site • A hot site is provided by a service bureau as a subscription service. • A redundant site is a site owned and maintained by the company. The company does not pay anyone else for the site.

  34. Supply and Technology Recovery (1) The BCP team needs to dig down into some more granular items, such as backup solutions for the following: • Network and computer equipment • Voice and data communications resources • Human resources • Transportation of equipment and personnel • Environment issues (HVAC) • Data and personnel security issues • Supplies (paper, forms, cabling, and so on) • Documentation

  35. Supply and Technology Recovery (2) It is not easy to fully understand the organization’s current technical environment, because … • The network was most likely established years ago and has kept growing • Over the years, a number of technology refreshes have taken place • Employee turnover: the individuals who are maintaining the environment now are not the same people who built it years ago.

  36. Supply and Technology Recovery (3) Hardware Backup • The team has identified the equipment that is required to keep the critical functions up and running. • Issue 1: Using images vs. building from scratch • Using images is time-saving, unless the team finds out that the replacement equipment is a newer version and thus the images cannot be used. • The BCP team should plan for the recovery team to use the company’s current images, but also have a manual process of how to build each critical system from scratch with the necessary configurations.

  37. Supply and Technology Recovery (4) Hardware Backup • Issue 2: Depending on an SLA vs. a redundant system • MTD indicates how long the company can be without a specific device. • The team needs to know the parameters of the SLA • The BCP team needs to decide between depending upon the vendor and purchasing redundant systems and storing them as backups • Issue 3: Legacy system vs. COTS product • The team should identify legacy devices and understand the risk that the organization is under if replacements are unavailable. • This type of finding has caused many companies to move from legacy systems to commercial off-the-shelf (COTS) products to ensure that replacement is possible.

  38. Supply and Technology Recovery (5) Software Backup • The BCP team should make sure to have an inventory of the necessary software that is required for mission-critical functions and have backup copies at an offsite facility. • At least two copies of the company’s operating system software and critical applications. • One copy should be stored onsite and the other copy should be stored at a secure offsite location. • These copies should be tested periodically and re-created when new versions are rolled out.

  39. Supply and Technology Recovery (6) Software Backup • Customized software usually comes without source code • What if this software vendor goes out of business because of a disaster or bankruptcy? • A company will require a new vendor to maintain and update this customized software; thus, the new vendor will need access to the source code. • Software escrow means that a third party holds the source code, backups of the compiled code, manuals, and other supporting materials. • This contract usually states that the customer can have access to the source code only if and when the vendor goes out of business, is unable to carry out stated responsibilities, or is in breach of the original contract.

  40. Supply and Technology Recovery (7) Documentation Without documentation, when a disaster hits, no one will know how to put critical functions back together again. • The documentation needs to include: • Information on how to install images, configure operating systems and servers, and properly install utilities and proprietary software. • A calling tree, which outlines who should be contacted, in what order, and who is responsible for doing the calling. • Multiple copies: One copy may be at the primary location. Typically, a copy is stored at the BCP coordinator’s home and a copy is stored at the offsite facility. This reduces the risk of not having access to the plans when needed.
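
A calling tree records who calls whom and in what order. As a minimal Python sketch, assuming the tree is stored as a mapping from each caller to the people that caller must contact, a breadth-first walk yields the call sequence level by level. The function name and all roles are hypothetical.

```python
from collections import deque


def call_order(tree, root):
    """Return (caller, callee) pairs in the order the calls should be made.

    `tree` maps each person to the people they are responsible for calling.
    ASSUMPTION: the structure is a tree, so nobody is listed under two callers.
    """
    calls = []
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, ()):
            calls.append((caller, callee))
            queue.append(callee)
    return calls


if __name__ == "__main__":
    # Hypothetical calling tree.
    tree = {
        "BCP coordinator": ["IT manager", "Facilities manager"],
        "IT manager": ["Network admin", "DBA"],
        "Facilities manager": ["Security lead"],
    }
    for caller, callee in call_order(tree, "BCP coordinator"):
        print(f"{caller} calls {callee}")
```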

  41. Supply and Technology Recovery (8) Human Resources Human resources is a critical component of any recovery and continuity process • Issue 1: If a large disaster takes place, will employees be more worried about your company or their families? • Issue 2: The BCP team may need to look at how it will be able to replace employees quickly through a temporary agency or a headhunter. • Issue 3: Executive succession planning • If someone in a senior executive position retires, leaves the company, or is killed, the organization has predetermined steps to carry out to protect the company. • Deputies are ready to take over the necessary tasks • E.g., a policy stating that, to protect the United States, its top leaders cannot be exposed to the same risk at the same time.

  42. Supply and Technology Recovery (9) End-User Environment The end users must be provided a functioning environment as soon as possible after a disaster hits. • Decide how the end users will be notified of the disaster and who will tell them where to go and when. • A tree structure of managers can be developed • After a disaster, only a skeleton crew is put back to work. • The BCP committee identified the most critical functions of the company during the analysis stage, and the employees who carry out those functions must be put back to work first. • The BCP team needs to identify user requirements • Stand-alone PCs, networked systems … • The BCP team needs to identify how current automated tasks can be carried out manually if that becomes necessary.

  43. Supply and Technology Recovery (10) Data Backup • The BCP team’s responsibility is to provide solutions to protect this data and identify ways to restore it after a disaster. • Data has become one of the most critical assets to nearly all organizations. • Data usually changes more often than hardware and software, so these backup procedures must happen on a continual basis. • The data backups can be full, differential, or incremental backups and are usually used in some type of combination. • Most companies choose to combine a full backup with a differential OR incremental backup.

  44. Supply and Technology Recovery (11) Data Backup • Full Process • All data is backed up and saved to some type of storage media. • The archive bit is cleared • The restoration process is just one step, but the backup and restore processes could take a long time. • Differential Process • Backs up the files that have been modified since the last full backup. • Does not change the archive bit value. • When the data needs to be restored, the full backup is laid down first and then the differential backup is put down on top of it.

  45. Supply and Technology Recovery (12) Data Backup • Incremental process • Backs up all the files that have changed since the last full or incremental backup • The archive bit is cleared • When the data needs to be restored, the full backup data is laid down and then each incremental backup is laid down on top of it in the proper order. • A comparison of three data backup processes is next …
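
The archive-bit behavior of the three backup processes can be simulated in a few lines. This is a minimal Python sketch in which a dictionary maps each file name to its archive bit (True means modified since it was last cleared). The function names and file names are hypothetical, and no real backup tool's API is being modeled.

```python
def modify(bits, name):
    """Changing a file sets its archive bit."""
    bits[name] = True


def full_backup(bits):
    """Back up every file and clear every archive bit."""
    saved = sorted(bits)
    for name in bits:
        bits[name] = False
    return saved


def differential_backup(bits):
    """Back up files whose bit is set; leave the bits unchanged."""
    return sorted(name for name, flag in bits.items() if flag)


def incremental_backup(bits):
    """Back up files whose bit is set, then clear those bits."""
    saved = sorted(name for name, flag in bits.items() if flag)
    for name in saved:
        bits[name] = False
    return saved


if __name__ == "__main__":
    bits = {"a.doc": True, "b.doc": True, "c.doc": True}
    print("full:", full_backup(bits))
    modify(bits, "a.doc")
    print("differential 1:", differential_backup(bits))
    modify(bits, "b.doc")
    print("differential 2:", differential_backup(bits))  # still includes a.doc
```

Because a differential backup leaves the bit set, each one keeps growing until the next full backup, while each incremental backup contains only the changes since the previous backup.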

  46. Supply and Technology Recovery (13) Data Backup How to choose a data backup process? • Although using differential and incremental backup processes is more complex than using only full backups, it requires fewer resources and less time. • A differential backup takes more time in the backing up phase than an incremental backup, but it also takes less time to restore than an incremental backup. Why? • Do NOT mix differential and incremental backups! Full process + differential backup OR Full process + incremental backup • A backup strategy must take into account that failure can take place at any step of the process. • Testing is essential! Avoid developing a false sense of security
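
The restore-time difference follows from how many backup sets must be laid down. The sketch below is a minimal Python illustration: given a chronological list of backup types, it returns the positions of the sets needed for a restore. The function name `restore_chain` and the string labels are hypothetical, and rejecting a mixed history is a modeling choice that mirrors the slide's advice.

```python
def restore_chain(history):
    """Return the indexes of the backups needed to restore the latest state.

    `history` is a chronological list of "full", "differential",
    or "incremental" entries.
    """
    fulls = [i for i, kind in enumerate(history) if kind == "full"]
    if not fulls:
        raise ValueError("no full backup to restore from")
    last_full = fulls[-1]
    later = set(history[last_full + 1:])
    if later == {"differential"}:
        # Only the most recent differential is needed on top of the full.
        return [last_full, len(history) - 1]
    if later <= {"incremental"}:
        # Every incremental since the full must be applied, in order.
        return list(range(last_full, len(history)))
    raise ValueError("history mixes backup types after the last full backup")


if __name__ == "__main__":
    print(restore_chain(["full"] + ["differential"] * 3))  # two sets
    print(restore_chain(["full"] + ["incremental"] * 3))   # four sets
```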

  47. Supply and Technology Recovery (14) Data Backup • Several automated backup alternatives: • Disk-shadowing, Electronic vaulting, Remote journaling, Hierarchical storage management (HSM), Storage area network (SAN), automatic tape vaulting. • Manually backing up systems and data can be time consuming, error prone, and costly. • Disk-shadowing (data-mirroring) • A disk-shadowing process uses two physical disks, and the data is written to both at the same time for redundancy purposes. If one disk fails, the other is readily available. • Provides online backup storage, which can either reduce or replace the need for periodic offline manual backup operations. • Provides transparency to the user • Another benefit is that it can boost read operation performance. • Is an expensive solution
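
The idea behind disk shadowing, writing every block to two disks so that either can serve reads, can be shown with an in-memory model. This is a minimal Python sketch only: the `MirroredDisk` class and its methods are hypothetical and use dictionaries in place of physical disks.

```python
class MirroredDisk:
    """Toy model of disk shadowing with two in-memory 'disks'."""

    def __init__(self):
        self._disks = [{}, {}]
        self._failed = [False, False]

    def write(self, block, data):
        # Every write goes to each healthy disk at the same time.
        for index, disk in enumerate(self._disks):
            if not self._failed[index]:
                disk[block] = data

    def fail(self, index):
        """Simulate the loss of one disk."""
        self._failed[index] = True

    def read(self, block):
        # Any healthy disk can satisfy the read.
        for index, disk in enumerate(self._disks):
            if not self._failed[index]:
                return disk[block]
        raise OSError("both disks have failed")


if __name__ == "__main__":
    mirror = MirroredDisk()
    mirror.write(0, b"payroll records")
    mirror.fail(0)
    print(mirror.read(0))  # still readable from the second disk
```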

  48. Supply and Technology Recovery (15) Data Backup • Electronic vaulting: makes copies of files as they are modified and periodically transmits them to an offsite backup site. • The transmission is carried out in batches • Can choose to have all files that have been changed sent to the backup facility every hour, day, week, or month. • How to choose a transmission period? • Remote journaling: only includes moving the journal or transaction logs to the offsite facility, not the actual files. • These logs contain the deltas (changes) that have taken place to the individual files. • Takes place in real time • Is efficient for database recovery. Why?
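
Remote journaling ships only the deltas, so recovery means replaying the log on top of the last copy held offsite. A minimal Python sketch of that replay follows. The `replay` function, the `("set" | "delete", key, value)` journal format, and the account data are all hypothetical.

```python
def replay(snapshot, journal):
    """Rebuild the current state from an older snapshot plus a journal.

    `journal` is a chronological list of (operation, key, value) entries.
    """
    state = dict(snapshot)  # leave the offsite snapshot untouched
    for operation, key, value in journal:
        if operation == "set":
            state[key] = value
        elif operation == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {operation}")
    return state


if __name__ == "__main__":
    snapshot = {"acct1": 100, "acct2": 50}
    journal = [
        ("set", "acct1", 80),
        ("set", "acct3", 20),
        ("delete", "acct2", None),
    ]
    print(replay(snapshot, journal))
```

Only the three journal entries had to cross the network, not the whole data set, which suggests why the approach suits databases where many small transactions touch large files.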

  49. Supply and Technology Recovery (16) Data Backup Hierarchical storage management (HSM) provides continuous online backup functionality. • It combines hard disk technology with the cheaper and slower optical or tape jukeboxes. • Dynamically manages the storage and recovery of files, which are copied to storage media devices that vary in speed and cost. • The faster media holds the data that is accessed more often • The seldom-used files are stored on the slower devices, or near-line devices • Happens in the background without the knowledge of the user or any need for user intervention.
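
The placement rule described on this slide, frequently accessed data on fast media and seldom-used files on near-line devices, can be sketched as a simple ranking. This minimal Python example is an illustration only: the `assign_tiers` function, the fixed number of fast slots, and the access counts are hypothetical, and real HSM products use their own policies.

```python
def assign_tiers(access_counts, fast_slots):
    """Place the most frequently accessed files on the fast tier.

    `access_counts` maps file name to number of recent accesses.
    `fast_slots` is how many files fit on the fast disk (hypothetical limit).
    """
    # Most-accessed first; ties are broken by name for a stable result.
    ranked = sorted(access_counts, key=lambda name: (-access_counts[name], name))
    return {
        name: ("disk" if position < fast_slots else "near-line")
        for position, name in enumerate(ranked)
    }


if __name__ == "__main__":
    counts = {"orders.db": 50, "2019-archive.zip": 2, "catalog.db": 10}
    for name, tier in assign_tiers(counts, fast_slots=1).items():
        print(f"{name}: {tier}")
```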

  50. Supply and Technology Recovery (17) Data Backup
