Principles of Incident Response and Disaster Recovery

Chapter 6: Contingency Strategies for Business Resumption Planning

Objectives
• Know and understand the relationships between the overall use of contingency planning and the subordinate elements of incident response, business resumption, disaster recovery, and business continuity planning
• Become familiar with the techniques used for data and application backup and recovery
• Know the strategies employed for resumption of critical business processes at alternate and recovered sites

Introduction
• Contingency planning addresses everything done by an organization to prepare for the unexpected
• The incident response (IR) process focuses on detecting, evaluating, and reacting to an incident
• Later phases focus on keeping the business functioning even if the physical plant is destroyed or unavailable
• Business resumption (BR) plan: takes over when the IR process cannot contain and resolve an incident

Introduction (continued)
• Business resumption (BR) plan major elements:
  • Disaster recovery (DR) plan: lists and describes the efforts to resume normal operations at the primary places of business
  • Business continuity (BC) plan: contains steps for implementing critical business functions using alternative mechanisms until normal operations can be resumed at the primary site or elsewhere
• Primary site: location(s) at which the organization executes its functions
• The BC plan operates concurrently with the DR plan when damage is major or long-term


Introduction (continued)
• Each component of CP (IRP, DRP, and BCP) comes into play at specific times in the life of an event
• Five key procedural mechanisms for restoring critical information and facilitating continuation of operations:
  • Delayed protection
  • Real-time protection
  • Server recovery
  • Application recovery
  • Site recovery

Data and Application Resumption
• Backup methods must be used according to an established policy:
  • How often to back up
  • How long to retain the backups
  • What must be backed up
• Data files and critical system files should be backed up daily, with one copy on-site and one copy off-site
• Nonessential files should be backed up weekly
• Full backups: keep at least one copy in a secure location off-site

Disk-to-Disk-to-Tape: Delayed Protection
• Decreasing costs of storage media, especially hard drives and removable drives, let organizations avoid the time-consuming nature of tape backup for the initial copy
• Storage area networks provide online backups
• Because there is no redundancy if both the online and backup versions fail or are attacked, tape backup is still required periodically
• Disk-to-disk initial copies are efficient and can run simultaneously with other processes
• Secondary disk-to-tape copies do not affect production processing

Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Types of backups:
  • Full backup
  • Differential backup
  • Incremental backup
• Full backup:
  • Includes entire system, including applications, OS components, and data
  • Pro: provides a comprehensive snapshot
  • Con: requires large media; time consuming

Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Differential backup:
  • Includes all files that have changed or been added since the last full backup
  • Pro: faster and uses less storage space than a full backup; a restore needs only the last full backup plus the most recent differential
  • Con: grows larger and takes longer each day; if the differential is corrupt, all changes since the last full backup are lost
• Incremental backup:
  • Includes only files that were modified since the previous backup (that day, on a daily schedule)
  • Pro: requires less space and time than the differential
  • Con: a restore requires the last full backup plus every incremental backup made since it
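
The difference between the three backup types comes down to which files each one selects. The sketch below is a minimal illustration, not a real backup tool: the `FileInfo` record, the `select_files` function, and the integer day numbers are all hypothetical names introduced here, and it assumes an incremental backup copies whatever changed since the most recent backup of any kind.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    modified_day: int  # hypothetical day number on which the file last changed


def select_files(files, kind, last_full_day, last_backup_day):
    """Return the paths that a backup of the given kind would copy."""
    if kind == "full":
        # A full backup copies everything, changed or not.
        return [f.path for f in files]
    if kind == "differential":
        # Everything changed or added since the last full backup.
        return [f.path for f in files if f.modified_day > last_full_day]
    if kind == "incremental":
        # Only what changed since the most recent backup of any kind.
        return [f.path for f in files if f.modified_day > last_backup_day]
    raise ValueError(f"unknown backup kind: {kind}")


files = [FileInfo("a.db", 0), FileInfo("b.doc", 2), FileInfo("c.cfg", 3)]
# Full backup taken at the end of day 0, most recent backup at the end of day 2.
print(select_files(files, "differential", last_full_day=0, last_backup_day=2))
print(select_files(files, "incremental", last_full_day=0, last_backup_day=2))
```

The differential selection keeps growing until the next full backup resets `last_full_day`, while the incremental selection stays small because `last_backup_day` advances with every run.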

Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Fastest backup method: incremental backups
• Fastest recovery time: differential backups
• All on-site and off-site storage must be secured and must have a controlled environment (temperature and humidity)
• Media should be clearly labeled and write-protected
• Tape media types:
  • Digital audio tape (DAT)
  • Quarter-inch cartridge (QIC)
  • 8 mm tape
  • Digital linear tape (DLT)
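
Why differential backups restore faster than incremental ones can be seen by counting the backup sets a restore has to read. The following is a sketch under stated assumptions: `restore_chain` is a hypothetical helper, the history is a simple chronological list of backup kinds, and an incremental is assumed to capture changes since the previous backup of any kind.

```python
def restore_chain(history):
    """Return the indices of the backups needed to restore the latest state.

    history is a chronological list of backup kinds, for example
    ["full", "incremental", "incremental"].
    """
    # Start from the most recent full backup.
    last_full = max(i for i, kind in enumerate(history) if kind == "full")
    chain = [last_full]
    start = last_full + 1
    # A differential supersedes everything between it and the full backup.
    differentials = [i for i in range(start, len(history))
                     if history[i] == "differential"]
    if differentials:
        chain.append(differentials[-1])
        start = differentials[-1] + 1
    # Every later incremental must be applied in order.
    chain.extend(i for i in range(start, len(history))
                 if history[i] == "incremental")
    return chain


print(restore_chain(["full", "incremental", "incremental", "incremental"]))
print(restore_chain(["full", "differential", "differential", "differential"]))
```

With three daily incrementals the restore reads four sets; with three daily differentials it reads two, which matches the trade-off stated on the slide.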

Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Typical backup scheduling:
  • Daily: on-site incremental or differential backup
  • Weekly: off-site full backup
• Tape media should be retired and replaced periodically
• Popular strategies for rotating backup media:
  • Six-tape rotation
  • Grandfather-Father-Son
  • Towers of Hanoi

Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Six-tape rotation:
  • Uses a rotation of six sets of media
  • Five media sets per week are used with one extra labeled Friday2
  • Friday full backup is taken off-site
  • Friday1 and Friday2 are rotated off-site every week
  • Provides roughly 2 weeks of recovery capability
  • Variation: keep a copy of each off-site Friday tape on-site for faster recovery
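
The six-tape rotation can be written out as a simple labeling rule. This is a minimal sketch assuming a Monday-to-Friday schedule in which the two Friday sets alternate weekly; `six_tape_label` and `six_tape_schedule` are hypothetical names used only for illustration.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def six_tape_label(week, weekday):
    """Return the media set for a 0-based week and weekday (0 = Monday)."""
    if weekday < 4:
        # Monday through Thursday reuse the same four sets every week.
        return WEEKDAYS[weekday]
    # The Friday full backup alternates between two sets, so one of them
    # is always off-site while the other is being written.
    return "Friday1" if week % 2 == 0 else "Friday2"


def six_tape_schedule(weeks):
    """List the media set used on each working day over the given weeks."""
    return [six_tape_label(w, d) for w in range(weeks) for d in range(5)]


print(six_tape_schedule(2))
```

Because each Friday set is overwritten only every second week, the oldest full backup available is about two weeks old, which is where the roughly two-week recovery window comes from.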

Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Grandfather-Father-Son (GFS):
  • Uses five media sets per week
  • Allows recovery for previous 3 weeks
  • First week uses first set, second week uses second set, third week uses third set
  • Following week starts with first set
  • Every 2nd or 3rd month, a group of media sets is taken out of the cycle for permanent storage and replaced with a new set
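
The weekly cycling described for GFS amounts to modular arithmetic over the groups of media. The sketch below assumes three groups of five daily media, as the slide describes; `gfs_group_for_week` and `gfs_label` are hypothetical names, and the periodic retirement of a group to permanent storage is not modeled.

```python
def gfs_group_for_week(week, num_groups=3):
    """Return the 1-based media group used in a 0-based week."""
    return week % num_groups + 1


def gfs_label(week, weekday, num_groups=3):
    """Return a label such as 'Group2-Day4' for a given week and weekday."""
    return f"Group{gfs_group_for_week(week, num_groups)}-Day{weekday + 1}"


# Weeks 0..4 cycle through groups 1, 2, 3 and then start over.
print([gfs_group_for_week(w) for w in range(5)])
print(gfs_label(3, 0))
```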

Disk-to-Disk-to-Tape: Delayed Protection (continued)
• Towers of Hanoi:
  • More complex approach
  • Based on statistical principles to optimize media wear
  • 16-step strategy assumes that 5 media sets are used per week on a daily basis
  • First media set is used more often and must be monitored for wear
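
One common formulation of the Towers of Hanoi rotation picks the media set from the number of trailing zero bits in the session number, which yields a 16-step cycle for five sets. The sketch below assumes that formulation; the function names and the A to E labels are illustrative and are not taken from the slides.

```python
from string import ascii_uppercase


def hanoi_media_set(session, num_sets=5):
    """Return the 0-based media set for a 1-based backup session number."""
    if session < 1:
        raise ValueError("session numbers start at 1")
    # Count trailing zero bits: odd sessions give 0, multiples of 2 give 1, ...
    trailing_zeros = (session & -session).bit_length() - 1
    # The last set absorbs every session the earlier sets do not cover.
    return min(trailing_zeros, num_sets - 1)


def hanoi_cycle(num_sets=5):
    """Return one full cycle of the rotation as a string of set labels."""
    length = 2 ** (num_sets - 1)
    return "".join(ascii_uppercase[hanoi_media_set(s, num_sets)]
                   for s in range(1, length + 1))


print(hanoi_cycle())  # ABACABADABACABAE
```

Set A is written in 8 of the 16 sessions, which is why the first media set wears fastest and needs monitoring.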


Redundancy-Based Backup and Recovery Using RAID
• Redundant array of independent disks (RAID): uses online disk drives for redundancy
• RAID spreads data across multiple units and, at most levels, offers recovery from hard drive failure
• 9 established RAID configurations: RAID Levels 0 through 7, plus Level 10
• RAID Level 0 (disk striping without parity):
  • Not redundant
  • Spreads data across several drives in segments called stripes
  • Failure of one drive may make all data inaccessible
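
Striping can be illustrated by mapping each logical block to a drive and a position on that drive. This is a simplified sketch assuming one block per stripe unit and round-robin placement; `raid0_locate` is a hypothetical helper, not part of any RAID product.

```python
def raid0_locate(logical_block, num_drives):
    """Return (drive index, row on that drive) for a logical block number."""
    return logical_block % num_drives, logical_block // num_drives


def drives_touched(first_block, block_count, num_drives):
    """Return the set of drives holding any part of a contiguous file."""
    return {raid0_locate(b, num_drives)[0]
            for b in range(first_block, first_block + block_count)}


print([raid0_locate(b, 3) for b in range(5)])
print(drives_touched(0, 5, 3))
```

Any file longer than a couple of blocks ends up on every drive, so losing a single drive damages nearly every file. That is the reason Level 0 provides no redundancy.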

Redundancy-Based Backup and Recovery Using RAID (continued)
• RAID Level 1 (disk mirroring):
  • Uses twin drives in a system
  • All data written to one drive is written to the other simultaneously
  • Is expensive and is an inefficient use of disk space
  • Vulnerable to a disk controller failure
  • Disk duplexing: mirroring with dual disk controllers
• RAID Level 2:
  • Specialized form of disk striping with parity that is not widely used
  • Uses the Hamming code for parity
  • No commercial implementations of this

Redundancy-Based Backup and Recovery Using RAID (continued)
• RAID Levels 3 and 4:
  • RAID 3 uses byte-level striping while RAID 4 uses block-level striping
  • Parity information is stored on a separate drive and provides error recovery
• RAID Level 5:
  • Balances safety and redundancy against costs
  • Stripes data across multiple drives
  • Parity is interleaved with data segments on all drives
  • Hot-swappable: drives can be replaced without shutting down the system
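
Parity-based recovery in Levels 3, 4, and 5 rests on XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block equals the XOR of the survivors. The sketch below shows that arithmetic only; `xor_blocks` and `rebuild_missing` are hypothetical names, and real arrays do this in controller hardware or the operating system.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block of a stripe from all the others."""
    return xor_blocks(surviving_blocks)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Simulate losing the second data block and rebuilding it.
recovered = rebuild_missing([data[0], data[2], parity])
print(recovered == data[1])  # True
```

The same calculation works whether the parity sits on a dedicated drive (Levels 3 and 4) or is interleaved across all drives (Level 5); only the placement differs.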

Redundancy-Based Backup and Recovery Using RAID (continued)
• RAID Level 6:
  • Extends RAID 5 striping with a second, independent set of parity data
  • Performs two different parity computations or the same computation on overlapping subsets of data
• RAID Level 7:
  • Proprietary variation on RAID 5 in which the array works as a single virtual drive
  • May be implemented via software running on RAID 5 hardware
• RAID Level 10:
  • Combination of RAID 1 and RAID 0 (mirrored drives that are then striped)
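
The cost trade-offs between levels show up in usable capacity. The sketch below covers only Levels 0, 1, 5, and 10 and assumes identical drives, a two-drive mirror for Level 1, and mirrored pairs for Level 10; `usable_capacity` is a hypothetical helper and the sizes are in arbitrary units.

```python
def usable_capacity(level, num_drives, drive_size):
    """Return usable capacity for a RAID level built from identical drives."""
    if level == 0:
        # Striping only: every drive holds data.
        return num_drives * drive_size
    if level == 1:
        if num_drives != 2:
            raise ValueError("this sketch models a two-drive mirror")
        # Both drives hold the same data.
        return drive_size
    if level == 5:
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        # One drive's worth of space is consumed by parity.
        return (num_drives - 1) * drive_size
    if level == 10:
        if num_drives < 4 or num_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, four or more")
        # Half of the drives are mirror copies.
        return (num_drives // 2) * drive_size
    raise ValueError(f"level {level} is not modeled in this sketch")


for level, drives in [(0, 4), (1, 2), (5, 4), (10, 4)]:
    print(level, usable_capacity(level, drives, 2))
```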


Database Backups
• Databases require special considerations when planning backup and recovery procedures:
  • Are special utilities required to perform database backups?
  • Can the database be backed up without interrupting its use?
  • Are there additional journal files or database system files that are required in order to use backup tapes or disk images?

Application Backups
• Some applications use file systems and databases in unusual ways
• Members of the application development and support teams should be involved in the planning process

Backup and Recovery Plans
• Every backup and recovery environment should be supported by complete recovery plans
• Plans need to be developed, tested, and rehearsed periodically
• Plans should include information about:
  • How and when backups are created and verified
  • Who is responsible for backup creation and verification
  • Storage and retention of backup media
  • Review cycle of the plan
  • Rehearsal of the plan

Real-Time Protection, Server Recovery, and Application Recovery
• Entire servers can be mirrored to provide real-time protection and recovery in a strategy of hot, warm, and cold servers
• Hot server: the server in production
• Warm server: backup server that is running and may handle overflow work from hot server
• Cold server: offline, test server
• If hot server goes down, warm and cold servers are promoted while the hot server is being repaired
• Bare metal recovery: technologies designed to replace operating systems and services when they fail
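
The promotion step in the hot, warm, and cold server strategy can be sketched as a role reassignment. The function name, the role dictionary, and the server names below are hypothetical; a real failover also involves data synchronization and network redirection that are not shown.

```python
def promote_after_hot_failure(roles):
    """Return the new role assignments after the hot server fails."""
    return {
        "hot": roles["warm"],          # warm server takes over production
        "warm": roles["cold"],         # cold server is brought online as backup
        "cold": None,                  # no spare until the repair completes
        "under_repair": roles["hot"],  # failed server leaves the rotation
    }


before = {"hot": "srv-a", "warm": "srv-b", "cold": "srv-c"}
print(promote_after_hot_failure(before))
```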

Real-Time Protection, Server Recovery, and Application Recovery (continued)
• Application recovery (or clustering plus replication):
  • Applications are installed on multiple servers
  • If one fails, the secondary systems take over the role
• Electronic vaulting:
  • Bulk transfer of data in batches to an off-site facility
  • Receiving server archives the data
  • Can be more expensive than tape backup and slower than data mirroring
  • Data must be encrypted for transfer over public infrastructure


Real-Time Protection, Server Recovery, and Application Recovery (continued)
• Remote journaling (RJ):
  • Transfer of live transactions to an off-site facility
  • Only transactions are transferred in near real-time to a remote location
  • Facilitates the recovery of key transactions in near real-time
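
Recovery with remote journaling means replaying the journaled transactions on top of the last restored backup. The sketch below assumes a very simple transaction format, an account name and a signed amount; `replay_journal` and the sample data are hypothetical and stand in for whatever the database system actually records.

```python
def replay_journal(backup_state, journal):
    """Apply journaled transactions, in order, to a restored backup state."""
    state = dict(backup_state)  # leave the restored backup itself untouched
    for account, delta in journal:
        state[account] = state.get(account, 0) + delta
    return state


restored = {"acct-1": 100}
journal = [("acct-1", -30), ("acct-2", 50)]  # transactions shipped off-site
print(replay_journal(restored, journal))
```

Because only the transactions travel to the remote site, the volume transferred is far smaller than a bulk copy of the data, which is what makes near real-time transfer practical.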

Real-Time Protection, Server Recovery, and Application Recovery (continued)
• Database shadowing (or databank shadowing):
  • Storage of duplicate online transaction data and duplication of databases at a remote site on a redundant server
  • Both databases are updated, but only the primary responds to the user
  • Combines electronic vaulting with remote journaling
  • Used when immediate data recovery is a priority
  • Also used for data warehousing, data mining, batch reporting, complex SQL queries, local access at the shadow site, and load balancing

Real-Time Protection, Server Recovery, and Application Recovery (continued)
• Network-attached storage (NAS):
  • Usually a single device or server attached to a network to provide online storage
  • Not well suited for real-time applications due to latency
• Storage area networks (SANs):
  • Online storage devices connected by Fibre Channel direct connections between the servers and the additional storage

Site Resumption Strategies
• If the primary business site is not available, alternative processing capability may be needed
• The contingency planning management team (CPMT) can choose from several strategies for business resumption planning
• Exclusive control options:
  • Hot sites
  • Warm sites
  • Cold sites
• Shared-use options:
  • Timeshare
  • Service bureaus
  • Mutual agreements

Exclusive Site Resumption Strategies

Exclusive Site Resumption Strategies (continued)
• Hot site:
  • Fully configured computer facility
  • Duplicates computing resources, peripherals, phone systems, applications, and workstations
  • Can be 24/7 if desired
  • Can be a mirrored site that is identical to the primary site


Exclusive Site Resumption Strategies (continued)
• Warm site:
  • Provides some of the same services and options as a hot site
  • May include computing equipment and peripherals but not workstations
  • Has access to data backups or off-site storage
  • Lower cost than a hot site, but takes more time to be fully functional

Exclusive Site Resumption Strategies (continued)
• Cold site:
  • Provides only rudimentary services and facilities
  • No computer hardware or software is provided
  • Communications services must be installed when the site is occupied
  • Often no quick recovery or data duplication functions on site
  • Primary advantage is cost

Exclusive Site Resumption Strategies (continued)
• Other options:
  • Rolling mobile site configured in the payload area of a tractor-trailer
  • Rental storage area with duplicate or second-generation equipment
  • Mobile temporary offices

Shared Site Resumption Strategies
• Timeshare:
  • Leased site shared with other organizations
  • Risk: more than one organization might need the facility simultaneously
• Service bureaus:
  • Service agency that provides physical facilities in the event of a disaster
  • May provide off-site data storage

Shared Site Resumption Strategies (continued)
• Mutual agreement:
  • Contract between two organizations to provide mutual assistance in the event of a disaster
  • Each organization is obligated to provide facilities, resources, and services to the other
  • Good for divisions of the same parent company, between business partners, or when both parties have similar capabilities and capacities
  • A memorandum of agreement (MOA) should be drawn up with specific details
