are you sure you can recover in any circumstance

Who Am I? Tim Boles, DBA with Lockheed Martin IS

Presentation Transcript


    1. Are You Sure You Can Recover In Any Circumstance?

    2. Hello, I am Tim Boles, a Senior Staff DBA with Lockheed Martin in the Information Systems and Global Services Civil division. I have been working as a Database Administrator for over 12 years. I have worked the entire gamut of system types, starting in a small state government office that had a database a few gigabytes in size. I was the only full-time IT person and did system administration, database administration, and Forms and Reports development. More recently, I worked at a large global pharmaceutical company, managing and monitoring hundreds of databases with a team of Database Administrators.

    3. Topics: Availability Is Not Recovery; Causes of Data Loss and Its Cost; Basics of Backups; Building a Backup Policy; How to Be Sure You Can Recover. This presentation does not tout any particular method, software, tool, or hardware for backing up or restoring your database. If you came to this presentation wanting a silver bullet, I am sorry to disappoint you: there are no silver bullets when it comes to system backups and restores. This session also does not cover every situation in which you might need to recover your database; there is no way we could. Everything depends on your unique system and circumstances. The intention is to provide a guide and ideas for many situations you might encounter, and a template for verifying that you can recover from them. It is a springboard for you to investigate your own system and solutions.

    4. Availability Is Not Recovery. Some people may disagree with me, but availability is not recovery. Availability does sometimes involve recovery, for example when a RAC instance fails and the transactions on that instance have to be recovered. However, I don't know of a case of recovery that involves availability. Both are important and need to be evaluated for every particular system and circumstance.

    5. The New Thing: Cloud Computing; Virtual; RAC; Standby Database; 99.999% Uptime. The new catchphrase is cloud computing, which can often be thought of as Systems as a Service: you don't know where your information resides, you just want to be able to get to it. Going hand in hand with that are virtual servers, virtual storage, and virtually everything else, which let you move your application from one virtual server to another very quickly. You can use Real Application Clusters (RAC) to manage workloads, and if one server goes down another server can pick up the work. You can create standby databases that allow you to do a rolling upgrade of your system with minimal, if any, downtime. Physical and logical standbys allow you to move your services quickly, in real time, between primary and standby sites, and if the programmers have designed the system appropriately there will be no interruption to the users. All of this can be combined to give companies the ability to achieve the golden five nines (99.999%) of uptime or better.

    6. When Does Availability Not Help? RAC: lose the underlying data files. ESAN: lose power and the drives don't come up. Virtual: a disgruntled employee drops a schema. COOP site: a software bug or virus corrupts data. Disk mirror: data corruption. Can anyone give me scenarios in which availability will not help you? (Get audience feedback.) Availability is a great concern for many companies; however, availability does not address recoverability. I can have a dozen servers in a Real Application Clusters (RAC) environment with a guaranteed uptime of five nines, but that does not help me if an administrator deletes a shared file. Having an Enterprise Storage Area Network (ESAN) is great, but if my SAN suddenly goes down and multiple drives do not come back up, that can spell trouble. I know from experience that this can happen even in an environment with redundant power supplies. I can have virtual versions of my servers ready to be deployed at a moment's notice, but that does not do me any good if a disgruntled employee decides to drop all the application schemas before walking out the door. I can have a Continuity of Operations (COOP) site that is updated in real time, but that does not protect me from data corruption caused by a software bug or virus. Remember that availability and recoverability are different, but both should be evaluated within the requirements of your business and systems.

    7. Causes of Data Loss and Its Cost. Studies and industry experience have shown that there are five or six general causes of data loss. What do you think some of these causes are? (Get audience feedback; see if we can get the top three.)

    8. The Experts Say. In 2003, David Smith, PhD, of Pepperdine University wrote an article for the Graziadio Business Report in which he highlighted six categories of data loss and the percentage of data loss that occurs from each. Ontrack Data Recovery has been studying data loss and performing successful recoveries for over 20 years, and has found that most data loss occurs for the same five reasons. These numbers are just that: numbers. They give you an idea of the areas you need to stay aware of and watch for, but nothing specific to your environment. In the next few slides I hope to give you a few ideas that will get your creative juices flowing, so you can identify scenarios that might affect your environment.

    9. Hardware Failure Recovery: Server Failure(s); Drive Failure(s); ESAN; Disaster Recovery Site. Which hardware failures affect your system depends entirely on your system setup. You might work on a system where, if one drive fails, you could lose everything. Someone else may have an ESAN (Enterprise Storage Area Network) where all the files are striped and mirrored across hundreds of drives for the entire company. A third person may have a Disaster Recovery (DR) site 200 miles away that is constantly updated over the network. For that third person, a catastrophic hardware failure would be one that affects both the primary and the DR site. No matter your setup, the question is: have you considered the "what ifs"?

    10. What If? Server disk failure with the Oracle software binaries; SAN with the redo logs fails; mirrored master destroyed with the administrative files (listener.ora, tnsnames.ora, password file, Data Guard configuration, Enterprise Manager configuration files); RMAN repository failure. Many people consider their system safe, with not much to worry about in their storage solution, and they don't really try to think outside the box. Here are a few scenarios that I hope will get you thinking; they may not apply to your system, but they may lead you to a scenario that does. A power surge just took out the server and the hard drives that house the binaries of your database server. What are you going to do? If you think you will just install the software from the disks again, are you sure you still have those disks? Do you remember the last patch you installed? Did you install any one-off patches? What if the cooling system for the server room fails overnight, the temperature in the server room reaches 130+ degrees, and the SAN crashes? When the SAN is rebooted, the drives that house the redo logs fail to come up. What are you going to do? The mirrored drive for the administrative files fails. The system administrator removes the master drive by accident, and it falls to the floor and breaks. Is the drive backed up, or were you depending on mirroring to keep things safe? What are you going to do? What happens if you lose the server with the RMAN repository? Can you restore it? Can you restore your databases without it?
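RMAN protects database files, not the Oracle software or the administrative files listed above, so those need a copy of their own. The following is a minimal sketch in POSIX shell, assuming `tar` with gzip support and that you pass in the Oracle home and a backup directory yourself; the function name `archive_oracle_home` and any example paths are hypothetical. It writes a timestamped tarball of the whole Oracle home (binaries and patches, plus files such as `network/admin/tnsnames.ora`) and then lists the archive to confirm it can be read back. Files kept outside the Oracle home, and any RMAN recovery catalog, would still need their own backups.

```shell
#!/bin/sh
# Hypothetical helper: archive an Oracle home (software binaries, applied
# patches, and network/admin files such as listener.ora and tnsnames.ora)
# into a timestamped tarball. RMAN does not back these files up.
# Usage: archive_oracle_home ORACLE_HOME_DIR DEST_DIR
archive_oracle_home() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    stamp=$(date +%Y%m%d%H%M%S)
    archive="$dest/$(basename "$src")_$stamp.tar.gz"
    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing the archive is a cheap check that it can actually be read back.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

For example, `archive_oracle_home /u01/app/oracle/product/10.2.0/db_1 /backup/oracle_home` (illustrative paths) would print the path of the new archive. Running it after every patch means "do you remember the last patch you installed?" is answered by an actual copy rather than by memory.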

    11. Human Error: OS Commands; Bad DDL; Bad DML; Compounded by Additional Mistakes; Features Only Help When Enabled; What's Your Plan? The second most prevalent cause of data loss is human error, which accounts for approximately one quarter of data loss. It does not matter how careful, meticulous, or change-management conscious you are; humans are ultimately human, and they make mistakes. I have been on systems with a very stringent change control process, where nothing was allowed on the production system unless it had been peer reviewed and tested on the non-production systems. On one such system, a shell script had been written to traverse the directory tree and delete files that fell within a particular category. It worked fine on the test/development system, but lo and behold, when it was executed on the production server it deleted several database files. Now you might say to yourself, "Any database administrator worth their salt should be able to quickly restore those database files and have the system up and running in a few minutes." Really? What if that particular data file belonged to a read-only tablespace? Would you know when your last backup of the read-only tablespace occurred? Would you know which backup you need to restore that data file? Would you have the needed backup on hand? Humans make mistakes, and those mistakes are often compounded by additional mistakes that quickly pile up. I was working on a system that had been upgraded to 10g. The flash recovery area had been set up and sized appropriately, backups were scheduled, and archive log space was monitored. To appease security, all database users that were not needed, such as the sample schemas and the schemas of Oracle features that were not used, were to be locked and eventually deleted. The schemas were identified and locked; the next step was to delete them after an appropriate time. During the deletion process, one schema that was needed by one of the applications was inadvertently deleted. One of the DBAs said, "Hey, we have flashback; we can recover that schema easily." Well, it would have been easy if someone had actually turned flashback on, not just set the system up for it. We were fortunate that this was the only application on that database and that it was the test system; restoring it to a point in time was good enough for our development team. Can you imagine trying to figure this one out on the fly on a production system? So, could you recover from a dropped schema? How quickly? Would you have to restore the entire database to a point in time to do it? What is your plan? Have you tested your plan?

    12. Software Corruption: Customized COTS / In-House COTS; Leopard OS; Oracle Bug. The last area we are going to discuss in any detail is software corruption, which accounts for about 13% of data loss. This is a very real issue in companies that have customized commercial off-the-shelf (COTS) or in-house built software. I have worked on systems with periodic builds, where a new version of the software was released to the production system every 3 to 6 months. Even on systems with regression testing, independent verification and validation of code, and performance testing, it is possible for a bug to get into the code. More often than not, these bugs do not cause data issues; rather, they cause the software to produce error messages, or a particular function just does not work. However, there are rare occasions when data is changed or deleted. This can happen even with software provided by vendors. Take, for instance, the Leopard operating system. In 2007, a blog entry on tomkarpik.com described a bug in the Leopard operating system that could lead to horrendous data loss if a destination volume disappeared while a move operation was in progress. Say your backup basically moved your data from the production volume to a larger, slower disk system using the Leopard operating system commands. I sure hope that the destination volume does not disappear. Believe it or not, this can actually happen with Oracle. In my first days as a Database Administrator, I walked into a new position on a system that had recently run into an Oracle bug that caused block corruption. This was a critical, high-profile system, and it was down for nearly 3 days while they worked with Oracle to get the database open and perform recovery with as little data loss as possible. They had a standby database in place at the time, but since there was no delay in applying changes to that system, it had the same block corruption. I know that Oracle has come a long way in 10 years, but there are always bugs.

    13. How Would You Recover? DROP SCHEMA CASCADE; Oracle software deletion; wrong data deletion (detected immediately / detected several hours later); batch job corruption; software upgrade; block corruption detected in backup. Have you considered what a user might accidentally do to your system? Have you thought about how you would fix the problem? The key here is that the method for recovering from hardware failure is often different from how you need to recover from human error. One of the big differences between a hardware failure and software corruption or user error is that the latter two can target specific data and might go unnoticed for some time. What can you do to protect against that kind of data loss? You don't need to restore a data file when a user deletes a subset of data from the system, or when a software corruption causes the wrong data to be deleted from a particular schema. The error might occur in such a way that you cannot do a point-in-time recovery, because other data that has since been entered into the system needs to remain. Consider the following scenarios; I hope they inspire your thoughts about what could happen on your system. An administrator has access to both test and production accounts. They are performing updates to the test system for the next build... but wait, they are in the wrong window when they drop the schema with CASCADE (in Oracle, DROP USER ... CASCADE). How can you recover the schema? A Database Administrator logged in as the Oracle software owner, trying to clean out a failed software installation, types find ./* -type f -exec rm {} \; at the wrong directory level. Do you know when your last backup took place? How quickly can you get the Oracle software back in place? Are you sure you have not applied any patches since that backup? The business users identify incorrect data in the system and ask the database administrators to remove it from the application tables. Later in the workday, after the deletion, the application begins to freeze up, and it is determined that the data was needed after all. Can you roll back that deletion? Say you upgrade the COTS software to the latest edition, which modifies the schema objects in your database. The users immediately notice problems with the new software and want to roll back the changes. Can you do it? Do you know what to do if, during verification of your system backups, the monitoring software highlights block corruption in your files?
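The `find` scenario above goes wrong because the command trusts whatever the current directory happens to be. As a hedged sketch of one safer habit (an illustration, not a method from this presentation), the hypothetical `clean_dir` function below takes an explicit target, resolves it to an absolute path, refuses to touch `/`, a protected directory such as the Oracle home, or any parent of it, and only prints the files it would remove unless you add the (also hypothetical) `--really` flag. It assumes a POSIX shell and `find`.

```shell
#!/bin/sh
# Hypothetical guard around a "delete every file under this directory" cleanup.
# It never relies on the current working directory, refuses dangerous targets,
# and by default only prints what it would delete (a dry run).
# Usage: clean_dir TARGET_DIR PROTECTED_DIR [--really]
clean_dir() {
    target=$1
    protected=$2
    mode=${3:-}
    if [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi
    # Resolve to absolute physical paths so "." or symlinks cannot hide the real location.
    abs=$(cd "$target" && pwd -P) || return 1
    prot=$(cd "$protected" 2>/dev/null && pwd -P)
    if [ "$abs" = "/" ]; then
        echo "refusing to clean /" >&2
        return 1
    fi
    # Refuse if the target is the protected directory or any parent of it.
    case "$prot/" in
        "$abs"/*)
            echo "refusing: $abs contains protected directory $prot" >&2
            return 1 ;;
    esac
    if [ "$mode" = "--really" ]; then
        find "$abs" -type f -exec rm -f {} +
    else
        # Dry run: list the files that would be removed.
        find "$abs" -type f -print
    fi
}
```

Reading the dry-run list is the point: `clean_dir /u01/stage/failed_install "$ORACLE_HOME"` (illustrative path) shows what would go, and only after checking it would you rerun the same command with `--really`.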

    14. Counting the Cost. What if someone told you that on December 2nd of next year your system is going to have a hardware failure? What would you do? Would you take preventive action? Would you shut down the system for the day? The fact is that you generally will not know when data loss is going to hit you. The causes of data loss we have discussed are unpredictable and, for the most part, uncontrollable. There are plenty of stories on the internet about very profitable companies that had to close their doors because they lost their server or database and could not recover the data. You can also find articles about companies in trouble with one regulatory agency or another because of missing or corrupted data. The cost of losing, or not being able to access, your data can only be calculated by those in your company. However, a study published by Meta Group of Stamford, CT, in October 2000, "IT Performance Engineering & Measurement Strategies: Quantifying Performance Loss," shows that industries such as energy, telecommunications, and pharmaceuticals can lose millions of dollars for each hour they are unable to get to their data. These figures highlight the effects of data loss in a production system. Have you considered the effects on non-production systems? Even when you lose the ability to access your data in a test or development environment, you are losing money. You have to pay for the time the staff spend restoring, retrieving, or recreating the system or data, and depending on the length of the outage, a release of new features or products might be delayed, costing the company potential revenue.
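Only your company can supply the real figures, but the arithmetic itself is simple. As an illustration, assuming that lost revenue per hour and an hourly staff rate are the only inputs (a deliberate simplification that ignores regulatory penalties and delayed releases), the hypothetical `outage_cost` function multiplies the outage length by the hourly revenue loss plus the cost of the staff working on the recovery.

```shell
#!/bin/sh
# Hypothetical back-of-the-envelope outage cost, in whole currency units:
#   cost = hours * (lost_revenue_per_hour + staff * staff_rate_per_hour)
# All inputs are placeholders your business would supply; integers only.
# Usage: outage_cost HOURS LOST_REVENUE_PER_HOUR STAFF STAFF_RATE_PER_HOUR
outage_cost() {
    hours=$1
    revenue=$2
    staff=$3
    rate=$4
    echo $(( hours * (revenue + staff * rate) ))
}
```

With made-up numbers, an 8-hour rebuild of a test system with no direct revenue at stake, worked by 3 people at $75 an hour, is `outage_cost 8 0 3 75`, which prints 1800. Even a non-production outage has a cost you can put a number on.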

    15. Basics of a Backup So are you sure your system has even a basic backup completed?

    16. What The? I try to visit different Oracle forums every day. I enjoy trying to help people solve problems, and I learn so much from others. You come across many situations where you just sit there and think, "Wow! Why did they not think about that?" However, when I look back on my career I say, "Wow! There are many things I have missed." Often it is the small things that get you. When I originally started backing up databases, my main concern was making sure that the data was safe. That is a great first move, but what about everything else? Recently I saw a posting on one of these Oracle forums from a very frantic person who wanted to get his database up and running again. The data files for the database were safe on the SAN; his problem was that the server the Oracle software normally ran on had crashed. He had another server that he wanted to use as the database server, and he wanted the RMAN commands to restore the database to that server. You hate to be the bearer of bad news and tell the poster, "Sorry, RMAN cannot restore your binary copies of the Oracle software." You might be thinking, "Wow, what a newbie mistake! Why didn't he read the manual?" However, it gets you thinking. How many people who are familiar with Oracle and RMAN are making similar mistakes? How many people know what RMAN does and does not back up?

    17. RMAN Does Not Back Up: Oracle Software Home (binaries), BFILES, Password Files, pfiles (spfiles are covered with newer versions), tnsnames.ora, listener.ora, sqlnet.ora, /etc/oratab, scripts (shell, sql). Some of these answers depend on the database version you work with, but can you name files that RMAN does not back up that we might consider important? (Get Audience Responses) Most of these files can be configured relatively quickly, but does your backup policy cover them so that you can restore them quickly instead of having to set them up again from scratch?
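One way to close that gap is to archive the supporting files alongside the RMAN backups. Below is a minimal Python sketch that assumes typical Unix default locations: `$ORACLE_HOME/network/admin` for the .ora files, `$ORACLE_HOME/dbs` for the password file (`orapw<SID>`) and pfile (`init<SID>.ora`), and `/etc/oratab`. The function name and path list are hypothetical, and your own locations may differ (a custom TNS_ADMIN, or oratab under `/var/opt/oracle` on Solaris, for example). Script directories, or the Oracle Home itself, can be passed through `extra_paths`.

```python
import os
import tarfile
from datetime import datetime

# Typical Unix default locations (assumptions; adjust for your own systems).
SUPPORT_FILES = [
    "/etc/oratab",
    "{oh}/network/admin/tnsnames.ora",
    "{oh}/network/admin/listener.ora",
    "{oh}/network/admin/sqlnet.ora",
    "{oh}/dbs/orapw{sid}",      # password file
    "{oh}/dbs/init{sid}.ora",   # pfile
]


def backup_support_files(oracle_home, sid, dest_dir, extra_paths=()):
    """Bundle files RMAN does not back up into a timestamped .tar.gz.

    Returns (archive_path, missing) so that missing files are reported
    instead of being silently skipped.
    """
    candidates = [p.format(oh=oracle_home, sid=sid) for p in SUPPORT_FILES]
    candidates += list(extra_paths)  # e.g. your shell/SQL script directories
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archive = os.path.join(dest_dir, f"support_files_{sid}_{stamp}.tar.gz")
    missing = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in candidates:
            if os.path.exists(path):
                tar.add(path)  # directories are added recursively
            else:
                missing.append(path)
    return archive, missing
```

Returning the list of missing files, rather than failing, is a deliberate choice: a file that was expected but not found is exactly the kind of gap a backup policy review should surface. The resulting archive still needs to go wherever your other backups go, ideally off the server.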

    18. The Basics: Backup and Recovery Plan; Physical Backups: Data Storage (data files, control files, archived redo), Support Files (binaries, initialization files, scripts, .ora, password); Logical Backups (Exports): logical data structure such as tables, tablespaces, objects, users, data within tables. Since it is impossible for me to know the level of knowledge and expertise of everyone here, we will start with the basics. A great overview of backup and recovery is available on the Oracle Technology Network. A backup and recovery plan refers to the procedures and strategies chosen based on the data retention policies; they protect against data loss and let you reconstruct the database after any kind of data loss. Physical backups cover the files involved with data storage. For the Oracle database this includes data files (which store the data), archived redo logs (the data changes that occurred), and control files (the structure). It also includes supporting files such as the Oracle binaries, initialization files, scripts, tnsnames.ora, listener.ora and password files. Logical backups contain information about the logical data structure and the data within a database. They are snapshots, so you cannot use them to fully recover your database, but some shops find them adequate for their needs. Does anyone in the audience use exports as part of their backup strategy? Why? (Get audience response)

    19. Building a Backup Policy Many shops just back up the database, with little concern beyond having a backup. However, in many industries and types of databases you have to be concerned with less common requirements, such as how long you must keep data, how personally identifiable information is handled, whether the data is encrypted, and the list goes on. So how do you record this information for later reference? You build a backup policy.

    20. Where to Start? Stakeholders Who Cares About The Data? Users Auditor, Lawyer, Regulator Security System Administrators Who Touches The Data? System / Backup Administrators Database Administrators You have to start by understanding the requirements for storing the data backups. So who do you go to in order to figure out those requirements? (Get audience response) You have to reach the appropriate people with the right combination of knowledge. These people are called stakeholders. Basically, a stakeholder is anyone who has an interest in the storage, management, and recoverability of a system's data. The stakeholders for one system may or may not be the same as for another system, even within the same company or the same department. There is no generic list of stakeholders for all projects. I would say that if your system has users, then they are stakeholders. If some regulatory body has an interest in your data, then you need to check with it, or at least with an expert in your company, such as an auditor, lawyer, or business manager who knows the regulations. You need to include the personnel who will handle the actual media the backups will be stored on. You had better include the person who controls the finances of the system, because what good is a policy if you cannot afford to put it into place? Depending on whether it is an in-house application or a COTS system, you might also want to include the developers, who know how the data gets manipulated and whether different parts of the data need special consideration.

    21. Basic Concerns: Size of Database (growth potential), Backup Window, Space Available for Backup, Storage Media Used, Tools Available, Data Retention Times, Acceptable Mean Time To Recovery (MTTR). Everyone's backup and recovery plan will look different, but all plans should cover some basic concerns. What do you think are some of those concerns? (Get audience feedback) What is the current database size and its growth potential? The size, combined with the time frame allowed for backup and restore operations, can affect the technology you use to back up the database. Backing up a multi-terabyte database with short backup and recovery time frames is very different from backing up a 500 GB database. Along with the size, you need to know whether there is a prediction of how fast the database will grow; you don't want the backup system you create today to stop meeting your needs in a couple of years. When is the best time to back up the database? This is affected not only by the load on the database, but also by other work running on the server, the availability of backup media, network load, and other system factors. How much space is available to hold backups? This might dictate whether you can do a full backup or use data file copies, or whether you have to choose a different compression algorithm. What type of media will this system be backed up to? Do you want to back up to disk first and then to tape, so that you have two copies of the backup, or do you need to go to tape immediately? What tools are currently available for performing the backup? Are you locked into using system commands, or can you use RMAN? Is RMAN integrated with your media manager? Are there special data retention times for any part of the system? Do you have to keep data for a particular time period? Will you be able to recover the data 10 years from now?
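The size and growth questions feed directly into the backup window. A back-of-the-envelope check like the Python sketch below can show whether today's approach will still fit in a few years. It assumes constant compound annual growth and a constant backup throughput that you have measured yourself; the function names and all numbers are illustrative.

```python
def projected_size_gb(current_gb, annual_growth_rate, years):
    """Compound growth: size * (1 + rate) ** years (e.g. rate=0.40 for 40%/year)."""
    return current_gb * (1 + annual_growth_rate) ** years


def backup_hours(size_gb, throughput_gb_per_hour):
    """Time for a full backup at a measured, constant throughput."""
    return size_gb / throughput_gb_per_hour


def fits_window(current_gb, annual_growth_rate, years,
                throughput_gb_per_hour, window_hours):
    """Will a full backup still fit in the window after `years` of growth?"""
    size = projected_size_gb(current_gb, annual_growth_rate, years)
    return backup_hours(size, throughput_gb_per_hour) <= window_hours


# Illustrative: a 500 GB database growing 40% a year, backed up at
# 200 GB/hour into a 6-hour window.
for year in range(4):
    size = projected_size_gb(500, 0.40, year)
    print(year, round(size), round(backup_hours(size, 200), 2),
          fits_window(500, 0.40, year, 200, 6))
```

With these made-up figures the full backup grows from 2.5 hours to about 6.9 hours by year three and overruns the window. The same arithmetic, run with restore throughput, tells you whether the acceptable MTTR is still realistic.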

    22. Beyond the Basics: Encryption, Storage of Encryption Keys, Access to Encryption Keys, Design of Database, Read-Only Tablespaces, Tablespace Partitions, Compression Algorithms. There are some things that I don't feel are common to all systems, but you might need to think about them for yours.

    23. Is Your Backup Good? Backup Log, Physical Check, Logical Check, Only good if you can recover. You can check your backup log and make sure there are no errors. You can do a physical check of the files and make sure they exist. You can run a report with your backup tool, such as RMAN, to make sure all the files needed are available and not corrupt. But no matter what you do, your backup is only good if you can recover using it.
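The first two checks are easy to automate. The Python sketch below scans a backup log for RMAN-nnnnn and ORA-nnnnn error codes and confirms that the expected backup pieces exist on disk. The function names are hypothetical, the sample log is made up, and the piece list is assumed to come from your own catalog or backup report. This does not replace the logical check (for example, RMAN's `RESTORE DATABASE VALIDATE`), and none of it replaces an actual test recovery.

```python
import os
import re

# RMAN reports its own errors as RMAN-nnnnn and passes database errors
# through as ORA-nnnnn.
ERROR_CODE = re.compile(r"\b(?:RMAN|ORA)-\d{5}\b")


def scan_backup_log(log_text):
    """Return every log line that contains an RMAN- or ORA- error code."""
    return [line for line in log_text.splitlines() if ERROR_CODE.search(line)]


def missing_backup_pieces(piece_paths):
    """Physical check: return the expected backup pieces that are not on disk."""
    return [p for p in piece_paths if not os.path.isfile(p)]


# Made-up sample log for illustration.
sample_log = """Starting backup at 01-JAN-10
RMAN-03009: failure of backup command on ORA_DISK_1 channel
ORA-19502: write error on file"""

for line in scan_backup_log(sample_log):
    print("ERROR:", line)
```

Matching on the error-code pattern rather than on words like "error" or "failure" keeps the check tied to what the tools actually emit. The channel name `ORA_DISK_1` is not matched on its own because it lacks the five-digit code.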

    24. How To Be Sure You Can Recover Once you have implemented your backups according to your backup policies, how do you know that your backup is good? How do you know that you have everything you need to recover the database? (Get Response from Audience)

    25. What Is Your Source? Memory / Experience, Oracle Documentation / Books, Internet Search Engines, Co-worker, Monitoring Tools (e.g. Oracle Enterprise Manager), Customized Documentation. Imagine this situation: you get a call at 2:00 am. The system is down, the server was corrupted, and we have the SAN but nothing else. Can you get the database running? The company could lose hundreds, thousands, or even millions of dollars if you don't get the system restored quickly. What sources of information are you relying on to help you get your system back up as quickly as possible? (Get Audience Responses) Do you rely on your experience and memory of the system and commands to perform the recovery? Do you have Oracle documentation or books handy on the shelf if you need to refer to them? Do you use internet search engines? I hope that you have a connection near your workstation. Do you call on a co-worker to help you out? Do you use monitoring tools such as Oracle Enterprise Manager or some other third-party tool to do your restores? What about company documentation?

    26. Are You A Single Point Of Failure? Let us say that your cell phone dies while you are sitting in this session and the office has no way to get hold of you. The database suddenly crashes. Can the people you work with recover it quickly? I wonder if any of you have heard of the milk truck scenario. What would happen if someone on the team was hit by a milk truck? Would something get lost? Would we be able to carry on? So my question to you is this: if you or someone else suddenly never returns to the office, would knowledge needed for a quick recovery of your system be missing?

    27. Backup And Recovery Document Why spend time trying to remember commands or how to do something? Would it not be nice to have a new database administrator join the staff and be able to jump almost straight into handling the backup and recovery needs of the system? So take some time and create a Backup and Recovery Document.

    28. Documentation Is Your Friend: Good Business Sense, Every System Is Different, Boosts Ability to Concentrate, Gain Experience and Knowledge, Refine Backup / Restore Policies, Refine Procedures. There are so many good reasons for creating and maintaining accurate documentation. It just makes good business sense. Have you ever been at a company where a person with detailed knowledge about a particular aspect of your system decided to leave? Did you have to scramble to get the knowledge from them, or had that knowledge been shared with someone else or documented? How many people here work for a company that has multiple systems? Of those people, put your hand down if all of the systems are configured in the same manner. So what do you find is the biggest challenge of working on multiple systems with different configurations? (Get answers from audience) Even if you are not the primary DBA on the system, what happens if the primary DBA is not around? Do you know everything you need to restore that system if something goes wrong? Have you ever had your boss or someone else looking over your shoulder while you are under pressure to restore a system? It is a nerve-wracking experience; it is hard even to remember passwords in that situation, much less the proper syntax. There is no better learning experience than writing the documentation: you gain knowledge and experience as you go. It also helps you see the gaps in your knowledge, requirements, and policies so you can refine them.

    29. B&R Document 20,000 ft View Overall Backup Strategy Architecture Summary Script Listing and Description Procedures Test Documentation What types of things do you believe should be in a Backup and Recovery Document? (Get Audience Feedback) I think that a Backup and Recovery Document should have four main topics: outline the overall backup strategy, summarize the architecture of the system, highlight the scripts used, and provide test documentation showing that the strategy works. The test cases can also be written so that they serve as standard operating procedures (SOPs) for situations that occur on the live system.

    30. Overall Backup Strategy: Types of Backups And Reasons; Physical: Hot / Cold, Full / Incremental; Exports: Full, Schema, Table, (Transportable) Tablespace; Tools; Scheduling; Notification; Retention Policies (Time and Off-site Location); System Specifics. Let us go a little deeper. What do you think should be included in the overall backup strategy? (Get answers from audience) Many times, when reviewing this type of information as I come onto a new system, I have said, "Why did they set things up like that?" I would ask around and often get the answer, "Not sure, it was that way when I got here." Now that is frustrating: a system set up in a particular way with no knowledge or history of why. It is especially frustrating when you see what you believe are obviously better solutions but have to research whether there was a legitimate reason they were not implemented. It is important to document the reasons for pertinent decisions. For example, if you decided that you needed to destroy your backups every night because of data retention policies, note those policies and any supporting information so that anyone new to the system has some type of reference. If you decide to schedule your backups at 1:00 am because of jobs scheduled by the system administrators, note it. One last note about the backup strategy: you really need to be prepared for the off chance that your live production site is destroyed by a disaster such as a fire or the sprinkler system going off. Get your backups off-site!
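Retention decisions are worth writing down in a form that someone else can check. RMAN can enforce a recovery-window policy natively (`CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF n DAYS`). The Python sketch below only illustrates the reasoning behind such a rule, simplified to full backups and ignoring incrementals and archived logs; its function name and dates are hypothetical. The key point is that recovering to the very start of the window needs the newest backup taken *before* the window opens, not just the backups inside it.

```python
from datetime import datetime, timedelta


def backups_to_keep(full_backup_times, now, recovery_window_days):
    """Full backups needed to recover to any point in the last N days.

    Keeps every backup inside the window plus the newest one taken at or
    before the window start, since recovering to the window's earliest
    point must begin from a backup no newer than that point.
    """
    window_start = now - timedelta(days=recovery_window_days)
    ordered = sorted(full_backup_times)
    before = [t for t in ordered if t <= window_start]
    inside = [t for t in ordered if t > window_start]
    return ([before[-1]] if before else []) + inside


weekly = [datetime(2010, 1, d) for d in (1, 8, 15, 22, 29)]
keep = backups_to_keep(weekly, now=datetime(2010, 1, 31), recovery_window_days=7)
print([t.day for t in keep])  # [22, 29]
```

The archived redo logs from the oldest kept backup onward must be retained as well. Recording a rule like this in the Backup and Recovery Document, together with the policy that drives it, answers the "why is it set up like that?" question before anyone has to ask it.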

    31. Architecture Summary: Server Configuration, Tool Integration, Database Configuration. How many of you have become the administrator of a system that came with no documentation on the server or database configuration? Documenting this type of information is good practice for any system, but the more systems you oversee, the more important it becomes. Depending on the environment in which you work, the detailed information may live in another document. List references to those documents, but still summarize the server and database configuration so that you have an overview to help you if you need it. If you use tools such as RMAN or NetBackup to back up your database, describe how the tools are integrated into the system: where they reside, how to access them, and any information you might need in a pinch. Oracle has always been pretty good at recovering from media failure, and since 9i Oracle has continued to make new technologies available that make recovering from data errors much easier.

    32. Tools and Technology Available: Media Failure: Restore Media from Backup; Recover using RMAN or SQL Commands: Full, Partial, Tablespace point-in-time (TSPITR), Time-based (PITR), Cancel-based, Change-based. Human or Software Error: Flashback. Oracle provides a great overview of backup and recovery. The tools and technologies it references depend on the database version, but the information is enlightening. Media failure is the case most people are familiar with, and recovering from it generally depends only on having a good backup copy of the media you want to restore and the archived redo logs up to the point to which you want to recover. You can do full or partial recovery, but in general it requires restoring or recovering at least an entire tablespace to a point in time, and it includes all the activity on the tablespace or database up to that point. In earlier releases, recovering from human or software errors could be complicated, especially when system design or business needs prevented restoring the entire tablespace or database to a previous point in time. The process involved restoring a tablespace or the entire database to another system, recovering it to a point in time before the error started, exporting the tables involved, and importing them back into the production database. Not only was this complicated, it often introduced data mismatches and further system problems. In more recent versions, Oracle has made technological advances in recovering from human errors. The overall technology is called Flashback Technology, and it provides a set of features to view and rewind data back and forth in time. This technology does require setup, and you really need to practice with it to be able to work quickly, but it is well worth learning.
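The dependency on archived redo can be made concrete. The Python sketch below picks the archived logs needed to roll a restored backup forward from its checkpoint SCN to a target SCN, and refuses to continue if the chain has a gap. The field names echo the FIRST_CHANGE# and NEXT_CHANGE# columns of V$ARCHIVED_LOG, but the function and data are hypothetical. The sketch treats the target SCN as inclusive for simplicity and ignores multiple redo threads and incarnations. In practice RMAN works this out for you during RECOVER.

```python
from typing import List, NamedTuple


class ArchivedLog(NamedTuple):
    sequence: int
    first_change: int  # lowest SCN recorded in this log
    next_change: int   # first SCN of the following log


def logs_needed(logs: List[ArchivedLog], backup_scn: int, target_scn: int) -> List[int]:
    """Sequences to apply to roll a backup (checkpoint at backup_scn) forward
    to target_scn. Raises if redo is missing, because recovery cannot skip a gap."""
    needed = sorted(
        (log for log in logs
         if log.next_change > backup_scn and log.first_change <= target_scn),
        key=lambda log: log.first_change,
    )
    covered_to = backup_scn  # redo must be continuous from here on
    for log in needed:
        if log.first_change > covered_to:
            raise ValueError(f"redo gap before sequence {log.sequence}")
        covered_to = log.next_change
    if covered_to <= target_scn:
        raise ValueError("available redo stops before the target SCN")
    return [log.sequence for log in needed]


log_chain = [ArchivedLog(9, 50, 100), ArchivedLog(10, 100, 200),
             ArchivedLog(11, 200, 300), ArchivedLog(12, 300, 400)]
print(logs_needed(log_chain, backup_scn=150, target_scn=350))  # [10, 11, 12]
```

A single missing log anywhere between the backup and the target point is enough to stop recovery at that gap. This is why protecting the archived redo matters as much as protecting the backups themselves.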

    33. Flashback 9i and 10g R1: Oracle 9i: Flashback Query; Oracle Database 10g R1: Flashback Database, Flashback Table, Flashback Drop, Flashback Versions Query, Flashback Transaction Query. Oracle9i introduced Flashback Query to provide a simple, powerful and completely non-disruptive mechanism for recovering from human errors. It allows users to view the state of data at a point in time in the past without requiring any structural changes to the database. Oracle Database 10g extended Flashback Technology to provide fast and easy recovery at the database, table, row, and transaction level. Flashback Technology revolutionizes recovery by operating just on the changed data, so the time it takes to recover from an error is now equal to the time it took to make the mistake. Oracle 10g Flashback Technologies include Flashback Database, Flashback Table, Flashback Drop, Flashback Versions Query, and Flashback Transaction Query.
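The idea behind Flashback Query and Flashback Versions Query is reading data as it was at an earlier point without changing the current data. A toy version store can illustrate this. The Python sketch below is only a conceptual model: Oracle itself reconstructs past versions from undo data, and the `VersionedRow` class is hypothetical. In the database, the equivalent reads use the `AS OF SCN` / `AS OF TIMESTAMP` and `VERSIONS BETWEEN` clauses of a SELECT.

```python
from bisect import bisect_right


class VersionedRow:
    """Toy model: keep (scn, value) versions so that a past state can be read
    without touching current data. Not how Oracle stores versions."""

    def __init__(self):
        self._scns = []
        self._values = []

    def write(self, scn, value):
        """Record a committed change; SCNs must strictly increase."""
        if self._scns and scn <= self._scns[-1]:
            raise ValueError("SCNs must increase")
        self._scns.append(scn)
        self._values.append(value)

    def as_of(self, scn):
        """Like AS OF SCN: the value committed at or before scn (None if none)."""
        i = bisect_right(self._scns, scn)
        return self._values[i - 1] if i else None

    def versions_between(self, lo, hi):
        """Like VERSIONS BETWEEN: every version committed in [lo, hi]."""
        return [(s, v) for s, v in zip(self._scns, self._values) if lo <= s <= hi]


salary = VersionedRow()
salary.write(100, 5000)     # original value committed at SCN 100
salary.write(200, 0)        # mistaken update committed at SCN 200
print(salary.as_of(150))    # 5000: the value before the mistake
```

Reading the pre-error value back is what makes repair cheap. Only the changed data is involved, instead of a restore of the whole tablespace to another system.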

    34. Flashback 10g R2 and 11g. Oracle Database 10g R2: Restore Points, Flashback Database Through RESETLOGS. Oracle Database 11g R1: Flashback Transaction, Flashback Data Archive. Oracle Database 11g R2: Flashback Data Archive tracks most DDL. With Release 2 of Oracle 10g you gained the ability to create restore points; a guaranteed restore point ensures that the flashback logs needed to get back to it are always retained. This eliminates the need to work out which SCN or clock time to return to if you are about to perform a major upgrade or system change and want to be able to use Flashback Database to undo it. Flashback Database was also extended so that it can use flashback logs created before a RESETLOGS operation. This is very useful if a logical error went undiscovered for a substantial amount of time and a RESETLOGS was performed in the meantime. New in 11g is Flashback Transaction, with which a single transaction and all of its dependent transactions can be backed out. This can be done with a single PL/SQL call (DBMS_FLASHBACK.TRANSACTION_BACKOUT) or through an Enterprise Manager wizard. Oracle Database 11g extends Flashback Technology even further with Flashback Data Archive, which can automatically track and maintain historical changes to some or all Oracle data. In Oracle Database 11g Release 2 (11.2), Flashback Data Archive supports most DDL commands on the tables it tracks. These include adding, dropping, renaming, and modifying columns; dropping and truncating partitions; renaming and truncating tables; and adding, dropping, renaming, and modifying constraints.
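A guaranteed restore point is most useful as a bracket around a risky change such as an upgrade. The Python sketch below, which again only assembles statement text, lists the steps in order. The restore point name and the grouping into phases are assumptions, and it presumes the database is already in ARCHIVELOG mode with a flash recovery area configured; check the prerequisites in the Backup and Recovery guide for your release. The SHUTDOWN and STARTUP lines are SQL*Plus commands run as SYSDBA.

```python
import re

# Simplified check for an unquoted restore point name (assumption for the sketch).
_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def restore_point_plan(name: str) -> dict:
    """Statements for bracketing a risky change with a guaranteed restore point.

    Returns the statements grouped by phase; nothing here touches a database.
    """
    if not _NAME.match(name):
        raise ValueError(f"unexpected restore point name: {name!r}")
    return {
        # Before the change: keep the flashback logs needed to come back here.
        "before_change": [
            f"CREATE RESTORE POINT {name} GUARANTEE FLASHBACK DATABASE",
        ],
        # If the change fails: rewind the whole database (run as SYSDBA).
        "undo_change": [
            "SHUTDOWN IMMEDIATE",
            "STARTUP MOUNT",
            f"FLASHBACK DATABASE TO RESTORE POINT {name}",
            "ALTER DATABASE OPEN RESETLOGS",
        ],
        # If the change succeeds: a guaranteed restore point never ages out,
        # so drop it to release the flashback logs it is holding.
        "after_success": [
            f"DROP RESTORE POINT {name}",
        ],
    }


if __name__ == "__main__":
    for phase, statements in restore_point_plan("before_upgrade").items():
        print(f"-- {phase}")
        for statement in statements:
            print(statement)
```

The undo phase discards every change made after the restore point, not only the failed upgrade, which is why the restore point should be created immediately before the change begins.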

    35. Cheat Sheet. I work on three different projects, and each system is set up differently. They all have different locations for their ORACLE_HOME, administration scripts, SQL scripts, backup logs, and pretty much everything else. I barely touch one of the systems, but I am still on call for it, so I keep a cheat sheet. If there is a problem, I can quickly get to the information, scripts, and logs I need. The Excel spreadsheet I keep would not fit on a slide very well, but I wanted to give you an idea of the type of information I keep.

    36. Cheat Sheet Continued. Locations: ORACLE_HOME; Oracle user home; administration SQL scripts; administration shell scripts; RMAN/backup scripts; backup logs; backup storage; control files; archive logs. What do you think? If a new Database Administrator comes onto the job, don't you think they would find this valuable? Do you think it would take you much time to fill something like this out for your system? Would it benefit you?
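If the cheat sheet is kept as data, gaps are easy to spot before they matter during an outage. The Python sketch below records the location fields from this slide for each system and reports which ones are still blank. The system name and path are hypothetical placeholders, and a spreadsheet export or any other format would serve just as well.

```python
# Location fields taken from the cheat-sheet slide.
CHEAT_SHEET_FIELDS = (
    "ORACLE_HOME",
    "Oracle user home",
    "Administration SQL scripts",
    "Administration shell scripts",
    "RMAN/backup scripts",
    "Backup logs",
    "Backup storage",
    "Control files",
    "Archive logs",
)


def missing_entries(sheet: dict) -> dict:
    """Return, for each system, the cheat-sheet fields that are absent or blank."""
    report = {}
    for system, locations in sheet.items():
        gaps = [f for f in CHEAT_SHEET_FIELDS if not locations.get(f, "").strip()]
        if gaps:
            report[system] = gaps
    return report


# Hypothetical system and path, for illustration only.
sheet = {
    "PROJECT_A": {
        "ORACLE_HOME": "/u01/app/oracle/product/10.2.0/db_1",
        "Backup logs": "",
    },
}

if __name__ == "__main__":
    for system, gaps in missing_entries(sheet).items():
        print(f"{system}: fill in {', '.join(gaps)}")
```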

    37. Script Listing and Description. Location; usage; execution syntax; parameters with descriptions. If you have created scripts to perform backups and restores, list them in your backup and recovery (B&R) document. Give the name, location, usage, execution syntax, and any parameters each script takes, with descriptions.
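One way to keep these entries consistent is to hold them as structured records and generate the document text from them. The Python sketch below uses the fields named on this slide; the script name, path, and parameters in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptEntry:
    """One backup/restore script as it should appear in the B&R document."""
    name: str
    location: str
    usage: str
    syntax: str
    parameters: dict = field(default_factory=dict)  # parameter -> description

    def to_text(self) -> str:
        """Render the entry as plain text for the B&R document."""
        lines = [
            f"Script: {self.name}",
            f"Location: {self.location}",
            f"Usage: {self.usage}",
            f"Syntax: {self.syntax}",
            "Parameters:",
        ]
        if self.parameters:
            lines += [f"  {p}: {desc}" for p, desc in self.parameters.items()]
        else:
            lines.append("  (none)")
        return "\n".join(lines)


# Hypothetical script, for illustration only.
entry = ScriptEntry(
    name="rman_level0.sh",
    location="/home/oracle/admin/scripts",
    usage="Weekly level 0 RMAN backup of one database",
    syntax="rman_level0.sh <ORACLE_SID> <backup_dir>",
    parameters={
        "ORACLE_SID": "instance to back up",
        "backup_dir": "directory that receives the backup pieces",
    },
)

if __name__ == "__main__":
    print(entry.to_text())
```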

    38. Test Documentation. Backup procedures; recovery scenarios to test; document restore procedures. We said earlier that a backup is only good if you can use it to restore. So test, test, and then test some more. Get some stakeholders together and brainstorm scenarios that might cause data loss on your system. Use the ones mentioned earlier in this presentation and those I list at the end of this slide show. Take those scenarios and see if you can recover from each problem. Take the time to document the steps and have another DBA attempt the recovery using your instructions. You will learn things you don't know, get to know the system better, and have a blueprint if the scenario ever occurs. Remember, however, that things always change and you will never be able to document everything that can go wrong, but you will at least have a better chance of dealing with problems when they do occur.
I speak from experience. I created B&R test documentation for a system, left the system for a few years, and recently returned to work on it. One of the DBAs who had remained on the project got on my case, saying that they had tried to use the documentation to do a recovery and it had failed. After talking to her, I found two things: no one had practiced the procedures since I had left, and they had upgraded from 9i to 10g while I was gone. The commands that worked just fine in 9i did not do so well in 10g. So remember: keep practicing the procedures and review them before upgrades. Features and commands do sometimes change between versions.
As a last thought, there was a story I saw about a company manager who had made sure that the database administrator was backing up the system. The DBA assured him that the system was backed up constantly and that they did not need to worry about data loss. Evidently, the "backup" the DBA was referring to was that the data disks were mirrored, and mirroring copies a deletion to every disk just as faithfully as it copies good data. Guess what happened when the manager and the DBA had a falling out: on the way out the door, the DBA deleted all the files on the system. The company did not survive. Do you think the result would have been the same if the manager had requested the documentation we have talked about today?
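The lesson from the 9i-to-10g upgrade story can be turned into a simple check. The Python sketch below flags documented recovery procedures that have never been rehearsed, were last rehearsed on an older version or before the most recent upgrade, or simply have not been practiced recently. The 180-day threshold and the field names are assumptions for illustration, not a rule from this presentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class RecoveryProcedure:
    scenario: str
    last_tested: Optional[date]  # None means the steps were never rehearsed
    tested_on_version: str       # database version the steps were last run against


def needs_retest(
    procedures: List[RecoveryProcedure],
    current_version: str,
    last_upgrade: date,
    today: date,
    max_age_days: int = 180,  # assumed rehearsal interval
) -> List[Tuple[str, str]]:
    """Return (scenario, reason) pairs for procedures that should be practiced again."""
    due = []
    for p in procedures:
        if p.last_tested is None:
            due.append((p.scenario, "never tested"))
        elif p.tested_on_version != current_version or p.last_tested < last_upgrade:
            due.append((p.scenario, "not tested since upgrade"))
        elif (today - p.last_tested).days > max_age_days:
            due.append((p.scenario, "test is out of date"))
    return due


if __name__ == "__main__":
    # Hypothetical procedures and dates, for illustration only.
    procedures = [
        RecoveryProcedure("Loss of a control file", date(2008, 3, 1), "9i"),
        RecoveryProcedure("Recovery of a dropped table", None, "10g"),
        RecoveryProcedure("Loss of current online redo group", date(2010, 8, 1), "10g"),
    ]
    for scenario, reason in needs_retest(procedures, "10g", date(2009, 6, 1), date(2010, 9, 20)):
        print(f"{scenario}: {reason}")
```

Running a check like this before every upgrade would have caught the procedures in the story, since they were last exercised on 9i.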

    39. Media Loss. Loss of a control file; loss of a data file for a tablespace (SYSTEM, rollback segment, UNDO, user data, index, read-only, partition); loss of a redo log file (inactive online, current online, archived); loss of an entire redo group (inactive online, current online, archived); data block corruption (physical, logical, in backup). These scenarios are not all-encompassing, but they give you a good starting point.

    40. Recovery of Entire Database. Recovery with no RMAN catalog (with and without a control file, with and without redo logs); recovery to a new machine; recovery to a new file system; point-in-time recovery of the entire database; recovery of the RMAN catalog; creation of a standby database; creation of a duplicate database on a test system.

    41. More Than Just One File. If the database crashes during a backup; if the binaries are destroyed; if the entire database server has to be replaced; if the SAN loses multiple drives; if the database crashes while tables are being moved; if the database crashes while Flashback Technology is in use; if a read-only tablespace was created before the last backup; if a read-only tablespace was created after the last backup.

    42. User / Software Error (Flashback). Recovery of a dropped schema; recovery of a dropped table; data corruption in a row; transaction flashback (a single transaction, or it and all resulting transactions); software installation failure; data corruption in an entire schema; data corruption in a schema that occurred 5 hours ago, while the rest of the database must keep its current state; a trigger or procedure recompiled with the wrong code.

    43. Visit the IOUG Booth This Week. Located in the User Group Pavilion, Moscone West, 2nd Floor. Learn why over 23,000 people have joined IOUG and what it can do for you. Chat with the IOUG Board of Directors. Hear about new regional IOUG BI user communities. Find out how to submit an abstract for COLLABORATE 11 IOUG Forum. Enter for a chance to win a COLLABORATE 11 registration. Stock up on IOUG gear and educational materials!

    44. I hope that when you leave here you will be able to say either, "Yes, I am sure that I can restore in any circumstance," or perhaps, "You know, I did not think of that." And the best possible answer would be, "Hmm, I am going to go back to work and make sure I can restore in any circumstance."
