This is my last post in a series called Preventing a Disaster: My Methodology. In Part 1, I discussed the importance of running DBCC CHECKDB on your databases and provided tips on how to do this on VLDBs and very busy systems. In Part 2: Backups, I discussed the importance of a DBA knowing the RPO/RTO (Recovery Point Objective/Recovery Time Objective) of the business. It is the RPO/RTO of a company that should determine your backup policies and procedures. The third installment discussed Off-site Locations and the fact that your backup strategy is not complete until you have a copy of your backups off the physical site.
In this installment I would like to discuss my last point in Preventing a Disaster, and that is to ask yourself, “are my backups valid?” Earlier, in post 2, I introduced the concept of “verifying” your backups, which should be a part of your backup process. Verification checks that the backup set written to disk is complete and readable and, if the backup was taken with checksums, that those checksums still match. It does not prove that the backup can actually be restored.
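As a rough illustration of that verify step, the sketch below builds the two T-SQL statements usually involved: a BACKUP taken WITH CHECKSUM, followed by RESTORE VERIFYONLY against the same file. The helper names, the database name `Sales`, and the backup path are hypothetical and not from the earlier posts. The script only generates text; running it against a server is left to whatever tool you already use.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """Wrap text in a Unicode string literal, doubling any single quote."""
    return "N'" + text.replace("'", "''") + "'"


def backup_and_verify_script(database: str, backup_path: str) -> str:
    """Return T-SQL that backs up with checksums, then verifies the backup file.

    Hypothetical helper: the names and layout are assumptions for illustration.
    """
    disk = quote_literal(backup_path)
    return "\n".join([
        # CHECKSUM makes SQL Server write page and backup checksums into the file.
        f"BACKUP DATABASE {quote_name(database)} TO DISK = {disk} WITH CHECKSUM;",
        # VERIFYONLY reads the backup set and re-checks those checksums.
        # It does not restore anything.
        f"RESTORE VERIFYONLY FROM DISK = {disk} WITH CHECKSUM;",
    ])


if __name__ == "__main__":
    # Example values only; substitute your own database and path.
    print(backup_and_verify_script("Sales", r"D:\Backups\Sales_full.bak"))
```

A successful verify only tells you the file is readable, which is why the restore test described next is still required.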
To properly validate your backups, a DBA must perform a RESTORE of that backup to confirm that 1) the backup can be restored, 2) the RESTORE process works, and 3) the backup process itself is sound.
This can be done on a development box, your desktop, or a VM server that you can blow away later. It really doesn’t matter; the point is to restore the database to a SQL Server instance. After the restore is complete, run DBCC CHECKDB on the restored database to validate its integrity.
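The restore test above can be sketched as a small script generator: restore the backup under a scratch name, relocate its files with MOVE, then run DBCC CHECKDB on the result. Everything named here (the helper functions, `Sales_RestoreTest`, the `T:\` paths) is a hypothetical example. The logical file names must match what the backup actually contains, which you can list with RESTORE FILELISTONLY.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """Wrap text in a Unicode string literal, doubling any single quote."""
    return "N'" + text.replace("'", "''") + "'"


def restore_test_script(backup_path: str, scratch_db: str, file_moves: dict) -> str:
    """Return T-SQL that restores a backup to a scratch database and checks it.

    file_moves maps each logical file name in the backup to the physical
    path it should be restored to on the test server (an assumption of this
    sketch: the caller has already looked the logical names up).
    """
    if not file_moves:
        raise ValueError("file_moves must map at least one logical file name to a path")

    # One MOVE clause per file so the restore does not reuse production paths.
    options = [
        f"MOVE {quote_literal(logical)} TO {quote_literal(physical)}"
        for logical, physical in file_moves.items()
    ]
    # RECOVERY brings the database online; STATS = 10 reports progress every 10%.
    options += ["RECOVERY", "STATS = 10"]

    db = quote_name(scratch_db)
    return "\n".join([
        f"RESTORE DATABASE {db} FROM DISK = {quote_literal(backup_path)}",
        "WITH " + ", ".join(options) + ";",
        # Integrity check on the restored copy, reporting errors only.
        f"DBCC CHECKDB ({db}) WITH NO_INFOMSGS, ALL_ERRORMSGS;",
    ])


if __name__ == "__main__":
    # Example values only; adjust names and paths for your test server.
    print(restore_test_script(
        r"T:\restore\Sales_full.bak",
        "Sales_RestoreTest",
        {
            "Sales": r"T:\data\Sales_RestoreTest.mdf",
            "Sales_log": r"T:\data\Sales_RestoreTest.ldf",
        },
    ))
```

Restoring under a different database name keeps the test from colliding with a real database on the same instance, and the scratch copy can be dropped once DBCC CHECKDB comes back clean.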
To properly test your entire backup procedure, a DBA should get a backup file (or files) from archive, tape, or off-site, i.e. from its final resting place, and restore that backup. Restoring last night’s backups is only half the process.
I once worked in a shop where a SQL Server backed up to a network share, and a 3rd party file backup solution then archived those files to tape. Their policy was to keep one month’s worth of tape. So, just for kicks, I asked for access to a 29-day-old backup file to test restores. Unfortunately, the Backup Administrator did not know how to retrieve a file from tape and place it on a network share. He was competent at getting all the necessary files written to tape, but was unsure how to retrieve data (in his defense, he was new and had never been asked to do a restore from tape). The “backup procedures” as a whole were broken.
In Summary
In Preventing a Disaster: My Methodology, I hope I have explained what DBAs should do and why it is important not to skip a step. Be aware of who else is involved in the process and work closely with them to execute and test the process.