Preventing a Disaster: My Methodology–Part 3

Welcome back to part 3 of my four-part series on Preventing a Disaster. In Part 1, I discussed the importance of database integrity checks and possible ways to run them against VLDBs (very large databases). The second article was an explanation of backups in general.

In this entry, I would like to discuss the concept of “Off-Site Locations”.

It is my belief that to truly say you have an effective disaster recovery plan, your plan needs to include how to get the SQL backups off-site.

There are a couple of options that I prefer and then there are less desirable options that are still technically effective.

Option 1 – The Cloud

Several cloud vendors provide storage containers, i.e. hard drive space.  Of course, MS SQL works best with MS Azure.  SQL Server 2012 SP1-CU2 and SQL Server 2014 provide the ability to back up directly to an Azure storage container.  This pretty much combines steps 2 and 3 of my methodology into one efficient step.  In April of 2014, Microsoft released a separate tool that allows previous versions of SQL Server to back up directly to Azure as well.
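To give a feel for what this looks like, here is a minimal sketch of a credential-based backup to an Azure blob container. The database name, storage account, container, and key placeholder are all hypothetical; yours would obviously differ.

```sql
-- Storage account, container, key placeholder, and database name are all hypothetical
CREATE CREDENTIAL AzureBackupCredential
WITH IDENTITY = 'mystorageaccount',              -- Azure storage account name
     SECRET   = '<storage account access key>';  -- access key from the Azure portal

-- Back up directly to an Azure blob container
BACKUP DATABASE [SalesDB]
TO URL = 'https://mystorageaccount.blob.core.windows.net/sqlbackups/SalesDB_Full.bak'
WITH CREDENTIAL = 'AzureBackupCredential',
     COMPRESSION,
     STATS = 10;
```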

However, there are some downsides in my opinion.  Your server obviously needs an internet connection to the outside world, and you have to have purchased an Azure account with the appropriate storage size. And as that storage blob grows, so does your monthly bill.

The benefits of using Azure storage include compression, encryption, and seamless integration into SQL Server.

Option 2 – SAN Replication

One option that I have seen to be successful is what I am calling SAN replication. If your company has a backup datacenter in a different location, then chances are you have a SAN storage array there.

In this configuration, you would take native SQL compressed backups to a local SAN, other than your data SAN of course!  Then that SAN is replicated to your secondary datacenter SAN, either using SAN snapshots or true block-by-block replication.
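As a rough sketch, the local leg of this is just a plain native compressed backup pointed at a volume presented from the backup SAN; the database name and drive letter below are made up, and the replication to the secondary datacenter happens at the SAN layer, outside of SQL Server.

```sql
-- B:\ stands in for a hypothetical volume on the backup SAN (not the data SAN)
BACKUP DATABASE [SalesDB]
TO DISK = N'B:\SQLBackups\SalesDB_Full.bak'
WITH COMPRESSION,
     CHECKSUM,   -- verify page checksums as the backup is written
     STATS = 10;
```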

This method can be very effective in getting your data off-site.  It may take a little longer, but it usually gets the job done. The major downside to this is cost.  The cost of running a second datacenter and a second comparable SAN is enormous.  That is one of the reasons why cloud storage is becoming a more viable option as time goes by.

Option 3 – Personal Relocation

This is personally my favorite! (just kidding)  In this method, you would use a native SQL backup with compression targeted to an external drive, and then at the end of the day personally take that external drive off-site.
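From SQL Server's point of view, this is the same native compressed backup as Option 2, just pointed at the external drive's letter. The drive letter and database name below are, again, just examples.

```sql
-- E:\ stands in for a hypothetical drive letter assigned to the external USB drive
BACKUP DATABASE [SalesDB]
TO DISK = N'E:\OffsiteBackups\SalesDB_Friday_Full.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;
```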

Now, I am sure some of you are laughing at this method, but with the cost of consumer hard drives rapidly decreasing, this is a very viable option for some smaller companies.  I actually knew a company that did this every Friday, and the IT manager relocated the USB external drive to a bank vault.  The company purchased 5 drives large enough to hold a week's worth of backup files and rotated them out weekly.  This method allowed them to keep 30 days' worth of backups off-site at all times.

This method is probably the cheapest; however, it is not necessarily the safest.

Wrap-up

I am sure there are many other scenarios that are effective for getting SQL backups off-site; these are the ones that I have seen work successfully in the real world.

The important thing to remember, and what to take away from this post, is to get your backups off-site.  In the event of your primary datacenter crashing, you need to be able to get your data restored ASAP. And if your most recent backup is on a server or SAN in that datacenter, your recovery time has just increased dramatically.

Your backup plan is not complete until your backups are off-site!

Preventing a Disaster: My Methodology

I was hoping a career in consulting would possibly spark blog ideas! 

One thing I am passionate about with SQL Server is disaster recovery.  Having worked for a hospital during hurricane season as a DBA, I truly had some sleepless nights wondering if I could bring a second system up successfully in the event of a total disaster. Medical data is of the utmost importance in the field of health care.  And I am sure everyone has the stance that “their data is the most important data”!  That is why you must protect it at all costs.

Either way, as a DBA it is my job to be able to stand up a second server as soon as possible in the event of a total disaster.  If it were only as easy as pushing a button, our job as DBAs would be much easier.  Unfortunately, proper disaster recovery requires forethought, planning, and testing.

I have a 4-fold plan for disaster recovery, and I would like to discuss my thoughts on it.  Part one will be discussed here, with later parts in subsequent posts.

  1. Integrity Checks
  2. Backups
  3. Off-Site duplicates
  4. Recovery Testing

1. Integrity Checks

Most everyone is aware of the DBCC CHECKDB command, and it is vital to the stability of your databases.  If you are not familiar with it, then please, for the love of the SQL Gods, learn about it!

It is of the utmost importance to run these checks as often as feasible.  I typically run integrity checks once a day during non-business hours or downtime. This process will be resource-intensive, which is why it needs to be done during downtime.
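For reference, the nightly check itself can be as simple as the following; SalesDB is just a placeholder name, and you would schedule something like this per database in an off-hours Agent job.

```sql
-- Hypothetical database name; run during downtime or a maintenance window
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```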

Now some of you are going to say, “I tried running DBCC CHECKDB on my 350 GB database and it brought my server to its knees, so I stopped doing them.” All I can say is I hope your data pages are not corrupt.

Per MSDN, DBCC CHECKDB does the following:

  • Runs DBCC CHECKALLOC on the database.

  • Runs DBCC CHECKTABLE on every table and view in the database.

  • Runs DBCC CHECKCATALOG on the database.

  • Validates the contents of every indexed view in the database.

  • Validates link-level consistency between table metadata and file system directories and files when storing varbinary(max) data in the file system using FILESTREAM.

  • Validates the Service Broker data in the database.

Now that’s a whole lot of checking!  If the CHECKDB command does all of these, then we could possibly shorten the nightly duration by manually executing CHECKALLOC one night, CHECKTABLE another, and CHECKCATALOG a third night.  It is a thought.
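As a rough sketch of that idea, the three nights might look something like this; SalesDB and the table names are hypothetical. Keep in mind this split skips the extra work CHECKDB does on its own, such as the indexed view, FILESTREAM, and Service Broker validation listed above.

```sql
USE SalesDB;   -- hypothetical database
GO

-- Night 1: allocation consistency only
DBCC CHECKALLOC (N'SalesDB');

-- Night 2: structural checks, table by table (split the table list across nights if needed)
DBCC CHECKTABLE (N'dbo.Orders');
DBCC CHECKTABLE (N'dbo.OrderDetails');

-- Night 3: system catalog consistency
DBCC CHECKCATALOG (N'SalesDB');
```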

There are many options for integrity checks that can shorten the execution time.  Paul Randal and Aaron Bertrand (to name a few) have written many articles about DBCC CHECKDB and how to effectively use its different options, and even the different DBCC commands, to shorten the duration of the integrity check.
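One such option, as a hedged example, is a physical-only pass on the nights when a full check will not fit in the window (again, SalesDB is just a placeholder).

```sql
-- Checks physical page structure and checksums only, skipping the logical checks;
-- a full DBCC CHECKDB should still be run periodically
DBCC CHECKDB (N'SalesDB') WITH PHYSICAL_ONLY, NO_INFOMSGS;
```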

The one thing to remember here is that if your database is corrupt, so are your backups!  SQL backups are only copies of what is in the database; if the data pages are corrupt, so is your backup!  This is why it is imperative to regularly perform integrity checks on your databases.

One of my favorite methods, because I can script it to run right after Step #4 of my methodology, is the off-loaded integrity check.  If your databases are too large or too busy to check in production, then after you have test-restored your backups (hopefully you are doing this), that is an ideal time to run DBCC CHECKDB.  The restored database is ideally on another server, so running the DBCC CHECKDB commands will not have any impact on users.
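A minimal sketch of that off-load, assuming hypothetical database, logical file names, and paths on the test server, looks like this:

```sql
-- On the test/restore server; database name, logical file names, and paths are hypothetical
RESTORE DATABASE [SalesDB_Verify]
FROM DISK = N'B:\SQLBackups\SalesDB_Full.bak'
WITH MOVE N'SalesDB'     TO N'D:\SQLData\SalesDB_Verify.mdf',
     MOVE N'SalesDB_log' TO N'L:\SQLLogs\SalesDB_Verify.ldf',
     STATS = 10;

-- The restore proves the backup file is usable; CHECKDB proves the pages inside it are clean
DBCC CHECKDB (N'SalesDB_Verify') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```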

In my next post, I will discuss my thoughts on SQL backups: native vs. 3rd party and local vs. network backups.