Preventing a Disaster: My Methodology Part 4

This is my last post in a series called Preventing a Disaster: My Methodology. In Part 1 I discussed the importance of running DBCC CHECKDB on your databases and provided tips on how to do this on VLDBs (very large databases) and very busy systems. In Part 2: Backups, I discussed the importance of a DBA knowing the RPO/RTO (Recovery Point Objective/Recovery Time Objective) of the business. It is the RPO/RTO of a company that should determine your backup policies and procedures. The third installment discussed off-site locations and the fact that your backup strategy is not complete until you have a copy of your backups off the physical site.

In this installment I would like to discuss my last point in Preventing a Disaster, which is to ask yourself, “Are my backups valid?” Back in Part 2, the concept of “verifying” your backups was introduced as something that should be part of your backup process. Verification basically confirms that what was written to the disk matches what was read from the database.

To properly validate your backups, a DBA must perform a RESTORE of a backup to ensure 1) that it can be done, 2) that the RESTORE process works, and 3) that the backup itself is good.

This can be done on a development box, your desktop, or a VM that you can blow away later; it really doesn’t matter. The point is to restore the database to a SQL Server. Typically, after the restore is complete, you should run DBCC CHECKDB on the database to validate its integrity.
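
As a rough sketch (the database name, file paths, and logical file names below are placeholders; run RESTORE FILELISTONLY against the backup to find the actual logical names), a test restore followed by an integrity check might look like this:

-- Restore the backup to a test server under a throwaway name.
-- Paths and logical file names are hypothetical; adjust for your environment.
RESTORE DATABASE MyDatabase_Test
	FROM DISK = N'D:\Restores\MyDatabase.bak'
	WITH MOVE 'MyDatabase' TO N'D:\Data\MyDatabase_Test.mdf',
	     MOVE 'MyDatabase_log' TO N'D:\Logs\MyDatabase_Test.ldf',
	     STATS = 10
GO

-- Validate the integrity of what actually came out of the backup file.
DBCC CHECKDB (MyDatabase_Test) WITH NO_INFOMSGS, ALL_ERRORMSGS
GO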

To properly test your entire backup procedure, a DBA should get a backup file or files from archive, tape, or the off-site location, i.e. from the final resting place, and restore that backup. Restoring last night’s backup is only half the process.

I once worked in a shop where SQL Server backed up to a network share, and a 3rd party file backup solution then archived it to tape. Their policy was to keep one month’s worth of tape. So just for kicks, I asked for access to a 29-day-old backup file to test restores. Unfortunately, the Backup Administrator did not know how to retrieve a file from tape and place it on a network share. He was competent in getting all the necessary files written to tape, but was unsure how to retrieve data (in his defense, he was new and had never been asked to do a restore from tape). The backup procedure as a whole was broken.

In Summary

In Preventing a Disaster: My Methodology, I hope I explained what DBAs should do and why it is important not to skip a step. Be aware of who else is involved in the process and work closely with them to execute and test it.

Preventing a Disaster: My Methodology–Part 3

Welcome back to part 3 of my 4-part series on Preventing a Disaster. In Part 1, I discussed the importance of database integrity checks and possible ways to run them against VLDBs (very large databases). The second article was an explanation of backups in general.

In this entry, I would like to discuss the concept of “Off-Site Locations”.

It is my belief that to truly say you have an effective disaster recovery plan, your plan needs to include how to get the SQL backups off-site.

There are a couple of options that I prefer and then there are less desirable options that are still technically effective.

Option 1 – The Cloud

Several cloud vendors provide storage containers, i.e. hard drive space. Of course, MS SQL works best with MS Azure. SQL Server 2012 SP1-CU2 and SQL Server 2014 provide the ability to back up directly to an Azure storage container. This pretty much combines steps 2 and 3 into one efficient step. In April of 2014 Microsoft provided a separate tool that allows previous versions of SQL Server to back up directly to Azure as well.
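
As a minimal sketch (the storage account, container, credential name, and access key below are placeholders, and this assumes the access-key style credential used by SQL 2012 SP1-CU2 and SQL 2014), a backup straight to an Azure storage container looks something like this:

-- Create a credential holding the storage account name and access key.
-- 'mystorageaccount' and the secret are placeholders.
CREATE CREDENTIAL AzureBackupCredential
	WITH IDENTITY = 'mystorageaccount',
	SECRET = '<storage account access key>'
GO

-- Back up directly to a blob in the storage container.
BACKUP DATABASE MyDatabase
	TO URL = N'https://mystorageaccount.blob.core.windows.net/sqlbackups/MyDatabase.bak'
	WITH CREDENTIAL = 'AzureBackupCredential', COMPRESSION, STATS = 10
GO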

However, there are some downsides in my opinion. Your server obviously needs an internet connection to the outside world, and you have to have purchased an Azure account with the appropriate storage size. And as that storage blob grows, so does your monthly bill.

The benefits of using Azure storage include: compression, encryption and seamless integration into SQL Server.

Option 2 – SAN Replication

One option that I have seen be successful is what I am calling SAN replication. If your company has a backup datacenter in a different location, then chances are you have a SAN storage array there.

In this configuration, you would write native SQL compressed backups to a local SAN, other than your data SAN of course!  Then that SAN is replicated to your secondary datacenter SAN, either using SAN snapshots or true block-by-block replication.

This method can be very effective in getting your data off-site, although it may take a little longer than other options. The major downside is cost. Running a 2nd datacenter and a 2nd comparable SAN is enormously expensive, which is one of the reasons why cloud storage is becoming a more viable option as time goes by.

Option 3 – Personal Relocation

This is personally my favorite! (just kidding)  In this method you would run a native SQL backup with compression targeted to an external drive, and then at the end of the day personally take that external drive off-site.

Now, I am sure some of you are laughing at this method, but with the cost of consumer hard drives rapidly decreasing, this is a very viable option for some smaller companies. I actually knew a company that did this every Friday: the IT manager relocated the USB external drive to a bank vault. The company purchased 5 drives, each large enough to hold a week’s worth of backup files, and rotated them out weekly. This method allowed them to keep 30 days’ worth of backups off-site at all times.

This method is probably the cheapest; however, it is not necessarily the safest.

Wrap-up

I am sure there are many other effective scenarios for getting SQL backups off-site; these are simply the ones that I have seen work successfully in the real world.

The important thing to remember, and what to take away from this post, is to get your backups off-site. In the event of your primary datacenter crashing, you need to be able to get your data restored ASAP. And if your most recent backup is on a server/SAN in that datacenter, your recovery time has just increased dramatically.

Your backup plan is not complete until your backups are off-site!

Preventing a Disaster: My Methodology Step 2

In my previous post I introduced My Methodology in Disaster Recovery along with Step 1: Integrity Checks.

Today, let’s talk about Step 2: Backups; but first we have to answer two questions. How much data are you willing to lose, and how long are you willing to be down after a disaster? In this line of business we call these the Recovery Point Objective and Recovery Time Objective (RPO/RTO).

Recovery Point Objective

RPO is the “line in the sand” where you say, “I can’t lose more than XXXX amount of data.” Of course, this line is determined by what type of data it is, how often the data changes, and whether it is OK to lose any of the data or whether the data can be rebuilt.

Example: A hospital would not want to lose any of its data. I would like to believe my medical history is safe and that I do not have to keep my own copies. This type of data is crucial to keep, so the acceptable loss of data is basically zero! This type of RPO usually requires a high-availability/disaster recovery plan, which is outside the scope of this article; but let’s just say you will take many, many transaction log backups!
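
For illustration, a minimal sketch of one of those frequent transaction log backups (the database name and path are placeholders, and the database must be in the FULL or BULK_LOGGED recovery model) that you would schedule through SQL Server Agent, say every 15 minutes, to keep the RPO tight:

-- Transaction log backup; schedule this as often as your RPO requires.
BACKUP LOG MyDatabase
	TO DISK = N'\\BackupShare\SQLBackups\MyDatabase_log.trn'
	WITH CHECKSUM, STATS = 10
GO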

Recovery Time Objective

RTO is the “line in the sand” where you say, “I can’t be down more than XXXX amount of time.” How long does it take you to get a restored server up and running? Again, the type of data determines this as much as anything else.

Using the previous example of a hospital: electronic medical records could possibly be down a little while, because there are such things as paper medical records that can be scanned or typed in later. It is not an ideal situation; but hey, following a disaster I would be willing to bet nothing is working ideally!

Backups

Backing up your databases will (not may) save your job one day; it has saved mine! The most important thing to remember about backups is to ensure that they are done. The second most important thing is to ensure that they are “verified”. And finally, ensure your backups are valid.

First let’s talk about methods of backing up databases. Then I will discuss my thoughts on where to put backup files.

There are two main ways to back up databases: native backups and 3rd party tools. I hope to break down the pros and cons of each.

Steps of the Backup Process

1.  Perform the Backup – using whatever method your company uses, make sure the backups are being done.
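
For example (database name and path are placeholders), a plain native full backup might look like this:

-- Full native backup; CHECKSUM validates existing page checksums as pages
-- are read and writes a checksum over the whole backup.
BACKUP DATABASE MyDatabase
	TO DISK = N'C:\Backups\MyDatabase.bak'
	WITH CHECKSUM, STATS = 10
GO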

2. Verify the Backup – SQL Server provides a way to verify that the backup file written to the drive is complete and readable. This process basically ensures that what was read was truly written to the disk. It does not verify IF your data is valid; that is where Step 1 of disaster recovery, Integrity Checks, comes in.

USE master
GO

RESTORE VERIFYONLY
	FROM DISK = N'C:\Backups\MyDatabase.bak'
GO

Backup Methods

Native SQL Backups

Native backups are built into SQL Server, i.e. there are no further costs. SQL Server is expensive enough; you shouldn’t have to spend more money just to back up your data. Native Backups (NBs) have been around since at least SQL 2000 (the beginning of my SQL experience), and NB functionality has grown up right along with SQL Server. The ability to write to network shares, to compress, and to verify are all options that make NBs an excellent choice.

If your SQL Server is experiencing CPU pressure, you may want to reconsider using compression.  The compression is actually done with CPU cycles before writing the data to a file; so using compression will increase your CPU usage during your backup window.

3rd Party Backup Tools

Prior to SQL Server 2008 R2, I could make an argument for using 3rd Party Tools because of the importance of saving space with backup compression.  However, since 2008 R2, backup compression is an available feature in all but Express editions.

Some 3rd party tools are nothing more than a Windows service that manages backups. Others are a Windows service that performs proprietary backups; these are the ones that usually provide a variety of compression levels to choose from.

Your choice of method is completely up to you and your business requirements. I personally see no reason to spend more money on a product to do what another product already does. My method of choice is native backups.

Location of Backups

This is always a hotly contested argument. Do I back up my files to a local drive, an attached SAN drive, a network share, or the cloud?

To start with, I think the majority of the SQL Community will agree that the faster you get the BAK and TRN files off of the SQL Server the better off you will be.

Using a network share as your backup folder destination is just as valid as using a local drive or an attached SAN drive or heaven forbid an attached external USB drive. Remember, the destination should not be the same drive that your MDF and/or LDF files are on.

The real point of the Location of Backups discussion should not be where my backups are stored, but what the final resting place of those backups is. Does my company have a plan to move the files off-site to a secure location? Is that plan automatic or manual?

Once your daily backup routines are complete, there should be a process in place to archive these flat files off of the SAN and relocate them off-site. I have seen everything from an external USB drive carried home to SAN replication to a completely different datacenter on the other side of the country. Either method works for getting the backups off-site.

One very reasonable and possible location is cloud storage. This concept, if enough space is purchased, kills the proverbial “two birds with one stone”. It not only gets your backup files off of the same drive as your live data, but by its very nature it also ensures that the backups are indeed off-site.

My Personal Ideal Scenario

My personal preference is to use native SQL backup with compression and write the files to a network share, preferably on a different SAN/NAS than your data. This does two things: 1) compression reduces the size of the backup files going across the network, and 2) it physically writes the files off of your SQL Server as fast as possible and allows your most recent backups to be stored on a different physical drive than your live data.
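
A minimal sketch of that setup (the UNC path is a placeholder, the SQL Server service account needs write permission on the share, and compression requires an edition that supports it):

-- Compressed native backup written straight to a network share,
-- so the backup file never lands on the same drives as the data files.
BACKUP DATABASE MyDatabase
	TO DISK = N'\\BackupNAS\SQLBackups\MyDatabase.bak'
	WITH COMPRESSION, CHECKSUM, STATS = 10
GO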

Some will argue and say “it will blow up my network”; however, with the current specs of modern NIC cards and routers, it is highly unlikely that you will cause network issues.

Offsite Archiving

The most difficult part of this planning is how to get the backup files off-site. Replicated SANs can be VERY expensive, archiving to tape is an option, cloud storage is available, or you can simply copy files to an external drive. The thing to remember is that the amount of time it takes to get the files to their final resting place is roughly the amount of time it will take to retrieve them during a recovery (RTO).

I believe that as cloud storage gets cheaper and cheaper (except for you, Microsoft), backing up directly to the cloud will become more and more popular.

Bottom Line

Know your company’s RPO/RTO, have a plan to accomplish those goals, and know how to make that happen in the cheapest, fastest, safest way possible.

Preventing a Disaster: My Methodology

I was hoping a career in consulting would possibly spark blog ideas! 

One thing I am passionate about with SQL Server is disaster recovery. Having worked for a hospital during hurricane season as a DBA, I truly had some sleepless nights wondering if I could bring a 2nd system up successfully in the event of a total disaster. Medical data is of the utmost importance when it comes to the field of health care. And I am sure everyone has the stance that “their data is the most important data”! That is why you must protect it at all costs.

Either way, as a DBA it is my job to be able to stand up a 2nd server as soon as possible in the event of a total disaster. If it were only as easy as pushing a button, our job as DBAs would be much easier. Unfortunately, proper disaster recovery requires forethought, planning, and testing.

I have a 4-fold plan of Disaster Recovery and I would like to discuss my thoughts on this.  Part one will be discussed here with later parts in subsequent posts.

  1. Integrity Checks
  2. Backups
  3. Off-Site duplicates
  4. Recovery Testing

1. Integrity Checks

Most everyone is aware of the DBCC CHECKDB command, and it is vital to the stability of your database. If you are not familiar with it, then please, for the love of the SQL Gods, learn about it!

It is of utmost importance to run these as often as feasible. I typically run integrity checks once a day during non-business hours or downtime. This process will be resource intensive, which is why it needs to be done during downtime.

Now some of you are going to say, “I tried running DBCC CHECKDB on my 350 GB database and it brought my server to its knees, so I stopped doing them.” All I can say is I hope your data pages are not corrupt.

Per MSDN DBCC CHECKDB does the following:

  • Runs DBCC CHECKALLOC on the database.

  • Runs DBCC CHECKTABLE on every table and view in the database.

  • Runs DBCC CHECKCATALOG on the database.

  • Validates the contents of every indexed view in the database.

  • Validates link-level consistency between table metadata and file system directories and files when storing varbinary(max) data in the file system using FILESTREAM.

  • Validates the Service Broker data in the database.

Now that’s a whole lot of checking! If the CHECKDB command runs all these checks, then possibly we can shorten the nightly duration by manually executing CHECKALLOC one night, CHECKTABLE another, and CHECKCATALOG a 3rd night. It is a thought.
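
A rough sketch of splitting that work across nights (the database and table names are placeholders, and keep in mind that CHECKDB also performs a few extra checks, such as indexed views and Service Broker data, that this split does not cover):

USE MyDatabase
GO

-- Night 1: allocation consistency for the whole database
DBCC CHECKALLOC (MyDatabase) WITH NO_INFOMSGS
GO

-- Night 2: table and index checks, run per table (loop or batch the tables)
DBCC CHECKTABLE ('dbo.MyLargeTable') WITH NO_INFOMSGS
GO

-- Night 3: catalog consistency
DBCC CHECKCATALOG (MyDatabase) WITH NO_INFOMSGS
GO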

There are many options for integrity checks that can shorten the execution time. Bloggers such as Paul Randal and Aaron Bertrand (to name a few) have written many articles about DBCC CHECKDB and how to effectively use its different options, and the different DBCC commands, to shorten the duration of the integrity check.
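
One commonly cited option (a sketch; the database name is a placeholder) is PHYSICAL_ONLY, which skips the logical checks and therefore runs considerably faster at the cost of less thorough coverage, so a full CHECKDB should still be run whenever the window allows:

-- Lighter-weight physical-only check for busy systems or tight windows.
DBCC CHECKDB (MyDatabase) WITH PHYSICAL_ONLY, NO_INFOMSGS
GO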

The one thing to remember here is that if your database is corrupt, so are your backups! SQL backups are only copies of what is in the database; if the data pages are corrupt, so is your backup! This is why it is imperative to regularly perform integrity checks on your databases.

One of my favorite methods, because I can script it out right after Step #4 of My Methodology, is the off-loaded integrity check. If your databases are too large or too busy to check in production, then right after you have test-restored your backups (hopefully you are doing this) is an ideal time to run DBCC CHECKDB. The restored database is ideally on another server, so running the DBCC CHECKDB commands will not have any impact on your users.

In my next post, I will discuss my thoughts on SQL backups: native vs. 3rd party tools, and local vs. network backup locations.