Top 5 Ways to Ensure a Successful DR Strategy in the Public Cloud

By Megan Berkery

One of the most common challenges for companies today is understanding what Disaster Recovery (DR) really means and how to do it successfully. Some companies put together a strategy that functions more like an offsite backup or even a high availability (HA) solution, while others trust a managed service provider (MSP) to build one for them but never fully test what was built. These scenarios are far more common than you might expect, and even more so in the public cloud, because many of the proven tools for traditional replication don’t yet work there. With all of this in mind, let’s discuss the top 5 ways to ensure a successful DR strategy in the public cloud.

1. Understand the Difference Between DR and Backups

Many companies think that having a DR solution means setting up an offsite backup in AWS or Azure that they can use to stand up an environment quickly when a disaster is declared. Let’s be very clear: this is not a DR strategy, or at least not a good one. If your Recovery Time Objective (RTO) is over 24 hours, you might get away with this for a small environment. However, unless your original environment was built in less than 24 hours, rebuilding it within that window is unrealistic. Keep in mind that even though you can spin up servers in minutes, that is only a small part of the equation. Reconfiguring every server, recreating an entire environment from scratch, and everything in between is neither quick nor easy, even with the data readily available. Unless you have preconfigured devices on standby and ready to go, you do not have a true DR strategy.
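To make the RTO arithmetic concrete, here is a minimal Python sketch that adds up estimated recovery times and compares the total with an RTO. The step names and hour estimates are hypothetical placeholders, not measurements, and the sketch assumes the steps run one after another; substitute your own estimates.

```python
from dataclasses import dataclass

@dataclass
class RebuildStep:
    name: str
    hours: float  # estimated wall-clock hours for this step (hypothetical)

def recovery_hours(steps: list[RebuildStep]) -> float:
    """Total recovery time, assuming the steps run one after another."""
    return sum(step.hours for step in steps)

def meets_rto(steps: list[RebuildStep], rto_hours: float) -> bool:
    """True if the estimated recovery fits inside the Recovery Time Objective."""
    return recovery_hours(steps) <= rto_hours

# Hypothetical estimates for rebuilding a small environment from offsite backups only.
backup_only = [
    RebuildStep("launch servers", 0.5),
    RebuildStep("reinstall and configure OS and applications", 12),
    RebuildStep("restore data from backups", 6),
    RebuildStep("recreate networking, DNS and firewall rules", 8),
    RebuildStep("validate the environment", 4),
]

# Hypothetical estimates when preconfigured servers are already on standby.
standby = [
    RebuildStep("promote replicated databases", 0.5),
    RebuildStep("switch DNS to the DR environment", 0.5),
    RebuildStep("validate the environment", 1),
]

for label, steps in (("backup only", backup_only), ("preconfigured standby", standby)):
    hours = recovery_hours(steps)
    print(f"{label}: {hours} h, meets a 24 h RTO: {meets_rto(steps, rto_hours=24)}")
```

Even with generous placeholder numbers, launching servers is the smallest line item; the configuration and rebuild work is what blows through a 24-hour RTO.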

2. Understand the Difference Between HA and DR

For those of you with a production environment in (or moving to) the public cloud who want to build your DR environment there as well, make sure you are creating DR and not just HA. The simplest way to see the difference is to understand Zones versus Regions. If you create redundant servers across zones, those servers are still tied to a single geographic area that could go down in a disaster, or even in an event like the AWS S3 outage earlier this year. That protects you when individual hardware or components fail, which is high availability, but it is not DR. True DR places the standby environment in a completely different Region, which usually means a completely different part of the country. That way, if an entire data center is lost in a disaster, or if storage on an entire coast goes out, you can fail over to a region unaffected by the event.
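The zone-versus-region test can be expressed as a simple check: a standby environment only counts as DR if at least some of it sits outside every Region that production uses. This minimal Python sketch assumes AWS-style Availability Zone names such as `us-east-1a`, where the Region is the zone name minus its final letter; other providers name zones differently, and the `region_of` and `classify` helpers are hypothetical, not a cloud API.

```python
def region_of(zone: str) -> str:
    """Map an AWS-style Availability Zone name to its Region.

    Assumption: standard names like 'us-east-1a', where the Region is the
    zone name without its final letter. Other providers use other schemes.
    """
    if not zone or not zone[-1].isalpha():
        raise ValueError(f"not an AWS-style zone name: {zone!r}")
    return zone[:-1]

def classify(production_zones: set[str], standby_zones: set[str]) -> str:
    """Return 'DR' if some standby capacity is outside every production Region, else 'HA only'."""
    production_regions = {region_of(z) for z in production_zones}
    standby_regions = {region_of(z) for z in standby_zones}
    if standby_regions - production_regions:
        return "DR"
    return "HA only"

# Redundancy across zones of one Region: survives a component failure, not a regional outage.
print(classify({"us-east-1a", "us-east-1b"}, {"us-east-1c"}))  # → HA only
# Standby in a different Region: can fail over if the whole production Region goes down.
print(classify({"us-east-1a", "us-east-1b"}, {"us-west-2a"}))  # → DR
```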

3. Understand That Once You Go Cloud, There’s No Coming Back

One common misconception today is that if you fail over to the public cloud, you can fail back after the disaster is over. This is absolutely incorrect and has taken several organizations by surprise.

Imagine you have a production environment with 20 web servers and 6 database servers on premises or with a hosting provider. You decide your DR environment can be smaller because production will realistically be down for a few days at most. So you build your DR environment in the public cloud with only 10 web servers and 3 database servers to carry you until you can fail back to production (to be clear, this 2:1 structure is a valid DR design for many organizations). Now imagine your surprise when production comes back up and you find out you can’t fail back to it. With today’s tools, that is exactly what would happen. Luckily, vendors like Zerto are close to releasing a solution that does this within Azure, but it is still a few months away, and Azure is the only public cloud this close. This is a perfect example of why it is critical to have an MSP who specializes in DR to help you avoid pitfalls like this.

4. Define Exactly What You Need Before You Create Your Plan

Now that you understand the 3 critical pieces of a successful DR strategy, the next step is to decide what you need. Many companies automatically assume their DR environment must mirror production 1:1. As mentioned in #3, this is not usually necessary. In most cases, a DR event lasts only a couple of hours or, in extreme cases, a couple of days. Evaluate your business and the traffic volume to your applications or sites; a smaller footprint may cover your needs. If you believe you need or want a 1:1 ratio, go ahead. The point is to take the time to consider what level of service you really need in order to give your employees and customers a functional environment.
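One rough way to compare a 1:1 mirror with a smaller footprint is to scale production server counts by a ratio, or to size from the traffic you actually expect during a DR event. This minimal Python sketch does both. The 20 web / 6 database counts come from the example in #3; the requests-per-second figures and the 20% headroom are hypothetical placeholders to replace with your own measurements, and `size_dr` and `servers_for_load` are illustrative helpers, not part of any tool.

```python
import math

def size_dr(production: dict[str, int], ratio: float) -> dict[str, int]:
    """Scale production server counts by a production:DR ratio (2.0 means 2:1).

    Rounds up, and keeps at least one server per role.
    """
    if ratio < 1:
        raise ValueError("a ratio below 1 would make DR larger than production")
    return {role: max(1, math.ceil(count / ratio)) for role, count in production.items()}

def servers_for_load(expected_rps: float, rps_per_server: float, headroom: float = 0.2) -> int:
    """Servers needed to carry the traffic expected during a DR event, plus headroom."""
    return math.ceil(expected_rps * (1 + headroom) / rps_per_server)

production = {"web": 20, "database": 6}
print(size_dr(production, ratio=2.0))  # the 2:1 structure from #3
print(size_dr(production, ratio=1.0))  # a 1:1 mirror of production
# Hypothetical: 4,000 requests/s expected during DR, 500 requests/s per web server.
print(servers_for_load(expected_rps=4000, rps_per_server=500))
```

The ratio approach is quick, but the load-based estimate ties the footprint to the service level you actually need to provide, which is the question this step asks you to answer.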

5. Create a Full Runbook, Test Plan, and Then TEST, TEST, TEST

Now that you know what you need, you can build your plan and environment. The plan will be unique to your company’s needs. What we want to focus on here is how crucial it is to test both the plan and the environment. Many companies build their DR environment without testing it or having a clear plan to follow. In some cases, they have someone build it for them and simply trust that in an emergency (which they think will never happen) it will work. This is almost as bad as not creating an environment at all. If you do not have a plan and have not fully and successfully tested the environment, you can spend hours or even days trying to bring the DR environment up during a disaster. Imagine the pressure on the team, knowing production is down and they can’t bring up the DR environment because they have no clear plan and can’t figure out why it isn’t working. Trust me, you don’t want to be that CIO. Create a clear, step-by-step runbook, create a test plan that validates those runbook steps one by one, and test it until it works. This is an area where it’s OK to be overzealous: make sure you have seen with your own eyes that the test is 100% successful. Don’t stop until it is! Then test it at least once a year, if not twice.
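A test plan that validates runbook steps one by one can be as simple as an ordered list of steps, each paired with a check that confirms the expected outcome. This minimal Python sketch runs the checks in order, records PASS or FAIL for each, and stops at the first failure. The step descriptions are hypothetical, and the `lambda: True` checks are placeholders; in a real test each check would query the DR environment (for example, confirming a database accepts connections or a page loads).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RunbookStep:
    description: str
    # Hypothetical check: returns True when the step's expected outcome is observed.
    check: Callable[[], bool]

def run_test_plan(steps: list[RunbookStep]) -> tuple[bool, list[str]]:
    """Validate runbook steps in order and stop at the first failure."""
    log: list[str] = []
    for number, step in enumerate(steps, start=1):
        try:
            passed = bool(step.check())
            detail = ""
        except Exception as exc:  # a check that crashes counts as a failure
            passed, detail = False, f" (error: {exc})"
        log.append(f"step {number}: {step.description}: {'PASS' if passed else 'FAIL'}{detail}")
        if not passed:
            # Later runbook steps usually depend on this one, so fix it before going on.
            return False, log
    return True, log

# Placeholder checks; replace each lambda with a real verification against the DR environment.
plan = [
    RunbookStep("declare disaster and notify the DR team", lambda: True),
    RunbookStep("promote the replicated database in the DR region", lambda: True),
    RunbookStep("point DNS at the DR load balancer", lambda: True),
    RunbookStep("confirm the application login page loads", lambda: True),
]

ok, log = run_test_plan(plan)
print("\n".join(log))
print("test plan passed" if ok else "test plan FAILED: fix the step above and rerun")
```

Stopping at the first failure mirrors how a real failover behaves: if the database never comes up, there is little value in checking DNS. Rerun the whole plan after each fix until every step passes.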

Hopefully this overview of the key challenges has been helpful. It may also help you and your team understand that doing DR in the public cloud takes some inside knowledge and expertise. If you have that knowledge in house, great! If not, find an MSP and ask about their strategy for DR in the public cloud. If they can’t cover all 5 areas here, find one that can. The public cloud can be great for things like DR, but realize that it is still maturing and has a way to go before it is 100% viable for all environment types. With something as critical as DR, don’t wait until it’s too late; go out and find the right expertise to get it done right now. Contact us today and we will help you build the best public cloud DR strategy for your company.