Business-impacting disasters are serious and can have long-term effects on a company’s health. According to the Federal Emergency Management Agency (FEMA), 75% of companies believe their DR plans are inadequate, 40% of businesses do not reopen after a disaster, and another 25% fail within one year.
Do I have your attention yet? What’s the cost of downtime for your business? It may be higher than you think!
All Disasters Are Not Created Equal
When disaster strikes, you have to start thinking about how to resolve whatever caused it in the first place. That cause could be any of a wide range of situations, such as:
- Human failure that took out the power
- A weather event that’s damaged your building
- A catastrophic event, like a fire, that’s left you with only a smoking hole in the ground
Each of these scenarios will require a different approach to restore critical business systems. Nevertheless, the primary goal must be restoring your company’s ability to serve customers and generate revenue. In some situations, it may be a better decision to focus on restoring the production environment rather than bringing the disaster recovery location online. In my opinion, though, the best approach is always a parallel resolution path: work on both at once, if you have the staffing to do so.
There are many distinct types of disasters, but really, anything that unexpectedly affects your production environment is a disaster, which means pretty much everything outside of planned maintenance. Above all, remember that not all disasters are created equal, and each must be treated accordingly.
Recovery Point Objective (RPO) and Recovery Time Objective (RTO)
RPO and RTO play an important role in disaster declaration. Let’s take a moment to talk about how RPO and RTO are different but work together. A Recovery Point Objective (RPO) is your target for the maximum amount of data loss acceptable in the event of a disaster, which is measured in time. On the other hand, the Recovery Time Objective (RTO) is your goal for how long it takes from declaration of disaster to restoration of critical business systems.
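To make the two definitions concrete, here is a minimal Python sketch that checks one incident (or DR exercise) against both objectives. The target values, the timestamps, and the function name `check_objectives` are all hypothetical and chosen only for illustration; your real targets come from your business requirements.

```python
from datetime import datetime, timedelta

# Hypothetical targets for illustration only.
RPO = timedelta(minutes=15)  # maximum acceptable data loss, measured in time
RTO = timedelta(hours=4)     # maximum time from declaration to restoration


def check_objectives(last_recoverable_copy, failure_time,
                     declared_at, restored_at, rpo=RPO, rto=RTO):
    """Compare one incident or exercise against the RPO and RTO."""
    # Data loss: everything written after the last recoverable copy is gone.
    data_loss = failure_time - last_recoverable_copy
    # Recovery time: measured from the declaration of disaster, as defined above.
    recovery_time = restored_at - declared_at
    return {
        "data_loss": data_loss,
        "recovery_time": recovery_time,
        "rpo_met": data_loss <= rpo,
        "rto_met": recovery_time <= rto,
    }


result = check_objectives(
    last_recoverable_copy=datetime(2024, 1, 10, 8, 50),
    failure_time=datetime(2024, 1, 10, 9, 0),
    declared_at=datetime(2024, 1, 10, 9, 30),
    restored_at=datetime(2024, 1, 10, 14, 0),
)
print(result["rpo_met"], result["rto_met"])
```

In this made-up example, 10 minutes of data is lost, which is within the 15-minute RPO, but recovery takes 4.5 hours from declaration and so misses the 4-hour RTO. The two objectives are measured on different clocks, which is why you can meet one and miss the other.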
RTO and RPO are driven by business objectives rather than an IT mandate. (Hint: the lower the RPO/RTO, the higher the associated costs.) Determining your RTO based on business need, then proving it through DR exercises, is key because it will help drive the decision to declare a disaster. If the production outage was caused by the accidental flip of a power breaker or the bump of a cable, it’s likely going to make sense to restore production rather than bring the DR location online. Alternatively, if you show up for work and that smoking hole in the ground is there, you should probably make your declaration of disaster and get to the closest Internet connection. Rockin’ the DR Runbook from Starbucks, anyone?
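The declare-or-repair reasoning can be sketched as a tiny decision helper. This is a deliberate simplification, not a real declaration policy: the function name `recommend_action`, its parameters, and the example durations are assumptions made for illustration, and an actual decision will weigh far more than two time estimates.

```python
from datetime import timedelta


def recommend_action(estimated_production_fix, proven_dr_failover,
                     site_usable=True):
    """Suggest whether to repair production or declare a disaster.

    estimated_production_fix: best guess at the time needed to repair production.
    proven_dr_failover: failover time actually achieved in DR exercises.
    site_usable: False for the "smoking hole in the ground" case.
    """
    if not site_usable:
        return "declare disaster"
    if estimated_production_fix <= proven_dr_failover:
        return "restore production"
    return "declare disaster"


# A tripped breaker: a quick production fix beats a multi-hour failover.
print(recommend_action(timedelta(minutes=20), timedelta(hours=4)))
# The building is gone: there is nothing left to repair.
print(recommend_action(timedelta(days=30), timedelta(hours=4), site_usable=False))
```

Note that the comparison only means something if the failover time has been proven in exercises. An untested number gives you nothing reliable to compare against.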
I’m sure you know the saying, “practice makes perfect.” DR is a fitting example of where that applies. It is one thing to say that you want a short RTO, but another to make it happen. Your company must be intentional in its desire to have a working DR solution with a proven RTO, and that requires a significant amount of practice. I have been through hundreds of DR exercises with clients, and each time we found settings or processes that needed updating to speed up the recovery process. When I am preparing for a DR exercise, I don’t like to think of it as a pass-or-fail test. Instead, I set the goal of learning and documenting 10 new things that will help reduce the RTO or improve the success ratio.
Earlier I mentioned a DR Runbook. A DR Runbook is your playbook for how to declare a disaster and for the steps that follow declaration. It includes your test steps, failover steps, and failback steps, and it is critical to your business. Your provider should give you the first draft of your runbook, because it includes the steps necessary to declare a disaster within their processes. If you are not working with a provider, the process is A LOT harder. When it comes to DR, I do not recommend that you go it alone. I also strongly suggest that you keep a copy of your runbook in the car and at home. Do you remember the smoking hole in the ground at work? The copy of a runbook sitting in your desk drawer isn’t going to be very useful to you in that instance.
When you go through DR exercises you will encounter settings that need adjustment or config files that need updating, and those should be captured in your DR Runbook. Each time you document a setting like this, it takes a little bite out of your RTO. You can also start to think about what changes, if any, you can automate, but either way remember that you must keep any scripts and documentation current as the environment changes.
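One way to see where the RTO is going is to record how long each runbook step took in the last exercise and rank the manual ones. The sketch below is a rough illustration: the step names and durations are invented, the `RunbookStep` structure is hypothetical, and the total assumes the steps run one after another rather than in parallel.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RunbookStep:
    name: str
    duration: timedelta  # time measured during the last DR exercise
    automated: bool      # True if a script already performs this step


def total_recovery_time(steps):
    """Sum the step durations, assuming the steps run sequentially."""
    return sum((s.duration for s in steps), timedelta())


def automation_candidates(steps, top=3):
    """Return the slowest manual steps, longest first."""
    manual = [s for s in steps if not s.automated]
    return sorted(manual, key=lambda s: s.duration, reverse=True)[:top]


# Invented example data, not taken from a real runbook.
steps = [
    RunbookStep("Declare disaster with provider", timedelta(minutes=15), False),
    RunbookStep("Fail over replicated servers", timedelta(minutes=40), True),
    RunbookStep("Update application config files", timedelta(minutes=50), False),
    RunbookStep("Repoint DNS", timedelta(minutes=20), False),
]

print(total_recovery_time(steps))
for step in automation_candidates(steps, top=2):
    print(step.name, step.duration)
```

Re-recording the durations after every exercise keeps the list honest as the environment changes, and shows whether the fixes you documented actually shortened the recovery.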
A successful DR practice is never going to be a “one and done,” and it is going to take both a financial and resource commitment from your entire organization.
Read part 2 here.
Brian Frank is Product Delivery Director at Contegix, owning the vision, execution, and management of the product delivery strategy and roadmap. In this role, Brian works with all functional areas of the operations team to develop product releases.
Brian’s responsibilities also include product selection guidance, leading requirement gathering efforts with key stakeholders, taking part in product solution architecture, and successful delivery of early adopter solutions.