5 Steps for Building a Disaster Recovery Plan (DRP)
Every business with an IT infrastructure needs a disaster recovery plan (DRP). Disaster recovery — the operations undertaken to recover from a critical disruption to IT resources — ensures that you can work through an IT incident of any type while minimizing costly downtime and data loss. DRPs must be tailored to each business’s unique needs and address a range of disaster scenarios.
This article guides you through the disaster recovery planning process, from how to structure a plan to the five key steps necessary to create an effective DRP for your organization.
A DRP details the actions your business will take to recover from unexpected disruptions to its IT infrastructure. It is a formal, structured document that lays out how your organization will respond to various types of incidents, who will carry out the necessary actions, and how resources will be allocated so that those steps can be executed successfully.
DRPs may also specify the timelines within which certain goals need to be achieved to maintain business continuity. This matters because different businesses can tolerate different levels of disruption. For example, a dog-grooming business that does not rely heavily on its databases might maintain continuity even without access to them for up to a week, while an eCommerce company might tolerate only an hour of database unavailability before suffering a severe impact on its operations and revenue.
In the context of disaster recovery, the term “disaster” is used broadly. The types of disruptions involved include not just conventional disasters, such as hurricanes or earthquakes, but also disturbances like malware attacks that prevent a business from accessing its data, a power outage that makes servers unavailable temporarily, or even a strike by employees that disrupts access to the systems they support.
The nature of a DRP may vary widely depending on which types of disaster recovery operations a business deems critical to restore continuity. A basic plan could include simple steps for restoring data from backups, with the assumption that systems themselves would remain intact or could be easily rebuilt. More detailed plans might lay out steps for rebuilding an entire IT infrastructure, or for using offsite resources — such as the public cloud — for creating an alternative infrastructure to replace an onsite data center that has failed or become unavailable.
There is no specific requirement for the length of a DRP. A business with complex disaster recovery operations could have a plan of one hundred pages or more, while companies that have small, uncomplicated IT infrastructures might require only a few pages to spell out their DRP. No matter how long your plan is, your main goal should be to ensure that it is concise and easy to follow, even for personnel who may not have reviewed it in detail before a disaster strikes.
Regardless of the details of your DRP, there are five basic steps that you should follow when building it.
The disaster recovery planning process starts with identifying and collecting relevant data. Important information to identify includes:
- Which hardware and software systems are essential to your business and have to be restored as quickly as possible following a disruption.
- How much downtime your business can tolerate before it suffers critical damage. This is known as your recovery time objective (RTO).
- How much data you can afford to lose in the event of a disaster, measured as the maximum acceptable time between your last usable backup and the failure. This is known as your recovery point objective (RPO). For instance, if you back up your servers once a day at 8:00 A.M., a crash could cost you up to 24 hours of data, so that schedule can only meet an RPO of 24 hours or longer.
- Which threats are most likely to affect your business. Although it's impossible to foresee every potential threat, you should evaluate which types of disruption are most probable. If your data center is located in an earthquake-prone region, for example, your plan must address earthquakes. Likewise, if you operate in an industry that is frequently targeted by ransomware, that type of disruption should feature prominently in your plan.
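To make the RTO and RPO concrete, the sketch below checks whether a system's backup schedule and estimated restore time meet its objectives. It assumes that worst-case data loss equals the interval between backups (a failure just before the next backup), and the `SystemProfile` class, system names, and values are hypothetical illustrations rather than part of any particular tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemProfile:
    """Recovery requirements and capabilities for one critical system (hypothetical values)."""
    name: str
    rto: timedelta               # maximum tolerable downtime
    rpo: timedelta               # maximum tolerable data loss, measured in time
    backup_interval: timedelta   # time between successive backups
    est_restore_time: timedelta  # estimated time to restore service from backup

    def worst_case_data_loss(self) -> timedelta:
        # Assumption: a failure just before the next backup loses everything
        # written since the previous one.
        return self.backup_interval

    def meets_objectives(self) -> tuple[bool, bool]:
        """Return (rpo_ok, rto_ok)."""
        return (self.worst_case_data_loss() <= self.rpo,
                self.est_restore_time <= self.rto)


# Hypothetical eCommerce order database with daily 8:00 A.M. backups.
orders_db = SystemProfile("orders-db", rto=timedelta(hours=1), rpo=timedelta(hours=1),
                          backup_interval=timedelta(hours=24),
                          est_restore_time=timedelta(minutes=45))

# Hypothetical dog-grooming appointment database that can be down for a week.
appointments_db = SystemProfile("appointments-db", rto=timedelta(days=7), rpo=timedelta(hours=24),
                                backup_interval=timedelta(hours=24),
                                est_restore_time=timedelta(hours=4))

print(orders_db.meets_objectives())        # (False, True)
print(appointments_db.meets_objectives())  # (True, True)
```

Here the order database can be restored in time, but daily backups cannot satisfy a one-hour RPO; closing that gap would mean backing up (or replicating) at least hourly.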
A DRP is useless if there are no personnel to carry it out. For that reason, your plan should identify which staff members — or outside contractors — are responsible for executing it. Businesses can establish a core disaster recovery team that will be on call when disruptions strike. It's important to designate backup personnel as well, in case the primary recovery team is not available.
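One way to make primary and backup assignments unambiguous is to record each role with its people in priority order and resolve who is on point when an incident starts. In this sketch the role titles, names, and the `available` set are hypothetical; in practice availability would come from your on-call schedule or paging tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Role:
    """A disaster recovery role with people listed in priority order: primary first, then backups."""
    title: str
    assignees: list[str]


def resolve_responders(roles: list[Role], available: set[str]) -> dict[str, Optional[str]]:
    """Map each role to its first available assignee, or None if no one on the list is available."""
    return {
        role.title: next((person for person in role.assignees if person in available), None)
        for role in roles
    }


# Hypothetical team: names and roles are placeholders.
roles = [
    Role("Incident lead", ["alice", "bob"]),
    Role("Database recovery", ["carol", "dave"]),
    Role("Vendor liaison", ["acme-contractor"]),
]

# Alice is unreachable, so her backup (Bob) is selected; no one covers the vendor liaison role.
print(resolve_responders(roles, available={"bob", "carol", "dave"}))
# {'Incident lead': 'bob', 'Database recovery': 'carol', 'Vendor liaison': None}
```

A `None` entry flags a role with no available coverage, which is exactly the kind of gap worth surfacing before a real incident.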
The first part of your DRP should specify the initial steps that your recovery team will take when responding to different types of incidents. These may include procedures for containing the impact of a disaster by preventing it from affecting more systems, as well as assessing the damage and determining what will need to be recovered. Consider including multiple sets of initial steps, each tailored to different types of disasters.
The initial response steps should specify not only which technical processes to follow, but also how team members will communicate during the response. Will they use a specific communication tool? How will they track the progress of tasks and notify each other when steps are completed? Answering these questions in your plan helps your team coordinate during the uncertain period after a disaster strikes.
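The initial response steps, their owners, and the team's status updates could be captured in a simple runbook structure keyed by incident type. The incident types, step wording, owners, and the `#dr-bridge` channel name below are hypothetical placeholders; a real plan would use whatever steps and communication tool your team has chosen.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    description: str
    owner: str
    done: bool = False


@dataclass
class Runbook:
    incident_type: str
    comms_channel: str  # hypothetical: where the team posts status updates
    steps: list[Step] = field(default_factory=list)

    def complete(self, index: int) -> str:
        """Mark a step done and return the status message to post to the team channel."""
        step = self.steps[index]
        step.done = True
        finished = sum(s.done for s in self.steps)
        return (f"[{self.comms_channel}] {step.owner} completed "
                f"'{step.description}' ({finished}/{len(self.steps)})")

    def next_step(self) -> Optional[Step]:
        """Return the first step not yet completed, or None when the runbook is finished."""
        return next((s for s in self.steps if not s.done), None)


# One set of initial steps per disaster type: contain, then assess.
RUNBOOKS = {
    "ransomware": Runbook("ransomware", "#dr-bridge", [
        Step("Isolate affected hosts from the network", "security"),
        Step("Identify which systems and backups are affected", "ops"),
        Step("Confirm clean backups are available for restore", "ops"),
    ]),
    "power_outage": Runbook("power_outage", "#dr-bridge", [
        Step("Confirm scope of outage with facilities", "facilities"),
        Step("Check UPS and generator status", "facilities"),
        Step("Assess which servers need a controlled restart", "ops"),
    ]),
}

rb = RUNBOOKS["ransomware"]
print(rb.complete(0))
# [#dr-bridge] security completed 'Isolate affected hosts from the network' (1/3)
print(rb.next_step().description)
# Identify which systems and backups are affected
```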
The second major component of a DRP is the recovery strategy, which focuses on how systems and data will be restored to their normal operating state. As noted above, recovery strategies could entail rebuilding existing systems in the same data center where they originally ran, or creating alternatives in a different location, like the public cloud.
In many cases, recovery depends on working with outside service providers, such as cloud vendors or hardware companies, to obtain the resources necessary to rebuild systems. These parties should be identified in the DRP. The plan should also specify the process for ensuring that the failure triggered by the disaster does not recur soon after systems are restored.
Because the recovery process can be complicated and requires coordinating a variety of material and personnel resources, many companies adopt a disaster recovery as a service (DRaaS) solution. DRaaS outsources recovery to a vendor that specializes in containing and resolving IT disasters. With DRaaS, organizations do not need to maintain the in-house resources necessary to respond to a disaster quickly, nor do they need to spend time staying up to date on the latest disaster threats and recovery strategies.
The final step in disaster recovery planning is to test your plan on a regular basis. Testing entails running through each of the actions specified in your plan for responding to different types of disasters and assessing whether your team has the resources and capabilities it needs to carry them out.
Disaster recovery testing should be performed systematically. Each aspect of your plan should be tested regularly and in full. Consider also running some tests without advance warning to assess how well your recovery team responds on short notice. After each test, your team should produce a report that identifies any problems or unanticipated conditions that arose during the recovery operations. Use these reports to refine your plan so that these issues can be overcome if a real disaster strikes.
If testing reveals that some parts of the DRP cannot be completed successfully and you lack the resources to remediate the problem, you should devise a failover strategy that allows the business to keep operating even if systems cannot be fully restored to their pre-disaster state, along with a failback plan for returning to the original systems once they are repaired. For example, if you determine that you cannot fix employee workstations quickly enough to restore operations within the desired timeframe, a failover strategy could be to have employees use their personal devices to access a parallel system in the public cloud until the original system can be restored.
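A post-test report can be as simple as a record of each step's outcome and duration, with the total compared against the RTO. The sketch below assumes each step is timed during the drill; the scenario, step names, durations, notes, and one-hour RTO are invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class StepResult:
    step: str
    duration: timedelta
    succeeded: bool
    notes: str = ""


def build_report(scenario: str, results: list[StepResult], rto: timedelta) -> str:
    """Summarize a DR test: total recovery time versus the RTO, plus every failed step."""
    total = sum((r.duration for r in results), timedelta())
    status = "MET" if total <= rto else "MISSED"
    lines = [f"DR test: {scenario}",
             f"Total recovery time: {total} (RTO {rto}) -> {status}"]
    for r in results:
        if not r.succeeded:
            lines.append(f"FAILED: {r.step}: {r.notes or 'no notes'}")
    return "\n".join(lines)


# Hypothetical drill results.
results = [
    StepResult("Restore orders-db from backup", timedelta(minutes=50), True),
    StepResult("Repoint application to restored database", timedelta(minutes=20), False,
               "DNS change required credentials the on-call engineer lacked"),
]
print(build_report("Database server failure", results, rto=timedelta(hours=1)))
# DR test: Database server failure
# Total recovery time: 1:10:00 (RTO 1:00:00) -> MISSED
# FAILED: Repoint application to restored database: DNS change required credentials the on-call engineer lacked
```

Each `FAILED` line points at a concrete fix to fold back into the plan, such as granting the on-call role the credentials it needs.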
Once you begin formulating a DRP — and especially when you start testing it — you may realize that you lack the resources and expertise for handling the many IT risks your business faces. It can be a challenge to accurately anticipate all of the disasters to which your business is prone, let alone implement a realistic and effective strategy for combating them in the stressful, confusing period after they occur.
If this is the case for your organization, consider partnering with a reliable, tested DRaaS provider like Contegix. Contegix has the expertise and resources necessary to make disaster recovery reliable and efficient. By taking advantage of Contegix’s experienced disaster recovery team, you can eliminate the costly and difficult prospect of managing disaster recovery in-house.
Contact Contegix to learn how we can help you maintain business continuity in the face of any type of disaster or disruption.