In my last blog series (part 1 / part 2), we discussed when to declare a disaster and the significance of IT resilience. Now, let’s walk through what happens once you declare and it’s time to bring your business-critical systems back online.
By this point, we have created our DR test plan. The plan contains details around communication, important login and other systems information, and details around how to declare a disaster. Most importantly, it’s been proven through the DR exercise. Next, we will focus on the time immediately following the disaster declaration.
During this time, it is important to open the lines of communication and dig in, making sure to capture anything that differs from what is documented in your DR Runbook. One of the first things you need to do, once you’ve determined which systems you’re going to bring online in the DR location, is delegate tasks. Some of this will likely be covered in your Runbook, but every disaster declaration is going to have its unique challenges, so don’t make any assumptions. Break into teams and start checking things off the list.
At this point, you’ve successfully brought the production systems online at the DR location and your company is once again back in business. Congratulations! Now, get some rest and get ready to go back, because most DR environments are not designed for long-term production use. You’re going to have to either go back to the original production location or, if that’s not possible, build a new one and then migrate to the new environment.
Here are some additional tips and tricks to make declaring a disaster less stressful:
- Utilize Helpful Tools
I mentioned earlier that documenting all changes in the DR location is critical. One tool I’ve used in the past is the Microsoft Steps Recorder. It is pretty simple to use:
– Click Start
– Click Run
– Type “psr”
– Press Enter to open the tool, then click Start Record to begin the recording
The tool will then take a screenshot every time you click your mouse. This report will come in handy when your teams are exhausted and their memories get a little fuzzy.
Again, communication has to be a TOP priority, especially when you are working the DR Runbook. Be sure to schedule regular check-ins with the teams doing the work. It’s easy to get caught up in minor details and waste countless hours. Someone has to maintain a clear head and help steer the team members in the most efficient direction.
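If your teams would rather start the Steps Recorder from a script than click through the menus, the same tool can be launched from the command line. The snippet below is only a sketch: it builds the argument lists for `psr.exe` using its `/start`, `/output`, `/sc`, `/maxsc`, `/gui`, and `/stop` switches. The function names, the output path, and the screenshot limit are my own placeholders, and you should confirm the switches against the Windows version running in your DR environment before relying on them.

```python
def build_psr_start_command(output_zip: str, max_screenshots: int = 100) -> list[str]:
    """Build the argument list that starts a Steps Recorder capture.

    output_zip is a hypothetical location for the report, for example a
    share that every recovery team can reach.
    """
    return [
        "psr.exe",
        "/start",
        "/output", output_zip,
        "/sc", "1",                      # take a screenshot with each recorded step
        "/maxsc", str(max_screenshots),  # cap on the number of screenshots kept
        "/gui", "0",                     # run without showing the recorder window
    ]


def build_psr_stop_command() -> list[str]:
    """Build the argument list that stops the capture and writes the report."""
    return ["psr.exe", "/stop"]
```

On a Windows machine you could pass these lists to `subprocess.Popen` at the start of a recovery task and `subprocess.run` at the end. Building the commands separately from running them keeps the sketch easy to review before anyone executes it.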
- Update the Runbook
It’s critical that you update the DR Runbook throughout the year. Leadership changes, system upgrades, and new business applications can all have a significant impact on documentation. Waiting for your annual DR exercise to make these updates significantly reduces the value of your planning and will prolong your actual recovery, putting your recovery time objective (RTO) at risk. My suggestion is to incorporate that step into your change management process, so that your teams are constantly reminded to consider the impact on DR.
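One way to picture the change management step is a small check that runs whenever a change request is filed. This is a rough sketch, not a real ticketing integration: the `ChangeRequest` fields, the system names, and the runbook section titles are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    summary: str
    affected_systems: set[str]
    runbook_reviewed: bool = False  # hypothetical checkbox on the change form


# Hypothetical mapping of systems to the runbook sections that describe them.
RUNBOOK_SECTIONS = {
    "active-directory": "Section 3: Identity and DNS",
    "erp": "Section 5: Business applications",
    "call-tree": "Section 1: Communication plan",
}


def runbook_sections_to_review(change: ChangeRequest) -> list[str]:
    """Return the runbook sections a change touches, or [] once review is done."""
    if change.runbook_reviewed:
        return []
    return sorted(
        RUNBOOK_SECTIONS[system]
        for system in change.affected_systems
        if system in RUNBOOK_SECTIONS
    )
```

A change that touches the ERP system would come back with "Section 5: Business applications" until someone ticks the review box, which is the kind of constant reminder the change process should provide.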
- Don’t Forget Your Logins
Logins to service accounts or third-party providers/partners are also an easily overlooked piece of the DR Runbook. These need to be stored somewhere that can be accessed in the event of a disaster and continuously maintained for accuracy. Instructions for making DNS updates should also be part of the DR Runbook. Who remembers that obscure password that’s set during the Active Directory installation process (the Directory Services Restore Mode, or DSRM, password)? Do you know yours? Maybe you joined the organization after AD was set up and you have no idea what it is. The wise choice is to reset it now rather than in the midst of a disaster recovery exercise or real declaration. I always encourage teams to follow the “hit by a bus plan” and document everything, even the obvious things, because you never know what mindset you’re going to be in or if you’re even going to be able to be part of the recovery effort.
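To keep those logins "continuously maintained," it can help to track when each one was last tested. The sketch below assumes a simple inventory of entry names and last-verified dates; the entry names and the 90-day threshold are placeholders of mine, and the passwords themselves belong in whatever vault your team uses, not in this list.

```python
from datetime import date, timedelta


def stale_logins(
    last_verified: dict[str, date],
    today: date,
    max_age_days: int = 90,  # hypothetical review interval
) -> list[str]:
    """Return the names of logins not tested within max_age_days of today."""
    cutoff = timedelta(days=max_age_days)
    return sorted(
        name for name, checked in last_verified.items() if today - checked > cutoff
    )


# Hypothetical inventory: names and dates only, never the secrets themselves.
inventory = {
    "AD DSRM password": date(2024, 1, 1),
    "DNS registrar portal": date(2024, 5, 1),
    "Backup vendor console": date(2023, 11, 15),
}
overdue = stale_logins(inventory, today=date(2024, 6, 1))
```

Running a check like this ahead of each DR exercise gives you a short list of logins to test or reset while everyone is still calm.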
This should leave you feeling prepared to carry out your DR Plan seamlessly. As I’ve mentioned before, communication is the most important thing when it comes to following your DR plan. Be sure to stay in contact with your team and update everyone on the steps.
Having a managed services provider to help you with your disaster recovery is always recommended. Contegix has an experienced team that will assess, design, build, and manage a solution that brings you peace of mind and reliable protection. Contact us today.
Brian Frank is Product Delivery Director at Contegix, owning the vision, execution, and management of the product delivery strategy and roadmap. In this role, Brian works with all functional areas of the operations team to develop product releases.
Brian’s responsibilities also include product selection guidance, leading requirement gathering efforts with key stakeholders, taking part in product solution architecture, and successful delivery of early adopter solutions.