'Tis the season for unprecedented snowfall! As children, we obsessively watched the weather forecast at the slightest mention of snow, hoping that 67% chance would crystallize into a sure thing. The next morning, we'd get up earlier than our parents normally had to drag us out of bed and race to the window, hoping to see nothing but white.
As MSPs, you don't often get snow days. In fact, the biggest snowstorms wreak havoc on the IT environments you monitor and manage. Are you prepared when they do? If you don't have an actionable plan ready, you might want to look into a strong disaster recovery plan template (see below). In a recent ChannelE2E post about business continuity, Joe Panettieri recaps the results of Jonas, the blizzard that rocked the East Coast this past weekend, and highlights specific MSPs who proactively updated their disaster recovery (DR) plans in preparation. They, like many other MSPs and IT solutions providers, recognized room for improvement and increased their responsiveness in the event of an emergency. Have you revisited and revised your own business continuity processes? Is your backup and disaster recovery (BDR) solution Jonas-tested and safe from any other disaster, whether it's triggered by severe weather, malicious attack or user error?
Let's see just how robust your disaster recovery plan really is...
1. Have you properly defined and documented what you consider a disaster scenario to be?
You can't know when and how to act if you don't know what you're acting upon. It may seem like common sense, but you can never be too thorough when planning for disaster recovery.
2. Have you identified all of the risks that necessitate an IT disaster recovery plan?
Thinking back to Jonas, we know that by the time Joe had written his article, almost 72,000 customers had already lost power. In this scenario, the risk is systems going offline as a result of the power outage. A snowstorm is just one case that tests how strong your IT disaster recovery plan is. Don't forget about the other threats to your clients' network uptime, such as hardware failure or a phishing email that installs ransomware when someone clicks it.
3. Have you conducted a Business Impact Analysis (BIA)?
For each of these risks, you want to know how likely it is to occur and what the impact would be if it actually did. Just as different geographic locations may be subject to different weather calamities, different IT environments may have varying levels of susceptibility to outside threats. If your client doesn't have a BDR solution in place, for instance, the impact of an unexpected power outage is critical: if their server and local backup fail, their data can't be recovered. If they do have a BDR solution in place, they'll still experience downtime while workstations are offline, but there's not as big a risk of data loss. Once you've conducted the BIA for each risk, you can prioritize which vulnerabilities to address first.
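To make that prioritization concrete, here's a minimal sketch of one way to rank risks. It assumes a simple 1-to-5 scale for likelihood and impact and multiplies the two, which is a common risk-matrix convention rather than anything prescribed here. The `Risk` class, the `prioritize` function and every number below are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (critical)

    @property
    def score(self) -> int:
        # Likelihood multiplied by impact; higher means more urgent
        return self.likelihood * self.impact


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Illustrative values only, not measurements from any real client
risks = [
    Risk("Power outage, no BDR in place", likelihood=3, impact=5),
    Risk("Power outage, BDR in place", likelihood=3, impact=2),
    Risk("Ransomware via phishing email", likelihood=4, impact=4),
    Risk("Hardware failure", likelihood=2, impact=4),
]

for risk in prioritize(risks):
    print(f"{risk.score:>2}  {risk.name}")
```

Notice how the same power outage lands much lower on the list once a BDR solution is in place. Your own scale and weighting may differ, so treat the scores as a conversation starter with the client, not a verdict.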
4. Have you mapped out a backup and disaster recovery strategy, establishing a clear recovery time objective (RTO) and recovery point objective (RPO), for each of these risks?
As defined in Everything You Need to Know about Backup and Disaster Recovery (BDR), RTO is a benchmark indicating how quickly data must be recovered to ensure business continuity following a disaster or unplanned downtime. Also a critical metric, RPO is a benchmark indicating which data must be recovered in order for normal business operations to resume following a disaster or unplanned downtime. This is often based on file age (i.e. all data backed up before date X must be recovered), and in conjunction with RTO can help administrators determine how frequently backups should execute.
One BDR strategy effective at reducing risk and instrumental in backup and recovery planning is frequently testing backups. Don't assume that simply having a BDR solution is enough to weather the storm. We actually recommended running at least one successful site DR test for every client as one of the 5 Items to Cross Off Your 2016 BDR Planning Checklist. After you run the test, record the results in your DR plan, taking care to include "if workarounds were used to get any standby virtual machines working, and if there were any procedures that end-users may have to run on their machines to connect to the standby environment."
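If it helps to see RTO and RPO as numbers, here's a rough sketch. It assumes you express RPO as the largest window of data loss the client can tolerate (a common reading of the metric) and RTO as the longest acceptable restore time. The function names and the targets are hypothetical, and the restore time would come from your own DR test results.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """In the worst case a failure hits just before the next backup runs,
    so up to one full interval of data is lost. The interval therefore
    must not exceed the RPO."""
    return backup_interval <= rpo


def meets_rto(measured_restore_time: timedelta, rto: timedelta) -> bool:
    """Compare a restore time observed during a DR test against the RTO."""
    return measured_restore_time <= rto


# Hypothetical targets for a single client
rpo = timedelta(hours=4)
rto = timedelta(hours=2)

print(meets_rpo(timedelta(hours=24), rpo))     # nightly backups
print(meets_rpo(timedelta(hours=1), rpo))      # hourly backups
print(meets_rto(timedelta(minutes=90), rto))   # restore time from a DR test
```

With these made-up targets, nightly backups fall short of a four-hour RPO while hourly backups satisfy it, which is exactly the kind of gap a DR test should surface before a real storm does.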
5. Have you assigned disaster recovery teams, leads and responsibilities?
Who will you need on your incident response team? Depending on the size of the operation, you may need to assemble the following:
- Disaster Management Team
- Facilities Team
- Network Team
- Server Team
- Applications Team
- Operations Team
- Management Team
- Communications Team
- Finance Team
Here are a few tasks different teams may be charged with:
Disaster Management Team:
- Determine the magnitude and class of the disaster
- Keep a record of money spent during the disaster recovery process
- Create a detailed report of all the steps undertaken in the disaster recovery process
- Determine which network services are not functioning at the primary facility
- If multiple network services are impacted, the team will prioritize the recovery of services in the manner and order that has the least business impact
- Install and implement any tools, hardware, software and systems required
- Ensure that secondary servers located offsite are kept up-to-date with system and application patches
- Ensure that secondary servers located offsite are kept up-to-date with redundant data
Just the sheer size of this list demonstrates how many considerations there are in the event of an emergency. By helping clients manage each aspect of their IT disaster recovery plan, you can go from being an IT service provider to a trusted business partner and virtual CIO. They'll turn to you for guidance and support in these crises, and by taking the time to develop a comprehensive DR plan, you'll prove yourself worthy of the challenge!
6. Have you established how to communicate a disaster to employees, clients, vendors and the media?
This falls under the Communications Team, but I had to highlight it again because it's so important. Particularly with clients, do you know everything you have to inform them of? Will you send an email or get on the phone to tell them how the crisis will impact the services you provide and how you deliver them? You'll want to be transparent with customers if you think you'll be unable to meet your Service Level Agreement (SLA) with them.
For help mapping out your communication plan with media, definitely check out 8 Steps to Handling a PR Horror Story!
7. Have you included the contact information of everyone involved in your DR plan?
Consider adding a call tree with everyone's contact information so people know who they're supposed to contact or report to. Having all of this information organized in one place increases your ability to act fast in IT disasters. Rather than flipping through pages to find Bill's phone number, and then calling someone else to ask who's next in the chain if Bill doesn't answer, you can just follow the recovery planning call tree.
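A call tree can live on paper, but if you'd rather keep it somewhere searchable, here's a minimal sketch of the idea. Everything in it is hypothetical: the `Contact` class, the names (other than Bill), the fictional 555 phone numbers and the rule that a caller keeps working down the branch when someone doesn't pick up.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Contact:
    name: str
    phone: str
    calls: list["Contact"] = field(default_factory=list)  # people this contact notifies


def walk_call_tree(root: Contact, answers: Callable[[str], bool]):
    """Walk the tree level by level and report who was reached and who was missed."""
    reached, missed = [], []
    queue = deque([root])
    while queue:
        person = queue.popleft()
        (reached if answers(person.name) else missed).append(person.name)
        # The people below still need a call either way. If this person was
        # missed, the assumption is that whoever called them continues down
        # the branch on their behalf.
        queue.extend(person.calls)
    return reached, missed


# Hypothetical tree with fictional phone numbers
dev = Contact("Dev", "555-0104")
erin = Contact("Erin", "555-0105")
bill = Contact("Bill", "555-0102", calls=[dev])
carla = Contact("Carla", "555-0103", calls=[erin])
lead = Contact("Alice", "555-0101", calls=[bill, carla])

# Simulate Bill not answering
reached, missed = walk_call_tree(lead, lambda name: name != "Bill")
print("Reached:", reached)
print("Missed:", missed)
```

Even if Bill never picks up, nobody downstream of him is left waiting by the phone, and the missed list tells you who still needs a follow-up.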
8. Does your DR plan leave room to record the aftermath of the disaster and outline how to restore a client's site to be fully functional?
Take inventory of the various IT systems at your client's site. Then, you can check off which ones went offline so you have a better understanding of the scope of the emergency and how much downtime it may have caused. Additionally, as is stated in How Business Continuity Can Save SMBs from Severe Weather Disasters, the client's site may have suffered physical damage that would require them to replace hardware. You'll want to know the state of their IT infrastructure so you can make sure they have everything they need to stay up, running and profitable after you leave.
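Here's one hedged sketch of what that inventory could look like if you tracked it in a script instead of a spreadsheet. The `SystemRecord` fields, the system names and the timestamps are all made up for illustration. Systems that are still down are measured up to an `as_of` time you pick.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SystemRecord:
    name: str
    went_offline: Optional[datetime] = None  # None means the system stayed up
    restored: Optional[datetime] = None      # None means not restored yet
    needs_replacement: bool = False          # physical damage found on site


def downtime(record: SystemRecord, as_of: datetime) -> timedelta:
    """Downtime so far: zero if the system never went offline, otherwise the
    time from going offline until it was restored (or until as_of)."""
    if record.went_offline is None:
        return timedelta(0)
    end = record.restored if record.restored is not None else as_of
    return end - record.went_offline


# Hypothetical inventory for one client site
as_of = datetime(2016, 1, 24, 2, 0)
outage_start = datetime(2016, 1, 23, 2, 0)
inventory = [
    SystemRecord("File server", outage_start, datetime(2016, 1, 23, 9, 30)),
    SystemRecord("Workstations", outage_start),
    SystemRecord("Firewall"),
    SystemRecord("Backup appliance", outage_start,
                 datetime(2016, 1, 23, 3, 0), needs_replacement=True),
]

for rec in inventory:
    flag = " (replace hardware)" if rec.needs_replacement else ""
    print(f"{rec.name}: down for {downtime(rec, as_of)}{flag}")
```

A record like this doubles as the aftermath section of your DR plan: it shows the scope of the outage at a glance and flags the hardware the client will need to replace.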
The old adage is true: failure to plan is a plan for failure. With it being the dead of winter, you should already be revisiting your disaster recovery plan in case another Jonas hits. If you haven't already, go back and comb through your DR plans to ensure each of these concerns is addressed. That way, when you are staring down another IT disaster, you'll be well-equipped to resolve any issues that arise, deliver stellar service and maximize client uptime.