Earlier this week, Amazon Web Services’ S3 system suffered an outage for about four hours—leaving many websites and web-based services either partially or entirely inaccessible. S3, a cloud storage system (S3 stands for Simple Storage Service), experienced what Amazon referred to as “high error rates” that reportedly impacted websites like Slack, Trello, Imgur and many others.
While these large-scale downtime events don’t happen too often—before this week the most recent sizable S3 outage was in 2015—they do present an opportunity for managed service providers (MSPs) to ensure they’ve got the right systems and processes in place to properly support their clients and minimize end user impact. Here are three reminders MSPs can take from this week’s outage:
Downtime Is Inevitable
The best laid plans of mice and men often go awry, and even the best systems and technologies will at some point experience downtime. Even S3, which is designed to deliver 99.999999999 percent durability (a measure of how unlikely it is to lose stored data, not a guarantee of uptime) and is relied on by more than 100,000 websites, can experience the occasional outage. The key here is simply to remember that downtime happens, and that how you respond to it can be the difference between a happy client and one who’s ready to move on to another service provider.
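To put "downtime happens" into numbers, here is a rough Python sketch of the arithmetic. The availability targets used below are illustrative assumptions, not figures from Amazon or from any particular service agreement; the only number taken from this week's event is the roughly four-hour outage.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year that a given availability target permits."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def observed_availability(outage_hours: float) -> float:
    """Yearly availability (in percent) if the only downtime is one outage."""
    return (1 - outage_hours / HOURS_PER_YEAR) * 100


# Illustrative targets only (assumptions, not any vendor's published numbers).
for target in (99.0, 99.9, 99.99):
    print(f"{target}% allows {allowed_downtime_hours(target):.2f} hours per year")

# A single four-hour outage, like the one described above.
print(f"One 4-hour outage: {observed_availability(4):.3f}% for the year")
```

Even a very high target leaves some room for downtime, which is why the response plan matters as much as the number itself.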
Local Backups (and Restores) Are Still Extremely Important
The S3 outage has also reminded us that while large-scale cloud providers like Amazon don’t experience these sorts of issues very often, there is still a need to maintain local backups and avoid relying exclusively on a single cloud for storage, disaster recovery or business continuity efforts.
As the SMB workforce becomes increasingly connected and mobile, MSPs are under pressure to ensure that end users have continuous access to data and applications. Solutions that combine a local, onsite appliance with cloud replication and redundancy can help ensure that if either the physical or the web-based system fails, its counterpart can allow users to maintain full or partial continuity while an outside vendor like Amazon works to recover from an outage.
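The idea of a local appliance backing up a cloud store can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real product: `read_with_fallback`, `StorageUnavailable`, `cloud_read`, `local_read` and the in-memory `local_appliance` dictionary are all hypothetical names made up for the example, and the cloud call simply simulates an outage.

```python
from typing import Callable, Tuple


class StorageUnavailable(Exception):
    """Raised by a backend that cannot serve a request right now."""


def read_with_fallback(
    key: str,
    primary: Callable[[str], bytes],
    secondary: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Try the primary store first; fall back to the secondary if it fails.

    Returns the data together with the name of the source that served it.
    """
    try:
        return primary(key), "primary"
    except StorageUnavailable:
        return secondary(key), "secondary"


# Hypothetical stand-in for an onsite backup appliance.
local_appliance = {"report.pdf": b"local copy"}


def cloud_read(key: str) -> bytes:
    # Simulate the cloud provider returning errors during an outage.
    raise StorageUnavailable("cloud storage is returning errors")


def local_read(key: str) -> bytes:
    try:
        return local_appliance[key]
    except KeyError:
        raise StorageUnavailable(f"{key} is not on the local appliance")


data, source = read_with_fallback("report.pdf", cloud_read, local_read)
print(source, data)
```

The same function works in the other direction: pass the local reader as the primary and the cloud reader as the secondary, and users keep working if the onsite appliance is the one that fails.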
Communication and Transparency Are Essential
Lastly, it’s worth noting that during an event like this, communication and transparency are extremely important—especially for MSPs. During the S3 outage, Amazon quickly took to social media to inform users that they had identified the problem and were actively working to resolve it, and also provided real-time updates and information via the AWS in-product dashboard.
Don’t leave your clients in the dark during any sort of outage or downtime event, and be clear as to exactly what you do and don’t have control over. At the end of the day, your clients just want their technology to work. It’s important to educate them as to why they are experiencing a particular problem, who or what is responsible for the issue, and what steps you’re taking to either resolve it or apply a quick fix while an outside vendor looks to resolve the underlying issue.
If you’ve got any other tips or thoughts around how this week’s S3 outage may have impacted MSPs, sound off in the comments below!
By Atanu Neogi
By Paula Griffin
By Lily Teplow