On July 22, 2019, Slack was in the middle of deploying an update to its desktop app. The update was supposed to decrease memory consumption and improve load time, but instead the company suffered a significant outage on a global scale. After approximately 40 minutes of downtime, the service was back up. But in the meantime, the company whose motto is ‘where work happens’ essentially stopped working.

Thousands of users reported problems during the outage, with many taking to Twitter and other social media to gripe when a service they depend on so much came crashing down.

Slack’s Response: Was It Enough?

What happened in July was one of many outages for Slack. In fact, the company lost about $8 million in revenue last quarter after failing to meet its uptime commitment. Over the course of 92 days, the platform's downtime added up to roughly two hours.

One question Slack users will likely ask is how long the company takes to resolve issues when outages do occur. Slack employs a designated response team to investigate issues and work on preventative measures to avoid future outages. The company also maintains a service status page to let users know the status of ongoing issues and downtime, and it provides updates via social media.

But is it enough? Do their users expect more?

A Refund May Be in Order

A single, standalone outage probably won’t result in a significant business loss. But if it becomes a regular occurrence, it can progressively hurt your brand and cost you revenue over time. 

When outages occur, users may not be satisfied with a simple apology. They often look for refunds for any service downtime or performance issues that they feel can affect their business or productivity in a negative way.

Slack realized that they had to do more. The company awarded $8.2 million worth of credits to users after it managed only 99.9% service uptime — short of its 99.99% commitment. 
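To put those percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the figures quoted in this article (about two hours of downtime over 92 days, and a 99.99% commitment); the function names are ours, purely for illustration, and the actual terms of Slack's service agreement may be calculated differently.

```python
MINUTES_PER_DAY = 24 * 60


def uptime_percent(downtime_minutes, period_days):
    """Share of the period, in percent, during which the service was up."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (1 - downtime_minutes / total_minutes)


def allowed_downtime_minutes(target_percent, period_days):
    """Downtime budget, in minutes, implied by an uptime target."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (1 - target_percent / 100.0)


# Figures taken from the article: ~2 hours down over a 92-day quarter.
period_days = 92
downtime_minutes = 2 * 60

actual = uptime_percent(downtime_minutes, period_days)
budget = allowed_downtime_minutes(99.99, period_days)

print(f"Actual uptime: {actual:.2f}%")                        # Actual uptime: 99.91%
print(f"Downtime allowed at 99.99%: {budget:.1f} minutes")    # Downtime allowed at 99.99%: 13.2 minutes
```

In other words, a 99.99% target leaves only about 13 minutes of slack per quarter, so two hours of accumulated downtime overshoots that budget roughly ninefold.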

Slack isn’t the only company that had to compensate customers when their system went down. Facebook recently had to consider refunding advertisers for lost revenue and exposure when Instagram went down. For nearly 24 hours, millions of users couldn’t access the social media platform and thousands took to Twitter to vent their frustration under the hashtag “#facebookdown”.

Google also found that it wasn’t immune to outages when, in July 2018, the company’s YouTube TV service went down throughout the U.S. during the FIFA World Cup semifinal. The company immediately went into damage control, offering a free week of service to its 800,000 customers.

How to Prevent Incidents From Snowballing

Recently we discussed the many ways that downtime and service outages can hurt your business. When companies suffer from service downtime or other performance-related issues, their revenue, brand reputation and credibility all take a hit.

But while businesses are fully aware that downtime can cause revenue loss and other issues, they often fail to employ measures to effectively prevent it from happening again. 

By using an automated real-time analytics solution with anomaly detection, companies can gain better insights into what’s causing their availability and performance issues. Organizations can proactively manage outages and disruptions, so that they don’t escalate into the kind of incidents that make headlines. 
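As a very simplified illustration of the idea, the sketch below flags points in a metric stream that deviate sharply from their recent history using a rolling z-score. This is a generic textbook technique, not a description of how any particular product works; the function name, window size, threshold, and sample error counts are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=10, threshold=3.0):
    """Return indices of values that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        # Only score a point once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip perfectly flat windows to avoid dividing the signal by zero spread.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical per-minute error counts with one obvious spike.
error_counts = [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 5, 48, 6, 5]
print(detect_anomalies(error_counts))  # [11]
```

A production system would also need to account for seasonality, trends, and correlated metrics, which a fixed window and threshold like this cannot capture.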

Written by Anodot

